What is the difference between head(10) and [:10] in Python?
Question
I am a newbie. My instructor said, "I am going to select the first ten," and then he wrote the code below:
df['Runtime'].value_counts().sort_values(ascending = False)[:10]
I tried the code df['Runtime'].value_counts().sort_values(ascending = False)[:10]
and it gives me a different result. What is the fundamental difference between the two expressions below?
df['Runtime'].value_counts().sort_values(ascending = False)[:10]
90.0 971
95.0 489
92.0 434
93.0 422
85.0 408
...
19.0 8
32.0 8
9.0 8
7.0 8
10.0 8
Name: Runtime, Length: 157, dtype: int64
df['Runtime'].value_counts().sort_values(ascending = False).head(10)
90.0 971
95.0 489
92.0 434
93.0 422
85.0 408
89.0 407
88.0 406
100.0 402
91.0 394
94.0 383
Name: Runtime, dtype: int64
Answer 1
Score: 1
Why the surprising results?
Because the Runtime column in your DataFrame is of float dtype, the result of value_counts() is a Series with an index of type Float64Index.
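To check this on your own data, you can inspect the index of the value_counts() result. This is a minimal sketch that uses a small hypothetical Runtime column rather than your real data. One version note: pandas 2.0 removed the Float64Index class, so on 2.0 and later the index shows up as a plain Index with float64 dtype; the float dtype is what matters here.

```python
import pandas as pd

# Hypothetical runtimes in minutes; float dtype, like the question's 'Runtime' column
df = pd.DataFrame({"Runtime": [90.0, 95.0, 90.0, 10.0, 92.0, 90.0, 95.0]})

counts = df["Runtime"].value_counts()

# The distinct Runtime values become the index, so the index inherits the float dtype
print(type(counts.index).__name__)  # Float64Index on pandas < 2.0, Index on 2.0+
print(counts.index.dtype)           # float64
```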
The behaviour of []-style indexing on a Series with a Float64Index is quirky. See this GitHub issue discussion:
> To summarize in a specific way [...] for all index types, using integers in [] (__getitem__) is positional (like iloc), except for Float64Index, making this a special case.
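For contrast, here is a sketch of the "positional" case the quote describes, using a hypothetical Series with a string index: an integer slice inside [] picks rows by position, exactly like .iloc.

```python
import pandas as pd

# Hypothetical Series with a non-float (string) index
s_str = pd.Series([971, 489, 434, 422], index=["a", "b", "c", "d"])

# With a non-float index, an integer slice in [] is positional...
first_two = s_str[:2]

# ...so it matches .iloc exactly (assert_series_equal raises if they differ)
pd.testing.assert_series_equal(first_two, s_str.iloc[:2])
print(list(first_two.index))  # ['a', 'b']
```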
Essentially, with a float index, [:10] is interpreted not as "the first ten values" but as "all the values up to and including the one whose index label is 10.0".
With your data, that's why df['Runtime'].value_counts().sort_values(ascending = False)[:10] gives 157 rows, the last of which has the index value 10.0.
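If the label-based meaning is what you actually want, it is clearer to spell it out with .loc, which is always label-based. Newer pandas releases have also deprecated the label-based reading of obj[i:j] on a float index in favour of positional slicing (you may see a FutureWarning about it), so writing .loc or .iloc keeps the intent explicit. A minimal sketch with hypothetical counts whose float labels are ordered by count, not by label, as value_counts() output would be:

```python
import pandas as pd

# Hypothetical counts with float labels, sorted by count (not by label)
s = pd.Series(
    [971, 489, 434, 422, 408, 8],
    index=[90.0, 95.0, 92.0, 10.0, 85.0, 32.0],
)

# Label-based: from the start up to AND including the row labelled 10.0
by_label = s.loc[:10.0]

# Positional: exactly the first 3 rows, whatever their labels are
by_position = s.iloc[:3]

print(len(by_label), by_label.index[-1])  # 4 10.0
print(len(by_position))                   # 3
```

The number of rows .loc[:10.0] returns depends only on where the label 10.0 happens to sit, which is why the question's [:10] produced 157 rows instead of 10.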
Getting the behaviour you want
To get the first ten values like you want, you can do:
df['Runtime'].value_counts().sort_values(ascending = False).iloc[:10]
or equivalently:
df['Runtime'].value_counts().sort_values(ascending = False).head(10)
(Looking at the source code shows that .head(n) returns the result of .iloc[:n].)
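A quick way to confirm the equivalence on any Series is pandas.testing.assert_series_equal, which raises an AssertionError if the two results differ. This sketch uses a hypothetical float-valued Runtime column:

```python
import pandas as pd

# Hypothetical float-valued column, like 'Runtime' in the question
df = pd.DataFrame({"Runtime": [90.0, 95.0, 90.0, 92.0, 10.0, 90.0, 95.0, 85.0]})
s = df["Runtime"].value_counts().sort_values(ascending=False)

# head(n) is positional, so it matches iloc[:n] even though the index is float
pd.testing.assert_series_equal(s.head(3), s.iloc[:3])

# Both simply return fewer rows when n exceeds the length of the Series
pd.testing.assert_series_equal(s.head(10), s.iloc[:10])
print(len(s.head(10)))  # 5, the number of distinct runtimes
```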
How to reproduce
We don't have access to your data, so here's a quick script I made to produce a similar enough DataFrame to reproduce the issue.
import numpy as np
import pandas as pd
np.random.seed(5678)
vals = np.random.randint(0, 50, size=1000).astype(float)
df = pd.DataFrame(vals, columns=['Runtime'])
s = df['Runtime'].value_counts().sort_values(ascending = False)
s[:10] # produces 32 rows, the last one with index 10.0
s.iloc[:10] # the first 10 rows
s.head(10) # also the first 10 rows