What is the difference between head(10) and [:10] in Python?

Question

I am a newbie. My instructor said he was going to select the first ten, and then he wrote the code below.

df['Runtime'].value_counts().sort_values(ascending = False)[:10]

I tried the code df['Runtime'].value_counts().sort_values(ascending = False)[:10] and it gave me a different result: 157 rows instead of ten. What's the fundamental difference between [:10] and .head(10) here?

df['Runtime'].value_counts().sort_values(ascending = False)[:10]
90.0    971
95.0    489
92.0    434
93.0    422
85.0    408
       ... 
19.0      8
32.0      8
9.0       8
7.0       8
10.0      8
Name: Runtime, Length: 157, dtype: int64

df['Runtime'].value_counts().sort_values(ascending = False).head(10)
90.0     971
95.0     489
92.0     434
93.0     422
85.0     408
89.0     407
88.0     406
100.0    402
91.0     394
94.0     383
Name: Runtime, dtype: int64

Answer 1

Score: 1

Why the surprising results?

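Because the Runtime column in your DataFrame is of float datatype, the result of value_counts() is a Series whose index is a Float64Index. (In pandas 2.0 and later, Float64Index no longer exists as a separate class; the index is then a plain Index with dtype float64.)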
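
A quick way to check this yourself is to look at the dtype of the value_counts() index. This is a minimal sketch on a small made-up DataFrame (the sample Runtime values are hypothetical). It checks the dtype rather than the class name, because the class is only called Float64Index in pandas versions before 2.0.

```python
import pandas as pd

# Hypothetical sample data: a float column standing in for 'Runtime'
df = pd.DataFrame({"Runtime": [90.0, 95.0, 90.0, 10.0, 95.0, 90.0]})

counts = df["Runtime"].value_counts()

# The distinct Runtime values become the index, so the index is float-typed
print(counts.index.dtype)   # float64
# value_counts() sorts by count, most frequent first
print(counts.iloc[0])       # 3 (90.0 occurs three times)
```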

The behaviour of [] style indexing on a Series with a Float64Index is quirky. See this GitHub issue discussion:

> To summarize in a specific way [...] for all index types, using integers in [] (__getitem__) is positional (like iloc), except for Float64Index, making this a special case.

Essentially, with a float index, [:10] is interpreted not as "the first ten values" but as a label slice: "all the values from the start up to and including the one whose index value is 10.0".
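
The difference between a label slice and a positional slice can be made explicit with .loc and .iloc. The sketch below uses a small hand-built Series (the counts and labels are made up, loosely modeled on the output in the question) and spells out both slices rather than relying on a bare s[:10], since newer pandas releases may warn about, or change, how that is treated on a float index. A string-indexed Series is included for contrast, where an integer slice in [] is positional.

```python
import pandas as pd

# Hypothetical counts sorted by frequency, with float labels (like value_counts() output)
s = pd.Series([971, 489, 434, 8, 8], index=[90.0, 95.0, 92.0, 10.0, 7.0])

# Label-based: from the first row up to AND including the row labelled 10.0
by_label = s.loc[:10.0]
# Position-based: the first three rows, whatever their labels are
by_position = s.iloc[:3]

print(list(by_label.index))     # [90.0, 95.0, 92.0, 10.0]
print(list(by_position.index))  # [90.0, 95.0, 92.0]

# For contrast: with string labels, an integer slice in [] is positional
t = pd.Series([1, 2, 3], index=["a", "b", "c"])
print(list(t[:2].index))        # ['a', 'b']
```

Note that the label slice follows the Series' current row order: it stops wherever the label 10.0 happens to sit. That is why the result in the question has 157 rows, a number unrelated to ten.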

With your data, that's why you see df['Runtime'].value_counts().sort_values(ascending = False)[:10] giving 157 rows, with the last row having an index value of 10.0.


Getting the behaviour you want

To get the first ten values like you want, you can do:

df['Runtime'].value_counts().sort_values(ascending = False).iloc[:10]

or equivalently:

df['Runtime'].value_counts().sort_values(ascending = False).head(10)

(Looking at the source code shows that .head(n) returns the result of .iloc[:n].)
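
To confirm that head(n) and iloc[:n] select the same rows, you can compare them directly with Series.equals. A minimal sketch, again with made-up counts on a float index:

```python
import pandas as pd

# Hypothetical frequency counts with a float index, as value_counts() would produce
s = pd.Series([971, 489, 434, 422, 408], index=[90.0, 95.0, 92.0, 93.0, 85.0])

top3_head = s.head(3)
top3_iloc = s.iloc[:3]

# Both select by position, so the float labels play no role
print(top3_head.equals(top3_iloc))  # True
print(list(top3_head.index))        # [90.0, 95.0, 92.0]

# Asking for more rows than exist simply returns the whole Series
print(len(s.head(10)))              # 5
```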


How to reproduce

We don't have access to your data, so here's a quick script I made to produce a similar enough DataFrame to reproduce the issue.

import numpy as np
import pandas as pd
np.random.seed(5678)
vals = np.random.randint(0, 50, size=1000).astype(float)
df = pd.DataFrame(vals, columns=['Runtime'])

s = df['Runtime'].value_counts().sort_values(ascending = False)
s[:10]   # produces 32 rows, the last one with index 10.0
s.iloc[:10]  # the first 10 rows
s.head(10)   # also the first 10 rows

huangapple
  • This article was published on July 3, 2023 at 22:34:37
  • If you repost it, please keep the link to this article: https://go.coder-hub.com/76605753.html