What is the difference between head(10) and [:10] in Python?

Question

I am a newbie. My instructor said he was going to select the first ten, and then he wrote the code below.

df['Runtime'].value_counts().sort_values(ascending = False).head(10)

I tried the code df['Runtime'].value_counts().sort_values(ascending = False)[:10] and it gives me a different result. What is the fundamental difference between the two expressions?

    df['Runtime'].value_counts().sort_values(ascending = False)[:10]
    90.0    971
    95.0    489
    92.0    434
    93.0    422
    85.0    408
    ...
    19.0      8
    32.0      8
    9.0       8
    7.0       8
    10.0      8
    Name: Runtime, Length: 157, dtype: int64

    df['Runtime'].value_counts().sort_values(ascending = False).head(10)
    90.0     971
    95.0     489
    92.0     434
    93.0     422
    85.0     408
    89.0     407
    88.0     406
    100.0    402
    91.0     394
    94.0     383
    Name: Runtime, dtype: int64

Answer 1

Score: 1

Why the surprising results?

Because the Runtime column in your DataFrame is of float datatype, the result of value_counts() is a Series with an index of type Float64Index.
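
A quick way to confirm this on your own data is to inspect the index of the value_counts() result. The sketch below uses a small made-up Runtime column (the values are illustrative, not your data). On pandas versions before 2.0 the index class is Float64Index; pandas 2.0 removed that class, so there you will see a plain Index, but its dtype is still float64, which is what matters here.

```python
import pandas as pd

# Illustrative data only: a float-typed Runtime column, like the one in the question.
df = pd.DataFrame({"Runtime": [90.0, 95.0, 90.0, 92.0, 10.0, 90.0]})

counts = df["Runtime"].value_counts()

# The labels of the result are the distinct Runtime values, so they are floats.
print(type(counts.index).__name__)  # class name depends on the pandas version
print(counts.index.dtype)           # float64
```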

The behaviour of []-style indexing on a Series with a Float64Index is quirky. See this GitHub issue discussion:

> To summarize in a specific way [...] for all index types, using integers in [] (__getitem__) is positional (like iloc), except for Float64Index, making this a special case.

Essentially, with a float index, [:10] is interpreted not as "the first ten values" but as "all the values up to and including the one whose index label is 10.0".

That is why, with your data, df['Runtime'].value_counts().sort_values(ascending = False)[:10] returns 157 rows, the last of which has the index value 10.0.
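
The difference is easiest to see on a tiny hand-made Series whose float labels are not in sorted order, which is what a count-sorted value_counts() result looks like. The labels and counts below are made up for illustration. s.loc[:10] spells out the label-based reading explicitly (it stops at, and includes, the label 10.0), while s.iloc[:10] and s.head(10) are positional. What plain s[:10] does depends on the pandas version: in the versions this answer describes it behaves like .loc[:10], and as far as I know pandas 2.x keeps that behaviour but emits a FutureWarning announcing a switch to positional slicing, so the sketch silences that warning and does not assume either outcome.

```python
import warnings

import pandas as pd

# Made-up counts, sorted by count (descending), so the float labels are unsorted.
s = pd.Series([6, 5, 4, 3, 2, 1], index=[3.0, 5.0, 10.0, 1.0, 7.0, 2.0])

# Label-based: every row up to AND including the row labelled 10.0.
by_label = s.loc[:10]      # 3 rows: labels 3.0, 5.0, 10.0

# Positional: the first (up to) ten rows; this Series only has six.
by_position = s.iloc[:10]  # all 6 rows
first_ten = s.head(10)     # same rows as s.iloc[:10]

# Plain [] slicing on a float index: label-based in older pandas; some newer
# versions warn that it will become positional, so silence that warning here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    bracket = s[:10]

print(len(by_label), len(by_position), len(first_ten), len(bracket))
```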


Getting the behaviour you want

To get the first ten values like you want, you can do:

df['Runtime'].value_counts().sort_values(ascending = False).iloc[:10]

or equivalently:

df['Runtime'].value_counts().sort_values(ascending = False).head(10)

(Looking at the source code shows that .head(n) returns the result of .iloc[:n].)
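
If you need "top N by count" often, one option is to wrap the positional version in a small helper so that the index dtype can never change the result. top_n_counts below is a hypothetical name, not a pandas function. It relies on value_counts() already sorting by count in descending order (its default sort=True) and on head(n), which is implemented as iloc[:n].

```python
import pandas as pd


def top_n_counts(column: pd.Series, n: int = 10) -> pd.Series:
    """Return the n most frequent values of `column` with their counts.

    Hypothetical helper: value_counts() sorts by count (descending) by default,
    and head(n) slices by position, so a float-typed index cannot change the result.
    """
    return column.value_counts().head(n)


# Illustrative data: 15 distinct float runtimes, value x repeated x + 1 times.
df = pd.DataFrame({"Runtime": [float(x) for x in range(15) for _ in range(x + 1)]})

top = top_n_counts(df["Runtime"], 10)
print(top)
```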


How to reproduce

We don't have access to your data, so here's a quick script I made to produce a similar enough DataFrame to reproduce the issue.

    import numpy as np
    import pandas as pd
    np.random.seed(5678)
    vals = np.random.randint(0, 50, size=1000).astype(float)
    df = pd.DataFrame(vals, columns=['Runtime'])
    s = df['Runtime'].value_counts().sort_values(ascending = False)
    s[:10] # produces 32 rows, the last one with index 10.0
    s.iloc[:10] # the first 10 rows
    s.head(10) # also the first 10 rows

huangapple
  • Published on July 3, 2023, 22:34:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76605753.html