2023-07-03 22:34:37

What is the difference between head(10) and [:10] in Python?

Question

I am a newbie. The instructor said I was going to select the first ten, and then he wrote the code below.

df['Runtime'].value_counts().sort_values(ascending = False)[:10]

I tried the code df['Runtime'].value_counts().sort_values(ascending = False)[:10] and it gives me a different result. What is the fundamental difference between the two expressions?

df['Runtime'].value_counts().sort_values(ascending = False)[:10]
90.0    971
95.0    489
92.0    434
93.0    422
85.0    408
       ... 
19.0      8
32.0      8
9.0       8
7.0       8
10.0      8
Name: Runtime, Length: 157, dtype: int64

df['Runtime'].value_counts().sort_values(ascending = False).head(10)
90.0     971
95.0     489
92.0     434
93.0     422
85.0     408
89.0     407
88.0     406
100.0    402
91.0     394
94.0     383
Name: Runtime, dtype: int64

Answer 1

Score: 1

Why the surprising results?

Because the Runtime column in your DataFrame has a float dtype, the result of value_counts() is a Series whose index is a Float64Index (in pandas 2.0 and later, a plain Index with dtype float64).

The behaviour of [ ] style indexing on a series with a Float64Index is quirky. See this GitHub issue discussion:

> To summarize in a specific way [...] for all index types, using integers in [] (__getitem__) is positional (like iloc), except for Float64Index, making this a special case.

Essentially, with a float index, [:10] is interpreted not as "the first ten values" but as "all the values up to and including the one whose index label is 10.0".

With your data, that's why you see df['Runtime'].value_counts().sort_values(ascending = False)[:10] giving 157 rows, with the last row having an index value of 10.0.
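
The distinction between label-based and positional slicing can be sketched on a small made-up Series (the index and counts below are illustrative, not your data). The sketch uses .loc and .iloc explicitly rather than plain [], since the behaviour of plain [] slicing on a float index may differ between pandas versions:

```python
import pandas as pd

# Hypothetical counts with a float index, ordered by descending count.
s = pd.Series([5, 4, 3, 2, 1], index=[90.0, 95.0, 10.0, 7.0, 3.0])

# Label-based: every row from the start up to and including the label 10.0.
by_label = s.loc[:10.0]

# Positional: the first two rows, whatever their labels are.
by_position = s.iloc[:2]

print(by_label)
print(by_position)
```

Here the label-based slice stops at the row labelled 10.0, while the positional slice simply counts rows from the top.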

Getting the behaviour you want

To get the first ten values like you want, you can do:

df['Runtime'].value_counts().sort_values(ascending = False).iloc[:10]

or equivalently:

df['Runtime'].value_counts().sort_values(ascending = False).head(10)

(Looking at the source code shows that .head(n) returns the result of .iloc[:n].)
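
As a quick sanity check of that equivalence, here is a minimal sketch on a small Series (the labels and counts are borrowed from the output above purely for illustration):

```python
import pandas as pd

# A few float-labelled counts; the name mirrors the column in the question.
s = pd.Series([971, 489, 434, 422],
              index=[90.0, 95.0, 92.0, 93.0],
              name="Runtime")

n = 3
# head(n) and iloc[:n] should select exactly the same rows.
same = s.head(n).equals(s.iloc[:n])
print(same)
```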

How to reproduce

We don't have access to your data, so here's a quick script I made to produce a similar enough DataFrame to reproduce the issue.

import numpy as np
import pandas as pd
np.random.seed(5678)
vals = np.random.randint(0, 50, size=1000).astype(float)
df = pd.DataFrame(vals, columns=['Runtime'])
s = df['Runtime'].value_counts().sort_values(ascending = False)
s[:10]   # produces 32 rows, the last one with index 10.0
s.iloc[:10]  # the first 10 rows
s.head(10)   # also the first 10 rows
