April 4, 2023 15:32:34

pandas column-slices with mypy

Question

Lately I've run into a strange situation that I cannot resolve on my own:

Consider this MWE:

import pandas
import numpy as np

data = pandas.DataFrame(np.random.rand(10, 5), columns=list("abcde"))

observations = data.loc[:, :"c"]
features = data.loc[:, "c":]

print(data)
print(observations)
print(features)

According to this answer, the slicing itself is done correctly, and it works in the sense that the correct results are printed.
But when I run mypy over it, I get these errors:

mypy.exe .\t.py
t.py:1: error: Skipping analyzing "pandas": module is installed, but missing library stubs or py.typed marker
t.py:1: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
t.py:6: error: Slice index must be an integer or None
t.py:7: error: Slice index must be an integer or None
Found 3 errors in 1 file (checked 1 source file)

That is understandable, since the slice bounds are strings rather than integers.
How can I either satisfy or disable the "Slice index must be an integer or None" error?

Of course you could use iloc[:, :3] to work around this, but that feels like bad practice, since iloc depends on the order of the columns (in this example loc also depends on the ordering, but only to keep the MWE short).

Answer 1

Score: 1

That's an open issue (#GH2410).

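As a workaround, you could use get_loc to turn the column label into an integer position and then slice with iloc, so that every slice bound is an integer: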

col_idx = data.columns.get_loc("c")

observations = data.iloc[:, :col_idx+1]
features = data.iloc[:, col_idx:]

Output:

           a         b         c  # <- observations
0   0.269605  0.497063  0.676928
1   0.526765  0.204216  0.748203
2   0.919330  0.059722  0.422413
..       ...       ...       ...
7   0.056050  0.521702  0.727323
8   0.635477  0.145401  0.258166
9   0.041886  0.812769  0.839979

[10 rows x 3 columns]

           c         d         e  # <- features
0   0.676928  0.672298  0.177933
1   0.748203  0.995165  0.136659
2   0.422413  0.222377  0.395179
..       ...       ...       ...
7   0.727323  0.291441  0.056998
8   0.258166  0.219025  0.405838
9   0.839979  0.923173  0.431298

[10 rows x 3 columns]
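
If you need this split in more than one place, the get_loc workaround above could be wrapped in a small helper. This is only a sketch: the name split_at_label is made up for illustration, and it assumes the column labels are unique. For duplicated labels get_loc can return a slice or a boolean mask instead of an integer, so the helper raises in that case rather than guessing.

```python
from typing import Tuple

import numpy as np
import pandas as pd


def split_at_label(df: pd.DataFrame, label: str) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Split df column-wise at `label` (hypothetical helper).

    The column named `label` is included in both halves, mirroring
    df.loc[:, :label] and df.loc[:, label:].
    """
    pos = df.columns.get_loc(label)
    # get_loc returns an integer only when the label is unique.
    if not isinstance(pos, (int, np.integer)):
        raise ValueError(f"column label {label!r} is not unique")
    pos = int(pos)
    # Only integer slice bounds are used from here on.
    return df.iloc[:, : pos + 1], df.iloc[:, pos:]


data = pd.DataFrame(np.random.rand(10, 5), columns=list("abcde"))
observations, features = split_at_label(data, "c")

print(list(observations.columns))  # ['a', 'b', 'c']
print(list(features.columns))      # ['c', 'd', 'e']
```

The position is looked up by name at call time, so unlike a hard-coded iloc[:, :3] the helper keeps working if columns are added or reordered before "c".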
