pandas column-slices with mypy

Question

Lately I've found myself in a strange situation I cannot solve for myself:

Consider this MWE:

import pandas
import numpy as np

data = pandas.DataFrame(np.random.rand(10, 5), columns=list("abcde"))

observations = data.loc[:, :"c"]
features = data.loc[:, "c":]

print(data)
print(observations)
print(features)

According to this answer, the slicing itself is correct, and it works in the sense that the correct results are printed.
But when I run mypy over it, I get these errors:

mypy.exe .\t.py
t.py:1: error: Skipping analyzing "pandas": module is installed, but missing library stubs or py.typed marker
t.py:1: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
t.py:6: error: Slice index must be an integer or None
t.py:7: error: Slice index must be an integer or None
Found 3 errors in 1 file (checked 1 source file)

That is also correct, since the slicing is not done with an integer.
How can I either satisfy or disable the "Slice index must be an integer or None" error?

Of course you could use iloc[:, :3] to solve this problem, but that feels like bad practice, since with iloc we depend on the order of the columns (in this example loc also depends on the ordering, but only to keep the MWE short).

Answer 1

Score: 1

That's an open issue (#GH2410).
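
For the "disable" half of the question, a minimal sketch follows. It shows two options: a bare `# type: ignore` comment, which silences every mypy error on that line, and building the slice with the `slice()` builtin. The second option rests on an assumption that the check only targets the literal `a:b` slice syntax, so verify it against your mypy version. The names `observations_alt` and `features_alt` are illustrative only.

```python
import numpy as np
import pandas

data = pandas.DataFrame(np.random.rand(10, 5), columns=list("abcde"))

# Option 1: a bare "type: ignore" comment silences mypy errors on that line only.
observations = data.loc[:, :"c"]  # type: ignore
features = data.loc[:, "c":]  # type: ignore

# Option 2 (assumption: the check only targets the literal a:b syntax):
# build the same label-based slices with the slice() builtin instead.
observations_alt = data.loc[:, slice(None, "c")]
features_alt = data.loc[:, slice("c", None)]
```

At runtime both options select the same columns; they differ only in what the type checker sees. The separate "missing library stubs" error in the report is unrelated to slicing and concerns type information for pandas itself.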

As a workaround, you can try get_loc:

col_idx = data.columns.get_loc("c")

observations = data.iloc[:, :col_idx+1]
features = data.iloc[:, col_idx:]

Output:

           a         b         c # <- observations
0   0.269605  0.497063  0.676928
1   0.526765  0.204216  0.748203
2   0.919330  0.059722  0.422413
..       ...       ...       ...
7   0.056050  0.521702  0.727323
8   0.635477  0.145401  0.258166
9   0.041886  0.812769  0.839979

[10 rows x 3 columns]

           c         d         e  # <- features
0   0.676928  0.672298  0.177933
1   0.748203  0.995165  0.136659
2   0.422413  0.222377  0.395179
..       ...       ...       ...
7   0.727323  0.291441  0.056998
8   0.258166  0.219025  0.405838
9   0.839979  0.923173  0.431298

[10 rows x 3 columns]
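
One caveat with the get_loc workaround: `Index.get_loc` returns a plain integer only when the label is unique. For duplicated labels it returns a slice or a boolean mask instead. The sketch below wraps the lookup in a small helper that narrows the result to an `int`. The helper name `column_position` is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def column_position(frame: pd.DataFrame, label: str) -> int:
    """Return the integer position of a uniquely named column."""
    loc = frame.columns.get_loc(label)
    # get_loc yields a slice or boolean mask when the label is duplicated,
    # so narrow the result to a plain int before using it in a slice.
    if not isinstance(loc, (int, np.integer)):
        raise KeyError(f"column label {label!r} is not unique")
    return int(loc)


data = pd.DataFrame(np.random.rand(10, 5), columns=list("abcde"))
col_idx = column_position(data, "c")

# .loc label slices include the end label, so add 1 to mirror data.loc[:, :"c"].
observations = data.iloc[:, : col_idx + 1]
features = data.iloc[:, col_idx:]
```

The position is still looked up by name, so the code does not hard-code a column number the way iloc[:, :3] does.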

huangapple
  • Published on April 4, 2023 at 15:32:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75926634.html