pandas column-slices with mypy
Question
Lately I've run into a strange situation that I cannot solve on my own:
Consider this MWE:
import pandas
import numpy as np

data = pandas.DataFrame(np.random.rand(10, 5), columns=list("abcde"))

observations = data.loc[:, :"c"]
features = data.loc[:, "c":]

print(data)
print(observations)
print(features)
According to this answer, the slicing itself is done correctly, and it works in the sense that the correct results are printed.
But when I run mypy over it, I get these errors:
mypy.exe .\t.py
t.py:1: error: Skipping analyzing "pandas": module is installed, but missing library stubs or py.typed marker
t.py:1: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
t.py:6: error: Slice index must be an integer or None
t.py:7: error: Slice index must be an integer or None
Found 3 errors in 1 file (checked 1 source file)
That is also correct, since the slicing is not done with an integer.
How can I either satisfy or disable the "Slice index must be an integer or None" error?
Of course, you could use iloc[:, :3] to solve this problem, but that feels like bad practice, since iloc depends on the order of the columns (in this example loc also depends on the ordering, but only to keep the MWE short).
Answer 1
Score: 1
That's an open issue (#GH2410).
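While that issue is open, two ways to keep the label-based loc slice may be worth trying. This is a sketch rather than a confirmed fix: to my understanding, the check only applies to the a:b slice literal, so building the slice with the built-in slice() constructor should avoid it. A bare "# type: ignore" comment silences every mypy error on that line. The deterministic np.arange data and the name observations_ignored are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(50).reshape(10, 5), columns=list("abcde"))

# Option 1: build the slice with the slice() constructor instead of the
# a:b literal. loc still slices by label and includes both endpoints.
observations = data.loc[:, slice(None, "c")]  # columns a, b, c
features = data.loc[:, slice("c", None)]      # columns c, d, e

# Option 2: keep the literal slice and suppress mypy on this line only.
observations_ignored = data.loc[:, :"c"]  # type: ignore
```

Both options leave the runtime behaviour unchanged. Only what mypy sees differs.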
As a workaround, you could try get_loc:
col_idx = data.columns.get_loc("c")
observations = data.iloc[:, :col_idx+1]
features = data.iloc[:, col_idx:]
Output:
a b c # <- observations
0 0.269605 0.497063 0.676928
1 0.526765 0.204216 0.748203
2 0.919330 0.059722 0.422413
.. ... ... ...
7 0.056050 0.521702 0.727323
8 0.635477 0.145401 0.258166
9 0.041886 0.812769 0.839979
[10 rows x 3 columns]
c d e # <- features
0 0.676928 0.672298 0.177933
1 0.748203 0.995165 0.136659
2 0.422413 0.222377 0.395179
.. ... ... ...
7 0.727323 0.291441 0.056998
8 0.258166 0.219025 0.405838
9 0.839979 0.923173 0.431298
[10 rows x 3 columns]
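One caveat with the get_loc workaround: get_loc returns a plain integer only when the column label is unique. For duplicated labels it returns a slice or a boolean mask, so col_idx + 1 would fail or misbehave. A minimal sketch of a guard follows; the helper name column_position is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def column_position(frame: pd.DataFrame, label: str) -> int:
    """Return the integer position of a uniquely named column.

    Hypothetical helper for illustration. A missing label raises
    KeyError, which comes from get_loc itself.
    """
    loc = frame.columns.get_loc(label)
    # get_loc yields an integer only for a unique label; duplicated
    # labels produce a slice or a boolean mask instead.
    if not isinstance(loc, (int, np.integer)):
        raise ValueError(f"column {label!r} is not unique")
    return int(loc)


data = pd.DataFrame(np.arange(50).reshape(10, 5), columns=list("abcde"))
col_idx = column_position(data, "c")
observations = data.iloc[:, : col_idx + 1]  # columns a, b, c
features = data.iloc[:, col_idx:]           # columns c, d, e
```

Because the position is looked up by name at runtime, the iloc slices here still follow the column label and not a hard-coded index.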