Are NumPy logical element-wise operations broken in pandas 2.0? (np.logical_or)
Question
I have code that worked until I updated to pandas 2.0. I checked the changelog and saw that the behavior of logical_or was changed, but the change described there doesn't seem to be exactly the same thing: https://github.com/pandas-dev/pandas/pull/37374, so this error was very unexpected for me.
import numpy as np
import pandas as pd
a = pd.DataFrame({"or":[False,True], "a":[True,True], "b":[True, False]})
np.logical_or(a[["a","b"]], a[2*["or"]])
Before 2.0 it used to do an element-wise "or"; now it fails outright with:
ValueError: cannot reindex on an axis with duplicate labels
Or, if the second argument is given two different labels, it is even worse: it doesn't fail, but combines the two frames in an extremely erratic way.
Is this a known bug in pandas / NumPy, or is it intended? Is there a known efficient alternative?
Answer 1
Score: 2
The error you mention is raised by the code marked with # GH#42568
(https://github.com/pandas-dev/pandas/pull/42568/files).
The code changed there now checks for unique labels in axes["columns"]. But even if we change the labels on one of the input DataFrames, like
np.logical_or(a[["a","b"]], a[2*["or"]].set_axis(["d","c"], axis=1))
we'd get:
      a     b    c    d
0  True  True  NaN  NaN
1  True   NaN  NaN  NaN
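because it seems the two inputs are aligned on their column labels first, so they are expected to have the same column labels (if any); a label present in only one input is filled with NaN. With matching labels the result is as expected: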
In [642]: np.logical_or(a[["a","b"]], a[2*["or"]].set_axis(["a","b"], axis=1))
Out[642]:
      a     b
0  True  True
1  True  True
A simple workaround is to bypass the label check and pass only the arrays of values:
In [644]: np.logical_or(a[["a","b"]].values, a[2*["or"]].values)
Out[644]:
array([[ True,  True],
       [ True,  True]])
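If the result should still be a DataFrame, one possible variation on the workaround is to do the OR on the underlying arrays and then put the left-hand labels back. This is only a minimal sketch: the names left, right, result and same are illustrative, and it assumes both inputs have the same shape and that a purely positional (not label-based) OR is what is wanted.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"or": [False, True], "a": [True, True], "b": [True, False]})

left = a[["a", "b"]]
right = a[2 * ["or"]]  # duplicate column labels, so label alignment cannot work

# Drop to NumPy so the OR is purely positional, then restore the left-hand labels.
result = pd.DataFrame(
    np.logical_or(left.to_numpy(), right.to_numpy()),
    index=left.index,
    columns=left.columns,
)
print(result)

# Same values without duplicating the column: broadcast the single "or" column
# (reshaped to a column vector) across the columns of the left-hand array.
same = np.logical_or(left.to_numpy(), a["or"].to_numpy()[:, None])
```

The broadcast form avoids building the frame with duplicate labels at all, which may be preferable when the right-hand side is really just one column repeated.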