Are NumPy logical element-wise operations broken for pandas 2.0? (np.logical_or)

Question

I have code that worked until I updated to pandas 2.0. I checked the changelog and saw that the behavior of logical_or was changed, but it doesn't seem to be exactly the same thing: https://github.com/pandas-dev/pandas/pull/37374, so this is a very unexpected error for me.

import numpy as np
import pandas as pd
a = pd.DataFrame({"or":[False,True], "a":[True,True], "b":[True, False]})
np.logical_or(a[["a","b"]], a[2*["or"]])

Before 2.0 it used to do an element-wise "or"; now it fails outright with:

ValueError: cannot reindex on an axis with duplicate labels

If the second argument is given two different labels instead, it is even worse: it doesn't fail, but concatenates both frames in an extremely erratic way.

Is this a known bug in pandas / NumPy, or is it intended? Is there a known efficient alternative?

Answer 1

Score: 2

The error you mention is raised at a spot marked with the comment # GH#42568 (https://github.com/pandas-dev/pandas/pull/42568/files).
The code changed there now checks that the labels in axes["columns"] are unique. But even if we change the labels on one of the input DataFrames, like

np.logical_or(a[["a","b"]], a[2*["or"]].set_axis(["d","c"], axis=1))

we'd get:

      a     b    c    d
0  True  True  NaN  NaN
1  True   NaN  NaN  NaN

because the two array-like structures appear to be expected to have the same column labels (if any) when they are combined. With matching labels, the result is the expected one:

In [642]: np.logical_or(a[["a","b"]], a[2*["or"]].set_axis(["a","b"], axis=1))
Out[642]: 
      a     b
0  True  True
1  True  True
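
The NaN columns above are consistent with pandas aligning both DataFrames on their labels before the ufunc runs. The following is a minimal sketch of that alignment step using DataFrame.align; the variable names (left, right, left_al, right_al) are only illustrative, and this is not claimed to be the exact code path pandas takes internally.

```python
import pandas as pd

a = pd.DataFrame({"or": [False, True], "a": [True, True], "b": [True, False]})

left = a[["a", "b"]]
# Give the duplicated "or" columns labels that do not exist in `left`.
right = a[2 * ["or"]].set_axis(["d", "c"], axis=1)

# An outer alignment takes the union of the column labels; any column that
# one side lacks is filled with NaN on that side.
left_al, right_al = left.align(right, join="outer")

print(left_al)   # columns c and d are all NaN
print(right_al)  # columns a and b are all NaN
```

Since neither frame has real values in every column after alignment, an element-wise operation on the aligned frames cannot give the positional result the question expects.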

A simple workaround is to bypass the label check and pass only the arrays of values:

In [644]: np.logical_or(a[["a","b"]].values, a[2*["or"]].values)
Out[644]: 
array([[ True,  True],
       [ True,  True]])
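
If a labeled result is still needed, the array can be wrapped back into a DataFrame. This is a minimal sketch, assuming the labels of the first operand are the ones to keep; the names left, right, result and broadcasted are illustrative. It also shows a broadcasting variant that avoids duplicating the "or" column at all.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"or": [False, True], "a": [True, True], "b": [True, False]})

left = a[["a", "b"]]
right = a[2 * ["or"]]  # duplicate column labels: "or", "or"

# Work on the raw arrays so that no label alignment takes place,
# then reattach the index and columns of the left operand.
values = np.logical_or(left.to_numpy(), right.to_numpy())
result = pd.DataFrame(values, index=left.index, columns=left.columns)

# Alternative: broadcast the single "or" column across all columns of `left`
# by turning it into a (n_rows, 1) array.
broadcasted = pd.DataFrame(
    left.to_numpy() | a["or"].to_numpy()[:, None],
    index=left.index,
    columns=left.columns,
)

print(result)
```

Both variants are purely positional, so they only make sense when the rows of the two operands are already in the same order.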

huangapple
  • Posted on August 5, 2023 at 02:48:06
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76838487.html