2023-08-05 02:48:06

Are NumPy element-wise logical operations broken in pandas 2.0? (np.logical_or)

Question

My code worked until I updated to pandas 2.0. I checked the changelog and saw that the behavior of logical_or was changed, but that change does not seem to be exactly this one: https://github.com/pandas-dev/pandas/pull/37374, so the error was very unexpected for me.

import numpy as np
import pandas as pd
a = pd.DataFrame({"or":[False,True], "a":[True,True], "b":[True, False]})
np.logical_or(a[["a","b"]], a[2*["or"]])

Before 2.0 this performed an element-wise "or"; now it fails immediately with:

ValueError: cannot reindex on an axis with duplicate labels

If the second argument is given two different labels instead, it is even worse: it does not fail, but combines the two frames in an extremely erratic way.

Is this a known bug in pandas / NumPy, or is it intended? Is there a known efficient alternative?

Answer 1

Score: 2

The error you mention is raised at a line marked with # GH#42568 (https://github.com/pandas-dev/pandas/pull/42568/files).
The code changed there now checks that the labels in axes["columns"] are unique. But even if we change the labels on one of the input DataFrames, like

np.logical_or(a[["a","b"]], a[2*["or"]].set_axis(["d","c"], axis=1))

we'd get:

      a     b    c    d
0  True  True  NaN  NaN
1  True   NaN  NaN  NaN

because the two array-like structures are apparently expected to have the same column labels (if any) when they are combined:

In [642]: np.logical_or(a[["a","b"]], a[2*["or"]].set_axis(["a","b"], axis=1))
Out[642]: 
      a     b
0  True  True
1  True  True

A simple workaround is to bypass the label check and pass only the arrays of values:

In [644]: np.logical_or(a[["a","b"]].values, a[2*["or"]].values)
Out[644]: 
array([[ True,  True],
       [ True,  True]])
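
If the result should still be a DataFrame with the original labels, the values-only workaround above can be wrapped back up afterwards. The following is a minimal sketch, not part of the original answer: the names left, mask and result are illustrative, and it assumes the intent is to "or" the single "or" column against every column of the left frame, so the column is broadcast as an (n_rows, 1) array instead of being selected twice.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"or": [False, True], "a": [True, True], "b": [True, False]})

left = a[["a", "b"]]

# Reshape the "or" column to (n_rows, 1) so NumPy broadcasts it across
# every column of `left`; plain arrays carry no labels, so nothing is aligned.
mask = a["or"].to_numpy()[:, None]

# Compute on raw arrays, then restore the index and column labels of `left`.
result = pd.DataFrame(
    np.logical_or(left.to_numpy(), mask),
    index=left.index,
    columns=left.columns,
)
print(result)
```

Because only NumPy arrays reach np.logical_or here, the outcome should not depend on how a given pandas version aligns DataFrame inputs to ufuncs.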
