April 11, 2023, 02:07:46

Axial inconsistency of pandas.diff

Question


Consider the dataframe:

df = pd.DataFrame({'col': [True, False]})

The following code works:

df['col'].diff()

The result is:

0     NaN
1    True
Name: col, dtype: object

However, the code:

df.T.diff(axis=1)

gives the error:

numpy boolean subtract, the `-` operator, is not supported, use the bitwise_xor, the `^` operator, or the logical_xor function instead.

Is that a bug?

Answer 1

Score: 1

It seems this behaviour is intentional, as per GH15856. Subtraction between boolean arrays with the `-` operator is not supported in NumPy; the error message points to `^` (bitwise_xor) or logical_xor instead.

With diff on axis=1, pandas tries to compute the difference between consecutive elements along the columns axis (which happens to hold booleans here because of the transposition) and since NumPy is run under the hood to compute that, a TypeError is raised.

print(df.T)
        0      1
col  True  False
np.array(False) - np.array(True)
TypeError: numpy boolean subtract, the `-` operator, is not supported, use the bitwise_xor, the `^` operator, or the logical_xor function instead.
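
As a minimal sketch of the point above (the array `a` and the variable names are illustrative, not taken from the pandas source), the following shows that subtracting boolean NumPy arrays raises the TypeError, while the XOR alternatives named in the error message work:

```python
import numpy as np

a = np.array([True, False])

# Subtracting consecutive boolean elements is rejected by NumPy.
try:
    a[1:] - a[:-1]
    subtract_error = None
except TypeError as exc:
    subtract_error = str(exc)

# The alternatives suggested by the error message both succeed.
xor_result = a[1:] ^ a[:-1]                   # bitwise XOR operator
logical = np.logical_xor(a[1:], a[:-1])       # equivalent function form

print(subtract_error)
print(xor_result, logical)
```

For boolean input, XOR is True exactly where two neighbouring values differ, which is the natural meaning of a "difference" between booleans.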

This can be counterintuitive, because the same operation succeeds with Python booleans:

False - True
# returns -1

But @seberg explains why:

> This is a pretty old deprecation, although I do seem to remember some
> discussion about only deprecating the unary operator -False not True -
> True. Note that Python booleans are different from NumPy ones, they
> are practically integers. NumPy booleans behave much less like
> integers, if you add two booleans you get a boolean again, etc.
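
A short sketch of the distinction described in the quote (variable names are illustrative): Python booleans take part in integer arithmetic, whereas adding two NumPy booleans yields a boolean again.

```python
import numpy as np

# Python booleans are a subclass of int, so arithmetic yields integers.
py_sum = True + True      # 2
py_diff = False - True    # -1

# NumPy booleans stay boolean under addition (it acts as logical OR).
np_sum = np.True_ + np.True_

print(type(py_sum).__name__, py_sum, py_diff)
print(np_sum.dtype, np_sum)
```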

Answer 2

Score: 1

The behavior you're seeing would appear to be at odds with the docs which clearly state:

> For boolean dtypes, this uses operator.xor() rather than operator.sub(). The result is calculated according to current dtype in DataFrame, however dtype of the result is always float64.

Also interesting is the following test:

import pandas as pd

df = pd.DataFrame({'col': [True, False], 'col2': [True, False]})
print("", "df:", sep='\n')
print(df, df.dtypes, sep='\n')
print("", "diff of df:", sep='\n')
res = df.diff()
print(res, res.dtypes, sep='\n')
print("", "diff of df['col']:", sep='\n')
res = df['col'].diff()
print(res, res.dtypes, sep='\n')
print("", "df.T:", sep='\n')
res = df.T
print(res, res.dtypes, sep='\n')
print("", "diff(axis=0) of df.T:", sep='\n')
res = df.T.diff(axis=0)
print(res, res.dtypes, sep='\n')
print("", "df.T.astype(object):", sep='\n')
res = df.T.astype(object)
print(res, res.dtypes, sep='\n')
print("", "diff(axis=1) of df.T.astype(object):", sep='\n')
res = df.T.astype(object).diff(axis=1)
print(res, res.dtypes, sep='\n')
try:
    print("", "diff(axis=1) of df.T:", sep='\n')
    res = df.T.diff(axis=1)
    print(res, res.dtypes, sep='\n')
except TypeError:
    print('got TypeError')

Output:

df:
     col   col2
0   True   True
1  False  False
col     bool
col2    bool
dtype: object
diff of df:
    col  col2
0   NaN   NaN
1  True  True
col     object
col2    object
dtype: object
diff of df['col']:
0     NaN
1    True
Name: col, dtype: object
object
df.T:
         0      1
col   True  False
col2  True  False
0    bool
1    bool
dtype: object
diff(axis=0) of df.T:
          0      1
col     NaN    NaN
col2  False  False
0    object
1    object
dtype: object
df.T.astype(object):
         0      1
col   True  False
col2  True  False
0    object
1    object
dtype: object
diff(axis=1) of df.T.astype(object):
        0   1
col   NaN  -1
col2  NaN  -1
0    object
1    object
dtype: object
diff(axis=1) of df.T:
got TypeError

If we change the column types to object using astype() before the call to diff(axis=1), no error is raised and the result appears to cast the boolean values to int prior to performing the diff using integer subtraction.

However, as OP points out, the same operation without astype(object) raises "TypeError: numpy boolean subtract, the `-` operator, is not supported, use the bitwise_xor, the `^` operator, or the logical_xor function instead.", despite the statement in the diff() docs that "For boolean dtypes, this uses operator.xor() rather than operator.sub()".
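
Two possible workarounds, sketched under the assumption that either an XOR-style or an integer difference along axis=1 is wanted. The names `wide`, `values`, `xor_diff`, `result`, and `int_diff` are illustrative, and this is not how pandas implements diff() internally:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': [True, False], 'col2': [True, False]})
wide = df.T  # the booleans now run along axis=1

# Option 1: XOR of neighbouring columns, computed directly in NumPy.
values = wide.to_numpy(dtype=bool)
xor_diff = np.logical_xor(values[:, 1:], values[:, :-1])

# The first column has no predecessor, so it is filled with NaN;
# mixing NaN with booleans requires an object array.
out = np.empty(values.shape, dtype=object)
out[:, 0] = np.nan
out[:, 1:] = xor_diff
result = pd.DataFrame(out, index=wide.index, columns=wide.columns)

# Option 2: cast to int first, so diff() performs ordinary subtraction.
int_diff = wide.astype(int).diff(axis=1)

print(result)
print(int_diff)
```

Option 1 mirrors the XOR semantics that the docs describe for boolean dtypes, while option 2 gives the signed integer difference (-1 here), similar to the astype(object) result shown above.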
