Why is neq-comparison inconsistent in pandas?
Question
A comparison of two pandas arrays does not give the same result as performing the comparison element-wise. Here is an example:
import numpy as np
import pandas as pd
floats_with_None = np.array([None, 2.3, 1.4])
s_None = pd.Series(floats_with_None)
(s_None != s_None)[0] # gives True
s_None[0] != s_None[0] # gives False
Can someone explain the cause of this?
Answer 1
Score: 3
Pandas treats None as a "missing data" marker, and special-cases it in comparisons. Like floating-point NaN or pandas.NaT, pandas will treat None as unequal to everything, including itself.
However, None is still just an ordinary Python object, outside of pandas's control. pandas can only special-case None in comparisons that pandas itself implements. When you do
s_None[0] != s_None[0]
you're using None's implementation of !=, which is just a default implementation inherited from object. This default implementation of != considers all objects equal to themselves.
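A minimal sketch of the two code paths, reusing the Series from the question (the variable names here are illustrative only). The first comparison goes through pandas's element-wise implementation; the second pulls the element out first, so only plain Python objects are involved.

```python
import numpy as np
import pandas as pd

# Same data as in the question; the array (and the Series) has object dtype.
s_none = pd.Series(np.array([None, 2.3, 1.4]))

# Comparison implemented by pandas: None is treated as missing data,
# so it is considered unequal to everything, including itself.
series_ne = (s_none != s_none)[0]
series_eq = (s_none == s_none)[0]

# Indexing first returns the plain Python None object, so the
# comparison below uses the default != inherited from object.
first = s_none[0]
scalar_ne = first != first

print(series_ne, series_eq, scalar_ne)
```

Under these assumptions the Series comparison reports the first element as "not equal" and "not ==" at the same time, while the scalar comparison reports None as equal to itself.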
This discrepancy is mentioned in the docs:
> One has to be mindful that in Python (and NumPy), the nan's don’t compare equal, but None's do. Note that pandas/NumPy uses the fact that np.nan != np.nan, and treats None like np.nan.
although the "Note that pandas/NumPy..." part should probably just say "Note that pandas...", since NumPy itself does not treat None like np.nan.
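The distinction in the quoted note can be checked directly. The sketch below is only an illustration with made-up variable names; it compares nan and None in plain Python, in a NumPy object array, and in a pandas Series built from that array.

```python
import numpy as np
import pandas as pd

# Plain Python: nan is never equal to itself, but None is.
nan_ne = np.nan != np.nan
none_value = None
none_ne = none_value != none_value

# NumPy object array: the comparison defers to each element's own !=,
# so None is not special-cased here.
arr = np.array([None, 2.3, 1.4])
numpy_ne = (arr != arr)[0]

# pandas Series over the same data: None is handled as missing data.
ser = pd.Series(arr)
pandas_ne = (ser != ser)[0]

print(nan_ne, none_ne, numpy_ne, pandas_ne)
```

Only the pandas comparison treats None like nan, which is consistent with the suggestion that the note should mention pandas alone.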