Why don't pandas DataFrame objects support boolean evaluation?
Question
If I have a dataframe, is there some specific reason that direct boolean evaluation isn't supported? Contrary to the pandas error message, it seems absolutely clear and completely unambiguous how to evaluate the truth-value of a dataframe.
In what case would anyone with a dataframe of shape (0, 0) not want bool(df) -> False? What would that break, or enable, that would be bad?
Conversely, if you have a dataframe with sum(shape) > 0, what would ever not make sense, break, or be enabled by letting bool(df) -> True?
I'm not asking for philosophy, just for an example or a specific reason. To me, it seems very clear how useful this would be, and very opaque why it isn't supported. The error message itself led me to understand the exact opposite of what it says; after thinking about it, it almost seems like satire.
In my work, the result is that I can't have dataframes flow through a logic system that applies generally. Instead, I have to make a special case for this single type of data object and use the empty attribute instead of simply evaluating it in a general conditional statement, for example:
if a: do_something()
else: do_something_else()
When a is a dataframe, I have to add code to achieve the obvious expected result. It is the only data object I'm working with that requires a separate check for emptiness, which I then have to equate to being False.
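For reference, here is a small sketch of the current pandas behavior described above: bool() raises ValueError for a DataFrame whether or not it contains data, while the empty attribute gives a plain answer. The helper name try_bool is hypothetical and only exists to capture the exception.

```python
import pandas as pd

def try_bool(obj):
    """Hypothetical helper: return bool(obj), or "ValueError" if it raises."""
    try:
        return bool(obj)
    except ValueError:
        # pandas raises here for any DataFrame, empty or not
        return "ValueError"

empty_df = pd.DataFrame()
full_df = pd.DataFrame({"a": [1, 2]})

for name, df in [("empty", empty_df), ("non-empty", full_df)]:
    print(name, try_bool(df), "empty =", df.empty)
```

Plain Python containers are unaffected: try_bool([]) is simply False.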
Obviously I can write:
if a or (type(a) is DataFrame and sum(a.shape) > 0): do_something()
or
if a or not getattr(a, "empty"): do_something()
or whatever, tiptoeing around which part of the condition goes where to carefully avoid booling the dataframe or hardcoding the object-specific methods (which I didn't bother to do in these examples).
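A short sketch of the ordering problem in the first workaround: or evaluates bool() of its left operand before it looks at the right-hand side, so a condition starting with "df or ..." raises before the type check runs. Putting the type test first avoids that, but it needs a second guard, otherwise an empty frame falls through to bool(a) and raises anyway. The function name passes_check is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})

# `or` needs bool(df) first, so pandas raises here and the
# isinstance() check on the right-hand side never runs.
try:
    if df or (isinstance(df, pd.DataFrame) and not df.empty):
        pass
    left_first_raised = False
except ValueError:
    left_first_raised = True

def passes_check(a):
    # Type test first. The `not isinstance(...)` guard on the right is
    # required: without it, an empty DataFrame would reach bool(a) and raise.
    return (isinstance(a, pd.DataFrame) and not a.empty) or (
        not isinstance(a, pd.DataFrame) and bool(a)
    )

print(left_first_raised)   # True
print(passes_check(df))    # True
print(passes_check([]))    # False
```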
As written above, the question is what specific harm is avoided by forcing these approaches?
If your answer is to agree with the pandas error message about ambiguity, then my question would be: what is ambiguous about an all-zero shape being False and any other shape True? It's a fundamental aspect of Python's builtin datatypes that objects with 'shape' are True, and objects that are empty / without shape / zero length are False.
Answer 1
Score: 2
People constantly screw up boolean operations with arrays and dataframes, expecting if checks to broadcast element-wise. From what I've seen, this is much more common than people trying to check a dataframe for emptiness. Making __bool__ raise helps catch these errors, rather than producing silent misbehavior.
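A minimal sketch of the mistake described above. Comparing a column to a scalar produces a boolean Series, and writing it directly in an if statement is a common bug. Because pandas raises, the error surfaces immediately. If truthiness instead meant "non-empty", this if would silently take the True branch even though one value is negative. The explicit .any() / .all() reductions state what was actually meant.

```python
import pandas as pd

df = pd.DataFrame({"x": [3, -1, 5]})

# Mistake: expecting `if` to test the comparison element-wise.
# df["x"] > 0 is a boolean Series, and pandas refuses to collapse it.
try:
    if df["x"] > 0:
        print("every x is positive?")
    raised = False
except ValueError:
    raised = True

# Say which reduction you mean instead.
any_positive = bool((df["x"] > 0).any())
all_positive = bool((df["x"] > 0).all())

print(raised, any_positive, all_positive)  # True True False
```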
Also, you expect any shape with at least one nonzero dimension to be considered True. That's not what I would expect from an emptiness check, and it's not what DataFrame.empty does. I would expect something that behaves like DataFrame.empty: a dataframe is considered empty if it has at least one zero-length dimension.
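To make the disagreement concrete, the sketch below compares three candidate truthiness rules on four small frames, labeled by their shapes: len(df) > 0 (the length-based rule Python's builtin sequences effectively use, where a DataFrame's length is its row count), the question's sum(shape) > 0, and not df.empty. The helper name candidate_rules is hypothetical.

```python
import pandas as pd

frames = {
    "(0, 0)": pd.DataFrame(),
    "(0, 2)": pd.DataFrame(columns=["a", "b"]),        # columns, no rows
    "(2, 0)": pd.DataFrame(index=[0, 1]),              # rows, no columns
    "(2, 2)": pd.DataFrame({"a": [1, 2], "b": [3, 4]}),
}

def candidate_rules(df):
    """Hypothetical helper: evaluate each candidate rule for one frame."""
    return {
        "len(df) > 0": len(df) > 0,
        "sum(shape) > 0": sum(df.shape) > 0,
        "not df.empty": not df.empty,
    }

for label, df in frames.items():
    print(label, candidate_rules(df))
```

All three rules agree on (0, 0) and (2, 2). On (0, 2) the shape rule says True while the other two say False, and on (2, 0) only not df.empty says False, even though both of those frames hold no data.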
Answer 2
Score: 0
Don't make a special case for each if statement; make one special function that handles the possible conversions:
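def coerce_to_bool(obj):
    # pandas objects expose .empty, which is True when there is NO data,
    # so negate it: non-empty -> True, empty -> False
    if hasattr(obj, "empty"):
        return not obj.empty
    # everything else uses Python's normal truth rules
    return bool(obj)

if coerce_to_bool(df):
    ...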
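A variant of the same idea, sketched under the assumption that only pandas objects need special handling. It checks isinstance against pd.DataFrame and pd.Series instead of hasattr(obj, "empty"). That way, an unrelated object that happens to have an empty attribute is not misread. For example, if empty is a method, obj.empty is a bound method and always truthy. The name is_truthy is hypothetical.

```python
import pandas as pd

def is_truthy(obj):
    """Hypothetical helper: pandas objects are truthy when non-empty;
    everything else follows Python's normal truth rules."""
    if isinstance(obj, (pd.DataFrame, pd.Series)):
        # .empty is True when any axis has length zero
        return not obj.empty
    return bool(obj)

print(is_truthy(pd.DataFrame()))              # False
print(is_truthy(pd.DataFrame({"a": [1]})))    # True
print(is_truthy([]))                          # False
```

Note that is_truthy(pd.Series([False])) returns True because that Series is non-empty. This is the kind of ambiguity the first answer describes, so a helper like this is best kept to places where "non-empty" really is the intended meaning.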