Why don't pandas DataFrame objects support boolean evaluation?







If I have a dataframe, is there some specific reason that direct boolean evaluation isn't supported? Contrary to the pandas error message, it seems absolutely clear and completely unambiguous how to evaluate the truth-value of a dataframe.

In what case, given a dataframe with shape (0, 0), would anyone not want bool(df) -> False? What would that break, or enable, that would be bad?

Conversely, if you have a dataframe with sum(shape) > 0, what would ever not make sense, break, or be enabled by letting bool(df) -> True?

I'm not asking for philosophy, just an example or a specific reason. To me, it seems very clear how useful this would be, and very opaque that it isn't supported. The error message itself led me to understand the exact opposite of what it says; it almost seems like satire after thinking about it.

In my work the result is that I can't have dataframes flow through a logic system that applies generally. Instead, I have to make a special case for this single type of data object and use its empty attribute, rather than simply evaluating it in a general condition statement, for example:

if a: do_something()
else: do_something_else()

When a is a dataframe, I have to add code to achieve the obvious expected result. It is the only data object I'm working with that requires a separate method to check if it's empty and equate that to being false.

Obviously I can write

if a or (type(a) is DataFrame and sum(a.shape) > 0): do_something()

if a or not getattr(a, "empty"): do_something()

or whatever, tiptoeing around which part of the condition goes where to carefully not bool the dataframe or hardcode the object-specific methods (as I didn't do in these examples).

As written above, the question is what specific harm is avoided by forcing these approaches?

If your answer is to agree with the pandas error message about ambiguity, I guess my question would be what is ambiguous about an all-zero shape being False and any other shape True. It's a fundamental aspect of Python's builtin datatypes that objects with 'shape' are True, and objects that are empty / without shape / zero length are False.


Score: 2




People constantly screw up boolean operations with arrays and dataframes, expecting if checks to broadcast. From what I've seen, this is much more common than people trying to check a dataframe for emptiness. Banning __bool__ helps catch these errors, rather than producing silent misbehavior.
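A minimal sketch of the mistake described above, assuming a small made-up frame named df: an element-wise comparison returns a whole DataFrame of booleans, so using it directly in an if raises ValueError, and the intent has to be spelled out with any() or all().

```python
import pandas as pd

# Illustrative data only; the column name and values are assumptions.
df = pd.DataFrame({"a": [1, -2, 3]})

# An element-wise comparison yields a DataFrame of booleans, not a single bool.
mask = df > 0

try:
    if mask:  # calls DataFrame.__bool__, which pandas refuses
        pass
    raised = False
except ValueError:
    raised = True

# The intent has to be stated explicitly instead.
any_positive = bool(mask.any().any())  # at least one element is > 0
all_positive = bool(mask.all().all())  # every element is > 0

print(raised, any_positive, all_positive)
```

If bool(df) quietly returned True for any non-empty frame, the if mask: branch would always run here, regardless of the values in the frame.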

Also, you expect any shape with at least one nonzero dimension to be considered True. That's not what I would expect from an emptiness check, and it's not what DataFrame.empty does. I would expect something that behaves like DataFrame.empty: a dataframe is considered empty if it has at least one zero-length dimension.
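To make the difference concrete, here is a short sketch comparing DataFrame.empty with the sum(shape) > 0 rule from the question. The frames are made up for illustration.

```python
import pandas as pd

no_rows = pd.DataFrame(columns=["a", "b"])  # shape (0, 2)
no_cols = pd.DataFrame(index=[0, 1, 2])     # shape (3, 0)
nothing = pd.DataFrame()                    # shape (0, 0)
has_data = pd.DataFrame({"a": [1]})         # shape (1, 1)

for frame in (no_rows, no_cols, nothing, has_data):
    # .empty is True when any axis has length 0;
    # sum(shape) > 0 is True when any axis has nonzero length.
    print(frame.shape, "empty:", frame.empty, "sum(shape) > 0:", sum(frame.shape) > 0)
```

The first two frames are the disputed cases: they hold no values, so .empty is True, yet sum(shape) > 0 would treat them as truthy.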


Score: 0

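Don't make a special case for each if; make a special function for the possible conversions:

def coerce_to_bool(obj):
    # Objects exposing .empty (e.g. DataFrame, Series) are truthy when not empty.
    if hasattr(obj, "empty"):
        return not obj.empty
    # Everything else falls back to ordinary truthiness.
    return bool(obj)

if coerce_to_bool(df):
    do_something()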
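As a rough usage sketch, a helper like this can be run over a mix of builtin containers and dataframes. The name truthy and the sample values are hypothetical, and the sketch assumes the goal is for an empty DataFrame to count as false, so it negates .empty.

```python
import pandas as pd


def truthy(obj):
    """Hypothetical helper: like bool(), but objects with `.empty` are false when empty."""
    if hasattr(obj, "empty"):
        return not obj.empty
    return bool(obj)


# Sample values are assumptions chosen to cover builtins and dataframes.
values = [[], [1], {}, "text", pd.DataFrame(), pd.DataFrame({"a": [1]})]
results = [truthy(v) for v in values]
print(results)
```

Any object that happens to define an unrelated empty attribute would also take the first branch, so an isinstance check against the pandas types may be the safer choice in a larger codebase.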

  • Posted on June 9, 2023, 05:51:41
