Why don't pandas DataFrame objects support boolean evaluation?

Question

If I have a dataframe, is there some specific reason that direct boolean evaluation isn't supported? Contrary to the pandas error message, it seems absolutely clear and completely unambiguous how to evaluate the truth-value of a dataframe.

In what case would anyone not want bool(df) -> False for a dataframe with shape (0, 0)? What would that break, or enable, that would be bad?

Conversely, if you have a dataframe with sum(shape) > 0, what would ever not make sense, break, or be enabled by letting bool(df) -> True?

I'm not asking for philosophy, just an example or a specific reason. To me, it seems very clear how useful this would be, and very opaque why it isn't supported. The error message itself led me to conclude the exact opposite of what it says; after thinking about it, it almost seems like satire.

In my work, the result is that I can't have dataframes flow through a logic system that applies generally. Instead, I have to make a special case for this single type of data object and use its empty attribute, rather than simply evaluating it in a general condition statement, for example:

if a: do_something()
else: do_something_else()

When a is a dataframe, I have to add code to achieve the obvious expected result. It is the only data object I'm working with that requires a separate method to check if it's empty and equate that to being false.

Obviously I can write:

if a or (type(a) is DataFrame and sum(a.shape) > 0): do_something()

or

if a or not getattr(a, "empty"): do_something()

or whatever, tiptoeing around which part of the condition goes where to carefully not bool the dataframe or hardcode the object-specific methods (as I didn't do in these examples).

As written above, the question is what specific harm is avoided by forcing these approaches?

If your answer is to agree with the pandas error message about ambiguity, I guess my question would be what is ambiguous about an all-zero shape being False and any other shape being True. It's a fundamental aspect of Python's built-in datatypes that objects with 'shape' are True, and objects that are empty / without shape / zero length are False.
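
For reference, the behaviour the question describes can be reproduced with a short sketch. The helper name `truthiness` and the two sample frames are illustrative assumptions, not part of the original post; the point is only that built-in containers answer `bool()` while a DataFrame refuses, whatever its shape.

```python
import pandas as pd


def truthiness(obj):
    """Return bool(obj), or a short string if the object refuses bool()."""
    try:
        return bool(obj)
    except ValueError as exc:
        return f"ValueError: {exc}"


empty_df = pd.DataFrame()                 # shape (0, 0)
full_df = pd.DataFrame({"x": [1, 2, 3]})  # shape (3, 1)

print(truthiness([]))        # False
print(truthiness([0]))       # True
print(truthiness(empty_df))  # a ValueError message, even for shape (0, 0)
print(truthiness(full_df))   # a ValueError message as well
```

This is why `if a:` cannot be used unchanged when `a` may be a DataFrame: the exception is raised before either branch runs.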

Answer 1

Score: 2

People constantly screw up boolean operations with arrays and dataframes, expecting if checks to broadcast. From what I've seen, this is much more common than people trying to check a dataframe for emptiness. Banning __bool__ helps catch these errors, rather than producing silent misbehavior.

Also, you expect any shape with at least one nonzero dimension to be considered True. That's not what I would expect from an emptiness check, and it's not what DataFrame.empty does. I would expect something that behaves like DataFrame.empty: a dataframe is considered empty if it has at least one zero-length dimension.
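
Both points in this answer can be illustrated with a small sketch. The sample data and variable names are assumptions made for illustration. The first part shows the broadcasting mistake that raising in `__bool__` is meant to catch, and the second shows how `DataFrame.empty` differs from the `sum(shape) > 0` rule proposed in the question.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 5], "b": [7, 0]})
mask = df > 3  # element-wise comparison: a DataFrame of booleans

# The common mistake: `if df > 3:` has no single obvious answer,
# so pandas raises instead of silently picking one.
try:
    if mask:
        pass
    raised = False
except ValueError:
    raised = True

# The caller has to say which reduction is meant.
any_above = bool(mask.any().any())  # True: at least one value exceeds 3
all_above = bool(mask.all().all())  # False: not every value exceeds 3

# Emptiness: one zero-length axis is enough for `.empty` to be True,
# while sum(shape) > 0 would call both of these frames "truthy".
no_rows = pd.DataFrame(columns=["a", "b"])  # shape (0, 2)
no_cols = pd.DataFrame(index=[0, 1, 2])     # shape (3, 0)

print(raised, any_above, all_above)
print(no_rows.shape, no_rows.empty, sum(no_rows.shape) > 0)
print(no_cols.shape, no_cols.empty, sum(no_cols.shape) > 0)
```

So even if `bool(df)` were defined as an emptiness check, there would still be two plausible definitions of "empty" to choose between.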

Answer 2

Score: 0

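Don't make a special case for each if; write one helper function that handles the conversion:

def coerce_to_bool(obj):
    # pandas objects refuse bool(); treat "not empty" as true instead
    if hasattr(obj, "empty"):
        return not obj.empty
    return bool(obj)

# use the helper wherever a plain `if df:` would have been written
if coerce_to_bool(df):
    ...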
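
A slightly more defensive variant of this idea is sketched below, assuming the goal is that an empty DataFrame should count as false. The names `is_truthy` and `first_truthy` are hypothetical. The extra `callable` check is there because some unrelated objects (for example `queue.Queue`) expose `empty` as a method rather than as a property, and those are left to plain `bool()`.

```python
import pandas as pd


def is_truthy(obj):
    """Truth test that also works for pandas objects.

    Objects with a non-callable `empty` attribute (DataFrame, Series)
    are true when they are not empty; everything else uses bool().
    """
    empty = getattr(obj, "empty", None)
    if empty is not None and not callable(empty):
        return not bool(empty)
    return bool(obj)


def first_truthy(*candidates):
    """Return the first candidate that passes is_truthy, else None."""
    for candidate in candidates:
        if is_truthy(candidate):
            return candidate
    return None


frame = pd.DataFrame({"x": [1]})
chosen = first_truthy([], pd.DataFrame(), "", frame)
print(chosen is frame)  # True
```

With a helper like this, the special case lives in one place and the rest of the logic can treat DataFrames like any other value.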

Posted by huangapple on 2023-06-09 05:51:41
Please keep the link to this article when reposting: https://go.coder-hub.com/76435925.html