Match statement "NaN"

Question

I am writing a Python match/case statement that I want to apply to a pandas DataFrame. The data contains many NaN values, and I can't figure out how to handle them without them falling through to the default case _, because I want the default case to do something else.

How can I match on NaN values?

Here is some example code that I tried, which doesn't work:

import pandas as pd
import numpy as np

data = {'col': ["foo", "a", "b", np.nan]}
df = pd.DataFrame(data)

def handle_col(n):
    match n:
        case np.nan:
            return "not a number"
        case "a":
            return "this is letter a"
        case "b":
            return "this is letter b"
        case _:
            return "this is another string"
    
df["col"].apply(handle_col)

Here is the input DataFrame:

	col
0	foo
1	a
2	b
3	NaN

and the (wrong) answer I get:

0    this is another string
1          this is letter a
2          this is letter b
3    this is another string
Name: col, dtype: object

Answer 1

Score: 1

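The problem is that NaN never compares equal to anything, including itself: np.nan == np.nan is False. Because case np.nan is a value pattern that is checked with ==, it can never match. The solution is to use a guard in the case clause: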

def handle_col(n):
    match n:
        case _ if pd.isna(n):
            return "not a number"
        case "a":
            return "this is letter a"
        case "b":
            return "this is letter b"
        case _:
            return "this is another string"

df["col"].apply(handle_col)

0    this is another string
1          this is letter a
2          this is letter b
3              not a number
Name: col, dtype: object

Answer 2

Score: 0

Another solution is to match floats with a class pattern and use the guard n != n, which is true only for NaN:

import numpy as np
import pandas as pd

data = {'col': ["foo", "a", "b", np.nan, 1.4]}
df = pd.DataFrame(data)

def handle_col(n):
    match n:
        # Class pattern: matches only floats; n != n is true only for NaN.
        case float(n) if n != n:
            return "not a number"
        # Any other float.
        case float(n):
            return "a number"
        case "a":
            return "this is letter a"
        case "b":
            return "this is letter b"
        case _:
            return "this is another string"

print(df["col"].apply(handle_col))

Prints:

0    this is another string
1          this is letter a
2          this is letter b
3              not a number
4                  a number
Name: col, dtype: object

huangapple
  • Posted on July 24, 2023, 18:34:00
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76753624.html