Match statement "NaN"

Question

I am writing a Python match/case statement that I want to apply to a pandas DataFrame. The data contains many NaN values, and I can't figure out how to match them explicitly. They currently fall through to the default case _, which I want to reserve for something else.

How can I match on NaN values?

Here is an example that I tried and that doesn't work:

    import pandas as pd
    import numpy as np

    data = {'col': ["foo", "a", "b", np.nan]}
    df = pd.DataFrame(data)

    def handle_col(n):
        match n:
            case np.nan:  # intended to catch NaN, but never matches
                return "not a number"
            case "a":
                return "this is letter a"
            case "b":
                return "this is letter b"
            case _:
                return "this is another string"

    df["col"].apply(handle_col)

Here's the input DataFrame:

       col
    0  foo
    1    a
    2    b
    3  NaN

and the (wrong) answer I get:

    0    this is another string
    1          this is letter a
    2          this is letter b
    3    this is another string
    Name: col, dtype: object

Answer 1

Score: 1

The problem is that NaN does not compare equal to anything, including itself: np.nan == np.nan is False. A pattern such as case np.nan: is tested with ==, so it can never match. The solution is to use a guard on the case:

    import pandas as pd

    # df is the DataFrame built in the question
    def handle_col(n):
        match n:
            case _ if pd.isna(n):  # the guard runs before the other cases
                return "not a number"
            case "a":
                return "this is letter a"
            case "b":
                return "this is letter b"
            case _:
                return "this is another string"

    df["col"].apply(handle_col)

    0    this is another string
    1          this is letter a
    2          this is letter b
    3              not a number
    Name: col, dtype: object

Answer 2

Score: 0

Another solution is to match floats first and use the fact that NaN is the only float for which n != n is true:

    import pandas as pd
    import numpy as np

    data = {'col': ["foo", "a", "b", np.nan, 1.4]}
    df = pd.DataFrame(data)

    def handle_col(n):
        match n:
            case float(n) if n != n:  # only NaN is unequal to itself
                return "not a number"
            case float(n):  # any other float
                return "a number"
            case "a":
                return "this is letter a"
            case "b":
                return "this is letter b"
            case _:
                return "this is another string"

    print(df["col"].apply(handle_col))

Prints:

    0    this is another string
    1          this is letter a
    2          this is letter b
    3              not a number
    4                  a number
    Name: col, dtype: object

huangapple
  • Posted on July 24, 2023, 18:34:00
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76753624.html