Match statement "NaN"
Question
I am writing a Python `match`/`case` statement that I want to apply to a pandas DataFrame. The data contains a lot of NaN values, and I can't figure out how to handle them without them falling through to the default case `_`, because I want the default case to do something else.
How can I match on NaN values?
Here is example code that I tried, which doesn't work:
```python
import pandas as pd
import numpy as np

data = {'col': ["foo", "a", "b", np.nan]}
df = pd.DataFrame(data)

def handle_col(n):
    match n:
        case np.nan:
            return "not a number"
        case "a":
            return "this is letter a"
        case "b":
            return "this is letter b"
        case _:
            return "this is another string"

df["col"].apply(handle_col)
```
Here's the input DataFrame:
```
   col
0  foo
1    a
2    b
3  NaN
```
And the (wrong) output I get:
```
0    this is another string
1    this is letter a
2    this is letter b
3    this is another string
Name: col, dtype: object
```
Answer 1
Score: 1
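The problem is that NaN values don't satisfy equality checks: `np.nan == np.nan` is `False`. A value pattern such as `case np.nan:` only matches when the subject compares equal (`==`) to the value, so it can never match NaN. The solution is to use a guard in the `case`: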
```python
def handle_col(n):
    match n:
        # The guard runs pd.isna(n); it is True for NaN (and also None / pd.NA)
        case _ if pd.isna(n):
            return "not a number"
        case "a":
            return "this is letter a"
        case "b":
            return "this is letter b"
        case _:
            return "this is another string"

df["col"].apply(handle_col)
```
```
0    this is another string
1    this is letter a
2    this is letter b
3    not a number
Name: col, dtype: object
```
Answer 2
Score: 0
Another solution is to compare the number with itself: `n != n` is `True` only when `n` is NaN.
```python
import numpy as np
import pandas as pd

data = {'col': ["foo", "a", "b", np.nan, 1.4]}
df = pd.DataFrame(data)

def handle_col(n):
    match n:
        # NaN is the only float that is not equal to itself
        case float(n) if n != n:
            return "not a number"
        case float(n):
            return "a number"
        case "a":
            return "this is letter a"
        case "b":
            return "this is letter b"
        case _:
            return "this is another string"

print(df["col"].apply(handle_col))
```
Prints:
```
0    this is another string
1    this is letter a
2    this is letter b
3    not a number
4    a number
Name: col, dtype: object
```