2023-07-24 18:34:00

Match statement "NaN"

Question

I am writing a Python match/case statement that I want to apply to a pandas DataFrame. The data contains many NaN values, and I can't figure out how to handle them without them falling through to the default case _, because I want the default case to do something else.

How can I match on NaN values?

Here is example code that I tried and that doesn't work:

import pandas as pd
import numpy as np
data = {'col': ["foo", "a", "b", np.nan]}
df = pd.DataFrame(data)
def handle_col(n):
    match n:
        case np.nan:
            return "not a number"
        case "a":
            return "this is letter a"
        case "b":
            return "this is letter b"
        case _:
            return "this is another string"

df["col"].apply(handle_col)

Here is the input DataFrame:

	col
0	foo
1	a
2	b
3	NaN

and the (wrong) answer I get:

0    this is another string
1          this is letter a
2          this is letter b
3    this is another string
Name: col, dtype: object

Answer 1

Score: 1

The problem is that null values don't respect equality checks, i.e. np.nan == np.nan is False. The solution is to use a guard in the case:

def handle_col(n):
    match n:
        case _ if pd.isna(n):
            return "not a number"
        case "a":
            return "this is letter a"
        case "b":
            return "this is letter b"
        case _:
            return "this is another string"
df["col"].apply(handle_col)

0    this is another string
1          this is letter a
2          this is letter b
3              not a number
Name: col, dtype: object

Answer 2

Score: 0

Another solution is to compare the number with itself (n != n), which is True only for NaN:

data = {'col': ["foo", "a", "b", np.nan, 1.4]}
df = pd.DataFrame(data)
def handle_col(n):
    match n:
        case float(n) if n != n:
            return "not a number"
        case float(n):
            return "a number"
        case "a":
            return "this is letter a"
        case "b":
            return "this is letter b"
        case _:
            return "this is another string"
print(df["col"].apply(handle_col))

Prints:

0    this is another string
1          this is letter a
2          this is letter b
3              not a number
4                  a number
Name: col, dtype: object
