June 29, 2023, 21:27:23

"np.nan" isn't converted properly but "None" is

Question

In the following code I generate some data containing the value np.nan:

import pandas as pd
import numpy as np

n = 20
df = pd.DataFrame({"x": np.random.choice(["dog", "cat", np.nan], n), "y": range(0, n)})

Subsequently I check for missing values via the function pd.notnull and this does not indicate that there are any missing values:

pd.notnull(df["x"])

OK, the reason is that the np.nan used when creating the data somehow got converted into the string "nan". But why? For instance, if I substitute None for np.nan in the expression, i.e. if I create the data via np.random.choice(["dog", "cat", None], n), then everything works.

Can someone explain why np.nan isn't properly converted? And in general: How do I create random missing data for a string column without using np.nan or the None object?

Answer 1

Score: 2

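np.random.choice converts the list to a NumPy array, and an array can hold only one type of data. For a mix of strings and a float, NumPy picks a string dtype, so np.nan is stored as the string "nan". With None there is no common string type, so NumPy falls back to dtype object and None survives, which is why that version works. You can try to set the data type manually with dtype=float (nan is a float), but that does not work with the string values.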

options = np.array(["dog", "cat", np.nan], dtype=float)  # ValueError: could not convert string to float: 'dog'
df = pd.DataFrame({"x": np.random.choice(options, n), "y": range(0, n)})
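
A quick way to see the type coercion at work is to build the two candidate arrays by hand and inspect them. This is only a sketch; the variable names `with_nan` and `with_none` are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Strings mixed with a float: NumPy settles on a Unicode string dtype,
# so np.nan is stored as the text "nan".
with_nan = np.array(["dog", "cat", np.nan])

# Strings mixed with None: there is no common string type,
# so NumPy falls back to dtype=object and None is kept as-is.
with_none = np.array(["dog", "cat", None])

print(with_nan.dtype.kind)   # U (Unicode string)
print(with_none.dtype)       # object

# pandas sees only ordinary strings in the first array,
# but a real missing value in the second one.
print(pd.isnull(pd.Series(with_nan)).any())
print(pd.isnull(pd.Series(with_none)).any())
```

Since np.random.choice performs the same list-to-array conversion on its first argument, the values it draws from are already strings in the np.nan case.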

Edit: you can set dtype to object; then the code will work:

```python
import pandas as pd
import numpy as np

n = 20
options = np.array(["dog", "cat", np.nan], dtype=object)
print(options)
df = pd.DataFrame({"x": np.random.choice(options, n), "y": range(0, n)})
print(df)
```
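
To confirm that the object-dtype version really produces missing values that pandas recognizes, a check along these lines could follow. It swaps in a seeded `np.random.default_rng` generator (an assumption made here for reproducibility; the answer itself uses `np.random.choice`), and the names `rng`, `x`, and `n_missing` are illustrative.

```python
import numpy as np
import pandas as pd

n = 20
# dtype=object keeps np.nan as a float instead of turning it into the string "nan"
options = np.array(["dog", "cat", np.nan], dtype=object)

# Assumption: a seeded generator, so repeated runs draw the same sample
rng = np.random.default_rng(0)
x = rng.choice(options, n)
df = pd.DataFrame({"x": x, "y": range(n)})

# Count the entries pandas treats as missing and show the affected rows
n_missing = int(df["x"].isna().sum())
print(n_missing, "of", n, "values are missing")
print(df[df["x"].isna()])
```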



# Answer 2
**Score**: 1

As for creating random missing data for a string column, you can use `.mask()`:

```python
import numpy as np
import pandas as pd

n = 20
df = pd.DataFrame({"x": np.random.choice(["dog", "cat"], n), "y": range(0, n)})
# True marks the rows to blank out; change 0.33 to any fraction of missing values
mask = pd.Series(np.random.rand(n) < 0.33)
df["x"] = df["x"].mask(mask)
```
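
If the same masking step is needed for several columns, it could be wrapped in a small helper. The function below is a hypothetical sketch, not part of the answer: the name `add_missing`, its parameters, and the use of a seeded `np.random.default_rng` generator are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def add_missing(series, frac, seed=None):
    """Return a copy of `series` with roughly `frac` of its values set to missing.

    Hypothetical helper: each value is dropped independently with
    probability `frac`, so the realized fraction varies from run to run
    unless a seed is given.
    """
    rng = np.random.default_rng(seed)
    # Boolean array of the same length; True marks the positions to blank out
    drop = rng.random(len(series)) < frac
    return series.mask(drop)


x = pd.Series(np.random.default_rng(1).choice(["dog", "cat"], 20))
x_missing = add_missing(x, 0.33, seed=42)
print(x_missing)
print(x_missing.isna().mean())  # realized fraction of missing values
```

Because `.mask()` returns a new Series, the original column stays intact and the result can be assigned back only when wanted.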
