April 10, 2023, 23:03:11

Pandas Data Error on value_counts() does not display the count correctly to clean data

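Question

While cleaning data, I need to identify any typos in a particular column whose values are either 1 or 0, denoting Yes or No.

To see the typos, I ran print(df["Column Name"].value_counts()).

In the results, what looks like the same value is listed more than once.

I tried the replace command for Y, but it then adds 3 to one set of 1s and displays only that set of 1s and a single set of 0s.

Why is the same value being categorised as two different ones?
How can I convert the strings to numbers and get the following result, as it should be:

1     110
0     122

I tried:

df["Column Name"].str.strip()
df["Column Name"].replace(" 1", "1")
df["Column Name"].replace("Y", "1")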

Answer 1

Score: 1

Try using pd.to_numeric:

import pandas as pd
df['Column Name'] = pd.to_numeric(df["Column Name"].str.strip().replace({'Y': 1, 'N': 0}))
df['Column Name'].value_counts()

Try using np.unique to check your dataframe:

import numpy as np
np.unique(df['Column Name'], return_counts=True)

Without modification:

>>> df['Column Name'].value_counts(sort=False)
1     40
1     67
0     89
0     33
Y      3
Name: Column Name, dtype: int64

With modification:

>>> pd.to_numeric(df["Column Name"].str.strip().replace({'Y': 1, 'N': 0})).value_counts()
0    122
1    110
Name: Column Name, dtype: int64
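
To illustrate why identical-looking values can end up as separate entries, here is a minimal sketch on made-up data (the sample values are an assumption, not the asker's real column). It suggests two likely causes: padded strings such as " 1" are distinct from "1", and str.strip() returns a new Series rather than changing the column in place.

```python
import pandas as pd

# Hypothetical sample: the same logical value stored in several spellings.
s = pd.Series(["1", " 1", "1 ", "0", "0 ", "Y"], name="Column Name")

# Each distinct string is its own category, even if it prints alike.
raw_counts = s.value_counts()
print(raw_counts.index.map(repr).tolist())  # repr() makes the spaces visible

# str.strip() returns a new Series; the original is unchanged until reassigned.
s.str.strip()
unchanged = s.nunique()   # still 6 distinct values

s = s.str.strip()         # reassign to keep the result
stripped = s.nunique()    # 3 distinct values: "1", "0", "Y"

# replace() matches whole values by default, so map the letters after stripping.
clean = pd.to_numeric(s.replace({"Y": "1", "N": "0"}))
counts = clean.value_counts().sort_index()
print(counts)
```

Under these assumptions, the three lines tried in the question would have no lasting effect, because none of their results is assigned back to the column.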

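Answer 2

Score: 1

A robust method to convert your data might be:

import pandas as pd

df = pd.DataFrame({'Column Name': [0, 1, '1', '1 ', ' 1 ', 'Y', 'N']})

mapper = {'Y': 1, 'N': 0}

# astype(str) lets .str.strip() handle real integers as well as strings
df['out'] = df['Column Name'].astype(str).str.strip().replace(mapper)  # .astype(int)

Output:

  Column Name out
0           0   0
1           1   1
2           1   1
3           1   1
4           1   1
5           Y   1
6           N   0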

