Posted: 2023-05-11 11:56:11

How do I get rid of duplicate values in pandas column?

Question

I'll show my CSV. I'm using pandas in Python and trying to clean up my CSV.

Here is my problem:

(image not preserved)

I want my outcome to look like this:

(image not preserved)

I think I just need to get rid of some duplicates, but I don't know how to do that with pandas.

I tried a few different approaches, such as resetting the index, sorting, and using the dropna function, but none of them seemed to work.

Answer 1

Score: 0

You can use a combination of groupby and sum to de-duplicate your rows, using:

df.groupby('Name').sum()

This assumes your pandas DataFrame is called df and has no columns beyond the grouping column and the numeric columns to be summed.

Here's a working example:

import numpy as np
import pandas as pd

# Each stat for 'Andrew W' sits on a separate row, with NaN everywhere else.
# np.nan is used because the np.NaN alias was removed in NumPy 2.0.
points = [np.nan, 20, np.nan, 1]
rebounds = [21, np.nan, np.nan, 300]
assists = [np.nan, np.nan, 3, np.nan]
name = ['Andrew W', 'Andrew W', 'Andrew W', 'Hello World']
data = {
    'name': name,
    'points': points,
    'rebounds': rebounds,
    'assists': assists
}
df = pd.DataFrame(data)
print(df.to_markdown(index=False))  # to_markdown requires the tabulate package

# sum() skips NaN, so the rows of each group collapse into a single row.
agg_df = df.groupby('name').sum()
print(agg_df.to_markdown())

DataFrame before aggregation

| name        |   points |   rebounds |   assists |
|:------------|---------:|-----------:|----------:|
| Andrew W    |      nan |         21 |       nan |
| Andrew W    |       20 |        nan |       nan |
| Andrew W    |      nan |        nan |         3 |
| Hello World |        1 |        300 |       nan |

DataFrame after aggregation

| name        |   points |   rebounds |   assists |
|:------------|---------:|-----------:|----------:|
| Andrew W    |       20 |         21 |         3 |
| Hello World |        1 |        300 |         0 |
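One thing to watch in the table above: the `assists` value for Hello World becomes 0 even though it was missing in the input, because `sum()` treats an all-NaN group as zero. If that matters for your data, a minimal sketch of two alternatives follows. It reuses the toy data from the example, and `first_df` and `sum_df` are just illustrative names.

```python
import numpy as np
import pandas as pd

# Same toy data as the example above.
df = pd.DataFrame({
    "name": ["Andrew W", "Andrew W", "Andrew W", "Hello World"],
    "points": [np.nan, 20, np.nan, 1],
    "rebounds": [21, np.nan, np.nan, 300],
    "assists": [np.nan, np.nan, 3, np.nan],
})

# first() keeps the first non-null value in each column of each group,
# so a group with no value at all stays NaN instead of becoming 0.
first_df = df.groupby("name").first()

# sum(min_count=1) still adds values up, but returns NaN for a group
# that has no non-null value to add.
sum_df = df.groupby("name").sum(min_count=1)

print(first_df)
print(sum_df)
```

`first()` is probably the closer fit when each value appears exactly once per name, since it never adds numbers together; `sum(min_count=1)` is the better choice if repeated values should genuinely be totalled.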

Answer 2

Score: 0

You can do it with:

df = df.groupby('Name').sum()

Docs here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html
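Since the goal is a cleaned-up CSV, here is a rough sketch of the full round trip: read, group, write. The CSV contents and the column names `Points` and `Rebounds` are made up for illustration, because the original file is not shown; an in-memory `io.StringIO` stands in for the real file path.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for the asker's file.
raw_csv = io.StringIO(
    "Name,Points,Rebounds\n"
    "Andrew W,,21\n"
    "Andrew W,20,\n"
    "Hello World,1,300\n"
)

# Empty fields are read as NaN.
df = pd.read_csv(raw_csv)

# as_index=False keeps Name as an ordinary column rather than the index.
clean = df.groupby("Name", as_index=False).sum()

# index=False avoids writing the row numbers as an extra CSV column.
# With no path argument, to_csv returns the CSV text as a string.
csv_text = clean.to_csv(index=False)
print(csv_text)
```

With a real file you would pass the path to `pd.read_csv` and to `clean.to_csv` instead of using the in-memory buffer.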
