How do I get rid of duplicate values in a pandas column?

Question

I'll show my CSV. I'm using pandas in Python and trying to clean up my CSV.

Here is my problem:

[Image not preserved]

I want my outcome to look like this:

[Image not preserved]

I think I just need to get rid of some duplicates, but I don't know how to do that with pandas.

I tried a few different approaches, such as resetting the index, sorting, and using dropna, but none of them seemed to work.

Answer 1

Score: 0

You can use a combination of groupby and sum to de-duplicate your rows, using:

df.groupby('Name').sum()

This assumes your pandas DataFrame is called df and has no columns other than the name column and the numeric columns to be combined.

Here's a working example:

import pandas as pd
import numpy as np

# np.nan marks missing values (the np.NaN alias was removed in NumPy 2.0)
points = [np.nan, 20, np.nan, 1]
rebounds = [21, np.nan, np.nan, 300]
assists = [np.nan, np.nan, 3, np.nan]
name = ['Andrew W', 'Andrew W', 'Andrew W', 'Hello World']

data = {
    'name': name,
    'points': points,
    'rebounds': rebounds,
    'assists': assists
}

df = pd.DataFrame(data)
print(df.to_markdown(index=False))  # to_markdown requires the tabulate package

# Collapse the rows for each name into one; sum skips NaN values
agg_df = df.groupby('name').sum()
print(agg_df.to_markdown())

DataFrame before aggregation

| name        |   points |   rebounds |   assists |
|:------------|---------:|-----------:|----------:|
| Andrew W    |      nan |         21 |       nan |
| Andrew W    |       20 |        nan |       nan |
| Andrew W    |      nan |        nan |         3 |
| Hello World |        1 |        300 |       nan |

DataFrame after aggregation

| name        |   points |   rebounds |   assists |
|:------------|---------:|-----------:|----------:|
| Andrew W    |       20 |         21 |         3 |
| Hello World |        1 |        300 |         0 |
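
One detail in the output above: assists for Hello World comes out as 0 even though the input only had NaN there, because sum treats an all-NaN group as an empty sum. Here is a minimal sketch of two alternatives, assuming you would rather keep the missing value. It reuses the same toy data as the example, and whether either option suits the real CSV is an assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Andrew W', 'Andrew W', 'Andrew W', 'Hello World'],
    'points': [np.nan, 20, np.nan, 1],
    'rebounds': [21, np.nan, np.nan, 300],
    'assists': [np.nan, np.nan, 3, np.nan],
})

# min_count=1 requires at least one non-NaN value per group;
# otherwise the result stays NaN instead of becoming 0.
summed = df.groupby('name').sum(min_count=1)

# first() returns the first non-null value of each column per group.
firsts = df.groupby('name').first()

print(summed)
print(firsts)
```

If each name has at most one real value per column, as in the toy data, both variants give the same numbers; they differ only when a group holds several non-null values in the same column.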

Answer 2

Score: 0

You can do it with:

df = df.groupby('Name').sum()

Docs here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html
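
If the goal is a cleaned CSV, as in the question, it may help to keep Name as an ordinary column instead of the index. A rough sketch, assuming a CSV with a Name column and numeric stat columns (the column names and the inline data are made up for illustration, since the original file is not shown):

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the file from the question.
raw = io.StringIO(
    "Name,Points,Rebounds\n"
    "Andrew W,,21\n"
    "Andrew W,20,\n"
    "Hello World,1,300\n"
)

df = pd.read_csv(raw)

# as_index=False keeps Name as a regular column in the result.
clean = df.groupby('Name', as_index=False).sum()

# With no path argument, to_csv returns the CSV text;
# pass a file path instead to save it to disk.
csv_text = clean.to_csv(index=False)
print(csv_text)
```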

huangapple
  • Posted on May 11, 2023 at 11:56:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76224051.html