How do I get rid of duplicate values in a pandas column?

Question

I'll show my CSV. I'm using pandas in Python and trying to clean up my CSV.

Here is my problem:

[screenshot of the CSV]

I want my outcome to look like this:

[screenshot of the desired result]

I think I just need to get rid of some duplicates, but I don't know how to do that with pandas.

I tried a few different approaches, such as resetting the index, sorting, and using dropna, but none of them seemed to work.

Answer 1

Score: 0

You can use a combination of groupby and sum to collapse the duplicate rows into one row per name:

    df.groupby('Name').sum()

This assumes your pandas dataframe is called df and has no columns other than Name and the numeric columns to be summed.
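
If the dataframe does carry other columns, one option is to select the numeric columns explicitly before summing. A minimal sketch, where the team column and the sample values are assumptions made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'team' is an extra text column not in the original question.
df = pd.DataFrame({
    "name": ["Andrew W", "Andrew W", "Hello World"],
    "team": ["A", "A", "B"],
    "points": [np.nan, 20, 1],
    "rebounds": [21, np.nan, 300],
})

# Sum only the chosen numeric columns, producing one row per name.
stat_cols = ["points", "rebounds"]
totals = df.groupby("name")[stat_cols].sum()
print(totals)
```

Selecting the columns up front keeps text columns from being concatenated or raising errors during the sum.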

Here's a working example:

    import numpy as np
    import pandas as pd

    # np.nan marks missing values (the np.NaN alias was removed in NumPy 2.0)
    points = [np.nan, 20, np.nan, 1]
    rebounds = [21, np.nan, np.nan, 300]
    assists = [np.nan, np.nan, 3, np.nan]
    name = ['Andrew W', 'Andrew W', 'Andrew W', 'Hello World']
    data = {
        'name': name,
        'points': points,
        'rebounds': rebounds,
        'assists': assists
    }
    df = pd.DataFrame(data)
    # to_markdown requires the optional tabulate package
    print(df.to_markdown(index=False))

    # One row per name; sum skips NaN values within each group
    agg_df = df.groupby('name').sum()
    print(agg_df.to_markdown())

DataFrame before aggregation

    | name        | points | rebounds | assists |
    |:------------|-------:|---------:|--------:|
    | Andrew W    |    nan |       21 |     nan |
    | Andrew W    |     20 |      nan |     nan |
    | Andrew W    |    nan |      nan |       3 |
    | Hello World |      1 |      300 |     nan |

DataFrame after aggregation

    | name        | points | rebounds | assists |
    |:------------|-------:|---------:|--------:|
    | Andrew W    |     20 |       21 |       3 |
    | Hello World |      1 |      300 |       0 |
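
Note the 0 for Hello World's assists: sum returns 0 for a group whose values are all NaN. If a missing value should stay missing, two options that should work are sum(min_count=1) and first(), which takes the first non-null value per column in each group. A sketch reusing the example data above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Andrew W", "Andrew W", "Andrew W", "Hello World"],
    "points": [np.nan, 20, np.nan, 1],
    "rebounds": [21, np.nan, np.nan, 300],
    "assists": [np.nan, np.nan, 3, np.nan],
})

# min_count=1: a group needs at least one non-NaN value, otherwise the result is NaN.
summed = df.groupby("name").sum(min_count=1)

# first(): take the first non-null value in each column, per group.
firsts = df.groupby("name").first()

print(summed)
print(firsts)
```

first() fits when each statistic appears at most once per name, as in this data; sum would add up genuine repeats instead.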

Answer 2

Score: 0

You can do it with:

    df = df.groupby('Name').sum()

Docs here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html
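
Since the goal is a cleaned CSV, it may help to keep Name as a regular column rather than the index that groupby produces by default. A minimal sketch, assuming a Name column and using an in-memory buffer as a stand-in for the real file (the sample rows are made up):

```python
import io

import pandas as pd

# Stand-in for the CSV file; the column names and rows are assumptions.
raw = io.StringIO(
    "Name,points,rebounds\n"
    "Andrew W,,21\n"
    "Andrew W,20,\n"
    "Hello World,1,300\n"
)
df = pd.read_csv(raw)

# as_index=False keeps 'Name' as a regular column instead of the index.
cleaned = df.groupby("Name", as_index=False).sum()

# index=False omits the positional index; pass a file path instead of the buffer to save to disk.
out = io.StringIO()
cleaned.to_csv(out, index=False)
print(out.getvalue())
```

Calling reset_index() on the grouped result should give the same layout if as_index=False is not used.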

huangapple
  • Posted on 2023-05-11 11:56:11
  • Please keep the link to this article when reposting: https://go.coder-hub.com/76224051.html