How do I get rid of duplicate values in a pandas column?
Question
I'll show my CSV. I'm using pandas in Python and trying to clean up my CSV.
Here is my problem:
I want my outcome to look like this:
I think I just need to get rid of some duplicates, but I don't know how to do that with pandas.
I tried a few different approaches, such as resetting the index, sorting, and using dropna, but none of them seemed to work.
Answer 1
Score: 0
You can use a combination of groupby and sum to de-duplicate your rows:
df.groupby('Name').sum()
This assumes your pandas DataFrame is called df and that there are no other columns in the DataFrame.
Here's a working example:
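import numpy as np
import pandas as pd

# Each stat for 'Andrew W' sits on its own row, with NaN everywhere else.
# np.nan is used because the np.NaN alias was removed in NumPy 2.0.
points = [np.nan, 20, np.nan, 1]
rebounds = [21, np.nan, np.nan, 300]
assists = [np.nan, np.nan, 3, np.nan]
name = ['Andrew W', 'Andrew W', 'Andrew W', 'Hello World']
data = {
    'name': name,
    'points': points,
    'rebounds': rebounds,
    'assists': assists,
}
df = pd.DataFrame(data)
print(df.to_markdown(index=False))  # to_markdown() needs the tabulate package

# sum() skips NaN, so the rows of each group collapse into one row per name.
agg_df = df.groupby('name').sum()
print(agg_df.to_markdown())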
DataFrame before aggregation
| name | points | rebounds | assists |
|:------------|---------:|-----------:|----------:|
| Andrew W | nan | 21 | nan |
| Andrew W | 20 | nan | nan |
| Andrew W | nan | nan | 3 |
| Hello World | 1 | 300 | nan |
DataFrame after aggregation
| name | points | rebounds | assists |
|:------------|---------:|-----------:|----------:|
| Andrew W | 20 | 21 | 3 |
| Hello World | 1 | 300 | 0 |
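One caveat with the output above: sum() skips NaN, so a group whose values are all missing ends up as 0 (see assists for Hello World). Here is a minimal sketch of two alternatives, assuming the same example frame as above. Passing min_count=1 to sum() leaves such a cell as NaN, and first() takes the first non-null value per column instead of adding values up, which may be closer to what you want if each statistic is only supposed to appear once per name.

```python
import numpy as np
import pandas as pd

# Same illustrative data as in the example above.
df = pd.DataFrame({
    'name': ['Andrew W', 'Andrew W', 'Andrew W', 'Hello World'],
    'points': [np.nan, 20, np.nan, 1],
    'rebounds': [21, np.nan, np.nan, 300],
    'assists': [np.nan, np.nan, 3, np.nan],
})

# Require at least one non-NaN value per group; otherwise the cell stays NaN.
summed = df.groupby('name').sum(min_count=1)

# Take the first non-null value in each column for every group.
firsts = df.groupby('name').first()

print(summed)
print(firsts)
```

Which one fits depends on the data: if the same statistic can legitimately appear on several rows and should be added up, sum() is the better choice; if the extra rows are just fragments of one record, first() avoids accidentally doubling a value.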
Answer 2
Score: 0
You can do it with:
df = df.groupby('Name').sum()
Docs here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html
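Since the question starts from a CSV file, here is a rough sketch of the full round trip. The column names and the inline CSV text are made up for illustration (the original file is not shown), and io.StringIO stands in for a real file path. Passing as_index=False keeps Name as a regular column, so it is written back out like any other column.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for the asker's file.
csv_text = """Name,Points,Rebounds
Andrew W,,21
Andrew W,20,
Hello World,1,300
"""

df = pd.read_csv(io.StringIO(csv_text))

# Collapse the rows that share a Name; NaN cells are skipped by sum().
clean = df.groupby('Name', as_index=False).sum()

# With a real file, pass a path instead (for example 'cleaned.csv', a made-up name).
out = io.StringIO()
clean.to_csv(out, index=False)
print(out.getvalue())
```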