Iterating over rows and groups in dataframe
Question
Suppose I have the following data frame:
import pandas as pd

d = {'Date': ['2020-1-1', '2020-1-2', '2020-1-3', '2020-1-1', '2020-1-2', '2020-1-3', '2020-1-1', '2020-1-2', '2020-1-3'],
     'col2': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
     'col3': [0.01, 0.02, 0.03, 0.02, 0.03, 0.04, 0.05, 0.1, 0.01]}
d = pd.DataFrame(data=d)
d['Date'] = pd.to_datetime(d['Date'])
d
which gives:
Date col2 col3
0 2020-01-01 A 0.01
1 2020-01-02 A 0.02
2 2020-01-03 A 0.03
3 2020-01-01 B 0.02
4 2020-01-02 B 0.03
5 2020-01-03 B 0.04
6 2020-01-01 C 0.05
7 2020-01-02 C 0.10
8 2020-01-03 C 0.01
How could I iterate over the rows so that, for each Date, I get the two highest values of col3 along with their group from col2?
For example, I should get:
2020-01-01 B 0.02
2020-01-01 C 0.05
2020-01-02 B 0.03
2020-01-02 C 0.10
2020-01-03 A 0.03
2020-01-03 B 0.04
Finally, I want to sum col3 for each day:
Date sum
2020-01-01 0.07
2020-01-02 0.13
2020-01-03 0.07
Of course, the real data has many more groups (col2) and more dates.
Answer 1
Score: 1
I think this should do it (it returns the per-day sums directly):
d[['Date', 'col3']].groupby('Date')['col3'].nlargest(2).groupby('Date').sum()
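The one-liner above gives only the daily sums. To also see the intermediate rows the question asks for (the two largest col3 values per Date together with their col2 group), one option is to sort by col3 and keep the first two rows of each Date. This is a minimal sketch, assuming the frame d from the question; the names top2 and daily_sum are illustrative and not from the original answer.

```python
import pandas as pd

# Rebuild the example frame from the question.
d = pd.DataFrame({
    'Date': ['2020-1-1', '2020-1-2', '2020-1-3'] * 3,
    'col2': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'col3': [0.01, 0.02, 0.03, 0.02, 0.03, 0.04, 0.05, 0.1, 0.01],
})
d['Date'] = pd.to_datetime(d['Date'])

# Put the largest col3 values first, keep the first two rows of each Date,
# then order the result by Date and col2 for readability.
top2 = (d.sort_values('col3', ascending=False)
         .groupby('Date')
         .head(2)
         .sort_values(['Date', 'col2']))

# Sum the kept col3 values per day.
daily_sum = (top2.groupby('Date', as_index=False)['col3']
                 .sum()
                 .rename(columns={'col3': 'sum'}))

print(top2)
print(daily_sum)
```

Note that head(2) always keeps exactly two rows per Date, so if several rows tie for second place only one of them is kept.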
Answer 2
Score: 1
grouped_df = d.groupby("Date")
results = []
for name, group in grouped_df:
    # nlargest already selects the two largest col3 values, so no prior sort is needed
    top_2 = group.nlargest(2, "col3")
    top_2_sum = top_2["col3"].sum()
    results.append((name, top_2_sum))
sum_df = pd.DataFrame(results, columns=["Date", "Sum"])
print(sum_df)
Output:
Date Sum
0 2020-01-01 0.07
1 2020-01-02 0.13
2 2020-01-03 0.07
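Since the question mentions many more groups and dates, the same logic could be wrapped in a small helper that avoids the explicit Python loop and makes the number of values kept per day a parameter. This is only a sketch: the function name top_n_sum and its parameters n, date_col and value_col are hypothetical names chosen for illustration.

```python
import pandas as pd

def top_n_sum(df, n=2, date_col="Date", value_col="col3"):
    """Sum the n largest value_col entries for each distinct date_col value."""
    return (df.groupby(date_col)[value_col]
              .apply(lambda s: s.nlargest(n).sum())  # one scalar per date
              .reset_index(name="Sum"))

# Rebuild the example frame from the question.
d = pd.DataFrame({
    "Date": ["2020-1-1", "2020-1-2", "2020-1-3"] * 3,
    "col2": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "col3": [0.01, 0.02, 0.03, 0.02, 0.03, 0.04, 0.05, 0.1, 0.01],
})
d["Date"] = pd.to_datetime(d["Date"])

sum_df = top_n_sum(d, n=2)
print(sum_df)
```

Calling top_n_sum(d, n=1) would instead return the single largest col3 value for each day.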