Pandas: new column based on the sum of a column from another DataFrame
Question
I have two DataFrames. The first is:
   unit    year
0     1    2020
1     2    2021
2     3    2022
The second is:
   unit    observations
0     1               0
1     2               1
2     2               2
3     2               3
4     2               4
5     3               5
I need to add a column to the first DataFrame containing the sum of observations for each unit in the second DataFrame, so that the result looks like this:
   unit    year   observations
0     1    2020              0
1     2    2021             10
2     3    2022              5
I tried looping with df_1.iterrows and querying the second DataFrame by each unit to compute the sum. It worked, but the DataFrame has about 4 million rows, so this approach would take days. Is there a faster solution?
Answer 1
Score: 2
Use Series.map with the per-unit sum aggregated from the second DataFrame:
df1['observations'] = df1['unit'].map(df2.groupby('unit')['observations'].sum())
print(df1)
   unit  year  observations
0     1  2020             0
1     2  2021            10
2     3  2022             5
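For reference, here is a self-contained sketch of the same approach that rebuilds the sample data from the question. The helper name add_unit_totals and the extra unit 4 are hypothetical and used only for illustration. Filling missing units with 0 is an assumption; without fillna, a unit that does not appear in the second DataFrame would get NaN.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"unit": [1, 2, 3], "year": [2020, 2021, 2022]})
df2 = pd.DataFrame({
    "unit": [1, 2, 2, 2, 2, 3],
    "observations": [0, 1, 2, 3, 4, 5],
})


def add_unit_totals(left, right, key="unit", value="observations"):
    """Return a copy of `left` with a per-key sum of `value` taken from `right`.

    Hypothetical helper: keys absent from `right` get 0 (an assumption).
    """
    # One vectorized pass over `right`: a Series indexed by key holding the sums.
    totals = right.groupby(key)[value].sum()
    out = left.copy()
    # map() looks up each key of `left` in the index of `totals`.
    out[value] = out[key].map(totals).fillna(0).astype(totals.dtype)
    return out


result = add_unit_totals(df1, df2)

# Hypothetical extra row: unit 4 has no observations in df2.
df1_extra = pd.concat(
    [df1, pd.DataFrame({"unit": [4], "year": [2023]})], ignore_index=True
)
result_extra = add_unit_totals(df1_extra, df2)
```

The groupby runs once over the second DataFrame and the lookup is vectorized, so this avoids the per-row query that makes the iterrows approach slow.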