Pandas: new column based on the sum of a column from another DataFrame

Question

I have two dataframes. The first one is:

       unit  year
    0     1  2020
    1     2  2021
    2     3  2022

and the second:

       unit  observations
    0     1             0
    1     2             1
    2     2             2
    3     2             3
    4     2             4
    5     3             5

I need to add a column to the first dataframe that holds the sum of observations for each unit in the second dataframe, so that the result looks like this:

       unit  year  observations
    0     1  2020             0
    1     2  2021            10
    2     3  2022             5

I tried looping with df_1.iterrows and, for each row, querying the second dataframe by unit to compute the sum. It works, but the dataframe has about 4 million rows, so this approach would take days. Does anyone have a faster solution?

Answer 1

Score: 2

Use Series.map with the per-unit sums aggregated from the second DataFrame:

    # Sum observations per unit in df2, then look up each unit of df1 in that result
    df1['observations'] = df1['unit'].map(df2.groupby('unit')['observations'].sum())
    print(df1)

Output:

       unit  year  observations
    0     1  2020             0
    1     2  2021            10
    2     3  2022             5
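
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the two sample frames from the question and applies the same map over grouped sums. The helper name `add_unit_sums` and its `fill_value` parameter are illustrative and not part of the original answer. The `fillna` step is an assumption about what you want for units that have no rows in the second frame: plain `map` would leave NaN there.

```python
import pandas as pd


def add_unit_sums(df1, df2, fill_value=0):
    """Return a copy of df1 with an 'observations' column of per-unit sums from df2.

    Hypothetical helper for illustration only. Units of df1 that never appear
    in df2 get fill_value (an assumption; plain map would give NaN).
    """
    # One pass over df2: a Series indexed by unit holding the summed observations.
    sums = df2.groupby('unit')['observations'].sum()
    out = df1.copy()
    # map() looks up each unit in the index of `sums`; missing units become NaN,
    # which fillna replaces before casting back to the dtype of the sums.
    out['observations'] = out['unit'].map(sums).fillna(fill_value).astype(sums.dtype)
    return out


# Sample data copied from the question.
df1 = pd.DataFrame({'unit': [1, 2, 3], 'year': [2020, 2021, 2022]})
df2 = pd.DataFrame({'unit': [1, 2, 2, 2, 2, 3],
                    'observations': [0, 1, 2, 3, 4, 5]})

result = add_unit_sums(df1, df2)
print(result)
```

The groupby runs once over the second frame and map is a vectorized lookup, so the per-row query of the iterrows approach is avoided entirely.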

huangapple
  • Published on April 17, 2023, 20:40:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76035259.html