April 17, 2023 20:40:56


Pandas: new column based on the sum of a column from another DataFrame

Question


I have two DataFrames. The first one is:

   unit    year
0     1    2020
1     2    2021
2     3    2022

and the second:

   unit    observations
0     1               0
1     2               1
2     2               2
3     2               3
4     2               4
5     3               5

I need to add a column to the first DataFrame that holds, for each unit, the sum of that unit's observations in the second DataFrame, so that I end up with something like this:

   unit    year   observations
0     1    2020              0
1     2    2021             10
2     3    2022              5

I tried iterating with df_1.iterrows() and, for each row, querying the second DataFrame by that row's unit and summing the result. It works, but my DataFrame has about 4 million rows, so this approach would take days. Does anyone have a faster solution?
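For reference, the row-by-row approach described above probably looks something like the sketch below. This is a hypothetical reconstruction: the question does not show the exact code, and a boolean mask stands in for the query. Every iteration filters the whole second DataFrame, so the total work grows roughly with len(df1) × len(df2), which explains why it does not scale to millions of rows.

```python
import pandas as pd

# The two example DataFrames from the question.
df1 = pd.DataFrame({"unit": [1, 2, 3], "year": [2020, 2021, 2022]})
df2 = pd.DataFrame({"unit": [1, 2, 2, 2, 2, 3], "observations": [0, 1, 2, 3, 4, 5]})

# Hypothetical reconstruction of the slow approach: for each row of df1,
# scan all of df2 for matching units and sum their observations.
totals = []
for _, row in df1.iterrows():
    mask = df2["unit"] == row["unit"]  # full scan of df2 on every iteration
    totals.append(df2.loc[mask, "observations"].sum())

df1["observations"] = totals
print(df1)
```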

Answer 1

Score: 2


Use Series.map with the per-unit sum aggregated from the second DataFrame:

df1['observations'] = df1['unit'].map(df2.groupby('unit')['observations'].sum())
print(df1)
   unit  year  observations
0     1  2020             0
1     2  2021            10
2     3  2022             5
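A self-contained version of the answer's approach, with the example data rebuilt from the question, is sketched below. The groupby makes a single pass over the second DataFrame, and map then looks up each unit once, instead of re-scanning df2 for every row. The fillna(0) step is an addition, not part of the original answer: it assumes a unit that never appears in the second DataFrame should count as zero observations rather than NaN.

```python
import pandas as pd

# The two example DataFrames from the question.
df1 = pd.DataFrame({"unit": [1, 2, 3], "year": [2020, 2021, 2022]})
df2 = pd.DataFrame({"unit": [1, 2, 2, 2, 2, 3], "observations": [0, 1, 2, 3, 4, 5]})

# One pass over df2: a Series of total observations indexed by unit.
per_unit = df2.groupby("unit")["observations"].sum()

# Look up each df1 unit in that Series. Units missing from df2 would map to NaN;
# filling them with 0 (an assumption) keeps the column integer-typed.
df1["observations"] = df1["unit"].map(per_unit).fillna(0).astype(int)
print(df1)
```

If NaN is the more meaningful marker for "no data" in your case, drop the fillna/astype step and keep the plain map.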

