April 14, 2023 00:01:32

How to add a column with given values to a Pandas dataframe, grouped by another column?

Question

I have two Pandas dataframes:

print(df_a)
   ID  irrelevant_value
0   1  1.2
1   1  2.3
2   1  0.9
3   1  1.1
4   2  2.7
5   2  3.1
6   3  1.3
7   3  0.2
8   3  2.3
...

and

    ID  add_these_values_to_the_same_ID
0   1   100
1   2   120
2   3   90
...

I would like to combine them so that the result looks like this:

print(df_a)

    ID  irrelevant_value  add_these_values_to_the_same_ID
0   1   1.2               100
1   1   2.3               100
2   1   0.9               100
3   1   1.1               100
4   2   2.7               120
5   2   3.1               120
6   3   1.3               90
7   3   0.2               90
8   3   2.3               90
...

How can this be accomplished?

I have been struggling with df_a.groupby(["ID"]), but cannot find a way forward.

Answer 1

Score: 2

The groupby function is not needed here; instead, just use merge. As the pandas documentation for merge notes,
> If both key columns contain rows where the key is a null value, those rows will be matched against each other. This is different from usual SQL join behaviour and can lead to unexpected results.

Both of your dataframes contain an ID column, so you can merge them on it with the code below (this is only one of several ways to do it, but it works):

import pandas as pd

# Example dataframes - these might be defined elsewhere in your situation
data_a = {'ID': [1, 1, 1, 1, 2, 2, 3, 3, 3],
          'irrelevant_value': [1.2, 2.3, 0.9, 1.1, 2.7, 3.1, 1.3, 0.2, 2.3]}
df_a = pd.DataFrame(data_a)

data_b = {'ID': [1, 2, 3],
          'add_these_values_to_the_same_ID': [100, 120, 90]}
df_b = pd.DataFrame(data_b)

# Merge dataframes on the 'ID' column
result = df_a.merge(df_b, on='ID')

print(result)

The really important line there is:

result = df_a.merge(df_b, on='ID')

And this will output:

   ID  irrelevant_value  add_these_values_to_the_same_ID
0   1               1.2                             100
1   1               2.3                             100
2   1               0.9                             100
3   1               1.1                             100
4   2               2.7                             120
5   2               3.1                             120
6   3               1.3                              90
7   3               0.2                              90
8   3               2.3                              90

This should work, but if it does not fit your data, take a look at the how parameter:
> how : {‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}, default ‘inner’
> Type of merge to be performed.
> - left: use only keys from left frame, similar to a SQL left outer join; preserve key order.
> - right: use only keys from right frame, similar to a SQL right outer join; preserve key order.
> - outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.
> - inner: use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys.
> - cross: creates the cartesian product from both frames, preserves the order of the left keys.

If the default inner merge does not give you what you need (for example, because some IDs appear in only one of the frames), how='outer' may.
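
To make the effect of how concrete, here is a minimal sketch. It assumes a variant of the question's data in which df_a contains an ID (4) that is missing from df_b; that extra row is made up for illustration and is not part of the original question. The indicator and validate arguments are optional extras of merge, used here only to make the behaviour visible.

```python
import pandas as pd

# Illustrative data: ID 4 exists only in df_a (an assumption, not from the question)
df_a = pd.DataFrame({
    "ID": [1, 1, 2, 4],
    "irrelevant_value": [1.2, 2.3, 2.7, 0.5],
})
df_b = pd.DataFrame({
    "ID": [1, 2, 3],
    "add_these_values_to_the_same_ID": [100, 120, 90],
})

# Default how="inner": the row with ID 4 is dropped because df_b has no such key
inner = df_a.merge(df_b, on="ID")

# how="left": every row of df_a is kept; unmatched rows get NaN in the new column.
# indicator=True adds a "_merge" column saying where each row came from.
# validate="many_to_one" raises an error if df_b contains duplicate IDs.
left = df_a.merge(df_b, on="ID", how="left", indicator=True, validate="many_to_one")

print(len(inner), len(left))
print(left)
```

If the goal is to add a column to df_a without ever losing rows from it, how='left' is usually the closer fit, since an inner merge silently drops IDs that have no match.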

Answer 2

Score: 0

Try DataFrame.merge:

df_a.merge(other_df, left_on='ID', right_on='ID')
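
When the key column has the same name in both frames, left_on='ID', right_on='ID' behaves like on='ID'. The two-argument form matters mainly when the names differ. The sketch below assumes a hypothetical version of other_df whose key column is called key_id; that name is invented for illustration and does not appear in the question.

```python
import pandas as pd

df_a = pd.DataFrame({
    "ID": [1, 1, 2, 3],
    "irrelevant_value": [1.2, 2.3, 2.7, 1.3],
})

# Hypothetical second frame: same values, but the key column is named "key_id"
other_df = pd.DataFrame({
    "key_id": [1, 2, 3],
    "add_these_values_to_the_same_ID": [100, 120, 90],
})

# Match df_a["ID"] against other_df["key_id"], then drop the redundant key column
result = (
    df_a.merge(other_df, left_on="ID", right_on="key_id")
        .drop(columns="key_id")
)

print(result)
```

Both key columns are kept in the merged frame when their names differ, which is why the sketch drops key_id afterwards.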
