How to add a column with given values to a Pandas dataframe, grouped by another column?

Question

I have two Pandas dataframes:

print(df_a)
   ID  irrelevant_value
0   1  1.2
1   1  2.3
2   1  0.9
3   1  1.1
4   2  2.7
5   2  3.1
6   3  1.3
7   3  0.2
8   3  2.3
...

and

    ID  add_these_values_to_the_same_ID
0   1   100
1   2   120
2   3   90
...

I would like to combine them to get this desired result:

print(df_a)

    ID  irrelevant_value  add_these_values_to_the_same_ID
0   1   1.2               100
1   1   2.3               100
2   1   0.9               100
3   1   1.1               100
4   2   2.7               120
5   2   3.1               120
6   3   1.3               90
7   3   0.2               90
8   3   2.3               90
...

How can this be accomplished?

I have been struggling with df_a.groupby(["ID"]), but cannot find a way forward.

Answer 1

Score: 2

The groupby function is not needed here; just use merge instead. Note this caveat from the pandas merge documentation:
> If both key columns contain rows where the key is a null value, those rows will be matched against each other. This is different from usual SQL join behaviour and can lead to unexpected results.
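To see what that caveat means in practice, here is a small sketch with made-up frames (the names `left`, `right`, `a` and `b` are hypothetical) whose ID columns both contain a missing value. As the quoted documentation describes, merge pairs the two NaN keys with each other instead of dropping them.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: both key columns contain a NaN
left = pd.DataFrame({"ID": [1.0, np.nan], "a": [1, 2]})
right = pd.DataFrame({"ID": [np.nan, 1.0], "b": [10, 20]})

# Unlike a SQL join, the NaN key on the left is matched with the NaN key on the right
result = left.merge(right, on="ID")
print(result)
```

If you do not want that behaviour, one option is to drop rows with missing keys before merging, e.g. `left.dropna(subset=["ID"])`.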

Both of your dataframes contain an ID column, so you can merge them on it. This is just one way to do it, but it works:

import pandas as pd

# Example dataframes - these might be defined elsewhere in your situation
data_a = {'ID': [1, 1, 1, 1, 2, 2, 3, 3, 3],
          'irrelevant_value': [1.2, 2.3, 0.9, 1.1, 2.7, 3.1, 1.3, 0.2, 2.3]}
df_a = pd.DataFrame(data_a)

data_b = {'ID': [1, 2, 3],
          'add_these_values_to_the_same_ID': [100, 120, 90]}
df_b = pd.DataFrame(data_b)

# Merge dataframes on the 'ID' column
result = df_a.merge(df_b, on='ID')

print(result)

The really important line there is:

result = df_a.merge(df_b, on='ID')

And this will output:

   ID  irrelevant_value  add_these_values_to_the_same_ID
0   1               1.2                             100
1   1               2.3                             100
2   1               0.9                             100
3   1               1.1                             100
4   2               2.7                             120
5   2               3.1                             120
6   3               1.3                              90
7   3               0.2                              90
8   3               2.3                              90
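If you would rather add the column to df_a itself (which matches the `print(df_a)` in the question), a different technique, `Series.map` with a lookup Series, should also work. This is a sketch, not part of the original answer; it assumes each ID appears at most once in df_b, and any ID missing from df_b would get NaN.

```python
import pandas as pd

# Same example data as above
df_a = pd.DataFrame({"ID": [1, 1, 1, 1, 2, 2, 3, 3, 3],
                     "irrelevant_value": [1.2, 2.3, 0.9, 1.1, 2.7, 3.1, 1.3, 0.2, 2.3]})
df_b = pd.DataFrame({"ID": [1, 2, 3],
                     "add_these_values_to_the_same_ID": [100, 120, 90]})

# A Series indexed by ID (ID -> value to add); assumes IDs in df_b are unique
lookup = df_b.set_index("ID")["add_these_values_to_the_same_ID"]

# Look up each row's ID; IDs not present in df_b would become NaN
df_a["add_these_values_to_the_same_ID"] = df_a["ID"].map(lookup)

print(df_a)
```

Unlike merge, this keeps df_a's original index and row count regardless of what df_b contains.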

This should work, but if it does not fit your data, take a look at the how parameter:
> how: {'left', 'right', 'outer', 'inner', 'cross'}, default 'inner'
> Type of merge to be performed.
> - left: use only keys from left frame, similar to a SQL left outer join; preserve key order.
> - right: use only keys from right frame, similar to a SQL right outer join; preserve key order.
> - outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.
> - inner: use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys.
> - cross: creates the cartesian product from both frames, preserves the order of the left keys.

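If the code above does not give the result you want, for example because some IDs appear in only one of the dataframes, how='outer' might work (or how='left', to keep every row of df_a).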
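To compare these options, here is a sketch with hypothetical data in which ID 4 appears only in df_a and ID 5 only in df_b. `indicator=True` adds a `_merge` column that shows whether each row came from both frames, only the left, or only the right.

```python
import pandas as pd

# Hypothetical data: ID 4 exists only in df_a, ID 5 only in df_b
df_a = pd.DataFrame({"ID": [1, 1, 2, 4],
                     "irrelevant_value": [1.2, 2.3, 2.7, 0.5]})
df_b = pd.DataFrame({"ID": [1, 2, 5],
                     "add_these_values_to_the_same_ID": [100, 120, 70]})

results = {}
for how in ("inner", "left", "outer"):
    # indicator=True adds a "_merge" column: "both", "left_only" or "right_only"
    results[how] = df_a.merge(df_b, on="ID", how=how, indicator=True)
    print(f"how={how!r}: {len(results[how])} rows")
    print(results[how], end="\n\n")
```

With this data, inner drops the ID 4 row, left keeps it with NaN in the new column, and outer additionally adds a row for ID 5.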

Answer 2

Score: 0

Try DataFrame.merge:

df_a.merge(other_df, left_on='ID', right_on='ID')
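Because the key column has the same name in both frames, `left_on='ID', right_on='ID'` should behave the same as `on='ID'`; separate `left_on`/`right_on` arguments matter mainly when the key columns are named differently. A sketch with hypothetical data (`other_df` as in the answer, and the renamed `key` column is an assumption for illustration):

```python
import pandas as pd

df_a = pd.DataFrame({"ID": [1, 1, 2], "irrelevant_value": [1.2, 2.3, 2.7]})
# Hypothetical lookup frame standing in for other_df
other_df = pd.DataFrame({"ID": [1, 2], "add_these_values_to_the_same_ID": [100, 120]})

# Same key name on both sides: equivalent to on="ID", only one ID column is kept
same_name = df_a.merge(other_df, left_on="ID", right_on="ID")
with_on = df_a.merge(other_df, on="ID")
print(same_name.equals(with_on))

# Different key names: left_on/right_on is required and both key columns are kept
renamed = other_df.rename(columns={"ID": "key"})
diff_name = df_a.merge(renamed, left_on="ID", right_on="key")
print(diff_name.columns.tolist())
```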

huangapple
  • Published on April 14, 2023 at 00:01:32
  • If you repost this article, please keep the link: https://go.coder-hub.com/76007530.html