将两个具有一对多关系的数据框合并。

huangapple go评论63阅读模式
英文:

Combine two dataframe having one to many relationship

问题

Here's the translation of the code and comments you provided:

我想要合并两个数据框就像下面这样但是不知道要使用什么函数来实现我尝试分解问题但是没有得到正确的解决方案 -

import pandas as pd

person_data = [{'Id': 3763058, 'Name': 'Andi', 'description': 'abc'},
               {'Id': 3763077, 'Name': 'Mark', 'description': 'xyz'}]

person_df1 = pd.DataFrame(person_data)
display(person_df1)

|Id       | Name    |description|
|:--------|:-------:|----------:|
|3763058  |	Andi    |abc        |
|3763077  |	Mark    |xyz        |

object_data = [{'Id': 3763058, 'object_name': 'PlayStation', 'object_count': 2},
               {'Id': 3763077, 'object_name': 'MathsBook', 'object_count': 1},
               {'Id': 3763058, 'object_name': 'MusicSystem', 'object_count': 3}]

object_df2 = pd.DataFrame(object_data)
display(object_df2)

|Id       | object_name |object_count|
|:--------|:-----------:|-----------:|
|3763058  | PlayStation |2           |
|3763077  | MathsBook   |1           |
|3763058  | MusicSystem |3           |

结果数据框 -

|Id       | Name    |description| PlayStation | MathsBook | MusicSystem|
|:--------|:-------:|:---------:|:-----------:|:---------:|-----------:|
|3763058  |	Andi    |abc        |2            |0          |3           |
|3763077  |	Mark    |xyz        |0            |1          |0           |

我尝试分解问题 - 

> 部分-1获取object_name的唯一值

# uniqe_object_name = object_df2['object_name'].unique().tolist()

new_cols= ['PlayStation', 'MathsBook', 'MusicSystem'] # 目前我们有固定的值
new_vals = [0,0,0]

> 部分-2创建唯一object_name的列并初始化为零

person_df1 = person_df1.reindex(columns=person_df1.columns.tolist() + new_cols)
person_df1[new_cols] = new_vals
print(person_df1)

> 部分-3按id分组并将object_count的值存储到object_name列
> 在这里卡住了不知道要使用哪个函数来从其他数据框创建一个列并从其他数据框列中分配值

person_df1['id'][object_name] = object_df2.groupby('id')['object_name'].apply(', '.join).reset_index()

Please note that the last part of your code seems to have some issues, and it's not clear what you intend to achieve with person_df1['id'][object_name]. You may need to clarify your objective for that part of the code.

英文:

I want to combine two dataframe like below but not getting what func to use for this. i tried to breakdown problem but not getting correct solun -

import pandas as pd
person_data = [{'Id': 3763058, 'Name': 'Andi', 'description': 'abc'},
{'Id': 3763077, 'Name': 'Mark', 'description': 'xyz'}]
person_df1 = pd.DataFrame(person_data)
display(person_df1)
Id Name description
3763058 Andi abc
3763077 Mark xyz
object_data = [{'Id': 3763058, 'object_name': 'PlayStation', 'object_count': 2},
{'Id': 3763077, 'object_name': 'MathsBook',   'object_count': 1},
{'Id': 3763058, 'object_name': 'MusicSystem', 'object_count': 3},
]
object_df2 = pd.DataFrame(object_data)
display(object_df2)
Id object_name object_count
3763058 PlayStation 2
3763077 MathsBook 1
3763058 MusicSystem 3

Result DF -

Id Name description PlayStation MathsBook MusicSystem
3763058 Andi abc 2 0 3
3763077 Mark xyz 0 1 0

I tried to breakdown prob -

> part-1 : get unique values of object_name

# uniqe_object_name = object_df2['object_name'].unique().tolist()
new_cols= ['PlayStation', 'MathsBook', 'MusicSystem'] # As of now fix value we have
new_vals = [0,0,0]

> part-2 : Create column of unique object_name and initiate with zero

person_df1 = person_df1.reindex(columns=person_df1.columns.tolist() + new_cols)
person_df1[new_cols] = new_vals
print(person_df1)

> part3 : Group by id and store values of object_count to object_name column
> Stuck here , not getting what func to use to create a column from other df and assign value from other df columns.

person_df1['id'][object_name] = object_df2.groupby('id')['object_name'].apply(', '.join).reset_index()

答案1

得分: 0

First apply pivot_table, then merge:

out = person_df1.merge(object_df2.pivot_table(index='Id', columns='object_name',
values='object_count', fill_value=0
).reset_index())

Or, to be explicit on the key and type of merge:

out = person_df1.merge(object_df2.pivot_table(index='Id', columns='object_name',
values='object_count', fill_value=0
),
left_on='Id', right_index=True, how='left'
)

Output:

        Id  Name description  MathsBook  MusicSystem  PlayStation
0  3763058  Andi         abc          0            3            2
1  3763077  Mark         xyz          1            0            0

If the order of objects is important:

out = person_df1.merge(object_df2.pivot_table(index='Id', columns='object_name',
values='object_count', fill_value=0
)[object_df2['object_name'].unique()],
left_on='Id', right_index=True, how='left'
)

Output:

        Id  Name description  PlayStation  MathsBook  MusicSystem
0  3763058  Andi         abc            2          0            3
1  3763077  Mark         xyz            0          1            0
英文:

First apply pivot_table, then merge:

out = person_df1.merge(object_df2.pivot_table(index='Id', columns='object_name',
values='object_count', fill_value=0
).reset_index())

Or, to be explicit on the key and type of merge:

out = person_df1.merge(object_df2.pivot_table(index='Id', columns='object_name',
values='object_count', fill_value=0
),
left_on='Id', right_index=True, how='left'
)

Output:

        Id  Name description  MathsBook  MusicSystem  PlayStation
0  3763058  Andi         abc          0            3            2
1  3763077  Mark         xyz          1            0            0

If the order of objects is important:

out = person_df1.merge(object_df2.pivot_table(index='Id', columns='object_name',
values='object_count', fill_value=0
)[object_df2['object_name'].unique()],
left_on='Id', right_index=True, how='left'
)

Output:

        Id  Name description  PlayStation  MathsBook  MusicSystem
0  3763058  Andi         abc            2          0            3
1  3763077  Mark         xyz            0          1            0

huangapple
  • 本文由 发表于 2023年4月20日 05:15:06
  • 转载请务必保留本文链接:https://go.coder-hub.com/76058860.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定