英文:
Combine two dataframe having one to many relationship
问题
Here's the translation of the code and comments you provided:
我想要合并两个数据框,就像下面这样,但是不知道要使用什么函数来实现。我尝试分解问题,但是没有得到正确的解决方案 -
import pandas as pd
person_data = [{'Id': 3763058, 'Name': 'Andi', 'description': 'abc'},
{'Id': 3763077, 'Name': 'Mark', 'description': 'xyz'}]
person_df1 = pd.DataFrame(person_data)
display(person_df1)
|Id | Name |description|
|:--------|:-------:|----------:|
|3763058 | Andi |abc |
|3763077 | Mark |xyz |
object_data = [{'Id': 3763058, 'object_name': 'PlayStation', 'object_count': 2},
{'Id': 3763077, 'object_name': 'MathsBook', 'object_count': 1},
{'Id': 3763058, 'object_name': 'MusicSystem', 'object_count': 3}]
object_df2 = pd.DataFrame(object_data)
display(object_df2)
|Id | object_name |object_count|
|:--------|:-----------:|-----------:|
|3763058 | PlayStation |2 |
|3763077 | MathsBook |1 |
|3763058 | MusicSystem |3 |
结果数据框 -
|Id | Name |description| PlayStation | MathsBook | MusicSystem|
|:--------|:-------:|:---------:|:-----------:|:---------:|-----------:|
|3763058 | Andi |abc |2 |0 |3 |
|3763077 | Mark |xyz |0 |1 |0 |
我尝试分解问题 -
> 部分-1:获取object_name的唯一值
# uniqe_object_name = object_df2['object_name'].unique().tolist()
new_cols= ['PlayStation', 'MathsBook', 'MusicSystem'] # 目前我们有固定的值
new_vals = [0,0,0]
> 部分-2:创建唯一object_name的列并初始化为零
person_df1 = person_df1.reindex(columns=person_df1.columns.tolist() + new_cols)
person_df1[new_cols] = new_vals
print(person_df1)
> 部分-3:按id分组,并将object_count的值存储到object_name列
> 在这里卡住了,不知道要使用哪个函数来从其他数据框创建一个列,并从其他数据框列中分配值。
person_df1['id'][object_name] = object_df2.groupby('id')['object_name'].apply(', '.join).reset_index()
Please note that the last part of your code seems to have some issues, and it's not clear what you intend to achieve with person_df1['id'][object_name]
. You may need to clarify your objective for that part of the code.
英文:
I want to combine two dataframe like below but not getting what func to use for this. i tried to breakdown problem but not getting correct solun -
import pandas as pd
person_data = [{'Id': 3763058, 'Name': 'Andi', 'description': 'abc'},
{'Id': 3763077, 'Name': 'Mark', 'description': 'xyz'}]
person_df1 = pd.DataFrame(person_data)
display(person_df1)
Id | Name | description |
---|---|---|
3763058 | Andi | abc |
3763077 | Mark | xyz |
object_data = [{'Id': 3763058, 'object_name': 'PlayStation', 'object_count': 2},
{'Id': 3763077, 'object_name': 'MathsBook', 'object_count': 1},
{'Id': 3763058, 'object_name': 'MusicSystem', 'object_count': 3},
]
object_df2 = pd.DataFrame(object_data)
display(object_df2)
Id | object_name | object_count |
---|---|---|
3763058 | PlayStation | 2 |
3763077 | MathsBook | 1 |
3763058 | MusicSystem | 3 |
Result DF -
Id | Name | description | PlayStation | MathsBook | MusicSystem |
---|---|---|---|---|---|
3763058 | Andi | abc | 2 | 0 | 3 |
3763077 | Mark | xyz | 0 | 1 | 0 |
I tried to breakdown prob -
> part-1 : get unique values of object_name
# uniqe_object_name = object_df2['object_name'].unique().tolist()
new_cols= ['PlayStation', 'MathsBook', 'MusicSystem'] # As of now fix value we have
new_vals = [0,0,0]
> part-2 : Create column of unique object_name and initiate with zero
person_df1 = person_df1.reindex(columns=person_df1.columns.tolist() + new_cols)
person_df1[new_cols] = new_vals
print(person_df1)
> part3 : Group by id and store values of object_count to object_name column
> Stuck here , not getting what func to use to create a column from other df and assign value from other df columns.
person_df1['id'][object_name] = object_df2.groupby('id')['object_name'].apply(', '.join).reset_index()
答案1
得分: 0
First apply pivot_table
, then merge
:
out = person_df1.merge(object_df2.pivot_table(index='Id', columns='object_name',
values='object_count', fill_value=0
).reset_index())
Or, to be explicit on the key and type of merge:
out = person_df1.merge(object_df2.pivot_table(index='Id', columns='object_name',
values='object_count', fill_value=0
),
left_on='Id', right_index=True, how='left'
)
Output:
Id Name description MathsBook MusicSystem PlayStation
0 3763058 Andi abc 0 3 2
1 3763077 Mark xyz 1 0 0
If the order of objects is important:
out = person_df1.merge(object_df2.pivot_table(index='Id', columns='object_name',
values='object_count', fill_value=0
)[object_df2['object_name'].unique()],
left_on='Id', right_index=True, how='left'
)
Output:
Id Name description PlayStation MathsBook MusicSystem
0 3763058 Andi abc 2 0 3
1 3763077 Mark xyz 0 1 0
英文:
First apply pivot_table
, then merge
:
out = person_df1.merge(object_df2.pivot_table(index='Id', columns='object_name',
values='object_count', fill_value=0
).reset_index())
Or, to be explicit on the key and type of merge:
out = person_df1.merge(object_df2.pivot_table(index='Id', columns='object_name',
values='object_count', fill_value=0
),
left_on='Id', right_index=True, how='left'
)
Output:
Id Name description MathsBook MusicSystem PlayStation
0 3763058 Andi abc 0 3 2
1 3763077 Mark xyz 1 0 0
If the order of objects is important:
out = person_df1.merge(object_df2.pivot_table(index='Id', columns='object_name',
values='object_count', fill_value=0
)[object_df2['object_name'].unique()],
left_on='Id', right_index=True, how='left'
)
Output:
Id Name description PlayStation MathsBook MusicSystem
0 3763058 Andi abc 2 0 3
1 3763077 Mark xyz 0 1 0
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论