2023年4月20日 05:15:06go评论97阅读模式

英文:

Combine two dataframe having one to many relationship

问题

Here's the translation of the code and comments you provided:

我想要合并两个数据框，就像下面这样，但是不知道要使用什么函数来实现。我尝试分解问题，但是没有得到正确的解决方案 -
import pandas as pd
person_data = [{'Id': 3763058, 'Name': 'Andi', 'description': 'abc'},
               {'Id': 3763077, 'Name': 'Mark', 'description': 'xyz'}]
person_df1 = pd.DataFrame(person_data)
display(person_df1)
|Id       | Name    |description|
|:--------|:-------:|----------:|
|3763058  |	Andi    |abc        |
|3763077  |	Mark    |xyz        |
object_data = [{'Id': 3763058, 'object_name': 'PlayStation', 'object_count': 2},
               {'Id': 3763077, 'object_name': 'MathsBook', 'object_count': 1},
               {'Id': 3763058, 'object_name': 'MusicSystem', 'object_count': 3}]
object_df2 = pd.DataFrame(object_data)
display(object_df2)
|Id       | object_name |object_count|
|:--------|:-----------:|-----------:|
|3763058  | PlayStation |2           |
|3763077  | MathsBook   |1           |
|3763058  | MusicSystem |3           |
结果数据框 -
|Id       | Name    |description| PlayStation | MathsBook | MusicSystem|
|:--------|:-------:|:---------:|:-----------:|:---------:|-----------:|
|3763058  |	Andi    |abc        |2            |0          |3           |
|3763077  |	Mark    |xyz        |0            |1          |0           |
我尝试分解问题 - 
> 部分-1：获取object_name的唯一值
# uniqe_object_name = object_df2['object_name'].unique().tolist()
new_cols= ['PlayStation', 'MathsBook', 'MusicSystem'] # 目前我们有固定的值
new_vals = [0,0,0]
> 部分-2：创建唯一object_name的列并初始化为零
person_df1 = person_df1.reindex(columns=person_df1.columns.tolist() + new_cols)
person_df1[new_cols] = new_vals
print(person_df1)
> 部分-3：按id分组，并将object_count的值存储到object_name列
> 在这里卡住了，不知道要使用哪个函数来从其他数据框创建一个列，并从其他数据框列中分配值。
person_df1['id'][object_name] = object_df2.groupby('id')['object_name'].apply(', '.join).reset_index()

Please note that the last part of your code seems to have some issues, and it's not clear what you intend to achieve with person_df1['id'][object_name]. You may need to clarify your objective for that part of the code.

英文:

I want to combine two dataframe like below but not getting what func to use for this. i tried to breakdown problem but not getting correct solun -

import pandas as pd
person_data = [{&#39;Id&#39;: 3763058, &#39;Name&#39;: &#39;Andi&#39;, &#39;description&#39;: &#39;abc&#39;},
{&#39;Id&#39;: 3763077, &#39;Name&#39;: &#39;Mark&#39;, &#39;description&#39;: &#39;xyz&#39;}]
person_df1 = pd.DataFrame(person_data)
display(person_df1)

Id	Name	description
3763058	Andi	abc
3763077	Mark	xyz

object_data = [{&#39;Id&#39;: 3763058, &#39;object_name&#39;: &#39;PlayStation&#39;, &#39;object_count&#39;: 2},
{&#39;Id&#39;: 3763077, &#39;object_name&#39;: &#39;MathsBook&#39;,   &#39;object_count&#39;: 1},
{&#39;Id&#39;: 3763058, &#39;object_name&#39;: &#39;MusicSystem&#39;, &#39;object_count&#39;: 3},
]
object_df2 = pd.DataFrame(object_data)
display(object_df2)

Id	object_name	object_count
3763058	PlayStation	2
3763077	MathsBook	1
3763058	MusicSystem	3

Result DF -

Id	Name	description	PlayStation	MathsBook	MusicSystem
3763058	Andi	abc	2	0	3
3763077	Mark	xyz	0	1	0

I tried to breakdown prob -

> part-1 : get unique values of object_name

# uniqe_object_name = object_df2[&#39;object_name&#39;].unique().tolist()
new_cols= [&#39;PlayStation&#39;, &#39;MathsBook&#39;, &#39;MusicSystem&#39;] # As of now fix value we have
new_vals = [0,0,0]

> part-2 : Create column of unique object_name and initiate with zero

person_df1 = person_df1.reindex(columns=person_df1.columns.tolist() + new_cols)
person_df1[new_cols] = new_vals
print(person_df1)

> part3 : Group by id and store values of object_count to object_name column
> Stuck here , not getting what func to use to create a column from other df and assign value from other df columns.

person_df1[&#39;id&#39;][object_name] = object_df2.groupby(&#39;id&#39;)[&#39;object_name&#39;].apply(&#39;, &#39;.join).reset_index()

答案1

得分: 0

First apply pivot_table, then merge:

out = person_df1.merge(object_df2.pivot_table(index='Id', columns='object_name',
values='object_count', fill_value=0
).reset_index())

Or, to be explicit on the key and type of merge:

out = person_df1.merge(object_df2.pivot_table(index='Id', columns='object_name',
values='object_count', fill_value=0
),
left_on='Id', right_index=True, how='left'
)

Output:

        Id  Name description  MathsBook  MusicSystem  PlayStation
0  3763058  Andi         abc          0            3            2
1  3763077  Mark         xyz          1            0            0

If the order of objects is important:

out = person_df1.merge(object_df2.pivot_table(index='Id', columns='object_name',
values='object_count', fill_value=0
)[object_df2['object_name'].unique()],
left_on='Id', right_index=True, how='left'
)

Output:

        Id  Name description  PlayStation  MathsBook  MusicSystem
0  3763058  Andi         abc            2          0            3
1  3763077  Mark         xyz            0          1            0

英文:

First apply pivot_table, then merge:

out = person_df1.merge(object_df2.pivot_table(index=&#39;Id&#39;, columns=&#39;object_name&#39;,
values=&#39;object_count&#39;, fill_value=0
).reset_index())

Or, to be explicit on the key and type of merge:

out = person_df1.merge(object_df2.pivot_table(index=&#39;Id&#39;, columns=&#39;object_name&#39;,
values=&#39;object_count&#39;, fill_value=0
),
left_on=&#39;Id&#39;, right_index=True, how=&#39;left&#39;
)

Output:

        Id  Name description  MathsBook  MusicSystem  PlayStation
0  3763058  Andi         abc          0            3            2
1  3763077  Mark         xyz          1            0            0

If the order of objects is important:

out = person_df1.merge(object_df2.pivot_table(index=&#39;Id&#39;, columns=&#39;object_name&#39;,
values=&#39;object_count&#39;, fill_value=0
)[object_df2[&#39;object_name&#39;].unique()],
left_on=&#39;Id&#39;, right_index=True, how=&#39;left&#39;
)

Output:

        Id  Name description  PlayStation  MathsBook  MusicSystem
0  3763058  Andi         abc            2          0            3
1  3763077  Mark         xyz            0          1            0

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

将两个具有一对多关系的数据框合并。

问题

答案1

Pandas中用循环进行多列筛选的函数

我想分割一个字符串，然后统计数据框中列的值。

ModuleNotFound错误在Python中出现，但目录已在sys.path上。

Databricks Pyspark：如何获取外部MySQL中的表列表并创建数据框架？

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

发表评论