将两个具有一对多关系的数据框合并。

huangapple go评论74阅读模式
英文:

Combine two dataframe having one to many relationship

问题

Here's the translation of the code and comments you provided:

  1. 我想要合并两个数据框就像下面这样但是不知道要使用什么函数来实现我尝试分解问题但是没有得到正确的解决方案 -
  2. import pandas as pd
  3. person_data = [{'Id': 3763058, 'Name': 'Andi', 'description': 'abc'},
  4. {'Id': 3763077, 'Name': 'Mark', 'description': 'xyz'}]
  5. person_df1 = pd.DataFrame(person_data)
  6. display(person_df1)
  7. |Id | Name |description|
  8. |:--------|:-------:|----------:|
  9. |3763058 | Andi |abc |
  10. |3763077 | Mark |xyz |
  11. object_data = [{'Id': 3763058, 'object_name': 'PlayStation', 'object_count': 2},
  12. {'Id': 3763077, 'object_name': 'MathsBook', 'object_count': 1},
  13. {'Id': 3763058, 'object_name': 'MusicSystem', 'object_count': 3}]
  14. object_df2 = pd.DataFrame(object_data)
  15. display(object_df2)
  16. |Id | object_name |object_count|
  17. |:--------|:-----------:|-----------:|
  18. |3763058 | PlayStation |2 |
  19. |3763077 | MathsBook |1 |
  20. |3763058 | MusicSystem |3 |
  21. 结果数据框 -
  22. |Id | Name |description| PlayStation | MathsBook | MusicSystem|
  23. |:--------|:-------:|:---------:|:-----------:|:---------:|-----------:|
  24. |3763058 | Andi |abc |2 |0 |3 |
  25. |3763077 | Mark |xyz |0 |1 |0 |
  26. 我尝试分解问题 -
  27. > 部分-1获取object_name的唯一值
  28. # uniqe_object_name = object_df2['object_name'].unique().tolist()
  29. new_cols= ['PlayStation', 'MathsBook', 'MusicSystem'] # 目前我们有固定的值
  30. new_vals = [0,0,0]
  31. > 部分-2创建唯一object_name的列并初始化为零
  32. person_df1 = person_df1.reindex(columns=person_df1.columns.tolist() + new_cols)
  33. person_df1[new_cols] = new_vals
  34. print(person_df1)
  35. > 部分-3id分组并将object_count的值存储到object_name
  36. > 在这里卡住了不知道要使用哪个函数来从其他数据框创建一个列并从其他数据框列中分配值
  37. person_df1['id'][object_name] = object_df2.groupby('id')['object_name'].apply(', '.join).reset_index()

Please note that the last part of your code seems to have some issues, and it's not clear what you intend to achieve with person_df1['id'][object_name]. You may need to clarify your objective for that part of the code.

英文:

I want to combine two dataframe like below but not getting what func to use for this. i tried to breakdown problem but not getting correct solun -

  1. import pandas as pd
  2. person_data = [{'Id': 3763058, 'Name': 'Andi', 'description': 'abc'},
  3. {'Id': 3763077, 'Name': 'Mark', 'description': 'xyz'}]
  4. person_df1 = pd.DataFrame(person_data)
  5. display(person_df1)
Id Name description
3763058 Andi abc
3763077 Mark xyz
  1. object_data = [{'Id': 3763058, 'object_name': 'PlayStation', 'object_count': 2},
  2. {'Id': 3763077, 'object_name': 'MathsBook', 'object_count': 1},
  3. {'Id': 3763058, 'object_name': 'MusicSystem', 'object_count': 3},
  4. ]
  5. object_df2 = pd.DataFrame(object_data)
  6. display(object_df2)
Id object_name object_count
3763058 PlayStation 2
3763077 MathsBook 1
3763058 MusicSystem 3

Result DF -

Id Name description PlayStation MathsBook MusicSystem
3763058 Andi abc 2 0 3
3763077 Mark xyz 0 1 0

I tried to breakdown prob -

> part-1 : get unique values of object_name

  1. # uniqe_object_name = object_df2['object_name'].unique().tolist()
  2. new_cols= ['PlayStation', 'MathsBook', 'MusicSystem'] # As of now fix value we have
  3. new_vals = [0,0,0]

> part-2 : Create column of unique object_name and initiate with zero

  1. person_df1 = person_df1.reindex(columns=person_df1.columns.tolist() + new_cols)
  2. person_df1[new_cols] = new_vals
  3. print(person_df1)

> part3 : Group by id and store values of object_count to object_name column
> Stuck here , not getting what func to use to create a column from other df and assign value from other df columns.

  1. person_df1['id'][object_name] = object_df2.groupby('id')['object_name'].apply(', '.join).reset_index()

答案1

得分: 0

First apply pivot_table, then merge:

  1. out = person_df1.merge(object_df2.pivot_table(index='Id', columns='object_name',
  2. values='object_count', fill_value=0
  3. ).reset_index())

Or, to be explicit on the key and type of merge:

  1. out = person_df1.merge(object_df2.pivot_table(index='Id', columns='object_name',
  2. values='object_count', fill_value=0
  3. ),
  4. left_on='Id', right_index=True, how='left'
  5. )

Output:

  1. Id Name description MathsBook MusicSystem PlayStation
  2. 0 3763058 Andi abc 0 3 2
  3. 1 3763077 Mark xyz 1 0 0

If the order of objects is important:

  1. out = person_df1.merge(object_df2.pivot_table(index='Id', columns='object_name',
  2. values='object_count', fill_value=0
  3. )[object_df2['object_name'].unique()],
  4. left_on='Id', right_index=True, how='left'
  5. )

Output:

  1. Id Name description PlayStation MathsBook MusicSystem
  2. 0 3763058 Andi abc 2 0 3
  3. 1 3763077 Mark xyz 0 1 0
英文:

First apply pivot_table, then merge:

  1. out = person_df1.merge(object_df2.pivot_table(index='Id', columns='object_name',
  2. values='object_count', fill_value=0
  3. ).reset_index())

Or, to be explicit on the key and type of merge:

  1. out = person_df1.merge(object_df2.pivot_table(index='Id', columns='object_name',
  2. values='object_count', fill_value=0
  3. ),
  4. left_on='Id', right_index=True, how='left'
  5. )

Output:

  1. Id Name description MathsBook MusicSystem PlayStation
  2. 0 3763058 Andi abc 0 3 2
  3. 1 3763077 Mark xyz 1 0 0

If the order of objects is important:

  1. out = person_df1.merge(object_df2.pivot_table(index='Id', columns='object_name',
  2. values='object_count', fill_value=0
  3. )[object_df2['object_name'].unique()],
  4. left_on='Id', right_index=True, how='left'
  5. )

Output:

  1. Id Name description PlayStation MathsBook MusicSystem
  2. 0 3763058 Andi abc 2 0 3
  3. 1 3763077 Mark xyz 0 1 0

huangapple
  • 本文由 发表于 2023年4月20日 05:15:06
  • 转载请务必保留本文链接:https://go.coder-hub.com/76058860.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定