连接两个表格并创建相应的列

huangapple go评论114阅读模式
英文:

Joining 2 tables and creating corresponding columns

问题

我有两个如下定义的数据框:

  1. from numpy import nan
  2. import pandas as pd
  3. df1 = pd.DataFrame({'ID': {0: 'A', 1: 'B', 2: 'C'}, 'Description': {0: 'Apple', 1: 'Book', 2: 'Cat'}})
  4. df2 = pd.DataFrame({'Name': {0: 'David', 1: 'Ken'},'ID1': {0: 'A', 1: 'B'}, 'ID2': {0: 'C', 1: nan}, 'ID3': {0: 'B', 1: 'C'}})

我想要将表1的ID与表2的ID1、ID2和ID3连接,并添加相应的列DESC1、DESC2和DESC3:

  1. Name ID1 DESC1 ID2 DESC2 ID3 DESC3
  2. 0 David A Apple C Cat B Book
  3. 1 Ken B Book NaN Null C Cat

考虑要高效地执行此操作,但不知道如何操作。有什么建议吗?

英文:

I have 2 DataFrames as defined below:

  1. from numpy import nan
  2. import pandas as pd
  3. df1 = pd.DataFrame({'ID': {0: 'A', 1: 'B', 2: 'C'}, 'Description': {0: 'Apple', 1: 'Book', 2: 'Cat'}})
  4. df2 = pd.DataFrame({'Name': {0: 'David', 1: 'Ken'},'ID1': {0: 'A', 1: 'B'}, 'ID2': {0: 'C', 1: nan}, 'ID3': {0: 'B', 1: 'C'}})
  5. # df1
  6. ID Description
  7. 0 A Apple
  8. 1 B Book
  9. 2 C Cat
  10. # df2
  11. Name ID1 ID2 ID3
  12. 0 David A C B
  13. 1 Ken B NaN C

I want to join ID of table1 to ID1, ID2, ID3 of table 2 and add corresponding columns DESC1, DESC2, DESC3: <br>

  1. Name ID1 DESC1 ID2 DESC2 ID3 DESC3
  2. 0 David A Apple C Cat B Book
  3. 1 Ken B Book NaN Null C Cat

Thinking about a for-loop to do that efficiently but do not know how to do about. Any suggestions would be appreciated!

答案1

得分: 0

以下是代码的翻译部分:

  1. 假设使用 [tag:pandas],首先 [`melt`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.melt.html),然后 [`merge`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html) 和 [`pivot`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html):

out = (
df2.melt(ignore_index=False, var_name='col', value_name='ID')
.assign(col=lambda d: d['col'].str.extract(r'(\d+)'))
.reset_index()
.merge(df1.rename(columns={'Description': 'DESC'}), how='left')
.pivot(index='index', columns='col')
.sort_index(axis=1, level=1, sort_remaining=False)
.pipe(lambda d: d.set_axis(d.columns.map(''.join), axis=1))
)

  1. 或者使用简单的循环和 [`concat`](https://pandas.pydata.org/docs/reference/api/pandas.concat.html):

out = pd.concat([df2[[col]].rename(columns={col: 'ID'}).merge(df1)
.add_suffix(col.removeprefix('ID'))
for col in df2], axis=1)

  1. 输出:
  1. ID1 DESC1 ID2 DESC2 ID3 DESC3

index
0 A Apple C Cat B Book
1 B Book NaN NaN C Cat

  1. 处理额外的列:

cols = ['Name']

out = (
df2.rename_axis('index').set_index(cols, append=True)
.melt(ignore_index=False, var_name='col', value_name='ID')
.assign(col=lambda d: d['col'].str.extract(r'(\d+)'))
.reset_index()
.merge(df1.rename(columns={'Description': 'DESC'}), how='left')
.pivot(index=['index']+cols, columns='col')
.sort_index(axis=1, level=1, sort_remaining=False)
.pipe(lambda d: d.set_axis(d.columns.map(''.join), axis=1))
.reset_index(cols)
)

  1. 使用 `concat`

out = df2[cols].join(
pd.concat([df2[[col]].rename(columns={col: 'ID'}).merge(df1)
.add_suffix(col.removeprefix('ID'))
for col in df2.drop(columns=cols)], axis=1)
)

  1. 输出:
  1. Name ID1 DESC1 ID2 DESC2 ID3 DESC3

index
0 David A Apple C Cat B Book
1 Ken B Book NaN NaN C Cat

英文:

Assuming [tag:pandas], first melt, then merge and pivot:

  1. out = (
  2. df2.melt(ignore_index=False, var_name=&#39;col&#39;, value_name=&#39;ID&#39;)
  3. .assign(col=lambda d: d[&#39;col&#39;].str.extract(r&#39;(\d+)&#39;))
  4. .reset_index()
  5. .merge(df1.rename(columns={&#39;Description&#39;: &#39;DESC&#39;}), how=&#39;left&#39;)
  6. .pivot(index=&#39;index&#39;, columns=&#39;col&#39;)
  7. .sort_index(axis=1, level=1, sort_remaining=False)
  8. .pipe(lambda d: d.set_axis(d.columns.map(&#39;&#39;.join), axis=1))
  9. )

Or with a simple loop and concat:

  1. out = pd.concat([df2[[col]].rename(columns={col: &#39;ID&#39;}).merge(df1)
  2. .add_suffix(col.removeprefix(&#39;ID&#39;))
  3. for col in df2], axis=1)

Output:

  1. ID1 DESC1 ID2 DESC2 ID3 DESC3
  2. index
  3. 0 A Apple C Cat B Book
  4. 1 B Book NaN NaN C Cat

handling extra columns

Add the other columns to the index temporarily:

  1. cols = [&#39;Name&#39;]
  2. out = (
  3. df2.rename_axis(&#39;index&#39;).set_index(cols, append=True)
  4. .melt(ignore_index=False, var_name=&#39;col&#39;, value_name=&#39;ID&#39;)
  5. .assign(col=lambda d: d[&#39;col&#39;].str.extract(r&#39;(\d+)&#39;))
  6. .reset_index()
  7. .merge(df1.rename(columns={&#39;Description&#39;: &#39;DESC&#39;}), how=&#39;left&#39;)
  8. .pivot(index=[&#39;index&#39;]+cols, columns=&#39;col&#39;)
  9. .sort_index(axis=1, level=1, sort_remaining=False)
  10. .pipe(lambda d: d.set_axis(d.columns.map(&#39;&#39;.join), axis=1))
  11. .reset_index(cols)
  12. )

With concat:

  1. out = df2[cols].join(
  2. pd.concat([df2[[col]].rename(columns={col: &#39;ID&#39;}).merge(df1)
  3. .add_suffix(col.removeprefix(&#39;ID&#39;))
  4. for col in df2.drop(columns=cols)], axis=1)
  5. )

Output:

  1. Name ID1 DESC1 ID2 DESC2 ID3 DESC3
  2. index
  3. 0 David A Apple C Cat B Book
  4. 1 Ken B Book NaN NaN C Cat

答案2

得分: 0

要获得所需的输出,您可以直接创建一个字典(我们称之为 dic),其中包含键和值,如 &#39;A&#39;: &#39;苹果&#39;。然后,遍历 ID 字典,并将键替换为它们对应的值以创建 desc 字典。最后,将ID和描述字典合并为一个。在这个解释中,Table1 和 Table2 分别指的是 ab

代码:

  1. dic = {value: a[&#39;Description&#39;][key] for key, value in a[&#39;ID&#39;].items()} #output : {&#39;A&#39;: &#39;Apple&#39;, &#39;B&#39;: &#39;Book&#39;, &#39;C&#39;: &#39;Cat&#39;}
  2. {**b, **{key.replace(&#39;ID&#39;, &#39;DESC&#39;) : {k: dic.get(v, &#39;null&#39;) for k, v in val.items()} for key, val in b.items()}}

输出:

  1. {&#39;ID1&#39;: {0: &#39;A&#39;, 1: &#39;B&#39;},
  2. &#39;ID2&#39;: {0: &#39;C&#39;, 1: nan},
  3. &#39;ID3&#39;: {0: &#39;B&#39;, 1: &#39;C&#39;},
  4. &#39;DESC1&#39;: {0: &#39;苹果&#39;, 1: &#39;书&#39;},
  5. &#39;DESC2&#39;: {0: &#39;猫&#39;, 1: &#39;null&#39;},
  6. &#39;DESC3&#39;: {0: &#39;书&#39;, 1: &#39;猫&#39;}}
英文:

To obtain the desired output, you can directly create a dictionary (let's call it dic) with keys and values like &#39;A&#39;: &#39;Apple&#39;. Then, iterate through the ID dictionary and replace the keys with their corresponding values to create the desc dictionary. Finally, merge the ID and description dictionaries into one. In this explanation, Table1 and Table2 refer to a and b respectively.

Code:

  1. dic = {value: a[&#39;Description&#39;][key] for key, value in a[&#39;ID&#39;].items()} #output : {&#39;A&#39;: &#39;Apple&#39;, &#39;B&#39;: &#39;Book&#39;, &#39;C&#39;: &#39;Cat&#39;}
  2. {**b, **{key.replace(&#39;ID&#39;, &#39;DESC&#39;) : {k: dic.get(v, &#39;null&#39;) for k, v in val.items()} for key, val in b.items()}}

Output:

  1. {&#39;ID1&#39;: {0: &#39;A&#39;, 1: &#39;B&#39;},
  2. &#39;ID2&#39;: {0: &#39;C&#39;, 1: nan},
  3. &#39;ID3&#39;: {0: &#39;B&#39;, 1: &#39;C&#39;},
  4. &#39;DESC1&#39;: {0: &#39;Apple&#39;, 1: &#39;Book&#39;},
  5. &#39;DESC2&#39;: {0: &#39;Cat&#39;, 1: &#39;null&#39;},
  6. &#39;DESC3&#39;: {0: &#39;Book&#39;, 1: &#39;Cat&#39;}}

huangapple
  • 本文由 发表于 2023年6月26日 16:21:03
  • 转载请务必保留本文链接:https://go.coder-hub.com/76554830.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定