连接两个表格并创建相应的列

huangapple go评论93阅读模式
英文:

Joining 2 tables and creating corresponding columns

问题

我有两个如下定义的数据框:

from numpy import nan
import pandas as pd

df1 = pd.DataFrame({'ID': {0: 'A', 1: 'B', 2: 'C'}, 'Description': {0: 'Apple', 1: 'Book', 2: 'Cat'}})
df2 = pd.DataFrame({'Name': {0: 'David', 1: 'Ken'},'ID1': {0: 'A', 1: 'B'}, 'ID2': {0: 'C', 1: nan}, 'ID3': {0: 'B', 1: 'C'}})

我想要将表1的ID与表2的ID1、ID2和ID3连接,并添加相应的列DESC1、DESC2和DESC3:

  Name  ID1  DESC1  ID2  DESC2  ID3  DESC3
0 David   A   Apple    C    Cat    B   Book
1   Ken   B   Book   NaN   Null    C   Cat

考虑要高效地执行此操作,但不知道如何操作。有什么建议吗?

英文:

I have 2 DataFrames as defined below:

from numpy import nan
import pandas as pd

df1 = pd.DataFrame({'ID': {0: 'A', 1: 'B', 2: 'C'}, 'Description': {0: 'Apple', 1: 'Book', 2: 'Cat'}})
df2 = pd.DataFrame({'Name': {0: 'David', 1: 'Ken'},'ID1': {0: 'A', 1: 'B'}, 'ID2': {0: 'C', 1: nan}, 'ID3': {0: 'B', 1: 'C'}})

# df1
  ID Description
0  A       Apple
1  B        Book
2  C         Cat

# df2
    Name    ID1  ID2  ID3
0   David   A    C    B
1   Ken     B   NaN   C

I want to join ID of table1 to ID1, ID2, ID3 of table 2 and add corresponding columns DESC1, DESC2, DESC3: <br>

  Name    ID1  DESC1  ID2  DESC2 ID3  DESC3
0 David   A    Apple   C    Cat   B   Book
1 Ken     B    Book    NaN  Null  C   Cat

Thinking about a for-loop to do that efficiently but do not know how to do about. Any suggestions would be appreciated!

答案1

得分: 0

以下是代码的翻译部分:

假设使用 [tag:pandas],首先 [`melt`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.melt.html),然后 [`merge`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html) 和 [`pivot`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html):

out = (
df2.melt(ignore_index=False, var_name='col', value_name='ID')
.assign(col=lambda d: d['col'].str.extract(r'(\d+)'))
.reset_index()
.merge(df1.rename(columns={'Description': 'DESC'}), how='left')
.pivot(index='index', columns='col')
.sort_index(axis=1, level=1, sort_remaining=False)
.pipe(lambda d: d.set_axis(d.columns.map(''.join), axis=1))
)


或者使用简单的循环和 [`concat`](https://pandas.pydata.org/docs/reference/api/pandas.concat.html):

out = pd.concat([df2[[col]].rename(columns={col: 'ID'}).merge(df1)
.add_suffix(col.removeprefix('ID'))
for col in df2], axis=1)


输出:

  ID1  DESC1  ID2 DESC2 ID3 DESC3

index
0 A Apple C Cat B Book
1 B Book NaN NaN C Cat


处理额外的列:

cols = ['Name']

out = (
df2.rename_axis('index').set_index(cols, append=True)
.melt(ignore_index=False, var_name='col', value_name='ID')
.assign(col=lambda d: d['col'].str.extract(r'(\d+)'))
.reset_index()
.merge(df1.rename(columns={'Description': 'DESC'}), how='left')
.pivot(index=['index']+cols, columns='col')
.sort_index(axis=1, level=1, sort_remaining=False)
.pipe(lambda d: d.set_axis(d.columns.map(''.join), axis=1))
.reset_index(cols)
)


使用 `concat`:

out = df2[cols].join(
pd.concat([df2[[col]].rename(columns={col: 'ID'}).merge(df1)
.add_suffix(col.removeprefix('ID'))
for col in df2.drop(columns=cols)], axis=1)
)


输出:

    Name ID1  DESC1  ID2 DESC2 ID3 DESC3

index
0 David A Apple C Cat B Book
1 Ken B Book NaN NaN C Cat

英文:

Assuming [tag:pandas], first melt, then merge and pivot:

out = (
 df2.melt(ignore_index=False, var_name=&#39;col&#39;, value_name=&#39;ID&#39;)
    .assign(col=lambda d: d[&#39;col&#39;].str.extract(r&#39;(\d+)&#39;))
    .reset_index()
    .merge(df1.rename(columns={&#39;Description&#39;: &#39;DESC&#39;}), how=&#39;left&#39;)
    .pivot(index=&#39;index&#39;, columns=&#39;col&#39;)
    .sort_index(axis=1, level=1, sort_remaining=False)
    .pipe(lambda d: d.set_axis(d.columns.map(&#39;&#39;.join), axis=1))
)

Or with a simple loop and concat:

out = pd.concat([df2[[col]].rename(columns={col: &#39;ID&#39;}).merge(df1)
                           .add_suffix(col.removeprefix(&#39;ID&#39;))
                 for col in df2], axis=1)

Output:

      ID1  DESC1  ID2 DESC2 ID3 DESC3
index                                
0       A  Apple    C   Cat   B  Book
1       B   Book  NaN   NaN   C   Cat

handling extra columns

Add the other columns to the index temporarily:

cols = [&#39;Name&#39;]

out = (
 df2.rename_axis(&#39;index&#39;).set_index(cols, append=True)
    .melt(ignore_index=False, var_name=&#39;col&#39;, value_name=&#39;ID&#39;)
    .assign(col=lambda d: d[&#39;col&#39;].str.extract(r&#39;(\d+)&#39;))
    .reset_index()
    .merge(df1.rename(columns={&#39;Description&#39;: &#39;DESC&#39;}), how=&#39;left&#39;)
    .pivot(index=[&#39;index&#39;]+cols, columns=&#39;col&#39;)
    .sort_index(axis=1, level=1, sort_remaining=False)
    .pipe(lambda d: d.set_axis(d.columns.map(&#39;&#39;.join), axis=1))
    .reset_index(cols)
)

With concat:

out = df2[cols].join(
       pd.concat([df2[[col]].rename(columns={col: &#39;ID&#39;}).merge(df1)
                            .add_suffix(col.removeprefix(&#39;ID&#39;))
                  for col in df2.drop(columns=cols)], axis=1)
      )

Output:

        Name ID1  DESC1  ID2 DESC2 ID3 DESC3
index                                       
0      David   A  Apple    C   Cat   B  Book
1        Ken   B   Book  NaN   NaN   C   Cat

答案2

得分: 0

要获得所需的输出,您可以直接创建一个字典(我们称之为 dic),其中包含键和值,如 &#39;A&#39;: &#39;苹果&#39;。然后,遍历 ID 字典,并将键替换为它们对应的值以创建 desc 字典。最后,将ID和描述字典合并为一个。在这个解释中,Table1 和 Table2 分别指的是 ab

代码:

dic = {value: a[&#39;Description&#39;][key] for key, value in a[&#39;ID&#39;].items()}    #output : {&#39;A&#39;: &#39;Apple&#39;, &#39;B&#39;: &#39;Book&#39;, &#39;C&#39;: &#39;Cat&#39;}

{**b, **{key.replace(&#39;ID&#39;, &#39;DESC&#39;) : {k: dic.get(v, &#39;null&#39;) for k, v in val.items()} for key, val in b.items()}}

输出:

{&#39;ID1&#39;: {0: &#39;A&#39;, 1: &#39;B&#39;},
 &#39;ID2&#39;: {0: &#39;C&#39;, 1: nan},
 &#39;ID3&#39;: {0: &#39;B&#39;, 1: &#39;C&#39;},
 &#39;DESC1&#39;: {0: &#39;苹果&#39;, 1: &#39;书&#39;},
 &#39;DESC2&#39;: {0: &#39;猫&#39;, 1: &#39;null&#39;},
 &#39;DESC3&#39;: {0: &#39;书&#39;, 1: &#39;猫&#39;}}
英文:

To obtain the desired output, you can directly create a dictionary (let's call it dic) with keys and values like &#39;A&#39;: &#39;Apple&#39;. Then, iterate through the ID dictionary and replace the keys with their corresponding values to create the desc dictionary. Finally, merge the ID and description dictionaries into one. In this explanation, Table1 and Table2 refer to a and b respectively.

Code:

dic = {value: a[&#39;Description&#39;][key] for key, value in a[&#39;ID&#39;].items()}    #output : {&#39;A&#39;: &#39;Apple&#39;, &#39;B&#39;: &#39;Book&#39;, &#39;C&#39;: &#39;Cat&#39;}

{**b, **{key.replace(&#39;ID&#39;, &#39;DESC&#39;) : {k: dic.get(v, &#39;null&#39;) for k, v in val.items()} for key, val in b.items()}}

Output:

{&#39;ID1&#39;: {0: &#39;A&#39;, 1: &#39;B&#39;},
 &#39;ID2&#39;: {0: &#39;C&#39;, 1: nan},
 &#39;ID3&#39;: {0: &#39;B&#39;, 1: &#39;C&#39;},
 &#39;DESC1&#39;: {0: &#39;Apple&#39;, 1: &#39;Book&#39;},
 &#39;DESC2&#39;: {0: &#39;Cat&#39;, 1: &#39;null&#39;},
 &#39;DESC3&#39;: {0: &#39;Book&#39;, 1: &#39;Cat&#39;}}

huangapple
  • 本文由 发表于 2023年6月26日 16:21:03
  • 转载请务必保留本文链接:https://go.coder-hub.com/76554830.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定