Joining 2 tables and creating corresponding columns
Question
I have 2 DataFrames as defined below:
from numpy import nan
import pandas as pd
df1 = pd.DataFrame({'ID': {0: 'A', 1: 'B', 2: 'C'}, 'Description': {0: 'Apple', 1: 'Book', 2: 'Cat'}})
df2 = pd.DataFrame({'Name': {0: 'David', 1: 'Ken'},'ID1': {0: 'A', 1: 'B'}, 'ID2': {0: 'C', 1: nan}, 'ID3': {0: 'B', 1: 'C'}})
# df1
ID Description
0 A Apple
1 B Book
2 C Cat
# df2
Name ID1 ID2 ID3
0 David A C B
1 Ken B NaN C
I want to join the ID column of table 1 to ID1, ID2 and ID3 of table 2, and add the corresponding columns DESC1, DESC2 and DESC3:
Name ID1 DESC1 ID2 DESC2 ID3 DESC3
0 David A Apple C Cat B Book
1 Ken B Book NaN Null C Cat
I was thinking about using a for-loop to do this efficiently, but I don't know how to go about it. Any suggestions would be appreciated!
Answer 1
Score: 0
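Assuming pandas, and that df2 contains only the ID columns (for extra columns such as Name, see "Handling extra columns" below), first melt, then merge and pivot: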
out = (
df2.melt(ignore_index=False, var_name='col', value_name='ID')
.assign(col=lambda d: d['col'].str.extract(r'(\d+)'))
.reset_index()
.merge(df1.rename(columns={'Description': 'DESC'}), how='left')
.pivot(index='index', columns='col')
.sort_index(axis=1, level=1, sort_remaining=False)
.pipe(lambda d: d.set_axis(d.columns.map(''.join), axis=1))
)
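To make the melt/merge steps above easier to follow, here is a minimal sketch that stops before the pivot and keeps the intermediate long-format table. Like the snippet above, it assumes df2 holds only the ID columns; the variable name long is just for illustration.

```python
from numpy import nan
import pandas as pd

# Sample data from the question, without the extra Name column
df1 = pd.DataFrame({'ID': ['A', 'B', 'C'], 'Description': ['Apple', 'Book', 'Cat']})
df2 = pd.DataFrame({'ID1': ['A', 'B'], 'ID2': ['C', nan], 'ID3': ['B', 'C']})

# Step 1: melt to long format (one row per original row and ID column),
# then move the original row number into a regular 'index' column
long = df2.melt(ignore_index=False, var_name='col', value_name='ID').reset_index()

# Step 2: keep only the digit suffix of the column name ('ID1' -> '1')
long['col'] = long['col'].str.extract(r'(\d+)', expand=False)

# Step 3: look up descriptions; a left merge keeps the row whose ID is NaN
long = long.merge(df1.rename(columns={'Description': 'DESC'}), on='ID', how='left')
print(long)
```

Pivoting this table on index (rows) and col (columns) then yields the wide IDn/DESCn layout.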
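Or with a simple loop and concat (use how='left' so rows with a missing ID are kept and stay aligned with df2's default index; str.removeprefix needs Python 3.9+):
out = pd.concat([df2[[col]].rename(columns={col: 'ID'})
                 .merge(df1.rename(columns={'Description': 'DESC'}), how='left')
                 .add_suffix(col.removeprefix('ID'))
                 for col in df2], axis=1)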
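Why how='left' matters in the concat loop: the default merge is an inner join, which drops rows whose ID has no match (such as NaN) and renumbers the remaining rows, so pd.concat can silently attach descriptions to the wrong rows. The sketch below compares both joins on a hypothetical df2_alt where the missing ID sits in the first row; the helper lookup is also hypothetical (str.removeprefix needs Python 3.9+).

```python
from numpy import nan
import pandas as pd

df1 = pd.DataFrame({'ID': ['A', 'B', 'C'], 'Description': ['Apple', 'Book', 'Cat']})
# Hypothetical variant of df2 with the missing ID in the FIRST row
df2_alt = pd.DataFrame({'ID1': ['A', 'B'], 'ID2': [nan, 'C']})

def lookup(col, how):
    # Rename the ID column to 'ID' so it joins against df1, then suffix the result
    return (df2_alt[[col]].rename(columns={col: 'ID'})
            .merge(df1, how=how)
            .add_suffix(col.removeprefix('ID')))

inner = pd.concat([lookup(c, 'inner') for c in df2_alt], axis=1)
left = pd.concat([lookup(c, 'left') for c in df2_alt], axis=1)
print(inner)  # 'C'/'Cat' wrongly lands on row 0
print(left)   # 'C'/'Cat' stays on row 1
```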
Output:
ID1 DESC1 ID2 DESC2 ID3 DESC3
index
0 A Apple C Cat B Book
1 B Book NaN NaN C Cat
Handling extra columns
Add the other columns (here Name) to the index temporarily:
cols = ['Name']
out = (
df2.rename_axis('index').set_index(cols, append=True)
.melt(ignore_index=False, var_name='col', value_name='ID')
.assign(col=lambda d: d['col'].str.extract(r'(\d+)'))
.reset_index()
.merge(df1.rename(columns={'Description': 'DESC'}), how='left')
.pivot(index=['index']+cols, columns='col')
.sort_index(axis=1, level=1, sort_remaining=False)
.pipe(lambda d: d.set_axis(d.columns.map(''.join), axis=1))
.reset_index(cols)
)
With concat (again with how='left' to keep rows aligned):
out = df2[cols].join(
    pd.concat([df2[[col]].rename(columns={col: 'ID'})
               .merge(df1.rename(columns={'Description': 'DESC'}), how='left')
               .add_suffix(col.removeprefix('ID'))
               for col in df2.drop(columns=cols)], axis=1)
)
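Since the question mentions a for-loop, here is a minimal sketch of that idea. It swaps in Series.map with a lookup Series in place of merge (this is not part of the answer above), assumes the IDs in df1 are unique, and inserts each DESCn column right after its IDn column, so the Name column needs no special handling.

```python
from numpy import nan
import pandas as pd

df1 = pd.DataFrame({'ID': ['A', 'B', 'C'], 'Description': ['Apple', 'Book', 'Cat']})
df2 = pd.DataFrame({'Name': ['David', 'Ken'], 'ID1': ['A', 'B'],
                    'ID2': ['C', nan], 'ID3': ['B', 'C']})

# Lookup Series: index = ID, values = Description (IDs assumed unique)
desc = df1.set_index('ID')['Description']

out = df2.copy()
# Loop over the ID columns only; each DESCn goes right after its IDn
for col in [c for c in df2.columns if c.startswith('ID')]:
    pos = out.columns.get_loc(col) + 1
    out.insert(pos, col.replace('ID', 'DESC'), df2[col].map(desc))
print(out)
```

map leaves NaN where an ID has no match, so Ken's missing ID2 gives NaN rather than the literal Null shown in the question; chaining .fillna('Null') onto the mapped Series would produce that string instead.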
Output:
Name ID1 DESC1 ID2 DESC2 ID3 DESC3
index
0 David A Apple C Cat B Book
1 Ken B Book NaN NaN C Cat
Answer 2
Score: 0
To obtain the desired output, you can first create a dictionary (let's call it dic) that maps each ID to its description, e.g. 'A': 'Apple'. Then iterate through each ID column dictionary and replace its values with the corresponding descriptions to build the matching DESC dictionary; IDs with no match (such as NaN) become 'null'. Finally, merge the ID and description dictionaries into one. In this explanation, Table1 and Table2 refer to a and b respectively.
Code:
a, b = df1.to_dict(), df2.to_dict()  # Table1 and Table2 as plain dicts
dic = {value: a['Description'][key] for key, value in a['ID'].items()}  # {'A': 'Apple', 'B': 'Book', 'C': 'Cat'}
out = {**b, **{key.replace('ID', 'DESC'): {k: dic.get(v, 'null') for k, v in val.items()}
               for key, val in b.items() if key.startswith('ID')}}  # skip non-ID columns such as Name
Output:
{'Name': {0: 'David', 1: 'Ken'},
'ID1': {0: 'A', 1: 'B'},
'ID2': {0: 'C', 1: nan},
'ID3': {0: 'B', 1: 'C'},
'DESC1': {0: 'Apple', 1: 'Book'},
'DESC2': {0: 'Cat', 1: 'null'},
'DESC3': {0: 'Book', 1: 'Cat'}}
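If a DataFrame is needed rather than a dictionary, the merged dictionary can be passed to pd.DataFrame and its columns reordered so each DESCn sits next to its IDn. A minimal sketch, assuming the dictionary also carries the Name column from the question's df2 (the literal below is written out by hand for illustration):

```python
from numpy import nan
import pandas as pd

# Dictionary in the shape this answer produces, including Name
out = {'Name': {0: 'David', 1: 'Ken'},
       'ID1': {0: 'A', 1: 'B'}, 'ID2': {0: 'C', 1: nan}, 'ID3': {0: 'B', 1: 'C'},
       'DESC1': {0: 'Apple', 1: 'Book'}, 'DESC2': {0: 'Cat', 1: 'null'},
       'DESC3': {0: 'Book', 1: 'Cat'}}

df = pd.DataFrame(out)
# Interleave IDn/DESCn columns to match the layout requested in the question
ids = [c for c in df.columns if c.startswith('ID')]
order = ['Name'] + [c for i in ids for c in (i, i.replace('ID', 'DESC'))]
df = df[order]
print(df)
```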