Vertically merge dataframe on a specific column in Pandas
Question
I have two datasets:
import pandas as pd
df1 = pd.DataFrame([[1, 'CAN_US', 'MCS'], [1, 'ITL_US', 'MCS'], [1, 'MEX_US', 'MCS'], [1, 'KER_US', 'MCS']], columns=['ID', 'Group_N', 'Domain'])
df2 = pd.DataFrame([['BCS', 'JPN_US'], ['MCS', 'MKL_US'], ['MCS', 'GAA_US']], columns=['Domain', 'User_Group'])
df1
ID Group_N Domain
1 CAN_US MCS
1 ITL_US MCS
1 MEX_US MCS
1 KER_US MCS
df2
Domain User_Group
BCS JPN_US
MCS MKL_US
MCS GAA_US
I want to look up and merge these two DataFrames vertically wherever Domain matches, so that the output is:
ID Group_N Domain
1 CAN_US MCS
1 ITL_US MCS
1 MEX_US MCS
1 KER_US MCS
1 MKL_US MCS
1 GAA_US MCS
I have tried `res_df = pd.concat([df1, df2], join='outer', axis=0)` and `res_df = pd.merge(df1, df2, on="Domain", how="inner")`, but neither gave the expected output.
Answer 1
Score: 1
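Remove domains which do not appear in df1:
domains = df1['Domain'].unique()
df2 = df2[df2['Domain'].isin(domains)]
Rename the column so it lines up with df1 (rename returns a new DataFrame, which avoids modifying the filtered slice in place):
df2 = df2.rename(columns={'User_Group': 'Group_N'})
df2 has no ID column, so look the ID up from df1 by Domain; otherwise the appended rows get NaN for ID:
id_by_domain = df1.drop_duplicates('Domain').set_index('Domain')['ID']
df2 = df2.assign(ID=df2['Domain'].map(id_by_domain))
Then concatenate vertically:
res_df = pd.concat([df1, df2], axis=0)
No need for a join.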
This is the output:
ID Group_N Domain
0 1 CAN_US MCS
1 1 ITL_US MCS
2 1 MEX_US MCS
3 1 KER_US MCS
1 1 MKL_US MCS
2 1 GAA_US MCS
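For reference, the filter, rename, and concat steps can be put together as one self-contained sketch. It also fills ID for the appended rows from df1, since df2 has no ID column. This assumes each Domain in df1 corresponds to a single ID, which holds for the sample data. The names `id_by_domain` and `extra`, and the use of `ignore_index=True`, are illustrative choices rather than part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, 'CAN_US', 'MCS'], [1, 'ITL_US', 'MCS'],
     [1, 'MEX_US', 'MCS'], [1, 'KER_US', 'MCS']],
    columns=['ID', 'Group_N', 'Domain'],
)
df2 = pd.DataFrame(
    [['BCS', 'JPN_US'], ['MCS', 'MKL_US'], ['MCS', 'GAA_US']],
    columns=['Domain', 'User_Group'],
)

# Assumption: one ID per Domain in df1, so the first occurrence is representative.
id_by_domain = df1.drop_duplicates('Domain').set_index('Domain')['ID']

extra = (
    df2[df2['Domain'].isin(df1['Domain'])]         # keep only domains present in df1
    .rename(columns={'User_Group': 'Group_N'})     # align the column name with df1
    .assign(ID=lambda x: x['Domain'].map(id_by_domain))  # fill ID from df1
)

# ignore_index=True renumbers the rows 0..n-1 instead of keeping df2's labels.
res_df = pd.concat([df1, extra], ignore_index=True)
print(res_df)
```

If the original row labels matter, drop `ignore_index=True`; the appended rows then keep their labels from df2.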
Answer 2
Score: 1
A possible solution with `concat`:
d = df1.set_index("Domain")["ID"].to_dict()
out = (
    pd.concat([df1, df2.rename(columns={"User_Group": "Group_N"})])
    .loc[lambda x: x["Domain"].isin(df1["Domain"])]
    .assign(ID=lambda x: x["Domain"].map(d))
)
Or this one with `merge`/`lreshape`:
out = (
    pd.lreshape(df1.merge(df2), {"Group_N": ["Group_N", "User_Group"]})
    .drop_duplicates()[df1.columns]
)
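Output (as printed for the `concat` version; the `lreshape` version returns the same rows with different index labels):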
print(out)
ID Group_N Domain
0 1 CAN_US MCS
1 1 ITL_US MCS
2 1 MEX_US MCS
3 1 KER_US MCS
1 1 MKL_US MCS
2 1 GAA_US MCS
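As a quick sanity check, the two approaches can be compared on the sample data. This is only a sketch: the helper `as_sorted_rows` and the names `out_concat` and `out_lreshape` are hypothetical, and the comparison ignores the index because the two results are labelled differently.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, 'CAN_US', 'MCS'], [1, 'ITL_US', 'MCS'],
     [1, 'MEX_US', 'MCS'], [1, 'KER_US', 'MCS']],
    columns=['ID', 'Group_N', 'Domain'],
)
df2 = pd.DataFrame(
    [['BCS', 'JPN_US'], ['MCS', 'MKL_US'], ['MCS', 'GAA_US']],
    columns=['Domain', 'User_Group'],
)

# concat-based version: stack, keep matching domains, fill ID from df1.
d = df1.set_index("Domain")["ID"].to_dict()
out_concat = (
    pd.concat([df1, df2.rename(columns={"User_Group": "Group_N"})])
    .loc[lambda x: x["Domain"].isin(df1["Domain"])]
    .assign(ID=lambda x: x["Domain"].map(d))
)

# merge/lreshape version: join on Domain, then fold User_Group into Group_N.
out_lreshape = (
    pd.lreshape(df1.merge(df2), {"Group_N": ["Group_N", "User_Group"]})
    .drop_duplicates()[df1.columns]
)

def as_sorted_rows(df):
    """Return the rows as a sorted list of plain tuples, ignoring the index."""
    return sorted(df.itertuples(index=False, name=None))

same_rows = as_sorted_rows(out_concat) == as_sorted_rows(out_lreshape)
print(same_rows)
```

Note that the `lreshape` version relies on `drop_duplicates`, so it would also collapse any genuinely repeated rows in df1, whereas the `concat` version keeps them.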