Vertically merge dataframe on a specific column in Pandas

Question

I have two DataFrames:

import pandas as pd

df1 = pd.DataFrame([[1, 'CAN_US', 'MCS'], [1, 'ITL_US', 'MCS'], [1, 'MEX_US', 'MCS'], [1, 'KER_US', 'MCS']], columns=['ID', 'Group_N', 'Domain'])

df2 = pd.DataFrame([['BCS', 'JPN_US'], ['MCS', 'MKL_US'], ['MCS', 'GAA_US']], columns=['Domain', 'User_Group'])

df1

ID   Group_N   Domain
1    CAN_US    MCS
1    ITL_US    MCS
1    MEX_US    MCS
1    KER_US    MCS

df2

Domain   User_Group
BCS      JPN_US
MCS      MKL_US
MCS      GAA_US

I want to look up and merge these two DataFrames vertically wherever there is a match on Domain, so that the User_Group values from df2 are appended as new Group_N rows. The output should be:

ID   Group_N   Domain
1    CAN_US    MCS
1    ITL_US    MCS
1    MEX_US    MCS
1    KER_US    MCS
1    MKL_US    MCS
1    GAA_US    MCS

I have tried res_df = pd.concat([df1, df2], join='outer', axis=0)
and res_df = pd.merge(df1, df2, on="Domain", how="inner"), but neither gave the expected output.
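To see why neither attempt produces the target shape, here is a minimal sketch that rebuilds df1 and df2 from the question and inspects both results. The names concat_res and merge_res are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, 'CAN_US', 'MCS'], [1, 'ITL_US', 'MCS'], [1, 'MEX_US', 'MCS'], [1, 'KER_US', 'MCS']],
    columns=['ID', 'Group_N', 'Domain'],
)
df2 = pd.DataFrame(
    [['BCS', 'JPN_US'], ['MCS', 'MKL_US'], ['MCS', 'GAA_US']],
    columns=['Domain', 'User_Group'],
)

# Attempt 1: concat stacks the rows, but User_Group stays a separate column,
# the BCS row is kept, and df2's rows have no ID (NaN).
concat_res = pd.concat([df1, df2], join='outer', axis=0)

# Attempt 2: an inner merge pairs every MCS row of df1 with every MCS row
# of df2 side by side, instead of stacking them underneath.
merge_res = pd.merge(df1, df2, on="Domain", how="inner")

print(concat_res.shape, list(concat_res.columns))
print(merge_res.shape, list(merge_res.columns))
```

concat_res ends up with 7 rows, a leftover User_Group column, the unwanted BCS row, and NaN in ID for df2's rows. merge_res pairs each of df1's four MCS rows with both MCS rows of df2, giving 4 x 2 = 8 rows side by side rather than 2 new rows underneath.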

Answer 1

Score: 1

Remove domains which do not appear in df1:

domains = df1['Domain'].unique()
df2 = df2[df2['Domain'].isin(domains)]

Rename the column:

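df2 = df2.rename(columns={'User_Group': 'Group_N'})

df2 has no ID column, so copy it from df1 by Domain. Otherwise the appended rows would get NaN in ID:

df2['ID'] = df2['Domain'].map(df1.drop_duplicates('Domain').set_index('Domain')['ID'])

Then stack the frames vertically:

res_df = pd.concat([df1, df2], axis=0)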

No join or merge is needed.

This is the output:

   ID Group_N Domain
0   1  CAN_US    MCS
1   1  ITL_US    MCS
2   1  MEX_US    MCS
3   1  KER_US    MCS
1   1  MKL_US    MCS
2   1  GAA_US    MCS
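As an alternative to the separate isin filter, here is a minimal sketch that uses an inner merge against df1's distinct (ID, Domain) pairs. This variant is not part of the original answer. The merge drops domains that are missing from df1 and, because df2 has no ID column of its own, attaches the matching ID in the same step. The names lookup and new_rows are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, 'CAN_US', 'MCS'], [1, 'ITL_US', 'MCS'], [1, 'MEX_US', 'MCS'], [1, 'KER_US', 'MCS']],
    columns=['ID', 'Group_N', 'Domain'],
)
df2 = pd.DataFrame(
    [['BCS', 'JPN_US'], ['MCS', 'MKL_US'], ['MCS', 'GAA_US']],
    columns=['Domain', 'User_Group'],
)

# Distinct (ID, Domain) pairs from df1. The inner merge keeps only df2 rows
# whose Domain exists in df1 (drops BCS) and attaches the matching ID.
lookup = df1[['ID', 'Domain']].drop_duplicates()
new_rows = (
    df2.merge(lookup, on='Domain', how='inner')
       .rename(columns={'User_Group': 'Group_N'})
)

# Stack vertically with a fresh 0..n-1 index, in df1's column order.
res_df = pd.concat([df1, new_rows], axis=0, ignore_index=True)[df1.columns]
print(res_df)
```

If a Domain maps to several different IDs in df1, the merge emits one new row per (ID, User_Group) pair. That may or may not be the behaviour you want.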

Answer 2

Score: 1

A possible solution with concat:

d = df1.set_index("Domain")["ID"].to_dict()

out = (
    pd.concat([df1, df2.rename(columns={"User_Group": "Group_N"})])
        .loc[lambda x: x["Domain"].isin(df1["Domain"])]
        .assign(ID=lambda x: x["Domain"].map(d))
)
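To see what each stage of that chain does, here is a minimal sketch that splits it into named intermediate steps. The names stacked and kept are illustrative only, and df1 and df2 are rebuilt from the question.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, 'CAN_US', 'MCS'], [1, 'ITL_US', 'MCS'], [1, 'MEX_US', 'MCS'], [1, 'KER_US', 'MCS']],
    columns=['ID', 'Group_N', 'Domain'],
)
df2 = pd.DataFrame(
    [['BCS', 'JPN_US'], ['MCS', 'MKL_US'], ['MCS', 'GAA_US']],
    columns=['Domain', 'User_Group'],
)

# Domain -> ID lookup. set_index keeps duplicate Domain keys, and to_dict()
# keeps the last value seen for each key, so here d == {'MCS': 1}.
d = df1.set_index("Domain")["ID"].to_dict()

# Step 1: stack both frames. df2's rows get NaN in ID and BCS is still present.
stacked = pd.concat([df1, df2.rename(columns={"User_Group": "Group_N"})])

# Step 2: keep only rows whose Domain occurs in df1 (drops the BCS row).
kept = stacked.loc[stacked["Domain"].isin(df1["Domain"])]

# Step 3: overwrite ID from the lookup, which fills the NaNs.
out = kept.assign(ID=kept["Domain"].map(d))
print(out)
```

Because concat keeps each frame's original index labels, the result has the labels 0, 1, 2, 3, 1, 2. Pass ignore_index=True to pd.concat if you want a clean 0..n-1 index.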

Or this one with merge/lreshape:

out = (
    pd.lreshape(df1.merge(df2), {"Group_N": ["Group_N", "User_Group"]})
        .drop_duplicates()[df1.columns]
)
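The merge/lreshape version is less obvious, so here is a minimal sketch that names its intermediate frames. The names merged and long_df are illustrative. The trailing reset_index(drop=True) is an addition not in the original answer.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, 'CAN_US', 'MCS'], [1, 'ITL_US', 'MCS'], [1, 'MEX_US', 'MCS'], [1, 'KER_US', 'MCS']],
    columns=['ID', 'Group_N', 'Domain'],
)
df2 = pd.DataFrame(
    [['BCS', 'JPN_US'], ['MCS', 'MKL_US'], ['MCS', 'GAA_US']],
    columns=['Domain', 'User_Group'],
)

# Inner merge on the shared Domain column: each df1 row is paired with each
# matching df2 row, giving 4 x 2 = 8 rows with Group_N and User_Group side by side.
merged = df1.merge(df2)

# lreshape stacks the listed columns into a single Group_N column (16 rows);
# the remaining columns (ID, Domain) are repeated for each stacked block.
long_df = pd.lreshape(merged, {"Group_N": ["Group_N", "User_Group"]})

# Each df1 value now appears twice and each df2 value four times, so dedupe,
# restore df1's column order, and renumber the index.
out = long_df.drop_duplicates()[df1.columns].reset_index(drop=True)
print(out)
```

Without reset_index, drop_duplicates keeps the row labels of the surviving rows from long_df. Those labels are not the 0, 1, 2, 3, 1, 2 that the concat version produces, even though the rows themselves match.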

Output:

print(out)

   ID Group_N Domain
0   1  CAN_US    MCS
1   1  ITL_US    MCS
2   1  MEX_US    MCS
3   1  KER_US    MCS
1   1  MKL_US    MCS
2   1  GAA_US    MCS

huangapple
  • This article was published on June 13, 2023 at 00:30:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76458606.html