June 13, 2023 00:30:10

Vertically merge dataframe on a specific column in Pandas

Question

I have two datasets

df1 = pd.DataFrame([[1, 'CAN_US', 'MCS'], [1, 'ITL_US', 'MCS'], [1, 'MEX_US', 'MCS'], [1, 'KER_US', 'MCS']], columns=['ID', 'Group_N', 'Domain'])

df2 = pd.DataFrame([['BCS', 'JPN_US'], ['MCS', 'MKL_US'], ['MCS', 'GAA_US']], columns=['Domain', 'User_Group'])

df1

ID   Group_N   Domain
1    CAN_US    MCS
1    ITL_US    MCS
1    MEX_US    MCS
1    KER_US    MCS

df2
Domain   User_Group
BCS      JPN_US
MCS      MKL_US
MCS      GAA_US

I want to look up the rows of df2 whose Domain matches a Domain in df1 and append them to df1 vertically, so that the output is:

ID   Group_N   Domain
1    CAN_US    MCS
1    ITL_US    MCS
1    MEX_US    MCS
1    KER_US    MCS
1    MKL_US    MCS
1    GAA_US    MCS

I have tried res_df = pd.concat([df1, df2], join='outer', axis=0) and res_df = pd.merge(df1, df2, on="Domain", how="inner"), but neither gives the expected output.
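To make the problem reproducible, here is a minimal sketch of what the two attempts return on the sample data. The variable names concat_attempt and merge_attempt are illustrative and not part of the original question.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, "CAN_US", "MCS"], [1, "ITL_US", "MCS"], [1, "MEX_US", "MCS"], [1, "KER_US", "MCS"]],
    columns=["ID", "Group_N", "Domain"],
)
df2 = pd.DataFrame(
    [["BCS", "JPN_US"], ["MCS", "MKL_US"], ["MCS", "GAA_US"]],
    columns=["Domain", "User_Group"],
)

# Attempt 1: concat stacks the rows, but Group_N and User_Group stay separate
# columns (padded with NaN) and the unmatched BCS row is not filtered out.
concat_attempt = pd.concat([df1, df2], join="outer", axis=0)
print(concat_attempt.shape)  # (7, 4)

# Attempt 2: an inner merge pairs every df1 row with every df2 row sharing its
# Domain (4 x 2 pairs), so the data grows sideways instead of downwards.
merge_attempt = pd.merge(df1, df2, on="Domain", how="inner")
print(merge_attempt.shape)  # (8, 4)
```

Neither result has the six rows and three columns of the desired output, which is why the column has to be renamed and the rows filtered before stacking.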

Answer 1

Score: 1

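Remove the domains that do not appear in df1:

domains = df1['Domain'].unique()
df2 = df2[df2['Domain'].isin(domains)]

Rename the column, assigning the result back rather than renaming in place:

df2 = df2.rename(columns={'User_Group': 'Group_N'})

Then concatenate; there is no need for a join:

res_df = pd.concat([df1, df2], axis=0)

Note that df2 has no ID column, so the appended rows come out with NaN in ID unless it is filled in before concatenating, for example from df1's Domain-to-ID pairs. With ID filled in, this is the output: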

   ID Group_N Domain
0   1  CAN_US    MCS
1   1  ITL_US    MCS
2   1  MEX_US    MCS
3   1  KER_US    MCS
1   1  MKL_US    MCS
2   1  GAA_US    MCS
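The steps above can be put together as a self-contained sketch. It assumes each Domain in df1 maps to exactly one ID, which holds for the sample data but is not stated in the question. The names domain_to_id and matched are illustrative, and ignore_index=True is an optional addition that renumbers the rows.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, "CAN_US", "MCS"], [1, "ITL_US", "MCS"], [1, "MEX_US", "MCS"], [1, "KER_US", "MCS"]],
    columns=["ID", "Group_N", "Domain"],
)
df2 = pd.DataFrame(
    [["BCS", "JPN_US"], ["MCS", "MKL_US"], ["MCS", "GAA_US"]],
    columns=["Domain", "User_Group"],
)

# Assumption: one ID per Domain in df1, so a Domain -> ID lookup is well defined.
domain_to_id = df1.drop_duplicates("Domain").set_index("Domain")["ID"]

matched = (
    df2[df2["Domain"].isin(df1["Domain"])]        # keep only domains present in df1
    .rename(columns={"User_Group": "Group_N"})    # align the column name with df1
    .assign(ID=lambda d: d["Domain"].map(domain_to_id))  # fill in the missing ID
)

# Stack vertically and restore df1's column order.
res_df = pd.concat([df1, matched], axis=0, ignore_index=True)[df1.columns]
print(res_df)
```

Filling ID before the concat keeps the column as integers; filling it afterwards would first turn it into floats because of the intermediate NaN values.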

Answer 2

Score: 1

A possible solution with concat:

d = df1.set_index("Domain")["ID"].to_dict()

out = (
    pd.concat([df1, df2.rename(columns={"User_Group": "Group_N"})])
        .loc[lambda x: x["Domain"].isin(df1["Domain"])]
        .assign(ID=lambda x: x["Domain"].map(d))
)

Or this one with merge/lreshape:

out = (
    pd.lreshape(df1.merge(df2), {"Group_N": ["Group_N", "User_Group"]})
        .drop_duplicates()[df1.columns]
)

Output (index labels as printed by the concat version):

print(out)

   ID Group_N Domain
0   1  CAN_US    MCS
1   1  ITL_US    MCS
2   1  MEX_US    MCS
3   1  KER_US    MCS
1   1  MKL_US    MCS
2   1  GAA_US    MCS
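As a quick sanity check, the two variants can be run side by side on the sample data. This is only a sketch: the names out_concat and out_lreshape are illustrative, and the index is reset on both because the variants produce the same rows but label them differently.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, "CAN_US", "MCS"], [1, "ITL_US", "MCS"], [1, "MEX_US", "MCS"], [1, "KER_US", "MCS"]],
    columns=["ID", "Group_N", "Domain"],
)
df2 = pd.DataFrame(
    [["BCS", "JPN_US"], ["MCS", "MKL_US"], ["MCS", "GAA_US"]],
    columns=["Domain", "User_Group"],
)

# Variant 1: rename, stack, filter on Domain, then fill ID from a Domain -> ID dict.
d = df1.set_index("Domain")["ID"].to_dict()
out_concat = (
    pd.concat([df1, df2.rename(columns={"User_Group": "Group_N"})])
    .loc[lambda x: x["Domain"].isin(df1["Domain"])]
    .assign(ID=lambda x: x["Domain"].map(d))
    .reset_index(drop=True)
)

# Variant 2: merge on Domain, then fold User_Group into Group_N and drop the
# duplicate rows that the many-to-many merge introduces.
out_lreshape = (
    pd.lreshape(df1.merge(df2), {"Group_N": ["Group_N", "User_Group"]})
    .drop_duplicates()[df1.columns]
    .reset_index(drop=True)
)

# Compare the row values of both results.
print(out_concat.values.tolist() == out_lreshape.values.tolist())
```

On larger inputs the concat variant avoids building the many-to-many merge that the lreshape variant has to deduplicate afterwards.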
