May 23, 2023, 01:40:52

Merging dataframes on differently-named columns in pandas while only keeping a subset of columns

Question

I have 2 datasets and I need to merge over only specific columns but none of the fields have the same name.

DF1

 Name    ID       Job_Type       Emp_Type      
 Adam    101      Full-Time      Employee
 Ben     102      Part-Time      Contractor 
 Cathy   103      Part-Time      Employee 
 Doug    104      Full-Time      Contractor   
 Emily   105      Full-Time      Employee

DF2

 hiring_manager_id     hire_type      hiring_status     hiring_phase      name_of_emp
         101           Employee          Pending         Requested           NaN
         101           Employee          Approved        Hired               Sam 
         105           Contractor        Approved        Approved            NaN     
         113           Employee          Approved        Hired               Gabe
         119           Contractor        Pending         Interviewing        NaN

I would like to take specific columns and add them to df1 like so:

Name    ID    Job_Type    Emp_Type   hire_type    hiring_status   hiring_phase  
Adam    101   Full-Time   Employee    Employee      Pending         Requested  
Adam    101   Full-Time   Employee    Contractor    Approved        Hired 
Ben     102   Part-Time   Contractor  NaN           NaN             NaN 
Cathy   103   Part-Time   Employee    NaN           NaN             NaN 
Doug    104   Full-Time   Contractor  NaN           NaN             NaN 
Emily   105   Full-Time   Employee    Contractor    Approved        Hired

I tried

df1 = pd.merge(df1, df2[['ID', 'hire_type', 'hiring_status', 'hiring_phase']], on='ID', how='left')

This caused an error when I tried it.

Any suggestions? Thank you!

Answer 1

Score: 2

`df2` doesn't have an `ID` column; its key column is `hiring_manager_id`. Select that column instead, then use the `left_on=` and `right_on=` parameters:

```py
df1 = pd.merge(
    df1,
    df2[["hiring_manager_id", "hire_type", "hiring_status", "hiring_phase"]],
    left_on="ID",
    right_on="hiring_manager_id",
    how="left",
)
print(df1)
```

Prints:

```
    Name   ID   Job_Type    Emp_Type  hiring_manager_id   hire_type hiring_status hiring_phase
0   Adam  101  Full-Time    Employee              101.0    Employee       Pending    Requested
1   Adam  101  Full-Time    Employee              101.0    Employee      Approved        Hired
2    Ben  102  Part-Time  Contractor                NaN         NaN           NaN          NaN
3  Cathy  103  Part-Time    Employee                NaN         NaN           NaN          NaN
4   Doug  104  Full-Time  Contractor                NaN         NaN           NaN          NaN
5  Emily  105  Full-Time    Employee              105.0  Contractor      Approved     Approved
```
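The merged frame still carries `hiring_manager_id`, which the question's desired output does not include. A minimal sketch of one way to get rid of it, assuming the sample frames from the question are rebuilt by hand (the dtypes and the `wanted` helper list are assumptions, not part of the original post): merge with `left_on`/`right_on`, then drop the right-hand key.

```python
import pandas as pd

# Sample frames rebuilt from the question (dtypes are an assumption).
df1 = pd.DataFrame({
    "Name": ["Adam", "Ben", "Cathy", "Doug", "Emily"],
    "ID": [101, 102, 103, 104, 105],
    "Job_Type": ["Full-Time", "Part-Time", "Part-Time", "Full-Time", "Full-Time"],
    "Emp_Type": ["Employee", "Contractor", "Employee", "Contractor", "Employee"],
})
df2 = pd.DataFrame({
    "hiring_manager_id": [101, 101, 105, 113, 119],
    "hire_type": ["Employee", "Employee", "Contractor", "Employee", "Contractor"],
    "hiring_status": ["Pending", "Approved", "Approved", "Approved", "Pending"],
    "hiring_phase": ["Requested", "Hired", "Approved", "Hired", "Interviewing"],
    "name_of_emp": [None, "Sam", None, "Gabe", None],
})

# Hypothetical helper list: the df2 columns to bring over.
wanted = ["hire_type", "hiring_status", "hiring_phase"]

merged = (
    df1.merge(
        df2[["hiring_manager_id"] + wanted],  # key column plus the wanted subset
        left_on="ID",
        right_on="hiring_manager_id",
        how="left",                           # keep every df1 row
    )
    .drop(columns="hiring_manager_id")        # the key is redundant with ID
)
print(merged)
```

The key column prints as `101.0` in the answer's output because unmatched rows hold `NaN`, which forces a float dtype; dropping the column after the merge sidesteps that.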

Answer 2

Score: 2

If you want to keep only one `ID` column, rename the `hiring_manager_id` column to `ID` before merging:

```py
pd.merge(
    df1,
    df2[["hiring_manager_id", "hire_type", "hiring_status", "hiring_phase"]].rename(columns={"hiring_manager_id": "ID"}),
    on="ID",
    how="left",
)
```
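To check the rename approach end to end, here is a rough, self-contained sketch. It assumes the question's sample data rebuilt by hand; the names `cols`, `right`, and `error_seen` are illustrative only. It first reproduces the likely cause of the original error (selecting a non-existent `ID` column from `df2` raises `KeyError` before `merge` is ever called), then performs the rename and merge.

```python
import pandas as pd

# Sample frames rebuilt from the question (dtypes are an assumption).
df1 = pd.DataFrame({
    "Name": ["Adam", "Ben", "Cathy", "Doug", "Emily"],
    "ID": [101, 102, 103, 104, 105],
    "Job_Type": ["Full-Time", "Part-Time", "Part-Time", "Full-Time", "Full-Time"],
    "Emp_Type": ["Employee", "Contractor", "Employee", "Contractor", "Employee"],
})
df2 = pd.DataFrame({
    "hiring_manager_id": [101, 101, 105, 113, 119],
    "hire_type": ["Employee", "Employee", "Contractor", "Employee", "Contractor"],
    "hiring_status": ["Pending", "Approved", "Approved", "Approved", "Pending"],
    "hiring_phase": ["Requested", "Hired", "Approved", "Hired", "Interviewing"],
    "name_of_emp": [None, "Sam", None, "Gabe", None],
})

cols = ["hire_type", "hiring_status", "hiring_phase"]

# The original attempt selects "ID" from df2, which has no such column,
# so the column selection itself fails.
try:
    df2[["ID"] + cols]
    error_seen = False
except KeyError as exc:
    error_seen = True
    print("KeyError:", exc)

# Rename the key on the right-hand side so both frames share "ID".
right = df2[["hiring_manager_id"] + cols].rename(columns={"hiring_manager_id": "ID"})
merged = pd.merge(df1, right, on="ID", how="left")
print(merged)
```

Because both frames now share the key name, `on="ID"` yields a single `ID` column and no cleanup step is needed afterwards.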
