在pandas中,使用不同名称的列合并数据框,同时仅保留列的子集。

huangapple go评论74阅读模式
英文:

Merging dataframes on differently-named columns in pandas while only keeping a subset of columns

问题

I have 2 datasets and I need to merge over only specific columns but none of the fields have the same name.

DF1

Name ID Job_Type Emp_Type
Adam 101 Full-Time Employee
Ben 102 Part-Time Contractor
Cathy 103 Part-Time Employee
Doug 104 Full-Time Contractor
Emily 105 Full-Time Employee

DF2

hiring_manager_id hire_type hiring_status hiring_phase name_of_emp
101 Employee Pending Requested NaN
101 Employee Approved Hired Sam
105 Contractor Approved Approved NaN
113 Employee Approved Hired Gabe
119 Contractor Pending Interviewing NaN

I would like to take specific columns and add them to df1 like so:

Name ID Job_Type Emp_Type hire_type hiring_status hiring_phase
Adam 101 Full-Time Employee Employee Pending Requested
Adam 101 Full-Time Employee Contractor Approved Hired
Ben 102 Part-Time Contractor NaN NaN NaN
Cathy 103 Part-Time Employee NaN NaN NaN
Doug 104 Full-Time Contractor NaN NaN NaN
Emily 105 Full-Time Employee Contractor Approved Hired

I tried

df1 = pd.merge(df1, df2[['ID','hire_type','hiring_status','hiring_phase']], on = 'ID', how= 'left')

This caused an error when I tried it.

Any Suggestions? Thank you!

英文:

I have 2 datasets and I need to merge over only specific columns but none of the fields have the same name.

DF1

 Name    ID       Job_Type       Emp_Type      
 Adam    101      Full-Time      Employee
 Ben     102      Part-Time      Contractor 
 Cathy   103      Part-Time      Employee 
 Doug    104      Full-Time      Contractor   
 Emily   105      Full-Time      Employee 

DF2

 hiring_manager_id     hire_type      hiring_status     hiring_phase      name_of_emp
         101           Employee          Pending         Requested           NaN
         101           Employee          Approved        Hired               Sam 
         105           Contractor        Approved        Approved            NaN     
         113           Employee          Approved        Hired               Gabe
         119           Contractor        Pending         Interviewing        NaN

I would like to take specific columns and add them to df1 like so:

Name    ID    Job_Type    Emp_Type   hire_type    hiring_status   hiring_phase  
Adam    101   Full-Time   Employee    Employee      Pending         Requested  
Adam    101   Full-Time   Employee    Contractor    Approved        Hired 
Ben     102   Part-Time   Contractor  NaN           NaN             NaN 
Cathy   103   Part-Time   Employee    NaN           NaN             NaN 
Doug    104   Full-Time   Contractor  NaN           NaN             NaN 
Emily   105   Full-Time   Employee    Contractor    Approved        Hired

I tried

df1 = pd.merge(df1, df2[['ID','hire_type','hiring_status','hiring_phase']], on = 'ID', how= 'left')

This caused an error when I tried it.

Any Suggestions? Thank you!

答案1

得分: 2

df2 没有 `ID`将其更改为 `hiring_manager_id`。然后使用 `left_on=``right_on=` 参数

```py
df1 = pd.merge(
    df1, df2[['hiring_manager_id', 'hire_type', 'hiring_status', 'hiring_phase']], left_on='ID', right_on='hiring_manager_id', how='left'
)
print(df1)

输出:

    Name   ID   Job_Type    Emp_Type  hiring_manager_id   hire_type hiring_status hiring_phase
0   Adam  101  Full-Time    Employee              101.0    Employee       Pending    Requested
1   Adam  101  Full-Time    Employee              101.0    Employee      Approved        Hired
2    Ben  102  Part-Time  Contractor                NaN         NaN           NaN          NaN
3  Cathy  103  Part-Time    Employee                NaN         NaN           NaN          NaN
4   Doug  104  Full-Time  Contractor                NaN         NaN           NaN          NaN
5  Emily  105  Full-Time    Employee              105.0  Contractor      Approved     Approved

<details>
<summary>英文:</summary>

The `df2` doesn&#39;t have `ID` column, change it to `hiring_manager_id`. Then use `left_on=` and `right_on=` parameters:

```py
df1 = pd.merge(
    df1, df2[[&quot;hiring_manager_id&quot;, &quot;hire_type&quot;, &quot;hiring_status&quot;, &quot;hiring_phase&quot;]], left_on=&quot;ID&quot;, right_on=&#39;hiring_manager_id&#39;, how=&quot;left&quot;
)
print(df1)

Prints:

    Name   ID   Job_Type    Emp_Type  hiring_manager_id   hire_type hiring_status hiring_phase
0   Adam  101  Full-Time    Employee              101.0    Employee       Pending    Requested
1   Adam  101  Full-Time    Employee              101.0    Employee      Approved        Hired
2    Ben  102  Part-Time  Contractor                NaN         NaN           NaN          NaN
3  Cathy  103  Part-Time    Employee                NaN         NaN           NaN          NaN
4   Doug  104  Full-Time  Contractor                NaN         NaN           NaN          NaN
5  Emily  105  Full-Time    Employee              105.0  Contractor      Approved     Approved

答案2

得分: 2

如果你只想保留一个 `ID`你应该将 "hiring_manager_id" 列重命名为 "ID"

```py
pd.merge(
    df1, 
    df2[["hiring_manager_id", "hire_type", "hiring_status", "hiring_phase"]].rename(columns={"hiring_manager_id":"ID"}), 
    on="ID",
    how="left"
)
英文:

If you want to only keep one ID column, you should rename the "hiring_manager_id" column to "ID":

pd.merge(
    df1, 
    df2[[&quot;hiring_manager_id&quot;, &quot;hire_type&quot;, &quot;hiring_status&quot;, &quot;hiring_phase&quot;]].rename(columns={&quot;hiring_manager_id&quot;:&quot;ID&quot;}), 
    on=&quot;ID&quot;,
    how=&quot;left&quot;
)
Name    ID    Job_Type    Emp_Type   hire_type    hiring_status   hiring_phase  
Adam    101   Full-Time   Employee    Employee      Pending         Requested  
Adam    101   Full-Time   Employee    Contractor    Approved        Hired 
Ben     102   Part-Time   Contractor  NaN           NaN             NaN 
Cathy   103   Part-Time   Employee    NaN           NaN             NaN 
Doug    104   Full-Time   Contractor  NaN           NaN             NaN 
Emily   105   Full-Time   Employee    Contractor    Approved        Hired

huangapple
  • 本文由 发表于 2023年5月23日 01:40:52
  • 转载请务必保留本文链接:https://go.coder-hub.com/76308700.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定