英文:
Merging dataframes on differently-named columns in pandas while only keeping a subset of columns
问题
I have 2 datasets and I need to merge over only specific columns but none of the fields have the same name.
DF1
Name ID Job_Type Emp_Type
Adam 101 Full-Time Employee
Ben 102 Part-Time Contractor
Cathy 103 Part-Time Employee
Doug 104 Full-Time Contractor
Emily 105 Full-Time Employee
DF2
hiring_manager_id hire_type hiring_status hiring_phase name_of_emp
101 Employee Pending Requested NaN
101 Employee Approved Hired Sam
105 Contractor Approved Approved NaN
113 Employee Approved Hired Gabe
119 Contractor Pending Interviewing NaN
I would like to take specific columns and add them to df1
like so:
Name ID Job_Type Emp_Type hire_type hiring_status hiring_phase
Adam 101 Full-Time Employee Employee Pending Requested
Adam 101 Full-Time Employee Contractor Approved Hired
Ben 102 Part-Time Contractor NaN NaN NaN
Cathy 103 Part-Time Employee NaN NaN NaN
Doug 104 Full-Time Contractor NaN NaN NaN
Emily 105 Full-Time Employee Contractor Approved Hired
I tried
df1 = pd.merge(df1, df2[['ID','hire_type','hiring_status','hiring_phase']], on = 'ID', how= 'left')
This caused an error when I tried it.
Any Suggestions? Thank you!
英文:
I have 2 datasets and I need to merge over only specific columns but none of the fields have the same name.
DF1
Name ID Job_Type Emp_Type
Adam 101 Full-Time Employee
Ben 102 Part-Time Contractor
Cathy 103 Part-Time Employee
Doug 104 Full-Time Contractor
Emily 105 Full-Time Employee
DF2
hiring_manager_id hire_type hiring_status hiring_phase name_of_emp
101 Employee Pending Requested NaN
101 Employee Approved Hired Sam
105 Contractor Approved Approved NaN
113 Employee Approved Hired Gabe
119 Contractor Pending Interviewing NaN
I would like to take specific columns and add them to df1
like so:
Name ID Job_Type Emp_Type hire_type hiring_status hiring_phase
Adam 101 Full-Time Employee Employee Pending Requested
Adam 101 Full-Time Employee Contractor Approved Hired
Ben 102 Part-Time Contractor NaN NaN NaN
Cathy 103 Part-Time Employee NaN NaN NaN
Doug 104 Full-Time Contractor NaN NaN NaN
Emily 105 Full-Time Employee Contractor Approved Hired
I tried
df1 = pd.merge(df1, df2[['ID','hire_type','hiring_status','hiring_phase']], on = 'ID', how= 'left')
This caused an error when I tried it.
Any Suggestions? Thank you!
答案1
得分: 2
df2 没有 `ID` 列,将其更改为 `hiring_manager_id`。然后使用 `left_on=` 和 `right_on=` 参数:
```py
df1 = pd.merge(
df1, df2[['hiring_manager_id', 'hire_type', 'hiring_status', 'hiring_phase']], left_on='ID', right_on='hiring_manager_id', how='left'
)
print(df1)
输出:
Name ID Job_Type Emp_Type hiring_manager_id hire_type hiring_status hiring_phase
0 Adam 101 Full-Time Employee 101.0 Employee Pending Requested
1 Adam 101 Full-Time Employee 101.0 Employee Approved Hired
2 Ben 102 Part-Time Contractor NaN NaN NaN NaN
3 Cathy 103 Part-Time Employee NaN NaN NaN NaN
4 Doug 104 Full-Time Contractor NaN NaN NaN NaN
5 Emily 105 Full-Time Employee 105.0 Contractor Approved Approved
<details>
<summary>英文:</summary>
The `df2` doesn't have `ID` column, change it to `hiring_manager_id`. Then use `left_on=` and `right_on=` parameters:
```py
df1 = pd.merge(
df1, df2[["hiring_manager_id", "hire_type", "hiring_status", "hiring_phase"]], left_on="ID", right_on='hiring_manager_id', how="left"
)
print(df1)
Prints:
Name ID Job_Type Emp_Type hiring_manager_id hire_type hiring_status hiring_phase
0 Adam 101 Full-Time Employee 101.0 Employee Pending Requested
1 Adam 101 Full-Time Employee 101.0 Employee Approved Hired
2 Ben 102 Part-Time Contractor NaN NaN NaN NaN
3 Cathy 103 Part-Time Employee NaN NaN NaN NaN
4 Doug 104 Full-Time Contractor NaN NaN NaN NaN
5 Emily 105 Full-Time Employee 105.0 Contractor Approved Approved
答案2
得分: 2
如果你只想保留一个 `ID` 列,你应该将 "hiring_manager_id" 列重命名为 "ID":
```py
pd.merge(
df1,
df2[["hiring_manager_id", "hire_type", "hiring_status", "hiring_phase"]].rename(columns={"hiring_manager_id":"ID"}),
on="ID",
how="left"
)
英文:
If you want to only keep one ID
column, you should rename the "hiring_manager_id" column to "ID":
pd.merge(
df1,
df2[["hiring_manager_id", "hire_type", "hiring_status", "hiring_phase"]].rename(columns={"hiring_manager_id":"ID"}),
on="ID",
how="left"
)
Name ID Job_Type Emp_Type hire_type hiring_status hiring_phase
Adam 101 Full-Time Employee Employee Pending Requested
Adam 101 Full-Time Employee Contractor Approved Hired
Ben 102 Part-Time Contractor NaN NaN NaN
Cathy 103 Part-Time Employee NaN NaN NaN
Doug 104 Full-Time Contractor NaN NaN NaN
Emily 105 Full-Time Employee Contractor Approved Hired
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论