Map two dataframes, based on their group/id, with closer values
Question
I have two dataframes, as follows:
# Load the required libraries
import pandas as pd
import matplotlib.pyplot as plt
# Create dataset_1
data_set_1 = {'id': [1, 2, 3, 4, 5],
              'Available_Salary': [10, 20, 30, 40, 50]}
# Convert to dataframe_1
df_1 = pd.DataFrame(data_set_1)
print("\n df_1 = \n", df_1)
# Create dataset_2
data_set_2 = {'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
              'Expected_Salary': [9, 49, 18, 19, 29, 41, 4, 57, 42, 3]}
# Convert to dataframe_2
df_2 = pd.DataFrame(data_set_2)
print("\n df_2 = \n", df_2)
Visually, I can see that 'Expected_Salary' 9 (id=1), 'Expected_Salary' 4 (id=7) and 'Expected_Salary' 3 (id=10) are closest to 'Available_Salary' 10 (id=1).
Likewise, 'Expected_Salary' 49 (id=2) and 'Expected_Salary' 57 (id=8) are closest to 'Available_Salary' 50 (id=5), and so on.
The image below shows this more clearly:
Now I need to generate two new columns in df_2, 'Salary_from_df_1' and 'id_from_df_1', that hold the closest 'Available_Salary' from df_1 and the id of the df_1 row it comes from.
For example, since 'Expected_Salary' 9 (id=1), 'Expected_Salary' 4 (id=7) and 'Expected_Salary' 3 (id=10) are closest to 'Available_Salary' 10 (id=1), those rows should get 'Salary_from_df_1' = 10 and 'id_from_df_1' = 1. The result should look like this:
The same logic applies when mapping the remaining ids of df_2 to df_1.
Can somebody please let me know how to achieve this in Python?
Answer 1
Score: 1
Calculate the absolute difference between each salary in df_2 and every available salary in df_1, then use argmin to get the position of the minimum absolute difference in each row, and use those positions to pull the matching id values from df_1 into df_2.
import numpy as np
# Pairwise absolute differences: one row per df_2 entry, one column per df_1 entry
i = np.abs(np.subtract.outer(df_2['Expected_Salary'].values, df_1['Available_Salary'].values)).argmin(axis=1)
# Look up the df_1 id at each winning position
df_2['id_df1'] = df_1['id'].values[i]
   id  Expected_Salary  id_df1
0   1                9       1
1   2               49       5
2   3               18       2
3   4               19       2
4   5               29       3
5   6               41       4
6   7                4       1
7   8               57       5
8   9               42       4
9  10                3       1
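The answer above adds only the id. As a minimal sketch of the same idea, the snippet below also fills in the salary column, using the column names 'Salary_from_df_1' and 'id_from_df_1' requested in the question. The variable names `expected`, `available`, `diff` and `nearest` are my own, and broadcasting is used here in place of `np.subtract.outer`; the two should give the same difference matrix.

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                     'Available_Salary': [10, 20, 30, 40, 50]})
df_2 = pd.DataFrame({'id': list(range(1, 11)),
                     'Expected_Salary': [9, 49, 18, 19, 29, 41, 4, 57, 42, 3]})

expected = df_2['Expected_Salary'].to_numpy()
available = df_1['Available_Salary'].to_numpy()

# Rows are df_2 entries, columns are df_1 entries: shape (10, 5).
diff = np.abs(expected[:, None] - available[None, :])

# Position in df_1 of the closest available salary for each df_2 row.
# On a tie, argmin keeps the first (lowest-position) candidate.
nearest = diff.argmin(axis=1)

df_2['Salary_from_df_1'] = available[nearest]
df_2['id_from_df_1'] = df_1['id'].to_numpy()[nearest]
print(df_2)
```

Note that the difference matrix has len(df_2) × len(df_1) entries, so this approach is best suited to frames small enough for that matrix to fit in memory.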
Answer 2
Score: 1
You can use pandas merge_asof with direction='nearest'. Both frames must be sorted by their key column first (the frame and column names below follow the question):
pd.merge_asof(df_2.sort_values('Expected_Salary'), df_1.sort_values('Available_Salary'), left_on='Expected_Salary', right_on='Available_Salary', direction='nearest').sort_values('id_x')
| id_x | Expected_Salary | id_y | Available_Salary |
|---|---|---|---|
| 1 | 9 | 1 | 10 |
| 2 | 49 | 5 | 50 |
| 3 | 18 | 2 | 20 |
| 4 | 19 | 2 | 20 |
| 5 | 29 | 3 | 30 |
| 6 | 41 | 4 | 40 |
| 7 | 4 | 1 | 10 |
| 8 | 57 | 5 | 50 |
| 9 | 42 | 4 | 40 |
| 10 | 3 | 1 | 10 |
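To get the exact column names the question asks for, one option is to rename df_1's columns before merging. The following is a sketch under that assumption; `lookup` and `result` are illustrative names, not part of the original answer.

```python
import pandas as pd

df_1 = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                     'Available_Salary': [10, 20, 30, 40, 50]})
df_2 = pd.DataFrame({'id': list(range(1, 11)),
                     'Expected_Salary': [9, 49, 18, 19, 29, 41, 4, 57, 42, 3]})

# Rename df_1's columns up front so the merged frame carries the requested
# names and the two 'id' columns do not collide.
lookup = (df_1.rename(columns={'id': 'id_from_df_1',
                               'Available_Salary': 'Salary_from_df_1'})
              .sort_values('Salary_from_df_1'))

# merge_asof requires both inputs to be sorted by their key columns.
result = (pd.merge_asof(df_2.sort_values('Expected_Salary'),
                        lookup,
                        left_on='Expected_Salary',
                        right_on='Salary_from_df_1',
                        direction='nearest')
            .sort_values('id')          # restore df_2's original order
            .reset_index(drop=True))
print(result)
```

Because merge_asof works on sorted keys instead of building a full pairwise matrix, it is usually the better fit when either frame is large.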