June 18, 2023 20:39:35

Map two dataframes, based on their group/id, with closer values

Question

I have two dataframes as follows:

# Load the required libraries
import pandas as pd
import matplotlib.pyplot as plt

# Create dataset_1
data_set_1 = {'id': [1, 2, 3, 4, 5],
              'Available_Salary': [10, 20, 30, 40, 50]}

# Convert to dataframe_1
df_1 = pd.DataFrame(data_set_1)
print("\n df_1 = \n", df_1)

# Create dataset_2
data_set_2 = {'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
              'Expected_Salary': [9, 49, 18, 19, 29, 41, 4, 57, 42, 3]}

# Convert to dataframe_2
df_2 = pd.DataFrame(data_set_2)
print("\n df_2 = \n", df_2)

Visually, I can say that 'Expected_Salary' 9 (id=1), 4 (id=7) and 3 (id=10) are closest to 'Available_Salary' 10 (id=1).

Likewise, 'Expected_Salary' 49 (id=2) and 57 (id=8) are closest to 'Available_Salary' 50 (id=5), and so on.

[Image in the original post illustrating this mapping; not preserved]

Now I need to generate two new columns in df_2, 'Salary_from_df_1' and 'id_from_df_1', that hold the closest salary in df_1 and the id of the df_1 row it belongs to.

For example, since 'Expected_Salary' 9 (id=1), 4 (id=7) and 3 (id=10) are closest to 'Available_Salary' 10 (id=1), those rows will have 'Salary_from_df_1' = 10 and 'id_from_df_1' = 1.

[Image in the original post showing the expected result; not preserved]

The same logic applies to the other ids of df_2 when mapping them to df_1.

Can somebody please let me know how to achieve this in Python?

Answer 1

Score: 1

Compute the absolute difference between each salary in df_2 and every available salary in df_1, use argmin to get the position of the smallest difference in each row, then use those positions to pull the matching id values from df_1 into df_2.

import numpy as np

# Rows: df_2 salaries, columns: df_1 salaries; argmin picks the nearest df_1 row
i = np.abs(np.subtract.outer(df_2['Expected_Salary'].values, df_1['Available_Salary'].values)).argmin(axis=1)
df_2['id_df1'] = df_1['id'].values[i]

   id  Expected_Salary  id_df1
0   1                9       1
1   2               49       5
2   3               18       2
3   4               19       2
4   5               29       3
5   6               41       4
6   7                4       1
7   8               57       5
8   9               42       4
9  10                3       1
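
The answer above adds only the matched id. As a minimal sketch of the same pairwise-difference idea, the following also fills in the two column names requested in the question. The helper name `map_nearest` is hypothetical, and plain broadcasting is used here in place of `np.subtract.outer`; it builds the same difference matrix.

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                     "Available_Salary": [10, 20, 30, 40, 50]})
df_2 = pd.DataFrame({"id": list(range(1, 11)),
                     "Expected_Salary": [9, 49, 18, 19, 29, 41, 4, 57, 42, 3]})


def map_nearest(df_2, df_1):
    """Hypothetical helper: copy of df_2 with the nearest df_1 salary and its id."""
    expected = df_2["Expected_Salary"].to_numpy()
    available = df_1["Available_Salary"].to_numpy()
    # Shape (len(df_2), len(df_1)): |expected[r] - available[c]| via broadcasting.
    diff = np.abs(expected[:, None] - available[None, :])
    # Column position of the smallest difference per row (first one on ties).
    nearest = diff.argmin(axis=1)
    out = df_2.copy()
    out["Salary_from_df_1"] = available[nearest]
    out["id_from_df_1"] = df_1["id"].to_numpy()[nearest]
    return out


result = map_nearest(df_2, df_1)
print(result)
```

Note that the intermediate matrix has len(df_2) × len(df_1) entries, so this approach is probably best suited to small or medium-sized frames.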

Answer 2

Score: 1

You can use pandas merge_asof with direction='nearest'. Both frames must be sorted on their key columns first:

pd.merge_asof(df_2.sort_values('Expected_Salary'), df_1.sort_values('Available_Salary'), left_on='Expected_Salary', right_on='Available_Salary', direction='nearest').sort_values('id_x')

id_x  Expected_Salary  id_y  Available_Salary
   1                9     1                10
   2               49     5                50
   3               18     2                20
   4               19     2                20
   5               29     3                30
   6               41     4                40
   7                4     1                10
   8               57     5                50
   9               42     4                40
  10                3     1                10
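
The merge_asof result above carries the suffixed names id_x and id_y. A possible variation, sketched here under the assumption that the column names from the question are wanted, renames df_1's columns before merging so no suffixes appear, then restores df_2's original row order. The variable name `lookup` is introduced only for illustration.

```python
import pandas as pd

df_1 = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                     "Available_Salary": [10, 20, 30, 40, 50]})
df_2 = pd.DataFrame({"id": list(range(1, 11)),
                     "Expected_Salary": [9, 49, 18, 19, 29, 41, 4, 57, 42, 3]})

# Rename df_1's columns up front so the merged frame already has the
# names requested in the question and no _x/_y suffixes are generated.
lookup = (
    df_1.rename(columns={"id": "id_from_df_1",
                         "Available_Salary": "Salary_from_df_1"})
        .sort_values("Salary_from_df_1")
)

result = (
    pd.merge_asof(
        df_2.sort_values("Expected_Salary"),  # merge_asof requires sorted keys
        lookup,
        left_on="Expected_Salary",
        right_on="Salary_from_df_1",
        direction="nearest",
    )
    .sort_values("id")           # back to df_2's original order
    .reset_index(drop=True)
)
print(result)
```

Because merge_asof works on sorted keys rather than a full pairwise matrix, it may scale better than the outer-difference approach when both frames are large.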
