June 13, 2023 07:13:40

Cross referencing Pandas Dataframes with multiple occurrences of right key on left Dataframe

Question

I have two DataFrames, A and B. A has a "B_id" column and B has an "ID" column. However, many rows in A share the same "B_id", so the data they reference in B repeats across those rows. For example, A looks like this:

A:
ID    Name    (...)    B_id
1     John     ...     123
2     Emily    ...     123
3     Erick    ...     321

And B could look like this:

B:
ID     Model  (...)
123    M1      ...
321    M2      ...

What I want is to create a new DataFrame that combines and cross-references the information from both DataFrames, where all columns of B for the corresponding B_id are added to the matching row in A. The output would look like this:

Result:
ID    Name    (...)    B_id    Model    (...)
1     John     ...     123     M1        ...
2     Emily    ...     123     M1        ...
3     Erick    ...     321     M2        ...

I've tried pd.merge with how='left', but it returns an empty DataFrame.
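The question does not show the exact call, so the following is only a guess at the cause. When `pd.merge` is called without `on`, `left_on`, or `right_on`, pandas joins on every column name the two frames have in common. With the sample tables that column is `ID`, and A's `ID` values (1, 2, 3) never equal B's (123, 321). A minimal sketch using the sample data:

```python
import pandas as pd

# Sample data from the tables above
dfA = pd.DataFrame({"ID": [1, 2, 3],
                    "Name": ["John", "Emily", "Erick"],
                    "B_id": [123, 123, 321]})
dfB = pd.DataFrame({"ID": [123, 321], "Model": ["M1", "M2"]})

# No key given: pandas silently joins on the shared column name "ID"
inner = pd.merge(dfA, dfB)             # default how="inner"
left = pd.merge(dfA, dfB, how="left")

print(inner)  # empty: no value of A.ID appears in B.ID
print(left)   # all rows of A are kept, but Model is NaN everywhere
```

Depending on `how`, the result is either empty or carries no data from B; in both cases the fix is to tell `merge` which columns to match.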

Answer 1

Score: 1

You were almost there. What you need to do is tell pandas' merge method which column to join on, using the on parameter (on='B_id'). For on= to work, the join column must have the same name in both DataFrames, so you may either:

rename column ID to B_id in the B DataFrame (rename returns a new DataFrame, so assign it back): dfB = dfB.rename(columns={"ID": "B_id"})
or simply create a new column with that name: dfB['B_id'] = dfB['ID'].
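The first option, renaming B's key column, can be sketched with the sample data from the question. `how='left'` is an addition not present in the answer: it keeps every row of A even if its `B_id` has no match in B (B's columns are then NaN).

```python
import pandas as pd

dfA = pd.DataFrame({"ID": [1, 2, 3],
                    "Name": ["John", "Emily", "Erick"],
                    "B_id": [123, 123, 321]})
dfB = pd.DataFrame({"ID": [123, 321], "Model": ["M1", "M2"]})

# Option 1: rename B's key so both frames call it "B_id".
# rename() returns a new DataFrame, so assign the result back.
dfB = dfB.rename(columns={"ID": "B_id"})

# Left merge: every row of A is kept, and B's columns are repeated
# for each A row that shares the same B_id.
result = pd.merge(dfA, dfB, on="B_id", how="left")
print(result)
```

Because B's `ID` column is renamed rather than copied, the output has no `ID_x`/`ID_y` pair, which matches the Result table in the question.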

Here is a code example:

import pandas as pd

# A: one row per person; several rows can share the same B_id
dfA = pd.DataFrame()
dfA['ID'] = [1, 2, 3]
dfA['Name'] = ['John', 'Emily', 'Erick']
dfA['B_id'] = [123, 123, 321]

# B: one row per model, keyed by ID
dfB = pd.DataFrame()
dfB['ID'] = [123, 321]
dfB['Model'] = ['M1', 'M2']
# Copy ID into a column named B_id so both frames share the join key
dfB['B_id'] = dfB['ID']

pd.merge(dfA, dfB, on='B_id')

This returns:

   ID_x   Name   B_id   ID_y   Model
0  1      John   123    123    M1
1  2      Emily  123    123    M1
2  3      Erick  321    321    M2
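Renaming or copying the key is not strictly required: `pd.merge` also accepts `left_on` and `right_on` for key columns with different names. In this sketch (same sample data), `suffixes` controls how the two clashing `ID` columns are named, and `validate='many_to_one'` is an optional check, not part of the answer, that raises `pandas.errors.MergeError` if a `B_id` matches more than one row of B.

```python
import pandas as pd

dfA = pd.DataFrame({"ID": [1, 2, 3],
                    "Name": ["John", "Emily", "Erick"],
                    "B_id": [123, 123, 321]})
dfB = pd.DataFrame({"ID": [123, 321], "Model": ["M1", "M2"]})

result = pd.merge(
    dfA, dfB,
    left_on="B_id",          # key column in A
    right_on="ID",           # key column in B
    how="left",              # keep every row of A
    suffixes=("", "_B"),     # A's ID stays "ID"; B's becomes "ID_B"
    validate="many_to_one",  # each B_id must match at most one row of B
)
# B's key just duplicates B_id, so drop it to match the desired layout
result = result.drop(columns="ID_B")
print(result)
```

This leaves B untouched and avoids the `ID_x`/`ID_y` columns in the output above.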
