英文:
Cross referencing Pandas Dataframes with multiple occurrences of right key on left Dataframe
问题
我有两个DataFrame,A和B。A有一个列名为"B_id",B有一个"ID"列。然而,A中的许多行使用相同的"B_id"来确定关于该行的重复数据。例如,A看起来像这样:
A:
ID Name (...) B_id
1 John ... 123
2 Emily ... 123
3 Erick ... 321
而B可能看起来像这样:
B:
ID Model (...)
123 M1 ...
321 M2 ...
我想要创建一个新的DataFrame,将来自两个DataFrame的信息组合和交叉引用,其中相应的B_id的所有列将被添加到A中的行。输出将如下所示:
Result:
ID Name (...) B_id Model (...)
1 John ... 123 M1 ...
2 Emily ... 123 M1 ...
3 Erick ... 321 M2 ...
我尝试使用pd.merge
并将'left'作为how
,但它返回一个空的DataFrame。
英文:
I have two DataFrames, A and B. A has a column which is "B_id", B has a "ID" column. However, many rows on A use the same "B_id" to determine data about that row which repeats. For example, A looks like this:
A:
ID Name (...) B_id
1 John ... 123
2 Emily ... 123
3 Erick ... 321
Where B could look like this:
B:
ID Model (...)
123 M1 ...
321 M2 ...
What I want is to create a new Dataframe combine and cross referencing the information from both DataFrames, where all columns for the corresponding B_id would be added to the row in A. The output would look like this:
Result:
ID Name (...) B_id Model (...)
1 John ... 123 M1 ...
2 Emily ... 123 M1 ...
3 Erick ... 321 M2 ...
I've tried with pd.merge
with 'left' as how
, but it returns a empty DataFrame.
答案1
得分: 1
你差不多到了。你需要做的是在pandas的merge
方法中使用on
参数指定要合并的列(on=B_id
)。为了使此方法生效,你必须首先确保要连接的列在两个数据框中具有相同的名称。你可以选择:
- 在B数据框中将列
ID
重命名为B_id
:dfB.rename(columns={"ID": "B_id"})
- 或者简单地创建一个具有该名称的新列:
dfB['B_id'] = dfB['ID']
。
英文:
You were almost there. What you need to do is specify the column on which to merge using the on
parameter in pandas' merge
method (on=B_id
). In order for this to work, you must first ensure that the column to join on shares the same name in both dataframes. You may either:
- rename column
ID
toB_id
in the B dataframe:dfB.rename(columns={"ID": "B_id"}
- or simply create a new column with that name:
dfB['B_id'] = dfB['ID']
.
Here is a code example:
import pandas as pd
dfA = pd.DataFrame()
dfA['ID'] = [1, 2, 3]
dfA['Name'] = ['John', 'Emily', 'Erick']
dfA['B_id'] = [123, 123, 321]
dfB = pd.DataFrame()
dfB['ID'] = [123, 321]
dfB['Model'] = ['M1', 'M2']
dfB['B_id'] = dfB['ID']
pd.merge(df1,df2, on='B_id')
This returns:
ID_x Name B_id ID_y Model
0 1 John 123 123 M1
1 2 Emily 123 123 M1
2 3 Erick 321 321 M2
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论