Cross-referencing pandas DataFrames with multiple occurrences of the right key in the left DataFrame
Question
I have two DataFrames, A and B. A has a column named "B_id", and B has an "ID" column. Many rows in A share the same "B_id", so the data it refers to repeats across those rows. For example, A looks like this:
A:
ID  Name   (...)  B_id
1   John   ...    123
2   Emily  ...    123
3   Erick  ...    321
And B could look like this:
B:
ID   Model  (...)
123  M1     ...
321  M2     ...
What I want is to create a new DataFrame that combines and cross-references the information from both DataFrames, so that all columns of B for the corresponding B_id are added to each row of A. The output would look like this:
Result:
ID  Name   (...)  B_id  Model  (...)
1   John   ...    123   M1     ...
2   Emily  ...    123   M1     ...
3   Erick  ...    321   M2     ...
I've tried pd.merge with how='left', but it returns an empty DataFrame.
Answer 1
Score: 1
You were almost there. What you need to do is specify the column to merge on, using the on parameter of pandas' merge method (on='B_id'). For this to work, the join column must have the same name in both dataframes. You may either:
- rename column ID to B_id in the B dataframe: dfB = dfB.rename(columns={"ID": "B_id"})
- or simply create a new column with that name: dfB['B_id'] = dfB['ID'].
Here is a code example:
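import pandas as pd

# Left frame: several rows may share the same B_id
dfA = pd.DataFrame()
dfA['ID'] = [1, 2, 3]
dfA['Name'] = ['John', 'Emily', 'Erick']
dfA['B_id'] = [123, 123, 321]

# Right frame: one row per ID
dfB = pd.DataFrame()
dfB['ID'] = [123, 321]
dfB['Model'] = ['M1', 'M2']

# Copy the key into a column with the same name as in dfA
dfB['B_id'] = dfB['ID']

pd.merge(dfA, dfB, on='B_id')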
This returns:
   ID_x   Name  B_id  ID_y Model
0     1   John   123   123    M1
1     2  Emily   123   123    M1
2     3  Erick   321   321    M2
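If the extra ID_x / ID_y columns are unwanted, the rename option mentioned above may be a better fit. The following is only a sketch using the same sample data: how='left' is taken from the question, while validate='many_to_one' is an optional addition that is not part of the original answer; it asks pandas to raise an error if B_id is not unique in dfB.

```python
import pandas as pd

dfA = pd.DataFrame({
    "ID": [1, 2, 3],
    "Name": ["John", "Emily", "Erick"],
    "B_id": [123, 123, 321],
})
dfB = pd.DataFrame({
    "ID": [123, 321],
    "Model": ["M1", "M2"],
})

# Rename B's key before merging, so the result keeps A's ID column
# untouched and contains a single B_id column.
result = dfA.merge(
    dfB.rename(columns={"ID": "B_id"}),
    on="B_id",
    how="left",              # keep every row of dfA, as in the question
    validate="many_to_one",  # optional check: B_id must be unique in dfB
)
print(result)
```

With this data the result has the columns ID, Name, B_id and Model, which matches the layout asked for in the question.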