使用Pandas数据框进行多次出现的右键与左数据框的交叉引用。

huangapple go评论67阅读模式
英文:

Cross referencing Pandas Dataframes with multiple occurrences of right key on left Dataframe

问题

我有两个DataFrame,A和B。A有一个列名为"B_id",B有一个"ID"列。然而,A中的许多行使用相同的"B_id"来确定关于该行的重复数据。例如,A看起来像这样:

A:
ID    Name    (...)    B_id
1     John     ...     123
2     Emily    ...     123
3     Erick    ...     321

而B可能看起来像这样:

B:
ID     Model  (...)
123    M1      ...
321    M2      ...

我想要创建一个新的DataFrame,将来自两个DataFrame的信息组合和交叉引用,其中相应的B_id的所有列将被添加到A中的行。输出将如下所示:

Result:
ID    Name    (...)    B_id    Model    (...)
1     John     ...     123     M1        ...
2     Emily    ...     123     M1        ...
3     Erick    ...     321     M2        ...

我尝试使用pd.merge并将'left'作为how,但它返回一个空的DataFrame。

英文:

I have two DataFrames, A and B. A has a column which is "B_id", B has a "ID" column. However, many rows on A use the same "B_id" to determine data about that row which repeats. For example, A looks like this:

A:
ID    Name    (...)    B_id
1     John     ...     123
2     Emily    ...     123
3     Erick    ...     321

Where B could look like this:

B:
ID     Model  (...)
123    M1      ...
321    M2      ...

What I want is to create a new Dataframe combine and cross referencing the information from both DataFrames, where all columns for the corresponding B_id would be added to the row in A. The output would look like this:

Result:
ID    Name    (...)    B_id    Model    (...)
1     John     ...     123     M1        ...
2     Emily    ...     123     M1        ...
3     Erick    ...     321     M2        ...

I've tried with pd.merge with 'left' as how, but it returns a empty DataFrame.

答案1

得分: 1

你差不多到了。你需要做的是在pandas的merge方法中使用on参数指定要合并的列(on=B_id)。为了使此方法生效,你必须首先确保要连接的列在两个数据框中具有相同的名称。你可以选择:

  • 在B数据框中将列ID重命名为B_iddfB.rename(columns={"ID": "B_id"})
  • 或者简单地创建一个具有该名称的新列:dfB['B_id'] = dfB['ID']
英文:

You were almost there. What you need to do is specify the column on which to merge using the on parameter in pandas' merge method (on=B_id). In order for this to work, you must first ensure that the column to join on shares the same name in both dataframes. You may either:

  • rename column ID to B_id in the B dataframe: dfB.rename(columns={"ID": "B_id"}
  • or simply create a new column with that name: dfB['B_id'] = dfB['ID'].

Here is a code example:

import pandas as pd

dfA = pd.DataFrame()
dfA['ID'] = [1, 2, 3]
dfA['Name'] = ['John', 'Emily', 'Erick']
dfA['B_id'] = [123, 123, 321]

dfB = pd.DataFrame()
dfB['ID'] = [123, 321]
dfB['Model'] = ['M1', 'M2']

dfB['B_id'] = dfB['ID']

pd.merge(df1,df2, on='B_id')

This returns:

 	ID_x 	Name 	B_id 	ID_y 	Model
0 	1 	John 	123 	123 	M1
1 	2 	Emily 	123 	123 	M1
2 	3 	Erick 	321 	321 	M2

huangapple
  • 本文由 发表于 2023年6月13日 07:13:40
  • 转载请务必保留本文链接:https://go.coder-hub.com/76460816.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定