使用Pandas数据框进行多次出现的右键与左数据框的交叉引用。

huangapple go评论101阅读模式
英文:

Cross referencing Pandas Dataframes with multiple occurrences of right key on left Dataframe

问题

我有两个DataFrame,A和B。A有一个列名为"B_id",B有一个"ID"列。然而,A中的许多行使用相同的"B_id"来确定关于该行的重复数据。例如,A看起来像这样:

  1. A:
  2. ID Name (...) B_id
  3. 1 John ... 123
  4. 2 Emily ... 123
  5. 3 Erick ... 321

而B可能看起来像这样:

  1. B:
  2. ID Model (...)
  3. 123 M1 ...
  4. 321 M2 ...

我想要创建一个新的DataFrame,将来自两个DataFrame的信息组合和交叉引用,其中相应的B_id的所有列将被添加到A中的行。输出将如下所示:

  1. Result:
  2. ID Name (...) B_id Model (...)
  3. 1 John ... 123 M1 ...
  4. 2 Emily ... 123 M1 ...
  5. 3 Erick ... 321 M2 ...

我尝试使用pd.merge并将'left'作为how,但它返回一个空的DataFrame。

英文:

I have two DataFrames, A and B. A has a column which is "B_id", B has a "ID" column. However, many rows on A use the same "B_id" to determine data about that row which repeats. For example, A looks like this:

  1. A:
  2. ID Name (...) B_id
  3. 1 John ... 123
  4. 2 Emily ... 123
  5. 3 Erick ... 321

Where B could look like this:

  1. B:
  2. ID Model (...)
  3. 123 M1 ...
  4. 321 M2 ...

What I want is to create a new Dataframe combine and cross referencing the information from both DataFrames, where all columns for the corresponding B_id would be added to the row in A. The output would look like this:

  1. Result:
  2. ID Name (...) B_id Model (...)
  3. 1 John ... 123 M1 ...
  4. 2 Emily ... 123 M1 ...
  5. 3 Erick ... 321 M2 ...

I've tried with pd.merge with 'left' as how, but it returns a empty DataFrame.

答案1

得分: 1

你差不多到了。你需要做的是在pandas的merge方法中使用on参数指定要合并的列(on=B_id)。为了使此方法生效,你必须首先确保要连接的列在两个数据框中具有相同的名称。你可以选择:

  • 在B数据框中将列ID重命名为B_iddfB.rename(columns={"ID": "B_id"})
  • 或者简单地创建一个具有该名称的新列:dfB['B_id'] = dfB['ID']
英文:

You were almost there. What you need to do is specify the column on which to merge using the on parameter in pandas' merge method (on=B_id). In order for this to work, you must first ensure that the column to join on shares the same name in both dataframes. You may either:

  • rename column ID to B_id in the B dataframe: dfB.rename(columns={"ID": "B_id"}
  • or simply create a new column with that name: dfB['B_id'] = dfB['ID'].

Here is a code example:

  1. import pandas as pd
  2. dfA = pd.DataFrame()
  3. dfA['ID'] = [1, 2, 3]
  4. dfA['Name'] = ['John', 'Emily', 'Erick']
  5. dfA['B_id'] = [123, 123, 321]
  6. dfB = pd.DataFrame()
  7. dfB['ID'] = [123, 321]
  8. dfB['Model'] = ['M1', 'M2']
  9. dfB['B_id'] = dfB['ID']
  10. pd.merge(df1,df2, on='B_id')

This returns:

  1. ID_x Name B_id ID_y Model
  2. 0 1 John 123 123 M1
  3. 1 2 Emily 123 123 M1
  4. 2 3 Erick 321 321 M2

huangapple
  • 本文由 发表于 2023年6月13日 07:13:40
  • 转载请务必保留本文链接:https://go.coder-hub.com/76460816.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定