数据框左合并时添加额外行

huangapple go评论126阅读模式
英文:

dataframe merge on left adding extra rows

问题

我从CSV文件创建了一个发票数据框和多个主数据数据框。

  1. invoice = pd.read_csv('rocaInv4.csv')
  2. soMstr = pd.read_csv('salesOfficeMstr.csv')
  3. custFreightMstr = pd.read_csv('customerCodeFreightMstr.csv')
  4. ratesMstr = pd.read_csv('freightMstr.csv')
  5. pfep = pd.read_csv('pfepMstr.csv')

然后,我根据物料主数据和客户主数据的可用性删除了一些行,并每次重新索引。

  1. # 检查物料的可用性
  2. invoice = invoice[invoice['Material'].isin(pfep['Material'])]
  3. invoice = invoice.reset_index(drop=True)
  4. # 检查客户详情的可用性
  5. invoice = invoice[invoice['Ship to Party'].isin(custFreightMstr['Cust No'])]
  6. invoice = invoice.reset_index(drop=True)
  7. # 检查销售代码的有效性
  8. invoice = invoice[invoice['Sales Office'].isin(soMstr['Code'])]
  9. invoice = invoice.reset_index(drop=True)
  10. invoice.shape
  11. # (384, 22)

然后,我需要将主数据的数据复制到最终的清洁发票数据框中。我选择在选择的列上进行合并,而不是在两个数据框上进行循环操作。

  1. invoice1 = invoice.merge(custFreightMstr[['Cust No', 'City', 'Customer Frgt Code']], left_on='Ship to Party', right_on='Cust No', how='left').drop_duplicates()
  2. invoice1.shape
  3. # (388, 25)

尽管我是在左侧进行合并,但最终多出了4行。我可以识别重复的行,但无法确定原因。我在这里做错了什么?

英文:

I create a invoice dataframe and a number of master dataframes from csv files

  1. invoice=pd.read_csv('rocaInv4.csv')
  2. soMstr=pd.read_csv('salesOfficeMstr.csv')
  3. custFreightMstr=pd.read_csv('customerCodeFreightMstr.csv')
  4. ratesMstr=pd.read_csv('freightMstr.csv')
  5. pfep=pd.read_csv('pfepMstr.csv')

I drop a number of rows depending on availability in material masters and customer masters. I reindex each time.

  1. #checking availability of material
  2. invoice=invoice[invoice['Material'].isin(pfep['Material'])]
  3. invoice=invoice.reset_index(drop=True)
  4. #checking availability of customer details
  5. invoice=invoice[invoice['Ship to Party'].isin(custFreightMstr['Cust No'])]
  6. invoice=invoice.reset_index(drop=True)
  7. #checking validity of sales code
  8. invoice=invoice[invoice['Sales Office'].isin(soMstr['Code'])]
  9. invoice=invoice.reset_index(drop=True)
  10. invoice.shape
  11. #(384, 22)

I then need to copy data from the masters to the final, clean Invoice DataFrame. Instead of doing a for loop over two data frames, I thought I would do a merge on select columns.

  1. invoice1=invoice.merge(custFreightMstr[['Cust No','City','Customer Frgt Code']],left_on='Ship to Party',right_on='Cust No', how='left').drop_duplicates()
  2. invoice1.shape
  3. #(388, 25)

I end up with 4 extra rows even though I am merging on the left. I can identify which rows have been repeated. But I cant identify why. What am I doing wrong here?

答案1

得分: 2

代码部分不要翻译,只返回翻译好的部分:

你的代码中的合并等同于“左外连接”。如讨论所述,对于“Ship to Party”的某个值,你有多个匹配键“Cust No”。在主数据框中删除重复的键可能会有所帮助。

英文:

The merge in your code is equivalent to left outer join. As discussed you have more than one matching keys Cust No for a value of Ship to Party. Remove the duplicate keys in the master dataframe. That might help.

答案2

得分: 1

我不知道主框架中重复的Cust No哪一个是正确的。为了编码目的,我执行了以下操作:

  1. # 在主框架中删除重复的客户编号
  2. invoice1 = invoice.merge(custFreightMstr.drop_duplicates('Cust No', keep='last')[['Cust No', 'City', 'Customer Frgt Code']], left_on='Ship to Party', right_on='Cust No', how='left', validate='m:1')

'Cust No'上使用drop_duplicate会删除所有重复项,仅保留最后一个条目。

关键字validate确认在实际合并过程中每个客户代码只有一个。

英文:

I have no clue which of the repeated Cust No in the master frame is correct. For coding purposes, I executed the following:

  1. #drop duplicate cust no in the master
  2. invoice1=invoice.merge(custFreightMstr.drop_duplicates('Cust No',keep='last')[['Cust No','City','Customer Frgt Code']],left_on='Ship to Party',right_on='Cust No', how='left',validate = 'm:1')

drop_duplicate on 'Cust No'removes all duplicates, retaining the last entry alone.

The validate keyword confirms there is only one of each cust code during actual merge.

huangapple
  • 本文由 发表于 2020年1月3日 13:37:47
  • 转载请务必保留本文链接:https://go.coder-hub.com/59573674.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定