Merge or append 2 dataframes row wise and add a check in a separate column determining which one it came from

huangapple go评论105阅读模式
英文:

Merge or append 2 dataframes row wise and add a check in a separate column determining which one it came from

问题

你可以使用Pandas中的concat函数来按行合并这两个DataFrame,然后填充缺失的值为1。以下是示例代码:

  1. import pandas as pd
  2. # 合并两个DataFrame,忽略索引,保留列名
  3. merged_df = pd.concat([df1, df2], ignore_index=True)
  4. # 填充缺失的值为1
  5. merged_df['common'].fillna(0, inplace=True)
  6. merged_df['alt'].fillna(0, inplace=True)
  7. # 将浮点数列转换为整数
  8. merged_df['common'] = merged_df['common'].astype(int)
  9. merged_df['alt'] = merged_df['alt'].astype(int)
  10. # 如果两个列都有值,将它们相加
  11. merged_df['common'] = merged_df['common'] + merged_df['alt']
  12. # 删除'type'列,如果需要
  13. # merged_df = merged_df.drop('Type', axis=1)
  14. # 打印最终DataFrame
  15. print(merged_df)

这将合并两个DataFrame,根据'commonshortname'、'altshortname'、'Code'、'Type'列进行匹配,并添加'common'和'alt'列以表示数据的来源。

英文:

I have the following 2 dataframes, df1,

  1. import pandas as pd
  2. data = {
  3. 'commonshortname': ['SNX.US', '002400.CH', 'CDW.US', 'CEC.GR', '300002.CH'],
  4. 'altshortname': ['SNX.US', '002400.SHE', 'CDW.US', 'CEC.XETRA', '300002.SHE'],
  5. 'Code': ['SNX', '002400', 'CDW', 'CEC', '300002', ...],
  6. 'Type': ['Common Stock', 'Common Stock', 'Common Stock', 'Common Stock', 'Common Stock'],
  7. 'common': [1, 1, 1, 1, 1]
  8. }
  9. df1 = pd.DataFrame(data)

and df2 which looks like this,

  1. data = {'altshortname': ['SEDG.US', 'MHLD.US', 'CDW.US', 'POLA.US', 'PHASQ.US'],
  2. 'Code': ['SEDG', 'MHLD', 'CDW', 'POLA', 'PHASQ'],
  3. 'Type': ['Common Stock', 'Common Stock', 'Common Stock', 'Common Stock', 'Common Stock'],
  4. 'alt': [1, 1, 1, 1, 1]}
  5. df2 = pd.DataFrame(data)

This is what they look like in dataframe form,

  1. commonshortname altshortname Code Type common
  2. 0 SNX.US SNX.US SNX Common Stock 1
  3. 1 002400.CH 002400.SHE 002400 Common Stock 1
  4. 2 CDW.US CDW.US CDW Common Stock 1
  5. 3 CEC.GR CEC.XETRA CEC Common Stock 1
  6. 4 300002.CH 300002.SHE 300002 Common Stock 1
  7. ... ... ... ... ... ...

and

  1. altshortname Code Type alt
  2. 0 SEDG.US SEDG Common Stock 1
  3. 1 MHLD.US MHLD Common Stock 1
  4. 2 CDW.US CDW Common Stock 1
  5. 3 POLA.US POLA Common Stock 1
  6. 4 PHASQ.US PHASQ Common Stock 1

I want to merge these 2 row wise, so that if they exist in both, the data from the top dataframe is taken and a 1 is added into the alt column for it.

The final frame should look like this,

  1. commonshortname altshortname Code Type common alt
  2. 0 SNX.US SNX.US SNX Common Stock 1
  3. 1 002400.CH 002400.SHE 002400 Common Stock 1
  4. 2 CDW.US CDW.US CDW Common Stock 1 1
  5. 3 CEC.GR CEC.XETRA CEC Common Stock 1
  6. 4 300002.CH 300002.SHE 300002 Common Stock 1
  7. 0 SEDG.US SEDG Common Stock 1
  8. 1 MHLD.US MHLD Common Stock 1
  9. 3 POLA.US POLA Common Stock 1
  10. 4 PHASQ.US PHASQ Common Stock 1

Basically, if the data came from df1, there will be a 1 in the common column, if it came from df2, there will be a 1 in the alt column, and if it came from both, there will be a 1 in both columns.

Can this be done in pandas?

I tried to do a merge, but it keeps joining it column wise and I end up with millions of rows.

  1. merged_df = pd.merge(df1, df2, on=['altshortname', 'Code', 'Type'], how='outer')

答案1

得分: 1

我理解你需要的是 concatdrop_duplicates

  1. out = pd.concat([df1, df2], ignore_index=True).drop_duplicates(
  2. ["altshortname", "Code", "Type"], ignore_index=True
  3. )
英文:

IIUC what you need is a concat and drop_duplicates

  1. out = pd.concat([df1, df2], ignore_index=True).drop_duplicates(
  2. ["altshortname", "Code", "Type"], ignore_index=True
  3. )

答案2

得分: 1

这是一个可能的解决方案:

  1. merged_df = pd.merge(df1, df2, on=['altshortname', 'Code', 'Type'], how='outer')
  2. merged_df.fillna(0, inplace=True)
  3. merged_df[['common', 'alt']] = merged_df[['common', 'alt']].astype(int)
  4. merged_df.replace(0, '', inplace=True)
  5. print(merged_df)

  1. commonshortname altshortname Code Type common alt
  2. 0 SNX.US SNX.US SNX Common Stock 1
  3. 1 002400.CH 002400.SHE 002400 Common Stock 1
  4. 2 CDW.US CDW.US CDW Common Stock 1 1
  5. 3 CEC.GR CEC.XETRA CEC Common Stock 1
  6. 4 300002.CH 300002.SHE 300002 Common Stock 1
  7. 5 SEDG.US SEDG Common Stock 1
  8. 6 MHLD.US MHLD Common Stock 1
  9. 7 POLA.US POLA Common Stock 1
  10. 8 PHASQ.US PHASQ Common Stock 1
英文:

Here is a possible solution:

  1. merged_df = pd.merge(df1, df2, on=['altshortname', 'Code', 'Type'], how='outer')
  2. merged_df.fillna(0, inplace=True)
  3. merged_df[['common', 'alt']] = merged_df[['common', 'alt']].astype(int)
  4. merged_df.replace(0, '', inplace=True)
  5. print(merged_df)

  1. commonshortname altshortname Code Type common alt
  2. 0 SNX.US SNX.US SNX Common Stock 1
  3. 1 002400.CH 002400.SHE 002400 Common Stock 1
  4. 2 CDW.US CDW.US CDW Common Stock 1 1
  5. 3 CEC.GR CEC.XETRA CEC Common Stock 1
  6. 4 300002.CH 300002.SHE 300002 Common Stock 1
  7. 5 SEDG.US SEDG Common Stock 1
  8. 6 MHLD.US MHLD Common Stock 1
  9. 7 POLA.US POLA Common Stock 1
  10. 8 PHASQ.US PHASQ Common Stock 1

huangapple
  • 本文由 发表于 2023年2月27日 04:39:31
  • 转载请务必保留本文链接:https://go.coder-hub.com/75574875.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定