Merge or append 2 dataframes row wise and add a check in a separate column determining which one it came from

huangapple go评论72阅读模式
英文:

Merge or append 2 dataframes row wise and add a check in a separate column determining which one it came from

问题

你可以使用Pandas中的concat函数来按行合并这两个DataFrame,然后填充缺失的值为1。以下是示例代码:

import pandas as pd

# 合并两个DataFrame,忽略索引,保留列名
merged_df = pd.concat([df1, df2], ignore_index=True)

# 填充缺失的值为1
merged_df['common'].fillna(0, inplace=True)
merged_df['alt'].fillna(0, inplace=True)

# 将浮点数列转换为整数
merged_df['common'] = merged_df['common'].astype(int)
merged_df['alt'] = merged_df['alt'].astype(int)

# 如果两个列都有值,将它们相加
merged_df['common'] = merged_df['common'] + merged_df['alt']

# 删除'type'列,如果需要
# merged_df = merged_df.drop('Type', axis=1)

# 打印最终DataFrame
print(merged_df)

这将合并两个DataFrame,根据'commonshortname'、'altshortname'、'Code'、'Type'列进行匹配,并添加'common'和'alt'列以表示数据的来源。

英文:

I have the following 2 dataframes, df1,

import pandas as pd

data = {
    'commonshortname': ['SNX.US', '002400.CH', 'CDW.US', 'CEC.GR', '300002.CH'],
    'altshortname': ['SNX.US', '002400.SHE', 'CDW.US', 'CEC.XETRA', '300002.SHE'],
    'Code': ['SNX', '002400', 'CDW', 'CEC', '300002', ...],
    'Type': ['Common Stock', 'Common Stock', 'Common Stock', 'Common Stock', 'Common Stock'],
    'common': [1, 1, 1, 1, 1]
}

df1 = pd.DataFrame(data)

and df2 which looks like this,

data = {'altshortname': ['SEDG.US', 'MHLD.US', 'CDW.US', 'POLA.US', 'PHASQ.US'],
        'Code': ['SEDG', 'MHLD', 'CDW', 'POLA', 'PHASQ'],
        'Type': ['Common Stock', 'Common Stock', 'Common Stock', 'Common Stock', 'Common Stock'],
        'alt': [1, 1, 1, 1, 1]}

df2 = pd.DataFrame(data)

This is what they look like in dataframe form,

     commonshortname altshortname  Code           Type   common
0          SNX.US       SNX.US      SNX   Common Stock     1
1       002400.CH    002400.SHE  002400  Common Stock      1
2          CDW.US       CDW.US      CDW   Common Stock     1
3          CEC.GR     CEC.XETRA     CEC  Common Stock      1
4       300002.CH    300002.SHE  300002  Common Stock      1
...           ...          ...     ...           ...  ...

and

     altshortname    Code         Type         alt
0         SEDG.US    SEDG  Common Stock          1
1         MHLD.US    MHLD  Common Stock          1
2          CDW.US     CDW  Common Stock          1
3         POLA.US    POLA  Common Stock          1
4        PHASQ.US   PHASQ  Common Stock          1

I want to merge these 2 row wise, so that if they exist in both, the data from the top dataframe is taken and a 1 is added into the alt column for it.

The final frame should look like this,

     commonshortname altshortname  Code           Type   common   alt
0          SNX.US       SNX.US      SNX   Common Stock     1
1       002400.CH    002400.SHE  002400  Common Stock      1
2          CDW.US       CDW.US      CDW   Common Stock     1       1
3          CEC.GR     CEC.XETRA     CEC  Common Stock      1
4       300002.CH    300002.SHE  300002  Common Stock      1
0                      SEDG.US    SEDG  Common Stock               1
1                      MHLD.US    MHLD  Common Stock               1
3                      POLA.US    POLA  Common Stock               1
4                     PHASQ.US   PHASQ  Common Stock               1

Basically, if the data came from df1, there will be a 1 in the common column, if it came from df2, there will be a 1 in the alt column, and if it came from both, there will be a 1 in both columns.

Can this be done in pandas?

I tried to do a merge, but it keeps joining it column wise and I end up with millions of rows.

merged_df = pd.merge(df1, df2, on=['altshortname', 'Code', 'Type'], how='outer')

答案1

得分: 1

我理解你需要的是 concatdrop_duplicates

out = pd.concat([df1, df2], ignore_index=True).drop_duplicates(
    ["altshortname", "Code", "Type"], ignore_index=True
)
英文:

IIUC what you need is a concat and drop_duplicates

out = pd.concat([df1, df2], ignore_index=True).drop_duplicates(
    ["altshortname", "Code", "Type"], ignore_index=True
)

答案2

得分: 1

这是一个可能的解决方案:

merged_df = pd.merge(df1, df2, on=['altshortname', 'Code', 'Type'], how='outer')
merged_df.fillna(0, inplace=True)

merged_df[['common', 'alt']] = merged_df[['common', 'alt']].astype(int)
merged_df.replace(0, '', inplace=True)
print(merged_df)

  commonshortname altshortname    Code          Type common alt
0          SNX.US       SNX.US     SNX  Common Stock      1    
1       002400.CH   002400.SHE  002400  Common Stock      1    
2          CDW.US       CDW.US     CDW  Common Stock      1   1
3          CEC.GR    CEC.XETRA     CEC  Common Stock      1    
4       300002.CH   300002.SHE  300002  Common Stock      1    
5                      SEDG.US    SEDG  Common Stock          1
6                      MHLD.US    MHLD  Common Stock          1
7                      POLA.US    POLA  Common Stock          1
8                     PHASQ.US   PHASQ  Common Stock          1
英文:

Here is a possible solution:

merged_df = pd.merge(df1, df2, on=['altshortname', 'Code', 'Type'], how='outer')
merged_df.fillna(0, inplace=True)

merged_df[['common', 'alt']] = merged_df[['common', 'alt']].astype(int)
merged_df.replace(0, '', inplace=True)
print(merged_df)

  commonshortname altshortname    Code          Type common alt
0          SNX.US       SNX.US     SNX  Common Stock      1    
1       002400.CH   002400.SHE  002400  Common Stock      1    
2          CDW.US       CDW.US     CDW  Common Stock      1   1
3          CEC.GR    CEC.XETRA     CEC  Common Stock      1    
4       300002.CH   300002.SHE  300002  Common Stock      1    
5                      SEDG.US    SEDG  Common Stock          1
6                      MHLD.US    MHLD  Common Stock          1
7                      POLA.US    POLA  Common Stock          1
8                     PHASQ.US   PHASQ  Common Stock          1

huangapple
  • 本文由 发表于 2023年2月27日 04:39:31
  • 转载请务必保留本文链接:https://go.coder-hub.com/75574875.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定