Merge or append 2 dataframes row wise and add a check in a separate column determining which one it came from
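Question
You can use the pandas concat function to stack the two DataFrames row-wise, fill the missing flag values with 0, and then collapse rows that share the same keys. Example code:
import pandas as pd
# Stack the two DataFrames row-wise, discarding the original index
merged_df = pd.concat([df1, df2], ignore_index=True)
# Fill the missing flags with 0 and convert the float columns back to integers
merged_df['common'] = merged_df['common'].fillna(0).astype(int)
merged_df['alt'] = merged_df['alt'].fillna(0).astype(int)
# Collapse rows that appear in both frames, keeping a 1 in every flag that was set
merged_df = merged_df.groupby(['altshortname', 'Code', 'Type'], as_index=False, sort=False).agg(
    {'commonshortname': 'first', 'common': 'max', 'alt': 'max'}
)
# Drop the 'Type' column if it is not needed
# merged_df = merged_df.drop('Type', axis=1)
# Print the final DataFrame
print(merged_df)
This stacks the two DataFrames, combines rows that match on 'altshortname', 'Code' and 'Type', and leaves 'common' and 'alt' as 0/1 flags that show which frame each row came from.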
Original question (English):
I have the following 2 dataframes, df1,
import pandas as pd
data = {
'commonshortname': ['SNX.US', '002400.CH', 'CDW.US', 'CEC.GR', '300002.CH'],
'altshortname': ['SNX.US', '002400.SHE', 'CDW.US', 'CEC.XETRA', '300002.SHE'],
'Code': ['SNX', '002400', 'CDW', 'CEC', '300002'],  # remaining rows elided in the original
'Type': ['Common Stock', 'Common Stock', 'Common Stock', 'Common Stock', 'Common Stock'],
'common': [1, 1, 1, 1, 1]
}
df1 = pd.DataFrame(data)
and df2, which looks like this,
data = {'altshortname': ['SEDG.US', 'MHLD.US', 'CDW.US', 'POLA.US', 'PHASQ.US'],
'Code': ['SEDG', 'MHLD', 'CDW', 'POLA', 'PHASQ'],
'Type': ['Common Stock', 'Common Stock', 'Common Stock', 'Common Stock', 'Common Stock'],
'alt': [1, 1, 1, 1, 1]}
df2 = pd.DataFrame(data)
This is what they look like in dataframe form,
   commonshortname  altshortname  Code    Type          common
0  SNX.US           SNX.US        SNX     Common Stock  1
1  002400.CH        002400.SHE    002400  Common Stock  1
2  CDW.US           CDW.US        CDW     Common Stock  1
3  CEC.GR           CEC.XETRA     CEC     Common Stock  1
4  300002.CH        300002.SHE    300002  Common Stock  1
...
and
   altshortname  Code   Type          alt
0  SEDG.US       SEDG   Common Stock  1
1  MHLD.US       MHLD   Common Stock  1
2  CDW.US        CDW    Common Stock  1
3  POLA.US       POLA   Common Stock  1
4  PHASQ.US      PHASQ  Common Stock  1
I want to merge these 2 row wise, so that if they exist in both, the data from the top dataframe is taken and a 1 is added into the alt column for it.
The final frame should look like this,
   commonshortname  altshortname  Code    Type          common  alt
0  SNX.US           SNX.US        SNX     Common Stock  1
1  002400.CH        002400.SHE    002400  Common Stock  1
2  CDW.US           CDW.US        CDW     Common Stock  1       1
3  CEC.GR           CEC.XETRA     CEC     Common Stock  1
4  300002.CH        300002.SHE    300002  Common Stock  1
0                   SEDG.US       SEDG    Common Stock          1
1                   MHLD.US       MHLD    Common Stock          1
3                   POLA.US       POLA    Common Stock          1
4                   PHASQ.US      PHASQ   Common Stock          1
Basically, if the data came from df1, there will be a 1 in the common column, if it came from df2, there will be a 1 in the alt column, and if it came from both, there will be a 1 in both columns.
Can this be done in pandas?
I tried to do a merge, but it keeps joining it column wise and I end up with millions of rows.
merged_df = pd.merge(df1, df2, on=['altshortname', 'Code', 'Type'], how='outer')
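One possible reason for the row explosion, sketched under the assumption that the real frames repeat some (altshortname, Code, Type) keys (the five-row sample above does not): an outer merge pairs every left row with every right row that shares its key, so repeated keys multiply. The frames `left` and `right` below are hypothetical and exist only to show the effect; `validate` is a real `pd.merge` parameter that raises when the keys are not unique.

```python
import pandas as pd
from pandas.errors import MergeError

keys = ["altshortname", "Code", "Type"]

# Hypothetical frames in which the same key appears three times on each side.
left = pd.DataFrame({
    "altshortname": ["CDW.US"] * 3,
    "Code": ["CDW"] * 3,
    "Type": ["Common Stock"] * 3,
    "common": [1, 1, 1],
})
right = left.rename(columns={"common": "alt"})

# Every left row pairs with every right row that shares its key: 3 x 3 rows.
exploded = pd.merge(left, right, on=keys, how="outer")
print(len(exploded))

# validate= makes pandas raise instead of silently multiplying rows.
try:
    pd.merge(left, right, on=keys, how="outer", validate="one_to_one")
    keys_are_unique = True
except MergeError:
    keys_are_unique = False
print(keys_are_unique)

# Dropping repeated keys on both sides first keeps the merge at one row per key.
clean = pd.merge(
    left.drop_duplicates(keys),
    right.drop_duplicates(keys),
    on=keys,
    how="outer",
)
print(len(clean))
```

If the real data does contain repeated keys, deduplicating each frame before merging (or deciding which duplicate to keep) is probably the first thing to check.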
Answer 1
Score: 1
IIUC, what you need is concat followed by drop_duplicates:
out = pd.concat([df1, df2], ignore_index=True).drop_duplicates(
["altshortname", "Code", "Type"], ignore_index=True
)
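Note that drop_duplicates keeps the df1 row for a key found in both frames, so that row's alt value stays missing. A minimal sketch of one way to fill both flags afterwards, assuming cut-down two-row versions of the question's frames and recomputing each flag from key membership (the `keys` and `out_keys` names are illustrative, not from the answer):

```python
import pandas as pd

keys = ["altshortname", "Code", "Type"]

# Cut-down versions of the question's frames; CDW.US appears in both.
df1 = pd.DataFrame({
    "commonshortname": ["SNX.US", "CDW.US"],
    "altshortname": ["SNX.US", "CDW.US"],
    "Code": ["SNX", "CDW"],
    "Type": ["Common Stock", "Common Stock"],
    "common": [1, 1],
})
df2 = pd.DataFrame({
    "altshortname": ["SEDG.US", "CDW.US"],
    "Code": ["SEDG", "CDW"],
    "Type": ["Common Stock", "Common Stock"],
    "alt": [1, 1],
})

# Stack row-wise; df1 comes first, so its row wins for keys present in both.
out = pd.concat([df1, df2], ignore_index=True).drop_duplicates(
    keys, ignore_index=True
)

# Recompute each flag from key membership rather than from the surviving row.
out_keys = pd.MultiIndex.from_frame(out[keys])
out["common"] = out_keys.isin(pd.MultiIndex.from_frame(df1[keys])).astype(int)
out["alt"] = out_keys.isin(pd.MultiIndex.from_frame(df2[keys])).astype(int)
print(out)
```

Rows that only exist in df2 keep a missing commonshortname, which matches the layout asked for in the question.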
Answer 2
Score: 1
Here is a possible solution:
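# Outer-join on the shared key columns so rows from either frame are kept
merged_df = pd.merge(df1, df2, on=['altshortname', 'Code', 'Type'], how='outer')
# Missing flags (and missing commonshortname values) become 0
merged_df.fillna(0, inplace=True)
# The NaNs turned the flag columns into floats; cast them back to int
merged_df[['common', 'alt']] = merged_df[['common', 'alt']].astype(int)
# Blank out the zeros for display
merged_df.replace(0, '', inplace=True)
print(merged_df)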
   commonshortname  altshortname  Code    Type          common  alt
0  SNX.US           SNX.US        SNX     Common Stock  1
1  002400.CH        002400.SHE    002400  Common Stock  1
2  CDW.US           CDW.US        CDW     Common Stock  1       1
3  CEC.GR           CEC.XETRA     CEC     Common Stock  1
4  300002.CH        300002.SHE    300002  Common Stock  1
5                   SEDG.US       SEDG    Common Stock          1
6                   MHLD.US       MHLD    Common Stock          1
7                   POLA.US       POLA    Common Stock          1
8                   PHASQ.US      PHASQ   Common Stock          1
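A variant of the same idea, sketched here as an alternative rather than as part of the answer: `pd.merge` accepts `indicator=True`, which adds a `_merge` column with the values `left_only`, `right_only` and `both`, so the two flags can be derived from the merge itself. The cut-down frames below are assumptions for illustration and deliberately omit the pre-filled common and alt columns.

```python
import pandas as pd

keys = ["altshortname", "Code", "Type"]

# Cut-down frames without pre-filled flag columns; CDW.US appears in both.
df1 = pd.DataFrame({
    "commonshortname": ["SNX.US", "CDW.US"],
    "altshortname": ["SNX.US", "CDW.US"],
    "Code": ["SNX", "CDW"],
    "Type": ["Common Stock", "Common Stock"],
})
df2 = pd.DataFrame({
    "altshortname": ["SEDG.US", "CDW.US"],
    "Code": ["SEDG", "CDW"],
    "Type": ["Common Stock", "Common Stock"],
})

# indicator=True records where each merged row was found.
merged = pd.merge(df1, df2, on=keys, how="outer", indicator=True)

# Translate the indicator into the two 0/1 source flags.
merged["common"] = merged["_merge"].isin(["left_only", "both"]).astype(int)
merged["alt"] = merged["_merge"].isin(["right_only", "both"]).astype(int)
merged = merged.drop(columns="_merge")
print(merged)
```

The row order of an outer merge can differ between pandas versions, so sort explicitly if the order matters.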