Pandas Dataframe merge to get only non-existing records
Question
I'm trying to merge two dataframes to get only the records from the first dataframe (df) that don't already exist in the second (df_existing).
Columns in both dataframes:
symbolid
timeframeid
datetime
open
high
low
close
volume
Code snippet that, as far as I know, used to work fine:
    df2 = df.merge(df_existing,
                   on=['symbolid', 'timeframeid', 'datetime'],
                   how='left',
                   indicator=True).query('_merge == "left_only"').drop(columns='_merge')
The result now shows every non-join column duplicated with the suffixes _x and _y, depending on which dataframe it came from.
The desired outcome is the same columns as in the original dataframes, but with the rows that duplicate existing records (based on symbolid, timeframeid and datetime) removed.
Answer 1
Score: 1
When using a merge to align two DataFrames, you can avoid the suffixes by passing only the merge-key columns of the right-hand DataFrame:
    cols = ['symbolid', 'timeframeid', 'datetime']
    df2 = (df.merge(df_existing[cols],
                    on=cols, how='left',
                    indicator=True)
             .query('_merge == "left_only"')
             .drop(columns='_merge')
          )
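To see the difference on a small example, here is a minimal sketch. The sample values are made up for illustration, and a single open column stands in for the full set of price columns. Merging against the whole of df_existing produces open_x and open_y, while merging against only the key columns leaves the original columns untouched.

```python
import pandas as pd

cols = ['symbolid', 'timeframeid', 'datetime']

# Hypothetical sample data, not taken from the question
df = pd.DataFrame({
    'symbolid': [1, 1, 2],
    'timeframeid': [5, 5, 5],
    'datetime': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-01']),
    'open': [10.0, 11.0, 20.0],
})
df_existing = pd.DataFrame({
    'symbolid': [1],
    'timeframeid': [5],
    'datetime': pd.to_datetime(['2023-01-01']),
    'open': [10.0],
})

# Merging against the full frame: the shared non-key column gets suffixes
with_suffixes = df.merge(df_existing, on=cols, how='left', indicator=True)
print(list(with_suffixes.columns))

# Merging against the key columns only: no overlap, so no suffixes
new_rows = (df.merge(df_existing[cols], on=cols, how='left', indicator=True)
              .query('_merge == "left_only"')
              .drop(columns='_merge'))
print(new_rows)
```

In this sketch new_rows keeps the two records of df whose key combination is absent from df_existing, with the same columns as df.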
Alternatively, use pop and loc to filter and drop the indicator column in a single step:
    cols = ['symbolid', 'timeframeid', 'datetime']
    df2 = (df.merge(df_existing[cols],
                    on=cols, how='left',
                    indicator=True)
             .loc[lambda d: d.pop('_merge').eq('left_only')]
          )
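The pop and loc variant can be unrolled into separate steps to show what happens. This is a sketch on made-up sample data: loc calls the lambda with the merged frame, pop removes _merge from that frame in place and returns it, and eq('left_only') turns it into the boolean mask used for row selection.

```python
import pandas as pd

cols = ['symbolid', 'timeframeid', 'datetime']

# Hypothetical sample data, not taken from the question
df = pd.DataFrame({
    'symbolid': [1, 1, 2],
    'timeframeid': [5, 5, 5],
    'datetime': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-01']),
    'close': [10.5, 11.5, 20.5],
})
df_existing = pd.DataFrame({
    'symbolid': [1],
    'timeframeid': [5],
    'datetime': pd.to_datetime(['2023-01-02']),
    'close': [11.5],
})

# Step by step: merge, pop the indicator column, then filter with the mask
merged = df.merge(df_existing[cols], on=cols, how='left', indicator=True)
mask = merged.pop('_merge').eq('left_only')  # removes '_merge' from merged in place
stepwise = merged.loc[mask]

# The same thing as a single chained expression
chained = (df.merge(df_existing[cols], on=cols, how='left', indicator=True)
             .loc[lambda d: d.pop('_merge').eq('left_only')])

print(stepwise.equals(chained))
```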