July 11, 2023 04:02:43

Pandas Dataframe merge to get only non-existing records

Question

I'm trying to merge two dataframes to get only the records from dataframe1 (df) that don't already exist in dataframe2 (df_existing).

Columns in both dataframes:
symbolid
timeframeid
datetime
open
high
low
close
volume

Code snippet that as far as I know used to work fine:

df2 = df.merge(df_existing,
               on=['symbolid', 'timeframeid', 'datetime'],
               how='left',
               indicator=True).query('_merge == "left_only"').drop(columns='_merge')

The result now is showing all the non-join columns duplicated with suffixes _x and _y according to what df they originate from.

The desired outcome is the same columns as in the original dataframes but with the duplicate rows based on symbolid, timeframeid and datetime removed.

Answer 1

Score: 1

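The _x/_y suffixes appear because the non-key columns exist in both DataFrames. When the merge is only used to align the two DataFrames, you can avoid them by passing just the key columns of df_existing: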

cols = ['symbolid', 'timeframeid', 'datetime']
df2 = (df.merge(df_existing[cols],
                on=cols, how='left',
                indicator=True)
         .query('_merge == "left_only"')
         .drop(columns='_merge')
       )
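To see the effect on something concrete, here is a small self-contained sketch of the snippet above. The sample rows are made up for illustration (they are not from the question), and df_existing is assumed to hold only the first row of df:

```python
import pandas as pd

# Made-up sample data using the column names from the question.
df = pd.DataFrame({
    "symbolid": [1, 1, 2],
    "timeframeid": [5, 5, 5],
    "datetime": pd.to_datetime(
        ["2023-07-10 09:00", "2023-07-10 09:05", "2023-07-10 09:00"]
    ),
    "open": [10.0, 10.5, 20.0],
    "high": [11.0, 11.5, 21.0],
    "low": [9.0, 9.5, 19.0],
    "close": [10.2, 10.7, 20.1],
    "volume": [100, 150, 200],
})

# Assumption: only the first record has been stored already.
df_existing = df.iloc[[0]].copy()

cols = ["symbolid", "timeframeid", "datetime"]

# Only the key columns of df_existing take part in the merge,
# so no column name clashes and no _x/_y suffixes are created.
df2 = (
    df.merge(df_existing[cols], on=cols, how="left", indicator=True)
      .query('_merge == "left_only"')
      .drop(columns="_merge")
)

print(df2)
```

df2 keeps the same columns as df and contains only the two rows whose key combination is missing from df_existing.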

Alternative with pop and loc to filter and drop in a single step:

cols = ['symbolid', 'timeframeid', 'datetime']
df2 = (df.merge(df_existing[cols],
                on=cols, how='left',
                indicator=True)
         .loc[lambda d: d.pop('_merge').eq('left_only')]
       )
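If you need this in several places, the pop/loc variant can be wrapped in a small helper. This is only a sketch: the name anti_join and the sample data are hypothetical, and the drop_duplicates() call is an optional addition (not part of the answer above) that keeps the intermediate merge from growing when the right-hand frame repeats a key:

```python
import pandas as pd


def anti_join(left, right, keys):
    """Return the rows of `left` whose key combination is absent from `right`.

    Hypothetical helper name; a sketch of the pop/loc approach.
    """
    merged = left.merge(
        right[keys].drop_duplicates(),  # optional: avoid repeated matches
        on=keys,
        how="left",
        indicator=True,
    )
    # pop() removes '_merge' from `merged` and returns it, so the boolean
    # mask filters the rows and the indicator column is gone in one step.
    return merged.loc[lambda d: d.pop("_merge").eq("left_only")]


# Made-up sample data for illustration.
new = pd.DataFrame({
    "symbolid": [1, 1, 2],
    "timeframeid": [5, 5, 5],
    "datetime": pd.to_datetime(
        ["2023-07-10 09:00", "2023-07-10 09:05", "2023-07-10 09:00"]
    ),
    "close": [10.2, 10.7, 20.1],
})
old = new.iloc[[0, 2]]  # assumption: rows 0 and 2 already exist

result = anti_join(new, old, ["symbolid", "timeframeid", "datetime"])
print(result)
```

Only the row that is missing from old is returned, with the original columns and no _merge column.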
