Left Outer Join两个单列数据框

huangapple go评论68阅读模式
英文:

Left Outer Join with two single columned dataframes

问题

我在Pandas Merging 101中没有看到下面提到的情况。

我在理解Pandas文档中如何执行左外连接方面遇到了困难。

import pandas as pd

left_df = pd.DataFrame({ 
                       'user_id': ['Peter', 'John', 'Robert', 'Anna']
                      })

right_df = pd.DataFrame({'user_id': ['Paul', 'Mary', 'John', 'Anna']})

pd.merge(left_df, right_df, on='user_id', how='left')

输出是:

   user_id
0  Peter
1  John
2  Robert
3  Anna

预期输出是:

   user_id
0  Peter
1  Robert

我漏掉了什么?indicator = True参数是否必需(用于创建一个_merge列以进行过滤)以执行左外连接

英文:

I don't see the below case mentioned in Pandas Merging 101.

<br> <br>
I'm having trouble understanding the Pandas documentation for doing a left outer join.

import pandas as pd

left_df = pd.DataFrame({ 
                       &#39;user_id&#39;: [&#39;Peter&#39;, &#39;John&#39;, &#39;Robert&#39;, &#39;Anna&#39;]
                       
                      })

right_df = pd.DataFrame({&#39;user_id&#39;: [&#39;Paul&#39;, &#39;Mary&#39;, &#39;John&#39;,
                                     &#39;Anna&#39;]
                       })


pd.merge(left_df, right_df, on = &#39;user_id&#39;, how = &#39;left&#39;)

Output is: <br>

 	user_id
0 	Peter
1 	John
2 	Robert
3 	Anna

Expected output:

 	user_id
0 	Peter
1 	Robert

What am I missing? Is the indicator = True parameter a must (to create a _merge column to filter on) for left outer joins?

答案1

得分: 1

你可以使用 merge 并设置 indicator=True,然后只保留值为 left_only 的行,但这不是最佳方法。你可以使用 isin 来获取一个布尔掩码,然后反转它:

left_df[~left_df['user_id'].isin(right_df['user_id'])]

  user_id
0   Peter
2  Robert

使用 merge

(left_df.merge(right_df, on='user_id', how='left', indicator='present')
            .loc[lambda x: x.pop('present') == 'left_only'])

  user_id
0   Peter
2  Robert
英文:

You can use merge with indicator=True and keep only rows where value is set to left_only but it's not the best way. You can use isin to get a boolean mask then invert it:

&gt;&gt;&gt; left_df[~left_df[&#39;user_id&#39;].isin(right_df[&#39;user_id&#39;])]

  user_id
0   Peter
2  Robert

With merge:

&gt;&gt;&gt; (left_df.merge(right_df, on=&#39;user_id&#39;, how=&#39;left&#39;, indicator=&#39;present&#39;)
            .loc[lambda x: x.pop(&#39;present&#39;) == &#39;left_only&#39;])

  user_id
0   Peter
2  Robert

huangapple
  • 本文由 发表于 2023年2月10日 03:47:23
  • 转载请务必保留本文链接:https://go.coder-hub.com/75403715.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定