Left Outer Join两个单列数据框

huangapple go评论87阅读模式
英文:

Left Outer Join with two single columned dataframes

问题

我在Pandas Merging 101中没有看到下面提到的情况。

我在理解Pandas文档中如何执行左外连接方面遇到了困难。

  1. import pandas as pd
  2. left_df = pd.DataFrame({
  3. 'user_id': ['Peter', 'John', 'Robert', 'Anna']
  4. })
  5. right_df = pd.DataFrame({'user_id': ['Paul', 'Mary', 'John', 'Anna']})
  6. pd.merge(left_df, right_df, on='user_id', how='left')

输出是:

  1. user_id
  2. 0 Peter
  3. 1 John
  4. 2 Robert
  5. 3 Anna

预期输出是:

  1. user_id
  2. 0 Peter
  3. 1 Robert

我漏掉了什么?indicator = True参数是否必需(用于创建一个_merge列以进行过滤)以执行左外连接

英文:

I don't see the below case mentioned in Pandas Merging 101.

<br> <br>
I'm having trouble understanding the Pandas documentation for doing a left outer join.

  1. import pandas as pd
  2. left_df = pd.DataFrame({
  3. &#39;user_id&#39;: [&#39;Peter&#39;, &#39;John&#39;, &#39;Robert&#39;, &#39;Anna&#39;]
  4. })
  5. right_df = pd.DataFrame({&#39;user_id&#39;: [&#39;Paul&#39;, &#39;Mary&#39;, &#39;John&#39;,
  6. &#39;Anna&#39;]
  7. })
  8. pd.merge(left_df, right_df, on = &#39;user_id&#39;, how = &#39;left&#39;)

Output is: <br>

  1. user_id
  2. 0 Peter
  3. 1 John
  4. 2 Robert
  5. 3 Anna

Expected output:

  1. user_id
  2. 0 Peter
  3. 1 Robert

What am I missing? Is the indicator = True parameter a must (to create a _merge column to filter on) for left outer joins?

答案1

得分: 1

你可以使用 merge 并设置 indicator=True,然后只保留值为 left_only 的行,但这不是最佳方法。你可以使用 isin 来获取一个布尔掩码,然后反转它:

  1. left_df[~left_df['user_id'].isin(right_df['user_id'])]
  2. user_id
  3. 0 Peter
  4. 2 Robert

使用 merge

  1. (left_df.merge(right_df, on='user_id', how='left', indicator='present')
  2. .loc[lambda x: x.pop('present') == 'left_only'])
  3. user_id
  4. 0 Peter
  5. 2 Robert
英文:

You can use merge with indicator=True and keep only rows where value is set to left_only but it's not the best way. You can use isin to get a boolean mask then invert it:

  1. &gt;&gt;&gt; left_df[~left_df[&#39;user_id&#39;].isin(right_df[&#39;user_id&#39;])]
  2. user_id
  3. 0 Peter
  4. 2 Robert

With merge:

  1. &gt;&gt;&gt; (left_df.merge(right_df, on=&#39;user_id&#39;, how=&#39;left&#39;, indicator=&#39;present&#39;)
  2. .loc[lambda x: x.pop(&#39;present&#39;) == &#39;left_only&#39;])
  3. user_id
  4. 0 Peter
  5. 2 Robert

huangapple
  • 本文由 发表于 2023年2月10日 03:47:23
  • 转载请务必保留本文链接:https://go.coder-hub.com/75403715.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定