合并两个具有多个ID的数据框(使用查找或条件?)

huangapple go评论94阅读模式
英文:

Merge 2 dfs with several IDs (using a lookup or conditions?)

问题

我想要对这两个数据框进行右连接:

  1. df1 <- structure(list(ID_1 = c("NE1", "NE2", "LR2", "NE4", "WE5"), ID_2 = c("WW1", "NE3", "LR2", "NE5", "WE5"), Country = c("Togo", "USA", "Iceland", "Chad", "UK")), class = "data.frame", row.names = c(NA, -5L))
  2. df2 <- structure(list(ID_1 = c("NE1", "NE3", "LR2", "NE6"), Sex = c("M", "F", "M", "M")), class = "data.frame", row.names = c(NA, -4L))

请注意,df1有两个ID列,而df2只有一个ID列。

我希望首先基于ID_1进行匹配,然后,仅当无法基于ID_1进行匹配时,再基于ID_2进行匹配。换句话说,我不能使用c(ID_1, ID_2)或类似的方式。

期望的输出:

  1. # ID_1 Country Sex
  2. # 1 NE1 Togo M --> 基于df1的ID_1
  3. # 2 NE3 USA F --> 基于df1的ID_2
  4. # 3 LR2 Iceland M --> 基于df1的ID_1和ID_2
  5. # 4 NE6 <NA> M --> 无匹配但保留行(右连接)

我还希望最终只有一个ID列,就像上面的期望输出一样。

我发现了Python的解决方案(https://stackoverflow.com/questions/58739744/how-to-merge-2-df-based-on-comparison-of-2-columns-to-match-1-column 和 https://stackoverflow.com/questions/48585027/merge-pandas-df-based-on-2-keys),但没有找到R的解决方案,或者无法使它们工作。

英文:

I want to do a right join with these 2 dfs:

  1. df1 &lt;- structure(list(ID_1 = c(&quot;NE1&quot;, &quot;NE2&quot;, &quot;LR2&quot;, &quot;NE4&quot;, &quot;WE5&quot;), ID_2 = c(&quot;WW1&quot;,
  2. &quot;NE3&quot;, &quot;LR2&quot;, &quot;NE5&quot;, &quot;WE5&quot;), Country = c(&quot;Togo&quot;, &quot;USA&quot;, &quot;Iceland&quot;,
  3. &quot;Chad&quot;, &quot;UK&quot;)), class = &quot;data.frame&quot;, row.names = c(NA, -5L))
  4. df2 &lt;- structure(list(ID_1 = c(&quot;NE1&quot;, &quot;NE3&quot;, &quot;LR2&quot;, &quot;NE6&quot;), Sex = c(&quot;M&quot;,
  5. &quot;F&quot;, &quot;M&quot;, &quot;M&quot;)), class = &quot;data.frame&quot;, row.names = c(NA, -4L))

Note df1 has 2 ID columns, whereas df2 has only 1 ID column.

I want the match to be made on ID_1, and then, only if the match cannot be made on ID_1, to be made on ID_2. In other words, I cannot use c(ID_1, ID_2) or similar.

Desired output:

  1. # ID_1 Country Sex
  2. # 1 NE1 Togo M --&gt; based on ID_1 of df1
  3. # 2 NE3 USA F --&gt; based on ID_2 of df1
  4. # 3 LR2 Iceland M --&gt; based on both ID_1 and ID_2 of df1
  5. # 4 NE6 &lt;NA&gt; M --&gt; no match but row is kept (right join)

I also want to end up with a single ID column as in the desired output above.

I found solutions for Python (https://stackoverflow.com/questions/58739744/how-to-merge-2-df-based-on-comparison-of-2-columns-to-match-1-column and https://stackoverflow.com/questions/48585027/merge-pandas-df-based-on-2-keys) but not R, or cannot make them work.

答案1

得分: 1

  1. library(dplyr)
  2. df2 %>%
  3. left_join(distinct_all(select(pivot_longer(df1, -Country, values_to = "ID_1"), -name)))
  4. Joining with `by = join_by(ID_1)`
  5. # A tibble: 4 × 3
  6. ID_1 Sex Country
  7. <chr> <chr> <chr>
  8. 1 NE1 M Togo
  9. 2 NE3 F USA
  10. 3 LR2 M Iceland
  11. 4 NE6 M NA
  12. More readable format:
  13. df1 <- df1 %>%
  14. pivot_longer(-Country, values_to = "ID_1") %>%
  15. select(-name) %>%
  16. distinct_all()
  17. df2 %>%
  18. left_join(df1)
英文:
  1. library(dplyr)
  2. df2 %&gt;%
  3. left_join(distinct_all(select(pivot_longer(df1, -Country, values_to = &quot;ID_1&quot;), -name)))
  4. Joining with `by = join_by(ID_1)`
  5. # A tibble: 4 &#215; 3
  6. ID_1 Sex Country
  7. &lt;chr&gt; &lt;chr&gt; &lt;chr&gt;
  8. 1 NE1 M Togo
  9. 2 NE3 F USA
  10. 3 LR2 M Iceland
  11. 4 NE6 M NA

More readable format:

  1. df1 &lt;- df1 %&gt;%
  2. pivot_longer(-Country, values_to = &quot;ID_1&quot;) %&gt;%
  3. select(-name) %&gt;%
  4. distinct_all()
  5. df2 %&gt;%
  6. left_join(df1)

huangapple
  • 本文由 发表于 2023年6月22日 19:08:28
  • 转载请务必保留本文链接:https://go.coder-hub.com/76531275.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定