合并两个具有多个ID的数据框(使用查找或条件?)

huangapple go评论67阅读模式
英文:

Merge 2 dfs with several IDs (using a lookup or conditions?)

问题

我想要对这两个数据框进行右连接:

df1 <- structure(list(ID_1 = c("NE1", "NE2", "LR2", "NE4", "WE5"), ID_2 = c("WW1", "NE3", "LR2", "NE5", "WE5"), Country = c("Togo", "USA", "Iceland", "Chad", "UK")), class = "data.frame", row.names = c(NA, -5L))

df2 <- structure(list(ID_1 = c("NE1", "NE3", "LR2", "NE6"), Sex = c("M", "F", "M", "M")), class = "data.frame", row.names = c(NA, -4L))

请注意,df1有两个ID列,而df2只有一个ID列。

我希望首先基于ID_1进行匹配,然后,仅当无法基于ID_1进行匹配时,再基于ID_2进行匹配。换句话说,我不能使用c(ID_1, ID_2)或类似的方式。

期望的输出:

#     ID_1   Country Sex
# 1   NE1    Togo    M    --> 基于df1的ID_1
# 2   NE3    USA     F    --> 基于df1的ID_2
# 3   LR2    Iceland M    --> 基于df1的ID_1和ID_2
# 4   NE6    <NA>    M    --> 无匹配但保留行(右连接)

我还希望最终只有一个ID列,就像上面的期望输出一样。

我发现了Python的解决方案(https://stackoverflow.com/questions/58739744/how-to-merge-2-df-based-on-comparison-of-2-columns-to-match-1-column 和 https://stackoverflow.com/questions/48585027/merge-pandas-df-based-on-2-keys),但没有找到R的解决方案,或者无法使它们工作。

英文:

I want to do a right join with these 2 dfs:

df1 &lt;- structure(list(ID_1 = c(&quot;NE1&quot;, &quot;NE2&quot;, &quot;LR2&quot;, &quot;NE4&quot;, &quot;WE5&quot;), ID_2 = c(&quot;WW1&quot;, 
                                                                            &quot;NE3&quot;, &quot;LR2&quot;, &quot;NE5&quot;, &quot;WE5&quot;), Country = c(&quot;Togo&quot;, &quot;USA&quot;, &quot;Iceland&quot;, 
                                                                                                                     &quot;Chad&quot;, &quot;UK&quot;)), class = &quot;data.frame&quot;, row.names = c(NA, -5L))


df2 &lt;- structure(list(ID_1 = c(&quot;NE1&quot;, &quot;NE3&quot;, &quot;LR2&quot;, &quot;NE6&quot;), Sex = c(&quot;M&quot;, 
                                                                    &quot;F&quot;, &quot;M&quot;, &quot;M&quot;)), class = &quot;data.frame&quot;, row.names = c(NA, -4L))

Note df1 has 2 ID columns, whereas df2 has only 1 ID column.

I want the match to be made on ID_1, and then, only if the match cannot be made on ID_1, to be made on ID_2. In other words, I cannot use c(ID_1, ID_2) or similar.

Desired output:

#     ID_1    Country   Sex
# 1   NE1     Togo      M     --&gt; based on ID_1 of df1
# 2   NE3     USA       F     --&gt; based on ID_2 of df1
# 3   LR2     Iceland   M     --&gt; based on both ID_1 and ID_2 of df1
# 4   NE6     &lt;NA&gt;      M     --&gt; no match but row is kept (right join)

I also want to end up with a single ID column as in the desired output above.

I found solutions for Python (https://stackoverflow.com/questions/58739744/how-to-merge-2-df-based-on-comparison-of-2-columns-to-match-1-column and https://stackoverflow.com/questions/48585027/merge-pandas-df-based-on-2-keys) but not R, or cannot make them work.

答案1

得分: 1

library(dplyr)

df2 %>%
  left_join(distinct_all(select(pivot_longer(df1, -Country, values_to = "ID_1"), -name)))

Joining with `by = join_by(ID_1)`
# A tibble: 4 × 3
  ID_1  Sex   Country
  <chr> <chr> <chr>  
1 NE1   M     Togo   
2 NE3   F     USA    
3 LR2   M     Iceland
4 NE6   M     NA    

More readable format: 

df1 <- df1 %>%
  pivot_longer(-Country, values_to = "ID_1") %>%
  select(-name) %>%
  distinct_all()

df2 %>%
  left_join(df1)
英文:
library(dplyr)

df2 %&gt;%  
  left_join(distinct_all(select(pivot_longer(df1, -Country, values_to = &quot;ID_1&quot;), -name)))

Joining with `by = join_by(ID_1)`
# A tibble: 4 &#215; 3
  ID_1  Sex   Country
  &lt;chr&gt; &lt;chr&gt; &lt;chr&gt;  
1 NE1   M     Togo   
2 NE3   F     USA    
3 LR2   M     Iceland
4 NE6   M     NA     

More readable format:

df1 &lt;- df1 %&gt;% 
  pivot_longer(-Country, values_to = &quot;ID_1&quot;) %&gt;% 
  select(-name) %&gt;%  
  distinct_all()

df2 %&gt;% 
  left_join(df1)

huangapple
  • 本文由 发表于 2023年6月22日 19:08:28
  • 转载请务必保留本文链接:https://go.coder-hub.com/76531275.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定