英文:
Merge 2 dfs with several IDs (using a lookup or conditions?)
问题
我想要对这两个数据框进行右连接:
df1 <- structure(list(ID_1 = c("NE1", "NE2", "LR2", "NE4", "WE5"), ID_2 = c("WW1", "NE3", "LR2", "NE5", "WE5"), Country = c("Togo", "USA", "Iceland", "Chad", "UK")), class = "data.frame", row.names = c(NA, -5L))
df2 <- structure(list(ID_1 = c("NE1", "NE3", "LR2", "NE6"), Sex = c("M", "F", "M", "M")), class = "data.frame", row.names = c(NA, -4L))
请注意,df1
有两个ID列,而df2
只有一个ID列。
我希望首先基于ID_1进行匹配,然后,仅当无法基于ID_1进行匹配时,再基于ID_2进行匹配。换句话说,我不能使用c(ID_1, ID_2)
或类似的方式。
期望的输出:
# ID_1 Country Sex
# 1 NE1 Togo M --> 基于df1的ID_1
# 2 NE3 USA F --> 基于df1的ID_2
# 3 LR2 Iceland M --> 基于df1的ID_1和ID_2
# 4 NE6 <NA> M --> 无匹配但保留行(右连接)
我还希望最终只有一个ID列,就像上面的期望输出一样。
我发现了Python的解决方案(https://stackoverflow.com/questions/58739744/how-to-merge-2-df-based-on-comparison-of-2-columns-to-match-1-column 和 https://stackoverflow.com/questions/48585027/merge-pandas-df-based-on-2-keys),但没有找到R的解决方案,或者无法使它们工作。
英文:
I want to do a right join with these 2 dfs:
df1 <- structure(list(ID_1 = c("NE1", "NE2", "LR2", "NE4", "WE5"), ID_2 = c("WW1",
"NE3", "LR2", "NE5", "WE5"), Country = c("Togo", "USA", "Iceland",
"Chad", "UK")), class = "data.frame", row.names = c(NA, -5L))
df2 <- structure(list(ID_1 = c("NE1", "NE3", "LR2", "NE6"), Sex = c("M",
"F", "M", "M")), class = "data.frame", row.names = c(NA, -4L))
Note df1
has 2 ID columns, whereas df2
has only 1 ID column.
I want the match to be made on ID_1, and then, only if the match cannot be made on ID_1, to be made on ID_2. In other words, I cannot use c(ID_1, ID_2)
or similar.
Desired output:
# ID_1 Country Sex
# 1 NE1 Togo M --> based on ID_1 of df1
# 2 NE3 USA F --> based on ID_2 of df1
# 3 LR2 Iceland M --> based on both ID_1 and ID_2 of df1
# 4 NE6 <NA> M --> no match but row is kept (right join)
I also want to end up with a single ID column as in the desired output above.
I found solutions for Python (https://stackoverflow.com/questions/58739744/how-to-merge-2-df-based-on-comparison-of-2-columns-to-match-1-column and https://stackoverflow.com/questions/48585027/merge-pandas-df-based-on-2-keys) but not R, or cannot make them work.
答案1
得分: 1
library(dplyr)
df2 %>%
left_join(distinct_all(select(pivot_longer(df1, -Country, values_to = "ID_1"), -name)))
Joining with `by = join_by(ID_1)`
# A tibble: 4 × 3
ID_1 Sex Country
<chr> <chr> <chr>
1 NE1 M Togo
2 NE3 F USA
3 LR2 M Iceland
4 NE6 M NA
More readable format:
df1 <- df1 %>%
pivot_longer(-Country, values_to = "ID_1") %>%
select(-name) %>%
distinct_all()
df2 %>%
left_join(df1)
英文:
library(dplyr)
df2 %>%
left_join(distinct_all(select(pivot_longer(df1, -Country, values_to = "ID_1"), -name)))
Joining with `by = join_by(ID_1)`
# A tibble: 4 × 3
ID_1 Sex Country
<chr> <chr> <chr>
1 NE1 M Togo
2 NE3 F USA
3 LR2 M Iceland
4 NE6 M NA
More readable format:
df1 <- df1 %>%
pivot_longer(-Country, values_to = "ID_1") %>%
select(-name) %>%
distinct_all()
df2 %>%
left_join(df1)
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论