2023年6月22日 19:08:28go评论94阅读模式

英文:

Merge 2 dfs with several IDs (using a lookup or conditions?)

问题

我想要对这两个数据框进行右连接：

df1 <- structure(list(ID_1 = c("NE1", "NE2", "LR2", "NE4", "WE5"), ID_2 = c("WW1", "NE3", "LR2", "NE5", "WE5"), Country = c("Togo", "USA", "Iceland", "Chad", "UK")), class = "data.frame", row.names = c(NA, -5L))
df2 <- structure(list(ID_1 = c("NE1", "NE3", "LR2", "NE6"), Sex = c("M", "F", "M", "M")), class = "data.frame", row.names = c(NA, -4L))

请注意，df1有两个ID列，而df2只有一个ID列。

我希望首先基于ID_1进行匹配，然后，仅当无法基于ID_1进行匹配时，再基于ID_2进行匹配。换句话说，我不能使用c(ID_1, ID_2)或类似的方式。

期望的输出：

#     ID_1   Country Sex
# 1   NE1    Togo    M    --> 基于df1的ID_1
# 2   NE3    USA     F    --> 基于df1的ID_2
# 3   LR2    Iceland M    --> 基于df1的ID_1和ID_2
# 4   NE6    <NA>    M    --> 无匹配但保留行（右连接）

我还希望最终只有一个ID列，就像上面的期望输出一样。

我发现了Python的解决方案（https://stackoverflow.com/questions/58739744/how-to-merge-2-df-based-on-comparison-of-2-columns-to-match-1-column 和 https://stackoverflow.com/questions/48585027/merge-pandas-df-based-on-2-keys），但没有找到R的解决方案，或者无法使它们工作。

英文:

I want to do a right join with these 2 dfs:

df1 &lt;- structure(list(ID_1 = c(&quot;NE1&quot;, &quot;NE2&quot;, &quot;LR2&quot;, &quot;NE4&quot;, &quot;WE5&quot;), ID_2 = c(&quot;WW1&quot;, 
                                                                            &quot;NE3&quot;, &quot;LR2&quot;, &quot;NE5&quot;, &quot;WE5&quot;), Country = c(&quot;Togo&quot;, &quot;USA&quot;, &quot;Iceland&quot;, 
                                                                                                                     &quot;Chad&quot;, &quot;UK&quot;)), class = &quot;data.frame&quot;, row.names = c(NA, -5L))
df2 &lt;- structure(list(ID_1 = c(&quot;NE1&quot;, &quot;NE3&quot;, &quot;LR2&quot;, &quot;NE6&quot;), Sex = c(&quot;M&quot;, 
                                                                    &quot;F&quot;, &quot;M&quot;, &quot;M&quot;)), class = &quot;data.frame&quot;, row.names = c(NA, -4L))

Note df1 has 2 ID columns, whereas df2 has only 1 ID column.

I want the match to be made on ID_1, and then, only if the match cannot be made on ID_1, to be made on ID_2. In other words, I cannot use c(ID_1, ID_2) or similar.

Desired output:

#     ID_1    Country   Sex
# 1   NE1     Togo      M     --&gt; based on ID_1 of df1
# 2   NE3     USA       F     --&gt; based on ID_2 of df1
# 3   LR2     Iceland   M     --&gt; based on both ID_1 and ID_2 of df1
# 4   NE6     &lt;NA&gt;      M     --&gt; no match but row is kept (right join)

I also want to end up with a single ID column as in the desired output above.

I found solutions for Python (https://stackoverflow.com/questions/58739744/how-to-merge-2-df-based-on-comparison-of-2-columns-to-match-1-column and https://stackoverflow.com/questions/48585027/merge-pandas-df-based-on-2-keys) but not R, or cannot make them work.

答案1

得分: 1

library(dplyr)
df2 %>%
  left_join(distinct_all(select(pivot_longer(df1, -Country, values_to = "ID_1"), -name)))
Joining with `by = join_by(ID_1)`
# A tibble: 4 × 3
  ID_1  Sex   Country
  <chr> <chr> <chr>  
1 NE1   M     Togo   
2 NE3   F     USA    
3 LR2   M     Iceland
4 NE6   M     NA    
More readable format: 
df1 <- df1 %>%
  pivot_longer(-Country, values_to = "ID_1") %>%
  select(-name) %>%
  distinct_all()
df2 %>%
  left_join(df1)

英文:

library(dplyr)
df2 %&gt;%  
  left_join(distinct_all(select(pivot_longer(df1, -Country, values_to = &quot;ID_1&quot;), -name)))
Joining with `by = join_by(ID_1)`
# A tibble: 4 &#215; 3
  ID_1  Sex   Country
  &lt;chr&gt; &lt;chr&gt; &lt;chr&gt;  
1 NE1   M     Togo   
2 NE3   F     USA    
3 LR2   M     Iceland
4 NE6   M     NA

合并两个具有多个ID的数据框（使用查找或条件？）

问题

答案1

有没有一种方法可以根据多个条件来折叠和转换数据框？

rvest：调整网络抓取功能，以便在遇到无数据时继续。

ggplot和names：是否可以使用粗体字符？

如何通过id合并两个对象列表并选择非空值在Kotlin中

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。