Select rows from R table based on two columns in another table
Question
I have two tables, table1 and table2. table1 is much bigger than table2, but table2 is not fully contained in table1. Each table has two ID columns, ID1 and ID2. I want to obtain the rows of table1 and table2 in which both ID columns match. If a pair of IDs appears in one table but not the other, that row should not be returned.
I tried t1[which(t1$ID1 == t2$ID1 & t1$ID2 == t2$ID2), ]
R reported that the "longer object length is not a multiple of shorter object length". Any ideas?
Answer 1
Score: 3
With dplyr::semi_join() (and borrowing @thesixmax's example data):
library(dplyr)
table1 %>%
  semi_join(table2, by = c("ID_1", "ID_2"))
#   ID_1 ID_2 val
# 1  0_2  1_2   2
# 2  0_4  1_4   4
table2 %>%
  semi_join(table1, by = c("ID_1", "ID_2"))
#   ID_1 ID_2 val
# 1  0_2  1_2   1
# 2  0_4  1_4   2
Answer 2
Score: 2
Simple reprex:
table1 <- data.frame(
  "ID_1" = c("0_1", "0_2", "0_3", "0_4", "0_5"),
  "ID_2" = c("1_1", "1_2", "1_3", "1_4", "1_5"),
  val = c(1, 2, 3, 4, 5)
)
table2 <- data.frame(
  "ID_1" = c("0_2", "0_4", "0_6", "0_7", "0_8", "0_9", "0_10"),
  "ID_2" = c("1_2", "1_4", "1_6", "1_7", "1_8", "1_9", "1_10"),
  val = c(1, 2, 3, 4, 5, 6, 7)
)
A solution using base R:
# Row positions in each table whose (ID_1, ID_2) pair also occurs in the other table
ids1 <- which(interaction(table1[, c("ID_1", "ID_2")]) %in%
                interaction(table2[, c("ID_1", "ID_2")]))
ids2 <- which(interaction(table2[, c("ID_1", "ID_2")]) %in%
                interaction(table1[, c("ID_1", "ID_2")]))
overlap1 <- table1[ids1, ]
overlap2 <- table2[ids2, ]
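As a possible alternative in base R, here is a minimal sketch using merge() on the two key columns. It is not part of the answer above; the helper name keys is only illustrative, and the example data is copied from the reprex. It also notes why the == attempt in the question fails: == compares the two columns position by position and recycles the shorter one, so it warns when the lengths differ and never matches a pair that sits at different row positions in the two tables.

```r
# Example data copied from the reprex above
table1 <- data.frame(
  ID_1 = c("0_1", "0_2", "0_3", "0_4", "0_5"),
  ID_2 = c("1_1", "1_2", "1_3", "1_4", "1_5"),
  val = c(1, 2, 3, 4, 5)
)
table2 <- data.frame(
  ID_1 = c("0_2", "0_4", "0_6", "0_7", "0_8", "0_9", "0_10"),
  ID_2 = c("1_2", "1_4", "1_6", "1_7", "1_8", "1_9", "1_10"),
  val = c(1, 2, 3, 4, 5, 6, 7)
)

# Illustrative helper name (an assumption, not from the original answers)
keys <- c("ID_1", "ID_2")

# Joining against only the unique key pairs of the other table keeps
# that table's own columns out of the result and avoids duplicated rows.
overlap1 <- merge(table1, unique(table2[keys]), by = keys)
overlap2 <- merge(table2, unique(table1[keys]), by = keys)
```

Note that merge() sorts the result by the key columns by default, so the row order may differ from the original tables; the %in% approach above preserves the original order.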