英文:
Merging two data.tables that don't have common columns
问题
我想合并两个没有共同列的data.table,最终结果将有N1*N2
行,其中N1
和N2
分别是每个数据框中的行数。
使用基本的R可以完成这个操作:
A <- data.frame(id = 1:6, value = 19:24)
B <- data.frame(value2 = c(25, 25, 26, 26), value3 = 4:5)
A
#> id value
#> 1 1 19
#> 2 2 20
#> 3 3 21
#> 4 4 22
#> 5 5 23
#> 6 6 24
B
#> value2 value3
#> 1 25 4
#> 2 25 5
#> 3 26 4
#> 4 26 5
merge(A, B, all = TRUE)
#> id value value2 value3
#> 1 1 19 25 4
#> 2 2 20 25 4
#> 3 3 21 25 4
#> 4 4 22 25 4
#> 5 5 23 25 4
#> 6 6 24 25 4
#> 7 1 19 25 5
#> 8 2 20 25 5
#> 9 3 21 25 5
#> 10 4 22 25 5
#> 11 5 23 25 5
#> 12 6 24 25 5
#> 13 1 19 26 4
#> 14 2 20 26 4
#> 15 3 21 26 4
#> 16 4 22 26 4
#> 17 5 23 26 4
#> 18 6 24 26 4
#> 19 1 19 26 5
#> 20 2 20 26 5
#> 21 3 21 26 5
#> 22 4 22 26 5
#> 23 5 23 26 5
#> 24 6 24 26 5
但是,如果现在有两个data.table而不是dataframes,会出现错误:
library(data.table)
A <- data.table(id = 1:6, value = 19:24)
B <- data.table(value2 = c(25, 25, 26, 26), value3 = 4:5)
merge(A, B, all = TRUE)
#> Error in merge.data.table(A, B, all = TRUE): A non-empty vector of column names for `by` is required.
如何使用data.table
来重现基本R的行为(不一定要使用merge()
)?
英文:
I want to merge two data.tables that don't have a common column, so I would end up with N1*N2
rows, where N1
and N2
are the number of rows in each dataframe.
Doing this with base R works:
A <- data.frame(id = 1:6, value = 19:24)
B <- data.frame(value2 = c(25, 25, 26, 26), value3 = 4:5)
A
#> id value
#> 1 1 19
#> 2 2 20
#> 3 3 21
#> 4 4 22
#> 5 5 23
#> 6 6 24
B
#> value2 value3
#> 1 25 4
#> 2 25 5
#> 3 26 4
#> 4 26 5
merge(A, B, all = TRUE)
#> id value value2 value3
#> 1 1 19 25 4
#> 2 2 20 25 4
#> 3 3 21 25 4
#> 4 4 22 25 4
#> 5 5 23 25 4
#> 6 6 24 25 4
#> 7 1 19 25 5
#> 8 2 20 25 5
#> 9 3 21 25 5
#> 10 4 22 25 5
#> 11 5 23 25 5
#> 12 6 24 25 5
#> 13 1 19 26 4
#> 14 2 20 26 4
#> 15 3 21 26 4
#> 16 4 22 26 4
#> 17 5 23 26 4
#> 18 6 24 26 4
#> 19 1 19 26 5
#> 20 2 20 26 5
#> 21 3 21 26 5
#> 22 4 22 26 5
#> 23 5 23 26 5
#> 24 6 24 26 5
But if I now have two data.tables and not dataframes anymore, it errors:
library(data.table)
A <- data.table(id = 1:6, value = 19:24)
B <- data.table(value2 = c(25, 25, 26, 26), value3 = 4:5)
merge(A, B, all = TRUE)
#> Error in merge.data.table(A, B, all = TRUE): A non-empty vector of column names for `by` is required.
How can I reproduce the base R behavior with data.table
(without necessarily using merge()
)?
答案1
得分: 5
在data.table
中,你正在寻找一个交叉连接。有一个CJ
函数,但它只能用于一个数据集,否则你可以这样做:
res <- setkey(A[, c(k=1, .SD)], k)[B[, c(k=1, .SD)], allow.cartesian = TRUE][, k := NULL]
res
id value value2 value3
1: 1 19 25 4
2: 2 20 25 4
3: 3 21 25 4
4: 4 22 25 4
5: 5 23 25 4
6: 6 24 25 4
7: 1 19 25 5
8: 2 20 25 5
9: 3 21 25 5
10: 4 22 25 5
11: 5 23 25 5
12: 6 24 25 5
13: 1 19 26 4
14: 2 20 26 4
15: 3 21 26 4
16: 4 22 26 4
17: 5 23 26 4
18: 6 24 26 4
19: 1 19 26 5
20: 2 20 26 5
21: 3 21 26 5
22: 4 22 26 5
23: 5 23 26 5
24: 6 24 26 5
id value value2 value3
注意备用的dplyr
解决方案:
dplyr::cross_join(A, B)
英文:
You are looking for a cross-join. In data.table
, there is a CJ
function but it only works with one data set, otherwise you can do:
res <- setkey(A[, c(k=1, .SD)], k)[B[, c(k=1, .SD)], allow.cartesian = TRUE][, k := NULL]
res
id value value2 value3
1: 1 19 25 4
2: 2 20 25 4
3: 3 21 25 4
4: 4 22 25 4
5: 5 23 25 4
6: 6 24 25 4
7: 1 19 25 5
8: 2 20 25 5
9: 3 21 25 5
10: 4 22 25 5
11: 5 23 25 5
12: 6 24 25 5
13: 1 19 26 4
14: 2 20 26 4
15: 3 21 26 4
16: 4 22 26 4
17: 5 23 26 4
18: 6 24 26 4
19: 1 19 26 5
20: 2 20 26 5
21: 3 21 26 5
22: 4 22 26 5
23: 5 23 26 5
24: 6 24 26 5
id value value2 value3
Note the alternative dplyr
solution:
dplyr::cross_join(A, B)
答案2
得分: 1
来自data.table的替代方案:
library(data.table)
A <- data.table(id = 1:6, value = 19:24)
B <- data.table(value2 = c(25, 25, 26, 26), value3 = 4:5)
CJDT <- function(...) {
Reduce(function(DT1, DT2) cbind(DT1, DT2[rep(1:.N, each=nrow(DT1))]), list(...))
}
CJDT(A, B)
#> id value value2 value3
#> 1: 1 19 25 4
#> 2: 2 20 25 4
#> 3: 3 21 25 4
#> 4: 4 22 25 4
#> 5: 5 23 25 4
#> 6: 6 24 25 4
#> 7: 1 19 25 5
#> 8: 2 20 25 5
#> 9: 3 21 25 5
#> 10: 4 22 25 5
#> 11: 5 23 25 5
#> 12: 6 24 25 5
#> 13: 1 19 26 4
#> 14: 2 20 26 4
#> 15: 3 21 26 4
#> 16: 4 22 26 4
#> 17: 5 23 26 4
#> 18: 6 24 26 4
#> 19: 1 19 26 5
#> 20: 2 20 26 5
#> 21: 3 21 26 5
#> 22: 4 22 26 5
#> 23: 5 23 26 5
#> 24: 6 24 26 5
#> id value value2 value3
创建于2023-02-06,使用reprex v2.0.2
英文:
An alternative coming from this GitHub issue on the data.table
repo:
library(data.table)
A <- data.table(id = 1:6, value = 19:24)
B <- data.table(value2 = c(25, 25, 26, 26), value3 = 4:5)
CJDT <- function(...) {
Reduce(function(DT1, DT2) cbind(DT1, DT2[rep(1:.N, each=nrow(DT1))]), list(...))
}
CJDT(A, B)
#> id value value2 value3
#> 1: 1 19 25 4
#> 2: 2 20 25 4
#> 3: 3 21 25 4
#> 4: 4 22 25 4
#> 5: 5 23 25 4
#> 6: 6 24 25 4
#> 7: 1 19 25 5
#> 8: 2 20 25 5
#> 9: 3 21 25 5
#> 10: 4 22 25 5
#> 11: 5 23 25 5
#> 12: 6 24 25 5
#> 13: 1 19 26 4
#> 14: 2 20 26 4
#> 15: 3 21 26 4
#> 16: 4 22 26 4
#> 17: 5 23 26 4
#> 18: 6 24 26 4
#> 19: 1 19 26 5
#> 20: 2 20 26 5
#> 21: 3 21 26 5
#> 22: 4 22 26 5
#> 23: 5 23 26 5
#> 24: 6 24 26 5
#> id value value2 value3
<sup>Created on 2023-02-06 with reprex v2.0.2</sup>
答案3
得分: 0
A[, as.list(B), names(A)]
结果
id value value2 value3
1: 1 19 25 4
2: 1 19 25 5
3: 1 19 26 4
4: 1 19 26 5
5: 2 20 25 4
6: 2 20 25 5
7: 2 20 26 4
8: 2 20 26 5
9: 3 21 25 4
10: 3 21 25 5
11: 3 21 26 4
12: 3 21 26 5
13: 4 22 25 4
14: 4 22 25 5
15: 4 22 26 4
16: 4 22 26 5
17: 5 23 25 4
18: 5 23 25 5
19: 5 23 26 4
20: 5 23 26 5
21: 6 24 25 4
22: 6 24 25 5
23: 6 24 26 4
24: 6 24 26 5
数据
A <- data.table(id = 1:6, value = 19:24)
B <- data.table(value2 = c(25, 25, 26, 26), value3 = 4:5)
<details>
<summary>英文:</summary>
A[, as.list(B), names(A)]
**results**
id value value2 value3
1: 1 19 25 4
2: 1 19 25 5
3: 1 19 26 4
4: 1 19 26 5
5: 2 20 25 4
6: 2 20 25 5
7: 2 20 26 4
8: 2 20 26 5
9: 3 21 25 4
10: 3 21 25 5
11: 3 21 26 4
12: 3 21 26 5
13: 4 22 25 4
14: 4 22 25 5
15: 4 22 26 4
16: 4 22 26 5
17: 5 23 25 4
18: 5 23 25 5
19: 5 23 26 4
20: 5 23 26 5
21: 6 24 25 4
22: 6 24 25 5
23: 6 24 26 4
24: 6 24 26 5
**data**
A <- data.table(id = 1:6, value = 19:24)
B <- data.table(value2 = c(25, 25, 26, 26), value3 = 4:5)
</details>
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论