Return multiple rows per group in data.table
Question
Is it possible to return multiple rows per group in a grouped command in data.table? In dplyr, this is done with reframe:
library(dplyr)  # reframe() and .by need dplyr >= 1.1.0; dplyr also re-exports tibble()
y <- c("a", "b", "d", "f")
df <- tibble(
  g = c(1, 1, 1, 2, 2, 2, 2),
  x = c("e", "a", "b", "e", "f", "c", "a")
)
df %>%
  reframe(x = setdiff(x, y), .by = g)
# g x
# 1 e
# 2 e
# 2 c
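In data.table, the equivalent := call returns an error. (The message below was captured from a variant of the call that used intersect(x, y); with setdiff(x, y) the same error is raised at group 2 instead.)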
library(data.table)
dt <- setDT(df)
dt[, x := setdiff(x, y), g]
> Error in `[.data.table`(df, , `:=`(x, intersect(x, y)), g) :
> Supplied 2 items to be assigned to group 1 of size 3 in column 'x'.
> The RHS length must either be 1 (single values are ok) or match the
> LHS length exactly. If you wish to 'recycle' the RHS please use rep()
> explicitly to make this intent clear to readers of your code.
Is there any way to get a data.table equivalent of reframe?
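The error comes from := assigning into the existing rows: for each group, the right-hand side must have length 1 (recycled) or exactly the group's row count. The minimal sketch below compares the two per group. It assumes the same g, x and y as above, but builds the table directly with data.table() instead of tibble() plus setDT(); the names chk, group_rows, rhs_len and assignable are hypothetical, for illustration only.

```r
library(data.table)

y  <- c("a", "b", "d", "f")
dt <- data.table(
  g = c(1, 1, 1, 2, 2, 2, 2),
  x = c("e", "a", "b", "e", "f", "c", "a")
)

# Per group: number of rows vs. number of values setdiff() returns.
chk <- dt[, .(group_rows = .N, rhs_len = length(setdiff(x, y))), by = g]

# := accepts an RHS only if its length is 1 or equals the group size.
chk[, assignable := rhs_len == 1L | rhs_len == group_rows]
chk
```

Group 1 returns a single value for 3 rows (recyclable), while group 2 returns 2 values for 4 rows, which is exactly the mismatch := refuses.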
Answer 1
Score: 6
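Wrap the expression in .(...) and use = instead of :=. Inside .(...) you are building a new list of columns rather than assigning by reference, so each group may return any number of rows.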
as.data.table(df)[, .(x = setdiff(x, y)), by = g]
# g x
# <num> <char>
# 1: 1 e
# 2: 2 e
# 3: 2 c
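A self-contained sketch of the same call, assuming the data is built directly with data.table() rather than converted from a tibble (res is a hypothetical name). It shows that the length of what j returns, not the group size, determines how many rows each group contributes, and that dt itself is left unchanged.

```r
library(data.table)

y  <- c("a", "b", "d", "f")
dt <- data.table(
  g = c(1, 1, 1, 2, 2, 2, 2),
  x = c("e", "a", "b", "e", "f", "c", "a")
)

# j returns a list per group; group 1 yields 1 row, group 2 yields 2 rows.
res <- dt[, .(x = setdiff(x, y)), by = g]
res

# Unlike :=, this builds a new table; dt still has its original 7 rows.
nrow(dt)
```

Because the result is a new data.table, assign it (res <- ...) if you need to keep it, much as you would with the output of reframe().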
Note that .(...) is simply an alias for list(...), so j can return any list-like object, including:
as.data.table(df)[, list(x = setdiff(x, y)), by = g]
as.data.table(df)[, data.table(x = setdiff(x, y)), by = g]
as.data.table(df)[, data.frame(x = setdiff(x, y)), by = g]
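To check that these forms are interchangeable, the sketch below runs all four and compares their g and x columns with the .() version. It assumes the same toy data built with data.table(); variants and same are hypothetical names, and stringsAsFactors = FALSE is passed explicitly so the data.frame() variant yields a character column on R versions before 4.0 as well.

```r
library(data.table)

y  <- c("a", "b", "d", "f")
dt <- data.table(
  g = c(1, 1, 1, 2, 2, 2, 2),
  x = c("e", "a", "b", "e", "f", "c", "a")
)

# Each j returns a list-like object per group.
variants <- list(
  dot        = dt[, .(x = setdiff(x, y)), by = g],
  list       = dt[, list(x = setdiff(x, y)), by = g],
  data.table = dt[, data.table(x = setdiff(x, y)), by = g],
  data.frame = dt[, data.frame(x = setdiff(x, y), stringsAsFactors = FALSE), by = g]
)

# Compare columns as plain vectors against the .() result.
same <- vapply(variants, function(r) {
  identical(r$g, variants$dot$g) && identical(r$x, variants$dot$x)
}, logical(1))
same
```

Comparing the column vectors, rather than calling identical() on whole data.tables, avoids false mismatches from internal table attributes.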