How to get and combine an element from multiple lists in R?
Question
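I have a function, shown below.

```R
f1 <- function(x)
{
  df1 <- rowSums(x)
  df2 <- colSums(x)
  return(list(Actuals = df1, Summary = df2))
}
```

I call the function by group, as shown below.

```R
out <- by(mtcars, INDICES = mtcars$gear, f1, simplify = TRUE)
```

Now I need to extract only `Summary` from every group and combine the results into a single data frame. I can do this with the code below.

```R
summary <- do.call(rbind, sapply(out, function(x) x$Summary, simplify = FALSE))
summary <- cbind(Gear = as.integer(row.names(summary)), summary)
```

But this process is very slow. My original dataset has thousands of groups, and this approach takes around 20 minutes to do the same thing.

Could anyone suggest a better approach using `data.table` or any other package?

Thanks in advance.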
# Answer 1
**Score**: 1
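Even with tens of thousands of groups, the reshaping operations in the current solution take just seconds. I suspect almost all of the processing time is spent in the actual `f1` function.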
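One way to check where the time goes is to time the two stages separately. The sketch below is only an illustration on `mtcars`, with the toy `f1` from the question standing in for the real function; the names `t_apply`, `t_combine`, and `summary_df` are made up for this example.

```R
# Toy stand-in for the real (much more expensive) function
f1 <- function(x) {
  list(Actuals = rowSums(x), Summary = colSums(x))
}

# Stage 1: apply f1 to every group
t_apply <- system.time(
  out <- by(mtcars, INDICES = mtcars$gear, f1, simplify = TRUE)
)

# Stage 2: pull out Summary from each group and bind the rows together
t_combine <- system.time({
  summary_df <- do.call(rbind, sapply(out, function(x) x$Summary, simplify = FALSE))
  summary_df <- cbind(Gear = as.integer(row.names(summary_df)), summary_df)
})

# Compare elapsed seconds for the two stages
print(c(apply = t_apply[["elapsed"]], combine = t_combine[["elapsed"]]))
```

If stage 1 dominates on the real data, speeding up the combine step will not help much, and the function itself is the place to optimize.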
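Below is one way to use `data.table` to accomplish (essentially) the same thing. In the timings shown here it runs about twice as fast (roughly 5.5 s versus 12 s elapsed).

*Example* `f1` function (modified based on the comment "`f1` function is just an example. I have very lengthy and complex function. But returning two data frames in a list format."):

```R
f1 <- function(x) {
  # Each result is a one-row data frame
  df1 <- as.data.frame(t(rowSums(x)))
  df2 <- as.data.frame(t(colSums(x)))
  return(list(Actuals = df1, Summary = df2))
}
```

`f1` modified to return nested lists:

```R
f2 <- function(x) {
  df1 <- as.data.frame(t(rowSums(x)))
  df2 <- as.data.frame(t(colSums(x)))
  # Wrapping each data frame in list() makes data.table store it as a list-column cell
  return(list(Actuals = list(df1), Summary = list(df2)))
}
```

Make a much bigger dataset for illustration:

```R
library(data.table)
bigcars <- setDT(copy(mtcars))
# Each iteration shifts gear by 3 (by reference), so every copy contributes new groups
bigcars <- rbindlist(lapply(1:1e4, function(i) copy(bigcars[, gear := gear + 3L])))
```

Original solution:

```R
system.time({
  out <- by(bigcars, INDICES = bigcars$gear, f1, simplify = TRUE)
  summary <- do.call(rbind, sapply(out, function(x) x$Summary, simplify = FALSE))
  summary <- cbind(Gear = as.integer(row.names(summary)), summary)
})
#>    user  system elapsed
#>   11.82    0.15   11.99
```

`data.table` solution:

```R
system.time({
  out2 <- setDT(copy(bigcars))[, f2(.SD), gear]
  summary2 <- rbindlist(out2$Summary)
})
#>    user  system elapsed
#>    5.34    0.11    5.47
```

Note that the `[, f2(.SD), gear]` operation does not pass the grouping variable to `f2`, so `gear` does not appear in `summary2`. The operation may need to be modified depending on what the actual `f1` function is doing.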
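If the group label is needed in the combined result, one option is to take it from the grouped output, since `out2` has one row per group and its `gear` column lines up with the `Summary` list column. The sketch below is a small illustration on `mtcars` rather than `bigcars`; the name `cars` and the column name `Gear` are choices made for this example.

```R
library(data.table)

# Same f2 as above: wraps each data frame in list() for list-column storage
f2 <- function(x) {
  df1 <- as.data.frame(t(rowSums(x)))
  df2 <- as.data.frame(t(colSums(x)))
  list(Actuals = list(df1), Summary = list(df2))
}

cars <- as.data.table(mtcars)
out2 <- cars[, f2(.SD), by = gear]   # columns: gear, Actuals, Summary

# Bind the per-group Summary rows and prepend the matching group labels
summary2 <- data.table(Gear = out2$gear, rbindlist(out2$Summary))
print(summary2)
```

Groups appear in `out2` in order of first appearance in the data, so `Gear` is not necessarily sorted; `setorder(summary2, Gear)` can be applied afterwards if a sorted result is wanted.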