How to get and combine an element from multiple lists in R?

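Question

I have a function as shown below.

```R
f1 <- function(x) {
  df1 <- rowSums(x)
  df2 <- colSums(x)
  return(list(Actuals = df1, Summary = df2))
}
```

I call this function by group as shown below.

```R
out <- by(mtcars, INDICES = mtcars$gear, f1, simplify = TRUE)
```

Now I need to get only `Summary` from every group and combine the results into a single data frame.

I can do it with the code below.

```R
summary <- do.call(rbind, sapply(out, function(x) x$Summary, simplify = FALSE))
summary <- cbind(Gear = as.integer(row.names(summary)), summary)
```

But this process is very slow. My original dataset has thousands of groups, and this approach takes around 20 minutes.

Could anyone suggest a better approach using `data.table` or any other package?

Thanks in advance.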



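# Answer 1
**Score**: 1

Even with tens of thousands of groups, the reshaping operations in the current solution take just seconds. I suspect almost all of the processing time is spent in the actual `f1` function.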
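One way to check this suspicion is to time the grouped call and the reshaping step separately. The sketch below does so with base R's `system.time()`, using the question's `f1` and `mtcars` as stand-ins; the names `t_apply`, `t_reshape`, and `timings` are illustrative only, and on real data the actual function and dataset would be swapped in.

```R
# Stand-in for the real (much more expensive) function from the question.
f1 <- function(x) {
  list(Actuals = rowSums(x), Summary = colSums(x))
}

# Step 1: apply the function to every group.
t_apply <- system.time(
  out <- by(mtcars, INDICES = mtcars$gear, f1, simplify = TRUE)
)

# Step 2: extract each Summary and bind the results into one matrix.
t_reshape <- system.time({
  summary <- do.call(rbind, sapply(out, function(x) x$Summary, simplify = FALSE))
  summary <- cbind(Gear = as.integer(row.names(summary)), summary)
})

# Elapsed seconds for each step, side by side.
timings <- c(apply = t_apply[["elapsed"]], reshape = t_reshape[["elapsed"]])
print(timings)
```

If `apply` dominates on the real data, speeding up the reshaping will not help much, and the function itself is the place to optimize.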

Below is one way to use `data.table` to accomplish (essentially) the same thing. In the timings shown here it runs about twice as fast (5.47 s versus 11.99 s elapsed).

*Example* `f1` function (modified based on the comment "`f1` function is just an example. I have very lengthy and complex function. But returning two data frames in a list format."):

```R
f1 <- function(x) {
  df1 <- as.data.frame(t(rowSums(x)))
  df2 <- as.data.frame(t(colSums(x)))
  return(list(Actuals = df1, Summary = df2))
}
```

`f1` modified to return nested lists:

```R
f2 <- function(x) {
  df1 <- as.data.frame(t(rowSums(x)))
  df2 <- as.data.frame(t(colSums(x)))
  return(list(Actuals = list(df1), Summary = list(df2)))
}
```

Make a much bigger dataset for illustration:

```R
library(data.table)
bigcars <- setDT(copy(mtcars))
# Each iteration shifts gear by 3, giving 3 * 1e4 = 30,000 distinct groups.
bigcars <- rbindlist(lapply(1:1e4, function(i) copy(bigcars[, gear := gear + 3L])))
```

Original solution:

```R
system.time({
  out <- by(bigcars, INDICES = bigcars$gear, f1, simplify = TRUE)
  summary <- do.call(rbind, sapply(out, function(x) x$Summary, simplify = FALSE))
  summary <- cbind(Gear = as.integer(row.names(summary)), summary)
})
#>    user  system elapsed 
#>   11.82    0.15   11.99
```

`data.table` solution:

```R
system.time({
  out2 <- setDT(copy(bigcars))[, f2(.SD), gear]
  summary2 <- rbindlist(out2$Summary)
})
#>    user  system elapsed 
#>    5.34    0.11    5.47
```

Note that the `[, f2(.SD), gear]` operation does not pass the grouping variable to `f2`, so `gear` does not appear in `summary2`. The operation may need to be modified depending on what the actual `f1` function does.
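If the group identifier is needed in the combined result, one option is to name the `Summary` list by group and let `rbindlist()` turn those names into a column through its `idcol` argument. This is a minimal sketch on plain `mtcars`, not part of the original answer; `cars` is an illustrative name, and `f2` is repeated so the block runs on its own.

```R
library(data.table)

# Same f2 as in the answer: each data frame is wrapped in list() so that
# data.table stores it as a single cell of a list column per group.
f2 <- function(x) {
  df1 <- as.data.frame(t(rowSums(x)))
  df2 <- as.data.frame(t(colSums(x)))
  list(Actuals = list(df1), Summary = list(df2))
}

cars <- as.data.table(mtcars)
out2 <- cars[, f2(.SD), by = gear]   # one row per gear, two list columns

# Name each Summary element by its group, then bind; idcol stores the
# element names in a new (character) column called "gear".
summary2 <- rbindlist(setNames(out2$Summary, out2$gear), idcol = "gear")
summary2[, gear := as.integer(gear)]  # convert the id back to integer

print(summary2)
```

Because `.SD` excludes the grouping column, the remaining columns of `summary2` are the per-group column sums of the other ten `mtcars` variables.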

huangapple
  • Published on March 1, 2023, 12:53:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75599694.html