Alternatives to the unique and unlist functions in R

Question

I have a huge dataset (more than 10 million rows), and I need to get the counts shown below.

To do this, I first collect all the unique values across the columns of interest:

 lev <- unique(unlist(mtcars[, 8:11]))

Then I count each value per column using the table function:

 as.data.frame(sapply(mtcars[,8:11], function(x) table(factor(x, levels = lev))))

However, this only works for small datasets. On a large dataset, R usually kills the command.

Is there a way to make this faster for large datasets, for example using dplyr?

Answer 1

Score: 2

Perhaps a data.table approach will work for you.

It first melts the data to long format and then casts it back to wide format. The cast yields one row per unique value and, for each column (i.e. variable), the number of times that value occurs. The count comes from length, which dcast.data.table uses as the default fun.aggregate when the formula does not identify a single row per cell.

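library(data.table)

DT <- as.data.table(mtcars)  # or setDT(mydata) to convert in place
# melt the four columns to long format, then count each value per column
dcast(melt(DT[, 8:11], measure.vars = names(DT)[8:11]),
      value ~ variable)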
#    value vs am gear carb
# 1:     0 18 19    0    0
# 2:     1 14 13    0    7
# 3:     2  0  0    0   10
# 4:     3  0  0   15    3
# 5:     4  0  0   12   10
# 6:     5  0  0    5    0
# 7:     6  0  0    0    1
# 8:     8  0  0    0    1
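
The melt-then-cast idea can also be written with the counting step made explicit, which may be easier to adapt to a larger table. The following is only a minimal sketch: the helper name count_values and its arguments are hypothetical (they are not part of data.table), and it assumes the selected columns share one type, since melt would otherwise coerce them with a warning.

```r
library(data.table)

# Hypothetical helper (not part of data.table): count how often each
# value occurs in each of the selected columns.
count_values <- function(dt, cols) {
  # Keep only the columns of interest so melt does not carry the rest along
  sub <- dt[, cols, with = FALSE]

  # Long format: one row per (variable, value) pair
  long <- melt(sub, measure.vars = cols)

  # Count occurrences of each value within each original column
  counts <- long[, .N, by = .(variable, value)]

  # Back to wide format: one row per value, one column per variable;
  # combinations that never occur are filled with 0
  dcast(counts, value ~ variable, value.var = "N", fill = 0L)
}

res <- count_values(as.data.table(mtcars), names(mtcars)[8:11])
print(res)
```

For mtcars this should reproduce the table above. On a real dataset, pass the data.table and the relevant column names instead; whether this is fast enough for 10 million rows would need to be measured on the actual data.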

huangapple
  • Posted on February 16, 2023 at 19:21:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75471543.html