Counting values from a list

Question

I have a column of type character whose observations are organized as follows (example data below):

df <- data.frame(observation = c('["Extra pillows and blankets", "Dishes and silverware", "Room-darkening shades", "Hot water kettle", "Ethernet connection", "Wifi", "Dedicated workspace", "Oven"]',
                                 '["Extra pillows and blankets", "Dishes and silverware", "Room-darkening shades", "Hot water kettle", "Ethernet connection", "Wifi", "Dedicated workspace", "Oven"]',
                                 '["Extra pillows and blankets", "Dishes and silverware", "Room-darkening shades", "Hot water kettle", "Ethernet connection", "Wifi", "Dedicated workspace", "Oven"]'
))

My goal is to count the number of elements in each list of each observation (considering the separation of these elements by a comma in the list).
I've tried transforming it to a factor, to a list, I've used length and lengths, and many other things that I don't even remember. Does anyone know how to resolve this issue?

Answer 1

Score: 2

We can count the elements using str_count() from stringr. Here we count the commas and add 1 to get the number of elements:

library(dplyr)
library(stringr)

df %>%
  mutate(n_elements = str_count(observation, ",")+1)
1 ["Extra pillows and blankets", "Dishes and silverware", "Room-darkening shades", "Hot water kettle", "Ethernet connection", "Wifi", "Dedicated workspace", "Oven"]
2 ["Extra pillows and blankets", "Dishes and silverware", "Room-darkening shades", "Hot water kettle", "Ethernet connection", "Wifi", "Dedicated workspace", "Oven"]
3 ["Extra pillows and blankets", "Dishes and silverware", "Room-darkening shades", "Hot water kettle", "Ethernet connection", "Wifi", "Dedicated workspace", "Oven"]
  n_elements
1          8
2          8
3          8
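
One caveat with counting commas: an empty list `[]` would be reported as 1, and an element that itself contains a comma would be counted twice. A minimal base R sketch that counts the quoted strings instead is shown below. It assumes every element is wrapped in double quotes and contains no escaped quotes; the helper name `count_quoted` and the sample vector `x` are made up for illustration.

```r
# Hypothetical helper: count the double-quoted elements in each string.
# Assumption: elements are double-quoted and contain no escaped quotes.
count_quoted <- function(x) {
  # gregexpr() locates every "..." run; regmatches() extracts them,
  # returning character(0) for strings with no match.
  lengths(regmatches(x, gregexpr('"[^"]*"', x)))
}

x <- c('["Wifi", "Oven"]', '["Dishes, silverware", "Wifi"]', '[]')
count_quoted(x)
#> [1] 2 2 0
```

The same pattern could probably be passed to str_count() in the mutate() call above in place of ",", without adding 1.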

Answer 2

Score: 1

Here is a base R solution.

m <- gregexpr(",", df$observation)
lengths(m) + 1L
#> [1] 8 8 8

Created on 2023-05-06 with reprex v2.0.2

Or the one-liner

lengths(strsplit(df$observation, ","))
#> [1] 8 8 8

But this builds a list of split strings before counting, which needs more memory and is slower. The first solution can also be written as a one-liner:

lengths(gregexpr(",", df$observation)) + 1L
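
A note on an edge case: when a string contains no comma, gregexpr() returns a single -1 for it, so lengths(...) + 1L reports 2 for a one-element list instead of 1. A minimal sketch of a guard follows; the helper name `count_commas` and the sample vector `x` are made up for illustration.

```r
# Hypothetical helper: count commas per string, treating gregexpr()'s
# "no match" sentinel (-1) as zero matches.
count_commas <- function(x) {
  m <- gregexpr(",", x, fixed = TRUE)
  vapply(m, function(pos) sum(pos > 0L), integer(1))
}

x <- c('["Wifi", "Oven", "Kettle"]', '["Wifi"]')

lengths(gregexpr(",", x)) + 1L   # unguarded version
#> [1] 3 2
count_commas(x) + 1L             # guarded version
#> [1] 3 1
```

The strsplit() one-liner does not have this particular problem, since a string without commas is split into exactly one piece.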

huangapple
  • Published on 2023-05-07 02:10:46
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76190404.html