Counting values from a list
Question
I have a column of type character whose observations are organized as follows (example below):
df <- data.frame(observation = c('["Extra pillows and blankets", "Dishes and silverware", "Room-darkening shades", "Hot water kettle", "Ethernet connection", "Wifi", "Dedicated workspace", "Oven"]',
'["Extra pillows and blankets", "Dishes and silverware", "Room-darkening shades", "Hot water kettle", "Ethernet connection", "Wifi", "Dedicated workspace", "Oven"]',
'["Extra pillows and blankets", "Dishes and silverware", "Room-darkening shades", "Hot water kettle", "Ethernet connection", "Wifi", "Dedicated workspace", "Oven"]'
))
My goal is to count the number of elements in each list of each observation (considering the separation of these elements by a comma in the list).
I've tried transforming it to a factor, to a list, I've used length and lengths, and many other things that I don't even remember. Does anyone know how to resolve this issue?
Answer 1
Score: 2
We can count the elements using str_count(). Here we count the commas (,) and add 1 to get the number of elements:
library(dplyr)
library(stringr)
df %>%
  mutate(n_elements = str_count(observation, ",") + 1)
1 ["Extra pillows and blankets", "Dishes and silverware", "Room-darkening shades", "Hot water kettle", "Ethernet connection", "Wifi", "Dedicated workspace", "Oven"]
2 ["Extra pillows and blankets", "Dishes and silverware", "Room-darkening shades", "Hot water kettle", "Ethernet connection", "Wifi", "Dedicated workspace", "Oven"]
3 ["Extra pillows and blankets", "Dishes and silverware", "Room-darkening shades", "Hot water kettle", "Ethernet connection", "Wifi", "Dedicated workspace", "Oven"]
n_elements
1 8
2 8
3 8
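Counting commas assumes that no element contains a comma and that no list is empty. A minimal sketch, assuming every element is a double-quoted string with no escaped quotes inside it, counts the quoted items instead. The vector x below is made-up data for illustration, not taken from the question:

```r
library(stringr)

# Made-up examples: an element containing a comma, an empty list, a single element
x <- c('["Washer, dryer", "Wifi"]', '[]', '["Oven"]')

# Comma-based count: miscounts the first two strings
str_count(x, ",") + 1
#> [1] 3 1 2

# Count the double-quoted items instead
str_count(x, '"[^"]*"')
#> [1] 2 0 1
```

For the data in the question both counts agree (8 per row), so this only matters if the real column contains such cases.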
Answer 2
Score: 1
Here is a base R solution.
m <- gregexpr(",", df$observation)
lengths(m) + 1L
#> [1] 8 8 8
Created on 2023-05-06 with reprex v2.0.2
Or the one-liner
lengths(strsplit(df$observation, ","))
#> [1] 8 8 8
But this builds a list of all the split substrings before counting, which needs more memory and is slower. The first solution can also be written as a one-liner:
lengths(gregexpr(",", df$observation)) + 1L
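One caveat with the gregexpr() version: when a string contains no comma, gregexpr() returns a single -1 rather than an empty vector, so lengths() still reports 1 and the result is 2 instead of 1. A sketch that counts only real match positions is shown below; count_elements is a hypothetical helper name and x is made-up data, neither comes from the answer:

```r
# Hypothetical helper: number of comma-separated elements, base R only
count_elements <- function(x) {
  m <- gregexpr(",", x, fixed = TRUE)
  # gregexpr() yields -1 for "no match", so count positive positions only
  n_commas <- vapply(m, function(pos) sum(pos > 0L), integer(1))
  n_commas + 1L
}

# Made-up examples: two elements, then a single element (no comma)
x <- c('["Wifi", "Oven"]', '["Wifi"]')

lengths(gregexpr(",", x)) + 1L
#> [1] 2 2

count_elements(x)
#> [1] 2 1
```

The strsplit() and str_count() versions do not have this issue, since they return 1 for a string without commas.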