Reshape strings within a dataframe in R
Question
I have a data frame that looks like this:
df <- data.frame(service = c("1,2,3,4", "2,4,5", "1,3,4"),
                 score = c("1,2,1,3", "1,1,3", "1,1,1"))
  service   score
1 1,2,3,4 1,2,1,3
2   2,4,5   1,1,3
3   1,3,4   1,1,1
The variable service is categorical with values from 1 to 5, and score is also categorical, with values from 1 to 3, giving one score for each service listed in the same row. Note that each respondent has taken a different set of services, so the number of entries differs from row to row.
I need to reshape this data frame to associate each score with its service. The final result would look like this:
tibble::tibble(ind = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
               serv = c(1, 2, 3, 4, 2, 4, 5, 1, 3, 4),
               score = c(1, 2, 1, 3, 1, 1, 3, 1, 1, 1))
     ind  serv score
   <dbl> <dbl> <dbl>
 1     1     1     1
 2     1     2     2
 3     1     3     1
 4     1     4     3
 5     2     2     1
 6     2     4     1
 7     2     5     3
 8     3     1     1
 9     3     3     1
10     3     4     1
I first split the variable service to create one indicator column per category, like this:
library(qdapTools)
library(tidyverse)
# Split each service string and recode anything outside 1-5 as "other"
lst1 <- lapply(strsplit(df$service, ","), function(x)
  replace(x, (!x %in% c("1", "2", "3", "4", "5")) & !is.na(x), "other"))
# Tabulate the services per row, one serv_* column per category
lst1_tab <- mtabulate(lst1) %>% setNames(paste0("serv_", names(.)))
df <- cbind(df, lst1_tab)
  service   score serv_1 serv_2 serv_3 serv_4 serv_5
1 1,2,3,4 1,2,1,3      1      1      1      1      0
2   2,4,5   1,1,3      0      1      0      1      1
3   1,3,4   1,1,1      1      0      1      1      0
I did that so I could later reshape df into long form. However, not all individuals took all services. For instance, individual 1 did not take service 5. As a result, I don't know how to split the variable score as well so that each score is associated with its service.
Answer 1
Score: 3
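We could use separate_longer_delim() from tidyr (available in tidyr 1.3.0 and later), which splits both columns in parallel:
library(dplyr)
library(tidyr)
# df1 is the data frame from the question
df1 %>%
  mutate(ind = row_number(), .before = 1) %>%
  separate_longer_delim(c(service, score), delim = ",")
Output:
   ind service score
1    1       1     1
2    1       2     2
3    1       3     1
4    1       4     3
5    2       2     1
6    2       4     1
7    2       5     3
8    3       1     1
9    3       3     1
10   3       4     1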
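separate_longer_delim() leaves the split values as character strings, whereas the result requested in the question has numeric columns and a column named serv. A minimal sketch of one way to finish the job, assuming the example data from the question (the name long is only illustrative):

```r
library(dplyr)
library(tidyr)

# Example data copied from the question
df <- data.frame(service = c("1,2,3,4", "2,4,5", "1,3,4"),
                 score   = c("1,2,1,3", "1,1,3", "1,1,1"),
                 stringsAsFactors = FALSE)

long <- df %>%
  # Respondent id, placed as the first column
  mutate(ind = row_number(), .before = 1) %>%
  # Split both columns in parallel, one row per service/score pair
  separate_longer_delim(c(service, score), delim = ",") %>%
  # The split values are character; convert them to numeric
  mutate(across(c(service, score), as.numeric)) %>%
  # Match the column name used in the desired output
  rename(serv = service)

str(long)
```

Whether numeric is the right type is a judgment call: since both variables are categorical, converting them with factor() instead of as.numeric may suit later modelling better.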
Answer 2
Score: 0
An approach using strsplit and unnest:
library(dplyr)
library(tidyr)
df %>%
  mutate(across(everything(), ~ strsplit(.x, ",")),
         ind = row_number(), .before = service) %>%
  unnest(c(service, score))
# A tibble: 10 × 3
     ind service score
   <int> <chr>   <chr>
 1     1 1       1
 2     1 2       2
 3     1 3       1
 4     1 4       3
 5     2 2       1
 6     2 4       1
 7     2 5       3
 8     3 1       1
 9     3 3       1
10     3 4       1
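For comparison, the same split-and-pair idea can be sketched in base R without any packages. This is only an illustration under the assumption that every row lists exactly as many scores as services; the names serv_list, score_list, and long are hypothetical and not part of the answers here.

```r
# Example data copied from the question
df <- data.frame(service = c("1,2,3,4", "2,4,5", "1,3,4"),
                 score   = c("1,2,1,3", "1,1,3", "1,1,1"),
                 stringsAsFactors = FALSE)

# Split each string into its comma-separated pieces (one vector per respondent)
serv_list  <- strsplit(df$service, ",")
score_list <- strsplit(df$score, ",")

# Assumption check: each respondent has one score per service
stopifnot(lengths(serv_list) == lengths(score_list))

long <- data.frame(
  # Repeat each row index once per service that respondent took
  ind   = rep(seq_len(nrow(df)), lengths(serv_list)),
  serv  = as.numeric(unlist(serv_list)),
  score = as.numeric(unlist(score_list))
)

long
```

Because the services and scores are flattened in the same row order, the i-th service of a respondent stays next to that respondent's i-th score, so no placeholder is needed for services a respondent did not take.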