Reshape strings within a dataframe in R

Question


I have a dataframe that looks like this:

df <- data.frame(service=c("1,2,3,4","2,4,5","1,3,4"),
                 score=c("1,2,1,3","1,1,3","1,1,1"))
df
  service   score
1 1,2,3,4 1,2,1,3
2   2,4,5   1,1,3
3   1,3,4   1,1,1

The variable service is a categorical variable ranging from 1 to 5, while score is a categorical variable ranging from 1 to 3 that gives the score for each corresponding service. Note that each respondent took different services, so the number of entries is not the same in every row.

I need to reshape this dataframe to associate each score with its service. The final result would look like this:

tibble(ind=c(1,1,1,1,2,2,2,3,3,3),
       serv=c(1,2,3,4,2,4,5,1,3,4),
       score=c(1,2,1,3,1,1,3,1,1,1))

    ind  serv score
   <dbl> <dbl> <dbl>
 1     1     1     1
 2     1     2     2
 3     1     3     1
 4     1     4     3
 5     2     2     1
 6     2     4     1
 7     2     5     3
 8     3     1     1
 9     3     3     1
10     3     4     1

I first split the variable service to create all the categories in this way:

library(qdapTools)  # provides mtabulate()
library(tidyverse)  # provides %>%

# split each service string and recode any value outside 1-5 as "other"
lst1 <- lapply(strsplit(df$service, ","), function(x) 
  replace(x, (! x %in% c("1", "2", "3", "4", "5")) & !is.na(x), "other"))

# one 0/1 indicator column per service
lst1_tab <- mtabulate(lst1) %>% setNames(paste0('serv_', names(.)))
df <- cbind(df, lst1_tab)

  service   score serv_1 serv_2 serv_3 serv_4 serv_5
1 1,2,3,4 1,2,1,3      1      1      1      1      0
2   2,4,5   1,1,3      0      1      0      1      1
3   1,3,4   1,1,1      1      0      1      1      0

I did that so I could later reshape the df into long form. However, not all individuals took all services. For instance, individual 1 did not take service 5. Therefore, I don't know how to also split the variable score so that each score is associated with its service.
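Since the pairing relies on the i-th score in a row belonging to the i-th service in that row, it may be worth checking first that both columns hold the same number of entries per row. A minimal base R sketch on the example data; the names n_serv, n_score and mismatched are only illustrative and do not come from the original post:

```r
# Example data from the question
df <- data.frame(service = c("1,2,3,4", "2,4,5", "1,3,4"),
                 score   = c("1,2,1,3", "1,1,3", "1,1,1"),
                 stringsAsFactors = FALSE)

# Count the comma-separated entries per row in each column
n_serv  <- lengths(strsplit(df$service, ",", fixed = TRUE))
n_score <- lengths(strsplit(df$score, ",", fixed = TRUE))

# Rows where the counts differ cannot be paired element by element
mismatched <- which(n_serv != n_score)
```

If mismatched is empty, services and scores can be split in parallel and matched by position.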

Answer 1

Score: 3


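We could use separate_longer_delim() from tidyr (available in tidyr 1.3.0 and later):

library(dplyr)
library(tidyr)
df %>% 
  # record the original row number as the respondent id
  mutate(ind = row_number(), .before = 1) %>% 
  # split both columns in parallel, one output row per comma-separated entry
  separate_longer_delim(c(service, score), delim = ",")

Output: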

   ind service score
1    1       1     1
2    1       2     2
3    1       3     1
4    1       4     3
5    2       2     1
6    2       4     1
7    2       5     3
8    3       1     1
9    3       3     1
10   3       4     1
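The split values above are still character strings, while the desired result in the question has numeric columns and a column named serv. One possible follow-up, sketched here under the assumption that every entry is a clean number, converts both columns with across() and renames service; the object name long is only illustrative:

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(service = c("1,2,3,4", "2,4,5", "1,3,4"),
                 score   = c("1,2,1,3", "1,1,3", "1,1,1"),
                 stringsAsFactors = FALSE)

long <- df %>%
  mutate(ind = row_number(), .before = 1) %>%
  separate_longer_delim(c(service, score), delim = ",") %>%
  # the split pieces are character; convert them to numbers
  mutate(across(c(service, score), as.numeric)) %>%
  # match the column name used in the desired output
  rename(serv = service)
```

Entries that are not valid numbers would become NA with a warning, so this step only suits data like the example.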

Answer 2

Score: 0


An approach using strsplit and unnest:

library(dplyr)
library(tidyr)

df %>% 
  mutate(across(everything(), ~ strsplit(.x, ",")), 
         ind = row_number(), .before = service) %>% 
  unnest(c(service, score))
# A tibble: 10 × 3
     ind service score
   <int> <chr>   <chr>
 1     1 1       1    
 2     1 2       2    
 3     1 3       1    
 4     1 4       3    
 5     2 2       1    
 6     2 4       1    
 7     2 5       3    
 8     3 1       1    
 9     3 3       1    
10     3 4       1
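For comparison, the same idea can be written without any packages. This is only a base R sketch, not part of either answer, and the names serv_list, score_list and long are illustrative: both columns are split in parallel, the row index is repeated once per entry, and the pieces are flattened in order.

```r
# Example data from the question
df <- data.frame(service = c("1,2,3,4", "2,4,5", "1,3,4"),
                 score   = c("1,2,1,3", "1,1,3", "1,1,1"),
                 stringsAsFactors = FALSE)

# Split both columns on commas; element i of each list belongs to row i
serv_list  <- strsplit(df$service, ",", fixed = TRUE)
score_list <- strsplit(df$score, ",", fixed = TRUE)

# Pairing by position only makes sense if the per-row counts agree
stopifnot(identical(lengths(serv_list), lengths(score_list)))

long <- data.frame(
  # repeat each row number once per service taken by that respondent
  ind   = rep(seq_len(nrow(df)), times = lengths(serv_list)),
  serv  = as.numeric(unlist(serv_list)),
  score = as.numeric(unlist(score_list))
)
```

Because unlist() keeps the within-row order, the i-th score of a row stays next to the i-th service of that row.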

huangapple
  • Posted on April 20, 2023, 00:03:33
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76056633.html