Reshape strings within a dataframe in R

Question

I have a data frame that looks like this:

    df <- data.frame(service = c("1,2,3,4", "2,4,5", "1,3,4"),
                     score   = c("1,2,1,3", "1,1,3", "1,1,1"))
    df

      service   score
    1 1,2,3,4 1,2,1,3
    2   2,4,5   1,1,3
    3   1,3,4   1,1,1

The service variable is categorical, with values from 1 to 5. The score variable is also categorical, with values from 1 to 3, and holds one score for each service listed in the same row. Note that each respondent took different services, so the number of entries is not the same in every row.

I need to reshape this data frame so that each score is associated with its service. The final result would look like this:

    # tibble() replaces the deprecated data_frame()
    tibble::tibble(ind   = c(1,1,1,1,2,2,2,3,3,3),
                   serv  = c(1,2,3,4,2,4,5,1,3,4),
                   score = c(1,2,1,3,1,1,3,1,1,1))

         ind  serv score
       <dbl> <dbl> <dbl>
     1     1     1     1
     2     1     2     2
     3     1     3     1
     4     1     4     3
     5     2     2     1
     6     2     4     1
     7     2     5     3
     8     3     1     1
     9     3     3     1
    10     3     4     1

I first split the service variable to create one indicator column per category, like this:

    library(qdapTools)  # mtabulate()
    library(tidyverse)  # %>%

    # Split each service string; recode anything outside 1-5 as "other"
    lst1 <- lapply(strsplit(df$service, ","), function(x)
      replace(x, !(x %in% c("1", "2", "3", "4", "5")) & !is.na(x), "other"))

    # One count column per service, named serv_1 ... serv_5
    lst1_tab <- mtabulate(lst1) %>% setNames(paste0("serv_", names(.)))
    df <- cbind(df, lst1_tab)
    df

      service   score serv_1 serv_2 serv_3 serv_4 serv_5
    1 1,2,3,4 1,2,1,3      1      1      1      1      0
    2   2,4,5   1,1,3      0      1      0      1      1
    3   1,3,4   1,1,1      1      0      1      1      0

I did that so I could later reshape df into long form. However, not all individuals took all services; for instance, individual 1 did not take service 5. As a result, I don't know how to split the score variable as well so that each score is associated with its service.
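
Because the pairing between a service and its score is purely positional (the first score belongs to the first service in the same row, and so on), it can help to confirm that both columns hold the same number of entries in every row before reshaping. A minimal sketch in base R, assuming the example data from the question; the names n_service, n_score and same_length are illustrative and not part of the original post:

```r
# Example data as given in the question
df <- data.frame(service = c("1,2,3,4", "2,4,5", "1,3,4"),
                 score   = c("1,2,1,3", "1,1,3", "1,1,1"))

# Number of comma-separated entries per row, for each column
n_service <- lengths(strsplit(df$service, ",", fixed = TRUE))
n_score   <- lengths(strsplit(df$score,   ",", fixed = TRUE))

# TRUE for every row where the two columns have matching counts
same_length <- n_service == n_score
```

If every element of same_length is TRUE, the two columns can be split in parallel without needing the wide serv_* indicator columns at all.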

Answer 1

Score: 3

We could use separate_longer_delim() from tidyr (available from tidyr 1.3.0). Here df1 is the data frame from the question:

    library(dplyr)
    library(tidyr)

    df1 %>%
      # add a respondent id as the first column
      mutate(ind = row_number(), .before = 1) %>%
      # split both columns on "," in parallel, one row per entry
      separate_longer_delim(c(service, score), delim = ",")

Output:

       ind service score
    1    1       1     1
    2    1       2     2
    3    1       3     1
    4    1       4     3
    5    2       2     1
    6    2       4     1
    7    2       5     3
    8    3       1     1
    9    3       3     1
    10   3       4     1
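
The split columns come back as character, while the result requested in the question has numeric columns and a column named serv. One possible follow-up, as a sketch that assumes tidyr 1.3.0 or later and the example data from the question (the name long is illustrative):

```r
library(dplyr)
library(tidyr)

# Example data as given in the question
df <- data.frame(service = c("1,2,3,4", "2,4,5", "1,3,4"),
                 score   = c("1,2,1,3", "1,1,3", "1,1,1"))

long <- df %>%
  mutate(ind = row_number(), .before = 1) %>%
  separate_longer_delim(c(service, score), delim = ",") %>%
  # the split pieces are strings; convert them to numbers
  mutate(across(c(service, score), as.numeric)) %>%
  # match the column name used in the desired output
  rename(serv = service)
```

Note that as.numeric() turns any non-numeric entry into NA with a warning, so this only suits data where every entry really is a number.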

Answer 2

Score: 0

An approach using strsplit and unnest:

    library(dplyr)
    library(tidyr)

    df %>%
      mutate(across(everything(), ~ strsplit(.x, ",")),
             ind = row_number(), .before = service) %>%
      unnest(c(service, score))

    # A tibble: 10 × 3
         ind service score
       <int> <chr>   <chr>
     1     1 1       1
     2     1 2       2
     3     1 3       1
     4     1 4       3
     5     2 2       1
     6     2 4       1
     7     2 5       3
     8     3 1       1
     9     3 3       1
    10     3 4       1
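
The same split-and-flatten idea can also be written without any packages. This is a base R sketch rather than part of either answer, assuming the example data from the question; service_list, score_list and long are illustrative names:

```r
# Example data as given in the question
df <- data.frame(service = c("1,2,3,4", "2,4,5", "1,3,4"),
                 score   = c("1,2,1,3", "1,1,3", "1,1,1"))

# One character vector per respondent
service_list <- strsplit(df$service, ",", fixed = TRUE)
score_list   <- strsplit(df$score,   ",", fixed = TRUE)

long <- data.frame(
  # repeat each row index once per service that respondent took
  ind   = rep(seq_len(nrow(df)), times = lengths(service_list)),
  # flatten the lists; order is preserved, so positions still line up
  serv  = as.numeric(unlist(service_list)),
  score = as.numeric(unlist(score_list))
)
```

If the total number of scores differs from the total number of services, data.frame() stops with an error about differing numbers of rows. It does not catch mismatches that cancel out across rows.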

huangapple
  • Published on April 20, 2023 at 00:03:33
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76056633.html