April 20, 2023, 00:03:33

Reshape strings within a dataframe in R

Question

I have a dataframe that looks like this

df <- data.frame(service = c("1,2,3,4", "2,4,5", "1,3,4"),
                 score = c("1,2,1,3", "1,1,3", "1,1,1"))
df
  service   score
1 1,2,3,4 1,2,1,3
2   2,4,5   1,1,3
3   1,3,4   1,1,1

The variable service is a categorical variable with values from 1 to 5, and score is a categorical variable with values from 1 to 3, giving one score for each service listed. Note that respondents took different services, so the number of entries differs from row to row.
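
Since the reshaping relies on each row holding exactly one score per service, it may be worth confirming that first. A minimal base R sketch, assuming the example data above; the names n_service and n_score are illustrative only:

```r
# Example data from the question (stringsAsFactors = FALSE keeps the
# columns as character on R versions before 4.0).
df <- data.frame(
  service = c("1,2,3,4", "2,4,5", "1,3,4"),
  score   = c("1,2,1,3", "1,1,3", "1,1,1"),
  stringsAsFactors = FALSE
)

# Number of comma-separated entries per respondent in each column.
# n_service and n_score are illustrative names, not from the question.
n_service <- lengths(strsplit(df$service, ",", fixed = TRUE))
n_score   <- lengths(strsplit(df$score, ",", fixed = TRUE))

# TRUE when every respondent has one score per service.
all(n_service == n_score)
```

If this check fails for some rows, the scores cannot be paired with services by position, and those rows would need cleaning before reshaping.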

I need to reshape this dataframe to associate the score to each service. The final result would look like this

# tibble() replaces the deprecated data_frame()
tibble::tibble(ind = c(1,1,1,1,2,2,2,3,3,3),
               serv = c(1,2,3,4,2,4,5,1,3,4),
               score = c(1,2,1,3,1,1,3,1,1,1))

    ind  serv score
   <dbl> <dbl> <dbl>
 1     1     1     1
 2     1     2     2
 3     1     3     1
 4     1     4     3
 5     2     2     1
 6     2     4     1
 7     2     5     3
 8     3     1     1
 9     3     3     1
10     3     4     1

I first split the variable service to create a column for each category, in this way:

# library() loads one package per call
library(qdapTools)
library(tidyverse)

# Split each service string and relabel anything outside 1-5 as "other"
lst1 <- lapply(strsplit(df$service, ","), function(x)
  replace(x, (!x %in% c("1", "2", "3", "4", "5")) & !is.na(x), "other"))

# Count the services per respondent in columns serv_1 ... serv_5
lst1_tab <- mtabulate(lst1) %>% setNames(paste0("serv_", names(.)))
df <- cbind(df, lst1_tab)
df
  service   score serv_1 serv_2 serv_3 serv_4 serv_5
1 1,2,3,4 1,2,1,3      1      1      1      1      0
2   2,4,5   1,1,3      0      1      0      1      1
3   1,3,4   1,1,1      1      0      1      1      0

I did that so I could later reshape df into long form. However, not all individuals took all services; for instance, individual 1 did not take service 5. I therefore did not know how to split the score variable as well and associate each score with its service.

Answer 1

Score: 3

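We could use separate_longer_delim() from tidyr (available in tidyr 1.3.0 and later):

library(dplyr)
library(tidyr)
# df1 is the original two-column data frame (service, score) from the question
df1 %>%
  mutate(ind = row_number(), .before = 1) %>%
  separate_longer_delim(c(service, score), delim = ",")

Output: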

   ind service score
1    1       1     1
2    1       2     2
3    1       3     1
4    1       4     3
5    2       2     1
6    2       4     1
7    2       5     3
8    3       1     1
9    3       3     1
10   3       4     1
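
The split columns come back as character, whereas the desired result in the question is numeric. A minimal sketch of one way to convert them, assuming tidyr 1.3.0 or later is installed; df1 is rebuilt here from the question's data, and the name long is illustrative only:

```r
library(dplyr)
library(tidyr)

# Example data from the question, rebuilt so the sketch is self-contained.
df1 <- data.frame(
  service = c("1,2,3,4", "2,4,5", "1,3,4"),
  score   = c("1,2,1,3", "1,1,3", "1,1,1"),
  stringsAsFactors = FALSE
)

# `long` is an illustrative name, not from the original answer.
long <- df1 %>%
  mutate(ind = row_number(), .before = 1) %>%
  separate_longer_delim(c(service, score), delim = ",") %>%
  # Convert the split pieces from character to integer.
  mutate(across(c(service, score), as.integer))

str(long)
```

Integer columns are used here because the values are whole-number category codes; as.numeric would give the double columns shown in the question instead.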

Answer 2

Score: 0

An approach using strsplit and unnest:

library(dplyr)
library(tidyr)

# df is the original two-column data frame (service, score) from the question
df %>%
  mutate(across(everything(), ~ strsplit(.x, ",")),
         ind = row_number(), .before = service) %>%
  unnest(c(service, score))
# A tibble: 10 × 3
     ind service score
   <int> <chr>   <chr>
 1     1 1       1
 2     1 2       2
 3     1 3       1
 4     1 4       3
 5     2 2       1
 6     2 4       1
 7     2 5       3
 8     3 1       1
 9     3 3       1
10     3 4       1
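
For readers who prefer to avoid extra packages, the same position-wise pairing can be sketched in base R. This is an alternative under stated assumptions, not the method from either answer: it assumes every row has the same number of services and scores, and the names serv_list, score_list and long are illustrative only.

```r
# Example data from the question (stringsAsFactors = FALSE keeps the
# columns as character on R versions before 4.0).
df <- data.frame(
  service = c("1,2,3,4", "2,4,5", "1,3,4"),
  score   = c("1,2,1,3", "1,1,3", "1,1,1"),
  stringsAsFactors = FALSE
)

# Split both columns on commas; each list element belongs to one respondent.
serv_list  <- strsplit(df$service, ",", fixed = TRUE)
score_list <- strsplit(df$score, ",", fixed = TRUE)

# Stop early if any respondent has a different number of services and scores.
stopifnot(lengths(serv_list) == lengths(score_list))

# Repeat each row index once per service, then flatten both lists in order.
long <- data.frame(
  ind   = rep(seq_len(nrow(df)), times = lengths(serv_list)),
  serv  = as.integer(unlist(serv_list)),
  score = as.integer(unlist(score_list))
)

long
```

Because unlist() flattens both lists in the same row order, the i-th service of a respondent stays aligned with that respondent's i-th score.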
