2023-05-10 17:55:47

How to use numbers separated by comma as numeric variables

Question

I am working with survey responses from Qualtrics and analysing the data in R.

Fifteen of the questions are multiple choice, and a respondent can pick more than one option (for example, a person who picked choices 1, 3 and 4 has output that looks like "1,3,4").
Let's say I have 4 questions (and not 15): social, emotional, cognitive and family. If a person picked 1, 2 and 4 in social, the output will be "1,2,4", and if they picked only "2" for family, the output will be "2". See the example data below:

gender	social	emotional	cognitive	family
1	1	1,2,4	3	2
2	2	3,4	4	1,2,4
1	3,4	1,3	1,2,3	1

Each number in the columns social/emotional/cognitive/family represents a category. If the respondent ticked "1", I have a positive answer for that category, and if they did not tick it, I have a negative answer for that category. Each number in these columns is therefore really a binary response (positive/negative).

Therefore, to be able to analyse the data (chi square), I want the dataframe to look like this:

gender	social1	social2	social3	social4
1	yes	no	yes	yes
2	yes	yes	no	no
1	no	no	yes	no

Is there a function or series of functions that will let me do that?

Note that I have 15 questions (i.e., 15 columns) like this one, so I'd prefer if I could do it on the entire dataframe and not only on one question.

I've tried doing this (for each column):

    library(stringr)  # provides str_split_fixed()

    data <- read.csv("data.csv")
    social.data <- data.frame(Sex = data$gender,
                              social = data$social,
                              str_split_fixed(data$social, ',', 3))

R puts the numbers in separate columns, but from there I couldn't figure out how to get to the desired data frame described above.

Answer 1

Score: 1

First, use str_split() to split the comma separated strings into a list of numbers. Then, you could map over the known response values to create the binary variables.

    library(stringr)  # str_split()
    library(tidyr)
    library(dplyr)

    dat <- data.frame(
      gender = c(1,2,1), 
      social = c("1", "2", "3,4"), 
      emotional = c("1,2,4", "3,4", "1,3"), 
      cognitive = c("3", "4", "1,2,3"), 
      family = c("2", "1,2,4", "1")
    )

    # For each known response value i, flag (1/0) whether i was picked in each question
    purrr::map(1:4, \(i){
      dat %>% 
        mutate(across(social:family, ~purrr::map(str_split(.x, ","), as.numeric))) %>% 
        rowwise() %>% 
        transmute(across(social:family, ~+(i %in% .x), .names = paste0("{.col}", i)))
    }) %>% 
      bind_cols() %>% 
      bind_cols(dat, .)
    #>   gender social emotional cognitive family social1 emotional1 cognitive1
    #> 1      1      1     1,2,4         3      2       1          1          0
    #> 2      2      2       3,4         4  1,2,4       0          0          0
    #> 3      1    3,4       1,3     1,2,3      1       0          1          1
    #>   family1 social2 emotional2 cognitive2 family2 social3 emotional3 cognitive3
    #> 1       0       0          1          0       1       0          0          1
    #> 2       1       1          0          0       1       0          1          0
    #> 3       1       0          0          1       0       1          1          1
    #>   family3 social4 emotional4 cognitive4 family4
    #> 1       0       0          1          0       0
    #> 2       0       0          1          1       1
    #> 3       0       1          0          0       0
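The indicators above are coded 1/0, whereas the question asks for yes/no. A minimal base-R sketch of that last step follows; the data frame `res` and its two indicator columns are a cut-down stand-in for the result above, not the full output.

```r
# Cut-down stand-in for the result above: gender plus two 0/1 indicator columns
res <- data.frame(gender  = c(1, 2, 1),
                  social1 = c(1, 0, 0),
                  social2 = c(0, 1, 0))

# Recode every indicator column (everything except gender) as a no/yes factor
ind <- setdiff(names(res), "gender")
res[ind] <- lapply(res[ind], factor, levels = c(0, 1), labels = c("no", "yes"))

# Cross-tabulation of gender by one category, the usual input to chisq.test()
tab <- table(gender = res$gender, social1 = res$social1)
```

With the real survey data, `chisq.test(tab)` would then test whether ticking the category is associated with gender. With only three toy rows the expected counts are far too small for the test to mean anything.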

old answer:

    library(stringr)
    library(tidyr)
    library(dplyr)

    dat <- data.frame(
      gender = c(1,2,1), 
      social = c("1", "2", "3,4"), 
      emotional = c("1,2,4", "3,4", "1,3"), 
      cognitive = c("3", "4", "1,2,3"), 
      family = c("2", "1,2,4", "1")
    )

    # Turn each comma-separated string into a list column of numeric vectors
    dat <- dat %>% 
      mutate(across(social:family, ~purrr::map(str_split(.x, ","), as.numeric)))

Then, one by one, you can use unnest() on the list columns, and pivot them wider using pivot_wider() from tidyr.

    dat %>% 
      mutate(obs = row_number()) %>% 
      dplyr::select(obs, everything()) %>% 
      unnest(social) %>% 
      pivot_wider(names_from = "social", 
                  values_from = "social", 
                  values_fn = function(x) as.numeric(!is.na(x)), 
                  names_prefix = "social", 
                  values_fill = 0) %>%
      unnest(emotional) %>% 
      pivot_wider(names_from = "emotional", 
                  values_from = "emotional", 
                  values_fn = function(x) as.numeric(!is.na(x)), 
                  names_prefix = "emotional", 
                  values_fill = 0) %>% 
      unnest(cognitive) %>% 
      pivot_wider(names_from = "cognitive", 
                  values_from = "cognitive", 
                  values_fn = function(x) as.numeric(!is.na(x)), 
                  names_prefix = "cognitive", 
                  values_fill = 0) %>% 
      unnest(family) %>% 
      pivot_wider(names_from = "family", 
                  values_from = "family", 
                  values_fn = function(x) as.numeric(!is.na(x)), 
                  names_prefix = "family", 
                  values_fill = 0)
    #> # A tibble: 3 × 17
    #>     obs gender social1 social2 social3 social4 emotion…¹ emoti…² emoti…³ emoti…⁴
    #>   <int>  <dbl>   <dbl>   <dbl>   <dbl>   <dbl>     <dbl>   <dbl>   <dbl>   <dbl>
    #> 1     1      1       1       0       0       0         1       1       1       0
    #> 2     2      2       0       1       0       0         0       0       1       1
    #> 3     3      1       0       0       1       1         1       0       0       1
    #> # … with 7 more variables: cognitive3 <dbl>, cognitive4 <dbl>,
    #> #   cognitive1 <dbl>, cognitive2 <dbl>, family2 <dbl>, family1 <dbl>,
    #> #   family4 <dbl>, and abbreviated variable names ¹emotional1, ²emotional2,
    #> #   ³emotional4, ⁴emotional3

You could also turn the unnest() and pivot_wider() steps into a function and then just call that function on the data:

    # Unnest one list column and spread its values into 0/1 indicator columns
    u_pivot <- function(.data, x){
      xn <- as_label(enquo(x))  # column name as a string
      .data %>% 
        unnest({{ x }}) %>%
        pivot_wider(names_from = all_of(xn),   # all_of() avoids the tidyselect warning for external strings
                    values_from = all_of(xn),
                    values_fn = function(x) as.numeric(!is.na(x)),
                    names_prefix = xn,
                    values_fill = 0)
    }

    dat %>% 
      mutate(obs = row_number()) %>% 
      u_pivot(social) %>% 
      u_pivot(emotional) %>% 
      u_pivot(cognitive) %>% 
      u_pivot(family)
    #> # A tibble: 3 × 17
    #>   gender   obs social1 social2 social3 social4 emotion…¹ emoti…² emoti…³ emoti…⁴
    #>    <dbl> <int>   <dbl>   <dbl>   <dbl>   <dbl>     <dbl>   <dbl>   <dbl>   <dbl>
    #> 1      1     1       1       0       0       0         1       1       1       0
    #> 2      2     2       0       1       0       0         0       0       1       1
    #> 3      1     3       0       0       1       1         1       0       0       1
    #> # … with 7 more variables: cognitive3 <dbl>, cognitive4 <dbl>,
    #> #   cognitive1 <dbl>, cognitive2 <dbl>, family2 <dbl>, family1 <dbl>,
    #> #   family4 <dbl>, and abbreviated variable names ¹emotional1, ²emotional2,
    #> #   ³emotional4, ⁴emotional3

<sup>Created on 2023-05-10 with reprex v2.0.2</sup>

Answer 2

Score: 0

A start perhaps. You can use dplyr::across to move across multiple columns or all columns. This will require some cleanup afterwards but should get you started.

First some data:

    library(tidyverse)
    data <- tibble(gender=c(1,2),
                   social=c(1,'1,2,4'),
                   emotional=c(2,'3,4'))

I'm not sure whether the number of commas can be detected automatically. As written, this pads with blanks wherever a response has fewer choices than the maximum, and the maximum has to be hard-coded:

    data %>% 
      mutate(across(.cols = everything(),
                    .fns = ~str_split_fixed(., ',', 3)))
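
The maximum does not have to be hard-coded: counting the commas in each response gives the number of choices. A small sketch on a toy vector (the names `social`, `n_max` and `parts` are illustrative only):

```r
library(stringr)

social <- c("1", "1,2,4", "3,4")              # toy multi-select column

# Number of choices in a response = number of commas + 1
n_max <- max(str_count(social, ",")) + 1

# One column per possible position; shorter responses are padded with ""
parts <- str_split_fixed(social, ",", n_max)
```

This still yields position columns (first pick, second pick, and so on) rather than one column per category, so a further step is needed to get the indicators the question asks for.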

To rename the new columns, this question may help:
https://stackoverflow.com/questions/63249873/splitting-multiple-string-columns-and-rename-the-new-columns-adequately-r

Here's code from the link adapted to this question:

    pipe_to_do <- . %>%
      str_split_fixed(string = ., pattern = "(,)", n = 3) %>% 
      as_tibble() %>% 
      rename(letter = V1,
             number = V2,
             sign = V3)

    xx <- data %>%
      summarise(across(everything(), .fns = pipe_to_do))
    xx

    names_xx <- names(xx)

    combine_names <- function(df, name) {
      str_c(name, "_", df)
    }

    combine_names_func <- function(df, name){
      df %>% 
        rename_with(.fn = ~ combine_names(.x, name))
    }

    map2(xx, names_xx, combine_names_func) %>% 
      reduce(bind_cols)

Answer 3

Score: 0

Using *data.table*, reshape the data to long format with *melt*, split on the comma, then reshape back to wide format with *dcast*:

    library(data.table)

    d <- fread("gender	social	emotional	cognitive	family
    1	1	1,2,4	3	2
    2	2	3,4	4	1,2,4
    1	3,4	1,3	1,2,3	1")

    d[, id := .I
      ][, melt(.SD, id.vars = c("id", "gender"), variable.name = "grp")
        ][, .(x = paste(grp, unlist(tstrsplit(value, split = ",")), sep = "_")), by = .(id, gender, grp)
          ][, dcast(.SD, id + gender ~ x, \(i){sum(!is.na(i))}) ]

    #    id gender cognitive_1 cognitive_2 cognitive_3 cognitive_4
    # 1:  1      1           0           0           1           0
    # 2:  2      2           0           0           0           1
    # 3:  3      1           1           1           1           0
    #    emotional_1 emotional_2 emotional_3 emotional_4 family_1 family_2
    # 1:           1           1           0           1        0        1
    # 2:           0           0           1           1        1        1
    # 3:           1           0           1           0        1        0
    #    family_4 social_1 social_2 social_3 social_4
    # 1:        0        1        0        0        0
    # 2:        1        0        1        0        0
    # 3:        0        0        0        1        1
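
For comparison, the same indicators can be sketched in base R without any packages, using the yes/no coding asked for in the question. It rests on two assumptions: the set of possible choices (`1:4`) is known in advance, and an empty or missing response counts as "no" for every category. The helper name `expand_choices` is made up for illustration.

```r
# Hypothetical helper: expand one comma-separated column into no/yes factors,
# one factor per possible choice
expand_choices <- function(x, name, choices = 1:4) {
  picked <- strsplit(as.character(x), ",", fixed = TRUE)  # list of character vectors
  out <- lapply(choices, function(k) {
    hit <- vapply(picked, function(p) as.character(k) %in% trimws(p), logical(1))
    factor(ifelse(hit, "yes", "no"), levels = c("no", "yes"))
  })
  names(out) <- paste0(name, choices)
  as.data.frame(out)
}

dat <- data.frame(
  gender    = c(1, 2, 1),
  social    = c("1", "2", "3,4"),
  emotional = c("1,2,4", "3,4", "1,3"),
  cognitive = c("3", "4", "1,2,3"),
  family    = c("2", "1,2,4", "1")
)

# Apply the helper to every question column and bind the pieces next to gender
question_cols <- setdiff(names(dat), "gender")
wide <- cbind(
  dat["gender"],
  do.call(cbind, lapply(question_cols, function(col) expand_choices(dat[[col]], col)))
)
```

Because `question_cols` is derived from the column names, the same call would cover all 15 questions as long as `gender` is the only non-question column.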
