How to use numbers separated by commas as numeric variables


Question



I am working with survey responses from Qualtrics and analysing the data in R.

Fifteen of the questions are multiple choice, and a respondent can pick more than one option (for example, a respondent who picked options 1, 3 and 4 has output that looks like "1,3,4").
Let's say I have 4 questions (instead of 15): social, emotional, cognitive and family. If a person picked 1, 2 and 4 in social, the output will be "1,2,4", and if he picked only "2" in family, the output will be "2". See the example data below:

gender social emotional cognitive family
1 1 1,2,4 3 2
2 2 3,4 4 1,2,4
1 3,4 1,3 1,2,3 1

Each number in the social/emotional/cognitive/family columns represents a category. If the respondent ticked "1", I have a positive answer for that category; if he did not tick it, I have a negative answer for that category. In other words, each possible number in these columns is really a separate binary (positive/negative) response.
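Here is a minimal base-R sketch of that encoding for a single cell. It assumes the options are numbered 1 to 4, as in the example data, and the names `all_options`, `chosen` and `answer` are illustrative only. The string "1,2,4" becomes one yes/no value per option:

```r
# Assumption: every question offers options numbered 1 to 4
all_options <- 1:4

# One multiple-choice cell as exported from Qualtrics
cell <- "1,2,4"

# Split on commas and convert the pieces to numbers: c(1, 2, 4)
chosen <- as.numeric(strsplit(cell, ",", fixed = TRUE)[[1]])

# One binary response per option: was option k ticked?
answer <- ifelse(all_options %in% chosen, "yes", "no")
names(answer) <- paste0("social", all_options)
answer
```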

Therefore, to be able to analyse the data (chi-square tests), I want the data frame to look like this:

gender social1 social2 social3 social4
1 yes no no no
2 no yes no no
1 no no yes yes

Is there a function or series of functions that will let me do that?

Note that I have 15 questions (i.e., 15 columns) like this one, so I'd prefer to do this on the entire data frame rather than one question at a time.

I've tried doing this (for each column):

```r
library(stringr)  # provides str_split_fixed()

data <- read.csv("data.csv")
social.data <- data.frame(Sex = c(data$gender),
                          social = c(data$social),
                          str_split_fixed(data$social, ",", 3))
```

R puts the numbers into separate columns, but from there I couldn't figure out how to get to the desired data frame described above.
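The split columns are not the desired variables because splitting is positional. The first piece of "3,4" is 3, so the first split column holds a different option in each row. The sketch below illustrates this using base R's strsplit() in place of stringr::str_split_fixed(), since both split by position. It uses the social values from the example data, and the variable names are illustrative:

```r
social <- c("1", "2", "3,4")

# Split each cell on commas: a list with one character vector per respondent
pieces <- strsplit(social, ",", fixed = TRUE)

# Position-based: the first piece of each cell (what the first split column holds)
first_piece <- vapply(pieces, function(p) p[1], character(1))

# Value-based: one indicator per option value, which is what the analysis needs
social3 <- vapply(pieces, function(p) "3" %in% p, logical(1))

first_piece  # c("1", "2", "3")
social3      # c(FALSE, FALSE, TRUE)
```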

Answer 1

Score: 1


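First, use str_split() (from stringr) to split each comma-separated string into a numeric vector, giving a list-column. Then map over the known response values (1 to 4) and, for each value, create a 0/1 variable that indicates whether that value was among the respondent's choices.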

```r
library(stringr)  # str_split()
library(tidyr)
library(dplyr)

dat <- data.frame(
  gender = c(1, 2, 1),
  social = c("1", "2", "3,4"),
  emotional = c("1,2,4", "3,4", "1,3"),
  cognitive = c("3", "4", "1,2,3"),
  family = c("2", "1,2,4", "1")
)

# For each possible answer value i (1 to 4), build one 0/1 column per question
# (the \(i) lambda syntax requires R >= 4.1)
purrr::map(1:4, \(i) {
  dat %>%
    # "1,2,4" -> c(1, 2, 4), stored as a list-column
    mutate(across(social:family, ~ purrr::map(str_split(.x, ","), as.numeric))) %>%
    rowwise() %>%
    # 1 if value i was ticked in that row, else 0; names like "social1"
    transmute(across(social:family, ~ +(i %in% .x), .names = paste0("{.col}", i)))
}) %>%
  bind_cols() %>%
  bind_cols(dat, .)
#>   gender social emotional cognitive family social1 emotional1 cognitive1
#> 1      1      1     1,2,4         3      2       1          1          0
#> 2      2      2       3,4         4  1,2,4       0          0          0
#> 3      1    3,4       1,3     1,2,3      1       0          1          1
#>   family1 social2 emotional2 cognitive2 family2 social3 emotional3 cognitive3
#> 1       0       0          1          0       1       0          0          1
#> 2       1       1          0          0       1       0          1          0
#> 3       1       0          0          1       0       1          1          1
#>   family3 social4 emotional4 cognitive4 family4
#> 1       0       0          1          0       0
#> 2       0       0          1          1       1
#> 3       0       1          0          0       0
```
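The same split-then-test-membership idea can also be sketched in base R. This is an alternative to the purrr/dplyr code, not a copy of it. `expand_choices()` is a hypothetical helper, and the sketch assumes every column except gender is a comma-separated multiple-choice column. Passing `values = 1:4` mirrors the hard-coded `1:4` above. Leaving `values` out uses only the option values that occur in each column, so an option nobody ticked gets no column:

```r
# Hypothetical helper: expand one comma-separated column into 0/1 columns.
# A missing (NA) cell gives 0 for every option.
expand_choices <- function(x, prefix, values = NULL) {
  # List with one numeric vector of ticked options per respondent
  pieces <- lapply(strsplit(as.character(x), ",", fixed = TRUE), as.numeric)
  # By default, use only the option values that occur in this column
  if (is.null(values)) values <- sort(unique(unlist(pieces)))
  cols <- lapply(values, function(k)
    vapply(pieces, function(p) as.integer(k %in% p), integer(1)))
  names(cols) <- paste0(prefix, values)
  as.data.frame(cols)
}

dat <- data.frame(
  gender    = c(1, 2, 1),
  social    = c("1", "2", "3,4"),
  emotional = c("1,2,4", "3,4", "1,3"),
  cognitive = c("3", "4", "1,2,3"),
  family    = c("2", "1,2,4", "1")
)

# Every column except gender is treated as a multiple-choice question
question_cols <- setdiff(names(dat), "gender")

# Fixed option set 1..4 so every question gets all four columns
wide <- do.call(cbind, c(
  list(dat["gender"]),
  lapply(question_cols, function(col) expand_choices(dat[[col]], col, values = 1:4))
))
wide
```

Because the question columns are found with setdiff(), the same call should cover all 15 questions of the real data without listing them by hand.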

Old answer:

```r
library(stringr)
library(tidyr)
library(dplyr)

dat <- data.frame(
  gender = c(1, 2, 1),
  social = c("1", "2", "3,4"),
  emotional = c("1,2,4", "3,4", "1,3"),
  cognitive = c("3", "4", "1,2,3"),
  family = c("2", "1,2,4", "1")
)

# Turn each comma-separated string into a numeric vector (list-columns)
dat <- dat %>%
  mutate(across(social:family, ~ purrr::map(str_split(.x, ","), as.numeric)))
```

Then, one column at a time, use unnest() on the list-column and widen it with pivot_wider() from tidyr:

```r
dat %>%
  mutate(obs = row_number()) %>%
  dplyr::select(obs, everything()) %>%
  unnest(social) %>%
  pivot_wider(names_from = "social",
              values_from = "social",
              values_fn = function(x) as.numeric(!is.na(x)),
              names_prefix = "social",
              values_fill = 0) %>%
  unnest(emotional) %>%
  pivot_wider(names_from = "emotional",
              values_from = "emotional",
              values_fn = function(x) as.numeric(!is.na(x)),
              names_prefix = "emotional",
              values_fill = 0) %>%
  unnest(cognitive) %>%
  pivot_wider(names_from = "cognitive",
              values_from = "cognitive",
              values_fn = function(x) as.numeric(!is.na(x)),
              names_prefix = "cognitive",
              values_fill = 0) %>%
  unnest(family) %>%
  pivot_wider(names_from = "family",
              values_from = "family",
              values_fn = function(x) as.numeric(!is.na(x)),
              names_prefix = "family",
              values_fill = 0)
#> # A tibble: 3 × 17
#>     obs gender social1 social2 social3 social4 emotion…¹ emoti…² emoti…³ emoti…⁴
#>   <int>  <dbl>   <dbl>   <dbl>   <dbl>   <dbl>     <dbl>   <dbl>   <dbl>   <dbl>
#> 1     1      1       1       0       0       0         1       1       1       0
#> 2     2      2       0       1       0       0         0       0       1       1
#> 3     3      1       0       0       1       1         1       0       0       1
#> # … with 7 more variables: cognitive3 <dbl>, cognitive4 <dbl>,
#> #   cognitive1 <dbl>, cognitive2 <dbl>, family2 <dbl>, family1 <dbl>,
#> #   family4 <dbl>, and abbreviated variable names ¹emotional1, ²emotional2,
#> #   ³emotional4, ⁴emotional3
```

You could also turn the unnest() and pivot_wider() steps into a function and then just call that function on the data:

```r
# Unnest one list-column and spread it into 0/1 columns named e.g. "social1"
u_pivot <- function(.data, x) {
  xn <- as_label(enquo(x))  # column name as a string
  .data %>%
    unnest({{ x }}) %>%
    pivot_wider(names_from = all_of(xn),
                values_from = all_of(xn),
                values_fn = function(x) as.numeric(!is.na(x)),
                names_prefix = xn,
                values_fill = 0)
}

dat %>%
  mutate(obs = row_number()) %>%
  u_pivot(social) %>%
  u_pivot(emotional) %>%
  u_pivot(cognitive) %>%
  u_pivot(family)
#> # A tibble: 3 × 17
#>   gender   obs social1 social2 social3 social4 emotion…¹ emoti…² emoti…³ emoti…⁴
#>    <dbl> <int>   <dbl>   <dbl>   <dbl>   <dbl>     <dbl>   <dbl>   <dbl>   <dbl>
#> 1      1     1       1       0       0       0         1       1       1       0
#> 2      2     2       0       1       0       0         0       0       1       1
#> 3      1     3       0       0       1       1         1       0       0       1
#> # … with 7 more variables: cognitive3 <dbl>, cognitive4 <dbl>,
#> #   cognitive1 <dbl>, cognitive2 <dbl>, family2 <dbl>, family1 <dbl>,
#> #   family4 <dbl>, and abbreviated variable names ¹emotional1, ²emotional2,
#> #   ³emotional4, ⁴emotional3
```

Created on 2023-05-10 with reprex v2.0.2
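The question asks for yes/no values for chi-square tests, so the 0/1 columns still need recoding and cross-tabulating against gender. Below is a minimal base-R sketch of that last step. It assumes a data frame of 0/1 indicators like the social1..social4 columns in the output above, typed in by hand here so the block runs on its own:

```r
# 0/1 indicators for the three example respondents (social question only)
wide <- data.frame(
  gender  = c(1, 2, 1),
  social1 = c(1, 0, 0),
  social2 = c(0, 1, 0),
  social3 = c(0, 0, 1),
  social4 = c(0, 0, 1)
)

indicator_cols <- setdiff(names(wide), "gender")

# Recode 0/1 to a two-level factor so "no" and "yes" always both appear in tables
wide[indicator_cols] <- lapply(wide[indicator_cols], function(v)
  factor(ifelse(v == 1, "yes", "no"), levels = c("no", "yes")))

# One gender-by-answer contingency table per indicator column
tables <- lapply(indicator_cols, function(col)
  table(gender = wide$gender, answer = wide[[col]]))
names(tables) <- indicator_cols

tables$social1
# On the full survey data, each table can be passed to chisq.test(), e.g.
# chisq.test(tables$social1)
```

With only three respondents, the expected counts are far too small for a meaningful chi-square test, so the call is left as a comment. Note that chisq.test() warns when expected counts fall below 5.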

Answer 2

Score: 0


A start, perhaps. You can use dplyr::across() to apply a function to several (or all) columns at once. The result will need some cleanup afterwards, but it should get you started.

First, some data:

```r
library(tidyverse)
data <- tibble(gender = c(1, 2),
               social = c(1, "1,2,4"),
               emotional = c(2, "3,4"))
```

I'm not sure whether there is a way to make this detect the number of commas automatically. As written, it fills in blanks where a cell has fewer values than the maximum, and you need to hard-code that maximum (3 here):

```r
data %>%
  mutate(across(.cols = everything(),
                .fns = ~ str_split_fixed(., ",", 3)))
```
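The maximum does not have to be hard-coded. It is the largest number of comma-separated pieces in any cell; for cells like these, stringr::str_count(x, ",") + 1 gives the same number. Below is a minimal base-R sketch, assuming the multiple-choice columns are every column except gender:

```r
data <- data.frame(
  gender    = c(1, 2),
  social    = c("1", "1,2,4"),
  emotional = c("2", "3,4")
)

choice_cols <- setdiff(names(data), "gender")

# Number of selected options in each cell = number of comma-separated pieces
n_pieces <- sapply(data[choice_cols], function(x)
  lengths(strsplit(as.character(x), ",", fixed = TRUE)))

# Largest number of pieces in any cell; use this instead of a hard-coded 3
max_pieces <- max(n_pieces)
max_pieces
```

`max_pieces` can then replace the literal 3 in `str_split_fixed(., ",", 3)`.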

To rename the resulting columns, you may want to look at this question:
https://stackoverflow.com/questions/63249873/splitting-multiple-string-columns-and-rename-the-new-columns-adequately-r

Here's code from the link adapted to this question:

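```r
# Functional sequence: split one column into up to 3 columns and name them
pipe_to_do <- . %>%
  str_split_fixed(string = ., pattern = "(,)", n = 3) %>%
  as_tibble() %>%   # unnamed matrix columns become V1, V2, V3
  rename(choice1 = V1,
         choice2 = V2,
         choice3 = V3)

# One data-frame column per original column
# (reframe() needs dplyr >= 1.1.0; older versions can use summarise())
xx <- data %>%
  reframe(across(everything(), .fns = pipe_to_do))
xx

names_xx <- names(xx)

# Prefix each new column name with its original column, e.g. "social_choice1"
combine_names <- function(df, name) {
  str_c(name, "_", df)
}

combine_names_func <- function(df, name) {
  df %>%
    rename_with(.fn = ~ combine_names(.x, name))
}

map2(xx, names_xx, combine_names_func) %>%
  reduce(bind_cols)
```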



# Answer 3
**Score**: 0

Using *data.table*, reshape the data to long format with *melt*, then *split* on commas, and reshape it back to wide format with *dcast*:

```R
library(data.table)

d <- fread("gender	social	emotional	cognitive	family
1	1	1,2,4	3	2
2	2	3,4	4	1,2,4
1	3,4	1,3	1,2,3	1")

d[, id := .I
  ][, melt(.SD, id.vars = c("id", "gender"), variable.name = "grp")
    ][, .(x = paste(grp, unlist(tstrsplit(value, split = ",")), sep = "_")), by = .(id, gender, grp)
      ][, dcast(.SD, id + gender ~ x, \(i){sum(!is.na(i))}) ]
#    id gender cognitive_1 cognitive_2 cognitive_3 cognitive_4
# 1:  1      1           0           0           1           0
# 2:  2      2           0           0           0           1
# 3:  3      1           1           1           1           0
#    emotional_1 emotional_2 emotional_3 emotional_4 family_1 family_2
# 1:           1           1           0           1        0        1
# 2:           0           0           1           1        1        1
# 3:           1           0           1           0        1        0
#    family_4 social_1 social_2 social_3 social_4
# 1:        0        1        0        0        0
# 2:        1        0        1        0        0
# 3:        0        0        0        1        1
```
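The reshape-long, split, reshape-wide idea can also be sketched in base R, as an alternative to the data.table code. Stack the split answers into a long data frame of (respondent, item) pairs, then cross-tabulate with table(). The sketch makes `item` a factor whose levels are every question/option pair, with options 1 to 4 assumed. As a result, an option nobody ticked still gets a column of zeros, such as family_3, which is missing from the dcast() output above:

```r
d <- data.frame(
  gender    = c(1, 2, 1),
  social    = c("1", "2", "3,4"),
  emotional = c("1,2,4", "3,4", "1,3"),
  cognitive = c("3", "4", "1,2,3"),
  family    = c("2", "1,2,4", "1")
)
d$id <- seq_len(nrow(d))

questions   <- c("social", "emotional", "cognitive", "family")
all_options <- 1:4  # assumption: every question offers options 1 to 4

# Long format: one row per (respondent, ticked item), e.g. id 1, "emotional_2"
long <- do.call(rbind, lapply(questions, function(q) {
  pieces <- strsplit(as.character(d[[q]]), ",", fixed = TRUE)
  data.frame(id   = rep(d$id, lengths(pieces)),
             item = paste(q, unlist(pieces), sep = "_"))
}))

# Fix the full set of columns up front, including options nobody ticked
long$item <- factor(long$item,
                    levels = paste(rep(questions, each = length(all_options)),
                                   all_options, sep = "_"))

# Wide format: respondent-by-item counts (0/1 unless an option repeats in a cell)
counts <- table(factor(long$id, levels = d$id), long$item)
wide <- cbind(id = d$id, gender = d$gender, as.data.frame.matrix(counts))
wide
```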




huangapple
  • Posted on 10 May 2023 at 17:55:47
  • Please keep the link to this article when reposting: https://go.coder-hub.com/76217067.html