How to use numbers separated by comma as numeric variables
I am working with survey responses from Qualtrics and analysing the data in R.
Fifteen of the questions are multiple choice, and a respondent can select more than one option (for example, a person who picked choices 1, 3 and 4 has output that looks like "1,3,4").
Let's say I have 4 questions (and not 15): social, emotional, cognitive and family. If a person picked 1, 2 and 4 in social, the output will be "1,2,4", and if they picked only "2" for family, the output will be "2". See the example data below:
gender | social | emotional | cognitive | family |
1 | 1 | 1,2,4 | 3 | 2 |
2 | 2 | 3,4 | 4 | 1,2,4 |
1 | 3,4 | 1,3 | 1,2,3 | 1 |
Each number in the columns social/emotional/cognitive/family represents a category. If the respondent ticked "1", I have a positive answer for that category, and if they did not tick it, I have a negative answer for that category. Each number in these columns is therefore a binary response (positive/negative).
To be able to analyse the data (chi-square), I want the dataframe to look like this:
gender | social1 | social2 | social3 | social4 |
1 | yes | no | yes | yes |
2 | yes | yes | no | no |
1 | no | no | yes | no |
Is there a function or series of functions that will let me do that?
Note that I have 15 questions (i.e., 15 columns) like this one, so I'd prefer if I could do it on the entire dataframe and not only on one question.
I've tried doing this (for each column):
library(stringr)  # provides str_split_fixed()
data <- read.csv("data.csv")
social.data <- data.frame(Sex = c(data$gender),
                          str_split_fixed(data$social, ',', 3))
R gives me the numbers in separate columns, but from there I couldn't figure out how to get to the desired dataframe described above.
Score: 1
First, use str_split() to split the comma-separated strings into a list of numbers. Then, you could map over the known response values to create the binary variables.
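library(dplyr)
library(stringr)
dat <- data.frame(
  gender = c(1,2,1),
  social = c("1", "2", "3,4"),
  emotional = c("1,2,4", "3,4", "1,3"),
  cognitive = c("3", "4", "1,2,3"),
  family = c("2", "1,2,4", "1")
)
# For each known response value i, flag (0/1) whether i was picked in each question
purrr::map(1:4, \(i){
  dat %>%
    mutate(across(social:family, ~purrr::map(str_split(.x, ","), as.numeric))) %>%
    rowwise() %>%
    transmute(across(social:family, ~+(i %in% .x), .names = paste0("{.col}", i)))
}) %>%
  bind_cols() %>%
  bind_cols(dat, .)  # put the original columns first, as in the output below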
#> gender social emotional cognitive family social1 emotional1 cognitive1
#> 1 1 1 1,2,4 3 2 1 1 0
#> 2 2 2 3,4 4 1,2,4 0 0 0
#> 3 1 3,4 1,3 1,2,3 1 0 1 1
#> family1 social2 emotional2 cognitive2 family2 social3 emotional3 cognitive3
#> 1 0 0 1 0 1 0 0 1
#> 2 1 1 0 0 1 0 1 0
#> 3 1 0 0 1 0 1 1 1
#> family3 social4 emotional4 cognitive4 family4
#> 1 0 0 1 0 0
#> 2 0 0 1 1 1
#> 3 0 1 0 0 0
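For anyone who wants to see the mechanism without the tidyverse, here is a minimal base R sketch of the same split-then-test idea for a single question. `to_indicators` is a hypothetical helper name (not from any package), and the sketch assumes the set of possible choices, here 1 to 4, is known in advance.

```r
# Hypothetical helper: one 0/1 column per known choice for a single question.
# Assumes responses are comma-separated strings such as "1,3,4".
to_indicators <- function(x, choices, prefix) {
  picked <- strsplit(as.character(x), ",", fixed = TRUE)  # one character vector per respondent
  out <- sapply(choices, function(ch) {
    # 1 if this choice appears among the respondent's picks, else 0
    as.integer(vapply(picked, function(p) as.character(ch) %in% trimws(p), logical(1)))
  })
  out <- matrix(out, nrow = length(x))  # keep a matrix even for a single respondent
  colnames(out) <- paste0(prefix, choices)
  as.data.frame(out)
}

social <- c("1", "2", "3,4")
ind <- to_indicators(social, choices = 1:4, prefix = "social")
print(ind)
```

Applying the helper to each of the 15 question columns (for example with `lapply` over the column names) and then `cbind`-ing the results would probably cover the whole data frame.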
old answer:
library(dplyr)
library(tidyr)
library(stringr)
dat <- data.frame(
  gender = c(1,2,1),
  social = c("1", "2", "3,4"),
  emotional = c("1,2,4", "3,4", "1,3"),
  cognitive = c("3", "4", "1,2,3"),
  family = c("2", "1,2,4", "1")
)
# Turn each comma-separated string into a list column of numbers
dat <- dat %>%
  mutate(across(social:family, ~purrr::map(str_split(.x, ","), as.numeric)))
Then, one by one, you can use unnest() on the list columns, and pivot them wider using pivot_wider() from tidyr:
dat %>%
  mutate(obs = row_number()) %>%
  dplyr::select(obs, everything()) %>%
  unnest(social) %>%
  pivot_wider(names_from = "social",
              values_from = "social",
              values_fn = function(x)as.numeric(!is.na(x)),
              values_fill=0,
              names_prefix = "social") %>%
  unnest(emotional) %>%
  pivot_wider(names_from = "emotional",
              values_from = "emotional",
              values_fn = function(x)as.numeric(!is.na(x)),
              values_fill=0,
              names_prefix = "emotional") %>%
  unnest(cognitive) %>%
  pivot_wider(names_from = "cognitive",
              values_from = "cognitive",
              values_fn = function(x)as.numeric(!is.na(x)),
              values_fill=0,
              names_prefix = "cognitive") %>%
  unnest(family) %>%
  pivot_wider(names_from = "family",
              values_from = "family",
              values_fn = function(x)as.numeric(!is.na(x)),
              values_fill=0,
              names_prefix = "family")
#> # A tibble: 3 × 17
#> obs gender social1 social2 social3 social4 emotion…¹ emoti…² emoti…³ emoti…⁴
#> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 1 0 0 0 1 1 1 0
#> 2 2 2 0 1 0 0 0 0 1 1
#> 3 3 1 0 0 1 1 1 0 0 1
#> # … with 7 more variables: cognitive3 <dbl>, cognitive4 <dbl>,
#> # cognitive1 <dbl>, cognitive2 <dbl>, family2 <dbl>, family1 <dbl>,
#> # family4 <dbl>, and abbreviated variable names ¹emotional1, ²emotional2,
#> # ³emotional4, ⁴emotional3
You could also turn the unnest() and pivot_wider() steps into a function and then just call that function on the data:
u_pivot <- function(.data, x){
  xn <- as_label(enquo(x))  # column name as a string, reused as the new-column prefix
  .data %>%
    unnest({{ x }}) %>%
    pivot_wider(names_from = xn,
                values_from = xn,
                values_fn = function(x)as.numeric(!is.na(x)),
                names_prefix= xn,
                values_fill=0)
}
dat %>%
  mutate(obs = row_number()) %>%
  u_pivot(social) %>%
  u_pivot(emotional) %>%
  u_pivot(cognitive) %>%
  u_pivot(family)
#> # A tibble: 3 × 17
#> gender obs social1 social2 social3 social4 emotion…¹ emoti…² emoti…³ emoti…⁴
#> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 1 0 0 0 1 1 1 0
#> 2 2 2 0 1 0 0 0 0 1 1
#> 3 1 3 0 0 1 1 1 0 0 1
#> # … with 7 more variables: cognitive3 <dbl>, cognitive4 <dbl>,
#> # cognitive1 <dbl>, cognitive2 <dbl>, family2 <dbl>, family1 <dbl>,
#> # family4 <dbl>, and abbreviated variable names ¹emotional1, ²emotional2,
#> # ³emotional4, ⁴emotional3
<sup>Created on 2023-05-10 with reprex v2.0.2</sup>
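The question asks for yes/no values and a chi-square test, so here is a rough sketch of that last step, assuming 0/1 indicator columns like the ones in the output above. The small `res` data frame is typed in by hand from the `social1` and `social4` columns of that output, purely for illustration.

```r
# Illustrative data only: gender plus two of the 0/1 indicator columns from the output above.
res <- data.frame(gender  = c(1, 2, 1),
                  social1 = c(1, 0, 0),
                  social4 = c(0, 0, 1))

# Recode every indicator column (everything except gender) from 0/1 to no/yes.
ind_cols <- setdiff(names(res), "gender")
res_yn <- res
res_yn[ind_cols] <- lapply(res[ind_cols], factor,
                           levels = c(0, 1), labels = c("no", "yes"))

# Cross-tabulate one indicator against gender.
tab <- table(gender = res_yn$gender, social1 = res_yn$social1)
print(tab)

# With real survey data the table could be passed on to chisq.test(tab);
# with only three rows the expected counts are far too small for that to be meaningful.
```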
Score: 0
A start, perhaps. You can use dplyr::across to operate on multiple columns or all columns. This will require some cleanup afterwards but should get you started.
First some data:
data <- tibble(gender=c(1,2),
I'm not sure whether there is a way to make this detect the number of commas automatically. As written, it pads with blanks wherever a response has fewer values than the maximum, and you need to hard-code that maximum (3 here)!
data %>%
mutate(across(.cols = everything(),
.fns = ~str_split_fixed(.,',',3)))
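One possible way around hard-coding the maximum is to count the pieces first. The base R sketch below does that for a single column and pads shorter responses with blanks, mimicking what str_split_fixed returns. The variable names are made up for the example.

```r
x <- c("1,2,4", "3,4", "1,3")

# Split each response and find the largest number of pieces in the column.
pieces <- strsplit(x, ",", fixed = TRUE)
n_max  <- max(lengths(pieces))

# Pad shorter responses with "" so every row has n_max entries, then bind into a matrix.
padded <- lapply(pieces, function(p) c(p, rep("", n_max - length(p))))
parts  <- matrix(unlist(padded), ncol = n_max, byrow = TRUE)
print(parts)
```

The computed `n_max` could also be passed as the `n` argument of str_split_fixed instead of the literal 3.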
To rename maybe examine this question:
Here's code from the link adapted to this question:
pipe_to_do <- . %>%
str_split_fixed(string = .,pattern = "(,)",n = 3) %>%
as_tibble() %>%
rename(letter = V1,
number = V2,
sign = V3)
xx <- data %>%
summarise(across(everything(),.fns = pipe_to_do))
names_xx <- names(xx)
combine_names <- function(df,name) {
  str_c(name, "_", df)  # prefix each new column name with the original column name
}
combine_names_func <- function(df,name){
  df %>%
    rename_with(.fn = ~ combine_names(.x,name))
}
map2(xx,names_xx,combine_names_func) %>%
  bind_cols()  # the final step was cut off in the source; bind_cols() is an assumption
# Answer 3
**Score**: 0
Using *data.table*, reshape the data to a long format - *melt*, then *split* on comma, and reshape it back to wide format - *dcast*:
library(data.table)
d <- fread("gender social emotional cognitive family
1 1 1,2,4 3 2
2 2 3,4 4 1,2,4
1 3,4 1,3 1,2,3 1")
d[, id := .I
][, melt(.SD, id.vars = c("id", "gender"), variable.name = "grp")
][, .(x = paste(grp, unlist(tstrsplit(value, split = ",")), sep = "_")), by = .(id, gender, grp)
][, dcast(.SD, id + gender ~ x, \(i){sum(!is.na(i))}) ]
# id gender cognitive_1 cognitive_2 cognitive_3 cognitive_4
# 1: 1 1 0 0 1 0
# 2: 2 2 0 0 0 1
# 3: 3 1 1 1 1 0
# emotional_1 emotional_2 emotional_3 emotional_4 family_1 family_2
# 1: 1 1 0 1 0 1
# 2: 0 0 1 1 1 1
# 3: 1 0 1 0 1 0
# family_4 social_1 social_2 social_3 social_4
# 1: 0 1 0 0 0
# 2: 1 0 1 0 0
# 3: 0 0 0 1 1
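The same long-then-wide idea can be sketched in base R for a single question, in case data.table is not available. This is only an illustration under that assumption: the column names and the `social_` prefix are chosen to match the output above.

```r
d <- data.frame(id = 1:3,
                social = c("1", "2", "3,4"),
                stringsAsFactors = FALSE)

# Long format: one row per (respondent, picked choice).
pieces <- strsplit(d$social, ",", fixed = TRUE)
long <- data.frame(id     = rep(d$id, lengths(pieces)),
                   choice = paste0("social_", unlist(pieces)))

# Wide format: count how often each choice occurs per respondent (0 or 1 here).
wide <- as.data.frame.matrix(table(long$id, long$choice))
print(wide)
```

Note that a choice nobody picked never appears in `long`, so it gets no column in `wide`.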