Creating categories according to several criteria
Question
I am trying to create a category called emotional_ipv using the following criteria:
no IPV experienced if all responses are “never”;
an isolated incident of IPV if one response is “once”;
a low frequency of violence if the response is “once” to more than one item;
a mid frequency if they respond “a few times” to at least one item, but do not respond “many times” to any item;
a high frequency if there are any responses of “many times”.
Data (df):
df <- structure(
  list(
    subject_id = c("191-5467", "191-6784", "191-3457", "191-0987", "191-1245",
                   "191-2365", "191-4532", "191-9901", "191-2710", "191-5098"),
    ipv_q1_en = c("0", "1", "3", "0", "2", "2", "3", "2", "0", "2"),
    ipv_q2_en = c("0", "0", "3", "0", "2", "2", "0", "1", "0", "3"),
    ipv_q3_en = c("0", "1", "3", "2", "1", "2", "0", "1", "0", "2"),
    ipv_q4_en = c("0", "0", "3", "0", "2", "2", "0", "1", "0", "3")
  ),
  class = "data.frame",
  row.names = c(NA, -10L)
)
Coding key: 0 = Never; 1 = Once; 2 = Few times; 3 = Many times
Desired dataset:
df1 <- structure(
  list(
    subject_id = c("191-5467", "191-6784", "191-3457", "191-0987", "191-1245",
                   "191-2365", "191-4532", "191-9901", "191-2710", "191-5098"),
    ipv_q1_en = c("0", "1", "3", "0", "2", "2", "3", "2", "0", "2"),
    ipv_q2_en = c("0", "0", "3", "0", "2", "2", "0", "1", "0", "3"),
    ipv_q3_en = c("0", "1", "3", "2", "1", "2", "0", "1", "0", "2"),
    ipv_q4_en = c("0", "0", "3", "0", "2", "2", "0", "1", "0", "3"),
    emotional_ipv = c("never", "low frequency", "high frequency", "mid frequency",
                      "mid frequency", "mid frequency", "mid frequency",
                      "high frequency", "never", "high frequency")
  ),
  class = "data.frame",
  row.names = c(NA, -10L)
)
What I have tried
df %>% select(subject_id, ipv_q1_en:ipv_q4_en) %>% ifelse(ipv_q1_en == 0 & ipv_q2_en == 0 & ipv_q3_en == 0 & ipv_q4 == 0, "never", ifelse(sum(ipv_q1_en:ipv_q4_en == 1, "isolated incident")),ifelse(ipv_q1_en <= 2 & ipv_q2_en <= 2 & ipv_q3_en <= 2 & ipv_q4 <= 2, "mid frequency",ifelse())
The code above definitely won't work, but I do not know how else to do it.
Answer 1
Score: 1
Try this (and add na.rm = TRUE arguments in case you have missing values in your data):
library(tidyverse)

# define dataframe
df <- tibble(
  subject_id = c(
    "191-5467",
    "191-6784",
    "191-3457",
    "191-0987",
    "191-1245",
    "191-2365",
    "191-4532",
    "191-9901",
    "191-2710",
    "191-5098"
  ),
  ipv_q1_en = c(0L, 1L, 3L, 0L, 2L, 2L, 3L, 2L, 0L, 2L),
  ipv_q2_en = c(0L, 0L, 3L, 0L, 2L, 2L, 0L, 1L, 0L, 3L),
  ipv_q3_en = c(0L, 1L, 3L, 2L, 1L, 2L, 0L, 1L, 0L, 2L),
  ipv_q4_en = c(0L, 0L, 3L, 0L, 2L, 2L, 0L, 1L, 0L, 3L)
)

# reshape longer: one row per subject and question
df <- df |>
  pivot_longer(
    !subject_id,
    names_to = "question",
    names_pattern = "ipv_q(\\d+)_en",
    values_to = "answer"
  )

# add case distinction (case_when() uses the first condition that matches)
df |>
  group_by(subject_id) |>
  summarise(emotional_ipv = case_when(
    sum(answer) == 0 ~ "never",
    sum(answer == 1) == 1 ~ "isolated incident",
    sum(answer == 1) > 1 ~ "low frequency",
    sum(answer == 2) >= 1 & !any(answer > 2) ~ "medium frequency",
    any(answer == 3) ~ "high frequency"
  ))
#> # A tibble: 10 × 2
#>    subject_id emotional_ipv
#>    <chr>      <chr>
#>  1 191-0987   medium frequency
#>  2 191-1245   isolated incident
#>  3 191-2365   medium frequency
#>  4 191-2710   never
#>  5 191-3457   high frequency
#>  6 191-4532   high frequency
#>  7 191-5098   high frequency
#>  8 191-5467   never
#>  9 191-6784   low frequency
#> 10 191-9901   low frequency
Created on 2023-03-03 with reprex v2.0.2
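The na.rm = TRUE remark can be illustrated with a small sketch. The data frame `answers` and its subject ids "A" and "B" are hypothetical and not part of the original post; the point is only that `sum()` returns NA as soon as one answer is missing unless na.rm = TRUE is passed.

```r
library(dplyr)

# Hypothetical long-format data with one missing answer per subject
# (illustration only, not taken from the question)
answers <- data.frame(
  subject_id = rep(c("A", "B"), each = 4),
  answer = c(0L, NA, 0L, 0L, 1L, 1L, NA, 0L)
)

summary_tbl <- answers |>
  group_by(subject_id) |>
  summarise(
    # without na.rm = TRUE both sums would be NA for these subjects
    total = sum(answer, na.rm = TRUE),
    n_once = sum(answer == 1, na.rm = TRUE),
    # count the missing answers so they can be reported separately
    n_missing = sum(is.na(answer))
  )
```

The same argument exists for `any()`, so a condition such as `any(answer == 3, na.rm = TRUE)` would not turn into NA when an answer is missing. Whether missing answers should simply be ignored is a decision about the study, not something the code can settle.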
Your ifelse() statements do not work because they need to be wrapped inside mutate() if you want to create or modify columns; piped in directly, ifelse() receives the whole data frame as its first argument. If you prefer not to make your data longer, you need rowwise() to allow aggregation across columns within each row.