Creating categories according to several criteria

Question


I am trying to create a category called emotional_ipv using the following criteria:

no IPV if all responses are “never”;
an isolated incident of IPV if one response is “once”;
a low frequency of violence if the response is “once” to more than one item;
a mid frequency if they respond “a few times” to at least one item, but do not respond “many times” to any item;
a high frequency if there are any responses of “many times”.

I have the following data frame (df):

df <- structure(list(
  subject_id = c("191-5467", "191-6784", "191-3457", "191-0987", "191-1245", "191-2365", "191-4532", "191-9901", "191-2710", "191-5098"),
  ipv_q1_en = c("0", "1", "3", "0", "2", "2", "3", "2", "0", "2"),
  ipv_q2_en = c("0", "0", "3", "0", "2", "2", "0", "1", "0", "3"),
  ipv_q3_en = c("0", "1", "3", "2", "1", "2", "0", "1", "0", "2"),
  ipv_q4_en = c("0", "0", "3", "0", "2", "2", "0", "1", "0", "3")),
  class = "data.frame",
  row.names = c(NA, -10L)
)

Coding key: 0 = Never; 1 = Once; 2 = Few times; 3 = Many times

Desired dataset:

df1 <- structure(list(
  subject_id = c("191-5467", "191-6784", "191-3457", "191-0987", "191-1245", "191-2365", "191-4532", "191-9901", "191-2710", "191-5098"),
  ipv_q1_en = c("0", "1", "3", "0", "2", "2", "3", "2", "0", "2"),
  ipv_q2_en = c("0", "0", "3", "0", "2", "2", "0", "1", "0", "3"),
  ipv_q3_en = c("0", "1", "3", "2", "1", "2", "0", "1", "0", "2"),
  ipv_q4_en = c("0", "0", "3", "0", "2", "2", "0", "1", "0", "3"),
  emotional_ipv = c("never", "low frequency", "high frequency", "mid frequency", "mid frequency", "mid frequency", "mid frequency", "high frequency", "never", "high frequency")),
  class = "data.frame",
  row.names = c(NA, -10L)
)

What I have tried

df %>% select(subject_id, ipv_q1_en:ipv_q4_en) %>% ifelse(ipv_q1_en == 0 & ipv_q2_en == 0 & ipv_q3_en == 0 & ipv_q4 == 0, "never", ifelse(sum(ipv_q1_en:ipv_q4_en == 1, "isolated incident")), ifelse(ipv_q1_en <= 2 & ipv_q2_en <= 2 & ipv_q3_en <= 2 & ipv_q4 <= 2, "mid frequency", ifelse())

The code above definitely won't work, but I do not know how else to do it.

Answer 1

Score: 1


Try this (and add na.rm = TRUE to the sum() and any() calls in case you have missing values in your data):

library(tidyverse)

# define dataframe
df <- tibble(
    subject_id = c(
      "191-5467",
      "191-6784",
      "191-3457",
      "191-0987",
      "191-1245",
      "191-2365",
      "191-4532",
      "191-9901",
      "191-2710",
      "191-5098"
    ),
    ipv_q1_en = c(0L, 1L, 3L, 0L, 2L, 2L, 3L, 2L, 0L, 2L),
    ipv_q2_en = c(0L, 0L, 3L, 0L, 2L, 2L, 0L, 1L, 0L, 3L),
    ipv_q3_en = c(0L, 1L, 3L, 2L, 1L, 2L, 0L, 1L, 0L, 2L),
    ipv_q4_en = c(0L, 0L, 3L, 0L, 2L, 2L, 0L, 1L, 0L, 3L)
  )

# reshape longer
df <- df |>
  pivot_longer(
    !subject_id,
    names_to = "question",
    names_pattern = "ipv_q(\\d+)_en",
    values_to = "answer")

# add case distinction
df |>
  group_by(subject_id) |>
  summarise(emotional_ipv = case_when(
    sum(answer) == 0 ~ "never",
    sum(answer == 1) == 1 ~ "isolated incident",
    sum(answer == 1) > 1 ~ "low frequency",
    sum(answer == 2) >= 1 & !any(answer > 2) ~ "medium frequency",
    any(answer == 3) ~ "high frequency"
  ))
#> # A tibble: 10 × 2
#>    subject_id emotional_ipv
#>    <chr>      <chr>
#>  1 191-0987   medium frequency
#>  2 191-1245   isolated incident
#>  3 191-2365   medium frequency
#>  4 191-2710   never
#>  5 191-3457   high frequency
#>  6 191-4532   high frequency
#>  7 191-5098   high frequency
#>  8 191-5467   never
#>  9 191-6784   low frequency
#> 10 191-9901   low frequency

Created on 2023-03-03 with reprex v2.0.2
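To make the na.rm remark at the top of this answer concrete, here is a minimal base R sketch. The vector `answer` is a made-up respondent with one missing response; it is not part of the original data.

```r
# Hypothetical answers for one subject; the third response is missing
answer <- c(0L, 1L, NA, 1L)

# Without na.rm, the NA propagates into the result
n_once_raw <- sum(answer == 1)      # NA
has_many_raw <- any(answer == 3)    # NA, because the missing value might be a 3

# With na.rm = TRUE, missing responses are ignored
n_once <- sum(answer == 1, na.rm = TRUE)     # counts the two "once" answers
has_many <- any(answer == 3, na.rm = TRUE)   # FALSE: no observed "many times"
```

Because case_when() treats an NA condition as not matched, a subject with a missing answer can fall through every branch and end up with NA unless na.rm = TRUE is set.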

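Your ifelse() statements do not work because they need to be wrapped inside mutate() if you want to create or modify a column; piping the data frame straight into ifelse() passes the whole data frame as the test argument. If you prefer not to make your data longer, you need rowwise() so that the aggregation runs across columns within each row.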
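The rowwise() route could look roughly like the sketch below. It keeps the data wide, starts from the character-coded data frame in the question, and converts the codes to integers first. The helper `classify_ipv()` is a hypothetical name introduced here, and checking the most severe category first is one reading of the stated criteria, not the ordering used in the code above. The label "mid frequency" follows the question's wording.

```r
library(dplyr)

# Wide data as posted in the question (answers stored as character codes)
df <- data.frame(
  subject_id = c("191-5467", "191-6784", "191-3457", "191-0987", "191-1245",
                 "191-2365", "191-4532", "191-9901", "191-2710", "191-5098"),
  ipv_q1_en = c("0", "1", "3", "0", "2", "2", "3", "2", "0", "2"),
  ipv_q2_en = c("0", "0", "3", "0", "2", "2", "0", "1", "0", "3"),
  ipv_q3_en = c("0", "1", "3", "2", "1", "2", "0", "1", "0", "2"),
  ipv_q4_en = c("0", "0", "3", "0", "2", "2", "0", "1", "0", "3")
)

# Hypothetical helper: classify one subject's answers (0 = never ... 3 = many times).
# Assumption: the most severe category is checked first.
classify_ipv <- function(answers) {
  if (any(answers == 3)) {
    "high frequency"
  } else if (any(answers == 2)) {
    "mid frequency"
  } else if (sum(answers == 1) > 1) {
    "low frequency"
  } else if (sum(answers == 1) == 1) {
    "isolated incident"
  } else {
    "never"
  }
}

df1 <- df %>%
  # Convert the character codes to integers so they can be compared and counted
  mutate(across(starts_with("ipv_q"), as.integer)) %>%
  # Evaluate the following mutate() one row (one subject) at a time
  rowwise() %>%
  mutate(emotional_ipv = classify_ipv(c_across(starts_with("ipv_q")))) %>%
  ungroup()
```

Under this reading, 191-4532 (one "many times") comes out as "high frequency" and 191-9901 (one "few times", three "once") as "mid frequency", which differs from the desired column in the question for those two rows. If the data may contain missing answers, the sum() and any() calls inside the helper would also need na.rm = TRUE.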

huangapple
  • Posted on March 3, 2023, 17:54:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75625547.html