How to write multiple "or" conditions in dplyr's if_else?

Question


This might be basic. Consider a data frame:

df <- data.frame(year = 2006:2015,
                 one = rep(2010, 10),
                 two = rep(2011, 10),
                 three = rep(2012, 10))

where one is the year that Event One occurred, two is the year that Event Two occurred, and three is the year that Event Three occurred. I want to construct a variable ha that takes the value 1 if at most 2 years have passed since any of these events occurred, and 0 otherwise.

For example, in 2009 none of these events had occurred yet, so ha for 2009 should be 0. In 2015, it has been 3 years since Event Three occurred and 5 years since Event One occurred, so ha should be 0. But in 2011, Event One occurred the previous year and Event Two has just occurred, so ha for 2011 should be 1.

The final result for ha should look like this:

0 0 0 0 1 1 1 1 1 0

However, when I try to use if_else in dplyr to evaluate multiple "or" conditions, I fail to get the desired result. Here is my code:

df <- df %>%
  mutate(
    ha = if_else(year - one %in% c(0, 1, 2) | year - two %in% c(0, 1, 2) | year - three %in% c(0, 1, 2), 1, 0)
  )

I wonder where my mistake is.

Answer 1

Score: 7


Your problem is operator precedence. %in% binds more tightly than binary -, so R evaluates year - (one %in% c(0, 1, 2)) rather than (year - one) %in% c(0, 1, 2). The solution is to wrap year - one (and likewise the other two differences) in parentheses:

df %>%
  mutate(
    ha = if_else((year - one) %in% c(0, 1, 2) | (year - two) %in% c(0, 1, 2) | (year - three) %in% c(0, 1, 2), 1, 0)
  )
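
To see why the unparenthesized version gives 1 for every row, here is a minimal base R sketch using the question's data. The names unparenthesized, wrong, and right are only illustrative and do not come from the original post.

```r
# Same data as in the question
df <- data.frame(year = 2006:2015,
                 one = rep(2010, 10),
                 two = rep(2011, 10),
                 three = rep(2012, 10))

# Without parentheses this is parsed as year - (one %in% c(0, 1, 2)).
# one is always 2010, so the %in% test is FALSE (i.e. 0) and the
# result is simply year itself.
unparenthesized <- with(df, year - one %in% c(0, 1, 2))
print(unparenthesized)

# `|` treats any nonzero number as TRUE, so the combined condition
# is TRUE for every row and if_else() would return 1 everywhere.
wrong <- with(df, year - one %in% c(0, 1, 2) |
                  year - two %in% c(0, 1, 2) |
                  year - three %in% c(0, 1, 2))
print(wrong)

# With parentheses, the differences are computed first and then
# tested for membership in c(0, 1, 2).
right <- with(df, (year - one) %in% c(0, 1, 2) |
                  (year - two) %in% c(0, 1, 2) |
                  (year - three) %in% c(0, 1, 2))
print(as.integer(right))
```

The last line should reproduce the 0 0 0 0 1 1 1 1 1 0 pattern requested in the question.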

Answer 2

Score: 0


In base R:

a <- df[, 1] - df[-1]                      # years elapsed since each event (negative if not yet occurred)
df$ha <- +(rowSums(a >= 0 & a <= 2) > 0)   # unary + converts the logical result to 0/1
df
   year  one  two three ha
1  2006 2010 2011  2012  0
2  2007 2010 2011  2012  0
3  2008 2010 2011  2012  0
4  2009 2010 2011  2012  0
5  2010 2010 2011  2012  1
6  2011 2010 2011  2012  1
7  2012 2010 2011  2012  1
8  2013 2010 2011  2012  1
9  2014 2010 2011  2012  1
10 2015 2010 2011  2012  0

You could also use apply:

as.numeric(apply(a >= 0 & a <= 2, 1, any))
 [1] 0 0 0 0 1 1 1 1 1 0

Using the tidyverse:

df %>%
  mutate(ha = +if_any(-year, ~ year - .x >= 0 & year - .x <= 2))

   year  one  two three ha
1  2006 2010 2011  2012  0
2  2007 2010 2011  2012  0
3  2008 2010 2011  2012  0
4  2009 2010 2011  2012  0
5  2010 2010 2011  2012  1
6  2011 2010 2011  2012  1
7  2012 2010 2011  2012  1
8  2013 2010 2011  2012  1
9  2014 2010 2011  2012  1
10 2015 2010 2011  2012  0
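
As a possible variation on the tidyverse version above, the range check could also be written with dplyr's between(). This is only a sketch, assuming a dplyr version that provides if_any() (1.0.4 or later); the name res is illustrative and not from the original answers.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(year = 2006:2015,
                 one = rep(2010, 10),
                 two = rep(2011, 10),
                 three = rep(2012, 10))

# For each event column, check whether 0 to 2 years have passed;
# if_any() is TRUE for a row when the check holds for any column.
res <- df %>%
  mutate(ha = as.integer(if_any(c(one, two, three),
                                ~ between(year - .x, 0, 2))))

print(res$ha)
```

Naming the event columns explicitly, rather than selecting with -year, keeps the result stable if other columns are later added to df.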

huangapple
  • Posted on April 7, 2023 at 03:17:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75953022.html