Create a dummy variable based on two variables x1 and x2 (dummy=x1 only if at least one adjacent x2=yes)


Question

Below is the code to generate the dataframe for demonstration:

    d <- data.frame(
      x1 = c(rep("no", 5), rep("yes", 4), rep("no", 2), rep("yes", 3), rep("no", 2), rep("yes", 3)),
      x2 = c(rep("no", 6), rep("yes", 1), rep("no", 9), rep("yes", 2), rep("no", 1)),
      dummy = c(rep(0, 5), rep(1, 4), rep(0, 5), rep(0, 2), rep(1, 3))
    )

I have two variables, x1 and x2. I want a dummy variable, named 'dummy', based on both indicators. Specifically, the dummy should equal 1 for every row in a cluster of consecutive x1=yes values, provided that at least one of the adjacent x2 values in that cluster is yes. If x1=yes but none of its adjacent x2 values is yes, the dummy should be 0.

It is easy to create a dummy variable that takes the value 1 when both x1 and x2 equal 'yes' in the same row:

    d$dummy <- ifelse(d$x1 == "yes" & d$x2 == "yes", 1, 0)

But this flags only the individual rows where both are 'yes'. It does not capture the whole cluster of x1=yes, which is what I want.
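To make the gap concrete, here is a small base R sketch that rebuilds the example data and compares the rows flagged by the row-wise test with the rows the desired dummy should flag. The name `naive` is introduced here for illustration and is not from the original post; `stringsAsFactors = FALSE` is added only so the columns are character vectors on older R versions too.

```r
# Rebuild the example data from the question
d <- data.frame(
  x1 = c(rep("no", 5), rep("yes", 4), rep("no", 2), rep("yes", 3), rep("no", 2), rep("yes", 3)),
  x2 = c(rep("no", 6), rep("yes", 1), rep("no", 9), rep("yes", 2), rep("no", 1)),
  dummy = c(rep(0, 5), rep(1, 4), rep(0, 5), rep(0, 2), rep(1, 3)),
  stringsAsFactors = FALSE
)

# Row-wise test: 1 only where x1 and x2 are both "yes" in the same row
naive <- ifelse(d$x1 == "yes" & d$x2 == "yes", 1, 0)

which(naive == 1)    # rows flagged by the row-wise test
which(d$dummy == 1)  # rows the desired dummy flags
```

The row-wise test flags only rows 7, 17 and 18, whereas the desired dummy covers rows 6 to 9 and 17 to 19.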

The desired output is the one given in the dummy column of the data frame above.

Any idea how this could be done?

Answer 1

Score: 1

You can group by consecutive_id(x1), so that each run of identical x1 values forms one group, and then set the dummy to 1 if x1 == "yes" and any x2 in the same group is "yes".

    library(dplyr)  # consecutive_id() needs dplyr 1.1.0+

    d %>%
      group_by(cons = consecutive_id(x1)) %>%
      mutate(dummy = +(x1 == "yes" & any(x2 == "yes"))) %>%
      ungroup()

       x1    x2    dummy  cons
       <chr> <chr> <int> <int>
     1 no    no        0     1
     2 no    no        0     1
     3 no    no        0     1
     4 no    no        0     1
     5 no    no        0     1
     6 yes   no        1     2
     7 yes   yes       1     2
     8 yes   no        1     2
     9 yes   no        1     2
    10 no    no        0     3
    11 no    no        0     3
    12 yes   no        0     4
    13 yes   no        0     4
    14 yes   no        0     4
    15 no    no        0     5
    16 no    no        0     5
    17 yes   yes       1     6
    18 yes   yes       1     6
    19 yes   no        1     6
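If dplyr 1.1.0+ is not available, the same grouping idea can be sketched in base R with rle() and ave(). This is an alternative of my own, not part of the answer above; the names `runs`, `run_id`, `run_has_yes` and `dummy_base` are hypothetical.

```r
# Rebuild the example data from the question
d <- data.frame(
  x1 = c(rep("no", 5), rep("yes", 4), rep("no", 2), rep("yes", 3), rep("no", 2), rep("yes", 3)),
  x2 = c(rep("no", 6), rep("yes", 1), rep("no", 9), rep("yes", 2), rep("no", 1)),
  dummy = c(rep(0, 5), rep(1, 4), rep(0, 5), rep(0, 2), rep(1, 3)),
  stringsAsFactors = FALSE
)

# Give every run of identical consecutive x1 values its own id (1, 2, 3, ...)
runs <- rle(d$x1)
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# For each row: does its run contain at least one x2 == "yes"?
run_has_yes <- ave(d$x2 == "yes", run_id, FUN = any)

# 1 only for x1 == "yes" rows whose run contains an x2 == "yes"
d$dummy_base <- as.integer(d$x1 == "yes" & run_has_yes)

# Compare with the desired column from the question
all(d$dummy_base == d$dummy)
```

Here `run_id` plays the same role as `cons` in the dplyr pipeline, so the two approaches should agree on this data.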

Answer 2

Score: 1

You can use data.table::rleid() to create a group id for each block of consecutive identical x1 values.

    library(tidyverse)
    library(data.table)

    d %>%
      mutate(gp = rleid(x1)) %>%
      group_by(gp) %>%
      mutate(dummy2 = ifelse(x1 == "yes" & any(x2 == "yes"), 1, 0))

    # A tibble: 19 × 5
    # Groups:   gp [6]
       x1    x2    dummy    gp dummy2
       <chr> <chr> <dbl> <int>  <dbl>
     1 no    no        0     1      0
     2 no    no        0     1      0
     3 no    no        0     1      0
     4 no    no        0     1      0
     5 no    no        0     1      0
     6 yes   no        1     2      1
     7 yes   yes       1     2      1
     8 yes   no        1     2      1
     9 yes   no        1     2      1
    10 no    no        0     3      0
    11 no    no        0     3      0
    12 yes   no        0     4      0
    13 yes   no        0     4      0
    14 yes   no        0     4      0
    15 no    no        0     5      0
    16 no    no        0     5      0
    17 yes   yes       1     6      1
    18 yes   yes       1     6      1
    19 yes   no        1     6      1
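Since this answer already loads data.table for rleid(), the whole step could also be written in data.table syntax without the tidyverse. The following is a sketch under that assumption, not the answerer's code; `dt` is a hypothetical name, and data.table is assumed to be installed.

```r
library(data.table)  # assumed to be installed

# Rebuild the example data from the question
d <- data.frame(
  x1 = c(rep("no", 5), rep("yes", 4), rep("no", 2), rep("yes", 3), rep("no", 2), rep("yes", 3)),
  x2 = c(rep("no", 6), rep("yes", 1), rep("no", 9), rep("yes", 2), rep("no", 1)),
  dummy = c(rep(0, 5), rep(1, 4), rep(0, 5), rep(0, 2), rep(1, 3)),
  stringsAsFactors = FALSE
)

dt <- as.data.table(d)

# Group by runs of x1 and add dummy2 by reference:
# 1 for x1 == "yes" rows whose run contains at least one x2 == "yes"
dt[, dummy2 := as.integer(x1 == "yes" & any(x2 == "yes")), by = rleid(x1)]

# Compare with the desired column from the question
all(dt$dummy2 == dt$dummy)
```

Because `:=` adds the column in place, no separate ungrouping step is needed here. By contrast, the tidyverse pipeline above has no ungroup(), so its result stays grouped by gp, as the "# Groups: gp [6]" line in its output shows.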

huangapple
  • Posted on March 7, 2023, 18:19:19
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75660671.html