Create a dummy variable based on two variables x1 and x2 (dummy = 1 for a run of x1 = yes only if at least one adjacent x2 = yes)
Question
Below is the code to generate the dataframe for demonstration:
d <- data.frame(
  x1 = c(rep("no", 5), rep("yes", 4), rep("no", 2), rep("yes", 3), rep("no", 2), rep("yes", 3)),
  x2 = c(rep("no", 6), rep("yes", 1), rep("no", 9), rep("yes", 2), rep("no", 1)),
  dummy = c(rep(0, 5), rep(1, 4), rep(0, 5), rep(0, 2), rep(1, 3))
)
I have two variables, x1 and x2. I want a dummy variable, named 'dummy', based on both indicators. Specifically, the dummy should equal 1 for every row in a cluster of consecutive x1 = yes values, provided that at least one of the adjacent x2 values (those in the same cluster) is yes. If x1 = yes but none of its adjacent x2 values is yes, the dummy should be 0.
It is easy to create a dummy variable that takes the value 1 when both x1 and x2 equal 'yes' on the same row, using
d$dummy <- ifelse(d$x1 == "yes" & d$x2 == "yes", 1, 0)
But that does not capture the whole cluster of x1 = yes rows, which is what I want.
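To make the gap concrete, here is a small sketch in base R that rebuilds the example data, computes the row-wise indicator, and lists the rows where it disagrees with the desired dummy column. The names `naive` and `missed` are only illustrative and do not appear in the original post.

```r
# Rebuild the example data so the snippet runs on its own.
d <- data.frame(
  x1 = c(rep("no", 5), rep("yes", 4), rep("no", 2), rep("yes", 3), rep("no", 2), rep("yes", 3)),
  x2 = c(rep("no", 6), rep("yes", 1), rep("no", 9), rep("yes", 2), rep("no", 1)),
  dummy = c(rep(0, 5), rep(1, 4), rep(0, 5), rep(0, 2), rep(1, 3))
)

# Row-wise indicator: 1 only where x1 and x2 are both "yes" on the same row.
naive <- ifelse(d$x1 == "yes" & d$x2 == "yes", 1, 0)

# Rows where the row-wise indicator disagrees with the desired dummy.
missed <- which(naive != d$dummy)
print(missed)
# [1]  6  8  9 19
```

Rows 6, 8, 9 and 19 belong to a cluster of x1 = yes that contains a yes in x2, but x2 is no on those rows themselves, so the row-wise test leaves them at 0.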
The desired output is the dummy column in the data frame above.
Any idea how this could be done?
Answer 1
Score: 1
You can group_by() runs of consecutive identical x1 values using consecutive_id(), and then set dummy to 1 when x1 == "yes" and any x2 in the same run is "yes".
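library(dplyr)  # consecutive_id() requires dplyr 1.1.0 or later

d %>%
  # cons increments each time x1 changes, so each run of x1 gets its own id
  group_by(cons = consecutive_id(x1)) %>%
  # unary + converts the logical result to integer 0/1
  mutate(dummy = +(x1 == "yes" & any(x2 == "yes"))) %>%
  ungroup()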
   x1    x2    dummy  cons
   <chr> <chr> <int> <int>
 1 no    no        0     1
 2 no    no        0     1
 3 no    no        0     1
 4 no    no        0     1
 5 no    no        0     1
 6 yes   no        1     2
 7 yes   yes       1     2
 8 yes   no        1     2
 9 yes   no        1     2
10 no    no        0     3
11 no    no        0     3
12 yes   no        0     4
13 yes   no        0     4
14 yes   no        0     4
15 no    no        0     5
16 no    no        0     5
17 yes   yes       1     6
18 yes   yes       1     6
19 yes   no        1     6
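The same grouping idea can also be sketched in base R, which may help if dplyr 1.1.0 is not available. This is not part of the original answer: it swaps consecutive_id() for rle() to build the run ids and uses ave() to broadcast the per-run check back to the rows. The names `run_id`, `any_x2_yes` and `dummy_base` are made up for illustration.

```r
# Rebuild the example data so the snippet runs on its own.
d <- data.frame(
  x1 = c(rep("no", 5), rep("yes", 4), rep("no", 2), rep("yes", 3), rep("no", 2), rep("yes", 3)),
  x2 = c(rep("no", 6), rep("yes", 1), rep("no", 9), rep("yes", 2), rep("no", 1)),
  dummy = c(rep(0, 5), rep(1, 4), rep(0, 5), rep(0, 2), rep(1, 3))
)

# rle() finds runs of identical consecutive values; as.character() guards
# against x1 being a factor (the default in R versions before 4.0).
runs <- rle(as.character(d$x1))

# Expand the runs to one id per row: 1 for the first run, 2 for the second, ...
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# For each run, check whether any x2 in it is "yes"; ave() repeats the
# per-run result on every row of that run.
any_x2_yes <- ave(d$x2 == "yes", run_id, FUN = any)

# 1 only for x1 == "yes" rows whose run contains at least one x2 == "yes".
d$dummy_base <- as.integer(d$x1 == "yes" & any_x2_yes)

# Compare against the desired column from the question.
print(all(d$dummy_base == d$dummy))
```

The design is the same as in the dplyr version: the run id plays the role of the grouping key, and the any() check is evaluated once per run rather than once per row.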
Answer 2
Score: 1
You can use data.table::rleid() to create groups from blocks of consecutive identical x1 values.
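library(tidyverse)
library(data.table)

d %>%
  mutate(gp = rleid(x1)) %>%  # run id: increments each time x1 changes
  group_by(gp) %>%
  # 1 for x1 == "yes" rows whose block contains at least one x2 == "yes"
  mutate(dummy2 = ifelse(x1 == "yes" & any(x2 == "yes"), 1, 0))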
# A tibble: 19 × 5
# Groups:   gp [6]
   x1    x2    dummy    gp dummy2
   <chr> <chr> <dbl> <int>  <dbl>
 1 no    no        0     1      0
 2 no    no        0     1      0
 3 no    no        0     1      0
 4 no    no        0     1      0
 5 no    no        0     1      0
 6 yes   no        1     2      1
 7 yes   yes       1     2      1
 8 yes   no        1     2      1
 9 yes   no        1     2      1
10 no    no        0     3      0
11 no    no        0     3      0
12 yes   no        0     4      0
13 yes   no        0     4      0
14 yes   no        0     4      0
15 no    no        0     5      0
16 no    no        0     5      0
17 yes   yes       1     6      1
18 yes   yes       1     6      1
19 yes   no        1     6      1
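Since rleid() comes from data.table, the same logic can presumably be written in data.table syntax alone, without the dplyr pipeline. The following is a sketch under that assumption, not part of the original answer; the object name `dt` is illustrative, and the data.table package must be installed.

```r
library(data.table)

# Rebuild the example data as a data.table so the snippet runs on its own.
dt <- data.table(
  x1 = c(rep("no", 5), rep("yes", 4), rep("no", 2), rep("yes", 3), rep("no", 2), rep("yes", 3)),
  x2 = c(rep("no", 6), rep("yes", 1), rep("no", 9), rep("yes", 2), rep("no", 1)),
  dummy = c(rep(0, 5), rep(1, 4), rep(0, 5), rep(0, 2), rep(1, 3))
)

# Group by the run id of x1 and add dummy2 by reference:
# 1 for x1 == "yes" rows whose block contains at least one x2 == "yes".
dt[, dummy2 := as.integer(x1 == "yes" & any(x2 == "yes")), by = rleid(x1)]

# Compare against the desired column from the question.
print(all(dt$dummy2 == dt$dummy))
```

Here the grouping is done inline with by = rleid(x1), so no helper column such as gp is left in the result.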