March 7, 2023, 18:19:19

Create a dummy variable based on two variables x1 and x2 (dummy=x1 only if at least one adjacent x2=yes)

Question

Below is the code to generate the dataframe for demonstration:

d <- data.frame(
  x1 = c(rep("no", 5), rep("yes", 4), rep("no", 2), rep("yes", 3), rep("no", 2), rep("yes", 3)),
  x2 = c(rep("no", 6), rep("yes", 1), rep("no", 9), rep("yes", 2), rep("no", 1)),
  dummy = c(rep(0, 5), rep(1, 4), rep(0, 5), rep(0, 2), rep(1, 3))
)

I have two variables, x1 and x2. I want a dummy variable, named 'dummy', based on both indicators. Specifically, dummy should equal 1 for every row in a run of consecutive x1=yes values, provided that at least one x2 within that run is yes. If x1=yes but none of the x2 values in its run is yes, dummy should be 0.

It is easy to create a dummy variable that takes the value 1 when both x1 and x2 equal 'yes' on the same row, using

d$dummy <- ifelse(d$x1 == "yes" & d$x2 == "yes", 1, 0)

But this does not capture the whole cluster of x1=yes rows, which is what I want.

The desired output is the dummy column already included in the data frame above.

Any idea how this could be done?
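To make the gap concrete, the short base R check below compares the rows flagged by the row-wise test with the rows flagged in the desired dummy column. It is only an illustration built on the question's own data; the variable name naive is an assumption introduced here and does not appear in the original post.

```r
# Rebuild the example data from the question (desired result in `dummy`).
d <- data.frame(
  x1 = c(rep("no", 5), rep("yes", 4), rep("no", 2), rep("yes", 3), rep("no", 2), rep("yes", 3)),
  x2 = c(rep("no", 6), rep("yes", 1), rep("no", 9), rep("yes", 2), rep("no", 1)),
  dummy = c(rep(0, 5), rep(1, 4), rep(0, 5), rep(0, 2), rep(1, 3))
)

# Row-wise test: 1 only where x1 and x2 are both "yes" on the same row.
# `naive` is a hypothetical name used for this comparison only.
naive <- ifelse(d$x1 == "yes" & d$x2 == "yes", 1, 0)

# Rows flagged by the row-wise test versus rows flagged in the desired column.
which(naive == 1)    # 7 17 18
which(d$dummy == 1)  # 6 7 8 9 17 18 19
```

Rows 6, 8, 9 and 19 belong to runs of x1=yes that contain an x2=yes somewhere, but the row-wise test misses them because x2 is "no" on those particular rows.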

Answer 1

Score: 1

You can group by runs of consecutive identical x1 values using consecutive_id(), and then set dummy to 1 if x1 == "yes" and any x2 in the same run is "yes".

library(dplyr) #1.1.0+
d %>%
  group_by(cons = consecutive_id(x1)) %>%
  mutate(dummy = +(x1 == "yes" & any(x2 == "yes"))) %>%
  ungroup()
   x1    x2    dummy  cons
   <chr> <chr> <int> <int>
 1 no    no        0     1
 2 no    no        0     1
 3 no    no        0     1
 4 no    no        0     1
 5 no    no        0     1
 6 yes   no        1     2
 7 yes   yes       1     2
 8 yes   no        1     2
 9 yes   no        1     2
10 no    no        0     3
11 no    no        0     3
12 yes   no        0     4
13 yes   no        0     4
14 yes   no        0     4
15 no    no        0     5
16 no    no        0     5
17 yes   yes       1     6
18 yes   yes       1     6
19 yes   no        1     6
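The same run-based logic can also be sketched in base R, without dplyr. This is a minimal sketch under stated assumptions and not part of the original answer: run_id, run_has_yes and dummy_base are illustrative names, and the data frame is rebuilt from the question so the block runs on its own.

```r
# Example data from the question (desired result in `dummy`).
d <- data.frame(
  x1 = c(rep("no", 5), rep("yes", 4), rep("no", 2), rep("yes", 3), rep("no", 2), rep("yes", 3)),
  x2 = c(rep("no", 6), rep("yes", 1), rep("no", 9), rep("yes", 2), rep("no", 1)),
  dummy = c(rep(0, 5), rep(1, 4), rep(0, 5), rep(0, 2), rep(1, 3))
)

# Label each run of identical consecutive x1 values with an integer id.
# as.character() guards against x1 being stored as a factor.
r <- rle(as.character(d$x1))
run_id <- rep(seq_along(r$lengths), r$lengths)

# For every row, TRUE if any x2 in the same run is "yes".
run_has_yes <- ave(d$x2 == "yes", run_id, FUN = any)

# 1 for x1 == "yes" rows whose run contains at least one x2 == "yes".
dummy_base <- as.integer(d$x1 == "yes" & run_has_yes)

# Compare against the desired column from the question.
identical(dummy_base, as.integer(d$dummy))
```

Here rle() plays the role of consecutive_id(), and ave() broadcasts the per-run any() back to every row of the run.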

Answer 2

Score: 1

You can use data.table::rleid() to assign a group id to each block of consecutive identical x1 values, and then check within each group whether any x2 is "yes".

library(tidyverse)
library(data.table)
d %>% mutate(gp = rleid(x1)) %>%
  group_by(gp) %>%
  mutate(dummy2 = ifelse(x1 == "yes" & any(x2 == "yes"), 1, 0))
# A tibble: 19 × 5
# Groups:   gp [6]
   x1    x2    dummy    gp dummy2
   <chr> <chr> <dbl> <int>  <dbl>
 1 no    no        0     1      0
 2 no    no        0     1      0
 3 no    no        0     1      0
 4 no    no        0     1      0
 5 no    no        0     1      0
 6 yes   no        1     2      1
 7 yes   yes       1     2      1
 8 yes   no        1     2      1
 9 yes   no        1     2      1
10 no    no        0     3      0
11 no    no        0     3      0
12 yes   no        0     4      0
13 yes   no        0     4      0
14 yes   no        0     4      0
15 no    no        0     5      0
16 no    no        0     5      0
17 yes   yes       1     6      1
18 yes   yes       1     6      1
19 yes   no        1     6      1
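Since rleid() comes from data.table, the grouping can also be done entirely in data.table syntax. The block below is a hedged sketch of that variant, not the answerer's code: dt is an illustrative name, and the data frame is rebuilt from the question so the example is self-contained.

```r
library(data.table)

# Example data from the question (desired result in `dummy`).
d <- data.frame(
  x1 = c(rep("no", 5), rep("yes", 4), rep("no", 2), rep("yes", 3), rep("no", 2), rep("yes", 3)),
  x2 = c(rep("no", 6), rep("yes", 1), rep("no", 9), rep("yes", 2), rep("no", 1)),
  dummy = c(rep(0, 5), rep(1, 4), rep(0, 5), rep(0, 2), rep(1, 3))
)

# Work on a data.table copy so `d` itself is left unchanged.
dt <- as.data.table(d)

# Group by runs of x1; within each run, flag x1 == "yes" rows
# if at least one x2 in that run is "yes".
dt[, dummy2 := as.integer(x1 == "yes" & any(x2 == "yes")), by = rleid(x1)]

# Check that the computed column matches the desired one.
all(dt$dummy2 == dt$dummy)
```

Grouping with by = rleid(x1) avoids materializing a separate gp column, at the cost of not having the run id available afterwards.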
