Question

I have this dataset with variables Person, RelevantCase, StartDate, and EndDate:

df <- data.frame(Person = c('111','334','334','334','334','334','888','888','888','888','888','888','888','888'), 
                 RelevantCase = c(0,1,1,0,1,0,1,0,1,0,0,1,0,1), 
                 StartDate = c('2017-03-04','2015-11-14','2018-04-26','2020-01-24','2020-01-25','2020-02-29','2015-08-09',
                          '2015-08-09','2018-04-10','2019-09-20','2020-06-30','2020-11-01','2021-08-13','2022-11-11'),
                 EndDate = c('2017-12-12','2022-01-25','2020-03-01','2021-02-24','2020-01-30','2022-02-02','2019-10-20',
                             '2019-10-30','2018-10-10','2021-10-10','2020-07-20','2022-11-20','2021-11-12','2023-01-01')
)

I want to create two new variables:

1. A count of the number of relevant open cases per Person. That is, for each case, I want to count how many relevant cases have

1.1. StartDates before the current case's StartDate and

1.2. EndDates on or after the current StartDate.

By "relevant case" I mean that I want to count only observations with RelevantCase == 1.

2. A count of the number of relevant open cases per Person that started within the two years before the current StartDate. This is the same as the first new variable, except that it does not count relevant open cases whose StartDates are more than two years before the current StartDate.

The resulting dataset should look like this:

df2 <- data.frame(Person = c('111','334','334','334','334','334','888','888','888','888','888','888','888','888'), 
                 RelevantCase = c(0,1,1,0,1,0,1,0,1,0,0,1,0,1), 
                 StartDate = c('2017-03-04','2015-11-14','2018-04-26','2020-01-24','2020-01-25','2020-02-29','2015-08-09',
                               '2015-08-09','2018-04-10','2019-09-20','2020-06-30','2020-11-01','2021-08-13','2022-11-11'),
                 EndDate = c('2017-12-12','2022-01-25','2020-03-01','2021-02-24','2020-01-30','2022-02-02','2019-10-20',
                             '2019-10-30','2018-10-10','2021-10-10','2020-07-20','2022-11-20','2021-11-12','2023-01-01'),
                 NumberOpenCases = c(0,0,1,2,2,2,0,0,1,1,0,0,1,1),
                 NumberOpenCases_2y = c(0,0,0,1,1,1,0,0,0,0,0,0,1,0)
)

Answer 1

Score: 1

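This computes the number of relevant open cases by looping over the StartDate column within each Person group and, for each date, counting the rows in that group that meet the desired conditions. Note that the two-year window is approximated as 730 days.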

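library(dplyr)
library(purrr)

df %>% 
  # Parse the date strings so they can be compared and subtracted
  mutate(StartDate = as.Date(StartDate),
         EndDate = as.Date(EndDate)) %>% 
  arrange(Person, StartDate, EndDate) %>% 
  group_by(Person) %>% 
  # For each StartDate (.x), count this Person's relevant cases that
  # started earlier and were still open on that date
  mutate(NumberOpenCases    = map_int(StartDate, ~sum(StartDate < .x  & 
                                                      EndDate >= .x & 
                                                      RelevantCase == 1)),
         # Same count, restricted to cases that started fewer than
         # 730 days (about two years) before .x
         NumberOpenCases_2y = map_int(StartDate, ~sum(StartDate < .x  & 
                                                      EndDate >= .x & 
                                                      RelevantCase == 1 &
                                                      .x - StartDate < 730)))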
#> # A tibble: 14 x 6
#> # Groups:   Person [3]
#>    Person RelevantCase StartDate  EndDate    NumberOpenCases NumberOpenCases_2y
#>    <chr>         <dbl> <date>     <date>               <int>              <int>
#>  1 111               0 2017-03-04 2017-12-12               0                  0
#>  2 334               1 2015-11-14 2022-01-25               0                  0
#>  3 334               1 2018-04-26 2020-03-01               1                  0
#>  4 334               0 2020-01-24 2021-02-24               2                  1
#>  5 334               1 2020-01-25 2020-01-30               2                  1
#>  6 334               0 2020-02-29 2022-02-02               2                  1
#>  7 888               1 2015-08-09 2019-10-20               0                  0
#>  8 888               0 2015-08-09 2019-10-30               0                  0
#>  9 888               1 2018-04-10 2018-10-10               1                  0
#> 10 888               0 2019-09-20 2021-10-10               1                  0
#> 11 888               0 2020-06-30 2020-07-20               0                  0
#> 12 888               1 2020-11-01 2022-11-20               0                  0
#> 13 888               0 2021-08-13 2021-11-12               1                  1
#> 14 888               1 2022-11-11 2023-01-01               1                  0
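
For readers who would rather avoid the dplyr and purrr dependencies, the same conditions can probably be written in base R. The following is only a sketch: the helper name count_open and its max_age_days parameter are made up for illustration and are not part of the answer above, and the two-year window is again assumed to mean fewer than 730 days.

```r
# Sample data from the question
df <- data.frame(
  Person = c('111','334','334','334','334','334','888','888','888','888','888','888','888','888'),
  RelevantCase = c(0,1,1,0,1,0,1,0,1,0,0,1,0,1),
  StartDate = c('2017-03-04','2015-11-14','2018-04-26','2020-01-24','2020-01-25','2020-02-29','2015-08-09',
                '2015-08-09','2018-04-10','2019-09-20','2020-06-30','2020-11-01','2021-08-13','2022-11-11'),
  EndDate = c('2017-12-12','2022-01-25','2020-03-01','2021-02-24','2020-01-30','2022-02-02','2019-10-20',
              '2019-10-30','2018-10-10','2021-10-10','2020-07-20','2022-11-20','2021-11-12','2023-01-01'),
  stringsAsFactors = FALSE
)
df$StartDate <- as.Date(df$StartDate)
df$EndDate   <- as.Date(df$EndDate)

# Hypothetical helper: for row i of d, count the same Person's relevant
# cases that started before row i's StartDate, were still open on that
# date, and started fewer than max_age_days days earlier.
count_open <- function(i, d, max_age_days = Inf) {
  cur <- d$StartDate[i]
  age <- as.numeric(cur - d$StartDate)  # age of every case in days
  sum(d$Person == d$Person[i] &
        d$RelevantCase == 1 &
        d$StartDate < cur &
        d$EndDate >= cur &
        age < max_age_days)
}

rows <- seq_len(nrow(df))
df$NumberOpenCases    <- vapply(rows, count_open, integer(1), d = df)
df$NumberOpenCases_2y <- vapply(rows, count_open, integer(1), d = df,
                                max_age_days = 730)  # assumed: 2 years = 730 days
```

Because every row is compared against every other row, the work grows quadratically with the number of rows, which should be fine for small data like this but may be slow on large tables.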
