January 7, 2020, 02:39:36

Create a count consecutive variable which resets to 1 based on POSIXct date

Question

This is a follow-up to https://stackoverflow.com/questions/58293432/create-a-count-consecutive-variable-which-resets-to-1

The solution there worked great. Now I have the data below, where date is POSIXct:

library(dplyr)    # %>% and mutate()
library(stringr)  # str_c()
library(anytime)
df <- data.frame(group=c(1, 1, 1, 1, 2, 2, 2), 
               date=c("2000-01-01 00:00:00", "2000-01-03 00:00:00", "2000-01-04 07:07:40", "2000-01-05 09:09:00", "2000-01-09 00:00:00", "2000-01-10 14:00:00", "2000-01-11 13:00:00"),
               want=c(1,1,2,3,1,2,1),
               want2=c(3,3,3,3,2,2,2))
df <- df %>% mutate(date = anytime::anytime(str_c(date, sep= ' ')))

  group                date want want2
1     1 2000-01-01 00:00:00    1     3
2     1 2000-01-03 00:00:00    1     3
3     1 2000-01-04 07:07:40    2     3
4     1 2000-01-05 09:09:00    3     3
5     2 2000-01-09 00:00:00    1     2
6     2 2000-01-10 14:00:00    2     2
7     2 2000-01-11 13:00:00    1     2

I want the count to continue when the next timestamp falls after 24 hours but before 48 hours of the previous one, and to reset to 1 otherwise.

I tried the following without success, I think because the diff function gives me a result in seconds:

df %>%
    group_by(group) %>%
    group_by(group2 = cumsum(c(TRUE, diff(date)<86400&diff(date)>172800))), add = TRUE) %>%
    mutate(wantn = row_number()) %>%
    group_by(group) %>%
    mutate(want2n = max(wantn)) %>%       
    select(-group2)
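
A note on units, as a minimal base R sketch built on the timestamps from the question (the names g1, g2, d_secs and in_window are made up for illustration). As far as I can tell, diff() on POSIXct values returns a difftime whose unit is picked automatically from the size of the gaps, so the result is not necessarily in seconds and may differ from one group to the next. Independently of units, a gap cannot be both below 86400 and above 172800, so the condition in the attempt above can never be TRUE.

```r
# Timestamps of group 1 and group 2 from the question, parsed as UTC
# (UTC is an assumption here; it avoids daylight saving shifts).
g1 <- as.POSIXct(
  c("2000-01-01 00:00:00", "2000-01-03 00:00:00",
    "2000-01-04 07:07:40", "2000-01-05 09:09:00"),
  tz = "UTC"
)
g2 <- as.POSIXct(
  c("2000-01-09 00:00:00", "2000-01-10 14:00:00", "2000-01-11 13:00:00"),
  tz = "UTC"
)

# diff() chooses the unit itself; inspect it instead of assuming seconds.
units_g1 <- units(diff(g1))
units_g2 <- units(diff(g2))
print(c(group1 = units_g1, group2 = units_g2))

# difftime() with an explicit unit removes the ambiguity.
d_secs <- as.numeric(difftime(g1[-1], g1[-length(g1)], units = "secs"))
print(d_secs)

# A gap of at least 24 h and less than 48 h: `>=` and `<`, joined by `&`.
in_window <- d_secs >= 86400 & d_secs < 172800
print(in_window)
```

With an explicit unit the comparison thresholds mean the same thing in every group, which is what the grouped pipeline needs.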

Answer 1

Score: 1

Here, difftime() is a better choice than diff() because the units can be specified.

If I understand correctly, a sequence of POSIXct timestamps is considered consecutive if the time difference is 24 hours or more but less than 48 hours.
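
To make that reading concrete, here is a small base R sketch (the reference time t0 and the test gaps in gap_hours are invented for illustration, they are not part of the question's data). It checks that "floor of the difference in days equals 1" selects the same gaps as "at least 24 hours but less than 48 hours", including the two boundary values.

```r
# Hypothetical reference timestamp and a handful of gaps, in hours.
t0 <- as.POSIXct("2000-01-01 00:00:00", tz = "UTC")
gap_hours <- c(0, 23, 24, 31.5, 47.99, 48, 72)
later <- t0 + gap_hours * 3600  # POSIXct arithmetic is in seconds

# Difference in days, as a plain number.
delta_days <- as.numeric(difftime(later, t0, units = "days"))

# Two ways of expressing the same window.
by_floor  <- floor(delta_days) == 1
by_window <- gap_hours >= 24 & gap_hours < 48

print(data.frame(gap_hours, delta_days, by_floor, by_window))
```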

The code below reproduces the expected result for the sample dataset:

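library(dplyr)
library(magrittr)  # for the equals() and not() aliases
df %>% 
  group_by(group) %>% 
  mutate(want = difftime(date, lag(date, default = date[1L]), units = "days") %>%  # gap to the previous row
           floor() %>%          # whole days elapsed
           equals(1) %>%        # TRUE if the gap is in [24 h, 48 h)
           not() %>%            # TRUE where a new run starts
           cumsum() %>%         # run id within the group
           data.table::rowid(), # position within each run
         want2 = max(want))     # longest run in the group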

# A tibble: 7 x 4
# Groups:   group [2]
  group date                 want want2
  <dbl> <dttm>              <int> <int>
1     1 2000-01-01 00:00:00     1     3
2     1 2000-01-03 00:00:00     1     3
3     1 2000-01-04 07:07:40     2     3
4     1 2000-01-05 09:09:00     3     3
5     2 2000-01-09 00:00:00     1     2
6     2 2000-01-10 14:00:00     2     2
7     2 2000-01-11 13:00:00     1     2

Explanation

df %>% 
  group_by(group) %>% 
  mutate(delta = difftime(date, lag(date, default = date[1L]), units = "days"))

returns

# A tibble: 7 x 5
# Groups:   group [2]
  group date                 want want2 delta         
  <dbl> <dttm>              <dbl> <dbl> <drtn>        
1     1 2000-01-01 00:00:00     1     3 0.0000000 days
2     1 2000-01-03 00:00:00     1     3 2.0000000 days
3     1 2000-01-04 07:07:40     2     3 1.2969907 days
4     1 2000-01-05 09:09:00     3     3 1.0842593 days
5     2 2000-01-09 00:00:00     1     2 0.0000000 days
6     2 2000-01-10 14:00:00     2     2 1.5833333 days
7     2 2000-01-11 13:00:00     1     2 0.9583333 days

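Rounding each difference down to a whole number of days with floor() reduces the problem to the Date case from the earlier question: a floored difference of exactly 1 continues the current run, and any other value (including the 0 on the first row of each group) starts a new one.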
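
For readers who prefer to see each step as its own column, the same idea can be sketched with dplyr alone. This is an unrolled variant for illustration, not the code from the answer: the intermediate names delta_days, new_run and run_id are made up, and as.POSIXct(..., tz = "UTC") stands in for the parsing step.

```r
library(dplyr)

# Sample data from the question, parsed as UTC (an assumption).
df <- data.frame(
  group = c(1, 1, 1, 1, 2, 2, 2),
  date = as.POSIXct(
    c("2000-01-01 00:00:00", "2000-01-03 00:00:00", "2000-01-04 07:07:40",
      "2000-01-05 09:09:00", "2000-01-09 00:00:00", "2000-01-10 14:00:00",
      "2000-01-11 13:00:00"),
    tz = "UTC"
  )
)

res <- df %>%
  group_by(group) %>%
  mutate(
    # gap to the previous row in days; 0 for the first row of a group
    delta_days = as.numeric(
      difftime(date, lag(date, default = first(date)), units = "days")
    ),
    # a new run starts unless the floored gap is exactly 1 day
    new_run = floor(delta_days) != 1,
    # running count of run starts gives a run id within the group
    run_id = cumsum(new_run)
  ) %>%
  group_by(group, run_id) %>%
  mutate(want = row_number()) %>%   # position within the run
  group_by(group) %>%
  mutate(want2 = max(want)) %>%     # longest run in the group
  ungroup()

print(res)
```

Keeping delta_days, new_run and run_id in the result makes it easy to see why a particular row resets the count.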

Data

library(magrittr)
df <- data.frame(
  group = c(1, 1, 1, 1, 2, 2, 2),
  date = c(
    "2000-01-01 00:00:00",
    "2000-01-03 00:00:00",
    "2000-01-04 07:07:40",
    "2000-01-05 09:09:00",
    "2000-01-09 00:00:00",
    "2000-01-10 14:00:00",
    "2000-01-11 13:00:00"
  ) %>% lubridate::as_datetime(),
  want = c(1, 1, 2, 3, 1, 2, 1),
  want2 = c(3, 3, 3, 3, 2, 2, 2)
)
