Create a consecutive count variable which resets to 1 based on POSIXct date

Question

This is a follow-up to https://stackoverflow.com/questions/58293432/create-a-count-consecutive-variable-which-resets-to-1, where the solution worked great. Now I have the data below, where date is POSIXct:

library(dplyr)    # provides %>% and mutate()
library(stringr)  # provides str_c()
library(anytime)

df <- data.frame(group = c(1, 1, 1, 1, 2, 2, 2),
                 date = c("2000-01-01 00:00:00", "2000-01-03 00:00:00", "2000-01-04 07:07:40", "2000-01-05 09:09:00", "2000-01-09 00:00:00", "2000-01-10 14:00:00", "2000-01-11 13:00:00"),
                 want = c(1, 1, 2, 3, 1, 2, 1),
                 want2 = c(3, 3, 3, 3, 2, 2, 2))
# convert the character timestamps to POSIXct
df <- df %>% mutate(date = anytime::anytime(str_c(date, sep = ' ')))

  group                date want want2
1     1 2000-01-01 00:00:00    1     3
2     1 2000-01-03 00:00:00    1     3
3     1 2000-01-04 07:07:40    2     3
4     1 2000-01-05 09:09:00    3     3
5     2 2000-01-09 00:00:00    1     2
6     2 2000-01-10 14:00:00    2     2
7     2 2000-01-11 13:00:00    1     2

I want the count to continue when the next timestamp falls at least 24 hours but less than 48 hours after the previous one; otherwise it should reset to 1.

I tried the following without success, I think because the diff function gives me a result in seconds:

df %>%
    group_by(group) %>%
    group_by(group2 = cumsum(c(TRUE, diff(date) < 86400 & diff(date) > 172800)), add = TRUE) %>%
    mutate(wantn = row_number()) %>%
    group_by(group) %>%
    mutate(want2n = max(wantn)) %>%
    select(-group2)

Answer 1

Score: 1

Here, difftime() is a better choice than diff() because the units can be specified.
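To illustrate the point about units, here is a small base-R sketch. It is my own example rather than part of the original answer: the vector x reuses three of the sample timestamps, and the UTC time zone is an assumption made to keep daylight-saving shifts out of the picture. diff() lets R pick the unit of the result, so comparing it against a fixed number of seconds is unreliable, whereas difftime() returns the unit that was requested.

```r
# Three of the sample timestamps; tz = "UTC" is an assumption for this sketch
x <- as.POSIXct(c("2000-01-01 00:00:00",
                  "2000-01-03 00:00:00",
                  "2000-01-04 07:07:40"), tz = "UTC")

# diff() on POSIXct returns a difftime whose unit R chooses automatically
d_auto <- diff(x)
print(units(d_auto))

# difftime() lets the caller fix the unit explicitly
d_days <- difftime(x[-1], x[-length(x)], units = "days")
d_secs <- difftime(x[-1], x[-length(x)], units = "secs")
print(d_days)
print(d_secs)
```

Because the automatically chosen unit depends on the values themselves, it can differ from one group to the next, which is another reason to state the unit explicitly.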

If I understand correctly, a sequence of POSIXct timestamps is considered consecutive if the time difference is 24 hours or more but less than 48 hours.

The code below reproduces the expected result for the sample dataset:

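library(dplyr)
library(magrittr)   # for equals() and not()
df %>% 
  group_by(group) %>% 
  # whole days elapsed since the previous timestamp in the group (0 for the first row)
  mutate(want = difftime(date, lag(date, default = date[1L]), units = "days") %>% 
           floor() %>% 
           equals(1) %>%        # TRUE where the gap is 24 h or more but less than 48 h
           not() %>%            # TRUE where a new run starts
           cumsum() %>%         # run id within the group
           data.table::rowid(), # position within each run (needs the data.table package)
         want2 = max(want))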

# A tibble: 7 x 4
# Groups:   group [2]
  group date                 want want2
  <dbl> <dttm>              <int> <int>
1     1 2000-01-01 00:00:00     1     3
2     1 2000-01-03 00:00:00     1     3
3     1 2000-01-04 07:07:40     2     3
4     1 2000-01-05 09:09:00     3     3
5     2 2000-01-09 00:00:00     1     2
6     2 2000-01-10 14:00:00     2     2
7     2 2000-01-11 13:00:00     1     2

Explanation

df %>% 
  group_by(group) %>% 
  mutate(delta = difftime(date, lag(date, default = date[1L]), units = "days"))

returns

# A tibble: 7 x 5
# Groups:   group [2]
  group date                 want want2 delta         
  <dbl> <dttm>              <dbl> <dbl> <drtn>        
1     1 2000-01-01 00:00:00     1     3 0.0000000 days
2     1 2000-01-03 00:00:00     1     3 2.0000000 days
3     1 2000-01-04 07:07:40     2     3 1.2969907 days
4     1 2000-01-05 09:09:00     3     3 1.0842593 days
5     2 2000-01-09 00:00:00     1     2 0.0000000 days
6     2 2000-01-10 14:00:00     2     2 1.5833333 days
7     2 2000-01-11 13:00:00     1     2 0.9583333 days

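Rounding each difference down to whole days with floor() allows the logic from the Date case to be reused: a floored difference of exactly 1 day means the timestamp continues the current run, and any other value starts a new run.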
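As a quick check of that reasoning, the sketch below (my own illustration, with the gap values copied from the delta column above) compares the floor() test with the rule as I understand it from the question, "24 hours or more but less than 48 hours". The variable names are made up for this example.

```r
# Gaps from the sample data, in days (copied from the delta column)
delta <- as.difftime(c(0, 2, 1.2969907, 1.0842593, 1.5833333, 0.9583333),
                     units = "days")

# Test used in the answer: the floored gap equals exactly one day
days <- as.numeric(delta, units = "days")
by_floor <- floor(days) == 1

# Rule stated in hours: at least 24 h but less than 48 h
hours <- as.numeric(delta, units = "hours")
by_range <- hours >= 24 & hours < 48

# Both columns should agree row by row
print(data.frame(days, by_floor, by_range))
```

Note that a gap of exactly 2 days fails both tests, which is why the second row of group 1 starts a new run.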

Data

library(magrittr)
df <- data.frame(
  group = c(1, 1, 1, 1, 2, 2, 2),
  date = c(
    "2000-01-01 00:00:00",
    "2000-01-03 00:00:00",
    "2000-01-04 07:07:40",
    "2000-01-05 09:09:00",
    "2000-01-09 00:00:00",
    "2000-01-10 14:00:00",
    "2000-01-11 13:00:00"
  ) %>% lubridate::as_datetime(),
  want = c(1, 1, 2, 3, 1, 2, 1),
  want2 = c(3, 3, 3, 3, 2, 2, 2)
)

Posted by huangapple on January 7, 2020, 02:39:36
Permalink: https://go.coder-hub.com/59617275.html