Create a count consecutive variable which resets to 1 based on POSIXct date
Question
This is a follow-up to https://stackoverflow.com/questions/58293432/create-a-count-consecutive-variable-which-resets-to-1, where the solution worked great. Now I have the data below, where date is POSIXct:
library(dplyr)    # %>% and mutate()
library(stringr)  # str_c()
library(anytime)

df <- data.frame(
  group = c(1, 1, 1, 1, 2, 2, 2),
  date = c("2000-01-01 00:00:00", "2000-01-03 00:00:00", "2000-01-04 07:07:40", "2000-01-05 09:09:00", "2000-01-09 00:00:00", "2000-01-10 14:00:00", "2000-01-11 13:00:00"),
  want = c(1, 1, 2, 3, 1, 2, 1),
  want2 = c(3, 3, 3, 3, 2, 2, 2)
)
df <- df %>% mutate(date = anytime::anytime(str_c(date, sep = ' ')))
group date want want2
1 1 2000-01-01 00:00:00 1 3
2 1 2000-01-03 00:00:00 1 3
3 1 2000-01-04 07:07:40 2 3
4 1 2000-01-05 09:09:00 3 3
5 2 2000-01-09 00:00:00 1 2
6 2 2000-01-10 14:00:00 2 2
7 2 2000-01-11 13:00:00 1 2
I want to begin counting when the 'next day' is after 24 hrs but before 48 hrs.
Trying this without success, because I think the diff function gives me a result in seconds:
df %>%
group_by(group) %>%
group_by(group2 = cumsum(c(TRUE, diff(date)<86400&diff(date)>172800))), add = TRUE) %>%
mutate(wantn = row_number()) %>%
group_by(group) %>%
mutate(want2n = max(wantn)) %>%
select(-group2)
Answer 1
Score: 1
Here, difftime() is a better choice than diff() because the units can be specified.
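To illustrate the point about units, here is a minimal base R sketch. The two timestamps and the variable names are made up for illustration, and tz = "UTC" is an assumption chosen to keep daylight saving time out of the picture. It shows that diff() on POSIXct values picks its units automatically, so the result is not necessarily in seconds, while difftime() returns whatever units are requested.

```r
# Two illustrative timestamps, 1 day and 12 hours apart (UTC is an assumption)
x <- as.POSIXct(c("2000-01-01 00:00:00", "2000-01-02 12:00:00"), tz = "UTC")

# diff() chooses the units from the size of the difference
d_auto <- diff(x)
print(units(d_auto))
print(as.numeric(d_auto))

# difftime() lets the caller fix the units explicitly
d_days <- difftime(x[2], x[1], units = "days")
d_secs <- difftime(x[2], x[1], units = "secs")
print(as.numeric(d_days))
print(as.numeric(d_secs))
```

Comparing an automatically scaled difference against 86400 or 172800 can therefore silently test the wrong quantity, which may be why the attempt in the question did not behave as expected.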
If I understand correctly, a sequence of POSIXct timestamps is considered consecutive if the time difference is 24 hours or more but less than 48 hours.
The code below reproduces the expected result for the sample dataset:
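library(dplyr)
library(magrittr)  # provides equals() and not()

df %>%
  group_by(group) %>%
  mutate(
    want = difftime(date, lag(date, default = date[1L]), units = "days") %>%
      floor() %>%           # whole days since the previous timestamp
      equals(1) %>%         # TRUE if 24 h or more but less than 48 h
      not() %>%             # TRUE where a new run starts
      cumsum() %>%          # run id within the group
      data.table::rowid(),  # position within each run
    want2 = max(want)       # largest count within the group
  )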
> # A tibble: 7 x 4
> # Groups: group [2]
> group date want want2
> <dbl> <dttm> <int> <int>
> 1 1 2000-01-01 00:00:00 1 3
> 2 1 2000-01-03 00:00:00 1 3
> 3 1 2000-01-04 07:07:40 2 3
> 4 1 2000-01-05 09:09:00 3 3
> 5 2 2000-01-09 00:00:00 1 2
> 6 2 2000-01-10 14:00:00 2 2
> 7 2 2000-01-11 13:00:00 1 2
Explanation
df %>%
  group_by(group) %>%
  mutate(delta = difftime(date, lag(date, default = date[1L]), units = "days"))
returns
> # A tibble: 7 x 5
> # Groups: group [2]
> group date want want2 delta
> <dbl> <dttm> <dbl> <dbl> <drtn>
> 1 1 2000-01-01 00:00:00 1 3 0.0000000 days
> 2 1 2000-01-03 00:00:00 1 3 2.0000000 days
> 3 1 2000-01-04 07:07:40 2 3 1.2969907 days
> 4 1 2000-01-05 09:09:00 3 3 1.0842593 days
> 5 2 2000-01-09 00:00:00 1 2 0.0000000 days
> 6 2 2000-01-10 14:00:00 2 2 1.5833333 days
> 7 2 2000-01-11 13:00:00 1 2 0.9583333 days
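By rounding down to the next lower integer (floor()), the logic for the Date case can be used: a floored difference of exactly 1 day continues the current run, and any other value starts a new one.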
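For readers who prefer to avoid the extra packages, the same rule can be sketched in base R. This is only an illustration under stated assumptions: the helper name run_counter is hypothetical, the timestamps are parsed with tz = "UTC", and the floor() step is replaced by the explicit comparison "at least 1 day and less than 2 days", which is equivalent for non-negative differences.

```r
# Sample data from the question, parsed as POSIXct (UTC is an assumption)
df <- data.frame(
  group = c(1, 1, 1, 1, 2, 2, 2),
  date = as.POSIXct(c(
    "2000-01-01 00:00:00", "2000-01-03 00:00:00", "2000-01-04 07:07:40",
    "2000-01-05 09:09:00", "2000-01-09 00:00:00", "2000-01-10 14:00:00",
    "2000-01-11 13:00:00"
  ), tz = "UTC")
)

# Hypothetical helper: takes one group's timestamps as seconds since the epoch
# and returns the running count, restarting at 1 whenever a gap breaks the run
run_counter <- function(secs) {
  delta_days <- c(0, diff(secs)) / 86400              # gap to previous row in days
  consecutive <- delta_days >= 1 & delta_days < 2     # same as floor(delta_days) == 1
  run_id <- cumsum(!consecutive)                      # new id at every break
  ave(seq_along(run_id), run_id, FUN = seq_along)     # position within each run
}

# Apply per group, then take the largest count within each group
df$want  <- ave(as.numeric(df$date), df$group, FUN = run_counter)
df$want2 <- ave(df$want, df$group, FUN = max)
print(df)
```

Working on seconds via as.numeric() sidesteps the automatic unit selection of diff() on POSIXct values, at the cost of doing the conversion to days by hand.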
Data
library(magrittr)
df <- data.frame(
  group = c(1, 1, 1, 1, 2, 2, 2),
  date = c(
    "2000-01-01 00:00:00",
    "2000-01-03 00:00:00",
    "2000-01-04 07:07:40",
    "2000-01-05 09:09:00",
    "2000-01-09 00:00:00",
    "2000-01-10 14:00:00",
    "2000-01-11 13:00:00"
  ) %>% lubridate::as_datetime(),
  want = c(1, 1, 2, 3, 1, 2, 1),
  want2 = c(3, 3, 3, 3, 2, 2, 2)
)