Grouping by ID, Grouping by time (within 5 minutes of each activity), Find Time Difference of Activity in R
Question
Is there a way for R to group by ID, then identify a 'break' in time, and then calculate the time difference?
For instance:
ID TIME
A 12/18/2019 4:45:10 AM
A 12/18/2019 4:45:11 AM
A 12/18/2019 9:06:59 PM
B 12/18/2019 4:14:13 AM
B 12/18/2019 4:14:14 AM
Does anyone know a way to find the time duration for A? Note that this is not a simple difftime problem. I performed a certain activity at 4:45:10 AM, then again at 4:45:11 AM. I then stopped the activity and picked it back up at 9:06 PM. Is there code that can group by ID and then group the times, detecting large gaps so that they don't produce inaccurate durations?
This is what I tried, but it is not the correct solution:
library(dplyr)
diff <- data %>%
  mutate(diff = difftime(as.POSIXct(Endtime, format = "%m/%d/%Y %I:%M:%S %p"),
                         as.POSIXct(Starttime, format = "%m/%d/%Y %I:%M:%S %p"),
                         units = "secs"))
Any help is greatly appreciated.
I will continue to research this. Thank you.
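The format string in the attempt above matches the sample timestamps; what the question describes is a gap between consecutive rows of the same ID rather than a difference between two columns. A minimal base-R sketch of that idea, assuming the data frame (called df here, a hypothetical name) holds the question's ID and TIME columns as text and that the times can be treated as UTC:

```r
# Sample data from the question; `df` is a hypothetical name
df <- data.frame(
  ID = c("A", "A", "A", "B", "B"),
  TIME = c("12/18/2019 4:45:10 AM", "12/18/2019 4:45:11 AM",
           "12/18/2019 9:06:59 PM", "12/18/2019 4:14:13 AM",
           "12/18/2019 4:14:14 AM"),
  stringsAsFactors = FALSE
)

# Parse with the same format string as the attempt; tz = "UTC" is an assumption
df$TIME <- as.POSIXct(df$TIME, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Order by ID, then time, so "previous row" means "previous event"
df <- df[order(df$ID, df$TIME), ]

# Seconds since the previous event of the same ID (NA on each ID's first row)
df$gap_secs <- ave(as.numeric(df$TIME), df$ID,
                   FUN = function(x) c(NA, diff(x)))

print(df)
```

Rows whose gap_secs exceeds a chosen limit (for example 300 seconds for five minutes) are the "breaks" the question describes.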
Answer 1
Score: 1
Here's one way to do it:
library(dplyr)
library(tidyr)
library(lubridate)
sample_df$TIME = mdy_hms(sample_df$TIME)
sample_df = sample_df %>%
  group_by(ID) %>%
  # lag() shifts each value down one row, so every row sees the previous TIME
  # and we can subtract row 1 from row 2, row 2 from row 3, and so on
  mutate(time_diff = difftime(TIME, lag(TIME, n = 1, default = NA),
                              units = "secs")) %>%
  # the first row of each ID has no predecessor; set its difference to 0 seconds
  mutate(time_diff = replace_na(time_diff, as.difftime(0, units = "secs")))
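Hope this gives you some ideas.
To see what is happening, split it into two steps:
sample_df = sample_df %>%
  group_by(ID) %>%
  # time_lag holds the previous TIME within each ID (NA on each ID's first row)
  mutate(time_lag = dplyr::lag(TIME, n = 1, default = NA)) %>%
  mutate(time_diff = difftime(TIME, time_lag, units = "secs")) %>%
  mutate(time_diff = tidyr::replace_na(time_diff, as.difftime(0, units = "secs")))
Check what the time_lag column looks like.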
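For a quick look at what lag() does inside group_by(), here is a small sketch on the question's sample data. The name events is hypothetical, and the timestamps are parsed with base as.POSIXct() in UTC (an assumption) rather than with mdy_hms():

```r
library(dplyr)

# Question's sample data; `events` is a hypothetical name
events <- data.frame(
  ID = c("A", "A", "A", "B", "B"),
  TIME = as.POSIXct(c("12/18/2019 4:45:10 AM", "12/18/2019 4:45:11 AM",
                      "12/18/2019 9:06:59 PM", "12/18/2019 4:14:13 AM",
                      "12/18/2019 4:14:14 AM"),
                    format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC"),
  stringsAsFactors = FALSE
)

events <- events %>%
  group_by(ID) %>%
  # lag() shifts TIME down one row within each ID, so the first row of
  # every ID has no predecessor and gets NA
  mutate(time_lag = lag(TIME),
         # explicit units keep every group in seconds
         time_diff = as.numeric(difftime(TIME, time_lag, units = "secs"))) %>%
  ungroup()

print(events)
```

Note that the lag restarts at B's first row: it does not borrow A's last timestamp, which is exactly why the grouping matters.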
Answer 2
Score: 1
As I mentioned above, the first thing to do is convert your date-times to date-time objects; I'm using lubridate for this. Since you want to keep the differences within some threshold, I saved a threshold duration of 5 minutes, which you can change as needed. Differences at or above that threshold are set to NA.
I'm computing the differences in two steps just so you can see the original difference next to the one with long gaps removed. You'll probably want to do it in one step.
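library(dplyr)
library(lubridate)
# gaps of 5 minutes or more are treated as breaks in activity
thresh <- duration(5, units = "minutes")
sample_df %>%
  mutate(TIME = mdy_hms(TIME)) %>%
  group_by(ID) %>%
  # raw difference from the previous row within each ID
  mutate(diff1 = TIME - lag(TIME)) %>%
  # keep only differences below the threshold; longer gaps become NA
  mutate(delta = if_else(diff1 < thresh, diff1, NA_real_))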
#> # A tibble: 10 x 4
#> # Groups: ID [3]
#> ID TIME diff1 delta
#> <chr> <dttm> <drtn> <drtn>
#> 1 A 2019-12-18 04:45:10 NA secs NA secs
#> 2 A 2019-12-18 04:45:11 1 secs 1 secs
#> 3 A 2019-12-18 16:06:59 40908 secs NA secs
#> 4 A 2019-12-18 16:07:01 2 secs 2 secs
#> 5 B 2019-12-18 04:14:13 NA secs NA secs
#> 6 B 2019-12-18 04:14:14 1 secs 1 secs
#> 7 B 2019-12-18 04:14:15 1 secs 1 secs
#> 8 C 2019-12-18 04:59:49 NA secs NA secs
#> 9 C 2019-12-18 04:59:50 1 secs 1 secs
#> 10 C 2019-12-18 04:59:51 1 secs 1 secs
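Using dplyr::if_else rather than base ifelse was handy because it uses strict typing, which helped make sure I kept the delta column as a duration object instead of losing its time component and ending up with a plain numeric; that would have been the case with NA instead of NA_real_. (From dplyr 1.1.0 onward, if_else() takes the output type from the common type of its true and false values, so a plain NA is accepted there.)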
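Both answers stop at per-row differences, while the question asks for the duration of each burst of activity. One common way to get there (not taken from either answer) is to flag a new session whenever the gap to the previous row reaches the threshold and run cumsum() over those flags to number the sessions. A minimal sketch, assuming the question's sample data, UTC timestamps and the same 5-minute threshold; the columns gap, session, start, end, duration_secs and n_events are hypothetical names:

```r
library(dplyr)

# Question's sample data, under the answers' name `sample_df`
sample_df <- data.frame(
  ID = c("A", "A", "A", "B", "B"),
  TIME = c("12/18/2019 4:45:10 AM", "12/18/2019 4:45:11 AM",
           "12/18/2019 9:06:59 PM", "12/18/2019 4:14:13 AM",
           "12/18/2019 4:14:14 AM"),
  stringsAsFactors = FALSE
)

thresh_secs <- 5 * 60  # 5-minute threshold, in seconds

sessions <- sample_df %>%
  mutate(TIME = as.POSIXct(TIME, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")) %>%
  arrange(ID, TIME) %>%
  group_by(ID) %>%
  mutate(
    gap = as.numeric(difftime(TIME, lag(TIME), units = "secs")),
    # a session starts on each ID's first row or after a gap >= threshold,
    # mirroring the `diff1 < thresh` test in the answer above
    session = cumsum(is.na(gap) | gap >= thresh_secs)
  ) %>%
  group_by(ID, session) %>%
  summarise(
    start = min(TIME),
    end = max(TIME),
    duration_secs = as.numeric(difftime(end, start, units = "secs")),
    n_events = n(),
    .groups = "drop"
  )

print(sessions)
```

Summing duration_secs by ID then gives the total active time per ID, for example sessions %>% group_by(ID) %>% summarise(total = sum(duration_secs)). A session with a single event has a duration of 0 here; whether that should count as some minimum activity time is a choice the question leaves open.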