Grouping by ID, Grouping by time (within 5 minutes of each activity), Find Time Difference of Activity in R

huangapple go评论125阅读模式
英文:

Grouping by ID, Grouping by time (within 5 minutes of each activity), Find Time Difference of Activity in R

问题

Sure, here's the translated content:

有没有一种方法可以让R按ID分组,然后识别时间上的“断裂”,然后计算时间差?
例如:

  1. ID TIME
  2. A 12/18/2019 4:45:10 AM
  3. A 12/18/2019 4:45:11 AM
  4. A 12/18/2019 9:06:59 PM
  5. B 12/18/2019 4:14:13 AM
  6. B 12/18/2019 4:14:14 AM

有人知道如何找出A的时间持续时间吗?请注意,这不是一个difftime问题。我在上午4:45:10进行了某项活动,然后在上午4:45:11又进行了一次。然后我停止了这项活动,并在晚上9:06又重新开始了。是否有代码可以准确地分组ID,然后分组时间,同时检测时间上的巨大间隙,以避免不准确的值?

这不是正确的解决方案。

  1. diff<- data %>%
  2. mutate(diff = difftime(as.POSIXct(Endtime, format = "%m/%d/%Y %I:%M:%S %p"),
  3. as.POSIXct(Starttime, format = "%m/%d/%Y %I:%M:%S %p"), units = "secs"))

非常感谢任何帮助。
我将继续研究这个问题。谢谢。

英文:

Is there a way for R to group by ID, and then to identify a 'break' in time and then calculate time difference?
For instance:

  1. ID TIME
  2. A 12/18/2019 4:45:10 AM
  3. A 12/18/2019 4:45:11 AM
  4. A 12/18/2019 9:06:59 PM
  5. B 12/18/2019 4:14:13 AM
  6. B 12/18/2019 4:14:14 AM

Does anyone know of a way to find the time duration for A? Notice this is not a difftime problem. I performed a certain activity at 4:45:10 am, then again at 4:45:11 am. I then stopped this activity, and picked back up at 9:06pm. Is there code that can accurately group IDs, and then group time whilst detecting a huge gap in the time to avoid inaccurate values?

This is not the correct solution.

  1. diff&lt;- data %&gt;%
  2. mutate(diff = difftime(as.POSIXct(Endtime, format = &quot;%m/%d/%Y %I:%M:%S %p&quot;),
  3. as.POSIXct(Starttime, format = &quot;%m/%d/%Y %I:%M:%S %p&quot;), units = &quot;secs&quot;))

Any help is greatly appreciated.
I will continue to research this. Thank you

答案1

得分: 1

这是一种方法:

  1. library(lubridate)
  2. sample_df$TIME = mdy_hms(sample_df$TIME)
  3. sample_df = sample_df %>%
  4. group_by(ID) %>%
  5. # lag基本上将下一个值提前一步
  6. # 这样我们可以减去索引0和索引1、索引1和索引2等……
  7. mutate(time_diff = TIME - lag(TIME, n = 1, default = NA)) %>%
  8. mutate(time_diff = replace_na(time_diff, 0))

希望这能给你一些思路。为了理解,可以分为两步进行:

  1. sample_df = sample_df %>%
  2. group_by(ID) %>%
  3. mutate(time_lag = dplyr::lag(TIME, n = 1, default = NA)) %>%
  4. mutate(time_diff = TIME - time_lag) %>%
  5. mutate(time_diff = replace_na(time_diff, 0))

检查一下 time_lag 列的样子。

英文:

Here's a way to do:

  1. library(lubridate)
  2. sample_df$TIME = mdy_hms(sample_df$TIME)
  3. sample_df = sample_df %&gt;%
  4. group_by(ID) %&gt;%
  5. # lag basically bring the next value one step up
  6. # so we can subtract value at index 0 and index 1, index 1 and index 2 and so on....
  7. mutate(time_diff = TIME - lag(TIME, n = 1, default = NA)) %&gt;%
  8. mutate(time_diff = replace_na(time_diff, 0))

Hope this gives you some idea.
For understanding, do it in two steps:

  1. sample_df = sample_df %&gt;%
  2. group_by(ID) %&gt;%
  3. mutate(time_lag = dplyr::lag(TIME, n = 1, default = NA)) %&gt;%
  4. mutate(time_diff = TIME - time_lag) %&gt;%
  5. mutate(time_diff = replace_na(time_diff, 0))

Check how time_lag column looks.

答案2

得分: 1

就像我之前提到的那样,首先要将你的日期时间转换为日期时间对象;我使用lubridate来实现这一点。由于你想要在某个阈值内保持差异,我保存了一个阈值持续时间为5分钟,你可以根据需要进行更改。如果差异超过了这个阈值,就将它们设为NA

我将差异分为2步进行,这样你可以看到原始差异与去除长时间差异的差异。你可能只想在一步中完成这个操作。

  1. library(dplyr)
  2. library(lubridate)
  3. thresh <- duration(5, units = "minutes")
  4. sample_df %>%
  5. mutate(TIME = mdy_hms(TIME)) %>%
  6. group_by(ID) %>%
  7. mutate(diff1 = TIME - lag(TIME)) %>%
  8. mutate(delta = if_else(diff1 < thresh, diff1, NA_real_))
  9. #> # A tibble: 10 x 4
  10. #> # Groups: ID [3]
  11. #> ID TIME diff1 delta
  12. #> <chr> <dttm> <drtn> <drtn>
  13. #> 1 A 2019-12-18 04:45:10 NA secs NA secs
  14. #> 2 A 2019-12-18 04:45:11 1 secs 1 secs
  15. #> 3 A 2019-12-18 16:06:59 40908 secs NA secs
  16. #> 4 A 2019-12-18 16:07:01 2 secs 2 secs
  17. #> 5 B 2019-12-18 04:14:13 NA secs NA secs
  18. #> 6 B 2019-12-18 04:14:14 1 secs 1 secs
  19. #> 7 B 2019-12-18 04:14:15 1 secs 1 secs
  20. #> 8 C 2019-12-18 04:59:49 NA secs NA secs
  21. #> 9 C 2019-12-18 04:59:50 1 secs 1 secs
  22. #> 10 C 2019-12-18 04:59:51 1 secs 1 secs

使用dplyr::if_else而不是基本的ifelse很方便,因为它使用严格的类型,这有助于确保我将delta列保持为持续时间对象,而不是失去其时间组件并只获得一个数值,这将是使用NA而不是NA_real_的情况。

英文:

Like I mentioned above, the first thing to do is convert your date-times to a date-time object; I'm using lubridate for this. Since you want to keep delta within some threshold, I saved a threshold duration of 5 minutes which you can change as needed. If differences are more than that, make them NA.

I'm doing the diffing in 2 steps, just so you can see the original difference vs the one with long differences removed. You'll probably want to just do that in one step.

<!-- language-all: lang-r -->

  1. library(dplyr)
  2. library(lubridate)
  3. thresh &lt;- duration(5, units = &quot;minutes&quot;)
  4. sample_df %&gt;%
  5. mutate(TIME = mdy_hms(TIME)) %&gt;%
  6. group_by(ID) %&gt;%
  7. mutate(diff1 = TIME - lag(TIME)) %&gt;%
  8. mutate(delta = if_else(diff1 &lt; thresh, diff1, NA_real_))
  9. #&gt; # A tibble: 10 x 4
  10. #&gt; # Groups: ID [3]
  11. #&gt; ID TIME diff1 delta
  12. #&gt; &lt;chr&gt; &lt;dttm&gt; &lt;drtn&gt; &lt;drtn&gt;
  13. #&gt; 1 A 2019-12-18 04:45:10 NA secs NA secs
  14. #&gt; 2 A 2019-12-18 04:45:11 1 secs 1 secs
  15. #&gt; 3 A 2019-12-18 16:06:59 40908 secs NA secs
  16. #&gt; 4 A 2019-12-18 16:07:01 2 secs 2 secs
  17. #&gt; 5 B 2019-12-18 04:14:13 NA secs NA secs
  18. #&gt; 6 B 2019-12-18 04:14:14 1 secs 1 secs
  19. #&gt; 7 B 2019-12-18 04:14:15 1 secs 1 secs
  20. #&gt; 8 C 2019-12-18 04:59:49 NA secs NA secs
  21. #&gt; 9 C 2019-12-18 04:59:50 1 secs 1 secs
  22. #&gt; 10 C 2019-12-18 04:59:51 1 secs 1 secs

Using dplyr::if_else rather than the base ifelse was handy because it uses strict typing, which helped make sure I kept the delta column as a duration object, rather than losing its time component and just getting a numeric, which would be the case with NA instead of NA_real_.

huangapple
  • 本文由 发表于 2020年1月3日 23:22:53
  • 转载请务必保留本文链接:https://go.coder-hub.com/59581106.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定