Classify runs of rows between first and last rows that meet condition

Question

I have movement data from different animals (id) that commute to and from a central location each day, departing in the morning ('am_commute') and returning in the evening ('pm_commute'). Some commutes are longer than others, hence sometimes multiple consecutive points are associated with a commute. Here is a simplified version with just one individual, where all rows not associated with commuting movements are labeled as 'stationary'.

    id <- rep.int(c("A"), times = c(22))
    datetimes <- c(seq(lubridate::ymd_hms('2021-05-21T06:00:00'),
                       lubridate::ymd_hms('2021-05-23T23:00:00'),
                       by = "3 hours"))
    commute <- c('am_commute', 'am_commute', 'stationary', 'stationary', 'pm_commute',
                 'stationary', 'stationary', 'stationary', 'am_commute', 'stationary',
                 'stationary', 'stationary', 'stationary', 'pm_commute', 'pm_commute',
                 'stationary', 'am_commute', 'am_commute', 'stationary', 'stationary',
                 'pm_commute', 'pm_commute')
    data <- data.frame(id, datetimes, commute)

I want to create a new variable (time_of_day) that indicates whether each stationary point is part of the animal's daytime or nighttime movements. In the real dataset, this cannot be defined simply by time of day, as there are multiple individuals, each of which returns to the central location at a different time. For each animal, "day" is when it is away from this location, and "night" is when it is back at this location. I therefore want to define "day" and "night" based on the morning and evening commutes, such that "day" points are all points on a given day between the animal's first morning commute point and first evening commute point, and "night" points are all points between the first evening commute point and the first morning commute point on the following day.

For this dataset, the desired results would look like this:

       id           datetimes    commute time_of_day
    1   A 2021-05-21 06:00:00 am_commute        <NA>
    2   A 2021-05-21 09:00:00 am_commute        <NA>
    3   A 2021-05-21 12:00:00 stationary         day
    4   A 2021-05-21 15:00:00 stationary         day
    5   A 2021-05-21 18:00:00 pm_commute        <NA>
    6   A 2021-05-21 21:00:00 stationary       night
    7   A 2021-05-22 00:00:00 stationary       night
    8   A 2021-05-22 03:00:00 stationary       night
    9   A 2021-05-22 06:00:00 am_commute        <NA>
    10  A 2021-05-22 09:00:00 stationary         day
    11  A 2021-05-22 12:00:00 stationary         day
    12  A 2021-05-22 15:00:00 stationary         day
    13  A 2021-05-22 18:00:00 stationary         day
    14  A 2021-05-22 21:00:00 pm_commute        <NA>
    15  A 2021-05-23 00:00:00 pm_commute        <NA>
    16  A 2021-05-23 03:00:00 stationary       night
    17  A 2021-05-23 06:00:00 am_commute        <NA>
    18  A 2021-05-23 09:00:00 am_commute        <NA>
    19  A 2021-05-23 12:00:00 stationary         day
    20  A 2021-05-23 15:00:00 stationary         day
    21  A 2021-05-23 18:00:00 pm_commute        <NA>
    22  A 2021-05-23 21:00:00 pm_commute        <NA>

I have tried achieving this with dplyr using a combination of mutate(), case_when(), first() and lead(), but I cannot figure out how to reference the value of the commute column on the following date. Can this all be done in a piped workflow using dplyr?
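A side note for anyone testing a candidate solution: the desired column from the table above can be written out as a plain vector and compared with `identical()`. This is only a convenience sketch in base R. The helper name `matches_expected` is made up for illustration and is not part of the question.

```r
# The commute labels from the question, in time order.
commute <- c("am_commute", "am_commute", "stationary", "stationary", "pm_commute",
             "stationary", "stationary", "stationary", "am_commute", "stationary",
             "stationary", "stationary", "stationary", "pm_commute", "pm_commute",
             "stationary", "am_commute", "am_commute", "stationary", "stationary",
             "pm_commute", "pm_commute")

# The desired time_of_day column, copied row by row from the table above.
expected <- c(NA, NA, "day", "day", NA, "night", "night", "night", NA,
              "day", "day", "day", "day", NA, NA, "night", NA, NA,
              "day", "day", NA, NA)

# Sanity check on the table itself: commute rows are NA, stationary rows are labelled.
stopifnot(all(is.na(expected) == (commute != "stationary")))

# Hypothetical helper: TRUE if a candidate column reproduces the desired result.
matches_expected <- function(time_of_day) {
  identical(as.character(time_of_day), expected)
}
```

With a solution stored in `result`, `matches_expected(result$time_of_day)` would then report whether it reproduces the table.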

Answer 1

Score: 1

One way would be to label the first 'stationary' row after each commute as "day" or "night", then fill the remaining 'stationary' rows down from those values, then set the 'am_commute' and 'pm_commute' rows back to NA, e.g.

    # ymd_hms() comes from lubridate, which library(tidyverse) attaches
    # as of tidyverse 2.0.0
    library(tidyverse)
    library(vctrs, warn.conflicts = FALSE)

    id <- rep.int(c("A"), times = c(22))
    datetimes <- c(seq(ymd_hms('2021-05-21T06:00:00'),
                       ymd_hms('2021-05-23T23:00:00'),
                       by = "3 hours"))
    commute <- c('am_commute', 'am_commute', 'stationary', 'stationary', 'pm_commute',
                 'stationary', 'stationary', 'stationary', 'am_commute', 'stationary',
                 'stationary', 'stationary', 'stationary', 'pm_commute', 'pm_commute',
                 'stationary', 'am_commute', 'am_commute', 'stationary', 'stationary',
                 'pm_commute', 'pm_commute')
    data <- data.frame(id, datetimes, commute)

    data %>%
      group_by(id) %>%
      # Step 1: label only the first stationary row after each commute
      mutate(time_of_day = case_when(
        commute == "stationary" & lag(commute) == "am_commute" ~ "day",
        commute == "stationary" & lag(commute) == "pm_commute" ~ "night"
      )) %>%
      # Step 2: carry each label down over the rows that follow it
      mutate(time_of_day = vec_fill_missing(time_of_day, "down")) %>%
      # Step 3: the fill also reaches the commute rows, so blank them again
      mutate(time_of_day = if_else(commute != "stationary", NA_character_, time_of_day)) %>%
      ungroup() %>%
      print(n = 22)
    #> # A tibble: 22 × 4
    #>    id    datetimes           commute    time_of_day
    #>    <chr> <dttm>              <chr>      <chr>
    #>  1 A     2021-05-21 06:00:00 am_commute <NA>
    #>  2 A     2021-05-21 09:00:00 am_commute <NA>
    #>  3 A     2021-05-21 12:00:00 stationary day
    #>  4 A     2021-05-21 15:00:00 stationary day
    #>  5 A     2021-05-21 18:00:00 pm_commute <NA>
    #>  6 A     2021-05-21 21:00:00 stationary night
    #>  7 A     2021-05-22 00:00:00 stationary night
    #>  8 A     2021-05-22 03:00:00 stationary night
    #>  9 A     2021-05-22 06:00:00 am_commute <NA>
    #> 10 A     2021-05-22 09:00:00 stationary day
    #> 11 A     2021-05-22 12:00:00 stationary day
    #> 12 A     2021-05-22 15:00:00 stationary day
    #> 13 A     2021-05-22 18:00:00 stationary day
    #> 14 A     2021-05-22 21:00:00 pm_commute <NA>
    #> 15 A     2021-05-23 00:00:00 pm_commute <NA>
    #> 16 A     2021-05-23 03:00:00 stationary night
    #> 17 A     2021-05-23 06:00:00 am_commute <NA>
    #> 18 A     2021-05-23 09:00:00 am_commute <NA>
    #> 19 A     2021-05-23 12:00:00 stationary day
    #> 20 A     2021-05-23 15:00:00 stationary day
    #> 21 A     2021-05-23 18:00:00 pm_commute <NA>
    #> 22 A     2021-05-23 21:00:00 pm_commute <NA>

Created on 2023-07-18 with reprex v2.0.2
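For readers who want to see the fill-down idea without any packages, here is a rough base-R sketch on the first ten commute values from the question. It is not the answer's code: it swaps `vec_fill_missing()` for a `cummax()` index trick, and the variable names (`last_commute`, `phase`) are chosen here purely for illustration. It assumes the rows are already in time order and belong to a single animal.

```r
# First ten commute labels from the question, already in time order.
commute <- c("am_commute", "am_commute", "stationary", "stationary", "pm_commute",
             "stationary", "stationary", "stationary", "am_commute", "stationary")

is_commute <- commute != "stationary"

# Position of the most recent commute row at or before each row (0 if none yet).
last_commute <- cummax(ifelse(is_commute, seq_along(commute), 0L))

# Lookup from commute type to the phase that follows it.
phase <- c(am_commute = "day", pm_commute = "night")

# Start with all NA, then label only stationary rows that have a preceding commute.
time_of_day <- rep(NA_character_, length(commute))
to_fill <- !is_commute & last_commute > 0
time_of_day[to_fill] <- unname(phase[commute[last_commute[to_fill]]])

time_of_day
# NA NA "day" "day" NA "night" "night" "night" NA "day"
```

Because commute rows are never selected by `to_fill`, they stay NA without a separate clean-up step.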

Answer 2

Score: 1

You can fill the "stationary" cells with the label of the previous commute using tidyr::fill(), grouped by id.

    library(dplyr)

    data %>%
      # Tag commute rows: anything starting with "am" opens a day, "pm" opens a night
      mutate(time_of_day = case_when(grepl("^am", commute) ~ "day",
                                     grepl("^pm", commute) ~ "night")) %>%
      group_by(id) %>%
      # Carry the tag down over the stationary rows that follow, within each id
      tidyr::fill(time_of_day) %>%
      ungroup() %>%
      # Commute rows themselves should stay unlabelled
      mutate(time_of_day = replace(time_of_day, commute != "stationary", NA))

Output

    # A tibble: 22 × 4
       id    datetimes           commute    time_of_day
       <chr> <dttm>              <chr>      <chr>
     1 A     2021-05-21 06:00:00 am_commute NA
     2 A     2021-05-21 09:00:00 am_commute NA
     3 A     2021-05-21 12:00:00 stationary day
     4 A     2021-05-21 15:00:00 stationary day
     5 A     2021-05-21 18:00:00 pm_commute NA
     6 A     2021-05-21 21:00:00 stationary night
     7 A     2021-05-22 00:00:00 stationary night
     8 A     2021-05-22 03:00:00 stationary night
     9 A     2021-05-22 06:00:00 am_commute NA
    10 A     2021-05-22 09:00:00 stationary day
    11 A     2021-05-22 12:00:00 stationary day
    12 A     2021-05-22 15:00:00 stationary day
    13 A     2021-05-22 18:00:00 stationary day
    14 A     2021-05-22 21:00:00 pm_commute NA
    15 A     2021-05-23 00:00:00 pm_commute NA
    16 A     2021-05-23 03:00:00 stationary night
    17 A     2021-05-23 06:00:00 am_commute NA
    18 A     2021-05-23 09:00:00 am_commute NA
    19 A     2021-05-23 12:00:00 stationary day
    20 A     2021-05-23 15:00:00 stationary day
    21 A     2021-05-23 18:00:00 pm_commute NA
    22 A     2021-05-23 21:00:00 pm_commute NA
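Since the real dataset has several animals, it may help to see the same fill approach wrapped in a function and run on more than one id. The sketch below is an assumption-laden illustration rather than part of the answer: the function name `label_time_of_day`, the toy data for animals "A" and "B", and the added `arrange()` step are all introduced here. The sort is included because `fill()` depends on row order, which the answer takes for granted.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: label stationary rows by the most recent commute, per animal.
label_time_of_day <- function(df) {
  df %>%
    group_by(id) %>%
    arrange(datetimes, .by_group = TRUE) %>%   # fill() relies on time order
    mutate(time_of_day = case_when(
      commute == "am_commute" ~ "day",
      commute == "pm_commute" ~ "night"
    )) %>%
    fill(time_of_day, .direction = "down") %>% # never crosses group boundaries
    mutate(time_of_day = replace(time_of_day, commute != "stationary", NA)) %>%
    ungroup()
}

# Made-up data: two animals with different schedules, rows deliberately shuffled.
times <- as.POSIXct("2021-05-21 06:00:00", tz = "UTC") + 3 * 3600 * 0:4
toy <- data.frame(
  id = rep(c("A", "B"), each = 5),
  datetimes = rep(times, times = 2),
  commute = c("am_commute", "stationary", "pm_commute", "stationary", "stationary",
              "stationary", "am_commute", "stationary", "stationary", "pm_commute")
)
toy <- toy[sample(nrow(toy)), ]

res <- label_time_of_day(toy)
res$time_of_day
# NA "day" NA "night" "night" NA NA "day" "day" NA
```

Animal B's first row is stationary with no earlier commute in its own group, so it stays NA rather than inheriting "night" from the end of animal A's rows.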

Posted by huangapple on 2023-07-18 11:35:03
Link to this article: https://go.coder-hub.com/76709382.html