Classify runs of rows between the first and last rows that meet a condition
Question
# Load necessary libraries
library(dplyr)
# Define the dataset
id <- rep.int(c("A"), times = c(22))
datetimes <- c(seq(lubridate::ymd_hms('2021-05-21T06:00:00'), lubridate::ymd_hms('2021-05-23T23:00:00'), by = "3 hours"))
commute <- c('am_commute', 'am_commute', 'stationary', 'stationary', 'pm_commute',
'stationary', 'stationary', 'stationary', 'am_commute', 'stationary',
'stationary', 'stationary', 'stationary', 'pm_commute', 'pm_commute',
'stationary', 'am_commute', 'am_commute', 'stationary', 'stationary',
'pm_commute', 'pm_commute')
data <- data.frame(id, datetimes, commute)
# Create the time_of_day variable
data <- data %>%
  group_by(id) %>%
  arrange(datetimes, .by_group = TRUE) %>%
  mutate(
    # Each commute row marks the start of a period: am -> day, pm -> night
    period = case_when(
      commute == 'am_commute' ~ 'day',
      commute == 'pm_commute' ~ 'night',
      TRUE ~ NA_character_
    )
  ) %>%
  # Carry the most recent period label forward within each animal
  tidyr::fill(period, .direction = 'down') %>%
  mutate(
    # Only stationary rows keep a label; commute rows stay NA
    time_of_day = if_else(commute == 'stationary', period, NA_character_)
  ) %>%
  select(-period) %>%
  ungroup()
# Print the resulting dataset
data
This code uses dplyr and tidyr::fill() to create the time_of_day variable from each animal's morning and evening commute points. Each commute row is labeled with the period it starts, the label is carried down to the following rows, and the commute rows themselves are then set back to NA. For the example data, the result should match the desired output shown in the question.
I have movement data from different animals (id) that commute to and from a central location each day, departing in the morning ('am_commute') and returning in the evening ('pm_commute'). Some commutes are longer than others, hence sometimes multiple consecutive points are associated with a commute. Here is a simplified version with just one individual, where all rows not associated with commuting movements are labeled as 'stationary'.
id <- rep.int(c("A"), times = c(22))
datetimes <- c(seq(lubridate::ymd_hms('2021-05-21T06:00:00'), lubridate::ymd_hms('2021-05-23T23:00:00'), by = "3 hours"))
commute <- c('am_commute', 'am_commute', 'stationary', 'stationary', 'pm_commute',
'stationary', 'stationary', 'stationary', 'am_commute', 'stationary',
'stationary', 'stationary', 'stationary', 'pm_commute', 'pm_commute',
'stationary', 'am_commute', 'am_commute', 'stationary', 'stationary',
'pm_commute', 'pm_commute')
data <- data.frame(id, datetimes, commute)
I want to create a new variable (time_of_day) that indicates whether each stationary point is part of the animal's daytime or nighttime movements. In the real dataset, this cannot be defined simply by time of day, as there are multiple individuals, each of which returns to the central location at a different time. For each animal, "day" is when it is away from this location, and "night" is when it is back at this location. I therefore want to define "day" and "night" based on morning and evening commutes, such that "day" points are all points on a given day between the animal's first morning commute point and first evening commute point, and "night" points are all points between the first evening commute point and the first morning commute point on the following day.
For this dataset, the desired results would look like this:
id datetimes commute time_of_day
1 A 2021-05-21 06:00:00 am_commute <NA>
2 A 2021-05-21 09:00:00 am_commute <NA>
3 A 2021-05-21 12:00:00 stationary day
4 A 2021-05-21 15:00:00 stationary day
5 A 2021-05-21 18:00:00 pm_commute <NA>
6 A 2021-05-21 21:00:00 stationary night
7 A 2021-05-22 00:00:00 stationary night
8 A 2021-05-22 03:00:00 stationary night
9 A 2021-05-22 06:00:00 am_commute <NA>
10 A 2021-05-22 09:00:00 stationary day
11 A 2021-05-22 12:00:00 stationary day
12 A 2021-05-22 15:00:00 stationary day
13 A 2021-05-22 18:00:00 stationary day
14 A 2021-05-22 21:00:00 pm_commute <NA>
15 A 2021-05-23 00:00:00 pm_commute <NA>
16 A 2021-05-23 03:00:00 stationary night
17 A 2021-05-23 06:00:00 am_commute <NA>
18 A 2021-05-23 09:00:00 am_commute <NA>
19 A 2021-05-23 12:00:00 stationary day
20 A 2021-05-23 15:00:00 stationary day
21 A 2021-05-23 18:00:00 pm_commute <NA>
22 A 2021-05-23 21:00:00 pm_commute <NA>
I have tried achieving this with dplyr using a combination of mutate(), case_when(), first() and lead(), but I cannot figure out how to reference the value of the commute column on the following date. Can this all be done in a piped workflow using dplyr?
Answer 1
Score: 1
One way would be to label the first day/night event, then fill in the 'stationary' points based on those values, then change 'am_commute' and 'pm_commute' commutes to NA, e.g.
library(tidyverse)
library(vctrs, warn.conflicts = FALSE)
id <- rep.int(c("A"), times = c(22))
datetimes <- c(seq(ymd_hms('2021-05-21T06:00:00'), ymd_hms('2021-05-23T23:00:00'), by = "3 hours"))
commute <- c('am_commute', 'am_commute', 'stationary', 'stationary', 'pm_commute',
'stationary', 'stationary', 'stationary', 'am_commute', 'stationary',
'stationary', 'stationary', 'stationary', 'pm_commute', 'pm_commute',
'stationary', 'am_commute', 'am_commute', 'stationary', 'stationary',
'pm_commute', 'pm_commute')
data <- data.frame(id, datetimes, commute)
data %>%
group_by(id) %>%
mutate(time_of_day = case_when(commute == "stationary" &
lag(commute) == "am_commute" ~ "day",
commute == "stationary" &
lag(commute) == "pm_commute" ~ "night")) %>%
mutate(time_of_day = vec_fill_missing(time_of_day, "down")) %>%
mutate(time_of_day = if_else(commute != "stationary", NA, time_of_day)) %>%
ungroup() %>%
print(n = 22)
#> # A tibble: 22 × 4
#> id datetimes commute time_of_day
#> <chr> <dttm> <chr> <chr>
#> 1 A 2021-05-21 06:00:00 am_commute <NA>
#> 2 A 2021-05-21 09:00:00 am_commute <NA>
#> 3 A 2021-05-21 12:00:00 stationary day
#> 4 A 2021-05-21 15:00:00 stationary day
#> 5 A 2021-05-21 18:00:00 pm_commute <NA>
#> 6 A 2021-05-21 21:00:00 stationary night
#> 7 A 2021-05-22 00:00:00 stationary night
#> 8 A 2021-05-22 03:00:00 stationary night
#> 9 A 2021-05-22 06:00:00 am_commute <NA>
#> 10 A 2021-05-22 09:00:00 stationary day
#> 11 A 2021-05-22 12:00:00 stationary day
#> 12 A 2021-05-22 15:00:00 stationary day
#> 13 A 2021-05-22 18:00:00 stationary day
#> 14 A 2021-05-22 21:00:00 pm_commute <NA>
#> 15 A 2021-05-23 00:00:00 pm_commute <NA>
#> 16 A 2021-05-23 03:00:00 stationary night
#> 17 A 2021-05-23 06:00:00 am_commute <NA>
#> 18 A 2021-05-23 09:00:00 am_commute <NA>
#> 19 A 2021-05-23 12:00:00 stationary day
#> 20 A 2021-05-23 15:00:00 stationary day
#> 21 A 2021-05-23 18:00:00 pm_commute <NA>
#> 22 A 2021-05-23 21:00:00 pm_commute <NA>
Created on 2023-07-18 with reprex v2.0.2
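The three steps in this answer (label the first stationary row after a commute, fill the label down, blank out the commute rows) can be traced on a short vector. This is only a sketch: the five-element `commute` vector and the names `prev`, `label`, `filled` and `result` are made up for illustration, and base R stands in for `lag()` and `case_when()`.

```r
library(vctrs)

# Hypothetical toy input: one morning commute, two stationary rows,
# one evening commute, one stationary row
commute <- c("am_commute", "stationary", "stationary", "pm_commute", "stationary")

# Step 1: label only the first stationary row after each commute.
# prev is the previous row's value (base-R equivalent of dplyr::lag()).
prev <- c(NA, head(commute, -1))
label <- ifelse(commute == "stationary" & prev == "am_commute", "day",
         ifelse(commute == "stationary" & prev == "pm_commute", "night", NA))

# Step 2: carry each label down over the missing values that follow it
filled <- vec_fill_missing(label, direction = "down")

# Step 3: commute rows themselves should stay unlabeled
result <- filled
result[commute != "stationary"] <- NA

result
```

After step 1 only the second and fifth elements carry a label; step 2 spreads "day" over the third and fourth elements; step 3 removes it again from the `pm_commute` row.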
Answer 2
Score: 1
You can label each commute row with the period it starts, then use tidyr::fill(), grouped by id, to carry that label down into the "stationary" rows that follow.
library(dplyr)
data %>%
mutate(time_of_day = case_when(grepl("^am", commute) ~ "day", grepl("^pm", commute) ~ "night")) %>%
group_by(id) %>%
tidyr::fill(time_of_day) %>%
ungroup() %>%
mutate(time_of_day = replace(time_of_day, commute != "stationary", NA))
Output
# A tibble: 22 × 4
id datetimes commute time_of_day
<chr> <dttm> <chr> <chr>
1 A 2021-05-21 06:00:00 am_commute NA
2 A 2021-05-21 09:00:00 am_commute NA
3 A 2021-05-21 12:00:00 stationary day
4 A 2021-05-21 15:00:00 stationary day
5 A 2021-05-21 18:00:00 pm_commute NA
6 A 2021-05-21 21:00:00 stationary night
7 A 2021-05-22 00:00:00 stationary night
8 A 2021-05-22 03:00:00 stationary night
9 A 2021-05-22 06:00:00 am_commute NA
10 A 2021-05-22 09:00:00 stationary day
11 A 2021-05-22 12:00:00 stationary day
12 A 2021-05-22 15:00:00 stationary day
13 A 2021-05-22 18:00:00 stationary day
14 A 2021-05-22 21:00:00 pm_commute NA
15 A 2021-05-23 00:00:00 pm_commute NA
16 A 2021-05-23 03:00:00 stationary night
17 A 2021-05-23 06:00:00 am_commute NA
18 A 2021-05-23 09:00:00 am_commute NA
19 A 2021-05-23 12:00:00 stationary day
20 A 2021-05-23 15:00:00 stationary day
21 A 2021-05-23 18:00:00 pm_commute NA
22 A 2021-05-23 21:00:00 pm_commute NA
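Since the real data has several animals, it may be worth checking that the grouped fill does not carry one animal's label into the next animal's rows. The sketch below reuses the pipeline from this answer on a made-up two-animal data frame (`toy` and `result` are hypothetical names); animal B deliberately starts with a stationary row before any commute.

```r
library(dplyr)
library(tidyr)

# Hypothetical two-animal data; B's first row precedes any commute
toy <- data.frame(
  id = c("A", "A", "A", "B", "B", "B"),
  commute = c("am_commute", "stationary", "pm_commute",
              "stationary", "am_commute", "stationary")
)

result <- toy %>%
  # Label commute rows with the period they start
  mutate(time_of_day = case_when(grepl("^am", commute) ~ "day",
                                 grepl("^pm", commute) ~ "night")) %>%
  # Fill downwards within each animal only
  group_by(id) %>%
  fill(time_of_day) %>%
  ungroup() %>%
  # Commute rows themselves stay unlabeled
  mutate(time_of_day = replace(time_of_day, commute != "stationary", NA))

result$time_of_day
```

B's first stationary row stays NA because there is no earlier commute within that group to fill from. Without `group_by(id)` it would pick up "night" from A's last row.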