Classify runs of rows between first and last rows that meet condition

Question

I have movement data from different animals (id) that commute to and from a central location each day, departing in the morning ('am_commute') and returning in the evening ('pm_commute'). Some commutes are longer than others, hence sometimes multiple consecutive points are associated with a commute. Here is a simplified version with just one individual, where all rows not associated with commuting movements are labeled as 'stationary'.

id <- rep.int(c("A"), times = c(22))
datetimes <- c(seq(lubridate::ymd_hms('2021-05-21T06:00:00'), lubridate::ymd_hms('2021-05-23T23:00:00'), by = "3 hours"))
commute <- c('am_commute', 'am_commute', 'stationary', 'stationary', 'pm_commute', 
             'stationary', 'stationary', 'stationary', 'am_commute', 'stationary', 
             'stationary', 'stationary', 'stationary', 'pm_commute', 'pm_commute',
             'stationary', 'am_commute', 'am_commute', 'stationary', 'stationary', 
             'pm_commute', 'pm_commute')

data <- data.frame(id, datetimes, commute) 

I want to create a new variable (time_of_day) that indicates whether each stationary point is part of the animal's daytime or nighttime movements. In the real dataset, this cannot be defined simply by clock time, because there are multiple individuals, each of which returns to the central location at a different time. For each animal, "day" is when it is away from this location, and "night" is when it is back at this location. I therefore want to define "day" and "night" based on morning and evening commutes: "day" points are all points on a given day between the animal's first morning commute point and first evening commute point, and "night" points are all points between the first evening commute point and the first morning commute point on the following day.

For this dataset, the desired results would look like this:

   id           datetimes    commute time_of_day
1   A 2021-05-21 06:00:00 am_commute        <NA>
2   A 2021-05-21 09:00:00 am_commute        <NA>
3   A 2021-05-21 12:00:00 stationary         day
4   A 2021-05-21 15:00:00 stationary         day
5   A 2021-05-21 18:00:00 pm_commute        <NA>
6   A 2021-05-21 21:00:00 stationary       night
7   A 2021-05-22 00:00:00 stationary       night
8   A 2021-05-22 03:00:00 stationary       night
9   A 2021-05-22 06:00:00 am_commute        <NA>
10  A 2021-05-22 09:00:00 stationary         day
11  A 2021-05-22 12:00:00 stationary         day
12  A 2021-05-22 15:00:00 stationary         day
13  A 2021-05-22 18:00:00 stationary         day
14  A 2021-05-22 21:00:00 pm_commute        <NA>
15  A 2021-05-23 00:00:00 pm_commute        <NA>
16  A 2021-05-23 03:00:00 stationary       night
17  A 2021-05-23 06:00:00 am_commute        <NA>
18  A 2021-05-23 09:00:00 am_commute        <NA>
19  A 2021-05-23 12:00:00 stationary         day
20  A 2021-05-23 15:00:00 stationary         day
21  A 2021-05-23 18:00:00 pm_commute        <NA>
22  A 2021-05-23 21:00:00 pm_commute        <NA>

I have tried achieving this with dplyr using a combination of mutate(), case_when(), first() and lead(), but I cannot figure out how to reference the value of the commute column on the following date. Can this all be done in a piped workflow using dplyr?

Answer 1

Score: 1

One way would be to label the first day/night event, then fill in the 'stationary' points based on those values, then change 'am_commute' and 'pm_commute' commutes to NA, e.g.

library(tidyverse)
library(vctrs, warn.conflicts = FALSE)

id <- rep.int(c("A"), times = c(22))
datetimes <- c(seq(ymd_hms('2021-05-21T06:00:00'), ymd_hms('2021-05-23T23:00:00'), by = "3 hours"))
commute <- c('am_commute', 'am_commute', 'stationary', 'stationary', 'pm_commute', 
             'stationary', 'stationary', 'stationary', 'am_commute', 'stationary', 
             'stationary', 'stationary', 'stationary', 'pm_commute', 'pm_commute',
             'stationary', 'am_commute', 'am_commute', 'stationary', 'stationary', 
             'pm_commute', 'pm_commute')

data <- data.frame(id, datetimes, commute) 

data %>%
  group_by(id) %>%
  mutate(time_of_day = case_when(commute == "stationary" &
                                   lag(commute) == "am_commute" ~ "day",
                                 commute == "stationary" &
                                   lag(commute) == "pm_commute" ~ "night")) %>%
  mutate(time_of_day = vec_fill_missing(time_of_day, "down")) %>%
  mutate(time_of_day = if_else(commute != "stationary", NA, time_of_day)) %>%
  ungroup() %>%
  print(n = 22)
#> # A tibble: 22 × 4
#>    id    datetimes           commute    time_of_day
#>    <chr> <dttm>              <chr>      <chr>      
#>  1 A     2021-05-21 06:00:00 am_commute <NA>       
#>  2 A     2021-05-21 09:00:00 am_commute <NA>       
#>  3 A     2021-05-21 12:00:00 stationary day        
#>  4 A     2021-05-21 15:00:00 stationary day        
#>  5 A     2021-05-21 18:00:00 pm_commute <NA>       
#>  6 A     2021-05-21 21:00:00 stationary night      
#>  7 A     2021-05-22 00:00:00 stationary night      
#>  8 A     2021-05-22 03:00:00 stationary night      
#>  9 A     2021-05-22 06:00:00 am_commute <NA>       
#> 10 A     2021-05-22 09:00:00 stationary day        
#> 11 A     2021-05-22 12:00:00 stationary day        
#> 12 A     2021-05-22 15:00:00 stationary day        
#> 13 A     2021-05-22 18:00:00 stationary day        
#> 14 A     2021-05-22 21:00:00 pm_commute <NA>       
#> 15 A     2021-05-23 00:00:00 pm_commute <NA>       
#> 16 A     2021-05-23 03:00:00 stationary night      
#> 17 A     2021-05-23 06:00:00 am_commute <NA>       
#> 18 A     2021-05-23 09:00:00 am_commute <NA>       
#> 19 A     2021-05-23 12:00:00 stationary day        
#> 20 A     2021-05-23 15:00:00 stationary day        
#> 21 A     2021-05-23 18:00:00 pm_commute <NA>       
#> 22 A     2021-05-23 21:00:00 pm_commute <NA>

Created on 2023-07-18 with reprex v2.0.2
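
The three steps in this answer (label, fill down, mask) can be traced by hand on a short vector. The following is a minimal base R sketch, assuming a single animal and a made-up seven-row commute vector. It is not the answer's code: prev, label, idx and filled are illustrative names, and the fill-down step is written with cumsum() as a stand-in for vec_fill_missing().

```r
# Hypothetical short example: one animal, seven rows
commute <- c("am_commute", "stationary", "stationary", "pm_commute",
             "stationary", "am_commute", "stationary")

# Step 1: label only the stationary rows that directly follow a commute row
prev <- c(NA, head(commute, -1))             # previous row's value, like dplyr::lag()
label <- rep(NA_character_, length(commute))
label[commute == "stationary" & prev %in% "am_commute"] <- "day"
label[commute == "stationary" & prev %in% "pm_commute"] <- "night"

# Step 2: carry the last non-missing label forward ("fill down")
idx <- cumsum(!is.na(label))                 # 0 until the first label appears
filled <- c(NA_character_, label[!is.na(label)])[idx + 1]

# Step 3: mask the commute rows themselves
filled[commute != "stationary"] <- NA

filled
```

Step 1 leaves the second stationary row of each run unlabelled, which is why the fill-down step is needed before the commute rows are masked.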

Answer 2

Score: 1

You can fill in the "stationary" cells with the preceding commute label using tidyr::fill(), grouped by id.

library(dplyr)

data %>%
  mutate(time_of_day = case_when(grepl("^am", commute) ~ "day", grepl("^pm", commute) ~ "night")) %>%
  group_by(id) %>%
  tidyr::fill(time_of_day) %>%
  ungroup() %>%
  mutate(time_of_day = replace(time_of_day, commute != "stationary", NA))
Output
# A tibble: 22 × 4
   id    datetimes           commute    time_of_day
   <chr> <dttm>              <chr>      <chr>      
 1 A     2021-05-21 06:00:00 am_commute NA         
 2 A     2021-05-21 09:00:00 am_commute NA         
 3 A     2021-05-21 12:00:00 stationary day        
 4 A     2021-05-21 15:00:00 stationary day        
 5 A     2021-05-21 18:00:00 pm_commute NA         
 6 A     2021-05-21 21:00:00 stationary night      
 7 A     2021-05-22 00:00:00 stationary night      
 8 A     2021-05-22 03:00:00 stationary night      
 9 A     2021-05-22 06:00:00 am_commute NA         
10 A     2021-05-22 09:00:00 stationary day        
11 A     2021-05-22 12:00:00 stationary day        
12 A     2021-05-22 15:00:00 stationary day        
13 A     2021-05-22 18:00:00 stationary day        
14 A     2021-05-22 21:00:00 pm_commute NA         
15 A     2021-05-23 00:00:00 pm_commute NA         
16 A     2021-05-23 03:00:00 stationary night      
17 A     2021-05-23 06:00:00 am_commute NA         
18 A     2021-05-23 09:00:00 am_commute NA         
19 A     2021-05-23 12:00:00 stationary day        
20 A     2021-05-23 15:00:00 stationary day        
21 A     2021-05-23 18:00:00 pm_commute NA         
22 A     2021-05-23 21:00:00 pm_commute NA
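
Since the real dataset has several animals, it may help to see what group_by(id) contributes here. The sketch below applies the same pipeline to a made-up two-animal data frame (toy and res are illustrative names, not from the question), where animal B starts with a stationary row before any commute has been observed.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: B's first row is stationary, with no earlier commute for B
toy <- data.frame(
  id      = c("A", "A", "A", "B", "B", "B"),
  commute = c("am_commute", "stationary", "pm_commute",
              "stationary", "am_commute", "stationary")
)

res <- toy %>%
  mutate(time_of_day = case_when(grepl("^am", commute) ~ "day",
                                 grepl("^pm", commute) ~ "night")) %>%
  group_by(id) %>%
  fill(time_of_day) %>%     # default direction is "down", applied within each id
  ungroup() %>%
  mutate(time_of_day = replace(time_of_day, commute != "stationary", NA))

res$time_of_day
```

With the grouping in place, B's leading stationary row stays NA, because no commute has been seen for B yet. Without group_by(id), that row would inherit "night" from A's last pm_commute row.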

Posted by huangapple on 2023-07-18 11:35:03
Please keep this link when reposting: https://go.coder-hub.com/76709382.html