2023-07-18 11:35:03

Classify runs of rows between the first and last rows that meet a condition

Question

# Load necessary libraries
library(dplyr)
# Define the dataset
id <- rep.int(c("A"), times = c(22))
datetimes <- c(seq(lubridate::ymd_hms('2021-05-21T06:00:00'), lubridate::ymd_hms('2021-05-23T23:00:00'), by = "3 hours"))
commute <- c('am_commute', 'am_commute', 'stationary', 'stationary', 'pm_commute', 
             'stationary', 'stationary', 'stationary', 'am_commute', 'stationary', 
             'stationary', 'stationary', 'stationary', 'pm_commute', 'pm_commute',
             'stationary', 'am_commute', 'am_commute', 'stationary', 'stationary', 
             'pm_commute', 'pm_commute')
data <- data.frame(id, datetimes, commute) 
# Create the time_of_day variable
data <- data %>%
  group_by(id) %>%
  arrange(datetimes, .by_group = TRUE) %>%
  mutate(
    # Row index of the most recent commute point at or before each row (0 if none yet)
    last_commute_row = cummax(if_else(commute %in% c('am_commute', 'pm_commute'), row_number(), 0L)),
    # Label of that commute point (NA before the animal's first commute)
    last_commute = commute[na_if(last_commute_row, 0L)],
    time_of_day = case_when(
      commute == 'stationary' & last_commute == 'am_commute' ~ 'day',
      commute == 'stationary' & last_commute == 'pm_commute' ~ 'night',
      TRUE ~ NA_character_
    )
  ) %>%
  ungroup() %>%
  select(-last_commute_row, -last_commute)
# Print the resulting dataset
data

This code uses the dplyr package to build the time_of_day variable from the most recent commute point for each animal: stationary rows that follow a morning commute become 'day', stationary rows that follow an evening commute become 'night', and the commute rows themselves stay NA. For the example data, this should reproduce the desired result described in the question.


I have movement data from different animals (id) that commute to and from a central location each day, departing in the morning ('am_commute') and returning in the evening ('pm_commute'). Some commutes are longer than others, hence sometimes multiple consecutive points are associated with a commute. Here is a simplified version with just one individual, where all rows not associated with commuting movements are labeled as 'stationary'.

id <- rep.int(c("A"), times = c(22))
datetimes <- c(seq(lubridate::ymd_hms('2021-05-21T06:00:00'), lubridate::ymd_hms('2021-05-23T23:00:00'), by = "3 hours"))
commute <- c('am_commute', 'am_commute', 'stationary', 'stationary', 'pm_commute', 
             'stationary', 'stationary', 'stationary', 'am_commute', 'stationary', 
             'stationary', 'stationary', 'stationary', 'pm_commute', 'pm_commute',
             'stationary', 'am_commute', 'am_commute', 'stationary', 'stationary', 
             'pm_commute', 'pm_commute')
data <- data.frame(id, datetimes, commute)

I want to create a new variable (time_of_day) that indicates whether each stationary point is part of the animal's daytime or nighttime movements. In the real dataset, this cannot be defined simply by time of day, as there are multiple individuals, each of which returns to the central location at a different time. For each animal, "day" is when it is away from this location, and "night" is when it is back at this location. I therefore want to define "day" and "night" based on morning and evening commutes, such that "day" points are all points on a given day between the animal's first morning commute point and first evening commute point, and "night" points are all points between the first evening commute point and the first morning commute point on the following day.

For this dataset, the desired results would look like this:

   id           datetimes    commute time_of_day
1   A 2021-05-21 06:00:00 am_commute        <NA>
2   A 2021-05-21 09:00:00 am_commute        <NA>
3   A 2021-05-21 12:00:00 stationary         day
4   A 2021-05-21 15:00:00 stationary         day
5   A 2021-05-21 18:00:00 pm_commute        <NA>
6   A 2021-05-21 21:00:00 stationary       night
7   A 2021-05-22 00:00:00 stationary       night
8   A 2021-05-22 03:00:00 stationary       night
9   A 2021-05-22 06:00:00 am_commute        <NA>
10  A 2021-05-22 09:00:00 stationary         day
11  A 2021-05-22 12:00:00 stationary         day
12  A 2021-05-22 15:00:00 stationary         day
13  A 2021-05-22 18:00:00 stationary         day
14  A 2021-05-22 21:00:00 pm_commute        <NA>
15  A 2021-05-23 00:00:00 pm_commute        <NA>
16  A 2021-05-23 03:00:00 stationary       night
17  A 2021-05-23 06:00:00 am_commute        <NA>
18  A 2021-05-23 09:00:00 am_commute        <NA>
19  A 2021-05-23 12:00:00 stationary         day
20  A 2021-05-23 15:00:00 stationary         day
21  A 2021-05-23 18:00:00 pm_commute        <NA>
22  A 2021-05-23 21:00:00 pm_commute        <NA>

I have tried achieving this with dplyr using a combination of mutate(), case_when(), first() and lead(), but I cannot figure out how to reference the value of the commute column on the following date. Can this all be done in a piped workflow using dplyr?

Answer 1

Score: 1

One way would be to label the first day/night event, then fill in the 'stationary' points based on those values, then change 'am_commute' and 'pm_commute' commutes to NA, e.g.

library(tidyverse)
library(vctrs, warn = FALSE)
id <- rep.int(c("A"), times = c(22))
datetimes <- c(seq(ymd_hms('2021-05-21T06:00:00'), ymd_hms('2021-05-23T23:00:00'), by = "3 hours"))
commute <- c('am_commute', 'am_commute', 'stationary', 'stationary', 'pm_commute', 
             'stationary', 'stationary', 'stationary', 'am_commute', 'stationary', 
             'stationary', 'stationary', 'stationary', 'pm_commute', 'pm_commute',
             'stationary', 'am_commute', 'am_commute', 'stationary', 'stationary', 
             'pm_commute', 'pm_commute')
data <- data.frame(id, datetimes, commute) 
data %>%
  group_by(id) %>%
  mutate(time_of_day = case_when(commute == "stationary" &
                                   lag(commute) == "am_commute" ~ "day",
                                 commute == "stationary" &
                                   lag(commute) == "pm_commute" ~ "night")) %>%
  mutate(time_of_day = vec_fill_missing(time_of_day, "down")) %>%
  mutate(time_of_day = if_else(commute != "stationary", NA, time_of_day)) %>%
  ungroup() %>%
  print(n = 22)
#> # A tibble: 22 × 4
#>    id    datetimes           commute    time_of_day
#>    <chr> <dttm>              <chr>      <chr>      
#>  1 A     2021-05-21 06:00:00 am_commute <NA>       
#>  2 A     2021-05-21 09:00:00 am_commute <NA>       
#>  3 A     2021-05-21 12:00:00 stationary day        
#>  4 A     2021-05-21 15:00:00 stationary day        
#>  5 A     2021-05-21 18:00:00 pm_commute <NA>       
#>  6 A     2021-05-21 21:00:00 stationary night      
#>  7 A     2021-05-22 00:00:00 stationary night      
#>  8 A     2021-05-22 03:00:00 stationary night      
#>  9 A     2021-05-22 06:00:00 am_commute <NA>       
#> 10 A     2021-05-22 09:00:00 stationary day        
#> 11 A     2021-05-22 12:00:00 stationary day        
#> 12 A     2021-05-22 15:00:00 stationary day        
#> 13 A     2021-05-22 18:00:00 stationary day        
#> 14 A     2021-05-22 21:00:00 pm_commute <NA>       
#> 15 A     2021-05-23 00:00:00 pm_commute <NA>       
#> 16 A     2021-05-23 03:00:00 stationary night      
#> 17 A     2021-05-23 06:00:00 am_commute <NA>       
#> 18 A     2021-05-23 09:00:00 am_commute <NA>       
#> 19 A     2021-05-23 12:00:00 stationary day        
#> 20 A     2021-05-23 15:00:00 stationary day        
#> 21 A     2021-05-23 18:00:00 pm_commute <NA>       
#> 22 A     2021-05-23 21:00:00 pm_commute <NA>

Created on 2023-07-18 with reprex v2.0.2
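As a quick sanity check, the labels can also be compared programmatically with the desired column from the question. The following is only a sketch under a few assumptions: `expected` is transcribed by hand from the question's table, the datetimes column is left out because the rows are already in chronological order, and `NA_character_` is used in `if_else()` so that the check does not depend on a recent dplyr version.

```r
library(dplyr)
library(vctrs)

# Same commute sequence as in the question (rows already in time order)
commute <- c('am_commute', 'am_commute', 'stationary', 'stationary', 'pm_commute',
             'stationary', 'stationary', 'stationary', 'am_commute', 'stationary',
             'stationary', 'stationary', 'stationary', 'pm_commute', 'pm_commute',
             'stationary', 'am_commute', 'am_commute', 'stationary', 'stationary',
             'pm_commute', 'pm_commute')
data <- data.frame(id = "A", commute = commute)

# Desired time_of_day column, copied by hand from the question's table
expected <- c(NA, NA, "day", "day", NA, "night", "night", "night", NA,
              "day", "day", "day", "day", NA, NA, "night", NA, NA,
              "day", "day", NA, NA)

result <- data %>%
  group_by(id) %>%
  # Step 1: label the first stationary row after each commute
  mutate(time_of_day = case_when(commute == "stationary" &
                                   lag(commute) == "am_commute" ~ "day",
                                 commute == "stationary" &
                                   lag(commute) == "pm_commute" ~ "night")) %>%
  # Step 2: carry each label down to the following rows
  mutate(time_of_day = vec_fill_missing(time_of_day, "down")) %>%
  # Step 3: blank out the commute rows themselves
  mutate(time_of_day = if_else(commute != "stationary", NA_character_, time_of_day)) %>%
  ungroup()

# TRUE when every row matches the desired labels
identical(result$time_of_day, expected)
```

Comparing against a hand-written vector like this is mainly useful on small examples; on the real dataset a few spot checks per animal are probably more practical.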


Answer 2

Score: 1

You can use tidyr::fill(), grouped by id, to fill each "stationary" row with the label of the most recent commute.

library(dplyr)
data %>%
  mutate(time_of_day = case_when(grepl("^am", commute) ~ "day", grepl("^pm", commute) ~ "night")) %>%
  group_by(id) %>%
  tidyr::fill(time_of_day) %>%
  ungroup() %>%
  mutate(time_of_day = replace(time_of_day, commute != "stationary", NA))

Output

# A tibble: 22 × 4
   id    datetimes           commute    time_of_day
   <chr> <dttm>              <chr>      <chr>      
 1 A     2021-05-21 06:00:00 am_commute NA         
 2 A     2021-05-21 09:00:00 am_commute NA         
 3 A     2021-05-21 12:00:00 stationary day        
 4 A     2021-05-21 15:00:00 stationary day        
 5 A     2021-05-21 18:00:00 pm_commute NA         
 6 A     2021-05-21 21:00:00 stationary night      
 7 A     2021-05-22 00:00:00 stationary night      
 8 A     2021-05-22 03:00:00 stationary night      
 9 A     2021-05-22 06:00:00 am_commute NA         
10 A     2021-05-22 09:00:00 stationary day        
11 A     2021-05-22 12:00:00 stationary day        
12 A     2021-05-22 15:00:00 stationary day        
13 A     2021-05-22 18:00:00 stationary day        
14 A     2021-05-22 21:00:00 pm_commute NA         
15 A     2021-05-23 00:00:00 pm_commute NA         
16 A     2021-05-23 03:00:00 stationary night      
17 A     2021-05-23 06:00:00 am_commute NA         
18 A     2021-05-23 09:00:00 am_commute NA         
19 A     2021-05-23 12:00:00 stationary day        
20 A     2021-05-23 15:00:00 stationary day        
21 A     2021-05-23 18:00:00 pm_commute NA         
22 A     2021-05-23 21:00:00 pm_commute NA
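Since the real dataset contains several animals, it may help to see what group_by(id) contributes to the fill step. The sketch below is an illustration only: the second animal "B" and all rows of `toy` are invented for this example, `label_time_of_day()` is a hypothetical helper name that wraps the same steps as the pipeline above, and the rows are assumed to be in chronological order within each animal.

```r
library(dplyr)
library(tidyr)

# Hypothetical two-animal data: "B" is invented for illustration and
# starts with stationary rows recorded before its first commute.
toy <- data.frame(
  id = c("A", "A", "A", "B", "B", "B", "B"),
  commute = c("am_commute", "stationary", "pm_commute",
              "stationary", "stationary", "am_commute", "stationary")
)

# Hypothetical helper wrapping the steps of the pipeline above
label_time_of_day <- function(df) {
  df %>%
    # Mark commute rows with the period that they start
    mutate(time_of_day = case_when(grepl("^am", commute) ~ "day",
                                   grepl("^pm", commute) ~ "night")) %>%
    group_by(id) %>%
    # Fill downwards within each animal only
    fill(time_of_day) %>%
    ungroup() %>%
    # Commute rows themselves are not classified
    mutate(time_of_day = replace(time_of_day, commute != "stationary", NA))
}

result <- label_time_of_day(toy)
result$time_of_day
```

With the grouping in place, the two stationary rows that open animal B's record stay NA, because B has no earlier commute to inherit from. Without group_by(id), fill() would carry "night" over from the last commute of animal A into those rows.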

