如何在R中按连续的开始时间和结束时间分组?

huangapple go评论70阅读模式
英文:

How to group by consecutive start time with end time in R?

问题

我尝试解决以下问题,将连续的开始和结束时间分组在一起,以计算总旅行费用。以下是一个示例数据和所需的输出。

# 从工作区中删除所有内存
rm(list = ls())
# 必需的库
library(tidyverse)
library(lubridate)
# 创建数据
df <- data.frame(CountryID = c('101', '101', '101', '101', '101', '102', '102', '102', '102'),
                 AreaID = c('1', '1', '1', '1', '1', '2', '2', '2', '2'),
                 Period = c('01/01/2023', '01/01/2023', '01/01/2023', '01/01/2023', '01/01/2023', '02/01/2023', '02/01/2023', '02/01/2023', '02/01/2023'),
                 Day = c('Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday', 'Monday', 'Monday', 'Monday', 'Monday'),
                 StartTime = c('7:00:00 AM', '7:30:00 AM', '8:00:00 AM', '8:30:00 AM', '9:00:00 AM', '7:00:00 AM', '7:30:00 AM', '8:00:00 AM', '8:30:00 AM'),
                 EndTime = c('7:30:00 AM', '8:00:00 AM', '8:30:00 AM', '9:00:00 AM', '9:30:00 AM', '7:30:00 AM', '8:00:00 AM', '8:30:00 AM', '9:00:00 AM'),
                 TravelCost = c(10, 12, 11, 13, 14, 12, 10, 9, 8))
# 所需的输出格式
Output <- data.frame(CountryID = c(101, 102),
                    AreaID = c(1, 2),
                    Period = c('01/01/2023', '02/01/2023'),
                    Day = c('Sunday', 'Monday'),
                    StartTime = c('7:00:00 AM', '7:00:00 AM'),
                    EndTime = c('9:30:00 AM', '9:0:00 AM'),
                    TotalTravelCost = c(60, 39))
# 我尝试了如下,但无法达到示例中所需的输出。
# 有人可以帮我找出我代码中可能遗漏的问题吗?
# 提前感谢。

Output <- df %>%
  group_by(CountryID, AreaID, Period, Day, StartTime, EndTime) %>%
  summarise(TotalTravelCost = sum(TravelCost))
英文:

I'm trying to solve below problem to group together for consecutive starting and ending time to calculate the total travel cost for each day in total duration. Here below is an example data and require output.

Remove all the memory from workspace

rm(list =ls())

Required library

library(tidyverse)
library(lubridate)

Craete data

df &lt;- data.frame(CountryID = c(&#39;101&#39;, &#39;101&#39;, &#39;101&#39;, &#39;101&#39;, &#39;101&#39;, &#39;102&#39;, &#39;102&#39;, &#39;102&#39;, &#39;102&#39;),
AreaID = c(&#39;1&#39;, &#39;1&#39;, &#39;1&#39;, &#39;1&#39;, &#39;1&#39;, &#39;2&#39;, &#39;2&#39;, &#39;2&#39;, &#39;2&#39;),
Period = c(&#39;01/01/2023&#39;, &#39;01/01/2023&#39;, &#39;01/01/2023&#39;, &#39;01/01/2023&#39;, &#39;01/01/2023&#39;, &#39;02/01/2023&#39;, &#39;02/01/2023&#39;, &#39;02/01/2023&#39;, &#39;02/01/2023&#39;),
Day = c(&#39;Sunday&#39;, &#39;Sunday&#39;, &#39;Sunday&#39;, &#39;Sunday&#39;, &#39;Sunday&#39;, &#39;Monday&#39;, &#39;Monday&#39;, &#39;Monday&#39;, &#39;Monday&#39;),
StartTime = c(&#39;7:00:00 AM&#39;, &#39;7:30:00 AM&#39;, &#39;8:00:00 AM&#39;, &#39;8:30:00 AM&#39;, &#39;9:00:00 AM&#39;, &#39;7:00:00 AM&#39;, &#39;7:30:00 AM&#39;, &#39;8:00:00 AM&#39;, &#39;8:30:00 AM&#39;),
EndTime = c(&#39;7:30:00 AM&#39;, &#39;8:00:00 AM&#39;, &#39;8:30:00 AM&#39;, &#39;9:00:00 AM&#39;, &#39;9:30:00 AM&#39;, &#39;7:30:00 AM&#39;, &#39;8:00:00 AM&#39;, &#39;8:30:00 AM&#39;, &#39;9:00:00 AM&#39;)
TravelCost = c(&#39;10&#39;, &#39;12&#39;, &#39;11&#39;, &#39;13&#39;, &#39;14&#39;, &#39;12&#39;, &#39;10&#39;, &#39;9&#39;, &#39;8&#39;))

Output format is required

Output &lt;- data.frame(CountryID = C(101, 102),
AreaID = C(1, 2),
Period = c(&#39;01/01/2023&#39;, &#39;02/01/2023&#39;),
Day = c(&#39;Sunday&#39;, &#39;Monday&#39;),
StartTime = c(&#39;7:00:00 AM&#39;, &#39;7:00:00 AM&#39;),
EndTime = c(&#39;9:30:00 AM&#39;, &#39;9:0:00 AM&#39;)
TotalTravelCost = c(&#39;60&#39;, &#39;39&#39;)

I tried as below but couldn't reach my require output as mentioned in example.

Can anyone help me to figure out the issue I missed in my codes?
Thanks in advance.


Output &lt;- df %&gt;%
group_by(CountryID, AreaID, Period, Day, StartTime, EndTime) %&gt;%
summarise(TotalTravelCost = sum(TravelCost))

答案1

得分: 0

下面是翻译好的部分:

可能类似以下内容:

library(dplyr)

Output <- df %>%
  group_by(CountryID, AreaID, Period, Day) %>%
  mutate(across(ends_with('Time'), ~ strptime(., '%I:%M:%S %p'))) %>%
  mutate(idx = cumsum(coalesce(+(StartTime - lag(EndTime) > 1L), 0L))) %>%
  group_by(CountryID, AreaID, Period, Day, idx) %>%
  summarise(
    StartTime = format(min(StartTime), '%I:%M:%S %p'),
    EndTime = format(max(EndTime), '%I:%M:%S %p'),
    TravelCost = sum(as.numeric(TravelCost), na.rm = TRUE)
  ) %>%
  ungroup %>%
  select(-idx)

Output:

> Output

# 一个数据框: 2 × 7
  CountryID AreaID Period     Day    StartTime   EndTime     TravelCost
  <chr>     <chr>  <chr>      <chr>  <chr>       <chr>            <dbl>
1 101       1      01/01/2023 Sunday 07:00:00 am 09:30:00 am         60
2 102       2      02/01/2023 Monday 07:00:00 am 09:00:00 am         39

请注意,我已编辑您的 `data.frame`,纠正了(假定的)拼写错误。如果您确实有奇怪格式的时间(例如 `08:0:00`),请还原到初始版本并解释。

<details>
<summary>英文:</summary>

Perhaps something like below:

    library(dplyr)
    
    Output &lt;- df %&gt;%
      group_by(CountryID, AreaID, Period, Day) %&gt;%
      mutate(across(ends_with(&#39;Time&#39;), ~ strptime(., &#39;%I:%M:%S %p&#39;))) %&gt;%
      mutate(idx = cumsum(coalesce(+(StartTime - lag(EndTime) &gt; 1L), 0L))) %&gt;%
      group_by(CountryID, AreaID, Period, Day, idx) %&gt;%
      summarise(
        StartTime = format(min(StartTime), &#39;%I:%M:%S %p&#39;),
        EndTime = format(max(EndTime), &#39;%I:%M:%S %p&#39;),
        TravelCost = sum(as.numeric(TravelCost), na.rm = TRUE)
      ) %&gt;%
      ungroup %&gt;%
      select(-idx)

Output:

    &gt; Output

    # A tibble: 2 &#215; 7
      CountryID AreaID Period     Day    StartTime   EndTime     TravelCost
      &lt;chr&gt;     &lt;chr&gt;  &lt;chr&gt;      &lt;chr&gt;  &lt;chr&gt;       &lt;chr&gt;            &lt;dbl&gt;
    1 101       1      01/01/2023 Sunday 07:00:00 am 09:30:00 am         60
    2 102       2      02/01/2023 Monday 07:00:00 am 09:00:00 am         39

Note that I&#39;ve edited your `data.frame` correcting the (assumed) typos. If you really have strangely formatted times (e.g. `08:0:00`), please revert to initial version and explain.

</details>



huangapple
  • 本文由 发表于 2023年2月26日 19:09:58
  • 转载请务必保留本文链接:https://go.coder-hub.com/75571564.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定