2023年2月26日 19:09:58go评论94阅读模式

英文:

How to group by consecutive start time with end time in R?

问题

我尝试解决以下问题，将连续的开始和结束时间分组在一起，以计算总旅行费用。以下是一个示例数据和所需的输出。

# 从工作区中删除所有内存
rm(list = ls())

# 必需的库
library(tidyverse)
library(lubridate)

# 创建数据
df <- data.frame(CountryID = c('101', '101', '101', '101', '101', '102', '102', '102', '102'),
                 AreaID = c('1', '1', '1', '1', '1', '2', '2', '2', '2'),
                 Period = c('01/01/2023', '01/01/2023', '01/01/2023', '01/01/2023', '01/01/2023', '02/01/2023', '02/01/2023', '02/01/2023', '02/01/2023'),
                 Day = c('Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday', 'Monday', 'Monday', 'Monday', 'Monday'),
                 StartTime = c('7:00:00 AM', '7:30:00 AM', '8:00:00 AM', '8:30:00 AM', '9:00:00 AM', '7:00:00 AM', '7:30:00 AM', '8:00:00 AM', '8:30:00 AM'),
                 EndTime = c('7:30:00 AM', '8:00:00 AM', '8:30:00 AM', '9:00:00 AM', '9:30:00 AM', '7:30:00 AM', '8:00:00 AM', '8:30:00 AM', '9:00:00 AM'),
                 TravelCost = c(10, 12, 11, 13, 14, 12, 10, 9, 8))

# 所需的输出格式
Output <- data.frame(CountryID = c(101, 102),
                    AreaID = c(1, 2),
                    Period = c('01/01/2023', '02/01/2023'),
                    Day = c('Sunday', 'Monday'),
                    StartTime = c('7:00:00 AM', '7:00:00 AM'),
                    EndTime = c('9:30:00 AM', '9:0:00 AM'),
                    TotalTravelCost = c(60, 39))

# 我尝试了如下，但无法达到示例中所需的输出。
# 有人可以帮我找出我代码中可能遗漏的问题吗？
# 提前感谢。
Output <- df %>%
  group_by(CountryID, AreaID, Period, Day, StartTime, EndTime) %>%
  summarise(TotalTravelCost = sum(TravelCost))

英文:

I'm trying to solve below problem to group together for consecutive starting and ending time to calculate the total travel cost for each day in total duration. Here below is an example data and require output.

Remove all the memory from workspace

rm(list =ls())

Required library

library(tidyverse)
library(lubridate)

Craete data

df &lt;- data.frame(CountryID = c(&#39;101&#39;, &#39;101&#39;, &#39;101&#39;, &#39;101&#39;, &#39;101&#39;, &#39;102&#39;, &#39;102&#39;, &#39;102&#39;, &#39;102&#39;),
AreaID = c(&#39;1&#39;, &#39;1&#39;, &#39;1&#39;, &#39;1&#39;, &#39;1&#39;, &#39;2&#39;, &#39;2&#39;, &#39;2&#39;, &#39;2&#39;),
Period = c(&#39;01/01/2023&#39;, &#39;01/01/2023&#39;, &#39;01/01/2023&#39;, &#39;01/01/2023&#39;, &#39;01/01/2023&#39;, &#39;02/01/2023&#39;, &#39;02/01/2023&#39;, &#39;02/01/2023&#39;, &#39;02/01/2023&#39;),
Day = c(&#39;Sunday&#39;, &#39;Sunday&#39;, &#39;Sunday&#39;, &#39;Sunday&#39;, &#39;Sunday&#39;, &#39;Monday&#39;, &#39;Monday&#39;, &#39;Monday&#39;, &#39;Monday&#39;),
StartTime = c(&#39;7:00:00 AM&#39;, &#39;7:30:00 AM&#39;, &#39;8:00:00 AM&#39;, &#39;8:30:00 AM&#39;, &#39;9:00:00 AM&#39;, &#39;7:00:00 AM&#39;, &#39;7:30:00 AM&#39;, &#39;8:00:00 AM&#39;, &#39;8:30:00 AM&#39;),
EndTime = c(&#39;7:30:00 AM&#39;, &#39;8:00:00 AM&#39;, &#39;8:30:00 AM&#39;, &#39;9:00:00 AM&#39;, &#39;9:30:00 AM&#39;, &#39;7:30:00 AM&#39;, &#39;8:00:00 AM&#39;, &#39;8:30:00 AM&#39;, &#39;9:00:00 AM&#39;)
TravelCost = c(&#39;10&#39;, &#39;12&#39;, &#39;11&#39;, &#39;13&#39;, &#39;14&#39;, &#39;12&#39;, &#39;10&#39;, &#39;9&#39;, &#39;8&#39;))

Output format is required

Output &lt;- data.frame(CountryID = C(101, 102),
AreaID = C(1, 2),
Period = c(&#39;01/01/2023&#39;, &#39;02/01/2023&#39;),
Day = c(&#39;Sunday&#39;, &#39;Monday&#39;),
StartTime = c(&#39;7:00:00 AM&#39;, &#39;7:00:00 AM&#39;),
EndTime = c(&#39;9:30:00 AM&#39;, &#39;9:0:00 AM&#39;)
TotalTravelCost = c(&#39;60&#39;, &#39;39&#39;)

I tried as below but couldn't reach my require output as mentioned in example.

Can anyone help me to figure out the issue I missed in my codes?
Thanks in advance.


Output &lt;- df %&gt;%
group_by(CountryID, AreaID, Period, Day, StartTime, EndTime) %&gt;%
summarise(TotalTravelCost = sum(TravelCost))

答案1

得分: 0

下面是翻译好的部分：

可能类似以下内容：
library(dplyr)
Output <- df %>%
  group_by(CountryID, AreaID, Period, Day) %>%
  mutate(across(ends_with('Time'), ~ strptime(., '%I:%M:%S %p'))) %>%
  mutate(idx = cumsum(coalesce(+(StartTime - lag(EndTime) > 1L), 0L))) %>%
  group_by(CountryID, AreaID, Period, Day, idx) %>%
  summarise(
    StartTime = format(min(StartTime), '%I:%M:%S %p'),
    EndTime = format(max(EndTime), '%I:%M:%S %p'),
    TravelCost = sum(as.numeric(TravelCost), na.rm = TRUE)
  ) %>%
  ungroup %>%
  select(-idx)
Output:
> Output
# 一个数据框: 2 × 7
  CountryID AreaID Period     Day    StartTime   EndTime     TravelCost
  <chr>     <chr>  <chr>      <chr>  <chr>       <chr>            <dbl>
1 101       1      01/01/2023 Sunday 07:00:00 am 09:30:00 am         60
2 102       2      02/01/2023 Monday 07:00:00 am 09:00:00 am         39
请注意，我已编辑您的 `data.frame`，纠正了（假定的）拼写错误。如果您确实有奇怪格式的时间（例如 `08:0:00`），请还原到初始版本并解释。
<details>
<summary>英文:</summary>
Perhaps something like below:
    library(dplyr)
    
    Output &lt;- df %&gt;%
      group_by(CountryID, AreaID, Period, Day) %&gt;%
      mutate(across(ends_with(&#39;Time&#39;), ~ strptime(., &#39;%I:%M:%S %p&#39;))) %&gt;%
      mutate(idx = cumsum(coalesce(+(StartTime - lag(EndTime) &gt; 1L), 0L))) %&gt;%
      group_by(CountryID, AreaID, Period, Day, idx) %&gt;%
      summarise(
        StartTime = format(min(StartTime), &#39;%I:%M:%S %p&#39;),
        EndTime = format(max(EndTime), &#39;%I:%M:%S %p&#39;),
        TravelCost = sum(as.numeric(TravelCost), na.rm = TRUE)
      ) %&gt;%
      ungroup %&gt;%
      select(-idx)
Output:
    &gt; Output
    # A tibble: 2 &#215; 7
      CountryID AreaID Period     Day    StartTime   EndTime     TravelCost
      &lt;chr&gt;     &lt;chr&gt;  &lt;chr&gt;      &lt;chr&gt;  &lt;chr&gt;       &lt;chr&gt;            &lt;dbl&gt;
    1 101       1      01/01/2023 Sunday 07:00:00 am 09:30:00 am         60
    2 102       2      02/01/2023 Monday 07:00:00 am 09:00:00 am         39
Note that I&#39;ve edited your `data.frame` correcting the (assumed) typos. If you really have strangely formatted times (e.g. `08:0:00`), please revert to initial version and explain.
</details>

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

如何在R中按连续的开始时间和结束时间分组？

问题

Remove all the memory from workspace

Required library

Craete data

Output format is required

I tried as below but couldn't reach my require output as mentioned in example.

答案1

如何在不同的ggplot列图中为相同类别使用相同的填充颜色？

如何使用 ggplot 绘制 F 统计量和 p 值

如何最佳优化我的R代码并避免循环

如何基于独占的共享值在R中选择列？

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。