How to group by consecutive start time with end time in R?
Question
I'm trying to group rows with consecutive start and end times so that I can calculate the total travel cost for each day over the whole duration. Below is some example data and the required output.
# Remove all objects from the workspace
rm(list = ls())
# Required libraries
library(tidyverse)
library(lubridate)
# Create data
df <- data.frame(CountryID = c('101', '101', '101', '101', '101', '102', '102', '102', '102'),
AreaID = c('1', '1', '1', '1', '1', '2', '2', '2', '2'),
Period = c('01/01/2023', '01/01/2023', '01/01/2023', '01/01/2023', '01/01/2023', '02/01/2023', '02/01/2023', '02/01/2023', '02/01/2023'),
Day = c('Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday', 'Monday', 'Monday', 'Monday', 'Monday'),
StartTime = c('7:00:00 AM', '7:30:00 AM', '8:00:00 AM', '8:30:00 AM', '9:00:00 AM', '7:00:00 AM', '7:30:00 AM', '8:00:00 AM', '8:30:00 AM'),
EndTime = c('7:30:00 AM', '8:00:00 AM', '8:30:00 AM', '9:00:00 AM', '9:30:00 AM', '7:30:00 AM', '8:00:00 AM', '8:30:00 AM', '9:00:00 AM'),
TravelCost = c('10', '12', '11', '13', '14', '12', '10', '9', '8'))
# Required output format
Output <- data.frame(CountryID = c(101, 102),
AreaID = c(1, 2),
Period = c('01/01/2023', '02/01/2023'),
Day = c('Sunday', 'Monday'),
StartTime = c('7:00:00 AM', '7:00:00 AM'),
EndTime = c('9:30:00 AM', '9:00:00 AM'),
TotalTravelCost = c('60', '39'))
I tried the following, but couldn't get the required output shown in the example.
Can anyone help me figure out what I'm missing in my code?
Thanks in advance.
Output <- df %>%
group_by(CountryID, AreaID, Period, Day, StartTime, EndTime) %>%
summarise(TotalTravelCost = sum(TravelCost))
Answer 1
Score: 0
Perhaps something like this:
library(dplyr)
Output <- df %>%
  group_by(CountryID, AreaID, Period, Day) %>%
  # parse the 12-hour clock strings into date-times
  mutate(across(ends_with('Time'), ~ strptime(., '%I:%M:%S %p'))) %>%
  # start a new run whenever a slot begins later than the previous slot ended
  mutate(idx = cumsum(coalesce(+(StartTime - lag(EndTime) > 1L), 0L))) %>%
  group_by(CountryID, AreaID, Period, Day, idx) %>%
  summarise(
    StartTime = format(min(StartTime), '%I:%M:%S %p'),
    EndTime = format(max(EndTime), '%I:%M:%S %p'),
    TravelCost = sum(as.numeric(TravelCost), na.rm = TRUE)
  ) %>%
  ungroup() %>%
  select(-idx)
Output:
> Output
# A tibble: 2 × 7
  CountryID AreaID Period     Day    StartTime   EndTime     TravelCost
  <chr>     <chr>  <chr>      <chr>  <chr>       <chr>            <dbl>
1 101       1      01/01/2023 Sunday 07:00:00 am 09:30:00 am         60
2 102       2      02/01/2023 Monday 07:00:00 am 09:00:00 am         39
Note that I've edited your `data.frame` to correct the (assumed) typos. If you really do have strangely formatted times (e.g. `08:0:00`), please revert to the initial version and explain.
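To see what the gap-flag-plus-`cumsum` step does when the slots are not all back to back, here is a minimal base-R sketch. The `slots` data frame and its one-hour gap are made up for illustration and are not part of the question's data. It covers a single day only, assumes the rows are already sorted by start time, assumes `Period` is in day/month/year order, and assumes your locale parses AM/PM with `%p`.

```r
# Hypothetical data: one day with a gap between 8:00 AM and 9:00 AM
slots <- data.frame(
  Period     = rep("01/01/2023", 4),
  StartTime  = c("7:00:00 AM", "7:30:00 AM", "9:00:00 AM", "9:30:00 AM"),
  EndTime    = c("7:30:00 AM", "8:00:00 AM", "9:30:00 AM", "10:00:00 AM"),
  TravelCost = c(10, 12, 14, 9),
  stringsAsFactors = FALSE
)

# Combine date and 12-hour clock time so the parsed values are deterministic
fmt   <- "%d/%m/%Y %I:%M:%S %p"
start <- as.POSIXct(paste(slots$Period, slots$StartTime), format = fmt, tz = "UTC")
end   <- as.POSIXct(paste(slots$Period, slots$EndTime),   format = fmt, tz = "UTC")

# Seconds between each slot's start and the previous slot's end (NA for the first row)
gap <- c(NA, as.numeric(difftime(start[-1], end[-length(end)], units = "secs")))

# A new run starts wherever the gap is positive; the first row never starts a new run
new_run <- !is.na(gap) & gap > 0
run_id  <- cumsum(new_run)   # 0 0 1 1 for the data above

# Collapse each run to its first start, last end, and summed cost
first <- !duplicated(run_id)
last  <- !duplicated(run_id, fromLast = TRUE)
runs <- data.frame(
  Period          = slots$Period[first],
  StartTime       = slots$StartTime[first],
  EndTime         = slots$EndTime[last],
  TotalTravelCost = as.numeric(rowsum(slots$TravelCost, run_id)),
  stringsAsFactors = FALSE
)
print(runs)
```

With the gap in place you get two rows (7:00 to 8:00 and 9:00 to 10:00) rather than one. In the question's data every slot starts exactly when the previous one ends, so each day collapses to a single row.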