Trying to use dplyr in R to calculate values over time
Question
I am using a data.frame in R which has, among others, the columns car, car_speed_hist_0to70plus (each entry is itself a vector) and date_corr.
My time resolution is hourly, and the time column is POSIXct.
I would like to calculate sum(car * car_speed_hist_0to70plus[1]) / sum(car) by day.
I have tried
library(dplyr)
library(timetk)
results %>%
group_by(date_corr) %>%
timetk::mutate_by_time(., .date_var = date_corr, .by = "day") %>%
*** some function to average by group ***
but I think I am already doing everything wrong here.
Would you be able to help me?
Edits
dput(head(results)) leads to:
structure(list(instance_id = c(-1L, -1L, -1L, -1L, -1L, -1L),
segment_id = c(9000004903, 9000004903, 9000004903, 9000004903,
9000004903, 9000004903), date = c("2023-03-29T19:00:00.000Z",
"2023-03-29T20:00:00.000Z", "2023-03-29T21:00:00.000Z", "2023-03-29T22:00:00.000Z",
"2023-03-29T23:00:00.000Z", "2023-03-30T00:00:00.000Z"),
interval = c("hourly", "hourly", "hourly", "hourly", "hourly",
"hourly"), uptime = c(0.4997222222, 0.6575, 0.9997222222,
0.9997222222, 0.9997222222, 0.9991666667), heavy = c(6, 0,
0, 0, 0, 0), car = c(4, 0, 0, 0, 0, 0), bike = c(0, 0, 0,
0, 0, 0), pedestrian = c(0, 0, 0, 0, 0, 0), heavy_lft = c(0,
0, 0, 0, 0, 0), heavy_rgt = c(6, 0, 0, 0, 0, 0), car_lft = c(2,
0, 0, 0, 0, 0), car_rgt = c(2, 0, 0, 0, 0, 0), bike_lft = c(0,
0, 0, 0, 0, 0), bike_rgt = c(0, 0, 0, 0, 0, 0), pedestrian_lft = c(0,
0, 0, 0, 0, 0), pedestrian_rgt = c(0, 0, 0, 0, 0, 0), direction = c(1L,
1L, 1L, 1L, 1L, 1L), car_speed_hist_0to70plus = list(c(50,
50, 0, 0, 0, 0, 0, 0), c(0, 0, 0, 0, 0, 0, 0, 0), c(0, 0,
0, 0, 0, 0, 0, 0), c(0, 0, 0, 0, 0, 0, 0, 0), c(0, 0, 0,
0, 0, 0, 0, 0), c(0, 0, 0, 0, 0, 0, 0, 0)), car_speed_hist_0to120plus = list(
c(50, 0, 50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0), c(0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0), c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), c(0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0), c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), timezone = c("Europe/Paris",
"Europe/Paris", "Europe/Paris", "Europe/Paris", "Europe/Paris",
"Europe/Paris"), v85 = c(11, NA, NA, NA, NA, NA), date_corr = structure(c(1680116400,
1680120000, 1680123600, 1680127200, 1680130800, 1680134400
), class = c("POSIXct", "POSIXt"), tzone = "Europe/Paris"),
valid = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)), row.names = c(NA,
6L), class = "data.frame")
- In fact, my column is not exactly share, but I am using values within car_speed_hist_0to70plus
Answer 1
Score: 1
You'd want your grouping variable to be identical across the whole group. As each row has a unique date_corr value (1h resolution), group_by(date_corr) would generate single-row groups. We'll get daily groups by converting date_corr to a date first.
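To make the difference between hourly and daily groups concrete, here is a minimal sketch on made-up data. The toy data frame and its values are assumptions for illustration, and base R's as.Date() with an explicit tz stands in for lubridate::date() only to keep the example dependency-light.

```r
library(dplyr)

# Hypothetical toy data: six hourly timestamps spanning midnight, Paris time
toy <- data.frame(
  date_corr = as.POSIXct("2023-03-29 21:00:00", tz = "Europe/Paris") + 3600 * 0:5,
  car = c(4, 0, 0, 0, 0, 0)
)

# Grouping on the raw timestamp: every hour is its own group
by_hour <- toy %>% count(date_corr)

# Grouping on the calendar day (in the data's own time zone): rows are pooled per day
by_day <- toy %>% count(day = as.Date(date_corr, tz = "Europe/Paris"))

nrow(by_hour)  # one group per row
nrow(by_day)   # one group per day
```

Passing the time zone explicitly matters with as.Date(), since the day boundary should follow local time rather than UTC.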
Extracting the first elements of the car_speed_hist_0to70plus vectors as a separate step makes it a bit easier to follow:
library(dplyr)
results %>%
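# extract first items from car_speed_hist_0to70plus vectors
# (map_int() needs whole-number values; purrr::map_dbl() is the safer choice if the histogram can hold fractions)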
mutate(spd0to70plus_1 = purrr::map_int(car_speed_hist_0to70plus, first)) %>%
group_by(date = lubridate::date(date_corr)) %>%
summarise(daily_avg = sum(car * spd0to70plus_1) / sum(car))
#> # A tibble: 2 × 2
#> date daily_avg
#> <date> <dbl>
#> 1 2023-03-29 50
#> 2 2023-03-30 NaN
map_int is used here to access the car_speed_hist_0to70plus item of each individual row; first is the function that gets called on each car_speed_hist_0to70plus vector, and it extracts the first item from each of those vectors. Using car_speed_hist_0to70plus[1] directly in mutate would instead mean the first item of the whole car_speed_hist_0to70plus column, which in this example is the first row's vector c(50, 50, 0, 0, 0, 0, 0, 0), not the first value within each row.
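The distinction between indexing the column and indexing each row's vector can be checked on a small example. This is only a sketch: the toy tibble and the column name hist are hypothetical, and base R's vapply() is used as a stand-in for purrr::map_dbl() so that only dplyr is needed.

```r
library(dplyr)

# Hypothetical two-row table with a list column of histogram vectors
toy <- tibble(
  car  = c(4, 2),
  hist = list(c(50, 50, 0), c(10, 90, 0))
)

# Indexing the column: a one-element list holding row 1's whole vector
whole_col <- toy$hist[1]

# Indexing inside each row: the first value of every row's vector
per_row <- toy %>%
  mutate(first_bin = vapply(hist, function(v) v[1], numeric(1)))

str(whole_col)
per_row$first_bin
```

Here vapply() returns doubles, so it also works when the histogram values are not whole numbers.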
Another method would be rowwise grouping; after rowwise() we can use e.g. car_speed_hist_0to70plus[1] and car_speed_hist_0to70plus[4] directly in mutate():
results %>%
rowwise() %>%
mutate(spd0to70plus_s = car_speed_hist_0to70plus[1] + car_speed_hist_0to70plus[4]) %>%
# or:
# mutate(spd0to70plus_s = sum(car_speed_hist_0to70plus[c(1,4)]))
group_by(date = lubridate::date(date_corr)) %>%
summarise(daily_avg = sum(car * spd0to70plus_s) / sum(car))
Created on 2023-05-29 with reprex v2.0.2
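The NaN reported for 2023-03-30 in the first output comes from 0 / 0: no cars were counted during those hours, so sum(car) is zero. If an explicit missing value is preferred for such days, one possible guard is sketched below. The toy data and the cars helper column are assumptions for illustration, not part of the original answer.

```r
library(dplyr)

# Hypothetical pre-extracted data: one day with traffic, one day without
toy <- tibble(
  day            = as.Date(c("2023-03-29", "2023-03-29", "2023-03-30")),
  car            = c(4, 0, 0),
  spd0to70plus_1 = c(50, 0, 0)
)

daily <- toy %>%
  group_by(day) %>%
  summarise(
    cars      = sum(car),
    # Return NA instead of NaN when no cars were counted that day
    daily_avg = if (cars > 0) sum(car * spd0to70plus_1) / cars else NA_real_
  )

daily
```

Whether NA, NaN or dropping the day entirely is the right choice depends on how the daily values are used downstream.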