Trying to use dplyr in R to calculate values over time
Question
I am using a data.frame in R that has, among others, the columns car, car_speed_hist_0to70plus (a list column holding one vector per row) and date_corr.
My time resolution is hourly, and the time is stored as POSIXct.
I would like to calculate sum(car * car_speed_hist_0to70plus[1]) / sum(car) by day.
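Written out per day, the requested quantity is a car-weighted mean of the first histogram bin. In this notation (introduced only for this formula), d is a calendar day, i runs over the hourly rows whose date_corr falls on d, car_i is that row's car count and h_{i,1} is the first element of its car_speed_hist_0to70plus vector:

```latex
\bar{h}_{d} = \frac{\sum_{i \in d} \mathrm{car}_i \, h_{i,1}}{\sum_{i \in d} \mathrm{car}_i}
```

If no cars were counted on a day, both sums are zero and R evaluates 0 / 0 as NaN.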
I have tried
library(dplyr)
library(timetk)
results %>%
group_by(date_corr) %>%
timetk::mutate_by_time(., .date_var = date_corr, .by = "day") %>%
*** some function to average by group ***
but I think I am already doing everything wrong here.
Would you be able to help me?
Edits
dput(head(results))
leads to:
structure(list(instance_id = c(-1L, -1L, -1L, -1L, -1L, -1L),
segment_id = c(9000004903, 9000004903, 9000004903, 9000004903,
9000004903, 9000004903), date = c("2023-03-29T19:00:00.000Z",
"2023-03-29T20:00:00.000Z", "2023-03-29T21:00:00.000Z", "2023-03-29T22:00:00.000Z",
"2023-03-29T23:00:00.000Z", "2023-03-30T00:00:00.000Z"),
interval = c("hourly", "hourly", "hourly", "hourly", "hourly",
"hourly"), uptime = c(0.4997222222, 0.6575, 0.9997222222,
0.9997222222, 0.9997222222, 0.9991666667), heavy = c(6, 0,
0, 0, 0, 0), car = c(4, 0, 0, 0, 0, 0), bike = c(0, 0, 0,
0, 0, 0), pedestrian = c(0, 0, 0, 0, 0, 0), heavy_lft = c(0,
0, 0, 0, 0, 0), heavy_rgt = c(6, 0, 0, 0, 0, 0), car_lft = c(2,
0, 0, 0, 0, 0), car_rgt = c(2, 0, 0, 0, 0, 0), bike_lft = c(0,
0, 0, 0, 0, 0), bike_rgt = c(0, 0, 0, 0, 0, 0), pedestrian_lft = c(0,
0, 0, 0, 0, 0), pedestrian_rgt = c(0, 0, 0, 0, 0, 0), direction = c(1L,
1L, 1L, 1L, 1L, 1L), car_speed_hist_0to70plus = list(c(50,
50, 0, 0, 0, 0, 0, 0), c(0, 0, 0, 0, 0, 0, 0, 0), c(0, 0,
0, 0, 0, 0, 0, 0), c(0, 0, 0, 0, 0, 0, 0, 0), c(0, 0, 0,
0, 0, 0, 0, 0), c(0, 0, 0, 0, 0, 0, 0, 0)), car_speed_hist_0to120plus = list(
c(50, 0, 50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0), c(0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0), c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), c(0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0), c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), timezone = c("Europe/Paris",
"Europe/Paris", "Europe/Paris", "Europe/Paris", "Europe/Paris",
"Europe/Paris"), v85 = c(11, NA, NA, NA, NA, NA), date_corr = structure(c(1680116400,
1680120000, 1680123600, 1680127200, 1680130800, 1680134400
), class = c("POSIXct", "POSIXt"), tzone = "Europe/Paris"),
valid = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)), row.names = c(NA,
6L), class = "data.frame")
- In fact, my column is not exactly share, but I am using values within car_speed_hist_0to70plus.
Answer 1
Score: 1
Your grouping variable needs to be identical across all rows of a group. Because each row has a unique date_corr value (1-hour resolution), group_by(date_corr) generates single-row groups. We get daily groups by converting date_corr to a date first.
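A minimal base R sketch of why the grouping changes, using only the six timestamps from the question's dput() output (an illustration, not the answer's code): all six hourly values are distinct, while truncating them to calendar days leaves two groups. Which day an hour lands on depends on the time zone used for the truncation. Here tz = "Europe/Paris" is passed explicitly, so the rows at 21:00 to 23:00 on 29 March and 00:00 to 02:00 on 30 March (local time) split three and three; truncated in UTC, the same rows would split five and one.

```r
# Hourly timestamps copied from the dput() output in the question
# (seconds since 1970-01-01 UTC), stored in the Europe/Paris time zone.
date_corr <- as.POSIXct(
  c(1680116400, 1680120000, 1680123600, 1680127200, 1680130800, 1680134400),
  origin = "1970-01-01", tz = "Europe/Paris"
)

# Every hourly timestamp is distinct, so grouping on it yields one row per group.
n_hour_groups <- length(unique(date_corr))

# Truncating to the calendar day in the data's own time zone yields
# groups that span several rows.
day <- as.Date(date_corr, tz = "Europe/Paris")
n_day_groups <- length(unique(day))

# Rows per day.
print(table(day))
```

Passing tz explicitly avoids depending on as.Date()'s default time zone for POSIXct input.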
Extracting the first elements of the car_speed_hist_0to70plus vectors as a separate step makes it a bit easier to follow:
library(dplyr)
results %>%
# extract the first item from each car_speed_hist_0to70plus vector
mutate(spd0to70plus_1 = purrr::map_int(car_speed_hist_0to70plus, first)) %>%
group_by(date = lubridate::date(date_corr)) %>%
summarise(daily_avg = sum(car * spd0to70plus_1) / sum(car))
#> # A tibble: 2 × 2
#> date daily_avg
#> <date> <dbl>
#> 1 2023-03-29 50
#> 2 2023-03-30 NaN
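map_int() is used here to visit the car_speed_hist_0to70plus item of each individual row, and first() is the function called on each of those vectors to extract its first item. map_int() only succeeds when every extracted value can be stored as an integer; the histogram values are doubles, so if they can be fractional, purrr::map_dbl() is the safer choice. Using car_speed_hist_0to70plus[1] directly in a regular (non-rowwise) mutate() would instead index the column as a whole: single-bracket indexing of a list column returns a one-element list holding the first row's vector, in this example c(50, 50, 0, 0, 0, 0, 0, 0), rather than the first item of each row's vector.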
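The difference between indexing the column and indexing each row's vector can be checked in base R. This sketch uses the first three histograms from the question's data; speed_hist is an illustrative stand-in for the real column, and vapply() plays the role that purrr::map_int()/map_dbl() with first() plays in the answer.

```r
# Stand-in for the car_speed_hist_0to70plus list column: one vector per row
# (first three rows of the question's dput() output).
speed_hist <- list(
  c(50, 50, 0, 0, 0, 0, 0, 0),
  c(0, 0, 0, 0, 0, 0, 0, 0),
  c(0, 0, 0, 0, 0, 0, 0, 0)
)

# `[1]` on a list returns a list of length 1 wrapping the first row's whole vector.
whole_first_row <- speed_hist[1]

# Multiplying numbers by that list fails, because a list is not numeric.
product_attempt <- try(c(4, 0, 0) * speed_hist[1], silent = TRUE)

# What the formula needs is the first element of every row's vector.
first_bins <- vapply(speed_hist, function(v) v[1], numeric(1))
print(first_bins)
```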
Another method is rowwise grouping: inside rowwise(), each list-column cell is exposed as the vector itself, so we can use, e.g., car_speed_hist_0to70plus[1] and car_speed_hist_0to70plus[4] directly in mutate():
results %>%
rowwise() %>%
mutate(spd0to70plus_s = car_speed_hist_0to70plus[1] + car_speed_hist_0to70plus[4]) %>%
# or:
# mutate(spd0to70plus_s = sum(car_speed_hist_0to70plus[c(1,4)]))
group_by(date = lubridate::date(date_corr)) %>%
summarise(daily_avg = sum(car * spd0to70plus_s) / sum(car))
Created on 2023-05-29 with reprex v2.0.2
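As a cross-check that needs neither dplyr, purrr nor lubridate, both daily averages from this answer (first bin only, and bins 1 plus 4) can be sketched in base R on the six sample rows from the question. This swaps in a different technique (split() plus vapply()) rather than reproducing the dplyr pipeline, and daily_weighted_mean() is a hypothetical helper introduced only for this sketch.

```r
# Columns needed from the question's data (values copied from dput(head(results))).
car <- c(4, 0, 0, 0, 0, 0)
speed_hist <- list(
  c(50, 50, 0, 0, 0, 0, 0, 0),
  rep(0, 8), rep(0, 8), rep(0, 8), rep(0, 8), rep(0, 8)
)
date_corr <- as.POSIXct(
  c(1680116400, 1680120000, 1680123600, 1680127200, 1680130800, 1680134400),
  origin = "1970-01-01", tz = "Europe/Paris"
)

# Hypothetical helper: car-weighted mean of a per-row value x, computed per day.
daily_weighted_mean <- function(x, weight, day) {
  parts <- split(data.frame(x = x, w = weight), day)
  vapply(parts, function(d) sum(d$w * d$x) / sum(d$w), numeric(1))
}

# Truncate hourly timestamps to calendar days in the data's time zone.
day <- as.Date(date_corr, tz = "Europe/Paris")

# Variant 1: first bin of each row's histogram.
bin1 <- vapply(speed_hist, function(v) v[1], numeric(1))
avg_bin1 <- daily_weighted_mean(bin1, car, day)

# Variant 2: sum of bins 1 and 4 of each row's histogram.
bins14 <- vapply(speed_hist, function(v) sum(v[c(1, 4)]), numeric(1))
avg_bins14 <- daily_weighted_mean(bins14, car, day)

print(avg_bin1)
print(avg_bins14)
```

For 2023-03-29 the first-bin average works out as (4 * 50) / 4 = 50; on 2023-03-30 no cars were counted, so 0 / 0 yields NaN, which is where the NaN in the reprex output comes from.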