Trying to use dplyr in R to calculate values over time

Question

I am using a data.frame in R which has, among others, the columns car, car_speed_hist_0to70plus (each entry is itself a vector), and date_corr.

The time resolution is hourly, and date_corr is of class POSIXct.

I would like to calculate sum(car * car_speed_hist_0to70plus[1]) / sum(car) by day.

I have tried

library(dplyr)
library(timetk)
    
results %>%
  group_by(date_corr) %>%
  timetk::mutate_by_time(., .date_var = date_corr, .by = "day") %>%
  *** some function to average by group ***

but I think I am already doing everything wrong here.

Would you be able to help me?

Edits

  • dput(head(results)) leads to:
    structure(list(instance_id = c(-1L, -1L, -1L, -1L, -1L, -1L), 
        segment_id = c(9000004903, 9000004903, 9000004903, 9000004903, 
        9000004903, 9000004903), date = c("2023-03-29T19:00:00.000Z", 
        "2023-03-29T20:00:00.000Z", "2023-03-29T21:00:00.000Z", "2023-03-29T22:00:00.000Z", 
        "2023-03-29T23:00:00.000Z", "2023-03-30T00:00:00.000Z"), 
        interval = c("hourly", "hourly", "hourly", "hourly", "hourly", 
        "hourly"), uptime = c(0.4997222222, 0.6575, 0.9997222222, 
        0.9997222222, 0.9997222222, 0.9991666667), heavy = c(6, 0, 
        0, 0, 0, 0), car = c(4, 0, 0, 0, 0, 0), bike = c(0, 0, 0, 
        0, 0, 0), pedestrian = c(0, 0, 0, 0, 0, 0), heavy_lft = c(0, 
        0, 0, 0, 0, 0), heavy_rgt = c(6, 0, 0, 0, 0, 0), car_lft = c(2, 
        0, 0, 0, 0, 0), car_rgt = c(2, 0, 0, 0, 0, 0), bike_lft = c(0, 
        0, 0, 0, 0, 0), bike_rgt = c(0, 0, 0, 0, 0, 0), pedestrian_lft = c(0, 
        0, 0, 0, 0, 0), pedestrian_rgt = c(0, 0, 0, 0, 0, 0), direction = c(1L, 
        1L, 1L, 1L, 1L, 1L), car_speed_hist_0to70plus = list(c(50, 
        50, 0, 0, 0, 0, 0, 0), c(0, 0, 0, 0, 0, 0, 0, 0), c(0, 0, 
        0, 0, 0, 0, 0, 0), c(0, 0, 0, 0, 0, 0, 0, 0), c(0, 0, 0, 
        0, 0, 0, 0, 0), c(0, 0, 0, 0, 0, 0, 0, 0)), car_speed_hist_0to120plus = list(
            c(50, 0, 50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
            0, 0, 0, 0, 0, 0, 0, 0), c(0, 0, 0, 0, 0, 0, 0, 0, 0, 
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), c(0, 
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
            0, 0, 0, 0, 0, 0), c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), c(0, 0, 0, 
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
            0, 0, 0, 0), c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), timezone = c("Europe/Paris", 
        "Europe/Paris", "Europe/Paris", "Europe/Paris", "Europe/Paris", 
        "Europe/Paris"), v85 = c(11, NA, NA, NA, NA, NA), date_corr = structure(c(1680116400, 
        1680120000, 1680123600, 1680127200, 1680130800, 1680134400
        ), class = c("POSIXct", "POSIXt"), tzone = "Europe/Paris"), 
        valid = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)), row.names = c(NA, 
    6L), class = "data.frame")
  • In fact, my column is not exactly share; I am using the values within car_speed_hist_0to70plus.

Answer 1

Score: 1


Your grouping variable needs to be identical across each group. Because every row has a unique date_corr value (1 h resolution), group_by(date_corr) generates single-row groups. We get daily groups by converting date_corr to a date first.
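To make the grouping point concrete, here is a minimal sketch on made-up data. The toy tibble and its values are assumptions for illustration only, not the asker's data; it simply counts the groups produced by each grouping key.

```r
library(dplyr)

# Hypothetical toy data: six hourly timestamps that cross midnight (Europe/Paris)
toy <- tibble(
  date_corr = as.POSIXct("2023-03-29 19:00:00", tz = "Europe/Paris") + 3600 * 0:5,
  car = c(4, 0, 0, 0, 0, 0)
)

# Grouping on the raw hourly timestamp: every row is its own group
n_hourly_groups <- toy %>%
  group_by(date_corr) %>%
  n_groups()

# Grouping on the calendar date derived from the timestamp: one group per day
n_daily_groups <- toy %>%
  group_by(date = lubridate::date(date_corr)) %>%
  n_groups()

n_hourly_groups  # 6
n_daily_groups   # 2
```

Note that lubridate::date() derives the calendar date in the time zone stored on the POSIXct column, so the day boundaries here follow Europe/Paris rather than UTC.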

Extracting the first element of each car_speed_hist_0to70plus vector as a separate step makes it a bit easier to follow:

library(dplyr)
results %>%
  # extract first items from car_speed_hist_0to70plus vectors
  mutate(spd0to70plus_1 = purrr::map_int(car_speed_hist_0to70plus, first)) %>%
  group_by(date = lubridate::date(date_corr)) %>%
  summarise(daily_avg = sum(car * spd0to70plus_1) / sum(car))
#> # A tibble: 2 × 2
#>   date       daily_avg
#>   <date>         <dbl>
#> 1 2023-03-29        50
#> 2 2023-03-30       NaN

map_int is used here to access the car_speed_hist_0to70plus item of each individual row; first is a function that gets called on each car_speed_hist_0to70plus vector and extracts its first item. Using car_speed_hist_0to70plus[1] in mutate would instead refer to the first item of the whole car_speed_hist_0to70plus column, which in this example is the vector c(50, 50, 0, 0, 0, 0, 0, 0) (wrapped in a one-element list).
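The difference between indexing the column and indexing each row's vector can be seen on a small stand-alone list. This is only a sketch: hist_col and its values are made up, and map_dbl is my substitution for the map_int used above, on the assumption that real histogram bins may hold non-integer values.

```r
# Hypothetical list column: one numeric vector per row
hist_col <- list(c(50, 50, 0), c(10, 90, 0), c(0, 0, 100))

# `[1]` on the column keeps the list wrapper: a one-element list
col_first <- hist_col[1]

# `[[1]]` returns the first row's vector itself
row_first <- hist_col[[1]]

# Per-row extraction of the first bin, in base R ...
first_bins_base <- vapply(hist_col, function(v) v[1], numeric(1))

# ... and with purrr, where an integer position acts as an extractor
first_bins_purrr <- purrr::map_dbl(hist_col, 1)

first_bins_base   # 50 10 0
first_bins_purrr  # 50 10 0
```

Both per-row variants return a plain numeric vector with one value per row, which is what the sum(car * ...) step needs.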

Another method would be rowwise grouping; then we could simply use, e.g., car_speed_hist_0to70plus[1] and car_speed_hist_0to70plus[4] in mutate():

results %>%
  rowwise() %>%
  mutate(spd0to70plus_s = car_speed_hist_0to70plus[1] + car_speed_hist_0to70plus[4]) %>%
  # or:
  # mutate(spd0to70plus_s = sum(car_speed_hist_0to70plus[c(1,4)]))
  group_by(date = lubridate::date(date_corr)) %>%
  summarise(daily_avg = sum(car * spd0to70plus_s) / sum(car))

Created on 2023-05-29 with reprex v2.0.2
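The NaN reported for 2023-03-30 in the first output comes from dividing 0 by 0: no cars were counted that day, so sum(car) is 0. If an explicit missing value is preferred, one possible approach is sketched below. The helper weighted_mean_or_na is hypothetical (it is not part of dplyr), the toy data are made up, and the sketch assumes the weights contain no NA values.

```r
library(dplyr)

# Hypothetical helper: weighted mean that returns NA when the weights sum to zero
weighted_mean_or_na <- function(value, weight) {
  total <- sum(weight)
  if (total == 0) NA_real_ else sum(value * weight) / total
}

# Made-up data mirroring the shape of the example: no cars on the second day
toy <- tibble(
  date = as.Date(c("2023-03-29", "2023-03-29", "2023-03-30")),
  car = c(4, 0, 0),
  spd0to70plus_1 = c(50, 0, 0)
)

daily <- toy %>%
  group_by(date) %>%
  summarise(daily_avg = weighted_mean_or_na(spd0to70plus_1, car))

daily$daily_avg  # 50 NA
```

Whether NA, NaN, or dropping such days is appropriate depends on how the daily values are used afterwards.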

huangapple
  • Posted on May 29, 2023, 20:44:55
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76357508.html