Posted 2023-05-29 20:44:55


Trying to use dplyr in R to calculate values over time

Question


I am using a data.frame in R which has, among others, the columns car, car_speed_hist_0to70plus (a list column whose entries are vectors), and date_corr.

My time resolution is hourly, and the time is stored as POSIXct.

I would like to calculate sum(car * car_speed_hist_0to70plus[1]) / sum(car) by day.

I have tried

library(dplyr)
library(timetk)

results %>%
  group_by(date_corr) %>%
  timetk::mutate_by_time(., .date_var = date_corr, .by = "day") %>%
  *** some function to average by group ***

but I think I am already doing everything wrong here.

Would you be able to help me?

Edits

dput(head(results)) leads to:

    structure(list(instance_id = c(-1L, -1L, -1L, -1L, -1L, -1L), 
        segment_id = c(9000004903, 9000004903, 9000004903, 9000004903, 
        9000004903, 9000004903), date = c("2023-03-29T19:00:00.000Z", 
        "2023-03-29T20:00:00.000Z", "2023-03-29T21:00:00.000Z", "2023-03-29T22:00:00.000Z", 
        "2023-03-29T23:00:00.000Z", "2023-03-30T00:00:00.000Z"), 
        interval = c("hourly", "hourly", "hourly", "hourly", "hourly", 
        "hourly"), uptime = c(0.4997222222, 0.6575, 0.9997222222, 
        0.9997222222, 0.9997222222, 0.9991666667), heavy = c(6, 0, 
        0, 0, 0, 0), car = c(4, 0, 0, 0, 0, 0), bike = c(0, 0, 0, 
        0, 0, 0), pedestrian = c(0, 0, 0, 0, 0, 0), heavy_lft = c(0, 
        0, 0, 0, 0, 0), heavy_rgt = c(6, 0, 0, 0, 0, 0), car_lft = c(2, 
        0, 0, 0, 0, 0), car_rgt = c(2, 0, 0, 0, 0, 0), bike_lft = c(0, 
        0, 0, 0, 0, 0), bike_rgt = c(0, 0, 0, 0, 0, 0), pedestrian_lft = c(0, 
        0, 0, 0, 0, 0), pedestrian_rgt = c(0, 0, 0, 0, 0, 0), direction = c(1L, 
        1L, 1L, 1L, 1L, 1L), car_speed_hist_0to70plus = list(c(50, 
        50, 0, 0, 0, 0, 0, 0), c(0, 0, 0, 0, 0, 0, 0, 0), c(0, 0, 
        0, 0, 0, 0, 0, 0), c(0, 0, 0, 0, 0, 0, 0, 0), c(0, 0, 0, 
        0, 0, 0, 0, 0), c(0, 0, 0, 0, 0, 0, 0, 0)), car_speed_hist_0to120plus = list(
            c(50, 0, 50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
            0, 0, 0, 0, 0, 0, 0, 0), c(0, 0, 0, 0, 0, 0, 0, 0, 0, 
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), c(0, 
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
            0, 0, 0, 0, 0, 0), c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), c(0, 0, 0, 
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
            0, 0, 0, 0), c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), timezone = c("Europe/Paris", 
        "Europe/Paris", "Europe/Paris", "Europe/Paris", "Europe/Paris", 
        "Europe/Paris"), v85 = c(11, NA, NA, NA, NA, NA), date_corr = structure(c(1680116400, 
        1680120000, 1680123600, 1680127200, 1680130800, 1680134400
        ), class = c("POSIXct", "POSIXt"), tzone = "Europe/Paris"), 
        valid = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)), row.names = c(NA, 
    6L), class = "data.frame")

In fact, my column is not exactly a share; I am using values from within car_speed_hist_0to70plus.

Answer 1

Score: 1


The grouping variable needs to be identical across each group. Because every row has a unique date_corr value (1 h resolution), group_by(date_corr) would generate single-row groups. We get daily groups by converting date_corr to a date first.
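To make the grouping issue concrete, here is a minimal sketch on made-up hourly data. The `toy` data frame and its values are assumptions for illustration only, not the asker's `results`. It counts the groups before and after truncating the timestamp to a date:

```r
library(dplyr)

# Hypothetical hourly data spanning two calendar days (not the original `results`)
toy <- data.frame(
  date_corr = as.POSIXct("2023-03-29 22:00:00", tz = "Europe/Paris") + 3600 * 0:3,
  car = c(4, 2, 1, 3)
)

# Every timestamp is unique, so each group holds exactly one row
n_by_hour <- toy %>% group_by(date_corr) %>% n_groups()

# Truncating to the calendar date gives one group per day
n_by_day <- toy %>% group_by(date = lubridate::date(date_corr)) %>% n_groups()

c(n_by_hour, n_by_day)  # 4 hourly groups vs. 2 daily groups
```

Any summary computed on the hourly grouping would therefore just reproduce the per-row values.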

Extracting the first element of each car_speed_hist_0to70plus vector as a separate step makes it a bit easier to follow:

library(dplyr)
results %>%
  # extract the first item from each car_speed_hist_0to70plus vector
  mutate(spd0to70plus_1 = purrr::map_int(car_speed_hist_0to70plus, first)) %>%
  group_by(date = lubridate::date(date_corr)) %>%
  summarise(daily_avg = sum(car * spd0to70plus_1) / sum(car))
#> # A tibble: 2 × 2
#>   date       daily_avg
#>   <date>         <dbl>
#> 1 2023-03-29        50
#> 2 2023-03-30       NaN

map_int is used here to access the car_speed_hist_0to70plus item of each individual row; first is a function that gets called on each car_speed_hist_0to70plus vector and extracts its first item. (If the values may not be whole numbers, purrr::map_dbl is the safer choice, since map_int fails on fractional values.) Using car_speed_hist_0to70plus[1] in mutate would instead mean the first item of the car_speed_hist_0to70plus column, which in this example is the vector c(50, 50, 0, 0, 0, 0, 0, 0).
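The difference between indexing the column and indexing each row's vector can be checked on a small example. This is only a sketch; the `toy` tibble and its `hist` column are hypothetical stand-ins for `results` and car_speed_hist_0to70plus:

```r
library(dplyr)

# Hypothetical two-row tibble with a list column (not the original `results`)
toy <- tibble::tibble(
  car  = c(4, 1),
  hist = list(c(50, 50, 0), c(20, 80, 0))
)

# `hist[1]` is the first element of the column: a length-1 list holding
# c(50, 50, 0), which mutate() recycles to every row
col_first <- toy %>% mutate(x = hist[1])

# map_dbl() visits each row's vector and returns its first value as a double
row_first <- toy %>% mutate(x = purrr::map_dbl(hist, first))

str(col_first$x)  # a list of 2, both elements c(50, 50, 0)
row_first$x       # 50 20
```

Only the second form yields one number per row that can be multiplied by car.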

Another option is row-wise grouping with rowwise(); then e.g. car_speed_hist_0to70plus[1] and car_speed_hist_0to70plus[4] can be used directly in mutate():

results %>%
  rowwise() %>%
  mutate(spd0to70plus_s = car_speed_hist_0to70plus[1] + car_speed_hist_0to70plus[4]) %>%
  # or:
  # mutate(spd0to70plus_s = sum(car_speed_hist_0to70plus[c(1,4)]))
  group_by(date = lubridate::date(date_corr)) %>%
  summarise(daily_avg = sum(car * spd0to70plus_s) / sum(car))

Created on 2023-05-29 with reprex v2.0.2
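The NaN reported for 2023-03-30 in the first output comes from dividing by sum(car) == 0 on a day with no cars. If a missing value is preferred in that case, one possible guard is sketched below. The `toy` data and the column names `spd` and `total_car` are assumptions for illustration, not part of the original data:

```r
library(dplyr)

# Hypothetical data: the second day has no cars at all (not the original `results`)
toy <- data.frame(
  date = as.Date(c("2023-03-29", "2023-03-29", "2023-03-30")),
  car  = c(4, 1, 0),
  spd  = c(50, 20, 0)
)

daily <- toy %>%
  group_by(date) %>%
  summarise(
    total_car = sum(car),
    # return NA instead of NaN when the day's denominator is zero
    daily_avg = if (total_car > 0) sum(car * spd) / total_car else NA_real_
  )

daily
```

Whether NA, NaN, or dropping such days is appropriate depends on how the daily averages are used downstream.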
