Finding mean of variable across each month/year

Question

I have a dataset that looks similar to this:

    > dput(df)
    structure(list(Date = c("3/23/21", "4/11/22", "6/30/22"), Banana_wasted = c(4L,
    2L, 5L), Apple_wasted = c(6L, 0L, 3L), Orange_wasted = c(1L,
    4L, 1L), Banana_ordered = c(5L, 7L, 7L), Apple_Ordered = c(9L,
    8L, 9L), Orange_ordered = c(5L, 6L, 6L), Banana_eaten = c(5L,
    5L, 6L), Apple_eaten = c(7L, 7L, 4L), Orange_eaten = c(8L, 8L,
    8L)), class = "data.frame", row.names = c(NA, -3L))

I want to find the percentage of fruit wasted per month/year, relative to how many fruits were ordered.
It should be:

    (Banana_wasted + Apple_wasted + Orange_wasted) / (Banana_ordered + Apple_Ordered + Orange_ordered)

So, for March 2021 (3/21), it should be:

    (4 + 6 + 1) / (5 + 9 + 5) * 100 = 57.9%

I would like to do this for every month of the year.
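
As a way to check the target numbers, here is a minimal base-R sketch of the calculation described above. It rebuilds only the columns the formula needs; the names `month`, `wasted`, `ordered`, and `pct_wasted` are illustrative and not part of the original post.

```r
# Data from the question (only the columns used by the formula)
df <- data.frame(
  Date = c("3/23/21", "4/11/22", "6/30/22"),
  Banana_wasted = c(4L, 2L, 5L), Apple_wasted = c(6L, 0L, 3L), Orange_wasted = c(1L, 4L, 1L),
  Banana_ordered = c(5L, 7L, 7L), Apple_Ordered = c(9L, 8L, 9L), Orange_ordered = c(5L, 6L, 6L)
)

# Month key such as "2021-03", parsed from the m/d/yy strings
month <- format(as.Date(df$Date, format = "%m/%d/%y"), "%Y-%m")

# Row totals; ignore.case = TRUE so that Apple_Ordered is picked up too
wasted  <- rowSums(df[grepl("wasted",  names(df), ignore.case = TRUE)])
ordered <- rowSums(df[grepl("ordered", names(df), ignore.case = TRUE)])

# Pool the totals within each month, then take the ratio
pct_wasted <- tapply(wasted, month, sum) / tapply(ordered, month, sum) * 100
round(pct_wasted, 1)
```

With the three rows shown, this gives 57.9 for 2021-03, 28.6 for 2022-04, and 40.9 for 2022-06.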

Answer 1

Score: 2

    library(tidyverse)
    library(lubridate)  # floor_date() and mdy(); part of the core tidyverse only since 2.0.0

    df %>%
      group_by(Date = floor_date(mdy(Date), "month")) %>%
      summarise(
        wasted = sum(across(contains("wasted"))) / sum(across(contains("ordered"))),
        wasted_eaten = sum(across(contains("wasted"))) / sum(across(contains("eaten")))
      )
    # A tibble: 3 x 3
    #   Date       wasted wasted_eaten
    #   <date>      <dbl>        <dbl>
    # 1 2021-03-01  0.579         0.55
    # 2 2022-04-01  0.286         0.3
    # 3 2022-06-01  0.409         0.5

Answer 2

Score: 1

    library(dplyr)
    library(lubridate)

    df %>%
      mutate(Date = as.Date(Date, format = "%m/%d/%y"),
             pct_wasted = (Banana_wasted + Apple_wasted + Orange_wasted) /
               (Banana_ordered + Apple_Ordered + Orange_ordered) * 100) %>%
      group_by(year = year(Date), month = month(Date)) %>%
      summarize(avg_pct_wasted = mean(pct_wasted))
    #> # A tibble: 3 × 3
    #> # Groups:   year [2]
    #>    year month avg_pct_wasted
    #>   <dbl> <dbl>          <dbl>
    #> 1  2021     3           57.9
    #> 2  2022     4           28.6
    #> 3  2022     6           40.9

Created on 2023-02-06 with reprex v2.0.2
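
Note that this answer averages the per-row percentages within each month, which matches the pooled ratio (total wasted over total ordered) only when each month has a single row, as in the sample data. The base-R sketch below uses two hypothetical rows from the same month, with invented values that are not from the question, to show how the two summaries can differ.

```r
# Hypothetical data: two rows falling in the same month (values invented for illustration)
toy <- data.frame(
  wasted  = c(1, 9),
  ordered = c(2, 90)
)

# Mean of per-row percentages: every row counts equally, whatever its order size
mean_of_pcts <- mean(toy$wasted / toy$ordered * 100)

# Pooled percentage: total wasted divided by total ordered
pooled_pct <- sum(toy$wasted) / sum(toy$ordered) * 100

mean_of_pcts  # 30
pooled_pct    # about 10.87
```

Which of the two is appropriate depends on whether small and large orders should carry equal weight; the formula in the question reads most naturally as the pooled version.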

Answer 3

Score: 0

Pivot longer to get single wasted and ordered columns across all fruits; use lubridate::floor_date() and mdy() to get months from Date; group by month; then sum and divide to get your percentages:

    library(dplyr)
    library(tidyr)
    library(lubridate)

    df %>%
      rename(Apple_ordered = Apple_Ordered) %>% # for consistent capitalization
      pivot_longer(
        Banana_wasted:Orange_eaten,
        names_to = c("Fruit", ".value"),
        names_sep = "_"
      ) %>%
      group_by(month = floor_date(mdy(Date), "month")) %>%
      summarize(pct_wasted = sum(wasted) / sum(ordered)) %>%
      ungroup()
    # # A tibble: 3 × 2
    #   month      pct_wasted
    #   <date>          <dbl>
    # 1 2021-03-01      0.579
    # 2 2022-04-01      0.286
    # 3 2022-06-01      0.409

If you prefer character labels, use strftime() instead of floor_date(), and scales::percent() for the percentages:

    library(scales)

    df %>%
      rename(Apple_ordered = Apple_Ordered) %>%
      pivot_longer(
        Banana_wasted:Orange_eaten,
        names_to = c("Fruit", ".value"),
        names_sep = "_"
      ) %>%
      group_by(month = strftime(mdy(Date), "%B %Y")) %>%
      summarize(pct_wasted = percent(sum(wasted) / sum(ordered), accuracy = 0.1)) %>%
      ungroup()
    # # A tibble: 3 × 2
    #   month      pct_wasted
    #   <chr>      <chr>
    # 1 April 2022 28.6%
    # 2 June 2022  40.9%
    # 3 March 2021 57.9%
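
The character labels in this last result sort alphabetically (April, June, March) rather than by date. If calendar order matters, one option is to turn the labels into a factor whose levels follow the dates. This is a minimal base-R sketch under that assumption; `month_start` and `month_label` are illustrative names, not part of the answer.

```r
# Dates from the question, parsed with base R
dates <- as.Date(c("3/23/21", "4/11/22", "6/30/22"), format = "%m/%d/%y")

# First day of each month, used only to define the ordering
month_start <- as.Date(format(dates, "%Y-%m-01"))

# Text labels such as "March 2021" (month names follow the session locale)
month_label <- format(month_start, "%B %Y")

# Factor whose levels follow calendar order instead of alphabetical order
month_label <- factor(month_label, levels = unique(month_label[order(month_start)]))

levels(month_label)
```

Grouping by a factor like this should keep the summarized rows in level order, so March 2021 would come before April 2022.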

  • Posted by huangapple on 2023-02-07 02:01:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75364960.html