February 7, 2023, 02:01:10

Finding mean of variable across each month/year

Question

I have a dataset that looks similar to this:

> dput(df)
structure(list(Date = c("3/23/21", "4/11/22", "6/30/22"), Banana_wasted = c(4L, 
2L, 5L), Apple_wasted = c(6L, 0L, 3L), Orange_wasted = c(1L, 
4L, 1L), Banana_ordered = c(5L, 7L, 7L), Apple_Ordered = c(9L, 
8L, 9L), Orange_ordered = c(5L, 6L, 6L), Banana_eaten = c(5L, 
5L, 6L), Apple_eaten = c(7L, 7L, 4L), Orange_eaten = c(8L, 8L, 
8L)), class = "data.frame", row.names = c(NA, -3L))

I want to find the percentage of fruit wasted per month/year, relative to how many fruits were ordered.
It should be:
(Banana_wasted + Apple_wasted + Orange_wasted) / (Banana_ordered + Apple_Ordered + Orange_ordered)

So, for 3/21 (March 2021), it should be:
(4+6+1)/(5+9+5)*100 = 57.9%

I would like to do this for every month of the year.

Answer 1

Score: 2

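library(tidyverse)
library(lubridate) # for floor_date() and mdy(); attached by tidyverse only from 2.0.0
df %>%
  group_by(Date = floor_date(mdy(Date), "month")) %>%
  summarise(
    # ends_with("_wasted") rather than contains("wasted"): otherwise the new
    # `wasted` column is picked up too and inflates wasted_eaten
    wasted = sum(across(ends_with("_wasted"))) / sum(across(ends_with("_ordered"))),
    wasted_eaten = sum(across(ends_with("_wasted"))) / sum(across(ends_with("_eaten")))
  )
# A tibble: 3 x 3
  Date       wasted wasted_eaten
  <date>      <dbl>        <dbl>
1 2021-03-01  0.579         0.55
2 2022-04-01  0.286         0.3
3 2022-06-01  0.409         0.5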
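
As a cross-check that does not depend on dplyr, here is a minimal base R sketch of the same two ratios on the question's data. The helper `total()` is hypothetical (it is not part of the answer); it matches column names by suffix and ignores case, so that `Apple_Ordered` is counted with the other ordered columns.

```r
# The data frame from the question
df <- data.frame(
  Date = c("3/23/21", "4/11/22", "6/30/22"),
  Banana_wasted = c(4L, 2L, 5L), Apple_wasted = c(6L, 0L, 3L), Orange_wasted = c(1L, 4L, 1L),
  Banana_ordered = c(5L, 7L, 7L), Apple_Ordered = c(9L, 8L, 9L), Orange_ordered = c(5L, 6L, 6L),
  Banana_eaten = c(5L, 5L, 6L), Apple_eaten = c(7L, 7L, 4L), Orange_eaten = c(8L, 8L, 8L)
)

# Hypothetical helper: per-row sum of all columns whose name ends in `suffix`
total <- function(data, suffix) {
  keep <- grepl(paste0(suffix, "$"), names(data), ignore.case = TRUE)
  rowSums(data[keep])
}

# Month key such as "2021-03", then per-month totals
month   <- format(as.Date(df$Date, format = "%m/%d/%y"), "%Y-%m")
wasted  <- tapply(total(df, "_wasted"), month, sum)
ordered <- tapply(total(df, "_ordered"), month, sum)
eaten   <- tapply(total(df, "_eaten"), month, sum)

round(wasted / ordered, 3)  # 0.579, 0.286, 0.409
round(wasted / eaten, 3)    # 0.55, 0.3, 0.5
```

Each month here contains a single row, so the monthly totals equal the row totals; with several rows per month, `tapply()` would pool them before dividing.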

Answer 2

Score: 1

library(dplyr)
library(lubridate)
df %>%
  mutate(Date = as.Date(Date, format = "%m/%d/%y"),
         pct_wasted = (Banana_wasted + Apple_wasted + Orange_wasted) / (Banana_ordered + Apple_Ordered + Orange_ordered) * 100) %>%
  group_by(year = year(Date), month = month(Date)) %>%
  summarize(avg_pct_wasted = mean(pct_wasted))
#> # A tibble: 3 × 3
#> # Groups:   year [2]
#>    year month avg_pct_wasted
#>   <dbl> <dbl>          <dbl>
#> 1  2021     3           57.9
#> 2  2022     4           28.6
#> 3  2022     6           40.9

Created on 2023-02-06 with reprex v2.0.2
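
One point to be aware of: averaging per-row percentages and dividing pooled totals give the same result only when each month has a single row, as in the sample data. The sketch below uses a hypothetical data frame `toy` (not from the question) with two orders of very different size in the same month to show how far the two can diverge.

```r
library(dplyr)
library(lubridate)

# Hypothetical data: two orders in March 2021 with very different sizes
toy <- data.frame(
  Date    = c("3/2/21", "3/23/21"),
  wasted  = c(1, 50),
  ordered = c(2, 1000)
)

res <- toy %>%
  mutate(Date = mdy(Date)) %>%
  group_by(year = year(Date), month = month(Date)) %>%
  summarize(
    mean_of_pcts = mean(wasted / ordered) * 100,      # average of 50% and 5%
    pooled_pct   = sum(wasted) / sum(ordered) * 100,  # 51 wasted out of 1002 ordered
    .groups = "drop"
  )

res$mean_of_pcts  # 27.5
res$pooled_pct    # about 5.09
```

Which of the two is appropriate depends on whether every order should count equally or every fruit should; the formula in the question reads as the pooled version.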

Answer 3

Score: 0

Pivot longer to get single wasted and ordered columns across all fruits; use lubridate::floor_date() and mdy() to get months from Date; group by month; then sum and divide to get your percentages:

library(dplyr)
library(tidyr)
library(lubridate)
dat %>% # dat is the data frame from the question (called df there)
  rename(Apple_ordered = Apple_Ordered) %>% # for consistent capitalization
  pivot_longer(
    Banana_wasted:Orange_eaten,
    names_to = c("Fruit", ".value"),
    names_sep = "_"
  ) %>%
  group_by(month = floor_date(mdy(Date), "month")) %>%
  summarize(pct_wasted = sum(wasted) / sum(ordered)) %>%
  ungroup()
# # A tibble: 3 × 2
#   month      pct_wasted
#   <date>          <dbl>
# 1 2021-03-01      0.579
# 2 2022-04-01      0.286
# 3 2022-06-01      0.409
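
To make the pivot step concrete, here is a small sketch on a hypothetical one-row data frame `one_row` with only two fruits (column names follow the question, with `Apple_ordered` already lowercased). The `".value"` entry in `names_to` tells `pivot_longer()` to turn the part of each name after the underscore into its own column, so every fruit becomes one row.

```r
library(tidyr)

# Hypothetical one-row input with two fruits
one_row <- data.frame(
  Date = "3/23/21",
  Banana_wasted = 4L, Apple_wasted = 6L,
  Banana_ordered = 5L, Apple_ordered = 9L
)

long <- pivot_longer(
  one_row,
  -Date,                             # pivot every column except Date
  names_to = c("Fruit", ".value"),   # "Banana_wasted" -> Fruit = "Banana", column = wasted
  names_sep = "_"
)

names(long)                          # "Date" "Fruit" "wasted" "ordered"
sum(long$wasted) / sum(long$ordered) # 10 / 14
```

Once the data is in this shape, the percentage no longer needs the fruit names spelled out, which is why the answer can simply use `sum(wasted) / sum(ordered)`.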

If you prefer character labels, use strftime() instead of floor_date(), and scales::percent() for the percentages:

library(scales)
dat %>%
  rename(Apple_ordered = Apple_Ordered) %>%
  pivot_longer(
    Banana_wasted:Orange_eaten,
    names_to = c("Fruit", ".value"),
    names_sep = "_"
  ) %>%
  group_by(month = strftime(mdy(Date), "%B %Y")) %>%
  summarize(pct_wasted = percent(sum(wasted) / sum(ordered), accuracy = 0.1)) %>%
  ungroup()
# # A tibble: 3 × 2
#   month      pct_wasted
#   <chr>      <chr>
# 1 April 2022 28.6%
# 2 June 2022  40.9%
# 3 March 2021 57.9%
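
The character labels above sort alphabetically (April, June, March) rather than chronologically. One possible way around that, sketched here under the assumption that the totals are already in single `wasted` and `ordered` columns, is to group by the date from `floor_date()` and build the label afterwards. The data frame `totals` is hypothetical; its values are the per-row sums from the question's data.

```r
library(dplyr)
library(lubridate)

# Hypothetical pre-summed totals (row sums of the question's wasted/ordered columns)
totals <- data.frame(
  Date    = c("3/23/21", "4/11/22", "6/30/22"),
  wasted  = c(11, 6, 9),
  ordered = c(19, 21, 22)
)

out <- totals %>%
  group_by(month_start = floor_date(mdy(Date), "month")) %>%
  summarize(pct_wasted = sum(wasted) / sum(ordered), .groups = "drop") %>%
  arrange(month_start) %>%                         # chronological order
  mutate(month = format(month_start, "%Y-%m"))     # label built after sorting

out$month  # "2021-03" "2022-04" "2022-06"
```

The `"%Y-%m"` format is used because it does not depend on the locale; `"%B %Y"` would work the same way for display as long as the sorting is done on `month_start`.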
