2023-08-04 01:41:02

Determine how much each group is contributing to a total amount in R

Question

I have a dataset similar to the following:

df_initial <- data.frame(total = c(1000, 800),
                         group1 = c(500, 100),
                         group2 = c(0, 600),
                         group3 = c(600, 0),
                         group4 = c(100, 200))

The dataset has multiple fields, each representing a different group. Each row has a total amount and amounts associated with each group. I would like to determine how much each group is contributing to the total amount by allocating the amounts associated with each group in a specific order - i.e., I want to allocate the amounts associated with group1 first, group2 second, etc.

In other words, I'm looking to create df_new, which shows that for the total amount of 1000, group1 contributes 500, group2 contributes 0, group3 contributes 500, and group4 contributes 0.

df_new <- data.frame(total = c(1000, 800),
                     group1_contribution = c(500, 100), 
                     group2_contribution = c(0, 600),
                     group3_contribution = c(500, 0), 
                     group4_contribution = c(0, 100))

I'd greatly appreciate any advice or ideas about how to do this in R.

Answer 1

Score: 2

I think a basic loop would work well here:

# Start with each row's total as the amount still to be allocated
remaining <- df_initial[[1]]
df_output <- data.frame(total = remaining)
for (i in 2:ncol(df_initial)) {
  # A group contributes its own amount or whatever is left, whichever is smaller
  contrib <- pmin(df_initial[[i]], remaining)
  df_output[, names(df_initial)[i]] <- contrib
  remaining <- remaining - contrib
}
df_output
#   total group1 group2 group3 group4
# 1  1000    500      0    500      0
# 2   800    100    600      0    100

It loops over the columns, keeping track of the "remaining balance". At each step it deducts the smaller of the remaining amount and the column value.
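
The same loop can be wrapped in a function and checked against the df_new from the question. This is only a sketch: allocate_in_order and its total_col argument are hypothetical names that do not appear in the answer, and it assumes every column other than the total is a group column, taken in column order.

```r
# Hypothetical helper (not from the answer): allocate each row's total
# across the remaining columns, in column order.
allocate_in_order <- function(df, total_col = "total") {
  group_cols <- setdiff(names(df), total_col)
  remaining <- df[[total_col]]      # amount still to be allocated, per row
  out <- df[total_col]              # one-column data frame holding the totals
  for (g in group_cols) {
    # A group contributes its own amount or what is left, whichever is smaller
    contrib <- pmin(df[[g]], remaining)
    out[[paste0(g, "_contribution")]] <- contrib
    remaining <- remaining - contrib
  }
  out
}

df_initial <- data.frame(total = c(1000, 800),
                         group1 = c(500, 100),
                         group2 = c(0, 600),
                         group3 = c(600, 0),
                         group4 = c(100, 200))

result <- allocate_in_order(df_initial)
result
```

Because the column names get the _contribution suffix inside the function, the result can be compared directly with df_new using all.equal().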

Answer 2

Score: 1

(groups <- grep("^group", names(df_initial), value = TRUE))
# [1] "group1" "group2" "group3" "group4"
df_new <- df_initial
df_new[,groups] <- t(
  apply(df_initial[,c("total", groups)], 1, function(z) pmin(z[-1], pmax(0, Reduce(`-`, z[-1], init=z[1], accumulate=TRUE))[-length(z)]))
)
names(df_new)[2:5] <- paste0(names(df_new)[2:5], "_contribution")
df_new
#   total group1_contribution group2_contribution group3_contribution group4_contribution
# 1  1000                 500                   0                 500                   0
# 2   800                 100                 600                   0                 100

Note: apply() works here because all referenced columns are numeric. apply() converts its input to a matrix first, so mixing in a column of another class (e.g., character) would coerce every value and break the arithmetic.
This is why I explicitly use groups to select all group# columns, and c("total", groups) to select the total and all group# columns.

The names(df_new)[2:5] <- paste0(...) step is merely to align the output with your df_new; without it the frame still has all of the correct values, and the columns are named group1, etc.
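
The coercion that the note above warns about can be seen directly with a small example. This is a sketch for illustration only: df_mixed and its label column are hypothetical and not part of the question's data.

```r
df_mixed <- data.frame(total = c(1000, 800),
                       label = c("a", "b"),   # hypothetical non-numeric column
                       group1 = c(500, 100))

# apply() converts the data frame to a matrix before calling the function.
# With a character column present, every value in a row arrives as character.
row_types <- apply(df_mixed, 1, function(z) typeof(z))
row_types
# "character" "character"

# Selecting only the numeric columns keeps the rows numeric.
numeric_types <- apply(df_mixed[, c("total", "group1")], 1, function(z) typeof(z))
numeric_types
# "double" "double"
```

Selecting the columns by name before calling apply(), as the answer does, avoids the problem without changing the rest of the code.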

Answer 3

Score: 1

library(dplyr)
library(tidyr)
df_initial |>
  pivot_longer(starts_with("group"),
               names_pattern = "(\\d+)") |>
  # remaining = amount left before each group, i.e. total minus the earlier groups
  mutate(remaining = head(purrr::accumulate(value, `-`, .init = first(total)), -1),
         # a group contributes its own amount, capped at what is left (never below 0)
         value = pmin(value, pmax(remaining, 0)),
         .by = total) |>
  pivot_wider(names_from = name,
              names_glue = "group{name}_contribution",
              id_cols = total)
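
The pipeline above groups by total (.by = total), so two rows that happen to share the same total would be pooled into one group. A possible variant, sketched here under assumptions, groups by a row id instead. The row_id column and the df_dup data are hypothetical, the code assumes dplyr 1.1.0 or later (for .by), and it swaps purrr::accumulate for a plain cumsum().

```r
library(dplyr)
library(tidyr)

# Hypothetical data: both rows have the same total
df_dup <- data.frame(total = c(1000, 1000),
                     group1 = c(500, 900),
                     group2 = c(600, 300))

df_dup_new <- df_dup |>
  mutate(row_id = row_number()) |>          # hypothetical per-row key
  pivot_longer(starts_with("group")) |>
  # remaining = total minus everything claimed by earlier groups in the same row
  mutate(remaining = total - (cumsum(value) - value),
         value = pmin(value, pmax(remaining, 0)),
         .by = row_id) |>
  pivot_wider(names_from = name,
              names_glue = "{name}_contribution",
              id_cols = c(row_id, total))

df_dup_new
```

The row_id column stays in the result; it can be dropped afterwards with select(-row_id) if it is not needed.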
