Determine how much each group is contributing to a total amount in R

Question

I have a dataset similar to the following:

df_initial <- data.frame(total = c(1000, 800),
                         group1 = c(500, 100),
                         group2 = c(0, 600),
                         group3 = c(600, 0),
                         group4 = c(100, 200))

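The dataset has multiple fields, each representing a different group. Each row has a total amount and an amount associated with each group. I would like to determine how much each group contributes to the total by allocating the group amounts in a specific order - i.e., I want to allocate the amount associated with group1 first, group2 second, and so on, until the total is used up.

In other words, I'm looking to create df_new, which shows that for the total amount of 1000, group1 contributes 500, group2 contributes 0, group3 contributes 500, and group4 contributes 0.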

df_new <- data.frame(total = c(1000, 800), 
                     group1_contribution = c(500, 100), 
                     group2_contribution = c(0, 600),
                     group3_contribution = c(500, 0), 
                     group4_contribution = c(0, 100))

I'd greatly appreciate any advice or ideas about how to do this in R.

Answer 1

Score: 2

I think a basic loop would work well here.

remaining <- df_initial[[1]]
df_output <- data.frame(total=remaining)
for (i in 2:ncol(df_initial)) {
  contrib <- pmin(df_initial[[i]], remaining)
  df_output[,names(df_initial)[i]] <- contrib
  remaining <- remaining - contrib
}
df_output
#   total group1 group2 group3 group4
# 1  1000    500      0    500      0
# 2   800    100    600      0    100

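It loops over the group columns in order while keeping track of the "remaining balance". At each step it takes the smaller of the remaining balance and the column value as that group's contribution, then deducts that contribution from the balance.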
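As a rough sketch of the same idea, the loop can be wrapped in a function. The name allocate_in_order and its total_col argument are made up for illustration and are not part of the answer; the sketch assumes every column other than the total is a non-negative group amount, listed in allocation order.

```r
# Hypothetical helper: same loop as above, wrapped in a function.
# Assumes all columns except `total_col` are non-negative group amounts,
# already in the order they should be allocated.
allocate_in_order <- function(df, total_col = "total") {
  group_cols <- setdiff(names(df), total_col)
  remaining <- df[[total_col]]          # balance still to be allocated, per row
  out <- df[total_col]                  # start the result with the total column
  for (col in group_cols) {
    contrib <- pmin(df[[col]], remaining)            # cannot exceed the balance
    out[[paste0(col, "_contribution")]] <- contrib
    remaining <- remaining - contrib
  }
  out
}

df_initial <- data.frame(total = c(1000, 800),
                         group1 = c(500, 100),
                         group2 = c(0, 600),
                         group3 = c(600, 0),
                         group4 = c(100, 200))
result <- allocate_in_order(df_initial)

# A row where an early group already exhausts the total:
# later groups should then contribute nothing.
overshoot <- allocate_in_order(
  data.frame(total = 1000, group1 = 500, group2 = 600, group3 = 300)
)
```

Here result should match the df_new from the question, and in overshoot the third group gets 0 because the first two groups have already used up the 1000.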

Answer 2

Score: 1

(groups <- grep("^group", names(df_initial), value = TRUE))
# [1] "group1" "group2" "group3" "group4"
df_new <- df_initial
df_new[,groups] <- t(
  apply(df_initial[,c("total", groups)], 1, function(z) pmin(z[-1], pmax(0, Reduce(`-`, z[-1], init=z[1], accumulate=TRUE))[-length(z)]))
)
names(df_new)[2:5] <- paste0(names(df_new)[2:5], "_contribution")
df_new
#   total group1_contribution group2_contribution group3_contribution group4_contribution
# 1  1000                 500                   0                 500                   0
# 2   800                 100                 600                   0                 100

Note: the use of apply works here because all referenced columns are numeric; mixing in other classes (e.g., character) will break this. This is why I explicitly use groups to indicate all group# columns, and c("total", groups) to indicate the total and all group# columns.

The renaming step (the names(df_new)[2:5] <- ... line) is merely to align the output with your df_new; without it the frame still has all of the correct values, and the columns are named group1, etc.
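To illustrate the note about column classes, here is a small sketch with made-up data (the label column and the df_mixed name are assumptions for the example, not part of the question). apply converts the data frame to a matrix before iterating over rows, so a single character column makes every value in the row a character string.

```r
# Illustrative data only; the "label" column is a made-up example.
df_mixed <- data.frame(total = c(1000, 800),
                       group1 = c(500, 100),
                       label = c("a", "b"))

# apply() coerces the data frame to a matrix first, so with a character
# column present each row arrives in the function as a character vector.
row_classes_mixed <- apply(df_mixed, 1, class)

# Restricting to the numeric columns keeps each row numeric,
# which is what pmin()/pmax() arithmetic needs.
row_classes_numeric <- apply(df_mixed[, c("total", "group1")], 1, class)
```

With the character column included, arithmetic inside the applied function would fail or give wrong results, which is why the answer selects the columns explicitly.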

Answer 3

Score: 1

library(dplyr)
library(tidyr)

df_initial |>
  pivot_longer(starts_with("group"),
               names_pattern = "(\\d+)") |>
  mutate(remaining = purrr::accumulate(value, `-`, .init = first(total))[-n()],
         # cap each contribution at the non-negative balance left before it
         value = pmin(value, pmax(remaining, 0)),
         .by = total) |>
  pivot_wider(names_from = name,
              names_glue = "group{name}_contribution",
              id_cols = total)

huangapple
  • Posted on August 4, 2023, 01:41:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76830436.html