Determine how much each group is contributing to a total amount in R

Question

I have a dataset similar to the following:

    df_initial <- data.frame(total = c(1000, 800),
                             group1 = c(500, 100),
                             group2 = c(0, 600),
                             group3 = c(600, 0),
                             group4 = c(100, 200))

The dataset has multiple fields, each representing a different group. Each row has a total amount and amounts associated with each group. I would like to determine how much each group is contributing to the total amount by allocating the amounts associated with each group in a specific order - i.e., I want to allocate the amounts associated with group1 first, group2 second, etc.

In other words, I'm looking to create df_new, which shows that for the total amount of 1000, group1 contributes 500, group2 contributes 0, group3 contributes 500, and group4 contributes 0.

    df_new <- data.frame(total = c(1000, 800),
                         group1_contribution = c(500, 100),
                         group2_contribution = c(0, 600),
                         group3_contribution = c(500, 0),
                         group4_contribution = c(0, 100))

I'd greatly appreciate any advice or ideas about how to do this in R.

Answer 1

Score: 2

I think a basic loop would work well here:

    # Start with the full total as the balance still to be allocated
    remaining <- df_initial[[1]]
    df_output <- data.frame(total = remaining)
    for (i in 2:ncol(df_initial)) {
      # A group contributes at most what is left of the total
      contrib <- pmin(df_initial[[i]], remaining)
      df_output[, names(df_initial)[i]] <- contrib
      remaining <- remaining - contrib
    }
    df_output
    #   total group1 group2 group3 group4
    # 1  1000    500      0    500      0
    # 2   800    100    600      0    100

It loops over the group columns in order, keeping track of the "remaining balance". At each step it allocates the smaller of the remaining balance and the column value, then deducts that amount from the balance.
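If the same allocation is needed for other data frames, the loop can be wrapped in a small function. The following is only a sketch: the function name `allocate_in_order`, its arguments, and the `_contribution` suffix (chosen to match the column names in the question's df_new) are illustrative assumptions, not part of the original answer.

```r
# Hypothetical helper wrapping the loop above; name and arguments are illustrative.
allocate_in_order <- function(df, total_col = "total",
                              group_cols = setdiff(names(df), total_col),
                              suffix = "_contribution") {
  remaining <- df[[total_col]]          # balance still to be allocated, per row
  out <- df[total_col]                  # keep the total column as a data frame
  for (col in group_cols) {
    contrib <- pmin(df[[col]], remaining)   # cannot take more than what is left
    out[[paste0(col, suffix)]] <- contrib
    remaining <- remaining - contrib
  }
  out
}

# Example data copied from the question
df_initial <- data.frame(total  = c(1000, 800),
                         group1 = c(500, 100),
                         group2 = c(0, 600),
                         group3 = c(600, 0),
                         group4 = c(100, 200))

df_new <- allocate_in_order(df_initial)
df_new
```

The order of `group_cols` determines the allocation priority, so passing the columns in a different order changes which groups are filled first.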

Answer 2

Score: 1

    (groups <- grep("^group", names(df_initial), value = TRUE))
    # [1] "group1" "group2" "group3" "group4"
    df_new <- df_initial
    df_new[, groups] <- t(
      apply(df_initial[, c("total", groups)], 1, function(z) {
        pmin(z[-1], pmax(0, Reduce(`-`, z[-1], init = z[1], accumulate = TRUE))[-length(z)])
      })
    )
    names(df_new)[2:5] <- paste0(names(df_new)[2:5], "_contribution")
    df_new
    #   total group1_contribution group2_contribution group3_contribution group4_contribution
    # 1  1000                 500                   0                 500                   0
    # 2   800                 100                 600                   0                 100

Note: apply works here because all referenced columns are numeric; apply converts the frame to a matrix, so mixing in columns of another class (e.g., strings) will break this. That is why I explicitly use groups to select all group# columns, and c("total", groups) to select the total and all group# columns.

The renaming step (names(df_new)[2:5] <- ...) is only there to align the output with your df_new. Without it, the frame still has all of the correct values, and the columns are named group1, etc.
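The function passed to apply is dense, so it may help to unpack it for a single row. The sketch below uses the first row of the question's data; the intermediate names `balance` and `available` are introduced here purely for illustration.

```r
# First row of the question's data: the total followed by the group amounts
z <- c(total = 1000, group1 = 500, group2 = 0, group3 = 600, group4 = 100)

# Running balance: start at the total and subtract each group amount in turn.
# Values: 1000, 500, 500, -100, -200
balance <- Reduce(`-`, z[-1], init = z[1], accumulate = TRUE)

# Floor at zero and drop the final entry, leaving the balance
# available *before* each group. Values: 1000, 500, 500, 0
available <- pmax(0, balance)[-length(z)]

# Each group contributes the smaller of its amount and what is available.
# Values: 500, 0, 500, 0
contribution <- pmin(z[-1], available)
contribution
```

Note that the running balance subtracts the raw group amounts rather than the capped contributions; for non-negative amounts this gives the same result once the balance is floored at zero.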

Answer 3

Score: 1

    # Requires dplyr >= 1.1.0 (for .by) and the purrr package
    library(dplyr)
    library(tidyr)

    df_initial |>
      # One row per (total, group); name holds the group number
      pivot_longer(starts_with("group"),
                   names_pattern = "(\\d+)") |>
      # remaining: balance available before each group is allocated.
      # head(..., -1) drops the final balance left after the last group.
      mutate(remaining = head(purrr::accumulate(value, `-`, .init = first(total)), -1),
             # Take the smaller of the group amount and the non-negative balance
             value = pmin(value, pmax(remaining, 0)),
             # Grouping by total assumes each row has a distinct total
             .by = total) |>
      pivot_wider(names_from = name,
                  names_glue = "group{name}_contribution",
                  id_cols = total)
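The running balance accumulated in the pipeline above is the total minus the sum of the earlier group amounts, so the result can be cross-checked in base R with cumulative sums. This is a sketch under the assumption that all group amounts are non-negative; the names `g`, `claimed_before`, `remaining`, and `contribution` are illustrative.

```r
# Example data copied from the question
df_initial <- data.frame(total  = c(1000, 800),
                         group1 = c(500, 100),
                         group2 = c(0, 600),
                         group3 = c(600, 0),
                         group4 = c(100, 200))

# Group amounts as a matrix, one row per record
g <- as.matrix(df_initial[, -1])

# Amount requested by earlier groups: row-wise running sum, excluding the group itself
claimed_before <- t(apply(g, 1, cumsum)) - g

# Balance available to each group, floored at zero
remaining <- pmax(df_initial$total - claimed_before, 0)

# Each group contributes the smaller of its amount and the available balance
contribution <- pmin(g, remaining)
contribution
# row 1: 500 0 500 0; row 2: 100 600 0 100
```

Because this version works on row positions rather than grouping by total, it does not depend on the totals being distinct.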

huangapple
  • Posted on August 4, 2023, 01:41:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76830436.html