Determine how much each group is contributing to a total amount in R
Question
I have a dataset similar to the following:
df_initial <- data.frame(total = c(1000, 800),
                         group1 = c(500, 100),
                         group2 = c(0, 600),
                         group3 = c(600, 0),
                         group4 = c(100, 200))
The dataset has multiple fields, each representing a different group. Each row has a total amount and amounts associated with each group. I would like to determine how much each group is contributing to the total amount by allocating the amounts associated with each group in a specific order - i.e., I want to allocate the amounts associated with group1 first, group2 second, etc.
In other words, I'm looking to create df_new, which shows that for the total amount of 1000, group1 contributes 500, group2 contributes 0, group3 contributes 500, and group4 contributes 0.
df_new <- data.frame(total = c(1000, 800),
                     group1_contribution = c(500, 100),
                     group2_contribution = c(0, 600),
                     group3_contribution = c(500, 0),
                     group4_contribution = c(0, 100))
I'd greatly appreciate any advice or ideas about how to do this in R.
Answer 1
Score: 2
I think a basic loop would work well here:
# start with the full total as the unallocated balance
remaining <- df_initial[[1]]
df_output <- data.frame(total = remaining)
for (i in 2:ncol(df_initial)) {
  # a group contributes at most what is still unallocated
  contrib <- pmin(df_initial[[i]], remaining)
  df_output[, names(df_initial)[i]] <- contrib
  remaining <- remaining - contrib
}
df_output
#   total group1 group2 group3 group4
# 1  1000    500      0    500      0
# 2   800    100    600      0    100
It loops over the columns while keeping track of the "remaining balance". At each step, the group's contribution is the smaller of the remaining balance and the column value, and that contribution is then deducted from the balance.
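The same loop can be wrapped in a small helper so that the allocation order is explicit and the output columns carry the _contribution suffix from the question. This is only a sketch: the function name allocate_in_order and its arguments are made up for illustration, and it assumes all amounts are non-negative numbers.

```r
# Hypothetical helper (not part of the answer): allocate the total across
# the group columns in the order given by `group_cols`.
allocate_in_order <- function(df, total_col = "total",
                              group_cols = setdiff(names(df), total_col)) {
  remaining <- df[[total_col]]
  out <- df[total_col]  # one-column data frame holding the totals
  for (g in group_cols) {
    # each group contributes at most what is still unallocated
    contrib <- pmin(df[[g]], remaining)
    out[[paste0(g, "_contribution")]] <- contrib
    remaining <- remaining - contrib
  }
  out
}

df_initial <- data.frame(total = c(1000, 800),
                         group1 = c(500, 100),
                         group2 = c(0, 600),
                         group3 = c(600, 0),
                         group4 = c(100, 200))

df_new <- allocate_in_order(df_initial)
df_new
```

Passing a different group_cols vector, for example c("group4", "group1", "group2", "group3"), should change which group gets allocated first without touching the data frame itself.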
Answer 2
Score: 1
(groups <- grep("^group", names(df_initial), value = TRUE))
# [1] "group1" "group2" "group3" "group4"
df_new <- df_initial
df_new[, groups] <- t(
  apply(df_initial[, c("total", groups)], 1, function(z) {
    # balance left before each group is allocated, floored at 0
    remaining <- pmax(0, Reduce(`-`, z[-1], init = z[1], accumulate = TRUE))[-length(z)]
    pmin(z[-1], remaining)
  })
)
names(df_new)[2:5] <- paste0(names(df_new)[2:5], "_contribution")
df_new
#   total group1_contribution group2_contribution group3_contribution group4_contribution
# 1  1000                 500                   0                 500                   0
# 2   800                 100                 600                   0                 100
Note: the use of apply works here because all referenced columns are numeric; mixing in columns of other classes (e.g., character) will break this, since apply converts the frame to a matrix first. This is why I explicitly use groups to select all group# columns, and c("total", groups) to select the total and all group# columns.
The renaming step (the names(df_new)[2:5] assignment) is merely to align the output with your df_new; without it the frame still has all of the correct values, with columns named group1, etc.
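To see what the function inside apply is doing, it may help to step through it for a single row. The sketch below hard-codes the first row of df_initial as a named vector; the intermediate names balance, available, and contribution are introduced here only for illustration.

```r
# First row of df_initial: the total followed by the four group amounts
z <- c(total = 1000, group1 = 500, group2 = 0, group3 = 600, group4 = 100)

# Running balance: start at the total and subtract each group in turn
balance <- Reduce(`-`, z[-1], init = unname(z[1]), accumulate = TRUE)
balance
# [1] 1000  500  500 -100 -200

# Floor at 0 and drop the last element, leaving the amount still
# available just before each group is allocated
available <- pmax(0, balance)[-length(z)]
available
# [1] 1000  500  500    0

# Each group contributes the smaller of its own amount and what is available
contribution <- pmin(z[-1], available)
contribution
# group1 group2 group3 group4 
#    500      0    500      0 
```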
Answer 3
Score: 1
library(dplyr)
library(tidyr)
df_initial |>
  pivot_longer(starts_with("group"),
               names_pattern = "(\\d+)") |>
  mutate(remaining = purrr::accumulate(value, `-`, .init = first(total))[-n()],
         value = ifelse(remaining < 0, value + remaining, pmin(value, remaining)),
         value = replace(value, value < 0, 0),
         .by = total) |>
  pivot_wider(names_from = name,
              names_glue = "group{name}_contribution",
              id_cols = total)