How to multiply the coefficients for each group and then calculate the percentage of the original value in R

Question

I have a dataset like the toy dataset below:

    s <- structure(list(
      month_id = c(202306L, 202305L, 202307L, 202305L, 202306L, 202307L),
      MDM_Key = c(1L, 1L, 1L, 2L, 2L, 2L),
      sale_count = c(NA, 19161L, NA, 17726L, NA, NA),
      iek_max_discount = c(0.5356, 0.5256, 0.5456, 0.559, 0.569, 0.589)
    ), class = "data.frame", row.names = c(NA, -6L))

and another dataset with coefficients:

    cf <- structure(list(MDM_Key = 1:2, ML.coef = c(1.46, 1.67)),
                    class = "data.frame", row.names = c(NA, -2L))

`iek_max_discount` is a percentage stored as a fraction: 0.5356 means 53.56%.

For each `MDM_Key` (the grouping variable in `cf`), I need to multiply its coefficient by the corresponding `iek_max_discount` value for each month after the current month in `s`.

For example, suppose the current month is May (`month_id = 202305`). For `MDM_Key = 1`, we take the coefficient 1.46 and multiply it by the June value (`month_id = 202306`): 1.46 * 53.56 = 78.1976, truncated to 78.

The result is also a percentage. Next, calculate 78% of the initial `sale_count`, which is available only for the current month: 19161 / 100 * 78 = 14945.58, truncated to 14945.

That amount is then added to 19161, and the sum (34106) is inserted into the June row.

We do the same for July (`month_id = 202307`): 1.46 * 54.56 = 79.6576, truncated to 79, and the proportion is 19161 / 100 * 79 = 15137.19, truncated to 15137.

We add that value to 19161 and insert the sum (34298) into the July row.
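The arithmetic described above can be checked with a few lines of base R. This is only a sketch for `MDM_Key = 1`; the variable names (`coef`, `base_sales`, `discount`) are illustrative, and the use of `floor()` is an assumption inferred from the question's numbers (78.1976 becomes 78, 79.6576 becomes 79).

```r
# Values for MDM_Key = 1, copied from the toy data in the question
coef       <- 1.46
base_sales <- 19161                       # sale_count in the current month (202305)
discount   <- c(`202306` = 0.5356, `202307` = 0.5456)

# Assumption: results are truncated, as the question's figures suggest
percent <- floor(coef * discount * 100)       # coefficient times discount, in percent
prop    <- floor(base_sales / 100 * percent)  # that percentage of the current sales
final   <- base_sales + prop                  # value to insert into the future month

data.frame(month_id = names(discount), percent, prop, final, row.names = NULL)
```

The resulting `percent`, `prop` and `final` columns can be compared directly against the desired output table in the question.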

For example, the desired output for `MDM_Key = 1`:

    month_id  MDM_Key  sale_count  iek_max_discount  percent  prop   final
    202306    1                    0.5356            78       14945  34106
    202305    1        19161       0.5256
    202307    1                    0.5456            79       15137  34298

The same procedure is then applied to `MDM_Key = 2` (and to any other `MDM_Key`, if present).

What is the best and easiest way to do such a sequence of arithmetic operations?

Answer 1

Score: 1

Assuming each *MDM_Key* always has a first month with a *sale_count*:

    library(dplyr)

    merge(s, cf) %>%
      # sort so that the first row within each key is the earliest month
      arrange(MDM_Key, month_id) %>%
      # blank out the coefficient on the row that already has sales,
      # then compute everything relative to the first month's sale_count
      mutate(ML.coef = if_else(!is.na(sale_count), NA, ML.coef),
             percent = iek_max_discount * ML.coef * 100,
             prop = (percent / 100) * sale_count[1],
             final = sale_count[1] + prop, .by = MDM_Key)
    #>   MDM_Key month_id sale_count iek_max_discount ML.coef percent     prop    final
    #> 1       1   202305      19161           0.5256      NA      NA       NA       NA
    #> 2       1   202306         NA           0.5356    1.46 78.1976 14983.44 34144.44
    #> 3       1   202307         NA           0.5456    1.46 79.6576 15263.19 34424.19
    #> 4       2   202305      17726           0.5590      NA      NA       NA       NA
    #> 5       2   202306         NA           0.5690    1.67 95.0230 16843.78 34569.78
    #> 6       2   202307         NA           0.5890    1.67 98.3630 17435.83 35161.83
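The answer above depends on its stated assumption that the earliest month of every key carries a `sale_count`. A minimal base R sketch of how that assumption might be verified before running the pipeline is shown here; the helper names (`s_sorted`, `first_rows`, `first_has_sales`) are illustrative and not part of the original answer.

```r
# Toy data from the question (only the columns needed for the check)
s <- data.frame(
  month_id   = c(202306L, 202305L, 202307L, 202305L, 202306L, 202307L),
  MDM_Key    = c(1L, 1L, 1L, 2L, 2L, 2L),
  sale_count = c(NA, 19161L, NA, 17726L, NA, NA)
)

# Sort by key and month, then keep the first (earliest) row of each key
s_sorted   <- s[order(s$MDM_Key, s$month_id), ]
first_rows <- s_sorted[!duplicated(s_sorted$MDM_Key), ]

# TRUE for every key whose earliest month has a non-missing sale_count
first_has_sales <- setNames(!is.na(first_rows$sale_count), first_rows$MDM_Key)
first_has_sales

# Stop early if any key violates the assumption
stopifnot(all(first_has_sales))
```

If the check fails for some key, `sale_count[1]` would be `NA` for that group and every derived column would be `NA` as well.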



Answer 2

Score: 1

``` r
library(dplyr)

# the month whose sale_count is used as the base value
cur_month <- as.integer(202305)

s %>%
  right_join(cf, by = "MDM_Key") %>%
  # percent is left NA for the current month; floor() matches the truncated
  # values in the desired output
  mutate(percent = floor(if_else(month_id == cur_month,
                                 NA, ML.coef * iek_max_discount * 100)),
         prop = floor(percent * sale_count[month_id == cur_month] / 100),
         final = sale_count[month_id == cur_month] + prop,
         .by = MDM_Key)
#>   month_id MDM_Key sale_count iek_max_discount ML.coef percent  prop final
#> 1   202306       1         NA           0.5356    1.46      78 14945 34106
#> 2   202305       1      19161           0.5256    1.46      NA    NA    NA
#> 3   202307       1         NA           0.5456    1.46      79 15137 34298
#> 4   202305       2      17726           0.5590    1.67      NA    NA    NA
#> 5   202306       2         NA           0.5690    1.67      95 16839 34565
#> 6   202307       2         NA           0.5890    1.67      98 17371 35097
```
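The answer above hard-codes `cur_month`. If the current month should instead be derived from the data, one possible approach (an assumption, not something the question specifies) is to take, for each key, the latest month that has a non-missing `sale_count`. A base R sketch, with `observed` and `cur_month_by_key` as illustrative names:

```r
# Toy data from the question (only the columns needed here)
s <- data.frame(
  month_id   = c(202306L, 202305L, 202307L, 202305L, 202306L, 202307L),
  MDM_Key    = c(1L, 1L, 1L, 2L, 2L, 2L),
  sale_count = c(NA, 19161L, NA, 17726L, NA, NA)
)

# Keep only rows where sales were actually observed
observed <- s[!is.na(s$sale_count), ]

# Latest observed month per key, returned as a vector named by MDM_Key
cur_month_by_key <- tapply(observed$month_id, observed$MDM_Key, max)
cur_month_by_key
```

For the toy data both keys resolve to 202305, which agrees with the hard-coded value.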

Answer 3

Score: 0

It was a little hard to understand what you were trying to do. Here is my best guess:

    # merge them together to have the key in the same row
    scf <- merge(s, cf, by = "MDM_Key")

    # split into a list with one data frame per unique MDM_Key
    s_by_key <- split(scf, scf$MDM_Key)

    # loop through each element of the list
    for (i in seq_along(s_by_key)) {

      # trunc to remove the decimals
      s_by_key[[i]]$percent <- trunc(s_by_key[[i]]$iek_max_discount * s_by_key[[i]]$ML.coef * 100)

      # get the row of the most recent month that has sales;
      # which.max() indexes into the subset, so map it back to a row number
      months_with_sales <- which(!is.na(s_by_key[[i]]$sale_count))
      latest_month_with_sales <- months_with_sales[which.max(s_by_key[[i]]$month_id[months_with_sales])]

      last_sales <- s_by_key[[i]][latest_month_with_sales, "sale_count"]
      s_by_key[[i]]$prop <- trunc(last_sales / 100 * s_by_key[[i]]$percent)
      s_by_key[[i]]$final <- last_sales + s_by_key[[i]]$prop

      # and set those rows to NA to be like the desired output
      s_by_key[[i]][latest_month_with_sales, c("percent", "prop", "final")] <- NA

    }

    output <- do.call(rbind, s_by_key)
    output
    #    MDM_Key month_id sale_count iek_max_discount ML.coef percent  prop final
    #1.1       1   202306         NA           0.5356    1.46      78 14945 34106
    #1.2       1   202305      19161           0.5256    1.46      NA    NA    NA
    #1.3       1   202307         NA           0.5456    1.46      79 15137 34298
    #2.4       2   202305      17726           0.5590    1.67      NA    NA    NA
    #2.5       2   202306         NA           0.5690    1.67      95 16839 34565
    #2.6       2   202307         NA           0.5890    1.67      98 17371 35097
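One detail worth checking in a loop like the one above is how `which.max()` behaves on a subset: it returns a position within the subset, not a row number in the full data. The sketch below uses the `MDM_Key = 1` rows in the order shown in the output; the names `rows_with_sales`, `pos_in_subset` and `row_in_data` are illustrative.

```r
# Rows for MDM_Key = 1, in the order they appear in the merged data
month_id   <- c(202306L, 202305L, 202307L)
sale_count <- c(NA, 19161L, NA)

rows_with_sales <- which(!is.na(sale_count))             # row 2 has sales
pos_in_subset   <- which.max(month_id[rows_with_sales])  # position within the subset
row_in_data     <- rows_with_sales[pos_in_subset]        # mapped back to the full vector

sale_count[pos_in_subset]  # indexes row 1, which has no sales
sale_count[row_in_data]    # indexes row 2, the row that holds the sales value
```

Using the subset position directly as a row index only works when the row with sales happens to come first, as it does for `MDM_Key = 2` in the toy data.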

huangapple
  • Posted on 2023-05-06 19:03:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76188516.html