如何将变量值更新为当前值(当前值 * 100 / colsums())

huangapple go评论71阅读模式
英文:

How to update variable value to current value (current value * 100/ colsums())

问题

我有一个如下的数据框(df):

如何将变量值更新为当前值(当前值 * 100 / colsums())

我想要执行以下两个步骤:

  1. 如果列名的第一个字母与Grade中的值匹配,并且当前值为NA,则将其替换为0。
  2. 将A1:B2范围的值更新为当前值(当前值 * 100 / colsums())。

如何实现这个目标?

df <- structure(list(Grade = c("A", "A", "B", "B"), Pass = c("Y", "N", 
"Y", "N"), A1 = c(7, 8, NA, NA), A2 = c(4, 5, NA, NA), A3 = c(9, 
NA, NA, NA), B1 = c(NA, NA, 8, NA), B2 = c(NA, NA, 3, 4)), row.names = c(NA, 
-4L), class = c("tbl_df", "tbl", "data.frame"))
英文:

I have a df like below.:

如何将变量值更新为当前值(当前值 * 100 / colsums())

I would like to do following two steps:

  1. if first letter of col name matched the value in Grade and current value is NA, then replace it as 0.
  2. update the value from A1:B2 as current value, (current value * 100/ colsums()).

How can I make this happen?

df &lt;- structure(list(Grade = c(&quot;A&quot;, &quot;A&quot;, &quot;B&quot;, &quot;B&quot;), Pass = c(&quot;Y&quot;, &quot;N&quot;, 
&quot;Y&quot;, &quot;N&quot;), A1 = c(7, 8, NA, NA), A2 = c(4, 5, NA, NA), A3 = c(9, 
NA, NA, NA), B1 = c(NA, NA, 8, NA), B2 = c(NA, NA, 3, 4)), row.names = c(NA, 
-4L), class = c(&quot;tbl_df&quot;, &quot;tbl&quot;, &quot;data.frame&quot;))

答案1

得分: 3

使用dplyr包,我们可以在case_when中使用across(如果您想要使用两个单独的ifelse,请查看我的旧编辑)。

library(dplyr)

df %>%
  mutate(across(A1:B2, ~case_when(str_extract(cur_column(), "^.") == Grade & is.na(.x) ~ "0 (0%)",
                                  is.na(.x) ~ NA,
                                  .default = paste0(.x, " (", round(.x * 100/(sum(.x, na.rm = T)), digits = 2), "%)"))))

A tibble: 4 × 7

Grade Pass A1 A2 A3 B1 B2

1 A Y 7 (46.67%) 4 (44.44%) 9 (100%) NA NA
2 A N 8 (53.33%) 5 (55.56%) 0 (0%) NA NA
3 B Y NA NA NA 8 (100%) 3 (42.86%)
4 B N NA NA NA 0 (0%) 4 (57.14%)


<hr>

### 更新

如果要使用`colnames`,您需要首先`pick`列。

```r
df %>%
  mutate(across(A1:B2, ~ifelse(str_extract(colnames(pick(.x)), "^.") == Grade & is.na(.x), 0, .x)), 
         across(A1:B2, ~ifelse(is.na(.x), NA, paste0(.x, " (", round(.x * 100/(sum(.x, na.rm = T)), digits = 2), "%)"))))
英文:

With the dplyr package, we can use across with case_when (check my old edits if you want to use two separate ifelse).

library(dplyr)

df |&gt; 
  mutate(across(A1:B2, ~case_when(str_extract(cur_column(), &quot;^.&quot;) == Grade &amp; is.na(.x) ~ &quot;0 (0%)&quot;,
                                  is.na(.x) ~ NA,
                                  .default = paste0(.x, &quot; (&quot;, round(.x * 100/(sum(.x, na.rm = T)), digits = 2), &quot;%)&quot;))))

# A tibble: 4 &#215; 7
  Grade Pass  A1         A2         A3       B1       B2        
  &lt;chr&gt; &lt;chr&gt; &lt;chr&gt;      &lt;chr&gt;      &lt;chr&gt;    &lt;chr&gt;    &lt;chr&gt;     
1 A     Y     7 (46.67%) 4 (44.44%) 9 (100%) NA       NA        
2 A     N     8 (53.33%) 5 (55.56%) 0 (0%)   NA       NA        
3 B     Y     NA         NA         NA       8 (100%) 3 (42.86%)
4 B     N     NA         NA         NA       0 (0%)   4 (57.14%)

<hr>

Update

You need to first pick the columns if you want to use colnames.

df %&gt;% 
  mutate(across(A1:B2, ~ifelse(str_extract(colnames(pick(.x)), &quot;^.&quot;) == Grade &amp; is.na(.x), 0, .x)), 
         across(A1:B2, ~ifelse(is.na(.x), NA, paste0(.x, &quot; (&quot;, round(.x * 100/(sum(.x, na.rm = T)), digits = 2), &quot;%)&quot;))))

答案2

得分: 3

以下是代码的中文翻译:

# 使用两个枢轴的替代方法:

library(dplyr)
library(tidyr) # pivot_*

# 将数据框进行长格式化,排除 Grade 和 Pass 列
df %>%
  pivot_longer(-c(Grade, Pass)) %>%
  
  # 如果 Grade 与 name 的首字母相匹配,则将值设为 0
  mutate(value = if_else(Grade == substring(name, 1, 1), coalesce(value, 0), value)) %>%
  
  # 按 name 列分组
  group_by(name) %>%
  
  # 如果值为 NA,则将其替换为空字符串,否则将其格式化为带百分比的字符串
  mutate(value = if_else(is.na(value), "", sprintf("%i (%0.02f%%)", value, 100 * value / sum(value, na.rm = TRUE)))) %>%
  
  # 将数据框重新转换为宽格式
  pivot_wider()
# # 一个数据框: 4 × 7
#   Grade Pass  A1           A2           A3            B1            B2          
#   <chr> <chr> <chr>        <chr>        <chr>         <chr>         <chr>       
# 1 A     Y     "7 (46.67%)" "4 (44.44%)" "9 (100.00%)" ""            ""          
# 2 A     N     "8 (53.33%)" "5 (55.56%)" "0 (0.00%)"   ""            ""          
# 3 B     Y     ""           ""           ""            "8 (100.00%)" "3 (42.86%)"
# 4 B     N     ""           ""           ""            "0 (0.00%)"   "4 (57.14%)"

这是你要求的代码部分的中文翻译。如果还有其他需要,请随时告诉我。

英文:

An alternative with two pivots:

library(dplyr)
library(tidyr) # pivot_*
df %&gt;%
  pivot_longer(-c(Grade, Pass)) %&gt;%
  mutate(value = if_else(Grade == substring(name, 1, 1), coalesce(value, 0), value)) %&gt;%
  group_by(name) %&gt;%
  mutate(value = if_else(is.na(value), &quot;&quot;, sprintf(&quot;%i (%0.02f%%)&quot;, value, 100 * value / sum(value, na.rm = TRUE)))) %&gt;%
  pivot_wider()
# # A tibble: 4 &#215; 7
#   Grade Pass  A1           A2           A3            B1            B2          
#   &lt;chr&gt; &lt;chr&gt; &lt;chr&gt;        &lt;chr&gt;        &lt;chr&gt;         &lt;chr&gt;         &lt;chr&gt;       
# 1 A     Y     &quot;7 (46.67%)&quot; &quot;4 (44.44%)&quot; &quot;9 (100.00%)&quot; &quot;&quot;            &quot;&quot;          
# 2 A     N     &quot;8 (53.33%)&quot; &quot;5 (55.56%)&quot; &quot;0 (0.00%)&quot;   &quot;&quot;            &quot;&quot;          
# 3 B     Y     &quot;&quot;           &quot;&quot;           &quot;&quot;            &quot;8 (100.00%)&quot; &quot;3 (42.86%)&quot;
# 4 B     N     &quot;&quot;           &quot;&quot;           &quot;&quot;            &quot;0 (0.00%)&quot;   &quot;4 (57.14%)&quot;

huangapple
  • 本文由 发表于 2023年6月5日 09:19:02
  • 转载请务必保留本文链接:https://go.coder-hub.com/76403037.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定