Calculating the actual difference and percentage difference for multiple paired variables simultaneously

Question


I have the following example data frame and would like to calculate the actual and percentage differences across multiple paired variables ("10" and "20" correspond to year tested) at once:

sample data:

    Group  A_10  A_20  B_10  B_20
    0      20    21    20    23
    1      30    10    19    11
    2      10    53    30    34
    1      22    32    25    20
    2      34    40    32    30
    0      30    50    NA    40
    0      39    40    19    20
    1      40    NA    20    20
    2      50    10    20    10
    0      34    23    30    10

This is my current code:

    library(dplyr)

    # Assuming data frame is named 'df' and has the following structure:
    # 'var1_1', 'var1_2', ... represent the first set of variables
    # 'var2_1', 'var2_2', ... represent the second set of variables

    # Define the pairs of variables for which you want to calculate the differences
    variable_pairs <- list(
      c("A_10", "A_20"),
      c("B_10", "B_20")) # I have another 20 paired variables

    # Calculate the actual and percentage differences for each variable pair
    df6 <- df %>%
      mutate(
        across(
          all_of(unlist(variable_pairs)),
          ~ .x - get(variable_pairs[[cur_column()]][2]),
          .names = "{.col}_actual_diff"
        ),
        across(
          all_of(unlist(variable_pairs)),
          ~ (.x - get(variable_pairs[[cur_column()]][2])) / get(variable_pairs[[cur_column()]][2]) * 100,
          .names = "{.col}_percentage_diff"
        )
      )

Unfortunately, I am going wrong somewhere or overcomplicating things. The above code gives this error:

    Error in `mutate()`:
    ℹ In argument: `across(...)`.
    Caused by error in `across()`:
    ! Can't compute column `vo2mlkg_12_actual_diff`.
    Caused by error in `get()`:
    ! invalid first argument
    Run `rlang::last_trace()` to see where the error occurred.

Can anyone suggest a fix or a simpler solution?
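A likely cause of the error, sketched in base R under the assumption that `variable_pairs` is the unnamed list shown above: `cur_column()` returns a column name such as "A_10", but an unnamed list has nothing to look up by name, so `variable_pairs[[cur_column()]]` comes back as `NULL` and `get()` receives an invalid argument. The `partner` lookup vector below is a hypothetical helper, not part of the original code.

```r
# The unnamed list of pairs from the question
variable_pairs <- list(c("A_10", "A_20"), c("B_10", "B_20"))

# Indexing an unnamed list by a column name finds nothing and returns NULL
lookup <- variable_pairs[["A_10"]]
is.null(lookup)

# NULL[2] is still NULL, and get(NULL) fails; capture the message instead of stopping
err <- tryCatch(get(lookup[2]), error = function(e) conditionMessage(e))

# Hypothetical fix for the lookup itself: a named character vector that maps
# each first-year column to its second-year partner
partner <- setNames(sapply(variable_pairs, `[`, 2),
                    sapply(variable_pairs, `[`, 1))
partner[["A_10"]]  # "A_20"
```

Even with a working lookup, the original `across()` call would also visit the "_20" columns, which have no partner of their own, so restricting the selection to one column of each pair is probably needed as well.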

Addendum:

Long data:

    Group  variable  phase  Value
    0      A         10     20
    1      B         20     19
    2      C         20     30
    1      D         10     25
    2      E         20     32
    0      F         10     NA
    0      G         20     19
    1      H         10     20
    2      I         10     20
    0      J         20     30

Solution thanks to @Maël:

    library(dplyr)
    library(tidyr)
    library(magrittr)

    df2 <- df[,-2]
    df2 %<>%
      pivot_longer(-group, names_sep = "_", names_to = c("set", ".value")) %>%
      {colnames(.) <- c("group", "set", "pre", "post"); .} %>%
      mutate(
        diff = post - pre,
        diff_perc = ((post - pre) / pre) * 100
      ) %>%
      group_by(group, set) %>%
      summarize(
        mean_diff = mean(diff, na.rm = TRUE),
        mean_diff_perc = mean(diff_perc, na.rm = TRUE)
      ) %>%
      pivot_wider(names_from = set, values_from = c(mean_diff, mean_diff_perc))

Answer 1

Score: 1

You can use multiple `across()` calls:

    library(dplyr)
    df %>%
      mutate(across(matches("_post$"), .names = "{gsub('post','', .col)}diff") - across(matches("_pre$")),
             (across(matches("_post$"), .names = "{gsub('post','', .col)}perc_diff") - across(matches("_pre$"))) / across(matches("_post$")))

    # A tibble: 10 × 9
    #    Group A_pre A_post B_pre B_post A_diff B_diff A_perc_diff B_perc_diff
    #    <int> <int>  <int> <int>  <int>  <int>  <int>       <dbl>       <dbl>
    #  1     0    20     21    20     23      1      3      0.0476       0.130
    #  2     1    30     10    19     11    -20     -8          -2      -0.727
    #  3     2    10     53    30     34     43      4       0.811       0.118
    #  4     1    22     32    25     20     10     -5       0.312       -0.25
    #  5     2    34     40    32     30      6     -2        0.15     -0.0667
    #  6     0    30     50    NA     40     20     NA         0.4          NA
    #  7     0    39     40    19     20      1      1       0.025        0.05
    #  8     1    40     NA    20     20     NA      0          NA           0
    #  9     2    50     10    20     10    -40    -10          -4          -1
    # 10     0    34     23    30     10    -11    -20      -0.478          -2
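The answer assumes `_pre`/`_post` suffixes, whereas the sample data in the question uses `_10`/`_20`. A minimal sketch of the same idea adapted to those names follows. It makes a few assumptions: the output column names (`A_actual_diff`, `A_percentage_diff`) are taken from the question's code, the percentage is computed relative to the first year (`_10`) and multiplied by 100 as in the asker's final solution, and `identity` is passed as `.fns` only so that `across()` returns the selected columns unchanged.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  Group = c(0, 1, 2, 1, 2, 0, 0, 1, 2, 0),
  A_10  = c(20, 30, 10, 22, 34, 30, 39, 40, 50, 34),
  A_20  = c(21, 10, 53, 32, 40, 50, 40, NA, 10, 23),
  B_10  = c(20, 19, 30, 25, 32, NA, 19, 20, 20, 30),
  B_20  = c(23, 11, 34, 20, 30, 40, 20, 20, 10, 10)
)

res <- df %>%
  mutate(
    # second year minus first year, one new column per pair
    across(matches("_20$"), identity,
           .names = "{sub('_20$', '', .col)}_actual_diff") -
      across(matches("_10$")),
    # same difference, relative to the first year, as a percentage
    (across(matches("_20$"), identity,
            .names = "{sub('_20$', '', .col)}_percentage_diff") -
       across(matches("_10$"))) / across(matches("_10$")) * 100
  )
```

This relies on the `_10` and `_20` columns appearing in the same order, since the two selections are subtracted position by position.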

Or, probably simpler, you can pivot your data first, and then compute the differences:

    library(tidyr)
    df %>%
      pivot_longer(-Group, names_sep = "_", names_to = c("set", ".value")) %>%
      mutate(diff = post - pre,
             diff_perc = (post - pre) / post)

    # A tibble: 20 × 6
    #    Group set     pre  post  diff diff_perc
    #    <int> <chr> <int> <int> <int>     <dbl>
    #  1     0 A        20    21     1    0.0476
    #  2     0 B        20    23     3     0.130
    #  3     1 A        30    10   -20        -2
    #  4     1 B        19    11    -8    -0.727
    #  5     2 A        10    53    43     0.811
    #  6     2 B        30    34     4     0.118
    #  7     1 A        22    32    10     0.312
    #  8     1 B        25    20    -5     -0.25
    #  9     2 A        34    40     6      0.15
    # 10     2 B        32    30    -2   -0.0667
    # 11     0 A        30    50    20       0.4
    # 12     0 B        NA    40    NA        NA
    # 13     0 A        39    40     1     0.025
    # 14     0 B        19    20     1      0.05
    # 15     1 A        40    NA    NA        NA
    # 16     1 B        20    20     0         0
    # 17     2 A        50    10   -40        -4
    # 18     2 B        20    10   -10        -1
    # 19     0 A        34    23   -11    -0.478
    # 20     0 B        30    10   -20        -2
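With the question's `_10`/`_20` column names, the same pivot produces columns literally named `10` and `20`, which have to be quoted with backticks. The sketch below shows that variant and one way to get back to a wide layout. The `id` column, the output names, and the choice of the first year as the denominator are assumptions made for illustration, not part of the answer.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(
  Group = c(0, 1, 2, 1, 2, 0, 0, 1, 2, 0),
  A_10  = c(20, 30, 10, 22, 34, 30, 39, 40, 50, 34),
  A_20  = c(21, 10, 53, 32, 40, 50, 40, NA, 10, 23),
  B_10  = c(20, 19, 30, 25, 32, NA, 19, 20, 20, 30),
  B_20  = c(23, 11, 34, 20, 30, 40, 20, 20, 10, 10)
)

long <- df %>%
  mutate(id = row_number()) %>%  # hypothetical row key, needed to pivot back
  pivot_longer(-c(id, Group), names_sep = "_", names_to = c("set", ".value")) %>%
  mutate(
    actual_diff     = `20` - `10`,
    percentage_diff = (`20` - `10`) / `10` * 100
  )

# Optional: one row per original observation, one column per pair and measure
wide <- long %>%
  pivot_wider(id_cols = c(id, Group), names_from = set,
              values_from = c(actual_diff, percentage_diff),
              names_glue = "{set}_{.value}")
```

The long form is usually the easier one to summarise by `Group` and `set`; the wide form is closer to what the question's code was trying to build.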

Posted by huangapple on May 17, 2023, 17:46:09
Source link: https://go.coder-hub.com/76270719.html