Subtract values of specified subgroups from another within multiple larger groups

Question


I have data shaped like this:

    set.seed(123456)
    domain <- as.factor(rep(c('alpha', 'bravo', 'charlie', 'delta', 'echo',
                              'foxtrot', 'golf', 'hotel', 'india', 'juliet'),
                            each = 8))
    group <- as.factor(rep(c('group 1', 'group 2', 'group 3', 'group 4', 'group 5',
                             'group 6', 'group 7', 'group 8'), 10))
    freq <- signif(rnorm(80, mean = 1750, sd = 500), 1)
    df <- data.frame(domain, group, freq)
    df

        domain   group freq
    1    alpha group 1 2000
    2    alpha group 2 2000
    3    alpha group 3 2000
    4    alpha group 4 2000
    5    alpha group 5 3000
    6    alpha group 6 2000
    7    alpha group 7 2000
    8    alpha group 8 3000
    9    bravo group 1 2000
    10   bravo group 2 2000
    11   bravo group 3 1000
    12   bravo group 4 1000
    13   bravo group 5 2000
    14   bravo group 6 2000
    15   bravo group 7 2000
    16   bravo group 8 2000
    17 charlie group 1 1000
    18 charlie group 2 2000
    19 charlie group 3 3000
    20 charlie group 4 2000
    21 charlie group 5 1000
    22 charlie group 6 2000
    ...

I'm trying to subtract the freq value of group 1 from the freq value of group 5 within each of the 10 domains, while keeping the rest of the data frame unchanged. The code will be run on multiple datasets, so it needs to be automated and easily reproducible across multiple users.

This is what I'm after; note the changes to group 5 in each domain:

        domain   group freq
    1    alpha group 1 2000
    2    alpha group 2 2000
    3    alpha group 3 2000
    4    alpha group 4 2000
    5    alpha group 5 **1000**
    6    alpha group 6 2000
    7    alpha group 7 2000
    8    alpha group 8 3000
    9    bravo group 1 2000
    10   bravo group 2 2000
    11   bravo group 3 1000
    12   bravo group 4 1000
    13   bravo group 5 **0**
    14   bravo group 6 2000
    15   bravo group 7 2000
    16   bravo group 8 2000
    17 charlie group 1 1000
    18 charlie group 2 2000
    19 charlie group 3 3000
    20 charlie group 4 2000
    21 charlie group 5 **0**
    22 charlie group 6 2000
    ...

I've tried using group_by() from dplyr in combination with ifelse() statements, as well as base R, to no avail. Similar questions on this site aim to subtract one value from all the others in a group, which is not what I'm after.

If anyone could help with what I imagine is a fairly simple dplyr command to do this, I'd appreciate it.

This is my first question, so please let me know if there are any housekeeping rules I could follow in a better manner!

Answer 1

Score: 2


You should be able to do this with mutate(), using ifelse() with a little subsetting and .by = domain (available in dplyr 1.1.0 and later), as follows:

    library(dplyr)

    df %>%
      # group 5 rows become (group 5 - group 1) within each domain; other rows keep freq
      mutate(diffvals = ifelse(!(group %in% "group 5"), freq,
                               freq[group == "group 5"] - freq[group == "group 1"]),
             .by = domain)

The output is shown below. Note that I created a new variable (diffvals) just for demonstration and verification purposes. To overwrite the original variable, as in your desired output, change mutate(diffvals = ...) to mutate(freq = ...).

        domain   group freq diffvals
    1    alpha group 1 2000     2000
    2    alpha group 2 2000     2000
    3    alpha group 3 2000     2000
    4    alpha group 4 2000     2000
    5    alpha group 5 3000     1000
    6    alpha group 6 2000     2000
    7    alpha group 7 2000     2000
    8    alpha group 8 3000     3000
    9    bravo group 1 2000     2000
    10   bravo group 2 2000     2000
    11   bravo group 3 1000     1000
    12   bravo group 4 1000     1000
    13   bravo group 5 2000        0
    14   bravo group 6 2000     2000
    15   bravo group 7 2000     2000
    16   bravo group 8 2000     2000
    17 charlie group 1 1000     1000
    18 charlie group 2 2000     2000
    19 charlie group 3 3000     3000
    20 charlie group 4 2000     2000
    21 charlie group 5 1000        0
    22 charlie group 6 2000     2000
    ...
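
Since the question asks for something that can be rerun on several datasets, the same idea could be wrapped in a small function. The following is a minimal sketch and not part of the original answer: subtract_group() and its target and reference arguments are hypothetical names, it uses group_by() rather than .by so that it does not depend on dplyr 1.1.0, and it assumes every domain has exactly one row for the reference group. The toy data frame is made up for illustration.

```r
library(dplyr)

# Hypothetical helper (illustration only): within each domain, subtract the
# `reference` group's freq from the `target` group's freq and leave all other
# rows unchanged. Assumes exactly one `reference` row per domain.
subtract_group <- function(data, target = "group 5", reference = "group 1") {
  data %>%
    group_by(domain) %>%
    mutate(freq = ifelse(group == target,
                         freq - freq[group == reference],
                         freq)) %>%
    ungroup()
}

# Small made-up example: two domains, three groups each
toy <- data.frame(
  domain = rep(c("alpha", "bravo"), each = 3),
  group  = rep(c("group 1", "group 2", "group 5"), times = 2),
  freq   = c(2000, 2000, 3000, 2000, 1000, 2000)
)

result <- subtract_group(toy)
result$freq
# 2000 2000 1000 2000 1000 0
```

Because the group labels are passed as arguments, the same call could be reused for other pairs, for example subtract_group(df, target = "group 8", reference = "group 2").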

Answer 2

Score: 1


It might be beneficial to work with a wide format here:

    library(tidyverse)

    df %>%
      pivot_wider(names_from = group, values_from = freq, names_glue = "group_{group}") %>%
      mutate(across(group_5, ~ .x - group_1))

    # A tibble: 10 × 9
       domain group_1 group_2 group_3 group_4 group_5 group_6 group_7 group_8
        <int>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
     1      1    1000     500    3000    3000    1000    2000     600     400
     2      2    2000    2000    2000    2000       0    2000    2000    2000
     3      3    2000    2000    2000    2000    1000    1000    1000    2000
     4      4    1000    2000    2000    2000    1000    1000    2000    1000
     5      5    2000    2000    2000    1000       0    2000    1000    1000
     6      6    2000    2000    2000    2000       0    2000    1000    1000
     7      7    2000    1000    1000    3000       0    1000    1000    2000
     8      8    3000    1000    2000    2000   -2000    3000    2000    2000
     9      9    2000    2000    2000    1000    1000    2000    2000    2000
    10     10    2000    2000    1000    2000   -1000    1000    2000    2000
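
The output above appears to come from data in which domain and group are integers. With the question's labels ("group 1", "group 5", and so on), the pivoted column names would contain spaces, so they would need backticks. The following is a minimal sketch of the wide-format approach under that assumption, using a made-up toy data frame. The names_glue argument is dropped here as a simplification, and the pivot_longer() step, which is not in the original answer, returns the result to the long shape the question asks for.

```r
library(dplyr)
library(tidyr)

# Small made-up example using the question's style of group labels
toy <- data.frame(
  domain = rep(c("alpha", "bravo"), each = 3),
  group  = rep(c("group 1", "group 2", "group 5"), times = 2),
  freq   = c(2000, 2000, 3000, 2000, 1000, 2000)
)

# One row per domain, one column per group; column names keep the space,
# so they are referenced with backticks
wide <- toy %>%
  pivot_wider(names_from = group, values_from = freq) %>%
  mutate(`group 5` = `group 5` - `group 1`)

# Back to one row per domain/group pair
long <- wide %>%
  pivot_longer(-domain, names_to = "group", values_to = "freq")

long$freq
# 2000 2000 1000 2000 1000 0
```

The wide layout makes the subtraction a plain column operation, at the cost of two reshaping steps.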

huangapple
  • Posted on June 29, 2023, 21:56:22
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76581721.html