Subtract values of specified subgroups from another within multiple larger groups

Question

I have data shaped like this:

set.seed(123456)
domain <- as.factor(rep(c('alpha', 'bravo', 'charlie', 'delta', 'echo', 
                          'foxtrot', 'golf', 'hotel', 'india', 'juliet'), 
                        each = 8))
group <- as.factor(rep(c('group 1', 'group 2', 'group 3', 'group 4', 'group 5', 
                         'group 6', 'group 7', 'group 8'), 10))
freq <- signif(rnorm(80, mean = 1750, sd = 500), 1)
df <- data.frame(domain, group, freq)

df

    domain   group freq
1    alpha group 1 2000
2    alpha group 2 2000
3    alpha group 3 2000
4    alpha group 4 2000
5    alpha group 5 3000
6    alpha group 6 2000
7    alpha group 7 2000
8    alpha group 8 3000
9    bravo group 1 2000
10   bravo group 2 2000
11   bravo group 3 1000
12   bravo group 4 1000
13   bravo group 5 2000
14   bravo group 6 2000
15   bravo group 7 2000
16   bravo group 8 2000
17 charlie group 1 1000
18 charlie group 2 2000
19 charlie group 3 3000
20 charlie group 4 2000
21 charlie group 5 1000
22 charlie group 6 2000
...

I'm trying to subtract the freq value of group 1 from the freq value of group 5 within each of the 10 domains, whilst keeping the rest of the data frame unchanged. This code will be run on multiple datasets, so it needs to be automated and easily reproducible across multiple users.

This is what I'm after; note the changes to group 5 in each domain:

    domain   group freq
1    alpha group 1 2000
2    alpha group 2 2000
3    alpha group 3 2000
4    alpha group 4 2000
5    alpha group 5 **1000**
6    alpha group 6 2000
7    alpha group 7 2000
8    alpha group 8 3000
9    bravo group 1 2000
10   bravo group 2 2000
11   bravo group 3 1000
12   bravo group 4 1000
13   bravo group 5 **0**
14   bravo group 6 2000
15   bravo group 7 2000
16   bravo group 8 2000
17 charlie group 1 1000
18 charlie group 2 2000
19 charlie group 3 3000
20 charlie group 4 2000
21 charlie group 5 **0**
22 charlie group 6 2000
...

I've tried using group_by() from dplyr in combination with ifelse() statements, as well as base R, to no avail. Similar questions on this site aim to subtract one value from all the others in a group, which is not what I'm after.

If anyone could assist with a (what I imagine is a fairly simple) dplyr command to get this I'd appreciate it.

This is my first question, so please let me know if there are any housekeeping rules I could follow in a better manner!

Answer 1

Score: 2

You should be able to simply use mutate() here with an ifelse(), a little bit of subsetting, and .by = domain, in the following way:

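library(dplyr)  # .by needs dplyr >= 1.1.0

df %>%
  # within each domain, replace only the group 5 row with (group 5 - group 1)
  mutate(diffvals = ifelse(!(group %in% "group 5"), freq,
                           freq[group == "group 5"] - freq[group == "group 1"]),
         .by = domain)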

Output: note that I created a new variable (diffvals) just for demonstration and verification purposes. To overwrite the original variable, per your desired output, change mutate(diffvals = ...) to mutate(freq = ...).

    domain   group freq diffvals
1    alpha group 1 2000     2000
2    alpha group 2 2000     2000
3    alpha group 3 2000     2000
4    alpha group 4 2000     2000
5    alpha group 5 3000     1000
6    alpha group 6 2000     2000
7    alpha group 7 2000     2000
8    alpha group 8 3000     3000
9    bravo group 1 2000     2000
10   bravo group 2 2000     2000
11   bravo group 3 1000     1000
12   bravo group 4 1000     1000
13   bravo group 5 2000        0
14   bravo group 6 2000     2000
15   bravo group 7 2000     2000
16   bravo group 8 2000     2000
17 charlie group 1 1000     1000
18 charlie group 2 2000     2000
19 charlie group 3 3000     3000
20 charlie group 4 2000     2000
21 charlie group 5 1000        0
22 charlie group 6 2000     2000
...
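
Since the question asks for something that can be reused across datasets, the approach above could be wrapped in a small function. The following is only a sketch: the helper name subtract_group and its arguments target and from are hypothetical, and it assumes columns named domain, group and freq with exactly one `from` row per domain. It uses a small toy data frame rather than the question's df.

```r
library(dplyr)  # `.by` requires dplyr >= 1.1.0

# Hypothetical helper (not from the original answer): within each domain,
# subtract the freq of the `from` group from the freq of the `target` group,
# leaving every other row unchanged.
subtract_group <- function(data, target = "group 5", from = "group 1") {
  # Assumption: each domain has exactly one row for the `from` group.
  counts <- data %>%
    summarise(n_from = sum(group == from), .by = domain)
  stopifnot(all(counts$n_from == 1))

  data %>%
    mutate(
      freq = if_else(group == target, freq - freq[group == from], freq),
      .by = domain
    )
}

# Toy data with the same column layout as the question's df
toy <- data.frame(
  domain = rep(c("alpha", "bravo"), each = 3),
  group  = rep(c("group 1", "group 2", "group 5"), times = 2),
  freq   = c(2000, 2000, 3000, 2000, 1000, 2000)
)

result <- subtract_group(toy)
result$freq
# freq is now 2000 2000 1000 2000 1000 0
```

The guard makes the function stop early if a domain is missing its group 1 row or has more than one, instead of silently recycling values.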

Answer 2

Score: 1

It might be beneficial to work with a wide format here:

library(tidyverse)

df %>%
  pivot_wider(names_from = group, values_from = freq, names_glue = "group_{group}") %>%
  mutate(across(group_5, ~ .x - group_1))

# A tibble: 10 × 9
   domain group_1 group_2 group_3 group_4 group_5 group_6 group_7 group_8
    <int>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1      1    1000     500    3000    3000    1000    2000     600     400
 2      2    2000    2000    2000    2000       0    2000    2000    2000
 3      3    2000    2000    2000    2000    1000    1000    1000    2000
 4      4    1000    2000    2000    2000    1000    1000    2000    1000
 5      5    2000    2000    2000    1000       0    2000    1000    1000
 6      6    2000    2000    2000    2000       0    2000    1000    1000
 7      7    2000    1000    1000    3000       0    1000    1000    2000
 8      8    3000    1000    2000    2000   -2000    3000    2000    2000
 9      9    2000    2000    2000    1000    1000    2000    2000    2000
10     10    2000    2000    1000    2000   -1000    1000    2000    2000
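
The printed tibble above has an integer domain column and group_1 ... group_8 columns, so it appears to come from data labelled differently from the question's df. With the question's labels ("group 1", "group 5"), names_glue = "group_{group}" would, as far as I can tell, produce names such as "group_group 1". A minimal sketch of the same wide-format idea under the question's labelling, using backticked column names and a small toy data frame (an assumption for illustration, not the answerer's code):

```r
library(dplyr)
library(tidyr)

# Toy data with the same column layout and labels as the question's df
toy <- data.frame(
  domain = rep(c("alpha", "bravo"), each = 3),
  group  = rep(c("group 1", "group 2", "group 5"), times = 2),
  freq   = c(2000, 2000, 3000, 2000, 1000, 2000)
)

# One row per domain, one column per group; column names contain a space,
# so they need backticks.
wide <- toy %>%
  pivot_wider(names_from = group, values_from = freq) %>%
  mutate(`group 5` = `group 5` - `group 1`)

# Optional: return to the original long layout
long <- wide %>%
  pivot_longer(-domain, names_to = "group", values_to = "freq")
```

Note that after the round trip, group comes back as a character column rather than a factor.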

huangapple
  • Posted on June 29, 2023 at 21:56:22
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76581721.html