Subtract values of specified subgroups from another within multiple larger groups
Question
I have data shaped like this:
set.seed(123456)
domain <- as.factor(rep(c('alpha', 'bravo', 'charlie', 'delta', 'echo',
                          'foxtrot', 'golf', 'hotel', 'india', 'juliet'),
                        each = 8))
group <- as.factor(rep(c('group 1', 'group 2', 'group 3', 'group 4', 'group 5',
                         'group 6', 'group 7', 'group 8'), 10))
freq <- signif(rnorm(80, mean = 1750, sd = 500), 1)
df <- data.frame(domain, group, freq)
df
domain group freq
1 alpha group 1 2000
2 alpha group 2 2000
3 alpha group 3 2000
4 alpha group 4 2000
5 alpha group 5 3000
6 alpha group 6 2000
7 alpha group 7 2000
8 alpha group 8 3000
9 bravo group 1 2000
10 bravo group 2 2000
11 bravo group 3 1000
12 bravo group 4 1000
13 bravo group 5 2000
14 bravo group 6 2000
15 bravo group 7 2000
16 bravo group 8 2000
17 charlie group 1 1000
18 charlie group 2 2000
19 charlie group 3 3000
20 charlie group 4 2000
21 charlie group 5 1000
22 charlie group 6 2000
...
I'm trying to subtract the freq value of group 1 from the value of group 5 in each of the 10 domains, whilst retaining the original data frame. This code will be run on multiple datasets, so it needs to be automated and easily reproducible across multiple users.
This is what I'm after; note the changes to group 5 in each domain:
domain group freq
1 alpha group 1 2000
2 alpha group 2 2000
3 alpha group 3 2000
4 alpha group 4 2000
5 alpha group 5 **1000**
6 alpha group 6 2000
7 alpha group 7 2000
8 alpha group 8 3000
9 bravo group 1 2000
10 bravo group 2 2000
11 bravo group 3 1000
12 bravo group 4 1000
13 bravo group 5 **0**
14 bravo group 6 2000
15 bravo group 7 2000
16 bravo group 8 2000
17 charlie group 1 1000
18 charlie group 2 2000
19 charlie group 3 3000
20 charlie group 4 2000
21 charlie group 5 **0**
22 charlie group 6 2000
...
I've tried using group_by() from dplyr in combination with ifelse() statements, as well as base R, but to no avail. Similar questions on this site aim to subtract a value from all others in a group, which is not what I'm after.
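For reference, the target operation can be sketched in base R as a per-domain lookup. This is a minimal sketch (not part of the original question), assuming every domain has exactly one 'group 1' row; the helper variables g1, base_freq, is_g5 and result are illustrative names only.

```r
# Rebuild the example data from the question
set.seed(123456)
domain <- as.factor(rep(c('alpha', 'bravo', 'charlie', 'delta', 'echo',
                          'foxtrot', 'golf', 'hotel', 'india', 'juliet'),
                        each = 8))
group <- as.factor(rep(paste('group', 1:8), 10))
freq <- signif(rnorm(80, mean = 1750, sd = 500), 1)
df <- data.frame(domain, group, freq)

# Assumption: each domain has exactly one 'group 1' row.
# Look up that domain's 'group 1' value for every row by matching on domain.
g1 <- df[df$group == 'group 1', c('domain', 'freq')]
base_freq <- g1$freq[match(df$domain, g1$domain)]

# Overwrite freq only on the 'group 5' rows; every other row is left unchanged.
is_g5 <- df$group == 'group 5'
result <- df
result$freq[is_g5] <- df$freq[is_g5] - base_freq[is_g5]
```

Because only the 'group 5' rows are assigned to, the long shape and row order of the original data frame are preserved.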
If anyone could help with what I imagine is a fairly simple dplyr command to get this, I'd appreciate it.
This is my first question, so please let me know if there are any housekeeping rules I could follow in a better manner!
Answer 1
Score: 2
You should be able to simply use mutate() here with ifelse(), a little bit of subsetting and .by = domain, in the following way:
library(dplyr)  # the .by argument of mutate() requires dplyr >= 1.1.0

df %>%
  mutate(diffvals = ifelse(!(group %in% "group 5"), freq,
                           freq[group == "group 5"] - freq[group == "group 1"]),
         .by = domain)
Output (note that I created a new variable, diffvals, just for demonstration/verification purposes; to overwrite the original variable as in your desired output, change mutate(diffvals = ... to mutate(freq = ...)):
domain group freq diffvals
1 alpha group 1 2000 2000
2 alpha group 2 2000 2000
3 alpha group 3 2000 2000
4 alpha group 4 2000 2000
5 alpha group 5 3000 1000
6 alpha group 6 2000 2000
7 alpha group 7 2000 2000
8 alpha group 8 3000 3000
9 bravo group 1 2000 2000
10 bravo group 2 2000 2000
11 bravo group 3 1000 1000
12 bravo group 4 1000 1000
13 bravo group 5 2000 0
14 bravo group 6 2000 2000
15 bravo group 7 2000 2000
16 bravo group 8 2000 2000
17 charlie group 1 1000 1000
18 charlie group 2 2000 2000
19 charlie group 3 3000 3000
20 charlie group 4 2000 2000
21 charlie group 5 1000 0
22 charlie group 6 2000 2000
...
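Since the question mentions running this on multiple datasets, the same idea could be wrapped in a small function. subtract_group() below is a hypothetical helper, not part of the original answer; it assumes columns named group and freq and exactly one reference row per domain, and the toy data is invented for illustration. It uses dplyr::if_else(), which is stricter about types than base ifelse().

```r
library(dplyr)  # .by in mutate() requires dplyr >= 1.1.0

# Hypothetical helper: within each level of `by`, replace the `target` group's
# freq with target - reference. Assumes columns `group` and `freq`, and exactly
# one `reference` row per level of `by` (otherwise if_else() will error).
subtract_group <- function(data, target = "group 5", reference = "group 1",
                           by = "domain") {
  data %>%
    mutate(
      freq = if_else(group == target,
                     freq - freq[group == reference],
                     freq),
      .by = all_of(by)
    )
}

# Invented toy data with two domains
toy <- data.frame(
  domain = rep(c("alpha", "bravo"), each = 3),
  group  = rep(c("group 1", "group 2", "group 5"), 2),
  freq   = c(2000, 2000, 3000, 2000, 1000, 2000)
)

result <- subtract_group(toy)
result$freq
# [1] 2000 2000 1000 2000 1000    0
```

Keeping the group labels as arguments means the same call can be reused for other target/reference pairs without editing the pipeline.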
Answer 2
Score: 1
It might be beneficial to work with a wide format here:
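library(tidyverse)

df %>%
  # recode 'group 1' ... 'group 8' to group_1 ... group_8 so they become syntactic column names
  mutate(group = sub(" ", "_", group)) %>%
  pivot_wider(names_from = group, values_from = freq) %>%
  mutate(across(group_5, ~ .x - group_1))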
# A tibble: 10 × 9
domain group_1 group_2 group_3 group_4 group_5 group_6 group_7 group_8
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1000 500 3000 3000 1000 2000 600 400
2 2 2000 2000 2000 2000 0 2000 2000 2000
3 3 2000 2000 2000 2000 1000 1000 1000 2000
4 4 1000 2000 2000 2000 1000 1000 2000 1000
5 5 2000 2000 2000 1000 0 2000 1000 1000
6 6 2000 2000 2000 2000 0 2000 1000 1000
7 7 2000 1000 1000 3000 0 1000 1000 2000
8 8 3000 1000 2000 2000 -2000 3000 2000 2000
9 9 2000 2000 2000 1000 1000 2000 2000 2000
10 10 2000 2000 1000 2000 -1000 1000 2000 2000
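Because the question asks to retain the original long data frame, one option is to pivot back with pivot_longer() after the subtraction. This is a sketch, not part of the original answer; the toy data is invented for illustration and already uses group_1-style labels.

```r
library(dplyr)
library(tidyr)

# Invented toy data in long format
toy <- data.frame(
  domain = rep(c("alpha", "bravo"), each = 3),
  group  = rep(c("group_1", "group_2", "group_5"), 2),
  freq   = c(2000, 2000, 3000, 2000, 1000, 2000)
)

long_again <- toy %>%
  # one column per group, one row per domain
  pivot_wider(names_from = group, values_from = freq) %>%
  # subtract group_1 from group_5 row-wise, i.e. within each domain
  mutate(group_5 = group_5 - group_1) %>%
  # back to the original long shape: domain, group, freq
  pivot_longer(-domain, names_to = "group", values_to = "freq")

long_again
```

The round trip returns a tibble with the same domain/group/freq layout as the input, with only the group_5 rows changed.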