Per group wilcox.test against everything else using two data frame columns
Question
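Input data frame:

```R
df <- data.frame(x = abs(rnorm(50)), col1 = rep(1:5, 10), col2 = rep(1:4, 25))
```

I want to do:

```R
df %>%
  group_by(col1) %>%
  # pseudocode: for each group g in col2, run wilcox.test on x inside g vs. x outside g and keep the p-value
  < for g in col2 do wilcox.test(.data[.data$col2 == g]$x, .data[.data$col2 != g]$x)$p.value >
```

What I am not sure about is how to implement the part in angle brackets. The end result should have three columns, col1, col2 and p_value, where p_value comes from the wilcox.test of each group in col2 against all values outside that group in col2 (within each col1 value).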
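For a single (col1, col2) cell of the desired result, the pseudocode boils down to one `wilcox.test()` call. A minimal base-R sketch, assuming the `rnorm(100)` version of the data used in the answer below (so `x` is not recycled); `sub`, `in_g` and `out_g` are hypothetical names chosen for illustration:

```r
set.seed(123)
df <- data.frame(x = abs(rnorm(100)), col1 = rep(1:5, 10), col2 = rep(1:4, 25))

# One cell of the desired result: col1 == 1 and col2 == 2
sub   <- df[df$col1 == 1, ]       # restrict to one col1 value
in_g  <- sub$x[sub$col2 == 2]     # x values inside the col2 group
out_g <- sub$x[sub$col2 != 2]     # x values for every other col2 level
p <- wilcox.test(in_g, out_g)$p.value
p
```

The full result repeats this for every col2 level within every col1 value.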
Answer 1 (score: 0)
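You can write a helper function that takes the x and col2 columns of one group and returns a data frame of p-values. Then call that function with `reframe`, using `.by = col1`: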
```R
library(dplyr)  # reframe() and the .by argument require dplyr >= 1.1.0

# For one col1 group: test x in each col2 level against x in all other col2 levels
f <- function(x, c2) {
  vs <- unique(c2)
  data.frame(col2 = vs, p_value = sapply(vs, function(v) wilcox.test(x[c2 == v], x[c2 != v])$p.value))
}
reframe(df, f(x, col2), .by = col1)
```

Output:

```
   col1 col2    p_value
1     1    1 0.08062436
2     1    2 0.44453044
3     1    3 0.16795666
4     1    4 0.67247162
5     2    2 0.02541280
6     2    3 0.14176987
7     2    4 0.80005160
8     2    1 0.73542312
9     3    3 0.73542312
10    3    4 0.86597007
11    3    1 0.19736842
12    3    2 0.49729102
13    4    4 0.30559856
14    4    1 0.14176987
15    4    2 0.11855005
16    4    3 0.34855521
17    5    1 0.26612487
18    5    2 0.14176987
19    5    3 0.05263158
20    5    4 0.49729102
```

Input (note that I use `rnorm(100)` so that `x` is not recycled to fill the 100 rows):

```R
set.seed(123)
df <- data.frame(x = abs(rnorm(100)), col1 = rep(1:5, 10), col2 = rep(1:4, 25))
```
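If `reframe()` is not available (dplyr older than 1.1.0), the same one-vs-rest logic can be sketched in base R with `split()` and `lapply()`. This is an alternative technique, not the answer's code; the helper name `one_vs_rest` is hypothetical. It keeps groups and col2 levels in order of first appearance, which should match the ordering of the `reframe()` output for this data:

```r
set.seed(123)
df <- data.frame(x = abs(rnorm(100)), col1 = rep(1:5, 10), col2 = rep(1:4, 25))

# Hypothetical helper: one-vs-rest Wilcoxon p-values for each col2 level in one col1 subset
one_vs_rest <- function(d) {
  vs <- unique(d$col2)
  data.frame(
    col1 = d$col1[1],
    col2 = vs,
    p_value = sapply(vs, function(v) wilcox.test(d$x[d$col2 == v], d$x[d$col2 != v])$p.value)
  )
}

# split() by col1, apply the helper to each piece, then stack the pieces
res <- do.call(rbind, lapply(split(df, df$col1), one_vs_rest))
rownames(res) <- NULL
res
```

Each row of `res` is one (col1, col2) pair, so there are 5 × 4 = 20 rows in total.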