Per group wilcox.test against everything else using two data frame columns

Question

Input data frame:

df <- data.frame(x=abs(rnorm(50)),col1=rep(1:5,10), col2=rep(1:4,25))

I want to do:

df %>%
  group_by(col1) %>%
  < for g in col2 do wilcox.test(.data[.data$col2 == g]$x,.data[.data$col2 != g]$x)$p.value >

What I am not sure about is how to implement the part in the angle brackets. The end result should have three columns: col1, col2, p_value, where p_value comes from a wilcox.test of each group in col2 against all values outside that group in col2 (within each col1 value).
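
To make the target concrete, a single cell of the desired result can be computed by hand. This is only a sketch of one comparison, not a full solution: the seed is arbitrary, the names `sub`, `inside` and `outside` are made up for illustration, and it assumes 100 values of `x` so that no column is recycled.

```R
set.seed(1)  # arbitrary seed, only so the sketch is repeatable
df <- data.frame(x = abs(rnorm(100)), col1 = rep(1:5, 20), col2 = rep(1:4, 25))

# One cell of the desired table: col1 == 1, col2 == 1 versus the other col2 values
sub     <- df[df$col1 == 1, ]        # rows belonging to one col1 group
inside  <- sub$x[sub$col2 == 1]      # values in the col2 group of interest
outside <- sub$x[sub$col2 != 1]      # everything else within the same col1 group
p <- wilcox.test(inside, outside)$p.value
```

The full result repeats this comparison for every combination of `col1` and `col2`.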

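Answer 1

Score: 0

You can write a helper function that takes the `x` and `col2` values of one group and returns a data frame of p-values, one row per `col2` value. Then call that function inside `reframe()` with `.by = col1`.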

```R
library(dplyr)  # reframe() and .by need dplyr >= 1.1.0

# Within one col1 group, test each col2 value against all other values
f <- function(x, c2) {
  vs <- unique(c2)
  data.frame(
    col2 = vs,
    p_value = sapply(vs, function(v) wilcox.test(x[c2 == v], x[c2 != v])$p.value)
  )
}

reframe(df, f(x, col2), .by = col1)
```

Output:

```
   col1 col2    p_value
1     1    1 0.08062436
2     1    2 0.44453044
3     1    3 0.16795666
4     1    4 0.67247162
5     2    2 0.02541280
6     2    3 0.14176987
7     2    4 0.80005160
8     2    1 0.73542312
9     3    3 0.73542312
10    3    4 0.86597007
11    3    1 0.19736842
12    3    2 0.49729102
13    4    4 0.30559856
14    4    1 0.14176987
15    4    2 0.11855005
16    4    3 0.34855521
17    5    1 0.26612487
18    5    2 0.14176987
19    5    3 0.05263158
20    5    4 0.49729102
```

Input (note that I use `rnorm(100)` so that `x` is not recycled to match the 100 values of `col2`):

```R
set.seed(123)
df <- data.frame(x = abs(rnorm(100)), col1 = rep(1:5, 10), col2 = rep(1:4, 25))
```
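
The recycling that the input note refers to can be checked directly. A minimal sketch using the question's original `rnorm(50)` call; `df50` is a name made up for this illustration:

```R
# x and col1 supply 50 values, col2 supplies 100:
# data.frame() recycles the shorter columns to the longest length
df50 <- data.frame(x = abs(rnorm(50)), col1 = rep(1:5, 10), col2 = rep(1:4, 25))

nrow(df50)                               # 100 rows, not 50
identical(df50$x[1:50], df50$x[51:100])  # TRUE: every x value appears twice
```

Because every value then appears twice, the data contain ties, and `wilcox.test` cannot compute an exact p-value with ties; it warns and falls back to a normal approximation.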




huangapple
  • Posted on June 13, 2023, 01:33:30
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76459029.html