How to calculate standard error with differing sample sizes?

Question

I'm relatively new to data analysis in R, so I apologize if I do not use the correct terminology. I've looked around on here, but could not find this specific question.

I am using the dplyr package to calculate the mean, standard deviation, and standard error of my variables in a data set. However, I had to omit two samples, so the sample size now differs between variables.

Using the count function:

nTable <- count(df, variable)

   variable      n
   <fct>     <int>
 1 A            13
 2 A Control    13
 3 B            12
 4 B Control    13
 5 C            13
 6 C Control    13
 7 D            13
 8 D Control    13
 9 E            12
10 E Control    13
11 Standard     13

You can see that variables 'B' and 'E' have one fewer sample than the others. How can I account for this in the summarise function? If all sample sizes were the same, this would be straightforward:

Summary <- df %>% group_by(variable) %>%
summarise(Mean=mean(value), SD=sd(value),
SE=sd(value)/sqrt(13))

But is there a way to replace the sample size above (13) with each variable's own sample size?
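Stated as a formula, what is wanted is the standard error of the mean computed separately for each group g, where s_g is that group's sample standard deviation and n_g is that group's own number of observations:

```latex
\mathrm{SE}_g = \frac{s_g}{\sqrt{n_g}}
```

So for 'B' and 'E' the denominator should be the square root of 12 rather than 13.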

Here is what I tried so far, with no luck:

nTable <- count(df, variable)
nTable

n <- nTable$n
n

Summary <- df %>% group_by(variable) %>%
summarise(Mean=mean(value), SD=sd(value),
SE=sd(value)/sqrt(n))

But I get this message:

Warning message:
Returning more (or less) than 1 row per `summarise()` group was deprecated in dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()` always returns an ungrouped data frame and adjust accordingly.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.

I am not quite sure what this message is trying to tell me, or how I can get my desired output. Any help would be greatly appreciated!
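To see what the warning is reacting to, here is a minimal base-R sketch of what the attempted SE expression evaluates to for a single group. The values and counts are hypothetical stand-ins, and it assumes df has no column named n, so the n used inside summarise() is the global vector of all group counts taken from nTable$n.

```r
# Hypothetical values for one group (say "A"), as summarise() sees them
value <- c(1, 2, 3)

# Hypothetical stand-in for nTable$n: one count for EVERY group, not just "A"
n <- c(3, 2)

# sd(value) is a single number, but sqrt(n) has one element per group,
# so R recycles the division and returns one SE per group count
se_per_group <- sd(value) / sqrt(n)
length(se_per_group)   # 2
```

With the 11 counts from the question, every group would get 11 SE values instead of one, which is the "more than 1 row per summarise() group" the warning describes.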

Answer 1

Score: 1

(Promoted from a comment)

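This should work if you use sqrt(n()) instead of sqrt(n) (see ?dplyr::n). Inside summarise(), n() returns the number of rows in the current group. Your n, by contrast, is not a column of df, so it is taken from the global environment as the whole vector of 11 counts; sd(value)/sqrt(n) then returns 11 values for every group, and that multi-row result per group is what the deprecation warning is about.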
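A minimal, self-contained sketch of the suggested fix, assuming dplyr is installed. The toy data frame is hypothetical (group "B" has one fewer observation, mirroring the question), and the N column is an optional addition so each group's sample size is visible next to its SE.

```r
library(dplyr)

# Hypothetical toy data standing in for df: "B" has one fewer sample than "A"
df <- data.frame(
  variable = c(rep("A", 3), rep("B", 2)),
  value    = c(1, 2, 3, 4, 6)
)

Summary <- df %>%
  group_by(variable) %>%
  summarise(
    N    = n(),                     # number of rows in the current group
    Mean = mean(value),
    SD   = sd(value),
    SE   = sd(value) / sqrt(n())    # uses each group's own sample size
  )

Summary
```

One caveat: n() counts rows, not non-missing values. If the omitted samples are still present as rows with NA in value rather than dropped, something like sum(!is.na(value)) together with sd(value, na.rm = TRUE) would be needed to get the effective sample size.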

huangapple
  • Posted on May 25, 2023 04:09:26
  • When reposting, please keep this link: https://go.coder-hub.com/76327076.html