Posted 2023-05-25 04:09:26

How to calculate standard error with differing sample sizes?

Question

I'm relatively new to data analysis in R, so I apologize if I do not use the correct terminology. I've looked around on here, but could not find this specific question.

I am using the dplyr package to calculate the mean, standard deviation, and standard error of the variables in a data set. However, I had to omit two samples, so the sample size is no longer the same for every variable.

Using the count function:

nTable <- count(df, variable)

   variables     n
   <fct>     <int>
 1 A            13
 2 A Control    13
 3 B            12
 4 B Control    13
 5 C            13
 6 C Control    13
 7 D            13
 8 D Control    13
 9 E            12
10 E Control    13
11 Standard     13

You can see that variables 'B' and 'E' have one fewer sample than the other variables. How can I reflect this in my code using the summarise function? If all sample sizes were the same, this would be straightforward:

Summary <- df %>% group_by(variable) %>%
  summarise(Mean = mean(value), SD = sd(value),
            SE = sd(value) / sqrt(13))

But is there a way to replace the fixed sample size above (13) with the individual sample size of each variable?

Here is what I tried so far, with no luck:

nTable <- count(df, variable)
nTable
n <- nTable$n
n
Summary <- df %>% group_by(variable) %>%
  summarise(Mean = mean(value), SD = sd(value),
            SE = sd(value) / sqrt(n))

But I get this message:

Warning message:
Returning more (or less) than 1 row per `summarise()` group was deprecated in dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()` always returns an ungrouped data frame and adjust accordingly.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.

I am not quite sure what this message is trying to tell me, or how I can get my desired output. Any help would be greatly appreciated!

Answer 1

Score: 1

(Promoted from a comment)

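This should work if you use sqrt(n()) instead of sqrt(n) (see ?dplyr::n). Inside summarise(), n() returns the number of rows in the current group, whereas the vector n taken from nTable has one entry per group (11 here), so every group returns 11 rows instead of one, which is what triggers the warning.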
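
As a rough illustration of the suggestion, here is a minimal sketch on made-up data. The toy `df` below is hypothetical and only mimics the shape of the question's data (a `variable` column and a `value` column, with one group one observation short). The extra `N` column is not in the question. It is added only to make the per-group sample size visible.

```r
library(dplyr)

# Hypothetical toy data standing in for the asker's `df`:
# three groups, with group "B" missing one observation.
set.seed(1)
df <- data.frame(
  variable = c(rep("A", 13), rep("B", 12), rep("Standard", 13)),
  value    = rnorm(38)
)

Summary <- df %>%
  group_by(variable) %>%
  summarise(
    N    = n(),                    # number of rows in the current group
    Mean = mean(value),
    SD   = sd(value),
    SE   = sd(value) / sqrt(n())   # standard error with the group's own n
  )

Summary
```

Because n() is evaluated separately for each group, every group contributes exactly one row and the deprecation warning should not appear. Note that n() counts rows, so this assumes the omitted samples were removed from the data rather than left in as NA. If they were left as NA, something like sum(!is.na(value)) would probably be the safer denominator.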
