cor_auto giving different results for missing = 'listwise' vs 'pairwise' for correlation with two variables
Question
When calculating a polychoric correlation between two variables with missing values, cor_auto is providing different outputs with the missing argument set to 'listwise' compared to 'pairwise', for example:
library(qgraph)
set.seed(5)
# two ordinal variables with values 1 to 6, 100 observations each
df <- data.frame(lapply(1:2, function(x) sample(1:6, 100, replace = TRUE)),
                 stringsAsFactors = FALSE)
colnames(df) <- c("a", "b")
# make some missing values in b
df[10:20, 2] <- NA
# these are different
cor_auto(df[, c("a", "b")], missing = "listwise")
cor_auto(df[, c("a", "b")], missing = "pairwise")
I expected that these should result in the same output when only two variables are included (only cases with both variables non-missing included). Does anyone know how this difference comes about?
Answer 1
Score: 1
The underlying function here is lavaan::lavCor, which estimates thresholds in addition to the polychoric correlation. With missing = "listwise", the thresholds of variable a are estimated using only the rows that have complete data, so they differ from the thresholds estimated with missing = "pairwise", where the rows in which only b is missing still contribute to the thresholds of a. This leads to the discrepancy.
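To make the threshold argument concrete, here is a minimal base-R sketch. It assumes thresholds are the standard normal quantiles of the cumulative category proportions, which is the usual first step of a two-step polychoric estimate. The helper `thresholds()` is hypothetical and written only for illustration; it is not lavaan's internal code, and the data are only similar to the question's example.

```r
set.seed(5)
# Two ordinal variables with values 1 to 6, as in the question
df <- data.frame(a = sample(1:6, 100, replace = TRUE),
                 b = sample(1:6, 100, replace = TRUE))
df[10:20, "b"] <- NA  # 11 missing values in b

# Hypothetical helper (an assumption, not a lavaan function):
# thresholds as normal quantiles of the cumulative category proportions
thresholds <- function(x) {
  x <- x[!is.na(x)]
  p <- cumsum(prop.table(table(x)))
  qnorm(p[-length(p)])  # drop the last cumulative proportion (always 1)
}

complete <- complete.cases(df)

# "pairwise"-style: every observed value of a contributes (100 rows)
th_pairwise <- thresholds(df$a)
# "listwise"-style: only rows where b is also observed contribute (89 rows)
th_listwise <- thresholds(df$a[complete])

print(round(th_pairwise, 3))
print(round(th_listwise, 3))
```

The two threshold vectors for a differ because they are computed from different sets of rows, and a polychoric estimate conditioned on different thresholds will generally differ as well. To check this against lavaan itself, the documentation of lavCor is the place to look for a way to return the estimated thresholds directly; the sketch above does not rely on any such option.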