Correlation Matrix Between Variables in R
Question
I have been trying to determine the correlation between variables in panel data. My data look like this (there are more dates, and some PM10 values are NA):
data <- structure(list(NetC = c("Cosenza Provincia", "Cosenza Provincia",
"Cosenza Provincia", "Cosenza Provincia", "Cosenza Provincia",
"Cosenza Provincia", "Cosenza Provincia", "Cosenza Provincia",
"Cosenza Provincia", "Reti Private", "Reti Private", "Reti Private",
"Reti Private", "Reti Private", "Reti Private"), ID = c("IT1938A",
"IT1938A", "IT1938A", "IT2086A", "IT2086A", "IT2086A", "IT2110A",
"IT2110A", "IT2110A", "IT1766A", "IT1766A", "IT1766A", "IT2090A",
"IT2090A", "IT2090A"), Stat = c("Citta dei Ragazzi", "Citta dei Ragazzi",
"Citta dei Ragazzi", "Rende", "Rende", "Rende", "Acri", "Acri",
"Acri", "Firmo", "Firmo", "Firmo", "Schiavonea", "Schiavonea",
"Schiavonea"), Data = c("1/1/2022", "1/2/2022", "1/3/2022", "1/1/2022",
"1/2/2022", "1/3/2022", "1/1/2022", "1/2/2022", "1/3/2022", "1/1/2022",
"1/2/2022", "1/3/2022", "1/1/2022", "1/2/2022", "1/3/2022"),
PM10 = c(13.29, 11.14, 9.08, 16.62, 12.98, 10.4, 16.2, 19.4,
15.7, 10.82, 12.29, 9.54, 24.54, 22.88, 27.33)), class = "data.frame", row.names = c(NA,
-15L))
I have tried using plm::cortab, but it doesn't calculate the correlation:
library(plm)
cortab(data$PM10, grouping = Stat, groupnames = c("Citta dei Ragazzi", "Rende",
"Acri", "Firmo", "Schiavonea"))
The output should look like:
| | Citta dei Ragazzi | Rende | Acri |
|---|---|---|---|
| Citta dei Ragazzi | 1 | | |
| Rende | x | 1 | |
| Acri | x | x | 1 |
Answer 1
Score: 0
This has pretty much already been asked (https://stackoverflow.com/questions/62473889/how-can-i-complete-a-correlation-in-r-of-one-variable-across-its-factor-levels), but for convenience I have adapted that answer to your data:
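library(dplyr)  # select(), bind_rows(), %>%
library(tidyr)  # pivot_wider()
library(broom)  # tidy()

# simple correlation matrix: reshape to one row per date, one column per station
data.wider <- data %>%
  select(-ID, -NetC) %>% # remove unnecessary vars
  pivot_wider(names_from = 'Stat', values_from = 'PM10')
# drop the date column; pairwise deletion handles the NA values in PM10
cor(data.wider[, -1], use = 'pairwise.complete.obs')

# more lines required to set up correlation testing:
pw <- combn(unique(data$Stat), 2) # make pairwise sets
pw
# run cor.test() on each pair of stations and tidy each result into one row
pairwise_c <- apply(pw, 2, function(i) {
  tidy(cor.test(data.wider[[i[1]]], data.wider[[i[2]]]))
})
results <- cbind(data.frame(t(pw)), bind_rows(pairwise_c))
results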
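If the tidyverse packages are not available, the same correlation matrix can be sketched in base R. This is only an illustration and not part of the linked answer: it assumes the data frame is stored as `data` with the `Stat`, `Data`, and `PM10` columns shown in the question, and the names `stat`, `wide`, `cor_mat`, and `lower` are made up for the example.

```r
# Sample rows from the question, assumed to be stored as `data`
# (NetC and ID are left out because they are not needed here)
data <- data.frame(
  Stat = rep(c("Citta dei Ragazzi", "Rende", "Acri", "Firmo", "Schiavonea"),
             each = 3),
  Data = rep(c("1/1/2022", "1/2/2022", "1/3/2022"), times = 5),
  PM10 = c(13.29, 11.14, 9.08, 16.62, 12.98, 10.4, 16.2, 19.4,
           15.7, 10.82, 12.29, 9.54, 24.54, 22.88, 27.33)
)

# Keep the stations in their order of appearance rather than alphabetical order
stat <- factor(data$Stat, levels = unique(data$Stat))

# Reshape to a matrix with one row per date and one column per station;
# station/date combinations with no reading become NA
wide <- tapply(data$PM10, list(data$Data, stat), mean)

# Pairwise deletion: each coefficient uses the dates where both stations have a value
cor_mat <- cor(wide, use = "pairwise.complete.obs")

# Blank the upper triangle to mimic the layout requested in the question
lower <- cor_mat
lower[upper.tri(lower)] <- NA
print(round(lower, 2), na.print = "")
```

With only three dates per station the coefficients say very little, so this is best run on the full series.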