How to use CAST package for a shapefile (polygons) in R?

Question

Any help with the following is much appreciated!

My goal: I need to run a lasso model for variable selection on my data (which is in sf polygon format).

My data: As said above, it is an sf object. Specifically, it is a shapefile with polygons.

I have tried both ffs and train, but neither works.
Here is a reproducible example with a multipolygon shapefile.

Please ignore any possible time relationship between the variables that end in "74" and the ones that end in "79".

library(sf)
library(CAST)

#Loading data
nc <- st_read(system.file("shape/nc.shp", package="sf"))

#Training and test data
set.seed(100)
ind   <- sample(2,nrow(nc),replace=T,prob = c(0.7,0.3))
train <- nc[ind==1,]
test  <- nc[ind==2,]

predictors <- c("SID74","BIR79","BIR74")
response   <- "NWBIR79"
## 1st option ##
#==============#
#ffs Forward feature selection
set.seed(10)
ffs(train[,predictors], train$NWBIR79,method = "lasso")
[1] "model using SID74,BIR79 will be trained now..."
Something is wrong; all the RMSE metric values are missing:
      RMSE        Rsquared        MAE     
 Min.   : NA   Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA   1st Qu.: NA  
 Median : NA   Median : NA   Median : NA  
 Mean   :NaN   Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA   Max.   : NA  
 NA's   :3     NA's   :3     NA's   :3    
Error: Stopping
In addition: There were 26 warnings (use warnings() to see them)
## 2nd option ##
#==============#
#model without ffs
set.seed(100)
model <- train(train[,predictors], train$NWBIR79, method="lasso", trControl=trainControl(method = "cv"),importance=T)
Something is wrong; all the RMSE metric values are missing:
      RMSE        Rsquared        MAE     
 Min.   : NA   Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA   1st Qu.: NA  
 Median : NA   Median : NA   Median : NA  
 Mean   :NaN   Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA   Max.   : NA  
 NA's   :3     NA's   :3     NA's   :3    
Error: Stopping
In addition: There were 11 warnings (use warnings() to see them)

Answer 1

Score: 1

Two things that lead to the error:

  1. The geometries are currently part of the predictors, because subsetting columns of an sf object keeps the geometry column. Drop them first: st_drop_geometry(train[,predictors])

  2. "importance" is not a parameter of the lasso method, so remove it from the call

model <- train(st_drop_geometry(train[,predictors]), train$NWBIR79, method="lasso", trControl=trainControl(method = "cv"))
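
To see the first point for yourself, a quick check along these lines may help. It is only a sketch on the same nc dataset; the names with_geom and no_geom are made up for illustration.

```r
library(sf)

# Same example data as in the question
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
predictors <- c("SID74", "BIR79", "BIR74")

# Column subsetting on an sf object keeps the geometry column ("sticky" geometry),
# so the model would receive a list-column of polygons as a fourth predictor
with_geom <- nc[, predictors]
print(names(with_geom))

# st_drop_geometry() returns a plain data.frame with only the attribute columns
no_geom <- st_drop_geometry(nc[, predictors])
print(names(no_geom))
print(class(no_geom))
```

If the first print lists a geometry column next to the three predictors, that column is what the model is choking on.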

This should work with CAST::ffs in the same way.
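
For completeness, the ffs call could look roughly like the sketch below. It assumes the caret and elasticnet packages are installed (caret's "lasso" method relies on elasticnet), and it loads caret explicitly in case your CAST version does not attach it. The names train_sf, x, y and ffs_model are my own; I renamed the training data only to avoid confusion with the train() function.

```r
library(sf)
library(caret)
library(CAST)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Same 70/30 split as in the question
set.seed(100)
ind <- sample(2, nrow(nc), replace = TRUE, prob = c(0.7, 0.3))
train_sf <- nc[ind == 1, ]

predictors <- c("SID74", "BIR79", "BIR74")

# Drop the geometry so only numeric attribute columns reach the model
x <- st_drop_geometry(train_sf[, predictors])
y <- train_sf$NWBIR79

# Forward feature selection with lasso; no "importance" argument
set.seed(10)
ffs_model <- ffs(x, y, method = "lasso",
                 trControl = trainControl(method = "cv"))

# Variables kept by the selection
print(ffs_model$selectedvars)
```

I have not checked which variables get selected on this split, so treat the result as something to inspect rather than a known answer.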

huangapple
  • Posted on July 13, 2023, 18:52:54
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76678560.html