NAs produced using indexing to calculate RMSE.

I'm working with the Boston Housing data set in the MASS package. The code produces NAs when calculating RMSE using indexing:

library(MASS)
library(glmnet)
library(dplyr)  # provides %>% and select()

for (i in 1:5){

  idx <- sample(seq(1, 3), size = nrow(MASS::Boston), replace = TRUE, prob = c(.6, .2, .2))
  train <- MASS::Boston[idx == 1,]
  test <- MASS::Boston[idx == 2,]
  validation <- MASS::Boston[idx == 3,]

  elastic.test.RMSE <- 0
  elastic.test.pred <- 0
  y <- train$medv
  x <- data.matrix(train %>% dplyr::select(-medv))
  elastic.model <- glmnet(x, y, alpha = 0.5)
  elastic.cv <- cv.glmnet(x, y, alpha = 0.5)
  best.elastic.lambda <- elastic.cv$lambda.min
  best.elastic.model <- glmnet(x, y, alpha = 0, lambda = best.elastic.lambda)
  elastic.test.pred <- predict(best.elastic.model, s = best.elastic.lambda, newx = data.matrix(test %>% dplyr::select(-medv)))
  elastic.test.RMSE[i] <- Metrics::rmse(actual = test$medv, predicted = elastic.test.pred)
}

As an example, elastic.test.RMSE returns:

[1] 0.000000       NA       NA       NA 4.019411

However, if I create a data frame and add new RMSE values to the data frame, using the same formula, everything is fine.

elastic.test.RMSE.df <- data.frame(elastic.test.RMSE)
library(MASS)
library(glmnet)

for (i in 1:5){

  idx <- sample(seq(1, 3), size = nrow(MASS::Boston), replace = TRUE, prob = c(.6, .2, .2))
  train <- MASS::Boston[idx == 1,]
  test <- MASS::Boston[idx == 2,]
  validation <- MASS::Boston[idx == 3,]

  elastic.test.RMSE <- 0
  elastic.test.pred <- 0
  y <- train$medv
  x <- data.matrix(train %>% dplyr::select(-medv))
  elastic.model <- glmnet(x, y, alpha = 0.5)
  elastic.cv <- cv.glmnet(x, y, alpha = 0.5)
  best.elastic.lambda <- elastic.cv$lambda.min
  best.elastic.model <- glmnet(x, y, alpha = 0, lambda = best.elastic.lambda)
  elastic.test.pred <- predict(best.elastic.model, s = best.elastic.lambda, newx = data.matrix(test %>% dplyr::select(-medv)))
  elastic.test.RMSE <- Metrics::rmse(actual = test$medv, predicted = elastic.test.pred)
  elastic.test.RMSE.df <- rbind(elastic.test.RMSE.df, elastic.test.RMSE)
}

For example:

> elastic.test.RMSE.df
  elastic.test.RMSE
1          5.213519
2          4.806393
3          5.412275
4          5.749699
5          5.192845
6          4.229541

I'd much rather do this with indexing, but I can't see what's causing the NA values. I've checked Stack Overflow and the help files, but I didn't find anything that solves the issue.

Answer 1

Score: 1

The problem is with this line of code:

elastic.test.RMSE <- 0

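The variable elastic.test.RMSE is reinitialized to 0 on every iteration, so each pass discards the earlier results. Assigning to elastic.test.RMSE[i] then extends that fresh length-1 vector to length i, and R fills the skipped positions with NA. Move the line outside the for loop, like so: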

elastic.test.RMSE <- 0
for (i in 1:5){
    ...
}
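
The effect can be reproduced without glmnet. The following is a minimal sketch in base R; the function names buggy and fixed and the placeholder value i * 10 are made up for illustration and stand in for the RMSE computation.

```r
# Buggy pattern: the accumulator is reset inside the loop.
buggy <- function(n) {
  for (i in 1:n) {
    res <- 0          # reset to a length-1 vector on every pass
    res[i] <- i * 10  # assigning past the end pads the gap with NA
  }
  res
}

# Fixed pattern: initialise once, before the loop.
fixed <- function(n) {
  res <- numeric(n)   # preallocate n slots
  for (i in 1:n) {
    res[i] <- i * 10
  }
  res
}

print(buggy(5))  # 0 NA NA NA 50
print(fixed(5))  # 10 20 30 40 50
```

The buggy output has the same shape as the vector in the question: a leading 0, NAs in the middle, and only the last iteration's value surviving.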

Also, I believe the following line can be safely deleted:

elastic.test.pred <- 0
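
If the goal is simply one RMSE per resample, another option is to let vapply() build the result vector, so there is no index or initial value to manage. This is only a sketch under stated assumptions: mtcars and lm() stand in for MASS::Boston and glmnet so that it runs with base R, the hand-written rmse() helper stands in for Metrics::rmse, and the names one_split_rmse and test.RMSE, the fixed 8-row test set, and the seed are all hypothetical.

```r
# Stand-ins (assumptions): mtcars and lm() replace MASS::Boston and glmnet
# so the sketch runs with base R only.
set.seed(1)  # hypothetical seed, only for reproducibility

# Root mean squared error, written out by hand.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

one_split_rmse <- function(i) {
  test_rows <- sample(nrow(mtcars), size = 8)  # hold out 8 rows as the test set
  train <- mtcars[-test_rows, ]
  test <- mtcars[test_rows, ]
  fit <- lm(mpg ~ wt + hp, data = train)
  rmse(test$mpg, predict(fit, newdata = test))
}

# vapply() calls the function once per i and checks each result is one number.
test.RMSE <- vapply(1:5, one_split_rmse, numeric(1))
print(test.RMSE)
```

Because each call returns a single value and vapply() assembles them, nothing inside the loop body can overwrite results from earlier iterations.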

Posted by huangapple on May 26, 2023 at 08:49:05
Source link: https://go.coder-hub.com/76337020.html