NAs produced using indexing to calculate RMSE.

Question

I'm working with the Boston Housing data set in the MASS package. The code produces NAs when calculating RMSE using indexing:

    library(MASS)
    library(glmnet)
    library(dplyr)  # provides %>%
    for (i in 1:5){
      idx <- sample(seq(1, 3), size = nrow(MASS::Boston), replace = TRUE, prob = c(.6, .2, .2))
      train <- MASS::Boston[idx == 1,]
      test <- MASS::Boston[idx == 2,]
      validation <- MASS::Boston[idx == 3,]
      elastic.test.RMSE <- 0
      elastic.test.pred <- 0
      y <- train$medv
      x <- data.matrix(train %>% dplyr::select(-medv))
      elastic.model <- glmnet(x, y, alpha = 0.5)
      elastic.cv <- cv.glmnet(x, y, alpha = 0.5)
      best.elastic.lambda <- elastic.cv$lambda.min
      best.elastic.model <- glmnet(x, y, alpha = 0, lambda = best.elastic.lambda)
      elastic.test.pred <- predict(best.elastic.model, s = best.elastic.lambda, newx = data.matrix(test %>% dplyr::select(-medv)))
      elastic.test.RMSE[i] <- Metrics::rmse(actual = test$medv, predicted = elastic.test.pred)
    }

As an example, elastic.test.RMSE returns:

    [1] 0.000000       NA       NA       NA 4.019411

However, if I create a data frame and add new RMSE values to the data frame, using the same formula, everything is fine.

    elastic.test.RMSE.df <- data.frame(elastic.test.RMSE)
    library(MASS)
    library(glmnet)
    library(dplyr)  # provides %>%
    for (i in 1:5){
      idx <- sample(seq(1, 3), size = nrow(MASS::Boston), replace = TRUE, prob = c(.6, .2, .2))
      train <- MASS::Boston[idx == 1,]
      test <- MASS::Boston[idx == 2,]
      validation <- MASS::Boston[idx == 3,]
      elastic.test.RMSE <- 0
      elastic.test.pred <- 0
      y <- train$medv
      x <- data.matrix(train %>% dplyr::select(-medv))
      elastic.model <- glmnet(x, y, alpha = 0.5)
      elastic.cv <- cv.glmnet(x, y, alpha = 0.5)
      best.elastic.lambda <- elastic.cv$lambda.min
      best.elastic.model <- glmnet(x, y, alpha = 0, lambda = best.elastic.lambda)
      elastic.test.pred <- predict(best.elastic.model, s = best.elastic.lambda, newx = data.matrix(test %>% dplyr::select(-medv)))
      elastic.test.RMSE <- Metrics::rmse(actual = test$medv, predicted = elastic.test.pred)
      elastic.test.RMSE.df <- rbind(elastic.test.RMSE.df, elastic.test.RMSE)
    }

For example:

    > elastic.test.RMSE.df
      elastic.test.RMSE
    1          5.213519
    2          4.806393
    3          5.412275
    4          5.749699
    5          5.192845
    6          4.229541

I'd much rather do this with indexing, but I can't see what's causing the NA values. I've checked Stack Overflow and the help files, but I didn't find anything that solves the issue.

Answer 1

Score: 1

The problem is with this line of code:

    elastic.test.RMSE <- 0

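The variable elastic.test.RMSE is reset to 0 at the start of every iteration, so each pass begins with a length-1 vector. Assigning to element i of that vector pads any gap with NA, and only the value from the final iteration survives. Place the initialization outside the for loop, like so: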

    elastic.test.RMSE <- 0
    for (i in 1:5){
      ...
    }
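
To see why the reset produces NA, here is a minimal base-R sketch of the same pattern. The names `res` and `res_fixed` and the stand-in value `i * 10` are illustrative assumptions, not part of the original code:

```r
# Buggy pattern: the result vector is reset inside the loop.
for (i in 1:5) {
  res <- 0          # res is a length-1 vector again on every pass
  res[i] <- i * 10  # assigning past the end pads the gap with NA
}
print(res)
# [1]  0 NA NA NA 50

# Fixed pattern: allocate once, before the loop.
res_fixed <- numeric(5)
for (i in 1:5) {
  res_fixed[i] <- i * 10
}
print(res_fixed)
# [1] 10 20 30 40 50
```

Preallocating with `numeric(5)` instead of `0` is optional, but it makes the intended length explicit and avoids growing the vector on each pass.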

Also, I believe the following line can be safely deleted:

    elastic.test.pred <- 0
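
If you would rather avoid manual indexing altogether, one option is to wrap a single split in a function and collect the results with `vapply()`. The sketch below is only an illustration under stated assumptions: `rmse_one_split()` is a hypothetical helper, `lm()` stands in for the glmnet fit so that the example runs with only MASS installed, and the RMSE is computed by hand rather than with `Metrics::rmse`.

```r
library(MASS)  # provides the Boston data set

# Hypothetical helper: one random split, one fit, one test RMSE.
rmse_one_split <- function() {
  idx <- sample(1:3, size = nrow(Boston), replace = TRUE, prob = c(.6, .2, .2))
  train <- Boston[idx == 1, ]
  test  <- Boston[idx == 2, ]
  fit  <- lm(medv ~ ., data = train)   # stand-in for the glmnet model
  pred <- predict(fit, newdata = test)
  sqrt(mean((test$medv - pred)^2))     # RMSE computed by hand
}

set.seed(1)  # make the random splits reproducible
# vapply() returns a numeric vector with exactly one RMSE per iteration.
elastic.test.RMSE <- vapply(1:5, function(i) rmse_one_split(), numeric(1))
print(elastic.test.RMSE)
```

Because `vapply()` builds the result vector itself, there is no initialization line to misplace. To use the original model, the body of the helper would be replaced with the glmnet calls from the question.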

Posted by huangapple on 2023-05-26 08:49:05
Source: https://go.coder-hub.com/76337020.html