
tidymodels: loss_accuracy provides no variable importance results

Question


Using the iris dataset, a knn classifier was tuned with iterative search for multiclass classification. However, using loss_accuracy in DALEX::model_parts() for variable importance yields empty results.

I would appreciate any ideas. Thank you so much for your support!

    library(tidyverse)
    library(tidymodels)
    library(DALEXtra)
    tidymodels_prefer()

    df <- iris

    # split
    set.seed(2023)
    splits <- initial_split(df, strata = Species, prop = 4/5)
    df_train <- training(splits)
    df_test <- testing(splits)

    # workflow
    df_rec <- recipe(Species ~ ., data = df_train)
    knn_model <- nearest_neighbor(neighbors = tune()) %>%
      set_engine("kknn") %>%
      set_mode("classification")
    df_wflow <- workflow() %>%
      add_model(knn_model) %>%
      add_recipe(df_rec)

    # cross-validation
    set.seed(2023)
    knn_res <- df_wflow %>%
      tune_bayes(
        metrics = metric_set(accuracy),
        resamples = vfold_cv(df_train, strata = "Species", v = 2),
        control = control_bayes(verbose = TRUE, save_pred = TRUE))

    # fit
    best_k <- knn_res %>%
      select_best(metric = "accuracy")
    knn_mod <- df_wflow %>%
      finalize_workflow(best_k) %>%
      fit(df_train)

    # variable importance
    knn_exp <- explain_tidymodels(
      extract_fit_parsnip(knn_mod),
      data = df_rec %>% prep() %>% bake(new_data = NULL, all_predictors()),
      y = df_train$Species)
    set.seed(2023)
    vip <- model_parts(knn_exp, type = "variable_importance",
                       loss_function = loss_accuracy)
    plot(vip)  # empty plot

Answer 1 (score: 1)

You are getting 0 for all your results because the model type according to {DALEX} is "multiclass".

These calculations would have worked if the type were "classification".

    knn_exp$model_info$type
    # [1] "multiclass"

This means that the predictions will be the predicted class probabilities (here we get 1s and 0s because the model is quite overfit):

    predicted <- knn_exp$predict_function(knn_exp$model, newdata = df_train)
    predicted
    #      setosa versicolor virginica
    # [1,]      1          0         0
    # [2,]      1          0         0
    # [3,]      1          0         0
    # [4,]      1          0         0
    # [5,]      1          0         0
    # [6,]      1          0         0
    # ...
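As an aside (not part of the original answer): a probability matrix like this can be collapsed to hard class labels with base R's max.col(), which is the shape a label-based accuracy comparison actually needs. A toy sketch with illustrative values:

```r
# Toy probability matrix with the same column layout as `predicted` above
probs <- matrix(c(0.9, 0.1, 0.0,
                  0.2, 0.7, 0.1),
                nrow = 2, byrow = TRUE,
                dimnames = list(NULL, c("setosa", "versicolor", "virginica")))
# Pick the column name of the largest probability in each row
colnames(probs)[max.col(probs, ties.method = "first")]
# [1] "setosa"     "versicolor"
```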

When you use loss_accuracy() as your loss function, it performs the following calculation:

    loss_accuracy
    # function (observed, predicted, na.rm = TRUE)
    # mean(observed == predicted, na.rm = na.rm)
    # <bytecode: 0x159276bb8>
    # <environment: namespace:DALEX>
    # attr(,"loss_name")
    # [1] "Accuracy"
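In other words, loss_accuracy() implicitly assumes that predicted holds a vector of class labels, not a probability matrix. A toy sketch (illustrative values only, not from the original answer) shows the comparison behaving as expected when given labels:

```r
# With label vectors on both sides, the element-wise comparison is meaningful:
obs_labels  <- factor(c("setosa", "virginica", "setosa"))
pred_labels <- c("setosa", "setosa", "setosa")
mean(obs_labels == pred_labels)  # 2 of 3 labels match
# [1] 0.6666667
```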

We can see why this becomes an issue if we do the calculation step by step. First we define observed as the outcome factor:

    observed <- df_train$Species
    observed
    # [1] setosa setosa setosa setosa setosa setosa
    # [7] setosa setosa setosa setosa setosa setosa
    # [13] setosa setosa setosa setosa setosa setosa
    # [19] setosa setosa setosa setosa setosa setosa
    # [25] setosa setosa setosa setosa setosa setosa
    # [31] setosa setosa setosa setosa setosa setosa
    # [37] setosa setosa setosa setosa versicolor versicolor
    # [43] versicolor versicolor versicolor versicolor versicolor versicolor
    # [49] versicolor versicolor versicolor versicolor versicolor versicolor
    # [55] versicolor versicolor versicolor versicolor versicolor versicolor
    # [61] versicolor versicolor versicolor versicolor versicolor versicolor
    # [67] versicolor versicolor versicolor versicolor versicolor versicolor
    # [73] versicolor versicolor versicolor versicolor versicolor versicolor
    # [79] versicolor versicolor virginica virginica virginica virginica
    # [85] virginica virginica virginica virginica virginica virginica
    # [91] virginica virginica virginica virginica virginica virginica
    # [97] virginica virginica virginica virginica virginica virginica
    # [103] virginica virginica virginica virginica virginica virginica
    # [109] virginica virginica virginica virginica virginica virginica
    # [115] virginica virginica virginica virginica virginica virginica
    # Levels: setosa versicolor virginica

Since observed is a factor vector and predicted is a numeric matrix, we get back a logical matrix of all FALSE, because the values are never equal.

    head(observed == predicted)
    #      setosa versicolor virginica
    # [1,]  FALSE      FALSE     FALSE
    # [2,]  FALSE      FALSE     FALSE
    # [3,]  FALSE      FALSE     FALSE
    # [4,]  FALSE      FALSE     FALSE
    # [5,]  FALSE      FALSE     FALSE
    # [6,]  FALSE      FALSE     FALSE

So when we take the mean of this we get the expected 0.

    mean(observed == predicted)
    # [1] 0
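One possible workaround, sketched here as an assumption rather than part of the original answer: define a custom loss that first collapses the probability matrix to hard class labels, then pass it to model_parts(). Returning 1 - accuracy keeps the convention that a larger loss means a worse model, which is what permutation importance expects; recent DALEX versions also ship loss_one_minus_accuracy and, for probability output, loss_cross_entropy, which may fit this multiclass case directly.

```r
# Sketch of a custom loss for a multiclass explainer (an assumption, untested
# against the original session): collapse probabilities to labels, then score
# 1 - accuracy so that larger values mean a worse model.
loss_one_minus_class_accuracy <- function(observed, predicted, na.rm = TRUE) {
  pred_class <- colnames(predicted)[max.col(predicted, ties.method = "first")]
  1 - mean(as.character(observed) == pred_class, na.rm = na.rm)
}
attr(loss_one_minus_class_accuracy, "loss_name") <- "One minus accuracy"

# Then, in the original workflow:
# set.seed(2023)
# vip <- model_parts(knn_exp, type = "variable_importance",
#                    loss_function = loss_one_minus_class_accuracy)
# plot(vip)
```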

huangapple
  • Published 2023-07-31 22:13:01
  • Original link: https://go.coder-hub.com/76804459.html