getResamplingIndices from resampling used in benchmark experiment - mlr

Question

I am using nested cross-validation in a benchmark experiment and would like to retrieve the indices of the instances used in each outer loop. I am aware of the function getResamplingIndices(), which is suited to this task, but it does not accept a 'BenchmarkResult' object. Is there a way around this? Here is an example:

res = benchmark(lrns, task, outer, measures = list(auc, ppv, npv), show.info = FALSE, models = TRUE)

getResamplingIndices(res)

Error in getResamplingIndices(res) : 
  Assertion on 'object' failed: Must inherit from class 'ResampleResult', but has class 'BenchmarkResult'.

Answer 1

Score: 2

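benchmark() uses the same resampling instance for every learner within a task, so the indices are identical across learners (they differ between tasks). You can therefore apply getResamplingIndices() to any one of the ResampleResult objects nested within your BenchmarkResult object.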

To get the indices for every task in your BenchmarkResult object, take the first learner's result for each task:

inds_by_task = lapply(bmr$results, function(x) getResamplingIndices(x[[1]]))
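As a quick sanity check of the reasoning above, the following sketch compares the indices of two learners on the same task and counts the test instances per task. It reuses the learners and tasks from the reprex; the variable names (iris_res, same_within_task, n_per_task) are only illustrative and not part of mlr.

```r
library(mlr)

# Same setup as in the reprex: two learners, two tasks, 2-fold CV.
lrns = list(makeLearner("classif.lda"), makeLearner("classif.rpart"))
tasks = list(iris.task, sonar.task)
rdesc = makeResampleDesc("CV", iters = 2L)
bmr = benchmark(lrns, tasks, rdesc, measures = list(acc), show.info = FALSE)

# Within one task, both learners should have been resampled on the same splits.
iris_res = bmr$results[["iris-example"]]
same_within_task = identical(
  getResamplingIndices(iris_res[["classif.lda"]]),
  getResamplingIndices(iris_res[["classif.rpart"]])
)

# One set of indices per task, taken from the first learner of each task.
inds_by_task = lapply(bmr$results, function(x) getResamplingIndices(x[[1]]))

# Number of test instances per task; this differs because the tasks differ in size.
n_per_task = sapply(inds_by_task, function(i) length(unlist(i$test.inds)))
```

If same_within_task is TRUE, picking x[[1]] loses nothing: any learner's ResampleResult carries the same splits for that task.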

See below for a full reprex.

library(mlr)
#> Loading required package: ParamHelpers

lrns = list(makeLearner("classif.lda"), makeLearner("classif.rpart"))
tasks = list(iris.task, sonar.task)
rdesc = makeResampleDesc("CV", iters = 2L)
meas = list(acc, ber)
bmr = benchmark(lrns, tasks, rdesc, measures = meas, show.info = FALSE)

getResamplingIndices(bmr$results$`iris-example`$classif.lda)
#> $train.inds
#> $train.inds[[1]]
#>  [1]  25 136  62  77  22 114 101 145  87  34  93 120 133 126  76   4 105  97   7
#> [20] 128  49  19 106   9  54   5  72 102  73  51 109 115  23  86  89 112 130  69
#> [39]  57 122  53  12  60  40  36  70  83  90 108  81  38  50 129  75  71  59  47
#> [58]  95  31 147  37  30 127  43 148 103  27  66 137  29 124  35 143 132  45
#> 
#> $train.inds[[2]]
#>  [1]  79 123  28  17  82  55 110   8 107  26 125 121  32  33  48 119  98  58 116
#> [20] 144 139  67  13 142 111  16  65  21  74  42 113 149 117   2  99  68 140 104
#> [39]  96 150   1  94  46 131 135  11  10 146  85  15  52  78  63 100   3 118 134
#> [58]   6  88 138  41  39  92  56  84  61  20  24  91  80  18  64  14 141  44
#> 
#> 
#> $test.inds
#> $test.inds[[1]]
#>  [1]   1   2   3   6   8  10  11  13  14  15  16  17  18  20  21  24  26  28  32
#> [20]  33  39  41  42  44  46  48  52  55  56  58  61  63  64  65  67  68  74  78
#> [39]  79  80  82  84  85  88  91  92  94  96  98  99 100 104 107 110 111 113 116
#> [58] 117 118 119 121 123 125 131 134 135 138 139 140 141 142 144 146 149 150
#> 
#> $test.inds[[2]]
#>  [1]   4   5   7   9  12  19  22  23  25  27  29  30  31  34  35  36  37  38  40
#> [20]  43  45  47  49  50  51  53  54  57  59  60  62  66  69  70  71  72  73  75
#> [39]  76  77  81  83  86  87  89  90  93  95  97 101 102 103 105 106 108 109 112
#> [58] 114 115 120 122 124 126 127 128 129 130 132 133 136 137 143 145 147 148

Created on 2020-01-04 by the reprex package (v0.3.0)
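The question is about nested cross-validation, so here is a minimal sketch of that case, under assumptions of my own: a tuned rpart learner on iris.task with a two-value grid for cp, 2-fold inner CV, and 3-fold outer CV. None of these choices come from the original question, which used different learners, a different task, and other measures. The outer-loop indices are read from the ResampleResult nested in the BenchmarkResult, exactly as in the reprex.

```r
library(mlr)
set.seed(1)

# Inner loop: tune cp of rpart over a small, purely illustrative grid.
ps = makeParamSet(makeDiscreteParam("cp", values = c(0.01, 0.1)))
ctrl = makeTuneControlGrid()
inner = makeResampleDesc("CV", iters = 2L)
lrn = makeTuneWrapper(makeLearner("classif.rpart"), resampling = inner,
  par.set = ps, control = ctrl, show.info = FALSE)

# Outer loop: 3-fold CV, run through benchmark().
outer = makeResampleDesc("CV", iters = 3L)
bmr = benchmark(list(lrn), list(iris.task), outer,
  measures = list(acc), show.info = FALSE)

# The first (and only) learner's ResampleResult for the iris task.
rr = bmr$results[["iris-example"]][[1]]

# Outer-loop train/test indices, one list element per outer fold.
outer_inds = getResamplingIndices(rr)
str(outer_inds, max.level = 2)
```

Indexing by position with [[1]] avoids depending on the id that the tune wrapper assigns to the learner.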

Posted by huangapple on 2020-01-04 01:58:20
Link: https://go.coder-hub.com/59583188.html