Compute Gini Index on a nested/rsplit object

Question

I used the rsample::bootstraps function to create a nested object as follows:

Sampled_Data = bootstraps(credit_data, times = 2, strata = "Home", apparent = TRUE)

What I get is as follows:

  splits                id        
  <list>                <chr>     
1 <split [34338/12635]> Bootstrap1
2 <split [34338/12592]> Bootstrap2
3 <split [34338/34338]> Apparent  

I would like to compute the Gini index from the "Status" and "Expenses" columns for every bootstrapped data frame, like this:

library(pROC)
2 * auc(credit_data$Status, credit_data$Expenses) - 1

The problem is that I don't know how to do it without unnesting and writing a for loop.

It seems the purrr package would be useful here, but I'm not familiar with it.

What I would like to have:

  splits                id          Gini
  <list>                <chr>      <dbl>
1 <split [34338/12635]> Bootstrap1     x
2 <split [34338/12592]> Bootstrap2     y
3 <split [34338/34338]> Apparent       z

Any help?

Thanks

Answer 1

Score: 1

I'll assume that you want to bootstrap this to get confidence intervals.

apparent = TRUE is only needed for some types of intervals, not for the percentile intervals used below, so I'll omit it here.
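As a side note on that remark: per the rsample documentation, the t and BCa interval helpers (int_t() and int_bca()) are the ones that expect apparent = TRUE. The following is only a sketch of the BCa case and not part of the original answer. The names get_gini_dots, small and bts_app are made up for illustration, and the 300-row subsample is an assumption chosen to keep the leave-one-out step inside int_bca() quick.

```r
library(tidymodels)

data("credit_data")

# Hypothetical helper: int_bca() requires `.fn` to accept `...`,
# and expects `term` / `estimate` columns in the result.
get_gini_dots <- function(split, ...) {
  roc_auc(analysis(split), truth = Status, Expenses) %>%
    transmute(term = "gini", estimate = 2 * .estimate - 1)
}

set.seed(1)
# Assumption: a small subsample so the leave-one-out refits stay fast.
small <- slice_sample(credit_data, n = 300)

# apparent = TRUE adds the extra "Apparent" row that int_bca() looks for.
bts_app <-
  bootstraps(small, times = 50, apparent = TRUE) %>%
  mutate(gini = map(splits, get_gini_dots))

# Expect a warning about using fewer than 1000 resamples.
bca_res <- int_bca(bts_app, gini, .fn = get_gini_dots)
bca_res
```

int_bca() re-runs .fn on leave-one-out splits of the original data, which is why the function has to accept ... and why this gets slow on a full data set.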

library(tidymodels)
tidymodels_prefer()

data("credit_data")

# See ?int_pctl and
# https://www.tidymodels.org/learn/statistics/bootstrap
# for more info.
get_gini <- function(split) {
  dat <- analysis(split)
  roc_res <- roc_auc(dat, truth = Status, Expenses)
  # Convert to gini stat
  roc_res %>%
    mutate(
      .metric = "gini",
      .estimate = 2 * .estimate - 1
    ) %>%
    # now use same format as `tidy()`
    select(estimate = .estimate, term = .metric)
}

set.seed(1)
# Set times higher for bootstrap intervals
bts <-
  bootstraps(credit_data, times = 50) %>%
  mutate(gini = map(splits, get_gini))

int_pctl(bts, gini)
#> Warning: Recommend at least 1000 non-missing bootstrap resamples for term
#> `gini`.
#> # A tibble: 1 × 6
#>   term   .lower .estimate .upper .alpha .method   
#>   <chr>   <dbl>     <dbl>  <dbl>  <dbl> <chr>     
#> 1 gini  -0.0463  -0.00173 0.0377   0.05 percentile

Created on 2023-07-17 with reprex v2.0.2
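If the goal is the per-resample Gini column shown in the question rather than an interval, the same idea can be sketched with purrr::map_dbl(). This is a hedged variation, not the answer's own code: gini_value and sampled are illustrative names, and strata is left out here for simplicity.

```r
library(tidymodels)

data("credit_data")

# Hypothetical helper: returns one number per split so that
# map_dbl() can build a plain numeric column.
gini_value <- function(split) {
  dat <- analysis(split)
  auc <- roc_auc(dat, truth = Status, Expenses)$.estimate
  2 * auc - 1
}

set.seed(1)
sampled <-
  bootstraps(credit_data, times = 2, apparent = TRUE) %>%
  mutate(Gini = map_dbl(splits, gini_value))

# One row per resample, plus the "Apparent" row computed on the full data.
sampled
```

Note that yardstick's roc_auc() treats the first factor level of Status as the event by default, whereas pROC::auc() picks the direction automatically, so the two approaches can disagree in sign.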

Posted on 2023-07-17 22:42:22. Source: https://go.coder-hub.com/76705587.html