Compute Gini Index on a nested/rsplit object
Question
I used the rsample::bootstraps function to create a nested object as follows:
Sampled_Data = bootstraps(credit_data, times = 2, strata = "Home", apparent = TRUE)
This is what I get:
  splits                id
  <list>                <chr>
1 <split [34338/12635]> Bootstrap1
2 <split [34338/12592]> Bootstrap2
3 <split [34338/34338]> Apparent
I would like to compute the Gini index from the "Status" and "Expenses" columns for each of the bootstrapped data frames, like this:
library(pROC)
2 * auc(credit_data$Status, credit_data$Expenses) - 1
The problem is that I don't know how to do this without unnesting and writing a for loop.
It seems that the purrr package could be useful here, but I'm not familiar with it.
What I would like to have:
  splits                id         Gini
  <list>                <chr>      <dbl>
1 <split [34338/12635]> Bootstrap1 x
2 <split [34338/12592]> Bootstrap2 y
3 <split [34338/34338]> Apparent   z
Any help?
Thanks
Answer 1
Score: 1
I'll assume that you want to bootstrap this to get confidence intervals.
You would use apparent = TRUE only for some types of intervals (not the percentile intervals computed below), so I'll omit it here.
library(tidymodels)
tidymodels_prefer()
data("credit_data")

# See ?int_pctl and
# https://www.tidymodels.org/learn/statistics/bootstrap
# for more info.
get_gini <- function(split) {
  # Bootstrap sample (the "analysis" portion of the split)
  dat <- analysis(split)
  roc_res <- roc_auc(dat, truth = Status, Expenses)
  # Convert to gini stat
  roc_res %>%
    mutate(
      .metric = "gini",
      .estimate = 2 * .estimate - 1
    ) %>%
    # now use same format as `tidy()`
    select(estimate = .estimate, term = .metric)
}

set.seed(1)
# Set times higher for bootstrap intervals
bts <-
  bootstraps(credit_data, times = 50) %>%
  mutate(gini = map(splits, get_gini))
int_pctl(bts, gini)
#> Warning: Recommend at least 1000 non-missing bootstrap resamples for term
#> `gini`.
#> # A tibble: 1 × 6
#> term .lower .estimate .upper .alpha .method
#> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 gini -0.0463 -0.00173 0.0377 0.05 percentile
Created on 2023-07-17 with reprex v2.0.2
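If what you want is the table from the question (one numeric Gini value per split rather than an interval), a minimal sketch could look like the following. It assumes the credit_data set shipped with the modeldata package, and gini_from_split is just an illustrative helper name. Stratification is left out to keep the sketch short. Note also that, as far as I know, pROC::auc picks the ROC direction automatically by default, while yardstick treats the first factor level as the event, so the values may differ in sign from the pROC version.

```r
library(rsample)
library(yardstick)
library(dplyr)
library(purrr)

# Assumption: the example data from the modeldata package
data("credit_data", package = "modeldata")

# Illustrative helper: returns a single number (Gini = 2 * AUC - 1)
gini_from_split <- function(split) {
  dat <- analysis(split)
  auc <- roc_auc(dat, truth = Status, Expenses)$.estimate
  2 * auc - 1
}

set.seed(1)
sampled <-
  bootstraps(credit_data, times = 2, apparent = TRUE) %>%
  # map_dbl() returns a plain numeric vector, so Gini is a <dbl> column
  mutate(Gini = map_dbl(splits, gini_from_split))

sampled
```

map_dbl() is used instead of map() because each split yields exactly one number, so no list column or unnesting is needed.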