R: Weighted Bootstrap in R

Question

I am working with the R programming language.

I am familiar with the general bootstrap procedure (https://en.wikipedia.org/wiki/Bootstrapping_(statistics)):

  • Suppose you have a dataset of size "n"
  • Take a random sample with replacement of size "n"
  • Take the mean of this random sample
  • Repeat the above steps many times
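
The four steps above can be sketched in a few lines of R. This is a minimal illustration rather than code from the original post; the helper name `plain_bootstrap` and the seed are assumptions made for the example.

```r
# Minimal sketch of the plain (unweighted) bootstrap of the mean.
# `plain_bootstrap` is a hypothetical helper name, not from the original post.
plain_bootstrap <- function(x, R) {
  # Draw R resamples of size n with replacement and record the mean of each.
  replicate(R, mean(sample(x, size = length(x), replace = TRUE)))
}

set.seed(1)                        # arbitrary seed, only for reproducibility
x <- c(1, 2, 3, 4, 5)
boot_means <- plain_bootstrap(x, R = 1000)

mean(boot_means)                   # should be close to mean(x)
sd(boot_means)                     # bootstrap estimate of the standard error
```

The spread of `boot_means` is what the bootstrap is usually run for: it approximates the sampling variability of the mean.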

My Question: I am interested in extending this to the "weighted bootstrap" - that is, now each observation has an associated probability of being selected.

Here is my attempt to write the R code for this:

    # function to calculate the weighted mean (inputs: data x and weights w)
    weighted_mean <- function(x, w) {
      sum(x * w) / sum(w)
    }

    # function that performs random sampling with replacement, where the
    # probability of selecting any point is proportional to its assigned weight
    # (inputs: R is the number of bootstrap repetitions)
    weighted_bootstrap <- function(data, weights, R) {
      estimates <- numeric(R)
      for (i in seq_len(R)) {
        bootstrap_sample <- sample(data, size = length(data), replace = TRUE, prob = weights)
        estimates[i] <- weighted_mean(bootstrap_sample, weights)
      }
      estimates
    }

Here is how this weighted bootstrap function would be used on some data (note that the weights must add to 1):

    data <- c(1, 2, 3, 4, 5)
    weights <- c(0.1, 0.2, 0.3, 0.2, 0.2)
    R <- 1000
    estimates <- weighted_bootstrap(data, weights, R)
    plot(hist(estimates))

Can someone please tell me if I have understood this correctly?
Thanks!

Answer 1

Score: 3

As currently implemented, the weights are used twice.
The first use, in the sample() function, is correct.
The second use, in the weighted_mean() function, produces wrong results: the weight vector does not change between resamples, so, for example, the fifth weight is always applied to whichever observation lands in the fifth position of your bootstrap sample.
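
A small simulation can make the effect concrete. The sketch below is an illustration under stated assumptions, not part of the original answer: it reuses the data and weights from the question, and the names `pos_weighted` and `plain` are hypothetical. Because every position in the resample is drawn from the same distribution, both versions should be centered on the weighted mean of the data (3.2 here), but applying the weights a second time by position changes the spread of the estimates.

```r
# Illustrative comparison; `pos_weighted` and `plain` are hypothetical names.
set.seed(42)                       # arbitrary seed, only for reproducibility
data    <- c(1, 2, 3, 4, 5)
weights <- c(0.1, 0.2, 0.3, 0.2, 0.2)   # already sum to 1
R       <- 20000                   # large R so the two spreads can be told apart

pos_weighted <- numeric(R)         # weights reused on the resample (question's version)
plain        <- numeric(R)         # plain mean of the resample (answer's version)

for (i in seq_len(R)) {
  s <- sample(data, size = length(data), replace = TRUE, prob = weights)
  pos_weighted[i] <- sum(s * weights) / sum(weights)
  plain[i]        <- mean(s)
}

# Weighted mean of the original data, which both versions should center on.
target <- sum(data * weights) / sum(weights)
c(target = target, pos_weighted = mean(pos_weighted), plain = mean(plain))

# For independent draws and weights summing to 1, the variance of a weighted
# sum is sigma^2 * sum(w^2); this equals sigma^2 / n only for equal weights.
sigma2 <- sum(weights * (data - target)^2)   # variance of a single weighted draw
c(var_pos_weighted = var(pos_weighted), theory = sigma2 * sum(weights^2))
c(var_plain        = var(plain),        theory = sigma2 / length(data))
```

With these weights the difference in variance is modest, but it grows as the weights become more unequal, so any standard error or interval derived from the double-weighted estimates would be off.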

Therefore, to achieve what you want, the code would be:

weighted_bootstrap <- function(data, weights, R) {
  estimates <- numeric(R)
  for (i in seq_len(R)) {
    bootstrap_sample <- sample(data, size = length(data), replace = TRUE, prob = weights)
    estimates[i] <- mean(bootstrap_sample)
  }
  return(estimates)
}

data <- c(1, 2, 3, 4, 5)
weights <- c(0.1, 0.2, 0.3, 0.2, 0.2)
R <- 1000
estimates <- weighted_bootstrap(data, weights, R)
hist(estimates)
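
One side note on the question's remark that the weights must add to 1: the documentation for `sample()` states that the `prob` values need not sum to one, since they are rescaled internally. The sketch below uses an arbitrary seed and hypothetical variable names to compare the empirical selection frequencies obtained from raw and from normalized weights.

```r
# Sketch: unnormalized weights should give the same selection probabilities.
set.seed(123)                      # arbitrary seed, only for reproducibility
data         <- c(1, 2, 3, 4, 5)
raw_weights  <- c(1, 2, 3, 2, 2)   # hypothetical raw weights, sum to 10
norm_weights <- raw_weights / sum(raw_weights)

n_draws    <- 100000
draws_raw  <- sample(data, size = n_draws, replace = TRUE, prob = raw_weights)
draws_norm <- sample(data, size = n_draws, replace = TRUE, prob = norm_weights)

# Empirical frequency of each value under the two weight vectors.
freq_raw  <- as.numeric(table(factor(draws_raw,  levels = data))) / n_draws
freq_norm <- as.numeric(table(factor(draws_norm, levels = data))) / n_draws

# Both rows should be close to norm_weights.
rbind(target = norm_weights, raw = freq_raw, normalized = freq_norm)
```

Normalizing the weights yourself is still harmless, and it is required if the same vector is later used in a formula that assumes the weights sum to 1.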

huangapple
  • Posted on August 9, 2023, 11:28:08
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76864378-2.html