August 9, 2023, 11:28:08

R: Weighted Bootstrap in R

Question

I am working with the R programming language.

I am familiar with the general bootstrap procedure (https://en.wikipedia.org/wiki/Bootstrapping_(statistics)):

Suppose you have a dataset of size "n"
Take a random sample with replacement of size "n"
Take the mean of this random sample
Repeat the above steps many times
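For reference, the four steps above can be sketched in a few lines of R. This is only a minimal illustration: the name `plain_bootstrap`, the example vector `x`, and the fixed seed are placeholders chosen for the sketch.

```r
# Plain (unweighted) bootstrap of the sample mean.
# `plain_bootstrap` is an illustrative name chosen for this sketch.
plain_bootstrap <- function(x, R) {
  n <- length(x)
  # Repeat R times: draw n indices with replacement, then average the
  # resampled values. Sampling indices avoids the special case where
  # sample() treats a single number as the range 1:x.
  replicate(R, mean(x[sample.int(n, size = n, replace = TRUE)]))
}

set.seed(1)  # fixed seed only so that the run is reproducible
x <- c(1, 2, 3, 4, 5)
boot_means <- plain_bootstrap(x, R = 1000)
```

Here `boot_means` holds one resampled mean per repetition, and its spread approximates the sampling variability of the mean.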

My question: I am interested in extending this to the "weighted bootstrap", in which each observation has its own probability of being selected.

Here is my attempt to write the R code for this:

    # function to calculate the weighted mean (inputs: data x and weights w)
    weighted_mean <- function(x, w) {
      sum(x * w) / sum(w)
    }

    # function that performs random sampling with replacement where the probability of selecting any point is proportional to the assigned weight (inputs: R is the number of bootstrap repetitions)
    weighted_bootstrap <- function(data, weights, R) {
      estimates <- numeric(R)
      for (i in seq_len(R)) {
        bootstrap_sample <- sample(data, size = length(data), replace = TRUE, prob = weights)
        estimates[i] <- weighted_mean(bootstrap_sample, weights)
      }
      estimates
    }

Here is how this weighted bootstrap function would be used on some data (note that the weights must add to 1):

    data <- c(1, 2, 3, 4, 5)
    weights <- c(0.1, 0.2, 0.3, 0.2, 0.2)
    R <- 1000
    estimates <- weighted_bootstrap(data, weights, R)
    plot(hist(estimates))

Can someone please tell me if I have understood this correctly?
Thanks!

Answer 1

Score: 3

As currently implemented, the weights are used twice.
First in the sample() function, where they are used correctly.
Then again in the weighted_mean() function. This second use produces wrong results: the weight vector never changes, so, for example, the fifth weight is always applied to whatever value lands in the fifth position of the bootstrap sample, regardless of which original observation it is.
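A small illustration of the problem, using two hand-picked resamples (`s1` and `s2` are made-up examples, not the output of `sample()`) that contain the same values in a different order. The `weighted_mean()` definition is copied from the question so that the snippet runs on its own.

```r
# weighted_mean() as defined in the question
weighted_mean <- function(x, w) {
  sum(x * w) / sum(w)
}

weights <- c(0.1, 0.2, 0.3, 0.2, 0.2)

# Two hypothetical bootstrap samples with identical values, different order
s1 <- c(5, 3, 3, 2, 1)
s2 <- c(1, 2, 3, 3, 5)

# The weights are matched by position, so the result depends on the order
wm1 <- weighted_mean(s1, weights)  # about 2.6
wm2 <- weighted_mean(s2, weights)  # about 3.0

# The plain mean does not depend on the order: 2.8 for both samples
m1 <- mean(s1)
m2 <- mean(s2)
```

Since the order of a bootstrap sample is arbitrary, an estimate that changes with the order is not measuring anything meaningful about the data.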

Therefore, to achieve what you want, the code would be:

    weighted_bootstrap <- function(data, weights, R) {
      estimates <- numeric(R)
      for (i in seq_len(R)) {
        bootstrap_sample <- sample(data, size = length(data), replace = TRUE, prob = weights)
        estimates[i] <- mean(bootstrap_sample)
      }
      return(estimates)
    }

    data <- c(1, 2, 3, 4, 5)
    weights <- c(0.1, 0.2, 0.3, 0.2, 0.2)
    R <- 1000
    estimates <- weighted_bootstrap(data, weights, R)
    hist(estimates)
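One possible sanity check, sketched under the assumption that the goal is to estimate the weighted mean of the data: when the weights enter only through the sampling step, the average of the bootstrap estimates should land close to `weighted.mean(data, weights)`, which is 3.2 for this example. The name `weighted_bootstrap_vec`, the seed, and the number of repetitions are illustrative choices; the function is just a `replicate()` variant of the loop above.

```r
# Illustrative variant of the corrected function, using replicate()
weighted_bootstrap_vec <- function(data, weights, R) {
  n <- length(data)
  replicate(R, {
    # The weights are used once, as selection probabilities
    idx <- sample.int(n, size = n, replace = TRUE, prob = weights)
    # Plain mean of the resampled values
    mean(data[idx])
  })
}

set.seed(42)  # fixed seed only so that the run is reproducible
data <- c(1, 2, 3, 4, 5)
weights <- c(0.1, 0.2, 0.3, 0.2, 0.2)

estimates <- weighted_bootstrap_vec(data, weights, R = 5000)

# Compare the bootstrap average with the directly computed weighted mean
target <- weighted.mean(data, weights)
comparison <- c(bootstrap = mean(estimates), target = target)
```

As a side note, the `prob` argument of `sample()` and `sample.int()` is treated as a vector of weights and does not have to sum to 1, although it must be non-negative and not all zero.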
