2023-08-09 11:28:08

R: Weighted Bootstrap in R

Question

I am working with the R programming language.

I am familiar with the general bootstrap procedure (https://en.wikipedia.org/wiki/Bootstrapping_(statistics)):

Suppose you have a dataset of size "n"
Take a random sample with replacement of size "n"
Take the mean of this random sample
Repeat the above steps many times
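
The steps above can be sketched in a few lines of R. This is only a minimal illustration of the plain (unweighted) bootstrap of the mean; the names `x`, `n_rep`, and `boot_means` and the seed are assumptions made for the example.

```r
# Plain (unweighted) bootstrap of the mean
set.seed(1)                      # arbitrary seed, only for reproducibility
x <- c(1, 2, 3, 4, 5)            # example dataset of size n = 5
n_rep <- 1000                    # number of bootstrap repetitions

# Each repetition: resample n values with replacement, then take the mean
boot_means <- replicate(n_rep, mean(sample(x, size = length(x), replace = TRUE)))

sd(boot_means)                   # bootstrap estimate of the standard error of the mean
```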

My Question: I am interested in extending this to the "weighted bootstrap" - that is, now each observation has an associated probability of being selected.

Here is my attempt to write the R code for this:

    # Function to calculate the weighted mean (inputs: data x and weights w)
    weighted_mean <- function(x, w) {
      sum(x * w) / sum(w)
    }

    # Function that performs random sampling with replacement, where the
    # probability of selecting any point is proportional to its assigned weight
    # (inputs: R is the number of bootstrap repetitions)
    weighted_bootstrap <- function(data, weights, R) {
      estimates <- numeric(R)
      for (i in seq_len(R)) {
        bootstrap_sample <- sample(data, size = length(data), replace = TRUE, prob = weights)
        estimates[i] <- weighted_mean(bootstrap_sample, weights)
      }
      estimates
    }

Here is how this weighted bootstrap function would be used on some data (note that the weights must add to 1):

    data <- c(1, 2, 3, 4, 5)
    weights <- c(0.1, 0.2, 0.3, 0.2, 0.2)
    R <- 1000
    estimates <- weighted_bootstrap(data, weights, R)
    plot(hist(estimates))

Can someone please tell me if I have understood this correctly?
Thanks!

Answer 1

Score: 3

As currently implemented, the weights are used twice.
The first use, in sample(), is correct.
The second use, in weighted_mean(), produces wrong results: the weight vector stays in its original order, so, for example, the fifth weight is always applied to whichever value happens to be drawn fifth in the bootstrap sample, regardless of which observation it is. Because sample() already draws each observation with probability proportional to its weight, the plain mean of the bootstrap sample is all that is needed.
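
A small check makes the positional problem visible. The sketch below reuses the question's weighted_mean() and weights; the sample `s` is a hypothetical bootstrap draw chosen by hand for illustration, not output from sample().

```r
# Weighted mean as defined in the question
weighted_mean <- function(x, w) sum(x * w) / sum(w)

weights <- c(0.1, 0.2, 0.3, 0.2, 0.2)

# One hypothetical bootstrap sample, and the same values in reverse order
s <- c(5, 5, 1, 1, 3)
a <- weighted_mean(s, weights)       # weights applied by position in the sample
b <- weighted_mean(rev(s), weights)  # same values, different positions

c(a, b, mean(s))                     # a and b differ; mean(s) does not depend on order
```

Here `a` comes out at 2.6 and `b` at 2.8, although both samples contain exactly the same values, while the plain mean is 3 either way.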

Therefore, to achieve what you want, the code would be:

    weighted_bootstrap <- function(data, weights, R) {
      estimates <- numeric(R)
      for (i in seq_len(R)) {
        bootstrap_sample <- sample(data, size = length(data), replace = TRUE, prob = weights)
        estimates[i] <- mean(bootstrap_sample)
      }
      return(estimates)
    }
    data <- c(1, 2, 3, 4, 5)
    weights <- c(0.1, 0.2, 0.3, 0.2, 0.2)
    R <- 1000
    estimates <- weighted_bootstrap(data, weights, R)
    hist(estimates)
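
As a rough sanity check, the bootstrap means from this version should center on the weighted mean of the data, sum(data * weights) / sum(weights), which is 3.2 here. The sketch below repeats the function so that it runs on its own; the seed, `target`, and `estimates_scaled` are assumptions made for the example. It also relies on the documented behavior of sample() that `prob` need not sum to one, so rescaled weights should describe the same sampling scheme.

```r
# Same function as above, repeated so this block runs on its own
weighted_bootstrap <- function(data, weights, R) {
  estimates <- numeric(R)
  for (i in seq_len(R)) {
    bootstrap_sample <- sample(data, size = length(data), replace = TRUE, prob = weights)
    estimates[i] <- mean(bootstrap_sample)
  }
  estimates
}

set.seed(42)  # arbitrary seed, only for reproducibility
data <- c(1, 2, 3, 4, 5)
weights <- c(0.1, 0.2, 0.3, 0.2, 0.2)

# Weighted mean of the original data: the value the estimates should center on
target <- sum(data * weights) / sum(weights)

estimates <- weighted_bootstrap(data, weights, R = 2000)
mean(estimates) - target          # should be close to 0

# Unnormalized weights (here multiplied by 10) define the same probabilities
estimates_scaled <- weighted_bootstrap(data, weights * 10, R = 2000)
mean(estimates_scaled) - target   # should also be close to 0
```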
