July 14, 2023, 02:58:32

English:

Why is confidence interval greater than 1 for binary data?

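Question

When you compute a confidence interval for binary data, an upper bound above 1 does not necessarily mean the arithmetic is wrong. The quantity being estimated is a proportion, which lies between 0 and 1, but the formulas in your code are approximations that are not constrained to that range.

First, let's look at your first code snippet: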

import math
def find_CI(a):
    n = len(a)
    p_hat = sum(a) / n
    h = math.sqrt((p_hat * (1 - p_hat) / n))
    ub = p_hat + (1.96 * h)
    lb = p_hat - (1.96 * h)
    return lb, ub

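This code computes the normal-approximation (Wald) interval, p_hat ± 1.96 * sqrt(p_hat * (1 - p_hat) / n), and assumes the data contain only the integers 0 and 1. The formula is implemented correctly, but it simply adds and subtracts a margin around p_hat, so nothing keeps the bounds inside [0, 1]. With a small sample (n = 5 in your example) the margin is large and the upper bound overshoots 1. Also make sure your data really contain only 0s and 1s; otherwise p_hat is not a proportion and the result is not meaningful.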
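The overshoot can be reproduced by hand. The following is a minimal sketch, not part of the original code: the names `wald_interval` and `clip_to_unit` are hypothetical, z = 1.96 is assumed for a 95% interval, and clipping to [0, 1] is shown only as one common workaround.

```python
import math

def wald_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) interval for a proportion."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    # Nothing here constrains the bounds to [0, 1].
    return p_hat - margin, p_hat + margin

def clip_to_unit(interval):
    """Truncate an interval to the valid range of a proportion."""
    lb, ub = interval
    return max(0.0, lb), min(1.0, ub)

small = wald_interval(3, 5)      # same counts as a = [1, 0, 0, 1, 1]
large = wald_interval(300, 500)  # same proportion, 100x the data

print(small)
print(clip_to_unit(small))
print(large)
```

With n = 5 the margin is about 0.43, so 0.6 + 0.43 exceeds 1. With n = 500 the margin shrinks to about 0.043 and the interval stays inside [0, 1].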

Next, let's look at your second code snippet:

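import numpy as np
import scipy.stats as st

def find_confidence_interval(a):
    # 95% t interval around the sample mean. The confidence level is passed
    # positionally because the keyword was renamed from `alpha` to
    # `confidence` in newer SciPy releases.
    x = st.t.interval(0.95, df=len(a) - 1,
                      loc=np.mean(a),
                      scale=st.sem(a))
    return x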

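This code uses the t.interval function from scipy.stats. It treats the data as a continuous, approximately normal sample and builds an interval around the sample mean, so it is not well suited to binary data and can likewise produce bounds below 0 or above 1.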
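To see where the t-based bounds come from, the same calculation can be sketched without SciPy. This is an illustrative sketch only: the function name `t_interval_n5` is hypothetical, and the critical value 2.776445 (the 97.5th percentile of Student's t with 4 degrees of freedom) is hard-coded, so it applies only to samples of exactly five values.

```python
import math
import statistics

T_CRIT_DF4 = 2.776445  # 97.5th percentile of Student's t with 4 degrees of freedom

def t_interval_n5(a):
    """95% t interval for the mean of a sample of exactly five values."""
    if len(a) != 5:
        raise ValueError("the hard-coded critical value assumes len(a) == 5")
    mean = statistics.mean(a)
    # Standard error of the mean, using the sample standard deviation (ddof=1).
    sem = statistics.stdev(a) / math.sqrt(len(a))
    margin = T_CRIT_DF4 * sem
    return mean - margin, mean + margin

lb, ub = t_interval_n5([1, 0, 0, 1, 1])
print(lb, ub)
```

The margin here is about 0.68, wider than the normal-approximation margin of about 0.43, because the t critical value is larger than 1.96 and the sample standard deviation is used. That is why this interval extends past both 0 and 1.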

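If your data really are binary, consider an interval based on the binomial distribution, which is the natural model for binary outcomes. Note that scipy.stats.binom.interval returns a range for the count of successes given a known n and p, not a confidence interval for an unknown proportion. For the latter, SciPy 1.7 and later provide scipy.stats.binomtest(k, n).proportion_ci(), which supports the exact (Clopper-Pearson) and Wilson methods. Make sure you pass the number of successes and the number of trials, not the raw list.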
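As one illustration of a binomial-based interval, here is a minimal sketch of the exact (Clopper-Pearson) interval using only the standard library. The function names `binom_cdf`, `bisect_root`, and `clopper_pearson` are hypothetical, and finding the bounds by bisection (rather than through beta-distribution quantiles) is a choice made here to keep the example self-contained.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def bisect_root(f, lo=0.0, hi=1.0, iterations=100):
    """Find p in [lo, hi] with f(p) = 0, assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, confidence=0.95):
    """Exact binomial confidence interval for a proportion (k successes in n trials)."""
    alpha = 1 - confidence
    # Lower bound: the p at which P(X >= k) equals alpha / 2 (0 when k == 0).
    lower = 0.0 if k == 0 else bisect_root(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2 (1 when k == n).
    upper = 1.0 if k == n else bisect_root(
        lambda p: binom_cdf(k, n, p) - alpha / 2)
    return lower, upper

a = [1, 0, 0, 1, 1]
lb, ub = clopper_pearson(sum(a), len(a))
print(lb, ub)
```

Because both bounds are defined as probabilities p that solve an equation on [0, 1], this interval cannot leave that range, at the cost of being fairly wide for small samples.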

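Whichever method you use, the proportion being estimated lies between 0 and 1. If the interval you get falls outside that range, the method is one that does not respect those bounds; either switch to a method that does, or truncate the interval to [0, 1], which is a common workaround.

English: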

I have binary data and I want to calculate a confidence interval for it. Why do I get an upper bound greater than 1?
Here is my code:

import math
def find_CI(a):
    n = len(a)
    p_hat = sum(a)/n
    h = math.sqrt((p_hat * (1- p_hat) /n))
    ub = p_hat + (1.96 * h)
    lb = p_hat - (1.96 * h)
    return lb, ub

When I pass a = [1,0,0,1,1], I get the result (0.17058551491594975, 1.0294144850840503)

I also tried the following code

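import numpy as np
import scipy.stats as st

def find_confidence_interval(a):
    # 95% t interval around the sample mean. The confidence level is passed
    # positionally because the keyword was renamed from `alpha` to
    # `confidence` in newer SciPy releases.
    x = st.t.interval(0.95, df=len(a) - 1,
                      loc=np.mean(a),
                      scale=st.sem(a))
    return x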

I got the result as (-0.08008738065825705, 1.280087380658257)

I am confused. Shouldn't the confidence interval be between 0 and 1?

Answer 1

Score: 0

Using a t-statistic to calculate confidence intervals for binomial data is probably not a good idea because this means you are assuming your data comes from an approximately normal distribution.

See here for details on how to deal more appropriately with confidence intervals for binomial distributions. For example, you could use the Wilson interval if you don't have many data points. For your [1, 1, 1, 0, 0] example, a 95% Wilson interval gives (0.23, 0.88).
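The Wilson figure quoted above can be checked with a short sketch. The function name `wilson_interval` is hypothetical, z = 1.96 is assumed for the 95% level, and this is the plain Wilson score interval without continuity correction.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    # The center is pulled from p_hat toward 0.5, more strongly for small n.
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

lb, ub = wilson_interval(3, 5)
print(round(lb, 2), round(ub, 2))  # 0.23 0.88
```

Unlike the normal-approximation interval, the Wilson interval is not symmetric around p_hat and stays within [0, 1].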
