scipy.stats confidence intervals for the t-distribution differ from the ones calculated by hand

Question

I’m trying to find the 95 percent confidence interval of the mean for the given array. The problem is that whenever I use the interval method from stats.t, it gives me a different result than my hand-calculated confidence interval. Could I be using the interval method incorrectly?

Here is the code I used:

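import numpy as np
from scipy import stats

# find a 95 percent confidence interval for the mean weight of a popcorn bag
data = np.array([91, 101, 98, 98, 103, 97, 102, 105, 94, 90])

sMu = np.mean(data)    # sample mean
sSigma = np.std(data)  # std with ddof=0 (not used below)
sem = stats.sem(data)  # standard error of the mean (ddof=1 by default)
n = len(data)
df = n - 1

dist = stats.t(df)
critical_value = dist.ppf(0.975)

# a frozen distribution's interval() accepts only the confidence level,
# so loc and scale are passed when freezing
print(stats.t(df, loc=sMu, scale=sem).interval(0.975))
print(stats.t.interval(0.975, df=9, loc=sMu, scale=sem))

upper = sMu + (sem * critical_value)
lower = sMu - (sem * critical_value)
print('Manually Calculated: ', upper, ' ', lower)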
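For reference, the hand calculation above is the usual two-sided t-interval for a mean. In the formula below, s is the sample standard deviation with an n - 1 denominator. This matches the default ddof=1 of stats.sem. By contrast, np.std(data), which is used for sSigma, defaults to ddof=0, but sSigma does not enter the interval. The worked numbers come from the question's data and are rounded.

```latex
\bar{x} \pm t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}},
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}

% Worked with the popcorn data (n = 10, df = 9):
\bar{x} = 97.9,\quad
\sum_{i=1}^{10}\left(x_i - \bar{x}\right)^2 = 228.9,\quad
s^2 = 228.9/9 \approx 25.43,\quad
s/\sqrt{10} \approx 1.595

t_{0.975,\,9} \approx 2.262
\;\Rightarrow\;
97.9 \pm 2.262 \times 1.595 \approx 97.9 \pm 3.608 = [94.29,\ 101.51]
```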

Answer 1

Score: 2

The ppf method (percent point function, the inverse of the CDF) of the t-distribution returns the value below which a given fraction of the probability lies, so it is what you use to find a critical value. For a two-sided 95% confidence interval, you need the value that leaves 2.5% of the probability in the upper tail, which is the 0.975 quantile. Because the t-distribution is symmetric, its negative leaves 2.5% in the lower tail, and the middle 95% lies between the two. That is why ppf(0.975) is the right call in your hand calculation.
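You can check the link between ppf and the 95% level numerically. The sketch below assumes df = 9, as in the question. It computes the critical value and then uses the CDF to confirm that the two symmetric cutoffs enclose 95% of the probability.

```python
import numpy as np
from scipy import stats

df = 9             # n - 1 for the 10 bag weights in the question
confidence = 0.95

# Two-sided interval: the leftover 5% is split equally between the tails,
# so the upper cutoff is the 0.975 quantile.
upper_q = 1 - (1 - confidence) / 2
t_crit = stats.t.ppf(upper_q, df)

# By symmetry, the 0.025 quantile is the negative of the 0.975 quantile.
t_low = stats.t.ppf(1 - upper_q, df)

# The CDF difference between the two cutoffs is the central probability mass.
central_mass = stats.t.cdf(t_crit, df) - stats.t.cdf(t_low, df)

print(round(float(t_crit), 3))           # roughly 2.262 for df = 9
print(bool(np.isclose(t_low, -t_crit)))  # symmetric cutoffs
print(round(float(central_mass), 4))     # the 95% in the middle
```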

The interval method works differently. Its first argument is the confidence level itself, i.e. the probability mass of the central interval, and it finds both cutoffs internally at ppf((1 - confidence)/2) and ppf((1 + confidence)/2). Passing 0.975 therefore gives a 97.5% interval, which is wider than your 95% hand-calculated one. To get a 95% confidence interval, replace 0.975 with 0.95 in the two interval calls:

print(stats.t(df, loc=sMu, scale=sem).interval(0.95))  # frozen distribution: loc/scale go into the constructor
print(stats.t.interval(0.95, df=9, loc=sMu, scale=sem))
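To confirm the fix, you can compare interval() against the hand formula at both confidence levels. This sketch reuses the question's data and variable names. manual_interval is a hypothetical helper introduced only for this illustration. The check relies on interval(c) returning the central interval that holds probability c. So 0.95 should reproduce the manual bounds, and 0.975 should give a wider 97.5% interval.

```python
import numpy as np
from scipy import stats

# Same data and variable names as in the question.
data = np.array([91, 101, 98, 98, 103, 97, 102, 105, 94, 90])
sMu = np.mean(data)
sem = stats.sem(data)  # standard error of the mean, ddof=1 by default
df = len(data) - 1


def manual_interval(confidence):
    """Hypothetical helper: hand-calculated two-sided t-interval for the mean."""
    t_crit = stats.t.ppf((1 + confidence) / 2, df)
    return sMu - t_crit * sem, sMu + t_crit * sem


# interval() takes the central probability mass (the confidence level).
ci_95 = stats.t.interval(0.95, df, loc=sMu, scale=sem)
ci_975 = stats.t.interval(0.975, df, loc=sMu, scale=sem)

print(ci_95)
print(np.allclose(ci_95, manual_interval(0.95)))        # agrees with the hand calculation
print(np.allclose(ci_975, manual_interval(0.975)))      # 0.975 means a 97.5% interval
print((ci_975[1] - ci_975[0]) > (ci_95[1] - ci_95[0]))  # ...which is wider
```

With the question's data, the 95% interval comes out at roughly (94.29, 101.51).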

huangapple
  • Posted on August 4, 2023, 00:50:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76830122.html