How can I incorporate error bars into my p-values for linear regression in Python?
Question
I'm interested in statistically validating linear regression problems in Python. Traditionally, these problems can be solved with SciPy's `linregress` function. For example:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import linregress

    x = np.linspace(0, 1, 25)
    y = 0.5*x + np.random.normal(0, 0.15, len(x))  # linear trend plus Gaussian noise
    err = np.random.uniform(0.5, 3.8, len(x))      # error-bar sizes; bounds given as (low, high)
    plt.scatter(x, y)

Then we can use `linregress(x, y)` to compute our p-value. In this case we obtain `pvalue=1.3e-8`, so our fit is significant, which seems reasonable given our plot.

However, the picture changes if we also plot the error bars:

Now, given the size of the errors, the conclusion that the fit is significant seems suspect. Is there a way to incorporate information about the size of the errors into a p-value test in Python?
Answer 1
Score: 1
As far as I know, ordinary linear regression just minimizes the sum of squared residuals about the regression line, so it doesn't take the individual errors of the data points into account.
I think you may be misinterpreting the p-value: even if the errors are absolutely huge, as is the case here, the correlation and slope still appear to be there.
Think of it like this: if the error bars are so huge, isn't it odd that the points trace out such a well-defined ascending line? That's why the p-value is small.
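If the error bars are meant to be known 1-sigma uncertainties on y, one possible way to bring them into the test is a weighted least-squares fit, where the slope's standard error comes from the stated errors and not from the scatter of the points. The following is only a sketch under that assumption (independent Gaussian errors of known size); `weighted_slope_test` is a hypothetical helper name, not a SciPy function.

```python
import math
import numpy as np

def weighted_slope_test(x, y, sigma):
    """Weighted least-squares line fit with known 1-sigma errors on y.

    Returns (slope, intercept, slope_se, p_value), where p_value is the
    two-sided p-value for the null hypothesis slope == 0.
    """
    x, y, sigma = (np.asarray(a, dtype=float) for a in (x, y, sigma))
    w = 1.0 / sigma**2                        # inverse-variance weights
    xbar = np.sum(w * x) / np.sum(w)          # weighted means
    ybar = np.sum(w * y) / np.sum(w)
    sxx = np.sum(w * (x - xbar)**2)
    slope = np.sum(w * (x - xbar) * (y - ybar)) / sxx
    intercept = ybar - slope * xbar
    slope_se = math.sqrt(1.0 / sxx)           # depends only on x and the stated sigmas
    z = slope / slope_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return slope, intercept, slope_se, p_value

# Data generated the same way as in the question (seeded for repeatability).
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 25)
y = 0.5 * x + rng.normal(0, 0.15, len(x))
err = rng.uniform(0.5, 3.8, len(x))

slope, intercept, slope_se, p_value = weighted_slope_test(x, y, err)
print(f"slope = {slope:.3f} +/- {slope_se:.3f}, p = {p_value:.3f}")
```

With error bars this large, the slope's standard error is comparable to or larger than the slope itself, so this test does not reject a zero slope. `scipy.optimize.curve_fit` with `sigma=err` and `absolute_sigma=True` should give the same parameter covariance. Note that if the points scatter far less than the error bars suggest, as they do here, the error bars themselves may be overestimated.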
From the docs:
> pvalue float
>
> The p-value for a hypothesis test whose null hypothesis is that the
> slope is zero, using Wald Test with t-distribution of the test
> statistic.
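To make the quoted definition concrete, here is a small sketch that reproduces the p-value by hand from the slope and its standard error. The data are generated as in the question (seeded here for repeatability); note that no error bars enter the calculation anywhere.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 25)
y = 0.5 * x + rng.normal(0, 0.15, len(x))

res = stats.linregress(x, y)

# Wald statistic: estimated slope divided by its standard error.
t_stat = res.slope / res.stderr
dof = len(x) - 2                               # two fitted parameters: slope and intercept
p_manual = 2 * stats.t.sf(abs(t_stat), dof)    # two-sided tail probability

print(p_manual, res.pvalue)
```

The standard error `res.stderr` is estimated from the residual scatter around the fitted line, which is why tightly aligned points give a small p-value no matter how large the separately quoted error bars are.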
So to me it looks OK. You can also think of the case where your measurement is precise but not accurate, so you may have extremely large shifts along the y-axis (error bars, if you like), as with an uncalibrated instrument that has a linear response. That would still not affect the p-value, and it is a similar case to this one.
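The calibration argument can be checked with a short sketch: adding a constant offset to every y value (the value `offset = 100.0` is an arbitrary choice for illustration) changes the intercept but should leave the slope test untouched.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 25)
y = 0.5 * x + rng.normal(0, 0.15, len(x))

offset = 100.0                                   # hypothetical calibration offset
p_original = stats.linregress(x, y).pvalue
p_shifted = stats.linregress(x, y + offset).pvalue

print(p_original, p_shifted)
```

The two p-values agree up to floating-point rounding, because the test only looks at how y varies with x, not at its absolute level.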