June 29, 2023, 14:28:08

How can I incorporate error bars into my p-values for linear regression in Python?

Question

I'm interested in statistically validating linear regression problems in Python. Traditionally, these problems can be solved with SciPy's linregress function. For example:

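import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import linregress

x = np.linspace(0, 1, 25)
y = 0.5*x + np.random.normal(0, 0.15, len(x))
err = np.random.uniform(0.5, 3.8, len(x))  # per-point error bar sizes, given as (low, high)
plt.scatter(x, y)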

Then we can use linregress(x, y) to compute the p-value. In this case we obtain pvalue=1.3e-8, so the fit is significant, which seems reasonable given the plot.

However, the picture changes if we also plot the error bars:

Now, given the size of the errors, the conclusion that the fit is significant seems suspect. Is there a way to incorporate information about the size of the errors into a p-value test in Python?

Answer 1

Score: 1

As far as I know, ordinary linear regression just minimizes the sum of squared residuals about the regression line, so it doesn't take into account the individual errors of the data points.
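If you do want the error bars to enter the test, one possible approach (a sketch, not a definitive recipe) is a weighted fit. The example below assumes that err holds 1-sigma Gaussian uncertainties on y, fits the line with scipy.optimize.curve_fit using sigma=err and absolute_sigma=True, and then applies a two-sided z-test to the slope. The helper name line, the fixed seed, and the choice of a normal approximation for the test are my own assumptions, not something from the question.

```python
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
x = np.linspace(0, 1, 25)
y = 0.5 * x + rng.normal(0, 0.15, len(x))
err = rng.uniform(0.5, 3.8, len(x))  # assumed 1-sigma uncertainty on each y


def line(x, slope, intercept):
    """Straight-line model (hypothetical helper for this sketch)."""
    return slope * x + intercept


# sigma=err weights each point by 1/err**2; absolute_sigma=True keeps the
# error bars in their stated units instead of rescaling them to the scatter.
popt, pcov = curve_fit(line, x, y, sigma=err, absolute_sigma=True)
slope, slope_err = popt[0], np.sqrt(pcov[0, 0])

# Two-sided z-test of "slope == 0" using the error-bar-driven uncertainty.
z = slope / slope_err
p_weighted = 2 * stats.norm.sf(abs(z))

p_ols = stats.linregress(x, y).pvalue  # ignores err entirely
print(f"weighted slope = {slope:.3f} +/- {slope_err:.3f}, p = {p_weighted:.3g}")
print(f"unweighted linregress p = {p_ols:.3g}")
```

With error bars this large relative to the scatter, the weighted p-value should come out much larger than the linregress one, because the slope uncertainty is now set by err rather than by how tightly the points hug the line.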

I think you may be misinterpreting the p-value. Even if the errors are as huge as they are here, the correlation and the slope still appear to be there.

Think of it this way: if the error bars are so huge, isn't it odd that the points trace out such a well-defined ascending line? That is why the p-value is small.

From the docs:

> pvalue float
>
> The p-value for a hypothesis test whose null hypothesis is that the
> slope is zero, using Wald Test with t-distribution of the test
> statistic.
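
To make the quoted definition concrete, here is a small sketch that recomputes the p-value by hand from the slope and its standard error. It assumes the same kind of synthetic data as in the question (with a fixed seed added for reproducibility), and it uses n - 2 degrees of freedom because two parameters are fitted. Note that no error bars appear anywhere in the calculation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, an assumption for reproducibility
x = np.linspace(0, 1, 25)
y = 0.5 * x + rng.normal(0, 0.15, len(x))

res = stats.linregress(x, y)

# Wald statistic: estimated slope divided by its standard error.
t_stat = res.slope / res.stderr
dof = len(x) - 2  # two fitted parameters: slope and intercept

# Two-sided p-value from the t-distribution.
p_manual = 2 * stats.t.sf(abs(t_stat), dof)
print(res.pvalue, p_manual)
```

The two printed numbers should agree up to floating-point rounding.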

So to me it looks fine. You can also think of the case where a measurement is precise but not accurate: you may have extremely large shifts along the y axis (hence large error bars, if you like), as with an uncalibrated instrument that has a linear response. That would still not affect the p-value, and it is a similar case to this one.
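The uncalibrated-instrument argument can be checked with a short sketch. It assumes synthetic data like that in the question and a hypothetical constant offset of 100 added to every reading; both the offset value and the fixed seed are made up for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, an assumption for reproducibility
x = np.linspace(0, 1, 25)
y = 0.5 * x + rng.normal(0, 0.15, len(x))

offset = 100.0  # hypothetical constant calibration error on every reading
res_true = stats.linregress(x, y)
res_shifted = stats.linregress(x, y + offset)

# The intercept absorbs the offset; slope and p-value are unchanged
# (up to floating-point rounding).
print(res_true.slope, res_shifted.slope)
print(res_true.pvalue, res_shifted.pvalue)
print(res_true.intercept, res_shifted.intercept)
```

Every reading is off by 100, yet the test for a non-zero slope gives the same answer, since it only depends on how the points scatter around the fitted line.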
