How to perform a linear regression with a forced gradient in Python?
Question
I am trying to do a linear regression on some limited and scattered data. I know from theory that the gradient should be 1, but it may have a y-offset. I found a lot of resources on how to force an intercept for linear regression, but never on forcing a gradient. I need the linear regression statistics to be reported and the gradient to be precisely 1.
Would I need to calculate the statistics manually? Or is there a way to do this with a package like "statsmodels," "scipy," or "scikit-learn"? Or do I need to use a Bayesian approach with prior knowledge of the gradient?
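(For reference, one way to keep full regression statistics with the slope fixed at 1 is to note that y = x + c + noise reduces to estimating the constant c from the residuals y - x. A minimal sketch using statsmodels, assuming x and y are the measured data arrays defined in the example below:

import numpy as np
import statsmodels.api as sm

# With the gradient fixed at 1, y = x + c + noise, so the offset c is
# the constant term of a regression of (y - x) on an intercept only.
# x and y are assumed to be the measured data arrays.
model = sm.OLS(y - x, np.ones_like(x)).fit()
print(model.summary())  # reports c, its standard error, t-stat, CI, etc.

This is a sketch, not a tested solution.)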
Here is a graphical example of what I am trying to achieve.
import numpy as np
import matplotlib.pyplot as plt
# Generate random data to illustrate the point
n = 20
x = np.random.uniform(10, 20, n)
y = x - np.random.normal(1, 1, n) # Noise plus an offset of -1 around the 1:1 relationship
plt.scatter(x, y, ec="k", label="Measured data")
true_x = np.array((8, 20))
plt.plot(true_x, true_x, "k--") # 1:1 line
plt.plot(true_x, true_x-1, "r:", label="Forced gradient") # Theoretical line
m, c = np.polyfit(x, y, 1)
plt.plot(true_x, true_x*m + c, "g:", label="Linear regression")
plt.xlabel("Theoretical value")
plt.ylabel("Measured value")
plt.legend()
Answer 1
Score: 2
I suggest using scipy.optimize.curve_fit, which has the benefit of being flexible and easy to use, including for non-linear regressions. You just need to define a function that represents a line with a known gradient and an offset given as input:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, a):
    gradient = 1  # fixed gradient, not optimized
    return gradient * x + a
xdata = np.linspace(0, 4, 50)
y = func(xdata, 2.5)
rng = np.random.default_rng()
y_noise = 0.2 * rng.normal(size=xdata.size)
ydata = y + y_noise
plt.plot(xdata, ydata, 'b-', label='data')
popt, pcov = curve_fit(func, xdata, ydata)
print(popt)  # the fitted offset a
plt.plot(xdata, func(xdata, *popt), 'r-',
label='fit: a=%5.3f' % tuple(popt))
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
That generates a plot of the noisy data together with the fitted line, whose gradient is fixed at 1.
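As a follow-up on the reported statistics: curve_fit also returns the covariance matrix pcov of the fitted parameters, so a one-sigma standard error for the offset can be derived from its diagonal. A minimal sketch continuing from the code above:

# pcov's diagonal holds the variances of the fitted parameters;
# take the square root to get one-sigma standard errors.
perr = np.sqrt(np.diag(pcov))
print("a = %.3f +/- %.3f" % (popt[0], perr[0]))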