How to perform a linear regression with a forced gradient in Python?

Question

I am trying to do a linear regression on some limited and scattered data. I know from theory that the gradient should be 1, but it may have a y-offset. I found a lot of resources on how to force an intercept for linear regression, but never on forcing a gradient. I need the linear regression statistics to be reported and the gradient to be precisely 1.

Would I need to manually calculate the statistics? Or is there a way to use packages like "statsmodels," "scipy," or "scikit-learn"? Or do I need to use a Bayesian approach with prior knowledge of the gradient?

Here is a graphical example of what I am trying to achieve.

import numpy as np
import matplotlib.pyplot as plt

# Generate random data to illustrate the point

n = 20
x = np.random.uniform(10, 20, n)
y = x - np.random.normal(1, 1, n) # Add noise to the 1:1 relationship

plt.scatter(x, y, ec="k", label="Measured data")

true_x = np.array((8, 20))
plt.plot(true_x, true_x, "k--") # 1:1 line
plt.plot(true_x, true_x-1, "r:", label="Forced gradient") # Theoretical line

m, c = np.polyfit(x, y, 1)
plt.plot(true_x, true_x*m + c, "g:", label="Linear regression")

plt.xlabel("Theoretical value")
plt.ylabel("Measured value")
plt.legend()


Answer 1

Score: 2

I suggest using scipy.optimize.curve_fit, which has the benefit of being flexible and easy to use for non-linear regressions as well. You just need to define a function that represents a line with the gradient fixed at the known value and only the offset as a free parameter:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def func(x, a):
    gradient = 1 # fixed gradient, not optimized
    return gradient * x + a

xdata = np.linspace(0, 4, 50)
y = func(xdata, 2.5)
rng = np.random.default_rng()
y_noise = 0.2 * rng.normal(size=xdata.size)
ydata = y + y_noise
plt.plot(xdata, ydata, 'b-', label='data')

popt, pcov = curve_fit(func, xdata, ydata)  # fits only the offset a
print(popt)
plt.plot(xdata, func(xdata, *popt), 'r-',
         label='fit: a=%5.3f' % tuple(popt))

plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()

That generates a plot of the noisy data together with the fitted line, with the gradient fixed at 1 and the fitted offset a shown in the legend.

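The question also asks for the regression statistics to be reported. One way to get at least an uncertainty on the fitted offset from this approach is pcov, the parameter covariance matrix returned by curve_fit. Below is a minimal sketch continuing from the script above; the 95% interval uses a normal approximation, which is an added assumption rather than part of the original answer:

# Continuing from popt, pcov returned by curve_fit above.
a_hat = popt[0]                       # fitted offset
a_err = np.sqrt(np.diag(pcov))[0]     # one-standard-deviation error on a
print("offset a = %.3f +/- %.3f" % (a_hat, a_err))

# Rough 95% confidence interval, assuming approximately normal errors.
print("95%% CI: [%.3f, %.3f]" % (a_hat - 1.96 * a_err, a_hat + 1.96 * a_err))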

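Since the question also mentions statsmodels: with the gradient fixed at 1, the model y = x + c reduces to estimating a constant, so another option (not part of the answer above, just a sketch of the idea) is to regress y - x on an intercept only. statsmodels then reports the usual regression statistics for the offset:

import numpy as np
import statsmodels.api as sm

# Same kind of synthetic data as in the question.
n = 20
x = np.random.uniform(10, 20, n)
y = x - np.random.normal(1, 1, n)

# With the slope fixed at 1, only the offset c in y = 1*x + c is unknown,
# i.e. c is the mean of the residual y - x.
X = np.ones_like(x)                   # intercept-only design matrix
result = sm.OLS(y - x, X).fit()

print(result.summary())               # offset estimate, std err, t, p, CI
print("offset:", result.params[0])
print("95% CI:", result.conf_int()[0])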