英文:
scipy curve_fit does not find the best fit
问题
我想找出我的时间序列中是否存在峰值。为此,我尝试将数据点拟合成高斯曲线。对于数千个样本,这个方法效果很好。但是,尽管存在明显的峰值,有些样本拟合得不够好。这是代码部分:
def gauss(x, a, x0, sigma):
return a * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))
param, covariance = curve_fit(gauss, x, y)
我注意到y值的幅度对拟合结果有影响,所以我将数据重新缩放到<0, 100>的区间。这有所帮助,但并没有解决所有情况。有没有其他方法可以改善拟合?不同的初始化?更小的优化步长?
以下是有关数据的一些信息:
- 每个样本有3-20个数据点。
- 峰值(如果有的话)必须在这个范围内具有最高点。
- x轴范围从0到20
- y轴范围从0到100
我浏览了stackoverflow上的其他类似问题,但没有找到解决方案。
如果有人知道更好的方法来确定时间序列中是否存在峰值,我会很感激在评论中听到。无论如何,我想知道为什么有些曲线拟合得不好。
英文:
I'd like to find out whether there is a peak in my time series or not. To do so, I'm trying to fit the data points to a gauss curve. It works well for thousands of my samples:
but few are not fitter properly despite there is an obvious peak:
(see the very low peak with the highest point around 0.03)
Here's the code:
def gauss(x, a, x0, sigma):
return a * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))
param, covariance = curve_fit(gauss, x, y)
I've noticed that the magnitude of the y values plays a role in the fitting so I've rescaled the data into <0, 100> interval. This helped but did not solve all the cases. Is there anything else I can do to improve the fitting? Different initialization? Smaller optimization step?
Here are some facts about the data:
- Every sample has 3-20 data points.
- The peak (if there is any) must have its highest point inside the span.
- x axis is from 0 to 20
- y axis is from 0 to 100
I've browse through other similar questions at stackoverflow but did not find a solution to my problem.
If anybody knows a better solution to determine whether there is a peak in the time series or not, I'd appreciate to hear it in the comments. Anyway, I'd like to know why some curves are not fitted properly.
答案1
得分: 0
非线性回归涉及从参数的初始“猜测”值开始的迭代计算。
可能的问题是当“猜测”值与未知的正确值相距太远时会出现困难。
为了避免这种困难,可以使用一种不需要初始值的非迭代方法。
这种非传统的方法如下所示:
数值示例:
一种方法的一般原理在这里解释:https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales。与拟合模型方程不同,它将积分方程与模型方程的解关联起来。这需要初步的数值积分(用于Sk和Tk)。如果点数不足,数值积分将不够准确,最终拟合结果会很差。这就是为什么我没有尝试拟合第一个OP的示例的原因:显然6个点的数据不足。在这种情况下,不建议使用带有积分方程的非迭代回归方法。
注意:点的坐标是根据第二个OP的图测量的,这不是准确的测量。
我建议OP尝试将上述结果作为参数的初始值,以启动他的软件的迭代过程,看看计算困难是否消失。要注意使用不同的符号:他的“a”=我的“b”;他的“x0”=我的“c”;他的“sigma”=我的“1/sqrt(2p)”。
英文:
The nonlinear regression involves iterative calculus starting from initial "guessed" values of the parameters.
Probably the trouble appears when the "guessed" values are too far from the unknown correct values.
In order to avoid the difficulty one can use a non-iterative method which doesn't requires initial values.
This non-conventional method is shown below :
Numerical example :
The general principle of this method is explained in : https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales . Instead of fitting the model equation one fit an integral equation to which the model equation is solution. This requires preliminary numerical integrations (for Sk and Tk). If they are not enough points the numerical integrations are not accurate enough and the final fitting is bad. That is why I didn't try to fit the first OP's example : A data of 6 points is obviously not enough. The non-iterative method of regression with integral equation is not recommended in this case.
Note : The coordinates of the points were measured on the second OP's graph which is not accurate measurement.
I suggest to the OP to try the above resuts as initial values of the parameters to start the iterative process of his software and see if the trouble of calculus disappears. Taking care of the different symbols used : his "a"= my "b" ; his "x0"= my "c" ; his "sigma"= my "1/sqrt(2p)".
答案2
得分: 0
curve_fit
方法提供了定义初始函数参数的选项。选择钟形曲线的初始高度和位置,使其与系列中的“最高”点的y和x坐标非常匹配,在我的数据上效果非常好。
此外,curve_fit还允许设置函数参数的边界,以确保钟形曲线的高度不会低于(或高于)某个特定水平。
以下代码适用于我所有的数据样本:
init = [max(y), np.argmax(y), 1.5]
bounds = ([50, len(x) * 0.05, -20], [150, (len(x) - 1) * 0.95, len(x) / 2])
param, covariance = curve_fit(gauss, x, y, p0=init, bounds=bounds)
如果参数超出了边界,我会捕获异常(ValueError),并假定时间序列中没有峰值。
英文:
The curve_fit
method provides an option to define the initial function parameters. Choosing the initial height and position of the bell curve to the y and x coordinates of the "highest" point in the series works very well on my data.
Additionally, curve_fit also allows to set bounds of the function parameters to assure the height of the bell curve does not go below (or above) certain level.
Following code works for all of my data samples:
init = [max(y), np.argmax(y), 1.5]
bounds = ([50, len(x) * 0.05, -20], [150, (len(x) - 1) * 0.95, len(x) / 2])
param, covariance = curve_fit(gauss, x, y, p0=init, bounds=bounds)
If the parameters are outside the bounds, I catch the exception (ValueError) and assume there is no peak in the time series.
答案3
得分: 0
多项式特征将返回每个自由度的曲线集。我展示了一个3阶多项式与其两个曲线拟合。
from sklearn.preprocessing import PolynomialFeatures
def gaussian(x, mu, sig):
return np.exp(-np.power(x - mu, 2.) / (2 * np.power(sig, 2.)))
n = 500
x = np.linspace(0, 1, n)
mu = x.mean()
sig = x.std()
y = gaussian(x, mu, sig)
poly = PolynomialFeatures(degree=3)
curve_y = poly.fit_transform(np.array(y).reshape(-1,1))
print(curve_y)
# three degrees of free
degree_of_freedom = 1
plt.plot(x, curve_y[:, degree_of_freedom], '.', c="red")
degree_of_freedom = 2
plt.plot(x, curve_y[:, degree_of_freedom], '.', c="red")
plt.plot(x, y, '.', alpha=0.2, c="blue")
plt.show()
print("you can see degree of 1 is the best")
英文:
The Polynomial Feature will return a curve set for each degree of freedom. I demonstrated a 3rd order polynomial with two of its curve fits.
from sklearn.preprocessing import PolynomialFeatures
def gaussian(x, mu, sig):
return np.exp(-np.power(x - mu, 2.) / (2 * np.power(sig, 2.)))
n=500
x = np.linspace(0, 1, n)
mu=x.mean()
sig=x.std()
y=gaussian(x,mu,sig)
poly = PolynomialFeatures(degree=3)
curve_y = poly.fit_transform(np.array(y).reshape(-1,1))
print(curve_y)
#three degrees of free
degree_of_freedom=1
plt.plot(x, curve_y[:,degree_of_freedom], '.',c="red")
degree_of_freedom=2
plt.plot(x, curve_y[:,degree_of_freedom], '.',c="red")
plt.plot(x, y, '.',alpha=0.2, c="blue")
plt.show()
print("you can see degree of 1 is the best")
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论