Python fit and chi square comparison

Question
I want to know which function fits my data better by calculating a chi-square value for each fit and then comparing them.
The data I have could look either like a Gaussian or roughly like a horizontal line, so my goal is to fit my data to both functions each time (using scipy.optimize.curve_fit) and then calculate the chi-square value (using scipy.stats). But can I directly compare both values and take the 'lowest' one? Both fits use the same number of data points but a different number of parameters (hence not the same degrees of freedom): 4 parameters for my Gaussian and 1 for my horizontal line.
import numpy as np

# Define the horizontal-line function
def horiz(x, c):
    return np.full_like(x, c)

# Define the Gaussian function
def gauss(x, H, A, x0, sigma):
    return H + A * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))
Example of code below (moy, x, Smooth_data and Counts_list come from my data):

from scipy.optimize import curve_fit
from scipy import stats

p0_c = [moy]  # initial guess for the constant level
popt_c, pcov = curve_fit(horiz, x, Smooth_data, p0=p0_c)
fit_c = horiz(x, *popt_c)
chi_c, p_c = stats.chisquare(Counts_list[i], fit_c)
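For context, here is a minimal, self-contained sketch of the comparison I have in mind. The synthetic data, the assumed per-point uncertainty sigma_y, and the reduced chi-square comparison are illustrative stand-ins, not my real pipeline; dividing by the degrees of freedom is one conventional way to account for the different parameter counts:

import numpy as np
from scipy.optimize import curve_fit

def horiz(x, c):
    return np.full_like(x, c)

def gauss(x, H, A, x0, sigma):
    return H + A * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))

# synthetic stand-in for the real data: flat background plus a small bump
rng = np.random.default_rng(0)
x = np.linspace(-10.0, 10.0, 200)
y = 5.0 + 2.0 * np.exp(-(x - 1.0) ** 2 / 2.0) + rng.normal(0.0, 0.5, x.size)

# fit both models
popt_c, _ = curve_fit(horiz, x, y, p0=[y.mean()])
popt_g, _ = curve_fit(gauss, x, y, p0=[y.mean(), 1.0, 0.0, 1.0])

# chi-square against an assumed per-point uncertainty
sigma_y = 0.5
chi2_c = np.sum(((y - horiz(x, *popt_c)) / sigma_y) ** 2)
chi2_g = np.sum(((y - gauss(x, *popt_g)) / sigma_y) ** 2)

# reduced chi-square: divide by (number of points - number of parameters)
red_c = chi2_c / (x.size - 1)  # horizontal line: 1 parameter
red_g = chi2_g / (x.size - 4)  # Gaussian: 4 parameters
print(red_c, red_g)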
Answer 1
Score: 0
You could use something like the Bayesian Information Criterion (BIC) for model selection, which accounts for the different number of parameters in each model by penalizing models with more parameters.
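Concretely, for a model with k free parameters fitted to n data points with maximized likelihood L, the standard definition (written here in plain text) is:

BIC = k * ln(n) - 2 * ln(L)

A lower BIC indicates the preferred model, and the k * ln(n) term is the penalty that grows with the number of parameters.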
Mocking up a version of your example, this could be done with the following steps.
- Set up model functions and generate some simulated data containing a Gaussian "bump":

import numpy as np
from matplotlib import pyplot as plt
from scipy.optimize import curve_fit

def gauss(x, H, A, x0, sigma):
    return H + A * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))

def horiz(x, c):
    return np.full_like(x, c)

# generate some data containing a Gaussian bump
n = 10_000  # number of data points
x = np.linspace(0, 1_000_000, n, dtype=float)

# random Gaussian noise
y = np.random.randn(n) * 5e-8

# Gaussian bump parameters
sigma = 30_000
x0 = 400_000
H = 0
A = 4.5e-7

y += gauss(x, H, A, x0, sigma)
- Use curve_fit to get a best-fit, maximum-likelihood model (I've had to scale the parameters to get the fitting to work reasonably):

# scale factors
scaleamp = 1e8
scalerange = 1e-5

# initial parameter guesses
iniguess = (0.1e-7 * scaleamp, 1e-7 * scaleamp, 350000.0 * scalerange, 10000.0 * scalerange)

# fit Gaussian bump model
gaussfit = curve_fit(gauss, x * scalerange, y * scaleamp, p0=iniguess)

# fit horizontal line model
linefit = curve_fit(horiz, x, y * scaleamp, p0=(0.0,))
- Generate the best-fit models and calculate the (log-)likelihood for each model given the data. For the likelihood function, because we don't have an estimate of the uncertainty in the data, we can use the Student's t-like distribution defined in Eqn. 3.38 of Sivia (pg. 53); a plain-text form of that expression is given after these steps:

Hf, Af, x0f, sigmaf = gaussfit[0]

# rescale best fit parameters to generate the Gaussian model
bestgaussfit = gauss(x, Hf / scaleamp, Af / scaleamp, x0f / scalerange, sigmaf / scalerange)

# get log likelihood
logLgauss = -((n - 1) / 2) * np.log(np.sum((y - bestgaussfit)**2))

# generate the best fit horizontal line model
bestlinefit = horiz(x, linefit[0][0] / scaleamp)

# get log likelihood
logLhoriz = -((n - 1) / 2) * np.log(np.sum((y - bestlinefit)**2))
- Generate the BIC for both models:

ngaussparams = 4  # number of Gaussian model parameters
bicgauss = ngaussparams * np.log(n) - 2 * logLgauss

nlineparams = 1  # number of parameters for the flat line
bicline = nlineparams * np.log(n) - 2 * logLhoriz
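For reference, the log-likelihood used in step 3 comes from marginalizing the unknown noise level out of a Gaussian likelihood, which (up to an additive constant, and paraphrasing the Sivia result rather than quoting Eqn. 3.38 exactly) gives, in plain-text form:

log L = -((N - 1) / 2) * log( sum_i (y_i - m_i)^2 )

where N is the number of data points and m_i is the model prediction for the i-th point; this is exactly the quantity computed as logLgauss and logLhoriz above.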
With the BIC, the model with the lower value is the preferred one. In this case the BIC for the Gaussian model is much lower (by about 16000 for the data I generated).
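A short way to wrap up the comparison (the print labels and the delta variable are just illustrative; a BIC difference of more than about 10 is usually taken as strong evidence for the lower-BIC model):

delta = bicline - bicgauss
print(f"BIC (Gaussian):  {bicgauss:.1f}")
print(f"BIC (flat line): {bicline:.1f}")
print(f"Preferred model: {'Gaussian' if delta > 0 else 'flat line'} (delta BIC = {abs(delta):.1f})")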