英文:
Generate several smooth curves passing through a given set of points
问题
我正在尝试生成大约5000条不同的曲线,这些曲线通过一组预定的点,代表地球内部的层界面。这些曲线必须确保在附近的点之间不出现明显的振荡或突变,因为地球的内部结构不会发生如此剧烈的变化。所有曲线都必须通过给定的点。
目前,我的工作流程涉及在提供的数据点之间随机生成点,并尝试拟合曲线到这些点,从而产生各种曲线。然而,我对探索更高效的方法以实现这一目标很感兴趣。我还在寻找插值方法的建议,这些方法可以在点之间产生平滑的行为并避免振荡倾向。
感谢任何指导!
import numpy as np
from scipy import interpolate
x = [7.81, 21.65, 186.29, 246.62, 402.74, 446.24, 572.99, 585.11, 613.57, 762.97]
y = [26.27, 26.41, 26.46, 27.36, 34.32, 34.50, 37.94, 39.05, 39.23, 39.88]
# y 只能在 25 到 40 之间
fig = plt.figure(figsize=(20, 5))
xnew = np.linspace(min(x), max(x), num=50)
oned = interpolate.CubicSpline(x, y)
yfit = oned(xnew)
plt.scatter(x, y, color='red')
plt.plot(xnew, yfit)
plt.xlim(0, 800)
plt.ylim(20, 45)
plt.gca().invert_yaxis()
plt.show()
一个示例曲线:
英文:
I am trying to generate approximately 5000 distinct curves that pass through a predetermined set of points, representing layer interfaces within the earth. It is essential that these curves do not exhibit significant oscillations or abrupt changes between nearby points, as the internal structure of the Earth does not undergo such drastic changes. All the curves must pass through the given set of points.
At present, my workflow involves randomly generating points between the provided data points and attempting to fit curves to them, resulting in various curves. However, I am interested in exploring more efficient approaches to achieve this. I am also seeking recommendations for interpolation methods that can produce smooth behavior between points and avoid oscillatory tendencies.
Appreciate any guidance!
import numpy as np
from scipy import interpolate
x = [7.81, 21.65, 186.29, 246.62, 402.74, 446.24, 572.99, 585.11, 613.57, 762.97]
y = [26.27, 26.41, 26.46, 27.36, 34.32, 34.50, 37.94, 39.05, 39.23, 39.88]
# y can only range between 25 and 40
fig = plt.figure(figsize=(20,5))
xnew = np.linspace(min(x), max(x), num=50)
oned = interpolate.CubicSpline(x, y)
yfit = oned(xnew)
plt.scatter(x, y, color='red')
plt.plot(xnew, yfit)
plt.xlim(0, 800)
plt.ylim(20,45)
plt.gca().invert_yaxis()
plt.show()
Example of one such curve:
答案1
得分: 1
你试图实现的目标让我想起了一个无噪声高斯过程(GP)回归的教科书示例。GP是一个先进的机器学习主题,对于你的使用情况可能非常有用。以下是一个极好的交互工具,用于理解GP。
如图所示:
它们允许我们通过整合先前的知识来对数据进行预测。它们最明显的应用领域是将一个函数拟合到数据中。这被称为回归,在机器人技术或时间序列预测等领域中使用。但高斯过程不仅限于回归 - 它们也可以扩展到分类和聚类任务。对于给定的一组训练点,可能存在无限多个适合数据的函数。高斯过程通过为每个函数分配一个概率来提供了对这个问题的优雅解决方案。这个概率分布的均值则表示了数据最可能的特征描述。此外,使用概率方法允许我们将预测的置信度纳入到回归结果中。
让我们看看如何使用scikit-learn
库来使用GP回归来实现你想要的结果。
以下代码由三个函数组成:
train_gaussian_process
: 使用你的观察值x
和y
来训练GP。generate_samples_from_gp
: 从我们刚刚训练的GP中采样5个函数(在你的情况下将是5000个)。函数值存储在一个2D的numpy数组中,其中第i列代表第i个函数。draw_samples
: 绘制观察值和采样的函数。
GP的一个关键组成部分是协方差函数。你要求平滑的函数并且想要避免振荡行为,所以Matern内核的RBF内核可能是正确的选择。以下代码使用了Matern内核。可以尝试调整它们的参数以获得你想要的结果。
import numpy as np
from matplotlib import pyplot as plt
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
rng = np.random.RandomState(42)
# ...(以下为函数的实现部分)
正如你可以从下面的图像中看到的,所有的函数都会通过你的观察点。
你还可以通过输入以下代码获取均值函数:
mean_prediction = gp_regressor.predict(additional_x)
这是带有5000个函数的结果图!
上面的代码是通用的,因此可以用于n维的情况。例如,下面是一个3D情况的5个样本(我假设数据生成过程是 x + ln(y)
,但你可以选择任何你想要的公式,我需要这个方程来获得训练步骤中的 z
值)。
英文:
What you are trying to achieve reminded me of a textbook example of a noise-free Gaussian Process (GP) Regression. GP is an advance machine learning topic that could be very useful for your use case. Here is a terrific interactive tool for understanding GPs.
As explained there
> They allow us to make predictions about our data by incorporating prior knowledge. Their most obvious area of application is fitting a function to the data. This is called regression and is used, for example, in robotics or time series forecasting. But Gaussian processes are not limited to regression — they can also be extended to classification and clustering tasks. For a given set of training points, there are potentially infinitely many functions that fit the data. Gaussian processes offer an elegant solution to this problem by assigning a probability to each of these functions. The mean of this probability distribution then represents the most probable characterization of the data. Furthermore, using a probabilistic approach allows us to incorporate the confidence of the prediction into the regression result.
Let's see how you can use GP regression to accomplish what you want using the scikit-learn
library.
The code below is made of three functions:
train_gaussian_process
: your observationsx
andy
are used to train the GP.generate_samples_from_gp
: from the GP we've just trained, we sample 5 functions (in your case it will be 5000). The function values are stored in a 2D numpy array, where the i-th column represents the i-th function.draw_samples
: draw the observations and the sampled functions.
A critical component of a GP is the covariance function. You asked for smooth functions and you want to avoid oscillatory behavior, so the RBF kernel of the Matern one could be the right choice. The code below uses the Matern kernel. Play around with their parameters to get what you want.
import numpy as np
from matplotlib import pyplot as plt
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
rng = np.random.RandomState(42)
def main() -> None:
x = [7.81, 21.65, 186.29, 246.62, 402.74, 446.24, 572.99, 585.11, 613.57, 762.97]
y = [26.27, 26.41, 26.46, 27.36, 34.32, 34.50, 37.94, 39.05, 39.23, 39.88]
x = np.array(x).reshape((-1, 1))
y = np.array(y).reshape(-1, 1)
z = data_generating_process(x, y)
sample_size = 5 # replace with 5_000
gp_regressor = train_gaussian_process(x, y)
gp_regressor_3d = train_gaussian_process(np.column_stack((x, y)), z)
offset = 5.0
additional_x = np.linspace(min(x) - offset, max(x) + offset, 1000).reshape((-1, 1))
x_ = np.linspace(min(x) - offset, max(x) + offset, 50)
y_ = np.linspace(min(y) - offset, max(y) + offset, 50)
additional_xy = np.array([[_x, _y] for _x in x_ for _y in y_]).squeeze(-1)
samples = generate_samples_from_gp(gp_regressor, additional_x, sample_size)
samples_3d = generate_samples_from_gp(gp_regressor_3d, additional_xy, sample_size)
draw_samples(x, y, samples, additional_x)
draw_samples_3d(x, y, z, samples_3d, additional_xy)
def data_generating_process(x: np.ndarray, y: np.ndarray) -> np.ndarray:
"""
Generate points from a test function.
Parameters
----------
x : np.ndarray
y : np.ndarray
Returns
-------
np.ndarray
"""
return y + np.log(x)
def train_gaussian_process(x: np.ndarray, y: np.ndarray) -> GaussianProcessRegressor:
"""
Train Gaussian Process.
Parameters
----------
x : np.ndarray
y : np.ndarray
Returns
-------
GaussianProcessRegressor
"""
kernel = Matern(length_scale=4.0, nu=2.5)
gaussian_process = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=9)
gaussian_process.fit(x, y)
return gaussian_process
def generate_samples_from_gp(
gaussian_process: GaussianProcessRegressor, points: np.ndarray, sample_size: int
) -> np.ndarray:
"""
Sample functions from Gaussian Process.
Parameters
----------
gaussian_process : np.ndarray
points : np.ndarray
sample_size : int
Returns
-------
np.ndarray
"""
return gaussian_process.sample_y(points, n_samples=sample_size)
def draw_samples(
x: np.ndarray, y: np.ndarray, samples: np.ndarray, points: np.ndarray
) -> None:
"""
Draw samples.
Parameters
----------
x : np.ndarray
y : np.ndarray
samples : np.ndarray
points : np.ndarray
"""
plt.figure(figsize=(14, 3))
plt.scatter(x, y, label="Observations")
for i in range(samples.shape[1]):
plt.plot(points, samples[:, i], linewidth=1.2)
plt.legend()
plt.show()
def draw_samples_3d(
x: np.ndarray, y: np.ndarray, z: np.ndarray, samples: np.ndarray, points: np.ndarray
) -> None:
"""
Draw samples.
Parameters
----------
x : np.ndarray
y : np.ndarray
z : np.ndarray
samples : np.ndarray
points : np.ndarray
"""
fig = plt.figure(figsize=(17, 4))
n = int(np.sqrt(points.shape[0]))
x_grid = points[:, 0].reshape((n, n))
y_grid = points[:, 1].reshape((n, n))
for i in range(samples.shape[1]):
axis = fig.add_subplot(1, 5, i + 1, projection="3d")
axis.scatter(x, y, z, label="Observations", s=100, alpha=1)
z_grid = samples[:, i].reshape((n, n))
axis.plot_surface(x_grid, y_grid, z_grid, cmap=plt.cm.plasma)
axis.set_xlabel("x-axis")
axis.set_ylabel("y-axis")
axis.set_zlabel("z-axis")
axis.set_title(f"Sample {i + 1}")
plt.legend()
plt.tight_layout()
plt.show()
if __name__ == "__main__":
main()
As you can see from the image below, all the functions will pass through your observations.
You can also get the mean function by typing:
mean_prediction = gp_regressor.predict(additional_x)
Here is the resulting plot with 5000 functions!
The code above is general so it's ready to be used for a n-dimensional use case. For instance, below 5 sample for a 3D case (I have suppose that the data generating process is x + ln(y)
, but you can choose whatever you want, I needed this equation to get the z
values for the training step).
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论