Jupyter notebook taking infinite time to train over an SVM kernel

Question

I was trying to write a kernel function by hand and use it for classification with a Support Vector Machine. I have my dataset as X and labels as y. I simply defined a kernel function and used it to fit the training dataset, but it takes a seemingly infinite amount of time to return any result.

Can you give me any leads?

I also have the following questions:

  1. Could it possibly be because I have a sparse dataset? If so, how can I deal with sparse data?
  2. While trying a new kernel, why is it always showing "X.shape[0] should be equal to X.shape[1]"? (In my problem, I took the training data to be a square matrix so that this error would not appear.) I tried reading some documentation, but I think I am missing something. Can you point me to an article on this?

I tried the following code:

# Assumed imports (not shown in the original snippet)
from numpy import sqrt, exp
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Custom kernel (note: it ignores x and transforms z element-wise)
def my_kernel(x, z):
    return sqrt(exp(exp(z)))

clf = SVC(kernel=my_kernel)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
from sklearn import metrics

# Model Accuracy: how often is the classifier correct?
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))

This simple code just runs indefinitely without producing any result.
Also, if I take X_train such that the number of rows in X_train is not equal to the number of columns in X_train, I get the following error:

ValueError                                Traceback (most recent call last)
Cell In[4], line 26
     23     return sqrt(exp(exp(z)))
     25 clf = SVC(kernel=my_kernel)
---> 26 clf.fit(X_train, y_train)
     27 y_pred = clf.predict(X_test)
     28 from sklearn import metrics

File ~\anaconda3\lib\site-packages\sklearn\svm\_base.py:252, in BaseLibSVM.fit(self, X, y, sample_weight)
    249     print("[LibSVM]", end="")
    251 seed = rnd.randint(np.iinfo("i").max)
--> 252 fit(X, y, sample_weight, solver_type, kernel, random_seed=seed)
    253 # see comment on the other call to np.iinfo in this file
    255 self.shape_fit_ = X.shape if hasattr(X, "shape") else (n_samples,)

File ~\anaconda3\lib\site-packages\sklearn\svm\_base.py:315, in BaseLibSVM._dense_fit(self, X, y, sample_weight, solver_type, kernel, random_seed)
    312     X = self._compute_kernel(X)
    314     if X.shape[0] != X.shape[1]:
--> 315         raise ValueError("X.shape[0] should be equal to X.shape[1]")
    317 libsvm.set_verbosity_wrap(self.verbose)
    319 # we don't pass **self.get_params() to allow subclasses to
    320 # add other parameters to __init__

ValueError: X.shape[0] should be equal to X.shape[1]

Any help will be appreciated.

Answer 1

Score: 1

The kernel function is given two data matrices: n_samples_1 x n_features and n_samples_2 x n_features (at train time I think they're the same). Your function should return a matrix of size n_samples_1 x n_samples_2. In other words, your kernel function needs to take each sample in matrix 1 and compute the kernel between that sample and each sample in matrix 2, then do the same for the next sample in matrix 1, and so on. You end up with a matrix of size n_samples_1 x n_samples_2, where each entry is the value of the kernel function between two samples.
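
To make that contract concrete, here is a minimal sketch (an illustration, not part of the original answer) that builds the Gram matrix with explicit loops around an arbitrary pairwise function k; SVC accepts any callable that returns a matrix of this shape:

import numpy as np

def gram_matrix(X, Y, k):
    # Entry (i, j) is k(X[i], Y[j]); the result is n_samples_1 x n_samples_2,
    # which is the shape SVC expects from a custom kernel callable.
    K = np.empty((X.shape[0], Y.shape[0]))
    for i, x_i in enumerate(X):
        for j, y_j in enumerate(Y):
            K[i, j] = k(x_i, y_j)
    return K

# Example pairwise function: the fixed kernel below, applied one pair at a time
pair_fn = lambda x, z: np.sqrt(np.exp(np.exp(x @ z)))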

Your current kernel function returns a matrix of size n_samples_2 x n_features. If you instead run your kernel on the dot product of your features: kernel_ij = sqrt(exp(exp( feat0_i*feat0_j + feat1_i*feat1_j + ... ))), it will give you a result matrix that has the right dimensions n_samples_1 x n_samples_2:

def my_kernel(X, Y):
    return np.sqrt(np.exp(np.exp(X @ Y.T)))
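
As a quick sanity check (hypothetical arrays, not from the original answer): the old kernel returned an n_samples x n_features matrix, which is square only if the training set happens to have as many rows as columns, while the fixed kernel returns n_samples_1 x n_samples_2, so the square-matrix check in fit passes for any training set:

import numpy as np

A = np.random.randn(5, 3)  # 5 samples, 3 features
B = np.random.randn(7, 3)  # 7 samples, 3 features
print(my_kernel(A, B).shape)  # (5, 7) = n_samples_1 x n_samples_2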

The code example below does this and plots the decision boundary.

[Figure: samples shown as squares over the predicted class regions, with decision function contour lines]

Viewed in 3D:

[Figure: 3D surface of the decision function with the samples overlaid]

from sklearn.svm import SVC
import numpy as np
import matplotlib
import matplotlib.pyplot as plt

np.random.seed(0)

#Synthetic data
n_pts = 200
y = np.hstack([np.ones(n_pts // 2), np.zeros(n_pts // 2)])
X = np.hstack([np.sin(np.linspace(0, 2 * np.pi, n_pts)).reshape(-1, 1),
               np.cos(np.linspace(0, 2 * np.pi, n_pts)).reshape(-1, 1)]) +\
                   np.random.randn(n_pts, 2) / 10

#Define a kernel and fit classifier
def my_kernel(X, Y):
    return np.sqrt(np.exp(np.exp(X @ Y.T)))

clf = SVC(kernel=my_kernel)
clf.fit(X, y)

print('Classifier train score is:', clf.score(X, y))

#
#Plots
#Show the original data, and the decision boundaries
#
f, ax = plt.subplots(figsize=(5, 5))
ax.scatter(X[:, 0], X[:, 1], c=y, zorder=2, cmap='bwr', marker='s', s=40)
ax.set_xlabel('feature 0')
ax.set_ylabel('feature 1')

xx, yy = np.meshgrid(
    np.linspace(X[:, 0].min(), X[:, 0].max()),
    np.linspace(X[:, 1].min(), X[:, 1].max())
)
feat_space = np.stack([xx.ravel(), yy.ravel()], axis=1)

predictions = clf.predict(feat_space).reshape(xx.shape)
decision_vals = clf.decision_function(feat_space).reshape(xx.shape)

#Floodfill with predicted class
cont_fill = ax.contourf(xx, yy, predictions, alpha=0.3, cmap='bwr')

#Overlay light-source shading - the shading helps bring out the topography
# shaded = matplotlib.colors.LightSource().shade_rgb(matplotlib.cm.bwr(predictions),
#                                                    decision_vals,
#                                                    vert_exag=3,
#                                                    fraction=1)
# ax.imshow(shaded, alpha=0.5, extent=ax.axes.axis())

#Contours highlighting the decision terrain
cont_lines = ax.contour(xx, yy, decision_vals, levels=10, cmap='cool', alpha=0.8)
ax.clabel(cont_lines, inline=True, fontsize=10, colors='k')

ax.set_title('Samples (squares) overlaid onto region\n'
             'predictions (background), with decision\nfunction contours')

For the 3D plot:

#3D
decision_vals_at_samples = clf.decision_function(X)

ax3d = plt.figure(figsize=(5, 5)).add_subplot(projection='3d')
ax3d.view_init(elev=15, azim=120, roll=0)
ax3d.scatter3D(X[:, 0], X[:, 1], decision_vals_at_samples, c=y, cmap='bwr', alpha=0.7)
surf = ax3d.plot_surface(xx, yy, decision_vals,
                         linewidth=0.1, rcount=30, ccount=30,
                         facecolors=matplotlib.cm.bwr(decision_vals), facecolor=[0,0,0,0])
# ax3d.plot_wireframe(xx, yy, decision_vals, cmap='bwr', linewidth=0.2)
ax3d.set_xlabel('feature 0')
ax3d.set_ylabel('feature 1')
ax3d.set_zlabel('decision function')
ax3d.set_title('Samples overlaid onto decision function (mesh)\n'
               'Mesh coloured by class prediction')

Regarding your question in the comments about operating on a single variable, you can do it as follows.

def my_kernel_absX(X, Y):
    select_feature = 0  # apply the function to the first feature
    kernel_x = np.abs(X[:, select_feature]).reshape(-1, 1)
    kernel_rpt_for_y = np.tile(kernel_x, [1, len(Y)])
    return kernel_rpt_for_y

def my_kernel_absY(X, Y):
    select_feature = 0  # apply the function to the first feature
    kernel_y = np.abs(Y[:, select_feature]).reshape(1, -1)
    kernel_rpt_for_x = np.tile(kernel_y, [len(X), 1])
    return kernel_rpt_for_x

A kernel function is usually defined for a pair of variables, so I don't think the functions above would count as kernel functions.
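
If you do want the kernel to depend on a single feature while remaining a valid (positive semidefinite) kernel, one standard construction, shown here as a sketch rather than something from the original answer, is the product form k(x, z) = φ(x)·φ(z) for a scalar feature map φ:

import numpy as np

def single_feature_kernel(X, Y):
    # k(x, z) = |x[0]| * |z[0]| = phi(x) * phi(z), a rank-one PSD kernel
    fx = np.abs(X[:, 0]).reshape(-1, 1)  # n_samples_1 x 1 column
    fy = np.abs(Y[:, 0]).reshape(1, -1)  # 1 x n_samples_2 row
    return fx @ fy                       # n_samples_1 x n_samples_2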
