July 3, 2023, 21:53:50

Sklearn SequentialFeatureSelector "Pipeline should either be a classifier" when using a classifier

Question

I get this error when using a classifier and SFS as part of an sklearn pipeline:

Traceback (most recent call last):
  File "main.py", line 45, in <module>
    rs.fit(X_train, y_train)
  File "/home/runner/SFSpredictproba/venv/lib/python3.10/site-packages/sklearn/base.py", line 1151, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/home/runner/SFSpredictproba/venv/lib/python3.10/site-packages/sklearn/model_selection/_search.py", line 898, in fit
    self._run_search(evaluate_candidates)
  File "/home/runner/SFSpredictproba/venv/lib/python3.10/site-packages/sklearn/model_selection/_search.py", line 1419, in _run_search
    evaluate_candidates(ParameterGrid(self.param_grid))
  File "/home/runner/SFSpredictproba/venv/lib/python3.10/site-packages/sklearn/model_selection/_search.py", line 845, in evaluate_candidates
    out = parallel(
  File "/home/runner/SFSpredictproba/venv/lib/python3.10/site-packages/sklearn/utils/parallel.py", line 65, in __call__
    return super().__call__(iterable_with_config)
  File "/home/runner/SFSpredictproba/venv/lib/python3.10/site-packages/joblib/parallel.py", line 1855, in __call__
    return output if self.return_generator else list(output)
  File "/home/runner/SFSpredictproba/venv/lib/python3.10/site-packages/joblib/parallel.py", line 1784, in _get_sequential_output
    res = func(*args, **kwargs)
  File "/home/runner/SFSpredictproba/venv/lib/python3.10/site-packages/sklearn/utils/parallel.py", line 127, in __call__
    return self.function(*args, **kwargs)
  File "/home/runner/SFSpredictproba/venv/lib/python3.10/site-packages/sklearn/model_selection/_validation.py", line 754, in _fit_and_score
    test_scores = _score(estimator, X_test, y_test, scorer, error_score)
  File "/home/runner/SFSpredictproba/venv/lib/python3.10/site-packages/sklearn/model_selection/_validation.py", line 813, in _score
    scores = scorer(estimator, X_test, y_test)
  File "/home/runner/SFSpredictproba/venv/lib/python3.10/site-packages/sklearn/metrics/_scorer.py", line 266, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true, **_kwargs)
  File "/home/runner/SFSpredictproba/venv/lib/python3.10/site-packages/sklearn/metrics/_scorer.py", line 459, in _score
    y_pred = method_caller(clf, "decision_function", X, pos_label=pos_label)
  File "/home/runner/SFSpredictproba/venv/lib/python3.10/site-packages/sklearn/metrics/_scorer.py", line 86, in _cached_call
    result, _ = _get_response_values(
  File "/home/runner/SFSpredictproba/venv/lib/python3.10/site-packages/sklearn/utils/_response.py", line 103, in _get_response_values
    raise ValueError(
ValueError: Pipeline should either be a classifier to be used with response_method=decision_function or the response_method should be 'predict'. Got a regressor with response_method=decision_function instead.

Code to reproduce (replit):

# Imports assumed from the names used below (not shown in the original post)
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector as SFS
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline

clf = LogisticRegression()
cv = StratifiedKFold(n_splits=2)
sfs = SFS(clf, n_features_to_select=1, scoring='accuracy', cv=cv, n_jobs=-1)
imputer = SimpleImputer(missing_values=np.nan, strategy='median')
lr_param_grid = {
  'sequentialfeatureselector__estimator__class_weight': ['balanced', None]
}
# SFS is the last step of this pipeline
pipe = make_pipeline(imputer, sfs)
rs = GridSearchCV(estimator=pipe,
                  param_grid=lr_param_grid,
                  cv=cv,
                  scoring="roc_auc",
                  error_score="raise")
# Generate random data for binary classification
X, y = make_classification(
  n_samples=10,  # Number of samples
  n_features=3,  # Number of features
  n_informative=2,  # Number of informative features
  n_redundant=1,  # Number of redundant features
  n_clusters_per_class=1,  # Number of clusters per class
  random_state=42)
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# This call raises the ValueError shown above
rs.fit(X_train, y_train)

I get the same error when using other classifiers, other performance metrics, and the mlxtend version of SFS.

Versions of packages:

python = 3.10.8
scikit-learn = 1.3.0

Answer 1

Score: 1

The issue you're encountering stems from an interaction between the Sequential Feature Selector, GridSearchCV, and the scoring method being used.

GridSearchCV scores each parameter candidate using cross-validation internally. Some scorers, such as 'roc_auc', need continuous scores from the model (class probabilities or decision values) rather than predicted labels. These scores are obtained via the classifier's predict_proba() or decision_function() method.
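As an illustration of that point, the sketch below fits a plain LogisticRegression and compares the built-in 'roc_auc' scorer with a manual call to roc_auc_score on the classifier's decision_function() output. The data and variable names are made up for this example and do not come from the question.

```python
from math import isclose

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer, roc_auc_score

# Illustrative binary classification data (not the data from the question).
X, y = make_classification(n_samples=100, n_features=3, n_informative=2,
                           n_redundant=1, n_clusters_per_class=1,
                           random_state=0)

clf = LogisticRegression().fit(X, y)

# The scorer object that scoring="roc_auc" resolves to.
scorer = get_scorer("roc_auc")
auc_from_scorer = scorer(clf, X, y)

# The same metric computed by hand from continuous decision values.
auc_manual = roc_auc_score(y, clf.decision_function(X))

print(auc_from_scorer, auc_manual)
```

Both numbers should agree, since the scorer needs the same kind of continuous output from the estimator it is given.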

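However, SFS is a feature selector (a transformer) and does not expose these methods from the classifier it encapsulates. Because SFS is the last step of your pipeline, the pipeline has neither method, so when GridSearchCV attempts to apply the scoring function 'roc_auc', it encounters an error because it cannot access the required scores. The "Got a regressor" wording in the message appears to come from scikit-learn treating any estimator that is not a classifier as a regressor at that point.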

Similarly, if you change the scoring function to 'accuracy' or another scorer that relies on the predict() method, you can expect a different error, since SFS does not expose the predict() method of the encapsulated classifier either.

This is the root cause of the error message you're seeing: the classifier is encapsulated within the SFS, so the methods the scorer requires are not accessible.
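The missing methods can be checked directly. This is a minimal sketch, assuming scikit-learn's own SequentialFeatureSelector and a pipeline shaped like the one in the question; the variable names are illustrative.

```python
from sklearn.base import is_classifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

sfs = SequentialFeatureSelector(LogisticRegression(), n_features_to_select=1)
pipe = make_pipeline(SimpleImputer(strategy="median"), sfs)

methods = ("predict", "predict_proba", "decision_function")

# SFS only selects features; check which prediction methods it offers.
sfs_has = {m: hasattr(sfs, m) for m in methods}

# A pipeline delegates these methods to its last step, which here is SFS.
pipe_has = {m: hasattr(pipe, m) for m in methods}

print(sfs_has)
print(pipe_has)
print(is_classifier(pipe))  # whether scikit-learn sees the pipeline as a classifier
```

No fitting is needed for this check, because the methods are absent from the objects themselves.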

As for mlxtend, it seems probable that you're encountering the same issue. If mlxtend's Sequential Feature Selector also does not expose the predict_proba(), decision_function(), or predict() methods from the encapsulated classifier, you would face a similar problem.
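One possible way around this, which goes beyond what the answer above states, is to keep SFS as a feature-selection step and add a classifier as the final pipeline step, so that the pipeline itself exposes decision_function() and predict_proba(). The following is a minimal sketch, assuming scikit-learn's SequentialFeatureSelector. The larger sample size, the second LogisticRegression instance, and the logisticregression__class_weight grid key are choices made for this illustration and are not taken from the question.

```python
from sklearn.base import is_classifier
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline

cv = StratifiedKFold(n_splits=2)

# SFS stays a transformer that picks features using its own inner classifier.
sfs = SequentialFeatureSelector(
    LogisticRegression(), n_features_to_select=1, scoring="accuracy", cv=cv
)

# The final step is a classifier, so the pipeline can produce scores.
pipe = make_pipeline(SimpleImputer(strategy="median"), sfs, LogisticRegression())

param_grid = {
    "sequentialfeatureselector__estimator__class_weight": ["balanced", None],
    "logisticregression__class_weight": ["balanced", None],
}

# Illustrative data: more samples than in the question, so the nested
# cross-validation has enough samples of each class in every split.
X, y = make_classification(
    n_samples=200, n_features=3, n_informative=2, n_redundant=1,
    n_clusters_per_class=1, random_state=42,
)

rs = GridSearchCV(pipe, param_grid=param_grid, cv=cv, scoring="roc_auc",
                  error_score="raise")
rs.fit(X, y)

print(is_classifier(pipe))
print(rs.best_params_)
```

With this layout the selector and the final classifier are tuned separately, which is why the grid contains one key for each of them.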
