2023-02-27 06:35:15


Continuous data: Y response not supported in cross_val_score() (binary|multiclass) for IterativeImputer with BayesianRidge

Question


Problem Defined, Continuous Challenge

This new imputer_regressor_bay_ridge() function uses IterativeImputer to impute training data. I pass in the training data as a DataFrame and immediately take data.values to get a NumPy array. The training data has many features, and I also pass the Y response variable. The goal is to impute only one single feature.

Apparently my Y response data, which is continuous price ($) data, is not supported in cross_val_score(interative_imputer, data_array, y, ...).

So what is your advice on how to make a continuous Y response variable work with IterativeImputer and satisfy cross_val_score for the 'interative_imputer' object?

To support the target type, should I cast my continuous Y response variable to binary? No, because this is not a binary classification problem, so multiclass would be more in line. So how should I handle price data when it is the response variable?

Error Received

ValueError: Supported target types are: ('binary', 'multiclass'). Got 'continuous' instead.

CODE

   
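from sklearn.experimental import enable_iterative_imputer  # noqa: F401, must come before importing IterativeImputer
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

def imputer_regressor_bay_ridge(data, y):
    data_array = data.values  ## looks OK
    interative_imputer = IterativeImputer(BayesianRidge())  ## runs OK
    interative_imputer_fit = interative_imputer.fit(data_array)  ## runs OK
    data_imputed = interative_imputer_fit.transform(data_array)  ## runs OK
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)  ## runs OK
    scores = cross_val_score(interative_imputer, data_array, y,
                             scoring='accuracy', cv=cv, n_jobs=-1, error_score='raise')  ## raises the ValueError

    return scores, data_imputed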

DATA SAMPLE

print(train_data.shape)
data_array = train_data.values
data_array
(1460, 250)
array([[-1.73086488, -0.20803433, -0.20714171, ..., -0.11785113,
         0.4676514 , -0.30599503],
       [-1.7284922 ,  0.40989452, -0.09188637, ..., -0.11785113,
         0.4676514 , -0.30599503],
       [-1.72611953, -0.08444856,  0.07347998, ..., -0.11785113,
         0.4676514 , -0.30599503],
       ...,
       [ 1.72611953, -0.16683907, -0.14781027, ..., -0.11785113,
         0.4676514 , -0.30599503],
       [ 1.7284922 , -0.08444856, -0.08016039, ..., -0.11785113,
         0.4676514 , -0.30599503],
       [ 1.73086488,  0.20391824, -0.05811155, ..., -0.11785113,
         0.4676514 , -0.30599503]])
y = train_data['ResponseY'].values
y.shape  
(1460,)
array([ 0.34727322,  0.00728832,  0.53615372, ...,  1.07761115,
       -0.48852299, -0.42084081])

Value Error

Apparently my continuous data, which is price ($) data, is not supported in cross_val_score(interative_imputer, data_array, y, ...):

ValueError: Supported target types are: ('binary', 'multiclass'). Got 'continuous' instead.

Empty                                     Traceback (most recent call last)
File ~/opt/anaconda3/lib/python3.9/site-packages/joblib/parallel.py:820, in Parallel.dispatch_one_batch(self, iterator)
819 try:
--> 820     tasks = self._ready_batches.get(block=False)
821 except queue.Empty:
822     # slice the iterator n_jobs * batchsize items at a time. If the
823     # slice returns less than that, then the current batchsize puts
(...)
826     # accordingly to distribute evenly the last items between all
827     # workers.
File ~/opt/anaconda3/lib/python3.9/queue.py:168, in Queue.get(self, block, timeout)
167     if not self._qsize():
--> 168         raise Empty
169 elif timeout is None:
Empty: 
During handling of the above exception, another exception occurred:
ValueError                                Traceback (most recent call last)
Cell In[27], line 5
3 #train_data, test_data = minmaxscaler(train_data, test_data)  # alternate run for min-max scaler
4 columns, imputed_df = imputer_regressor(train_data)
----> 5 scores, data_imputed = imputer_regressor_bay_ridge(train_data, y)
7 misTrain = whichColumnsMissing(train_data)
8 misTest = whichColumnsMissing(test_data)
Cell In[24], line 110, in imputer_regressor_bay_ridge(data, y)
108 data_imputed = interative_imputer_fit.transform(data_array)
109 cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
--> 110 scores = cross_val_score(interative_imputer, data_array, 
111                          y, scoring='accuracy', cv=cv, n_jobs=-1, error_score='raise')
113 return scores, data_imputed
File ~/opt/anaconda3/lib/python3.9/site-packages/sklearn/model_selection/_validation.py:509, in cross_val_score(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch, error_score)
506 # To ensure multimetric format is not supported
507 scorer = check_scoring(estimator, scoring=scoring)
--> 509 cv_results = cross_validate(
510     estimator=estimator,
511     X=X,
512     y=y,
513     groups=groups,
514     scoring={"score": scorer},
515     cv=cv,
516     n_jobs=n_jobs,
517     verbose=verbose,
518     fit_params=fit_params,
519     pre_dispatch=pre_dispatch,
520     error_score=error_score,
521 )
522 return cv_results["test_score"]
File ~/opt/anaconda3/lib/python3.9/site-packages/sklearn/model_selection/_validation.py:267, in cross_validate(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch, return_train_score, return_estimator, error_score)
264 # We clone the estimator to make sure that all the folds are
265 # independent, and that it is pickle-able.
266 parallel = Parallel(n_jobs=n_jobs, verbose=verbose, pre_dispatch=pre_dispatch)
--> 267 results = parallel(
268     delayed(_fit_and_score)(
269         clone(estimator),
270         X,
271         y,
272         scorers,
273         train,
274         test,
275         verbose,
276         None,
277         fit_params,
278         return_train_score=return_train_score,
279         return_times=True,
280         return_estimator=return_estimator,
281         error_score=error_score,
282     )
283     for train, test in cv.split(X, y, groups)
284 )
286 _warn_about_fit_failures(results, error_score)
288 # For callabe scoring, the return type is only know after calling. If the
289 # return type is a dictionary, the error scores can now be inserted with
290 # the correct key.
File ~/opt/anaconda3/lib/python3.9/site-packages/joblib/parallel.py:1041, in Parallel.__call__(self, iterable)
1032 try:
1033     # Only set self._iterating to True if at least a batch
1034     # was dispatched. In particular this covers the edge
(...)
1038     # was very quick and its callback already dispatched all the
1039     # remaining jobs.
1040     self._iterating = False
-> 1041     if self.dispatch_one_batch(iterator):
1042         self._iterating = self._original_iterator is not None
1044     while self.dispatch_one_batch(iterator):
File ~/opt/anaconda3/lib/python3.9/site-packages/joblib/parallel.py:831, in Parallel.dispatch_one_batch(self, iterator)
828 n_jobs = self._cached_effective_n_jobs
829 big_batch_size = batch_size * n_jobs
--> 831 islice = list(itertools.islice(iterator, big_batch_size))
832 if len(islice) == 0:
833     return False
File ~/opt/anaconda3/lib/python3.9/site-packages/sklearn/model_selection/_validation.py:267, in <genexpr>(.0)
264 # We clone the estimator to make sure that all the folds are
265 # independent, and that it is pickle-able.
266 parallel = Parallel(n_jobs=n_jobs, verbose=verbose, pre_dispatch=pre_dispatch)
--> 267 results = parallel(
268     delayed(_fit_and_score)(
269         clone(estimator),
270         X,
271         y,
272         scorers,
273         train,
274         test,
275         verbose,
276         None,
277         fit_params,
278         return_train_score=return_train_score,
279         return_times=True,
280         return_estimator=return_estimator,
281         error_score=error_score,
282     )
283     for train, test in cv.split(X, y, groups)
284 )
286 _warn_about_fit_failures(results, error_score)
288 # For callabe scoring, the return type is only know after calling. If the
289 # return type is a dictionary, the error scores can now be inserted with
290 # the correct key.
File ~/opt/anaconda3/lib/python3.9/site-packages/sklearn/model_selection/_split.py:1411, in _RepeatedSplits.split(self, X, y, groups)
1409 for idx in range(n_repeats):
1410     cv = self.cv(random_state=rng, shuffle=True, **self.cvargs)
-> 1411     for train_index, test_index in cv.split(X, y, groups):
1412         yield train_index, test_index
File ~/opt/anaconda3/lib/python3.9/site-packages/sklearn/model_selection/_split.py:340, in _BaseKFold.split(self, X, y, groups)
332 if self.n_splits > n_samples:
333     raise ValueError(
334         (
335             "Cannot have number of splits n_splits={0} greater"
336             " than the number of samples: n_samples={1}."
337         ).format(self.n_splits, n_samples)
338     )
--> 340 for train, test in super().split(X, y, groups):
341     yield train, test
File ~/opt/anaconda3/lib/python3.9/site-packages/sklearn/model_selection/_split.py:86, in BaseCrossValidator.split(self, X, y, groups)
84 X, y, groups = indexable(X, y, groups)
85 indices = np.arange(_num_samples(X))
---> 86 for test_index in self._iter_test_masks(X, y, groups):
87     train_index = indices[np.logical_not(test_index)]
88     test_index = indices[test_index]
File ~/opt/anaconda3/lib/python3.9/site-packages/sklearn/model_selection/_split.py:709, in StratifiedKFold._iter_test_masks(self, X, y, groups)
708 def _iter_test_masks(self, X, y=None, groups=None):
--> 709     test_folds = self._make_test_folds(X, y)
710     for i in range(self.n_splits):
711         yield test_folds == i
File ~/opt/anaconda3/lib/python3.9/site-packages/sklearn/model_selection/_split.py:652, in StratifiedKFold._make_test_folds(self, X, y)
650 allowed_target_types = ("binary", "multiclass")
651 if type_of_target_y not in allowed_target_types:
--> 652     raise ValueError(
653         "Supported target types are: {}. Got {!r} instead.".format(
654             allowed_target_types, type_of_target_y
655         )
656     )
658 y = column_or_1d(y)
660 _, y_idx, y_inv = np.unique(y, return_index=True, return_inverse=True)
ValueError: Supported target types are: ('binary', 'multiclass'). Got 'continuous' instead.

Answer 1

Score: 2

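In cross_val_score, scoring='accuracy' is only meaningful for binary or multiclass targets.
For a continuous target, use scoring=None (the estimator's default score method) or a regression scorer instead ('neg_mean_absolute_error', 'neg_mean_squared_error', ...).
Note also that the traceback shows the ValueError being raised inside StratifiedKFold._make_test_folds: RepeatedStratifiedKFold stratifies the folds by class label, so it rejects a continuous y too. A non-stratified splitter such as RepeatedKFold accepts continuous targets.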
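A minimal sketch of how this could look, under some assumptions that are not in the question: IterativeImputer is a transformer with no predict or score method, so here it is placed in a pipeline in front of a final regressor (a second BayesianRidge, chosen only for illustration). The helper name score_imputer_with_regressor and the synthetic X_demo / y_demo data are hypothetical; RepeatedKFold and 'neg_mean_absolute_error' stand in for the stratified splitter and 'accuracy'.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401, must come before importing IterativeImputer
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.pipeline import make_pipeline


def score_imputer_with_regressor(X, y, n_splits=10, n_repeats=3, random_state=1):
    """Cross-validate imputation plus regression on a continuous target (hypothetical helper)."""
    # The imputer is refit inside every training fold, so no information
    # from the held-out fold leaks into the imputation.
    model = make_pipeline(
        IterativeImputer(estimator=BayesianRidge(), random_state=random_state),
        BayesianRidge(),  # assumed final regressor; swap in your own model
    )
    # RepeatedKFold does not stratify, so a continuous y is accepted.
    cv = RepeatedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=random_state)
    # A regression scorer replaces 'accuracy'; values are negated MAE (closer to 0 is better).
    return cross_val_score(model, X, y, scoring="neg_mean_absolute_error",
                           cv=cv, error_score="raise")


# Synthetic stand-in data: 200 rows, 5 features, roughly 10% of entries missing.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)
X_demo[rng.random(X_demo.shape) < 0.1] = np.nan

scores = score_imputer_with_regressor(X_demo, y_demo)
print(scores.shape, scores.mean())
```

With n_splits=10 and n_repeats=3 the call returns 30 scores, one per fold. If binning the price into classes were really wanted, stratification would work again, but that changes the problem from regression to classification.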
