continuous data: Y response not supported in cross_val_score() (binary|multiclass) for IterativeImputer with BayesianRidge




Problem Defined, Continuous Challenge

This new imputer_regressor_bay_ridge() function uses IterativeImputer to impute training data. It receives the training data as a DataFrame and immediately takes data.values to get a NumPy array. The function is passed the training data with many features, plus the Y response variable. The goal here is only to impute a single feature.

Apparently my continuous Y response data, which is price ($) data, is not supported in cross_val_score(interative_imputer, data_array, y, ...).

So what is the advice for working with a continuous Y response variable so that IterativeImputer satisfies cross_val_score for the 'interative_imputer' object?

To support the target type, should I cast my continuous Y response variable to binary? No: this is not a binary classification, so multiclass would be closer. So how should price data be handled when it is the response variable?
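The failing check can be seen in isolation: scikit-learn's type_of_target() is what the stratified splitter consults, and it classifies any float-valued vector like this as 'continuous'. A minimal sketch (the y values here are made up, standing in for standardized price data):

```python
# Minimal check of what scikit-learn thinks the target type is; this is the
# same classification StratifiedKFold performs before raising its ValueError.
import numpy as np
from sklearn.utils.multiclass import type_of_target

y = np.array([0.34727322, 0.00728832, 0.53615372, -0.48852299])
print(type_of_target(y))  # 'continuous'
```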

Error Received

    ValueError: Supported target types are: ('binary', 'multiclass'). Got 'continuous' instead.

CODE

    def imputer_regressor_bay_ridge(data, y):
        data_array = data.values  ## looks OK
        interative_imputer = IterativeImputer(BayesianRidge())  ## runs OK
        interative_imputer_fit = interative_imputer.fit(data_array)  ## runs OK
        data_imputed = interative_imputer_fit.transform(data_array)  ## runs OK
        cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)  ## runs OK
        scores = cross_val_score(interative_imputer, data_array, y,
                                 scoring='accuracy', cv=cv, n_jobs=-1, error_score='raise')
        return scores, data_imputed
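The error can be reproduced without any imputer at all. A sketch on synthetic data, showing that RepeatedStratifiedKFold by itself rejects a continuous y the moment its split generator is consumed:

```python
# Sketch: RepeatedStratifiedKFold rejects a continuous target on its own,
# before any estimator is involved. Data here is synthetic.
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
y = rng.normal(size=30)  # continuous target, like standardized prices

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=1)
try:
    next(cv.split(X, y))  # split() is lazy; iterating it triggers the check
except ValueError as e:
    print(e)  # Supported target types are: ('binary', 'multiclass'). Got 'continuous' instead.
```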

DATA SAMPLE

    print(train_data.shape)
    data_array = train_data.values
    data_array

    (1460, 250)
    array([[-1.73086488, -0.20803433, -0.20714171, ..., -0.11785113,
             0.4676514 , -0.30599503],
           [-1.7284922 ,  0.40989452, -0.09188637, ..., -0.11785113,
             0.4676514 , -0.30599503],
           [-1.72611953, -0.08444856,  0.07347998, ..., -0.11785113,
             0.4676514 , -0.30599503],
           ...,
           [ 1.72611953, -0.16683907, -0.14781027, ..., -0.11785113,
             0.4676514 , -0.30599503],
           [ 1.7284922 , -0.08444856, -0.08016039, ..., -0.11785113,
             0.4676514 , -0.30599503],
           [ 1.73086488,  0.20391824, -0.05811155, ..., -0.11785113,
             0.4676514 , -0.30599503]])

    y = train_data['ResponseY'].values
    y.shape

    (1460,)
    array([ 0.34727322,  0.00728832,  0.53615372, ...,  1.07761115,
           -0.48852299, -0.42084081])

Value Error

Apparently my continuous data, which is price ($) data, is not supported in cross_val_score(interative_imputer, data_array, ...), which fails on:

ValueError: Supported target types are: ('binary', 'multiclass'). Got 'continuous' instead.

    Empty                                     Traceback (most recent call last)
    File ~/opt/anaconda3/lib/python3.9/site-packages/joblib/parallel.py:820, in Parallel.dispatch_one_batch(self, iterator)
        819 try:
    --> 820     tasks = self._ready_batches.get(block=False)
        821 except queue.Empty:
        822     # slice the iterator n_jobs * batchsize items at a time. If the
        823     # slice returns less than that, then the current batchsize puts
        (...)
        826     # accordingly to distribute evenly the last items between all
        827     # workers.

    File ~/opt/anaconda3/lib/python3.9/queue.py:168, in Queue.get(self, block, timeout)
        167 if not self._qsize():
    --> 168     raise Empty
        169 elif timeout is None:

    Empty:

    During handling of the above exception, another exception occurred:

    ValueError                                Traceback (most recent call last)
    Cell In[27], line 5
          3 #train_data, test_data = minmaxscaler(train_data, test_data) # alternate run for min-max scaler
          4 columns, imputed_df = imputer_regressor(train_data)
    ----> 5 scores, data_imputed = imputer_regressor_bay_ridge(train_data, y)
          7 misTrain = whichColumnsMissing(train_data)
          8 misTest = whichColumnsMissing(test_data)

    Cell In[24], line 110, in imputer_regressor_bay_ridge(data, y)
        108 data_imputed = interative_imputer_fit.transform(data_array)
        109 cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
    --> 110 scores = cross_val_score(interative_imputer, data_array,
        111     y, scoring='accuracy', cv=cv, n_jobs=-1, error_score='raise')
        113 return scores, data_imputed

    File ~/opt/anaconda3/lib/python3.9/site-packages/sklearn/model_selection/_validation.py:509, in cross_val_score(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch, error_score)
        506 # To ensure multimetric format is not supported
        507 scorer = check_scoring(estimator, scoring=scoring)
    --> 509 cv_results = cross_validate(
        510     estimator=estimator,
        511     X=X,
        512     y=y,
        513     groups=groups,
        514     scoring={"score": scorer},
        515     cv=cv,
        516     n_jobs=n_jobs,
        517     verbose=verbose,
        518     fit_params=fit_params,
        519     pre_dispatch=pre_dispatch,
        520     error_score=error_score,
        521 )
        522 return cv_results["test_score"]

    File ~/opt/anaconda3/lib/python3.9/site-packages/sklearn/model_selection/_validation.py:267, in cross_validate(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch, return_train_score, return_estimator, error_score)
        264 # We clone the estimator to make sure that all the folds are
        265 # independent, and that it is pickle-able.
        266 parallel = Parallel(n_jobs=n_jobs, verbose=verbose, pre_dispatch=pre_dispatch)
    --> 267 results = parallel(
        268     delayed(_fit_and_score)(
        269         clone(estimator),
        270         X,
        271         y,
        272         scorers,
        273         train,
        274         test,
        275         verbose,
        276         None,
        277         fit_params,
        278         return_train_score=return_train_score,
        279         return_times=True,
        280         return_estimator=return_estimator,
        281         error_score=error_score,
        282     )
        283     for train, test in cv.split(X, y, groups)
        284 )
        286 _warn_about_fit_failures(results, error_score)
        288 # For callabe scoring, the return type is only know after calling. If the
        289 # return type is a dictionary, the error scores can now be inserted with
        290 # the correct key.

    File ~/opt/anaconda3/lib/python3.9/site-packages/joblib/parallel.py:1041, in Parallel.__call__(self, iterable)
       1032 try:
       1033     # Only set self._iterating to True if at least a batch
       1034     # was dispatched. In particular this covers the edge
        (...)
       1038     # was very quick and its callback already dispatched all the
       1039     # remaining jobs.
       1040     self._iterating = False
    -> 1041     if self.dispatch_one_batch(iterator):
       1042         self._iterating = self._original_iterator is not None
       1044 while self.dispatch_one_batch(iterator):

    File ~/opt/anaconda3/lib/python3.9/site-packages/joblib/parallel.py:831, in Parallel.dispatch_one_batch(self, iterator)
        828 n_jobs = self._cached_effective_n_jobs
        829 big_batch_size = batch_size * n_jobs
    --> 831 islice = list(itertools.islice(iterator, big_batch_size))
        832 if len(islice) == 0:
        833     return False

    File ~/opt/anaconda3/lib/python3.9/site-packages/sklearn/model_selection/_validation.py:267, in <genexpr>(.0)
        264 # We clone the estimator to make sure that all the folds are
        265 # independent, and that it is pickle-able.
        266 parallel = Parallel(n_jobs=n_jobs, verbose=verbose, pre_dispatch=pre_dispatch)
    --> 267 results = parallel(
        268     delayed(_fit_and_score)(
        269         clone(estimator),
        270         X,
        271         y,
        272         scorers,
        273         train,
        274         test,
        275         verbose,
        276         None,
        277         fit_params,
        278         return_train_score=return_train_score,
        279         return_times=True,
        280         return_estimator=return_estimator,
        281         error_score=error_score,
        282     )
        283     for train, test in cv.split(X, y, groups)
        284 )
        286 _warn_about_fit_failures(results, error_score)
        288 # For callabe scoring, the return type is only know after calling. If the
        289 # return type is a dictionary, the error scores can now be inserted with
        290 # the correct key.

    File ~/opt/anaconda3/lib/python3.9/site-packages/sklearn/model_selection/_split.py:1411, in _RepeatedSplits.split(self, X, y, groups)
       1409 for idx in range(n_repeats):
       1410     cv = self.cv(random_state=rng, shuffle=True, **self.cvargs)
    -> 1411     for train_index, test_index in cv.split(X, y, groups):
       1412         yield train_index, test_index

    File ~/opt/anaconda3/lib/python3.9/site-packages/sklearn/model_selection/_split.py:340, in _BaseKFold.split(self, X, y, groups)
        332 if self.n_splits > n_samples:
        333     raise ValueError(
        334         (
        335             "Cannot have number of splits n_splits={0} greater"
        336             " than the number of samples: n_samples={1}."
        337         ).format(self.n_splits, n_samples)
        338     )
    --> 340 for train, test in super().split(X, y, groups):
        341     yield train, test

    File ~/opt/anaconda3/lib/python3.9/site-packages/sklearn/model_selection/_split.py:86, in BaseCrossValidator.split(self, X, y, groups)
         84 X, y, groups = indexable(X, y, groups)
         85 indices = np.arange(_num_samples(X))
    ---> 86 for test_index in self._iter_test_masks(X, y, groups):
         87     train_index = indices[np.logical_not(test_index)]
         88     test_index = indices[test_index]

    File ~/opt/anaconda3/lib/python3.9/site-packages/sklearn/model_selection/_split.py:709, in StratifiedKFold._iter_test_masks(self, X, y, groups)
        708 def _iter_test_masks(self, X, y=None, groups=None):
    --> 709     test_folds = self._make_test_folds(X, y)
        710     for i in range(self.n_splits):
        711         yield test_folds == i

    File ~/opt/anaconda3/lib/python3.9/site-packages/sklearn/model_selection/_split.py:652, in StratifiedKFold._make_test_folds(self, X, y)
        650 allowed_target_types = ("binary", "multiclass")
        651 if type_of_target_y not in allowed_target_types:
    --> 652     raise ValueError(
        653         "Supported target types are: {}. Got {!r} instead.".format(
        654             allowed_target_types, type_of_target_y
        655         )
        656     )
        658 y = column_or_1d(y)
        660 _, y_idx, y_inv = np.unique(y, return_index=True, return_inverse=True)

    ValueError: Supported target types are: ('binary', 'multiclass'). Got 'continuous' instead.

Answer 1 (score: 2)

In cross_val_score, the use of scoring='accuracy' is only for binary or multiclass targets.
You should instead use scoring=None or some other scoring adequate for continuous targets. See the regression scorers ('neg_mean_absolute_error', 'neg_mean_squared_error', ...).
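Putting that advice together with a non-stratified splitter, one way the fix can look, sketched on synthetic data. The Pipeline ending in a BayesianRidge regressor is an assumption added here, because cross_val_score needs an estimator that can predict, which IterativeImputer alone cannot:

```python
# Sketch of a working setup (synthetic data): RepeatedKFold instead of
# RepeatedStratifiedKFold, a regression metric instead of 'accuracy', and a
# Pipeline ending in a regressor so cross_val_score has predictions to score.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X_full = rng.normal(size=(200, 5))
y = X_full @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)
X = X_full.copy()
X[rng.random(X.shape) < 0.1] = np.nan  # knock out ~10% of entries

pipe = make_pipeline(IterativeImputer(BayesianRidge()), BayesianRidge())
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(pipe, X, y, scoring='neg_mean_absolute_error',
                         cv=cv, error_score='raise')
print(scores.shape, scores.mean())  # 30 folds; neg MAE scores are <= 0
```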

huangapple
  • Posted on 2023-02-27 06:35:15
  • Please retain this link when reposting: https://go.coder-hub.com/75575412.html