What is responsible for this TypeError: DataUndersampler.transform() missing 1 required positional argument: 'y'?


Question

This is a custom support-vector-based data undersampler, taken from an answer to my previous question.

The main idea is to undersample the majority class in an informed way: fit an SVC to the data, find the support vectors, and then undersample the majority class based on the distances to these support vectors.

Code:

    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.utils import resample
    from sklearn.svm import SVC
    import numpy as np
    import pandas as pd  # needed for pd.concat in transform()
    from sklearn.multiclass import OneVsOneClassifier
    from imblearn.pipeline import Pipeline
    from sklearn.ensemble import RandomForestClassifier

    class DataUndersampler(BaseEstimator, TransformerMixin):
        def __init__(self, random_state=None):
            self.random_state = random_state
            self.svc = SVC(kernel='linear')

        def fit(self, X, y):
            # Fit SVC to data
            self.svc.fit(X, y)
            return self

        def transform(self, X, y):
            # Get support vectors
            support_vectors = self.svc.support_vectors_
            # Get indices of support vectors
            support_vector_indices = self.svc.support_
            # Separate majority and minority classes
            majority_class = y.value_counts().idxmax()
            minority_class = y.value_counts().idxmin()
            X_majority = X[y == majority_class]
            y_majority = y[y == majority_class]
            X_minority = X[y == minority_class]
            y_minority = y[y == minority_class]
            # Calculate distances of majority class samples to nearest support vector
            distances = np.min(np.linalg.norm(X_majority.values[:, np.newaxis] - support_vectors, axis=2), axis=1)
            # Sort the majority class samples by distance and take only as many as there are in minority class
            sorted_indices = np.argsort(distances)
            indices_to_keep = sorted_indices[:len(y_minority)]
            # Combine the undersampled majority class with the minority class
            X_resampled = pd.concat([X_majority.iloc[indices_to_keep], X_minority])
            y_resampled = pd.concat([y_majority.iloc[indices_to_keep], y_minority])
            return X_resampled, y_resampled
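
To make the selection step in transform concrete, here is a small dependency-free sketch of the same idea. The 2-D points and the "support vectors" are made up for illustration (in the class above they come from the fitted SVC), and the helper name nearest_sv_distance is hypothetical. It mirrors the np.argsort(distances)[:len(y_minority)] logic using only the standard library (math.dist needs Python 3.8+).

```python
import math

# Toy data, invented for this illustration only.
majority = [(0.0, 0.0), (5.0, 5.0), (1.0, 1.0), (9.0, 0.0)]
minority = [(2.0, 2.0), (2.5, 1.5)]
support_vectors = [(1.5, 1.5), (2.0, 2.0)]  # stand-in for svc.support_vectors_


def nearest_sv_distance(point, svs):
    """Euclidean distance from `point` to its closest support vector."""
    return min(math.dist(point, sv) for sv in svs)


# Distance of every majority sample to its nearest support vector.
distances = [nearest_sv_distance(p, support_vectors) for p in majority]

# Indices of majority samples, closest first (same role as np.argsort).
order = sorted(range(len(majority)), key=distances.__getitem__)

# Keep only as many majority samples as there are minority samples.
keep = order[:len(minority)]
undersampled_majority = [majority[i] for i in keep]

print(keep)
print(undersampled_majority)
```

The majority samples that survive are the ones lying closest to the decision boundary, which is what the distance-based undersampling is meant to achieve.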

MWE:

    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=10_000, n_classes=5, weights=[22.6, 3.7, 16.4, 51.9],
                               n_informative=4)
    rf_clf = model = RandomForestClassifier()
    resampler = DataUndersampler(random_state=234)
    pipeline = Pipeline([('sampler', resampler), ('clf', rf_clf)])
    classifier = OneVsOneClassifier(estimator=pipeline)
    classifier.fit(X, y)

Produces the error:

    ----> 7 classifier.fit(X, y)

    18 frames
    /usr/local/lib/python3.10/dist-packages/sklearn/utils/_set_output.py in wrapped(self, X, *args, **kwargs)
        138     @wraps(f)
        139     def wrapped(self, X, *args, **kwargs):
    --> 140         data_to_wrap = f(self, X, *args, **kwargs)
        141         if isinstance(data_to_wrap, tuple):
        142             # only wrap the first output for cross decomposition

    TypeError: DataUndersampler.transform() missing 1 required positional argument: 'y'

Answer 1

Score: 1

The problem is TransformerMixin: its implementation of fit_transform passes y to fit but calls transform with X only:

    def fit_transform(self, X, y=None, **fit_params):
        """
        Fits transformer to `X` and `y` with optional parameters `fit_params`
        and returns a transformed version of `X`.
        """
        if y is None:
            # fit method of arity 1 (unsupervised transformation)
            return self.fit(X, **fit_params).transform(X)
        else:
            # fit method of arity 2 (supervised transformation)
            return self.fit(X, y, **fit_params).transform(X)  # <-- here is the problem

Solution: implement fit_transform yourself.
(Providing fit_transform is the main purpose of the TransformerMixin class, besides also inheriting from _SetOutputMixin.)

    class DataUndersampler(BaseEstimator):
        def fit_transform(self, X, y):
            return self.fit(X, y).transform(X, y)
        ...
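
The mechanism can be reproduced without scikit-learn. The sketch below uses a hypothetical MiniTransformerMixin that only mirrors the fit_transform logic quoted above (it is not the library class), plus two toy samplers, BrokenSampler and FixedSampler, whose names are made up for this illustration. It shows that the inherited fit_transform drops y before calling transform, and that overriding fit_transform avoids the TypeError.

```python
class MiniTransformerMixin:
    """Hypothetical stand-in mirroring the fit_transform logic quoted above."""

    def fit_transform(self, X, y=None, **fit_params):
        if y is None:
            return self.fit(X, **fit_params).transform(X)
        # y is passed to fit() but never forwarded to transform()
        return self.fit(X, y, **fit_params).transform(X)


class BrokenSampler(MiniTransformerMixin):
    def fit(self, X, y):
        return self

    def transform(self, X, y):  # requires y, like DataUndersampler.transform
        return X, y


class FixedSampler(BrokenSampler):
    def fit_transform(self, X, y):
        # Override so that y actually reaches transform()
        return self.fit(X, y).transform(X, y)


X, y = [[0.0], [1.0]], [0, 1]

try:
    BrokenSampler().fit_transform(X, y)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

fixed_result = FixedSampler().fit_transform(X, y)

print(error_message)  # the "missing 1 required positional argument" message
print(fixed_result)
```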

NOTE:
You might run into problems further down the line if only a single output from transform is expected.
In that case you have to update y in place and return only X:

    y[:] = y_resampled
    return X_resampled

That should do the job.
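
To illustrate the kind of downstream problem the note refers to, here is a rough sketch with a hypothetical stand-in pipeline function (run_single_output_pipeline) and a made-up TupleReturningStep. It assumes the pipeline treats whatever fit_transform returns as the new X and passes the original y along unchanged; that is an assumption about the caller, not a copy of any library code.

```python
class TupleReturningStep:
    """Hypothetical sampler-like step whose fit_transform returns (X, y)."""

    def fit_transform(self, X, y):
        return X[:2], y[:2]  # pretend this is the undersampled data


def run_single_output_pipeline(step, X, y):
    """Hypothetical pipeline stage that expects a single output from the step."""
    X_next = step.fit_transform(X, y)  # the whole tuple lands in X_next
    return X_next, y                   # y is passed on untouched


X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 0, 1]

X_next, y_next = run_single_output_pipeline(TupleReturningStep(), X, y)

print(type(X_next).__name__)     # tuple
print(len(X_next), len(y_next))  # 2 4
```

The next step would receive a tuple instead of a feature matrix, and a y that no longer matches it in length. For what it's worth, imbalanced-learn's own samplers expose a fit_resample(X, y) method that its Pipeline uses to update X and y together, so implementing that method instead of transform may be a more natural fit here.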

Answer 2

Score: 0

Let's take this slowly, step by step. First, let's look at the error.

>TypeError: DataUndersampler.transform() missing 1 required positional argument: 'y'

This error happens when a function expects 2 arguments but gets only one. For example:

    def func(x, y):
        return x, y  # Dummy function

    # This causes the error:
    func(3)  # Not enough arguments

Thus, whatever is calling transform() expects transform to accept only 1 argument.

Indeed, if you look at the source code for OneVsOneClassifier.fit(), you'll see this line:

    # Note transform is called with only 1 argument!
    Y = self.label_binarizer_.fit_transform(y)

I'm not super familiar with sklearn, but I suspect that you need a classifier that can handle 2 input variables. I looked but couldn't figure out what that would be.

huangapple
  • Posted on July 4, 2023, 22:03:04
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76613438.html