July 4, 2023 22:03:04

What is responsible for this TypeError: DataUndersampler.transform() missing 1 required positional argument: 'y'?

Question

This is a custom support-vector-based data undersampler, taken from an answer to my previous question.

The main idea is to undersample the majority class in an informed way: fit an SVC to the data, find the support vectors, and then undersample the majority class based on each sample's distance to the nearest support vector.
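
The selection step can be illustrated without scikit-learn. The following is only a minimal sketch, assuming the support vectors are already known; the helper name `undersample_majority` and the toy points are hypothetical, and in the question the support vectors come from a fitted SVC instead.

```python
import math


def undersample_majority(majority, minority, support_vectors):
    """Keep the len(minority) majority points closest to any support vector."""

    def nearest_sv_distance(point):
        # Euclidean distance from this point to its nearest support vector
        return min(math.dist(point, sv) for sv in support_vectors)

    ranked = sorted(majority, key=nearest_sv_distance)
    return ranked[: len(minority)]


# Hypothetical toy data, chosen only to make the ranking easy to follow.
majority = [(0.0, 0.0), (5.0, 5.0), (1.0, 1.0), (9.0, 9.0)]
minority = [(2.0, 2.0), (2.5, 2.0)]
support_vectors = [(1.5, 1.5)]

kept = undersample_majority(majority, minority, support_vectors)
print(kept)
```

The class below does the same ranking with NumPy broadcasting and pandas indexing.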

Code:

from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils import resample
from sklearn.svm import SVC
import numpy as np
import pandas as pd
from sklearn.multiclass import OneVsOneClassifier
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
class DataUndersampler(BaseEstimator, TransformerMixin):
    def __init__(self, random_state=None):
        self.random_state = random_state
        self.svc = SVC(kernel='linear')
    def fit(self, X, y):
        # Fit SVC to data
        self.svc.fit(X, y)
        return self
    def transform(self, X, y):
        # Get support vectors
        support_vectors = self.svc.support_vectors_
        # Get indices of support vectors
        support_vector_indices = self.svc.support_
        # Separate majority and minority classes
        majority_class = y.value_counts().idxmax()
        minority_class = y.value_counts().idxmin()
        X_majority = X[y == majority_class]
        y_majority = y[y == majority_class]
        X_minority = X[y == minority_class]
        y_minority = y[y == minority_class]
        # Calculate distances of majority class samples to nearest support vector
        distances = np.min(np.linalg.norm(X_majority.values[:, np.newaxis] - support_vectors, axis=2), axis=1)
        # Sort the majority class samples by distance and take only as many as there are in minority class
        sorted_indices = np.argsort(distances)
        indices_to_keep = sorted_indices[:len(y_minority)]
        # Combine the undersampled majority class with the minority class
        X_resampled = pd.concat([X_majority.iloc[indices_to_keep], X_minority])
        y_resampled = pd.concat([y_majority.iloc[indices_to_keep], y_minority])
        return X_resampled, y_resampled

MWE:

from sklearn.datasets import make_classification
X, y = make_classification(n_samples=10_000, n_classes=5, weights=[22.6, 3.7, 16.4, 51.9],
                           n_informative=4)
rf_clf = model = RandomForestClassifier()
resampler = DataUndersampler(random_state=234)
pipeline = Pipeline([('sampler', resampler), ('clf', rf_clf)])
classifier = OneVsOneClassifier(estimator=pipeline)
classifier.fit(X, y)

Produces the error:

----> 7 classifier.fit(X, y)
18 frames
/usr/local/lib/python3.10/dist-packages/sklearn/utils/_set_output.py in wrapped(self, X, *args, **kwargs)
    138     @wraps(f)
    139     def wrapped(self, X, *args, **kwargs):
--> 140         data_to_wrap = f(self, X, *args, **kwargs)
    141         if isinstance(data_to_wrap, tuple):
    142             # only wrap the first output for cross decomposition
TypeError: DataUndersampler.transform() missing 1 required positional argument: 'y'

Answer 1

Score: 1

The problem is TransformerMixin: its implementation of fit_transform never forwards y to transform:

def fit_transform(self, X, y=None, **fit_params):
    if y is None:
        # fit method of arity 1 (unsupervised transformation)
        return self.fit(X, **fit_params).transform(X)
    else:
        # fit method of arity 2 (supervised transformation)
        return self.fit(X, y, **fit_params).transform(X) # <-- here is the problem

Solution: implement fit_transform yourself.
(Providing fit_transform is the main purpose of the TransformerMixin class, besides also inheriting from _SetOutputMixin.)

class DataUndersampler(BaseEstimator):
    def fit_transform(self, X, y):
        return self.fit(X, y).transform(X, y)
    ...
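
To see the mismatch in isolation, here is a small sketch that does not import scikit-learn. `MixinSketch` is a hypothetical stand-in that only mirrors the fit_transform logic quoted above, and `Broken` and `Fixed` are illustrative names, not part of any library.

```python
class MixinSketch:
    """Stand-in mirroring the quoted fit_transform logic (not sklearn itself)."""

    def fit_transform(self, X, y=None, **fit_params):
        if y is None:
            return self.fit(X, **fit_params).transform(X)
        # y is passed to fit but not forwarded to transform
        return self.fit(X, y, **fit_params).transform(X)


class Broken(MixinSketch):
    def fit(self, X, y):
        return self

    def transform(self, X, y):  # requires y, like DataUndersampler.transform
        return X, y


class Fixed(Broken):
    def fit_transform(self, X, y):
        # Override so that y actually reaches transform
        return self.fit(X, y).transform(X, y)


X, y = [[1], [2]], [0, 1]

try:
    Broken().fit_transform(X, y)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

fixed_output = Fixed().fit_transform(X, y)
print(error_message)
print(fixed_output)
```

The inherited method raises the same kind of TypeError as in the question, while the override returns both X and y.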

NOTE:
You might run into problems further down the line if only a single output from transform is expected.
In that case you have to update y in place and return only X.

y[:] = y_resampled
return X_resampled

That should do the job.

Answer 2

Score: 0

Let's take this step by step. First, let's look at the error.

>TypeError: DataUndersampler.transform() missing 1 required positional argument: 'y'

This error happens when a function expects 2 arguments but gets only one. For example:

def func(x, y):
    return x, y # Dummy function
# This causes the error:
func(3) # Not enough arguments

Thus, whatever is calling transform() passes it only one argument, while your transform requires two.

Indeed, if you look at the source code for OneVsOneClassifier.fit(), we see this line:

     # Note transform is called with only 1 argument!
     Y = self.label_binarizer_.fit_transform(y)

I'm not very familiar with sklearn, but I suspect that you need a classifier that can handle two input variables. I looked but couldn't figure out what that would be.
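
One way to confirm which arguments a method insists on, before tracking down its caller, is to inspect its signature. This is only a sketch: `required_positional` is a hypothetical helper and `Sampler` is a stand-in with the same method signatures as DataUndersampler.

```python
import inspect


def required_positional(func):
    """Return the names of positional parameters without defaults, excluding self."""
    positional_kinds = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
    )
    return [
        p.name
        for p in inspect.signature(func).parameters.values()
        if p.kind in positional_kinds
        and p.default is inspect.Parameter.empty
        and p.name != "self"
    ]


class Sampler:  # hypothetical stand-in for DataUndersampler
    def fit(self, X, y):
        return self

    def transform(self, X, y):
        return X, y


needed = required_positional(Sampler.transform)
print(needed)
```

Any caller that invokes transform with X alone will fail against such a signature.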
