June 16, 2023 00:11:41


TypeError: fit_transform() missing argument: y when using ColumnTransformer

Question

I have two pipelines, one for my categorical features and one for my numeric features, that I feed into my column transformer. I then want to fit the column transformer on my DataFrame so I can see what the transformed data looks like.

My code is as follows:

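# Imports assumed from the class names: RandomSampleImputer from feature_engine,
# TargetEncoder from category_encoders (whose supervised encoders raise the error below)
from category_encoders import TargetEncoder
from feature_engine.imputation import RandomSampleImputer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# num_cols and cat_cols are lists of column names in df
num_pipeline = Pipeline(steps=[
    ('impute', RandomSampleImputer()),
    ('scale', MinMaxScaler())
])
cat_pipeline = Pipeline(steps=[
    ('impute', RandomSampleImputer()),
    ('target', TargetEncoder())
])
col_trans = ColumnTransformer(transformers=[
    ('num_pipeline', num_pipeline, num_cols),
    ('cat_pipeline', cat_pipeline, cat_cols)
], remainder='drop')  # 'drop' must be passed as a string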

When I run

df_transform = col_trans.fit(df)

I get the error:

raise TypeError('fit_transform() missing argument: ''y''')

Why is this?


Answer 1

Score: 0


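As Guilherme Marthe and Luca Anzalone have pointed out, some transformers, such as TargetEncoder, require the target variable y, because they compute each category's encoding from the target values. ColumnTransformer.fit() internally calls fit_transform() on each of its transformers and forwards whatever y it was given. Since col_trans.fit(df) passes no y, the TargetEncoder at the end of cat_pipeline receives y=None and raises this TypeError.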
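To see why the missing y only surfaces inside the column transformer, here is a toy model in plain Python. ToyTargetEncoder and toy_column_transformer_fit_transform are hypothetical names: the encoder uses a plain per-category mean of y with no smoothing (a simplification of what category_encoders' or scikit-learn's TargetEncoder computes), and the dispatch function only mimics how ColumnTransformer hands the same y to every sub-transformer.

```python
# Toy model, NOT category_encoders' or scikit-learn's real code.
from collections import defaultdict


class ToyTargetEncoder:
    """Hypothetical encoder: replaces each category with the mean of y for it."""

    def fit_transform(self, X, y=None):
        if y is None:
            # A supervised encoder cannot learn anything without the target.
            raise TypeError("fit_transform() missing argument: y")
        sums, counts = defaultdict(float), defaultdict(int)
        for category, target in zip(X, y):
            sums[category] += target
            counts[category] += 1
        self.mapping_ = {c: sums[c] / counts[c] for c in sums}
        return [self.mapping_[c] for c in X]


def toy_column_transformer_fit_transform(transformers, data, y=None):
    """Mimics the dispatch: every sub-transformer receives the same y (possibly None)."""
    return {name: est.fit_transform(data[col], y) for name, est, col in transformers}


data = {"colour": ["red", "blue", "red", "blue"]}
target = [1, 0, 0, 0]
transformers = [("cat_pipeline", ToyTargetEncoder(), "colour")]

try:
    toy_column_transformer_fit_transform(transformers, data)  # like col_trans.fit(df)
except TypeError as exc:
    print("without y:", exc)  # without y: fit_transform() missing argument: y

print("with y:", toy_column_transformer_fit_transform(transformers, data, target))
# with y: {'cat_pipeline': [0.5, 0.0, 0.5, 0.0]}
```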

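To get your transformed dataset, call fit_transform() on your ColumnTransformer col_trans, passing both X (your features) and y (your target). Calling fit() alone returns the fitted ColumnTransformer itself, not the transformed data, so df_transform = col_trans.fit(df) would not hold a transformed DataFrame even with y supplied.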

When you call fit_transform(), the fitting step first learns whatever parameters the transformations need (for example, the per-column minimum and maximum for MinMaxScaler, or the per-category target statistics for TargetEncoder), and the transform step then applies them to your data. The result is a new dataset with the transformations applied.
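The fit/transform split can be illustrated with a hypothetical, simplified min-max scaler in plain Python. ToyMinMaxScaler is not scikit-learn's MinMaxScaler: it works on a single list of numbers and maps a constant column to zeros. It also shows that fit() returns the estimator rather than transformed data.

```python
# Hypothetical, simplified min-max scaler (not scikit-learn's MinMaxScaler)
# showing the fit / transform split that fit_transform() combines.


class ToyMinMaxScaler:
    def fit(self, values, y=None):
        # fit() only learns parameters from the training data...
        self.min_ = min(values)
        self.max_ = max(values)
        return self  # ...and returns the estimator itself, not transformed data

    def transform(self, values):
        span = self.max_ - self.min_
        # Guard against a constant column (span == 0) by returning zeros.
        return [(v - self.min_) / span if span else 0.0 for v in values]

    def fit_transform(self, values, y=None):
        return self.fit(values, y).transform(values)


scaler = ToyMinMaxScaler()
fitted = scaler.fit([10, 20, 30])
print(fitted is scaler)                # True: fit() returns the scaler
print(scaler.transform([10, 20, 30]))  # [0.0, 0.5, 1.0]
print(scaler.transform([40]))          # [1.5]: reuses the learned min and max
```

Keeping fit() and transform() separate is what lets the parameters learned on training data be reused unchanged on new data, as the last line shows.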

To ensure your output is a pandas DataFrame, you can use the set_config() function from scikit-learn to change the global configuration:

from sklearn import set_config
set_config(transform_output="pandas")

Now, when you transform your data, the output will be a pandas DataFrame:

X_transformed = col_trans.fit_transform(X, y)

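Note that X_transformed is now a DataFrame with named columns. By default, ColumnTransformer prefixes each output column with the name of the transformer that produced it (for example num_pipeline__<column>); pass verbose_feature_names_out=False to the ColumnTransformer to keep the original column names, as long as they are unique.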
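With scikit-learn's default verbose_feature_names_out=True, ColumnTransformer names each output column "<transformer name>__<input column>". The sketch below reproduces only that naming rule in plain Python; toy_output_names and the column lists are hypothetical, and it assumes each step emits one output column per input column (true for MinMaxScaler, and for a target encoder on a binary or continuous target).

```python
# Sketch of ColumnTransformer's default output naming (verbose_feature_names_out=True).
# Column lists are hypothetical; assumes one output column per input column.


def toy_output_names(transformers):
    """Build '<transformer name>__<input column>' for every selected column."""
    return [f"{name}__{col}" for name, cols in transformers for col in cols]


num_cols = ["age", "income"]  # hypothetical numeric columns
cat_cols = ["city"]           # hypothetical categorical column
print(toy_output_names([("num_pipeline", num_cols), ("cat_pipeline", cat_cols)]))
# ['num_pipeline__age', 'num_pipeline__income', 'cat_pipeline__city']
```

With remainder='drop', any column not listed in num_cols or cat_cols does not appear in the output at all.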

The transform_output option of set_config() requires scikit-learn 1.2 or later, so update scikit-learn if necessary.
