TypeError: fit_transform() missing argument: y when using ColumnTransformer
Question
I have two pipelines, one for my categorical features and one for my numeric features, that I feed into my column transformer. I then want to fit the column transformer on my dataframe so I can see what the transformed data looks like.
My code is as follows:
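from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
# RandomSampleImputer and TargetEncoder come from third-party packages (imports omitted)

num_pipeline = Pipeline(steps=[
    ('impute', RandomSampleImputer()),
    ('scale', MinMaxScaler())
])
cat_pipeline = Pipeline(steps=[
    ('impute', RandomSampleImputer()),
    ('target', TargetEncoder())
])
# remainder must be the string 'drop', not a bare name
col_trans = ColumnTransformer(transformers=[
    ('num_pipeline', num_pipeline, num_cols),
    ('cat_pipeline', cat_pipeline, cat_cols)
], remainder='drop')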
When I run
df_transform = col_trans.fit(df)
I get the error:
raise TypeError('fit_transform() missing argument: ''y''')
Why is this?
Answer 1
Score: 0
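As Guilherme Marthe and Luca Anzalone have pointed out, some transformers, such as TargetEncoder, require the target variable y to compute their transformation. ColumnTransformer.fit() calls fit_transform() on each of its transformers internally, which is why the error mentions fit_transform() even though you only called fit().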
To get your transformed dataset, call fit_transform() on your ColumnTransformer col_trans, passing both X (your features) and y (your target).
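The failure and the fix can be sketched roughly as follows. This is only an illustration: MeanTargetEncoder is a hypothetical stand-in for TargetEncoder (its error message is made up), the imputation steps are left out, and the toy df with its column names is an assumption, not data from the question.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler


class MeanTargetEncoder(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for TargetEncoder: maps each category to the mean of y."""

    def fit(self, X, y=None):
        if y is None:
            # A supervised encoder cannot learn anything without the target.
            raise TypeError("fit() missing argument: y")
        X = pd.DataFrame(X)
        y = pd.Series(list(y), index=X.index)
        self.global_mean_ = float(y.mean())
        # One {category: mean target} lookup table per column.
        self.means_ = {col: y.groupby(X[col]).mean().to_dict() for col in X.columns}
        return self

    def transform(self, X):
        X = pd.DataFrame(X)
        # Unseen categories fall back to the global target mean.
        encoded = X.apply(lambda s: s.map(self.means_[s.name]).fillna(self.global_mean_))
        return encoded.to_numpy(dtype=float)


# Toy data (assumed for illustration): two features and a numeric target.
df = pd.DataFrame({
    "age": [20, 30, 40, 50],
    "city": ["a", "b", "a", "b"],
    "price": [1.0, 2.0, 3.0, 4.0],
})
X = df[["age", "city"]]
y = df["price"]

col_trans = ColumnTransformer(transformers=[
    ("num_pipeline", Pipeline(steps=[("scale", MinMaxScaler())]), ["age"]),
    ("cat_pipeline", Pipeline(steps=[("target", MeanTargetEncoder())]), ["city"]),
], remainder="drop")

# Without y, the supervised step inside the ColumnTransformer fails.
try:
    col_trans.fit(X)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# With y, fitting and transforming succeed.
transformed = col_trans.fit_transform(X, y)
```

Here transformed is a NumPy array with one scaled column and one target-encoded column; the same split of df into X and y applies when the real TargetEncoder is used.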
When you call fit_transform(), the fit step first learns any parameters needed for the transformation (such as the minimum and maximum of each column for MinMaxScaler), and the transform step then applies the transformation to your data. The result is a new dataset with the transformations applied.
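A small sketch of the two steps, using MinMaxScaler on made-up numbers (the array values are assumptions chosen for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy single-column data (assumed for illustration).
X = np.array([[10.0], [20.0], [30.0]])

# Two explicit steps: fit() learns the parameters, transform() applies them.
scaler = MinMaxScaler()
scaler.fit(X)
learned_min = scaler.data_min_  # per-column minimum seen during fit
learned_max = scaler.data_max_  # per-column maximum seen during fit
two_step = scaler.transform(X)

# One combined step on a fresh scaler.
one_step = MinMaxScaler().fit_transform(X)
```

For this scaler both routes give the same array; the learned minimum and maximum stay on the fitted object and can be reused to transform new data later.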
To ensure your output is a pandas DataFrame, you can use the set_config() function from scikit-learn to change the global configuration:
from sklearn import set_config
set_config(transform_output="pandas")
Now, when you transform your data, the output will be a pandas DataFrame:
X_transformed = col_trans.fit_transform(X, y)
Note that X_transformed is now a DataFrame with the column names preserved.
Please remember to update scikit-learn to version 1.2 or later to use this feature.