TypeError: fit_transform() missing argument: y when using ColumnTransformer

Question

I have two pipelines, one for my categorical features and one for my numeric features, that I feed into my column transformer. I then want to fit the column transformer on my DataFrame so I can see what the transformed data looks like.

My code is as follows:

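# Imports are not shown in the original post; the two third-party ones are assumptions.
from category_encoders import TargetEncoder  # assumed: the error text matches this package
from feature_engine.imputation import RandomSampleImputer  # assumed source
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# num_cols and cat_cols are lists of column names defined elsewhere.
num_pipeline = Pipeline(steps=[
    ('impute', RandomSampleImputer()),
    ('scale', MinMaxScaler())
])
cat_pipeline = Pipeline(steps=[
    ('impute', RandomSampleImputer()),
    ('target', TargetEncoder())
])

col_trans = ColumnTransformer(transformers=[
    ('num_pipeline', num_pipeline, num_cols),
    ('cat_pipeline', cat_pipeline, cat_cols)
], remainder='drop')  # 'drop' must be a string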

When I run

df_transform = col_trans.fit(df)

I get the error:

raise TypeError('fit_transform() missing argument: ''y''')

Why is this?

Answer 1

Score: 0

As Guilherme Marthe and Luca Anzalone have pointed out, supervised transformers such as TargetEncoder need the target variable y to compute their encoding. Calling col_trans.fit(df) passes no y, so the TargetEncoder step inside cat_pipeline raises the TypeError. The message names fit_transform() because ColumnTransformer fits each of its transformers by calling fit_transform() on it.

To get your transformed dataset, call fit_transform() on your ColumnTransformer col_trans, passing both X (your features) and y (your target).

fit_transform() first learns the parameters each step needs (for example, the per-column minimum and maximum for MinMaxScaler, or the per-category target statistics for TargetEncoder) and then applies the transformations to the same data. The result is a new dataset where the transformations have been applied.
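The difference between fitting with and without y can be reproduced with a minimal sketch. It does not use the libraries from the question: MeanTargetEncoder is a hypothetical stand-in for TargetEncoder, SimpleImputer stands in for RandomSampleImputer, and the column names and data are made up for illustration.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler


class MeanTargetEncoder(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for TargetEncoder: replaces each category
    with the mean of y observed for that category during fit."""

    def fit(self, X, y=None):
        if y is None:
            # A supervised encoder cannot learn anything without the target.
            raise TypeError("fit() missing argument: 'y'")
        X = pd.DataFrame(X)
        y = pd.Series(list(y), index=X.index)
        self.global_mean_ = y.mean()
        self.maps_ = {col: y.groupby(X[col]).mean() for col in X.columns}
        return self

    def transform(self, X):
        X = pd.DataFrame(X)
        # Categories not seen during fit fall back to the global mean.
        out = {
            col: X[col].map(self.maps_[col]).fillna(self.global_mean_)
            for col in X.columns
        }
        return pd.DataFrame(out).to_numpy(dtype=float)


# Made-up data: one numeric feature, one categorical feature, one target.
df = pd.DataFrame({
    "age": [20.0, 30.0, None, 40.0],
    "city": ["a", "b", "a", "b"],
    "bought": [1, 0, 1, 1],
})
X = df[["age", "city"]]
y = df["bought"]

num_pipeline = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", MinMaxScaler()),
])
cat_pipeline = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("target", MeanTargetEncoder()),
])
col_trans = ColumnTransformer(transformers=[
    ("num_pipeline", num_pipeline, ["age"]),
    ("cat_pipeline", cat_pipeline, ["city"]),
], remainder="drop")

# Without y, the supervised step inside cat_pipeline fails.
try:
    col_trans.fit(X)
except TypeError as exc:
    print("fit(X) failed:", exc)

# With y, every step can be fitted and the data is transformed.
X_transformed = col_trans.fit_transform(X, y)
print(X_transformed)
```

Passing y to the ColumnTransformer is enough: it forwards y to each pipeline, and the unsupervised steps simply ignore it.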

To ensure your output is a pandas DataFrame, you can use the set_config() function from scikit-learn to change the global configuration:

from sklearn import set_config
set_config(transform_output="pandas")

Now, when you transform your data, the output will be a pandas DataFrame:

X_transformed = col_trans.fit_transform(X, y)

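Note that X_transformed is now a DataFrame with named columns. By default, ColumnTransformer prefixes each column name with the name of the transformer that produced it (for example, num_pipeline__ followed by the original column name).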

This feature requires scikit-learn 1.2 or later, so update scikit-learn first if you are on an older version.

huangapple
  • Posted on June 16, 2023, 00:11:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76483591.html