Feature importance scores with GridSearchCV

I am trying to get the feature importance scores of my variables. Besides the actual values, I also want to link them to the column names and put the result in a DataFrame.

This is how I am using the GridSearchCV function:

grid_search = GridSearchCV(model, parameters, cv=10) #tuning
grid_search.fit(X_train2, y_train2.values.ravel())

Now, I run this function that outputs an array of importance scores:


However, this array contains more scores than columns in my dataset, so I don't know how to match them up. Is there any way to directly output the importance scores with the assigned column names from the GridSearchCV function?

Also, this is what my model's pipeline looks like, for reference:

# Define the pipeline
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from xgboost import XGBClassifier

numeric_transformer = Pipeline(steps=[
    #('imputer', KNNImputer()), # for missing values
    ('scaler', StandardScaler()), # standardizing
    #('scaler', MinMaxScaler()), # normalizing
    #('to_df', FunctionTransformer(lambda x: pd.DataFrame(x, columns=X_train2.select_dtypes(include=['int64', 'float64']).columns)))
])

categorical_transformer = Pipeline(steps=[
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, X_train2.select_dtypes(include=['int64', 'float64']).columns.tolist()),
        ('cat', categorical_transformer, categorical_cols),
        # ("pca", PCA(random_state=548, n_components=25), indices_pca),
        # ('smotenc', SmoteNCWrapper(categorical_features=[1, 2], random_state=548)),
    ]
)

model = Pipeline(steps=[
    # ('over', SMOTE(random_state=548)),
    # ('smotenc', SmoteNCWrapper(categorical_features=[1, 2], random_state=548)),
    ('preprocessor', preprocessor),
    ('regressor', XGBClassifier(random_state=548))
])

parameters = {
    'regressor__n_estimators': [1000],
    'regressor__max_depth': [5],
    'regressor__learning_rate': [0.01],
    # 'regressor__num_leaves': [31],
    # 'regressor__min_child_samples': [20],
    'regressor__reg_alpha': [0.1],
    'regressor__reg_lambda': [0.1]
}

Score: 1


You are getting "more" scores than columns because the one-hot encoder creates one dummy feature per category of each categorical column. You can get the names of those features from the fitted encoder using ".get_feature_names_out":

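# Get feature names after one-hot encoding.
# GridSearchCV fits clones of the pipeline, so read the fitted preprocessor from best_estimator_.
fitted_preprocessor = grid_search.best_estimator_.named_steps['preprocessor']
feature_names = fitted_preprocessor.named_transformers_['cat'].named_steps['onehot'].get_feature_names_out(categorical_cols)
numeric_cols = X_train2.select_dtypes(include=['int64', 'float64']).columns.tolist()
# get_feature_names_out returns a NumPy array, so convert it to a list before concatenating
all_feature_names = numeric_cols + feature_names.tolist()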

