Question

I am using sklearn's GridSearchCV to optimize parameters for AdaBoost classifiers on different datasets. I then create (or append to) a DataFrame that holds information such as the dataset name, best_params_, and best_score_.

Sometimes I get warnings such as a ConvergenceWarning, or just a deprecated package. They don't necessarily hurt anything, but I would like to add them as a column.

This post (https://stackoverflow.com/questions/41507783/writing-scikit-learn-verbose-log-into-an-external-file) seems to get close with bluesummers' and mbil's answers, but I don't really want to write a file just to read it back into my DataFrame.

Here is a minimal working example. At the moment the "warning" column of the final DataFrame is filled with NA. However, because I'm using AdaBoostClassifier(base_estimator=RandomForestClassifier()) instead of AdaBoostClassifier(estimator=RandomForestClassifier()), I should be getting a bunch of deprecation warnings that I would like to grab and save in the warning column.

from sklearn.model_selection import GridSearchCV, KFold, cross_val_score, StratifiedKFold
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
import numpy as np
import tqdm as tq
import pandas as pd
from sklearn.preprocessing import StandardScaler

df_params = pd.DataFrame(columns=['learning_rate', 'n_estimators', 'accuracy', 'warning'])
abc = AdaBoostClassifier(base_estimator=RandomForestClassifier())

parameters = {'n_estimators': [5, 10],
              'learning_rate': [0.01, 0.2]}

a = np.random.random((50, 3))
b = np.random.random((70, 3))
c = np.random.random((50, 5))


for i, data in tq.tqdm(enumerate([a, b, c])):
    X = data
    sc = StandardScaler()
    X = sc.fit_transform(X)
    y = ['foo', 'bar'] * int(len(X) / 2)

    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=None)
    clf = GridSearchCV(abc, parameters, cv=skf, scoring='accuracy', n_jobs=-1)
    clf.fit(X, y)

    dict_best_params = clf.best_params_.copy()
    dict_best_params['accuracy'] = clf.best_score_
    best_params = pd.DataFrame(dict_best_params, index=[i])
    df_params = pd.concat([df_params, best_params], ignore_index=False)

df_params.head()

Answer 1

Score: 0

IIUC, you can use warnings.catch_warnings(record=True):

import warnings  # HERE
import numpy as np
import tqdm as tq
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.preprocessing import StandardScaler

df_params = pd.DataFrame(columns=['learning_rate', 'n_estimators', 'accuracy', 'warning'])
# base_estimator is the deprecated spelling, kept on purpose so that a warning is raised
abc = AdaBoostClassifier(base_estimator=RandomForestClassifier())

parameters = {'n_estimators': [5, 10],
              'learning_rate': [0.01, 0.2]}

a = np.random.random((50, 3))
b = np.random.random((70, 3))
c = np.random.random((50, 5))


for i, data in tq.tqdm(enumerate([a, b, c])):
    with warnings.catch_warnings(record=True) as cx_manager:  # HERE
        warnings.simplefilter('always')  # record every warning, including repeats
        X = data
        sc = StandardScaler()
        X = sc.fit_transform(X)
        y = ['foo', 'bar'] * int(len(X) / 2)

        skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=None)
        clf = GridSearchCV(abc, parameters, cv=skf, scoring='accuracy', n_jobs=-1)
        clf.fit(X, y)

        dict_best_params = clf.best_params_.copy()
        dict_best_params['accuracy'] = clf.best_score_
        # Join the messages into one string so the row stays scalar
        # whether zero, one, or several warnings were recorded
        dict_best_params['warning'] = '; '.join(str(w.message) for w in cx_manager)  # HERE
        best_params = pd.DataFrame(dict_best_params, index=[i])
        df_params = pd.concat([df_params, best_params], ignore_index=False)

Output:

>>> df_params
   learning_rate n_estimators  accuracy                                            warning
0           0.20           10  0.520000  `base_estimator` was renamed to `estimator` in...
1           0.20           10  0.514286  `base_estimator` was renamed to `estimator` in...
2           0.01            5  0.440000  `base_estimator` was renamed to `estimator` in...
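
If the same pattern is needed in several places, the recording step can be pulled into a small helper. The following is only a sketch using the standard library: call_and_collect_warnings and noisy_fit are hypothetical names invented for illustration (noisy_fit stands in for clf.fit), and the "Category: message" format is an assumption about what is useful to store in the column.

```python
import warnings


def call_and_collect_warnings(func, *args, **kwargs):
    """Call func and return (result, summary of the warnings it raised)."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" stops Python from hiding repeats of a warning it has already shown
        warnings.simplefilter("always")
        result = func(*args, **kwargs)

    # Format each warning as "Category: message" and drop duplicates,
    # keeping the order in which they were first seen
    unique = dict.fromkeys(f"{w.category.__name__}: {w.message}" for w in caught)
    return result, "; ".join(unique)


def noisy_fit(n):
    """Hypothetical stand-in for clf.fit that emits the same warning n times."""
    for _ in range(n):
        warnings.warn("`base_estimator` was renamed to `estimator`", FutureWarning)
    return "fitted"


result, summary = call_and_collect_warnings(noisy_fit, 3)
print(summary)
```

In the loop above this would be used as something like `_, warning_text = call_and_collect_warnings(clf.fit, X, y)`. One caveat, as far as I know: catch_warnings only records warnings raised in the current process, so with n_jobs=-1 the warnings emitted inside worker processes may not show up; setting n_jobs=1 is one way to check whether anything is being missed.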
