Saving sklearn warnings to a dataframe
Question
I am using sklearn's GridSearchCV to optimize parameters for AdaBoost classifiers on different datasets. I then create/add to a DataFrame that holds information such as the dataset name, best_params_, and best_score_.
Sometimes I get warnings such as a ConvergenceWarning, or a deprecation warning from a package. They don't necessarily hurt anything, but I would like to add them as a column.
This post (https://stackoverflow.com/questions/41507783/writing-scikit-learn-verbose-log-into-an-external-file) seems to get close with bluesummers' and mbil's messages, but I don't really want to write a file only to read it back into my DataFrame.
Here is a minimal working example. The DataFrame at the end currently has NA in the "warning" column. However, because I'm using AdaBoostClassifier(base_estimator=RandomForestClassifier()) instead of AdaBoostClassifier(estimator=RandomForestClassifier()), I should be getting a bunch of warnings that I would like to grab and save in the warning column.
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score, StratifiedKFold
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
import numpy as np
import tqdm as tq
import pandas as pd
from sklearn.preprocessing import StandardScaler

df_params = pd.DataFrame(columns=['learning_rate', 'n_estimators', 'accuracy', 'warning'])
abc = AdaBoostClassifier(base_estimator=RandomForestClassifier())
parameters = {'n_estimators': [5, 10],
              'learning_rate': [0.01, 0.2]}
a = np.random.random((50, 3))
b = np.random.random((70, 3))
c = np.random.random((50, 5))

for i, data in tq.tqdm(enumerate([a, b, c])):
    X = data
    sc = StandardScaler()
    X = sc.fit_transform(X)
    y = ['foo', 'bar'] * int(len(X) / 2)
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=None)
    clf = GridSearchCV(abc, parameters, cv=skf, scoring='accuracy', n_jobs=-1)
    clf.fit(X, y)
    dict_best_params = clf.best_params_.copy()
    dict_best_params['accuracy'] = clf.best_score_
    best_params = pd.DataFrame(dict_best_params, index=[i])
    df_params = pd.concat([df_params, best_params], ignore_index=False)

df_params.head()
Answer 1
Score: 0
IIUC, you can use warnings.catch_warnings(record=True):
import warnings  # HERE
import numpy as np
import tqdm as tq
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.preprocessing import StandardScaler

df_params = pd.DataFrame(columns=['learning_rate', 'n_estimators', 'accuracy', 'warning'])
# base_estimator is the old parameter name; it is used on purpose to trigger a warning
abc = AdaBoostClassifier(base_estimator=RandomForestClassifier())
parameters = {'n_estimators': [5, 10],
              'learning_rate': [0.01, 0.2]}
a = np.random.random((50, 3))
b = np.random.random((70, 3))
c = np.random.random((50, 5))

for i, data in tq.tqdm(enumerate([a, b, c])):
    # Record warnings raised inside the block instead of printing them
    with warnings.catch_warnings(record=True) as cx_manager:  # HERE
        X = data
        sc = StandardScaler()
        X = sc.fit_transform(X)
        y = ['foo', 'bar'] * int(len(X) / 2)
        skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=None)
        clf = GridSearchCV(abc, parameters, cv=skf, scoring='accuracy', n_jobs=-1)
        clf.fit(X, y)
    dict_best_params = clf.best_params_.copy()
    dict_best_params['accuracy'] = clf.best_score_
    # Join the messages into one string so the row stays valid for zero or several warnings
    dict_best_params['warning'] = '; '.join(str(w.message) for w in cx_manager)  # HERE
    best_params = pd.DataFrame(dict_best_params, index=[i])
    df_params = pd.concat([df_params, best_params], ignore_index=False)
Output:
>>> df_params
   learning_rate  n_estimators  accuracy                                            warning
0           0.20            10  0.520000  `base_estimator` was renamed to `estimator` in...
1           0.20            10  0.514286  `base_estimator` was renamed to `estimator` in...
2           0.01             5  0.440000  `base_estimator` was renamed to `estimator` in...
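The recording pattern can also be pulled into a small helper, which makes it easier to try out without sklearn. The following is only a minimal sketch using the standard library: `run_and_collect_warnings` and `noisy_fit` are hypothetical names that do not appear in the question, and `noisy_fit` merely stands in for `clf.fit(X, y)`.

```python
import warnings


def run_and_collect_warnings(func, *args, **kwargs):
    """Call func and return (result, warning_text).

    warning_text joins every recorded warning into one string,
    so it fits into a single DataFrame cell.
    """
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including repeats that would normally be shown once.
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    text = "; ".join(f"{w.category.__name__}: {w.message}" for w in caught)
    return result, text


def noisy_fit(n):
    # Hypothetical stand-in for clf.fit(X, y): emits a deprecation-style warning.
    warnings.warn("`base_estimator` was renamed to `estimator`", FutureWarning)
    return n * 2


result, warning_text = run_and_collect_warnings(noisy_fit, 21)
# One row of the results table, ready to be turned into a DataFrame row.
row = {"accuracy": result, "warning": warning_text}
```

When nothing is raised, the helper returns an empty string for the warning text, so the column never ends up holding a list of unpredictable length. One caveat, stated as an assumption rather than a guarantee: warnings raised inside separate worker processes (for example with n_jobs=-1) may not reach the recording context in the parent process, so only warnings raised in the main process are reliably captured.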