Saving H2O GridSearch as CSV

Question

I have the following code:

import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.grid.grid_search import H2OGridSearch

h2o.init()
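data = h2o.import_file('dataset.csv')
target_column = 'target'  # placeholder: name of the response column in dataset.csv
data[target_column] = data[target_column].asfactor()  # bernoulli needs a categorical (two-class) response
train, test = data.split_frame(ratios=[0.8])  # split the imported frame (train is not defined yet)
predictors = [c for c in train.columns if c != target_column]  # keep the response out of x

n_trees = [50, 100, 200, 300]
max_depth = [5, 6, 7]
learn_rate = [0.01, 0.05, 0.1]
min_rows = [10, 15, 20]
min_split_improvement = [0.00001, 0.0001]  # defined but not included in hyper_parameters below
hyper_parameters = {"ntrees": n_trees,
                   "max_depth": max_depth,
                   "learn_rate": learn_rate,
                   "min_rows": min_rows}

# Train one GBM per hyperparameter combination
gs = H2OGridSearch(model=H2OGradientBoostingEstimator, hyper_params=hyper_parameters)
gs.train(x=predictors, y=target_column, training_frame=train, validation_frame=test, distribution='bernoulli')

# Retrieve the grid with models sorted by AUC, highest first
grid_perf = gs.get_grid(sort_by='auc', decreasing=True)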

This runs a grid search of GBMs on the dataset.
I want to save the result of the grid search, grid_perf, as a CSV file.

Something along the lines of:
h2o.export_file(grid_perf, 'grid_search_results.csv')

Note: the code above works, so no debugging necessary, thanks.

I tried the line above, but it fails with the error "Argument python_obj should be a None | list | tuple | dict | numpy.ndarray | pandas.DataFrame | scipy.sparse.issparse, got H2OGridSearch".
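The hyper_parameters dictionary in the question defines a Cartesian grid (H2OGridSearch's default strategy when no search_criteria is passed), so one GBM is trained per combination of values, and the summary table you want to save has one row per model. The sketch below is plain Python, independent of H2O, and only counts those combinations; enumerate_grid is a hypothetical helper written for illustration. It also shows that min_split_improvement, although defined, is never added to the dictionary and therefore is not searched.

```python
from itertools import product

# Hyperparameter lists copied from the question.
n_trees = [50, 100, 200, 300]
max_depth = [5, 6, 7]
learn_rate = [0.01, 0.05, 0.1]
min_rows = [10, 15, 20]
min_split_improvement = [0.00001, 0.0001]

hyper_parameters = {"ntrees": n_trees,
                    "max_depth": max_depth,
                    "learn_rate": learn_rate,
                    "min_rows": min_rows}


def enumerate_grid(hyper_params):
    """Yield one dict per combination of values (an exhaustive Cartesian grid)."""
    names = list(hyper_params)
    for values in product(*(hyper_params[name] for name in names)):
        yield dict(zip(names, values))


combos = list(enumerate_grid(hyper_parameters))
print(len(combos))  # 4 * 3 * 3 * 3 = 108 models

# Adding the unused min_split_improvement list would double the grid.
with_msi = dict(hyper_parameters, min_split_improvement=min_split_improvement)
print(len(list(enumerate_grid(with_msi))))  # 108 * 2 = 216 models
```

So the saved CSV should have at most 108 data rows for this grid; fewer are possible if some models fail to build.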

Answer 1

Score: 1

Would grid_perf._grid_json work for your case?

Maybe _grid_json["summary_table"]?

Answer 2

Score: 0

Thanks to Adam Valenta for the suggestion.
Using that, the solution is:

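grid_perf = gs.get_grid(sort_by='auc', decreasing=True)
# The private _grid_json attribute holds the grid summary; as_data_frame() converts it to pandas
table = grid_perf._grid_json['summary_table'].as_data_frame()
table.to_csv('GridSearch1.csv', index=False)  # index=False drops the pandas row index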
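The solution above relies on pandas for to_csv. If pandas is not installed, the same table can be written with the standard csv module. This is a minimal sketch under an assumption: that the summary table exposes col_header (column names) and cell_values (a list of rows), which is how h2o's H2OTwoDimTable stores its contents in the versions I'm aware of, so verify it against your installed h2o. write_table_csv, fake_summary, and the file name grid_summary_demo.csv are hypothetical names used only for illustration, and the table values are made up.

```python
import csv
from types import SimpleNamespace


def write_table_csv(table, path):
    """Write a two-dimensional table to CSV with the standard library only.

    Assumption: `table` has `col_header` (list of column names) and
    `cell_values` (list of rows), like h2o's H2OTwoDimTable; verify this
    against your installed h2o version.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(table.col_header)    # header row
        writer.writerows(table.cell_values)  # one CSV row per model


# Hypothetical stand-in for grid_perf._grid_json['summary_table'],
# so the sketch runs without an H2O cluster. Values are made up.
fake_summary = SimpleNamespace(
    col_header=["model_ids", "learn_rate", "auc"],
    cell_values=[["grid_model_1", 0.1, 0.91],
                 ["grid_model_2", 0.05, 0.89]],
)
write_table_csv(fake_summary, "grid_summary_demo.csv")
```

Against a real grid the call would be write_table_csv(grid_perf._grid_json['summary_table'], 'GridSearch1.csv'). Because _grid_json is private, it may change between releases; some h2o versions also provide a public sorted_metric_table() method on the grid that returns this summary, which is worth checking first.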

huangapple
  • This article was posted by huangapple on February 23, 2023, 23:16:37.
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75546765.html