Why does my HyperOpt search return a bad 'best configuration' with the same parameters written several times?
Question
I recently worked on hyperparameter optimization with a search algorithm.
The goal is to train an agent in an OpenAI Gym environment.
The problem is the following: when I run a hyperparameter optimization with the HyperOpt search algorithm from ray.tune, the best config it returns contains the same parameters several times. Furthermore, I cannot use this best configuration to run a single training run, so I concluded that something is wrong.
Here is my code:
from ray import air, tune
from ray.tune.search.hyperopt import HyperOptSearch

config = {
    "env": "LunarLander-v2",
    "sgd_minibatch_size": 1000,
    "num_sgd_iter": 1000,
    "lr": tune.uniform(5e-6, 5e-2),
    "lambda": tune.uniform(0.6, 0.9),
    "vf_loss_coeff": 0.7,
    "kl_target": 0.01,
    "kl_coeff": tune.uniform(0.5, 0.9),
    "entropy_coeff": 0.001,
    "clip_param": tune.uniform(0.4, 0.99),
    "train_batch_size": 25000,  # episode size
    # "monitor": True,
    # "model": {"free_log_std": True},
    "num_workers": 4,
    "num_gpus": 0,
    # "rollout_fragment_length": 3
    # "batch_mode": "complete_episodes"
}
# explore() is defined elsewhere (not shown in the question)
config = explore(config)
optimizer = HyperOptSearch(metric="episode_reward_mean", mode="max", n_initial_points=1, random_state_seed=7, space=config)
# optimizer = ConcurrencyLimiter(optimizer, max_concurrent=4)
tuner = tune.Tuner(
    "PPO",
    tune_config=tune.TuneConfig(
        metric="episode_reward_mean",  # the metric we want to study
        mode="max",  # maximize the metric
        search_alg=optimizer,
        # num_samples repeats the entire config 'num_samples' times == number of trials in the 'Status' output
        num_samples=1,
    ),
    # limits the number of episodes for each hyperparameter combination
    run_config=air.RunConfig(stop={"training_iteration": 1}),
)
results = tuner.fit()
best_conf = results.get_best_result().config
print(f"\n ##############################################\n Meilleure configuration : {best_conf}\n ##############################################\n")
Here is the best config from this tuning run (it is long and therefore hard to read, but all parameters appear several times):
***Best configuration*** : {'extra_python_environs_for_driver': {}, 'extra_python_environs_for_worker': {}, 'num_gpus': 0, 'num_cpus_per_worker': 1, 'num_gpus_per_worker': 0, '_fake_gpus': False, 'custom_resources_per_worker': {}, 'placement_strategy': 'PACK', 'eager_tracing': False, 'eager_max_retraces': 20, 'tf_session_args': {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}, 'local_tf_session_args': {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8}, 'env': 'LunarLander-v2', 'env_config': {}, 'observation_space': None, 'action_space': None, 'env_task_fn': None, 'render_env': False, 'clip_rewards': None, 'normalize_actions': True, 'clip_actions': False, 'disable_env_checking': False, 'num_workers': 4, 'num_envs_per_worker': 1, 'sample_collector': <class 'ray.rllib.evaluation.collectors.simple_list_collector.SimpleListCollector'>, 'sample_async': False, 'enable_connectors': False, 'rollout_fragment_length': 6250, 'batch_mode': 'truncate_episodes', 'remote_worker_envs': False, 'remote_env_batch_wait_ms': 0, 'validate_workers_after_construction': True, 'ignore_worker_failures': False, 'recreate_failed_workers': False, 'restart_failed_sub_environments': False, 'num_consecutive_worker_failures_tolerance': 100, 'horizon': None, 'soft_horizon': False, 'no_done_at_end': False, 'preprocessor_pref': 'deepmind', 'observation_filter': 'NoFilter', 'synchronize_filters': True, 'compress_observations': False, 'enable_tf1_exec_eagerly': False, 'sampler_perf_stats_ema_coef': None, 'gamma': 0.99, 'lr': 0.03346975115973727, 'train_batch_size': 25000, 'model': {'_use_default_native_models': False, '_disable_preprocessor_api': False, '_disable_action_flattening': False, 'fcnet_hiddens': [256, 256], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 
'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': False, 'use_lstm': False, 'max_seq_len': 20, 'lstm_cell_size': 256, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'framestack': True, 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'lstm_use_prev_action_reward': -1}, 'optimizer': {}, 'explore': True, 'exploration_config': {'type': 'StochasticSampling'}, 'input_config': {}, 'actions_in_input_normalized': False, 'postprocess_inputs': False, 'shuffle_buffer_size': 0, 'output': None, 'output_config': {}, 'output_compress_columns': ['obs', 'new_obs'], 'output_max_file_size': 67108864, 'evaluation_interval': None, 'evaluation_duration': 10, 'evaluation_duration_unit': 'episodes', 'evaluation_sample_timeout_s': 180.0, 'evaluation_parallel_to_training': False, 'evaluation_config': {'extra_python_environs_for_driver': {}, 'extra_python_environs_for_worker': {}, 'num_gpus': 0, 'num_cpus_per_worker': 1, 'num_gpus_per_worker': 0, '_fake_gpus': False, 'custom_resources_per_worker': {}, 'placement_strategy': 'PACK', 'eager_tracing': False, 'eager_max_retraces': 20, 'tf_session_args': {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}, 'local_tf_session_args': {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8}, 'env': 'LunarLander-v2', 'env_config': {}, 'observation_space': None, 
'action_space': None, 'env_task_fn': None, 'render_env': False, 'clip_rewards': None, 'normalize_actions': True, 'clip_actions': False, 'disable_env_checking': False, 'num_workers': 4, 'num_envs_per_worker': 1, 'sample_collector': <class 'ray.rllib.evaluation.collectors.simple_list_collector.SimpleListCollector'>, 'sample_async': False, 'enable_connectors': False, 'rollout_fragment_length': 6250, 'batch_mode': 'truncate_episodes', 'remote_worker_envs': False, 'remote_env_batch_wait_ms': 0, 'validate_workers_after_construction': True, 'ignore_worker_failures': False, 'recreate_failed_workers': False, 'restart_failed_sub_environments': False, 'num_consecutive_worker_failures_tolerance': 100, 'horizon': None, 'soft_horizon': False, 'no_done_at_end': False, 'preprocessor_pref': 'deepmind', 'observation_filter': 'NoFilter', 'synchronize_filters': True, 'compress_observations': False, 'enable_tf1_exec_eagerly': False, 'sampler_perf_stats_ema_coef': None, 'gamma': 0.99, 'lr': 0.03346975115973727, 'train_batch_size': 25000, 'model': {'_use_default_native_models': False, '_disable_preprocessor_api': False, '_disable_action_flattening': False, 'fcnet_hiddens': [256, 256], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': False, 'use_lstm': False, 'max_seq_len': 20, 'lstm_cell_size': 256, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'framestack': True, 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': 
{}, 'custom_action_dist': None, 'custom_preprocessor': None, 'lstm_use_prev_action_reward': -1}, 'optimizer': {}, 'explore': True, 'exploration_config': {'type': 'StochasticSampling'}, 'input_config': {}, 'actions_in_input_normalized': False, 'postprocess_inputs': False, 'shuffle_buffer_size': 0, 'output': None, 'output_config': {}, 'output_compress_columns': ['obs', 'new_obs'], 'output_max_file_size': 67108864, 'evaluation_interval': None, 'evaluation_duration': 10, 'evaluation_duration_unit': 'episodes', 'evaluation_sample_timeout_s': 180.0, 'evaluation_parallel_to_training': False, 'evaluation_config': {}, 'off_policy_estimation_methods': {}, 'evaluation_num_workers': 0, 'always_attach_evaluation_results': False, 'in_evaluation': False, 'sync_filters_on_rollout_workers_timeout_s': 60.0, 'keep_per_episode_custom_metrics': False, 'metrics_episode_collection_timeout_s': 60.0, 'metrics_num_episodes_for_smoothing': 100, 'min_time_s_per_iteration': None, 'min_train_timesteps_per_iteration': 0, 'min_sample_timesteps_per_iteration': 0, 'logger_creator': None, 'logger_config': None, 'log_level': 'WARN', 'log_sys_usage': True, 'fake_sampler': False, 'seed': None, '_tf_policy_handles_more_than_one_loss': False, '_disable_preprocessor_api': False, '_disable_action_flattening': False, '_disable_execution_plan_api': True, 'simple_optimizer': False, 'monitor': -1, 'evaluation_num_episodes': -1, 'metrics_smoothing_episodes': -1, 'timesteps_per_iteration': -1, 'min_iter_time_s': -1, 'collect_metrics_timeout': -1, 'buffer_size': -1, 'prioritized_replay': -1, 'learning_starts': -1, 'replay_batch_size': -1, 'replay_sequence_length': None, 'prioritized_replay_alpha': -1, 'prioritized_replay_beta': -1, 'prioritized_replay_eps': -1, 'min_time_s_per_reporting': -1, 'min_train_timesteps_per_reporting': -1, 'min_sample_timesteps_per_reporting': -1, 'input_evaluation': -1, 'lr_schedule': None, 'use_critic': True, 'use_gae': True, 'kl_coeff': 0.5003002941138288, 'sgd_minibatch_size': 1000, 
'num_sgd_iter': 1000, 'shuffle_sequences': True, 'vf_loss_coeff': 0.7, 'entropy_coeff': 0.001, 'entropy_coeff_schedule': None, 'clip_param': 0.9429343265857039, 'vf_clip_param': 10.0, 'grad_clip': None, 'kl_target': 0.01, 'vf_share_layers': -1, 'lambda': 0.7125712711928637, 'input': 'sampler', 'multiagent': {'policies': {'default_policy': <ray.rllib.policy.policy.PolicySpec object at 0x7f1d0c4073d0>}, 'policy_map_capacity': 100, 'policy_map_cache': None, 'policy_mapping_fn': None, 'policies_to_train': None, 'observation_fn': None, 'replay_mode': 'independent', 'count_steps_by': 'env_steps'}, 'callbacks': <class 'ray.rllib.algorithms.callbacks.DefaultCallbacks'>, 'create_env_on_driver': False, 'custom_eval_function': None, 'framework': 'tf', 'num_cpus_for_driver': 1}, 'off_policy_estimation_methods': {}, 'evaluation_num_workers': 0, 'always_attach_evaluation_results': False, 'in_evaluation': False, 'sync_filters_on_rollout_workers_timeout_s': 60.0, 'keep_per_episode_custom_metrics': False, 'metrics_episode_collection_timeout_s': 60.0, 'metrics_num_episodes_for_smoothing': 100, 'min_time_s_per_iteration': None, 'min_train_timesteps_per_iteration': 0, 'min_sample_timesteps_per_iteration': 0, 'logger_creator': None, 'logger_config': None, 'log_level': 'WARN', 'log_sys_usage': True, 'fake_sampler': False, 'seed': None, '_tf_policy_handles_more_than_one_loss': False, '_disable_preprocessor_api': False, '_disable_action_flattening': False, '_disable_execution_plan_api': True, 'simple_optimizer': False, 'monitor': -1, 'evaluation_num_episodes': -1, 'metrics_smoothing_episodes': -1, 'timesteps_per_iteration': -1, 'min_iter_time_s': -1, 'collect_metrics_timeout': -1, 'buffer_size': -1, 'prioritized_replay': -1, 'learning_starts': -1, 'replay_batch_size': -1, 'replay_sequence_length': None, 'prioritized_replay_alpha': -1, 'prioritized_replay_beta': -1, 'prioritized_replay_eps': -1, 'min_time_s_per_reporting': -1, 'min_train_timesteps_per_reporting': -1, 
'min_sample_timesteps_per_reporting': -1, 'input_evaluation': -1, 'lr_schedule': None, 'use_critic': True, 'use_gae': True, 'kl_coeff': 0.5003002941138288, 'sgd_minibatch_size': 1000, 'num_sgd_iter': 1000, 'shuffle_sequences': True, 'vf_loss_coeff': 0.7, 'entropy_coeff': 0.001, 'entropy_coeff_schedule': None, 'clip_param': 0.9429343265857039, 'vf_clip_param': 10.0, 'grad_clip': None, 'kl_target': 0.01, 'vf_share_layers': -1, 'lambda': 0.7125712711928637, 'input': 'sampler', 'multiagent': {'policies': {'default_policy': <ray.rllib.policy.policy.PolicySpec object at 0x7f1d0c407580>}, 'policy_map_capacity': 100, 'policy_map_cache': None, 'policy_mapping_fn': None, 'policies_to_train': None, 'observation_fn': None, 'replay_mode': 'independent', 'count_steps_by': 'env_steps'}, 'callbacks': <class 'ray.rllib.algorithms.callbacks.DefaultCallbacks'>, 'create_env_on_driver': False, 'custom_eval_function': None, 'framework': 'tf', 'num_cpus_for_driver': 1}
I should add that I found the following snippet in the Ray documentation. How could I adapt it to my case?
import os
logdir = results.get_best_result("mean_accuracy", mode="max").log_dir
state_dict = torch.load(os.path.join(logdir, "model.pth"))
model = ConvNet()
model.load_state_dict(state_dict)
Thank you in advance for your time.
Answer 1
Score: 1
You're currently running with only num_samples=1, which should produce only a single result with one sampled configuration. RLlib is populating best_conf with its other default settings, but the values that you specified are still there.
For resuming your RLlib experiment, the section of the docs on restoring and continuing training of an RLlib algorithm may be useful.
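As a rough sketch of what adapting the docs snippet to RLlib might look like: instead of loading a "model.pth" file, the best trial's checkpoint can be handed to Algorithm.from_checkpoint. This assumes a Ray 2.x release that provides Algorithm.from_checkpoint and that checkpointing was enabled for the run (for example through a checkpoint_config in air.RunConfig), which the code in the question does not do. The function name restore_best_algorithm and its loader parameter are hypothetical and exist only to keep the example easy to try without a cluster.

```python
def restore_best_algorithm(results, metric="episode_reward_mean", mode="max", loader=None):
    """Rebuild the algorithm of the best trial from its checkpoint.

    `results` is the ResultGrid returned by tuner.fit().
    `loader` is a hypothetical hook for testing; by default the
    RLlib Algorithm.from_checkpoint class method is used.
    """
    best = results.get_best_result(metric=metric, mode=mode)
    if best.checkpoint is None:
        # No checkpoint was written, so there is nothing to restore.
        raise ValueError("Best trial has no checkpoint; enable checkpointing in RunConfig.")
    if loader is None:
        # Imported lazily so the helper can be defined without Ray installed.
        from ray.rllib.algorithms.algorithm import Algorithm
        loader = Algorithm.from_checkpoint
    return loader(best.checkpoint)


# Intended usage after tuner.fit() (assumes checkpoints were saved):
# algo = restore_best_algorithm(results)
# algo.train()  # continue training, or use the restored policy for inference
```

The restored object carries its own configuration, so best_conf does not need to be passed back in by hand.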