Vectorized Gym environments: how to block the automatic environment reset when done = True


Question


When I run a "single" environment in Gym, there is no reset once done = True is reached.

When I use the vectorized environments, however, the reset values are returned as the next_state values immediately.

Is there a way to block that automatic reset behavior in the vectorized environments, or is there any other way to record the un-reset next_state value?

SINGLE ENV CODE:

import gym

env = gym.make("CartPole-v1")
current_state = env.reset()


for i in range(50):
    next_state, reward, done, info = env.step(1)
    # env_vect.observations refers to the vectorized env created further below;
    # it is printed only for comparison and stays frozen while this single env steps.
    print(current_state / next_state, done, current_state, next_state, env_vect.observations)
    current_state = next_state

SINGLE ENV RESULTS:


[0.9371 0.1632 0.9866 0.0424] False [ 0.0114 0.0381 -0.0195 -0.0132] [ 0.0121 0.2335 -0.0198 -0.312 ] [[-0.0429 0.194 -0.0462 -0.2908]]

[0.7218 0.5444 0.7603 0.5108] False [ 0.0121 0.2335 -0.0198 -0.312 ] [ 0.0168 0.4289 -0.026 -0.6109] [[-0.0429 0.194 -0.0462 -0.2908]]

[0.6618 0.6869 0.6806 0.6701] False [ 0.0168 0.4289 -0.026 -0.6109] [ 0.0254 0.6244 -0.0383 -0.9116] [[-0.0429 0.194 -0.0462 -0.2908]]

[0.6701 0.7614 0.6772 0.7496] False [ 0.0254 0.6244 -0.0383 -0.9116] [ 0.0379 0.82 -0.0565 -1.2161] [[-0.0429 0.194 -0.0462 -0.2908]]

[0.6977 0.8072 0.699 0.797 ] False [ 0.0379 0.82 -0.0565 -1.2161] [ 0.0543 1.0158 -0.0808 -1.5259] [[-0.0429 0.194 -0.0462 -0.2908]]

[0.7276 0.8383 0.7259 0.8281] False [ 0.0543 1.0158 -0.0808 -1.5259] [ 0.0746 1.2118 -0.1113 -1.8427] [[-0.0429 0.194 -0.0462 -0.2908]]

[0.7547 0.8607 0.7513 0.85 ] False [ 0.0746 1.2118 -0.1113 -1.8427] [ 0.0988 1.408 -0.1482 -2.1678] [[-0.0429 0.194 -0.0462 -0.2908]]

[0.7782 0.8777 0.7736 0.8663] False [ 0.0988 1.408 -0.1482 -2.1678] [ 0.127 1.6042 -0.1915 -2.5023] [[-0.0429 0.194 -0.0462 -0.2908]]

[0.7983 0.8911 0.7928 0.8789] True [ 0.127 1.6042 -0.1915 -2.5023] [ 0.159 1.8003 -0.2416 -2.8471] [[-0.0429 0.194 -0.0462 -0.2908]]
When done is [True], the ratio is in line with the other ratios and the current values are not reset.


VECTORIZED ENV CODE:

from copy import deepcopy

nn = 1

#env_vect = gym.vector.SyncVectorEnv([lambda: gym.make("CartPole-v1").env for _ in range(nn)])
env_vect = gym.vector.make('CartPole-v1', num_envs=nn)

current_state = env_vect.reset()

print("current_state", current_state)
#print("self.env.state", env_vect.state)
print("self.env.state", env_vect.observations)

for i in range(50):
    next_state, reward, done, info = env_vect.step([1 for i in range(nn)])
    print(current_state / next_state, done, current_state, next_state, env_vect.observations)
    current_state = deepcopy(next_state)

VECTORIZED ENV RESULTS:


[[1.0269 0.1803 0.9242 0.1417]] [False] [[-0.0327 0.043 -0.0119 -0.0489]] [[-0.0319 0.2382 -0.0129 -0.3454]] [[-0.0319 0.2382 -0.0129 -0.3454]]

[[1.1757 0.5495 0.6516 0.5379]] [False] [[-0.0319 0.2382 -0.0129 -0.3454]] [[-0.0271 0.4335 -0.0198 -0.6421]] [[-0.0271 0.4335 -0.0198 -0.6421]]

[[1.4702 0.6893 0.6069 0.6824]] [False] [[-0.0271 0.4335 -0.0198 -0.6421]] [[-0.0184 0.6289 -0.0327 -0.941 ]] [[-0.0184 0.6289 -0.0327 -0.941 ]]

[[3.1457 0.7628 0.6345 0.7566]] [False] [[-0.0184 0.6289 -0.0327 -0.941 ]] [[-0.0059 0.8245 -0.0515 -1.2437]] [[-0.0059 0.8245 -0.0515 -1.2437]]

[[-0.5516 0.8081 0.6743 0.8013]] [False] [[-0.0059 0.8245 -0.0515 -1.2437]] [[ 0.0106 1.0202 -0.0764 -1.5521]] [[ 0.0106 1.0202 -0.0764 -1.5521]]

[[0.3425 0.8389 0.711 0.8311]] [False] [[ 0.0106 1.0202 -0.0764 -1.5521]] [[ 0.031 1.2162 -0.1074 -1.8676]] [[ 0.031 1.2162 -0.1074 -1.8676]]

[[0.5606 0.8611 0.742 0.8522]] [False] [[ 0.031 1.2162 -0.1074 -1.8676]] [[ 0.0554 1.4123 -0.1448 -2.1916]] [[ 0.0554 1.4123 -0.1448 -2.1916]]

[[0.6621 0.878 0.7676 0.8679]] [False] [[ 0.0554 1.4123 -0.1448 -2.1916]] [[ 0.0836 1.6085 -0.1886 -2.5252]] [[ 0.0836 1.6085 -0.1886 -2.5252]]

[[ -27.6983 -180.4272 4.2333 51.6735]] [ True] [[ 0.0836 1.6085 -0.1886 -2.5252]] [[-0.003 -0.0089 -0.0445 -0.0489]] [[-0.003 -0.0089 -0.0445 -0.0489]]

When done is [True], the ratio is high and the current values have already been reset.


Answer 1

Score: 1


The solution is a feature of Gym that inserts a ["terminal_observation"] key into the "info" dictionary when "done = True". You can extract it and swap your next_state observation with it.

You cannot tell that Gym does this just by looking at the environment's step() function. I added my own key to info when "done = True", and when I printed the results I saw the ["terminal_observation"] entries right next to them.
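
For illustration, here is a minimal sketch of that extraction, assuming an older Gym release in which the vectorized step() returns four values and info is a sequence of per-env dicts (newer Gym/Gymnasium releases use a different info layout and may name the key "final_observation" instead):

from copy import deepcopy

import gym
import numpy as np

nn = 1
env_vect = gym.vector.make("CartPole-v1", num_envs=nn)
current_state = env_vect.reset()

for i in range(50):
    next_state, reward, done, info = env_vect.step([1 for _ in range(nn)])
    # Copy next_state, then overwrite any auto-reset entries with the
    # pre-reset observation that Gym stashed in the info dictionary.
    true_next_state = np.array(next_state, copy=True)
    for idx in range(nn):
        if done[idx]:
            true_next_state[idx] = info[idx]["terminal_observation"]
    # Record (current_state, true_next_state) as the transition;
    # next_state already belongs to the freshly reset episode.
    current_state = deepcopy(next_state)

This does not block the auto-reset itself, but it recovers the un-reset next_state, so the printed ratio stays in line with the other steps.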
