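Continued from the previous article: Source Code Analysis of the FrozenLake Environment (Scenario) in OpenAI Gym (5)
The previous article used pdb to debug the third key step: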
- env.action_space.sample()
This article looks at the fourth key step:
- env.step(action)
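To keep the output readable and easy to debug, quit the previous debugging session and run the following command to start a new one:
python -m pdb frozen_lake2.py
The command and its output are as follows: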
$ python -m pdb frozen_lake2.py
> /home/penghao/OpenAI-Gym/sample_code/frozen_lake2.py(1)<module>()
-> import numpy as np
(Pdb)
The call to env.step(action) is on line 73 of frozen_lake2.py, so set a breakpoint on line 73 of that file. The command and its output are as follows:
$ python -m pdb frozen_lake2.py
> /home/penghao/OpenAI-Gym/sample_code/frozen_lake2.py(1)<module>()
-> import numpy as np
(Pdb) b 73
Breakpoint 1 at /home/penghao/OpenAI-Gym/sample_code/frozen_lake2.py:73
(Pdb)
Next, enter c to let the program continue running until it reaches this breakpoint:
(Pdb) c
The observation space: Discrete(16)
16
The action space: Discrete(4)
4
[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]]
> /home/penghao/OpenAI-Gym/sample_code/frozen_lake2.py(73)<module>()
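-> new_state, reward, done, truncated, info = env.step(action) # Same here: this raised an error at first; I later found that in the newer library this function returns five values, and advice online says appending '_' at the end is enough
(Pdb)
The program has stopped at the breakpoint. Enter s to single-step into the call (commonly known as Step In), that is, to enter the function or method being called:
-> new_state, reward, done, truncated, info = env.step(action) # Same here: this raised an error at first; I later found that in the newer library this function returns five values, and advice online says appending '_' at the end is enough
(Pdb) s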
--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/time_limit.py(39)step()
-> def step(self, action):
(Pdb)
The program has now entered a step function. Most importantly, pdb reports where that function lives: in the file gym/wrappers/time_limit.py. The code of this step method is as follows:
def step(self, action):
    """Steps through the environment and if the number of steps elapsed exceeds ``max_episode_steps`` then truncate.

    Args:
        action: The environment step action

    Returns:
        The environment step ``(observation, reward, terminated, truncated, info)`` with `truncated=True`
        if the number of steps elapsed >= max episode steps
    """
    observation, reward, terminated, truncated, info = self.env.step(action)
    self._elapsed_steps += 1
    if self._elapsed_steps >= self._max_episode_steps:
        truncated = True
    return observation, reward, terminated, truncated, info
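To make the truncation rule concrete, here is a minimal, self-contained sketch of the same logic. It does not import gym: `CountingEnv` and `TimeLimitSketch` are hypothetical stand-ins written for this illustration, and only the step-counting behavior is imitated.

```python
class CountingEnv:
    """Hypothetical stand-in for a wrapped environment; it never ends on its own."""

    def step(self, action):
        # Returns (observation, reward, terminated, truncated, info).
        return 0, 0.0, False, False, {}


class TimeLimitSketch:
    """Simplified imitation of the truncation logic (not gym's own class)."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self._max_episode_steps = max_episode_steps
        self._elapsed_steps = 0

    def step(self, action):
        observation, reward, terminated, truncated, info = self.env.step(action)
        self._elapsed_steps += 1
        # Force truncation once the step budget has been used up.
        if self._elapsed_steps >= self._max_episode_steps:
            truncated = True
        return observation, reward, terminated, truncated, info


env = TimeLimitSketch(CountingEnv(), max_episode_steps=3)
# Collect only the `truncated` flag (index 3) from four consecutive steps.
truncated_flags = [env.step(0)[3] for _ in range(4)]
print(truncated_flags)  # [False, False, True, True]
```

Note that the inner environment never sets `truncated` itself here; the wrapper alone flips the flag once the third step is reached.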
The step method listed above is a method of class TimeLimit(gym.Wrapper). Continue stepping in:
--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/time_limit.py(39)step()
-> def step(self, action):
(Pdb) s
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/time_limit.py(50)step()
-> observation, reward, terminated, truncated, info = self.env.step(action)
(Pdb) s
--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/order_enforcing.py(33)step()
-> def step(self, action):
(Pdb)
This step method is located in the file gym/wrappers/order_enforcing.py. Its code is as follows:
def step(self, action):
    """Steps through the environment with `kwargs`."""
    if not self._has_reset:
        raise ResetNeeded("Cannot call env.step() before calling env.reset()")
    return self.env.step(action)
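The guard in this method can be illustrated with a small, self-contained sketch. The names `ResetNeededSketch`, `DummyEnv`, and `OrderEnforcingSketch` are hypothetical stand-ins, and the sketch assumes that calling reset() is what sets `_has_reset` to True, which the code shown in this article does not itself confirm.

```python
class ResetNeededSketch(Exception):
    """Hypothetical stand-in for the ResetNeeded error raised by the wrapper."""


class DummyEnv:
    """Hypothetical inner environment with fixed return values."""

    def reset(self):
        return 0, {}

    def step(self, action):
        return 0, 0.0, False, False, {}


class OrderEnforcingSketch:
    """Simplified imitation of the reset-before-step guard (not gym's own class)."""

    def __init__(self, env):
        self.env = env
        self._has_reset = False

    def reset(self):
        # Assumption: resetting is what unlocks step().
        self._has_reset = True
        return self.env.reset()

    def step(self, action):
        if not self._has_reset:
            raise ResetNeededSketch("Cannot call env.step() before calling env.reset()")
        return self.env.step(action)


env = OrderEnforcingSketch(DummyEnv())
try:
    env.step(0)
    stepped_before_reset = True
except ResetNeededSketch:
    stepped_before_reset = False

env.reset()
result = env.step(0)  # Allowed now that reset() has been called.
```

In the debugging session, `_has_reset` is already true when step is reached, which is why the trace falls straight through to `return self.env.step(action)`.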
Continue stepping in:
--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/order_enforcing.py(33)step()
-> def step(self, action):
(Pdb) s
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/order_enforcing.py(35)step()
-> if not self._has_reset:
(Pdb) s
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/order_enforcing.py(37)step()
-> return self.env.step(action)
(Pdb) s
--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/env_checker.py(33)step()
-> def step(self, action: ActType):
(Pdb)
This time execution arrives at the step method of class PassiveEnvChecker(gym.Wrapper) in gym/wrappers/env_checker.py. The code is as follows:
def step(self, action: ActType):
    """Steps through the environment that on the first call will run the `passive_env_step_check`."""
    if self.checked_step is False:
        self.checked_step = True
        return env_step_passive_checker(self.env, action)
    else:
        return self.env.step(action)
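The effect of the `checked_step` flag is that the checker runs only on the first call. The sketch below imitates that pattern without gym; `DummyEnv`, `checked_step_sketch`, and `PassiveCheckerSketch` are hypothetical names, and the stand-in checker merely records that it was invoked.

```python
class DummyEnv:
    """Hypothetical inner environment with fixed return values."""

    def step(self, action):
        return 0, 0.0, False, False, {}


check_calls = []  # Records every action that went through the checker.


def checked_step_sketch(env, action):
    """Hypothetical stand-in for the checker: note the call, then step as usual."""
    check_calls.append(action)
    return env.step(action)


class PassiveCheckerSketch:
    """Simplified imitation of the check-once pattern (not gym's own class)."""

    def __init__(self, env):
        self.env = env
        self.checked_step = False

    def step(self, action):
        if self.checked_step is False:
            self.checked_step = True
            return checked_step_sketch(self.env, action)
        return self.env.step(action)


env = PassiveCheckerSketch(DummyEnv())
for action in (1, 2, 3):
    env.step(action)
print(check_calls)  # [1]
```

Only the first action passes through the checker; later calls go directly to the inner environment. This matches the trace, where the very first env.step(action) of the session is routed into env_step_passive_checker.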
Continue stepping in:
--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/env_checker.py(33)step()
-> def step(self, action: ActType):
(Pdb) s
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/env_checker.py(35)step()
-> if self.checked_step is False:
(Pdb) s
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/env_checker.py(36)step()
-> self.checked_step = True
(Pdb) s
> /home/penghao/.local/lib/python3.11/site-packages/gym/wrappers/env_checker.py(37)step()
-> return env_step_passive_checker(self.env, action)
(Pdb) s
--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/utils/passive_env_checker.py(211)env_step_passive_checker()
-> def env_step_passive_checker(env, action):
The env_step_passive_checker function is located in the file gym/utils/passive_env_checker.py. The code is as follows:
def env_step_passive_checker(env, action):
    """A passive check for the environment step, investigating the returning data then returning the data unchanged."""
    # We don't check the action as for some environments then out-of-bounds values can be given
    result = env.step(action)
    assert isinstance(
        result, tuple
    ), f"Expects step result to be a tuple, actual type: {type(result)}"
    if len(result) == 4:
        logger.deprecation(
            "Core environment is written in old step API which returns one bool instead of two. "
            "It is recommended to rewrite the environment with new step API. "
        )
        obs, reward, done, info = result
        if not isinstance(done, (bool, np.bool8)):
            logger.warn(
                f"Expects `done` signal to be a boolean, actual type: {type(done)}"
            )
    elif len(result) == 5:
        obs, reward, terminated, truncated, info = result
        # np.bool is actual python bool not np boolean type, therefore bool_ or bool8
        if not isinstance(terminated, (bool, np.bool8)):
            logger.warn(
                f"Expects `terminated` signal to be a boolean, actual type: {type(terminated)}"
            )
        if not isinstance(truncated, (bool, np.bool8)):
            logger.warn(
                f"Expects `truncated` signal to be a boolean, actual type: {type(truncated)}"
            )
    else:
        raise error.Error(
            f"Expected `Env.step` to return a four or five element tuple, actual number of elements returned: {len(result)}."
        )
    check_obs(obs, env.observation_space, "step")
    if not (
        np.issubdtype(type(reward), np.integer)
        or np.issubdtype(type(reward), np.floating)
    ):
        logger.warn(
            f"The reward returned by `step()` must be a float, int, np.integer or np.floating, actual type: {type(reward)}"
        )
    else:
        if np.isnan(reward):
            logger.warn("The reward is a NaN value.")
        if np.isinf(reward):
            logger.warn("The reward is an inf value.")
    assert isinstance(
        info, dict
    ), f"The `info` returned by `step()` must be a python dictionary, actual type: {type(info)}"
    return result
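The checks performed by this function can be summarized in a small pure-Python sketch. It is a simplification under stated assumptions: `check_step_result` is a hypothetical helper that collects problems in a list instead of logging, asserting, or raising, it uses only the standard library rather than NumPy types, and it skips the observation-space check.

```python
import math


def check_step_result(result):
    """Return a list of problems found in a step result; an empty list means none."""
    if not isinstance(result, tuple):
        return [f"expected a tuple, got {type(result).__name__}"]
    problems = []
    if len(result) == 4:
        # Old API: a single `done` flag.
        obs, reward, done, info = result
        flags = {"done": done}
        problems.append("old 4-element step API")
    elif len(result) == 5:
        # New API: separate `terminated` and `truncated` flags.
        obs, reward, terminated, truncated, info = result
        flags = {"terminated": terminated, "truncated": truncated}
    else:
        return [f"expected 4 or 5 elements, got {len(result)}"]
    for name, value in flags.items():
        if not isinstance(value, bool):
            problems.append(f"`{name}` is not a bool")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(reward, bool) or not isinstance(reward, (int, float)):
        problems.append("reward is not a number")
    elif math.isnan(reward):
        problems.append("reward is NaN")
    elif math.isinf(reward):
        problems.append("reward is inf")
    if not isinstance(info, dict):
        problems.append("info is not a dict")
    return problems


ok = check_step_result((0, 0.0, False, False, {"prob": 1.0}))
bad = check_step_result((0, float("nan"), 0, False, {}))
old = check_step_result((0, 1.0, True, {}))
print(ok)   # []
print(bad)  # ['`terminated` is not a bool', 'reward is NaN']
print(old)  # ['old 4-element step API']
```

The first tuple has the same shape as the value FrozenLake's step returns, so it passes cleanly.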
Continue stepping in:
--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/utils/passive_env_checker.py(211)env_step_passive_checker()
-> def env_step_passive_checker(env, action):
(Pdb) s
> /home/penghao/.local/lib/python3.11/site-packages/gym/utils/passive_env_checker.py(214)env_step_passive_checker()
-> result = env.step(action)
(Pdb) s
--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/envs/toy_text/frozen_lake.py(244)step()
-> def step(self, a):
(Pdb)
Execution finally arrives in the file gym/envs/toy_text/frozen_lake.py, just as analyzed in the earlier articles. The code of the step function in frozen_lake.py is as follows:
def step(self, a):
    transitions = self.P[self.s][a]
    i = categorical_sample([t[0] for t in transitions], self.np_random)
    p, s, r, t = transitions[i]
    self.s = s
    self.lastaction = a
    if self.render_mode == "human":
        self.render()
    return (int(s), r, t, False, {"prob": p})
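As a preview of the data flow in this function, the sketch below picks one transition from a list according to its probability and builds the five-element return value. It is only an illustration: `categorical_sample_sketch` is a hypothetical helper using the cumulative-sum method with the standard library's random module, not necessarily how gym's categorical_sample is implemented, and the transition values are made up.

```python
import random
from itertools import accumulate


def categorical_sample_sketch(probs, rng):
    """Pick an index with probability given by probs (cumulative-sum method)."""
    threshold = rng.random()
    for index, cumulative in enumerate(accumulate(probs)):
        if cumulative > threshold:
            return index
    return len(probs) - 1  # Guard against floating-point round-off.


# Hypothetical transition list in the (probability, next_state, reward, terminated)
# layout that step() unpacks; the numbers are illustrative only.
transitions = [(1 / 3, 0, 0.0, False), (1 / 3, 4, 0.0, False), (1 / 3, 1, 0.0, False)]

rng = random.Random(0)
i = categorical_sample_sketch([t[0] for t in transitions], rng)
p, s, r, t = transitions[i]
# Same shape as the tuple returned by step(): truncated is always False here.
result = (int(s), r, t, False, {"prob": p})
print(result)
```

The fourth element is hard-coded to False, which is consistent with the earlier part of the trace: truncation is decided by the TimeLimit wrapper, not by the core environment.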
A detailed analysis of the step function follows in the next article.