Source Code Analysis of the FrozenLake Environment (Scenario) in OpenAI Gym (5)

Continued from the previous article: Source Code Analysis of the FrozenLake Environment (Scenario) in OpenAI Gym (4)

The previous article used pdb to debug the 2nd key step:

  • env.reset()

This article looks at the 3rd key step:

  • env.action_space.sample()

To make the process easier to follow and debug, quit the previous debugging session and start a new one with the following command:

python -m pdb frozen_lake2.py 

The command and its output are as follows:

$ python -m pdb frozen_lake2.py 
> /home/penghao/OpenAI-Gym/sample_code/frozen_lake2.py(1)<module>()
-> import numpy as np
(Pdb) 

env.action_space.sample() is on line 70 of frozen_lake2.py, so set a breakpoint at line 70 of the file. The command and its output are as follows:

$ python -m pdb frozen_lake2.py 
> /home/penghao/OpenAI-Gym/sample_code/frozen_lake2.py(1)<module>()
-> import numpy as np
(Pdb) b 70
Breakpoint 1 at /home/penghao/OpenAI-Gym/sample_code/frozen_lake2.py:70
(Pdb) 

Then enter c to let the program continue running until it reaches this breakpoint, as shown below:

(Pdb) c
The observation space: Discrete(16)
16
The action space: Discrete(4)
4
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
> /home/penghao/OpenAI-Gym/sample_code/frozen_lake2.py(70)<module>()
-> action = env.action_space.sample()
(Pdb) 

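As you can see, the program has stopped at the breakpoint. Enter s to single-step, i.e. what is usually called Step In, which descends into the function or method being called, as shown below: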

-> action = env.action_space.sample()
(Pdb) s
--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/core.py(253)action_space()
-> @property
(Pdb) 

As you can see, the program has entered action_space. Most importantly, pdb shows where action_space is defined: in gym/core.py. The code of action_space (a property) is as follows:

    @property
    def action_space(self) -> spaces.Space[ActType]:
        """Returns the action space of the environment."""
        if self._action_space is None:
            return self.env.action_space
        return self._action_space
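To make the fallback rule concrete, here is a minimal sketch. ToyEnv and ToyWrapper are hypothetical classes, not gym's real Env and Wrapper, and a plain string stands in for a real Discrete space. They only reproduce the rule shown above: a wrapper whose _action_space is None returns the wrapped environment's action_space. The nested wrapper reflects the assumption that gym.make() may return an environment wrapped more than once, in which case the lookup simply recurses inward.

```python
from typing import Any, Optional


class ToyEnv:
    """Hypothetical inner environment that owns the real action space."""

    def __init__(self, action_space: Any) -> None:
        self.action_space = action_space


class ToyWrapper:
    """Hypothetical wrapper mimicking the delegation in gym.Wrapper.action_space."""

    def __init__(self, env: Any, action_space: Optional[Any] = None) -> None:
        self.env = env
        self._action_space = action_space  # None means "not overridden"

    @property
    def action_space(self) -> Any:
        # Same rule as gym/core.py: fall back to the wrapped env's space
        if self._action_space is None:
            return self.env.action_space
        return self._action_space


inner = ToyEnv(action_space="Discrete(4)")
wrapped = ToyWrapper(ToyWrapper(inner))  # two stacked wrappers
print(wrapped.action_space)  # Discrete(4)

overridden = ToyWrapper(inner, action_space="Discrete(2)")
print(overridden.action_space)  # Discrete(2)
```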

This property belongs to class Wrapper(Env[ObsType, ActType]). Continue stepping:

-> action = env.action_space.sample()
(Pdb) s
--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/core.py(253)action_space()
-> @property
(Pdb) n
> /home/penghao/.local/lib/python3.11/site-packages/gym/core.py(256)action_space()
-> if self._action_space is None:
(Pdb) n
> /home/penghao/.local/lib/python3.11/site-packages/gym/core.py(257)action_space()
-> return self.env.action_space
(Pdb) n
--Return--
> /home/penghao/.local/lib/python3.11/site-packages/gym/core.py(257)action_space()->Discrete(4)
-> return self.env.action_space
(Pdb) s
--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/spaces/discrete.py(47)sample()
-> def sample(self, mask: Optional[np.ndarray] = None) -> int:
(Pdb) 

This sample method is defined in gym/spaces/discrete.py. Its code is as follows:

    def sample(self, mask: Optional[np.ndarray] = None) -> int:
        """Generates a single random sample from this space.

        A sample will be chosen uniformly at random with the mask if provided

        Args:
            mask: An optional mask for if an action can be selected.
                Expected `np.ndarray` of shape `(n,)` and dtype `np.int8` where `1` represents valid actions and `0` invalid / infeasible actions.
                If there are no possible actions (i.e. `np.all(mask == 0)`) then `space.start` will be returned.

        Returns:
            A sampled integer from the space
        """
        if mask is not None:
            assert isinstance(
                mask, np.ndarray
            ), f"The expected type of the mask is np.ndarray, actual type: {type(mask)}"
            assert (
                mask.dtype == np.int8
            ), f"The expected dtype of the mask is np.int8, actual dtype: {mask.dtype}"
            assert mask.shape == (
                self.n,
            ), f"The expected shape of the mask is {(self.n,)}, actual shape: {mask.shape}"
            valid_action_mask = mask == 1
            assert np.all(
                np.logical_or(mask == 0, valid_action_mask)
            ), f"All values of a mask should be 0 or 1, actual values: {mask}"
            if np.any(valid_action_mask):
                return int(
                    self.start + self.np_random.choice(np.where(valid_action_mask)[0])
                )
            else:
                return self.start

        return int(self.start + self.np_random.integers(self.n))
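The branching in Discrete.sample can be tried out without gym. The function below, sample_discrete, is a hypothetical standalone re-implementation of the logic listed above (not part of gym); it takes the Generator, n and start as explicit arguments instead of reading them from self.

```python
from typing import Optional

import numpy as np


def sample_discrete(
    rng: np.random.Generator, n: int, start: int = 0, mask: Optional[np.ndarray] = None
) -> int:
    """Hypothetical standalone version of the logic in Discrete.sample."""
    if mask is not None:
        # Same checks as gym: an int8 array of shape (n,) holding only 0s and 1s
        assert isinstance(mask, np.ndarray) and mask.dtype == np.int8
        assert mask.shape == (n,)
        valid = mask == 1
        assert np.all(np.logical_or(mask == 0, valid))
        if np.any(valid):
            # Pick uniformly among the indices whose mask value is 1
            return int(start + rng.choice(np.where(valid)[0]))
        return start  # no valid action at all: fall back to start
    # No mask: uniform integer in [start, start + n)
    return int(start + rng.integers(n))


rng = np.random.default_rng(0)
print(sample_discrete(rng, 4))  # some value in 0..3
print(sample_discrete(rng, 4, mask=np.array([0, 1, 0, 1], dtype=np.int8)))  # 1 or 3
print(sample_discrete(rng, 4, mask=np.zeros(4, dtype=np.int8)))  # 0, i.e. start
```

Since FrozenLake calls sample() without a mask, only the last line of the function runs, which is exactly what the debugging session below shows.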

Continue stepping:

--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/spaces/discrete.py(47)sample()
-> def sample(self, mask: Optional[np.ndarray] = None) -> int:
(Pdb) n
> /home/penghao/.local/lib/python3.11/site-packages/gym/spaces/discrete.py(60)sample()
-> if mask is not None:
(Pdb) p mask
None
(Pdb) n
> /home/penghao/.local/lib/python3.11/site-packages/gym/spaces/discrete.py(81)sample()
-> return int(self.start + self.np_random.integers(self.n))
(Pdb) 

self.start refers to the start attribute of the class that sample belongs to, here class Discrete(Space[int]). At this point start is 0, as shown below:

(Pdb) n
> /home/penghao/.local/lib/python3.11/site-packages/gym/spaces/discrete.py(81)sample()
-> return int(self.start + self.np_random.integers(self.n))
(Pdb) p self.start
0

Enter s again to Step In; this enters the np_random property, as shown below:

(Pdb) s
--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/spaces/space.py(72)np_random()
-> @property
(Pdb) 

np_random is defined in gym/spaces/space.py. Its code is as follows:

    @property
    def np_random(self) -> np.random.Generator:
        """Lazily seed the PRNG since this is expensive and only needed if sampling from this space."""
        if self._np_random is None:
            self.seed()

        return self._np_random  # type: ignore  ## self.seed() call guarantees right type.
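The "lazy seeding" idea in this property can be shown with a small sketch. LazySpace is a hypothetical class, not gym's Space; its seed() inlines the same SeedSequence/PCG64 construction that gym's seeding.np_random uses, and a seed_calls counter (also hypothetical) makes it visible that seeding happens only once, on first access.

```python
from typing import Optional

import numpy as np


class LazySpace:
    """Hypothetical space that creates its PRNG only when first needed."""

    def __init__(self) -> None:
        self._np_random: Optional[np.random.Generator] = None
        self.seed_calls = 0  # counts how often seeding actually happens

    def seed(self, seed: Optional[int] = None) -> list:
        self.seed_calls += 1
        seed_seq = np.random.SeedSequence(seed)
        self._np_random = np.random.Generator(np.random.PCG64(seed_seq))
        return [seed_seq.entropy]

    @property
    def np_random(self) -> np.random.Generator:
        # First access triggers seed(); later accesses reuse the same Generator
        if self._np_random is None:
            self.seed()
        return self._np_random


space = LazySpace()
print(space.seed_calls)  # 0: nothing seeded yet
first = space.np_random
second = space.np_random
print(space.seed_calls, first is second)  # 1 True
```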

Keep single-stepping:

--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/spaces/space.py(72)np_random()
-> @property
(Pdb) s
> /home/penghao/.local/lib/python3.11/site-packages/gym/spaces/space.py(75)np_random()
-> if self._np_random is None:
(Pdb) n
> /home/penghao/.local/lib/python3.11/site-packages/gym/spaces/space.py(76)np_random()
-> self.seed()
(Pdb) s
--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/spaces/space.py(103)seed()
-> def seed(self, seed: Optional[int] = None) -> list:
(Pdb) 

The seed method is in the same file (gym/spaces/space.py). Its code is as follows:

    def seed(self, seed: Optional[int] = None) -> list:
        """Seed the PRNG of this space and possibly the PRNGs of subspaces."""
        self._np_random, seed = seeding.np_random(seed)
        return [seed]

Continue stepping:

-> self._np_random, seed = seeding.np_random(seed)
(Pdb) s
--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/utils/seeding.py(9)np_random()
-> def np_random(seed: Optional[int] = None) -> Tuple[np.random.Generator, Any]:
(Pdb) 

np_random is a function in gym/utils/seeding.py. Its code is shown below (the file is very short and contains only this one function, so the whole file is listed):

"""Set of random number generator functions: seeding, generator, hashing seeds."""
from typing import Any, Optional, Tuple

import numpy as np

from gym import error


def np_random(seed: Optional[int] = None) -> Tuple[np.random.Generator, Any]:
    """Generates a random number generator from the seed and returns the Generator and seed.

    Args:
        seed: The seed used to create the generator

    Returns:
        The generator and resulting seed

    Raises:
        Error: Seed must be a non-negative integer or omitted
    """
    if seed is not None and not (isinstance(seed, int) and 0 <= seed):
        raise error.Error(f"Seed must be a non-negative integer or omitted, not {seed}")

    seed_seq = np.random.SeedSequence(seed)
    np_seed = seed_seq.entropy
    rng = RandomNumberGenerator(np.random.PCG64(seed_seq))
    return rng, np_seed


RNG = RandomNumberGenerator = np.random.Generator
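The function can be reproduced with plain NumPy to check two of its properties: the same explicit seed yields the same random stream, and invalid seeds are rejected before any Generator is built. np_random_sketch is a hypothetical name, and ValueError stands in for gym.error.Error so the sketch does not depend on gym being installed.

```python
from typing import Any, Optional, Tuple

import numpy as np


def np_random_sketch(seed: Optional[int] = None) -> Tuple[np.random.Generator, Any]:
    """Hypothetical copy of gym's seeding.np_random (ValueError replaces gym.error.Error)."""
    if seed is not None and not (isinstance(seed, int) and 0 <= seed):
        raise ValueError(f"Seed must be a non-negative integer or omitted, not {seed}")
    seed_seq = np.random.SeedSequence(seed)
    return np.random.Generator(np.random.PCG64(seed_seq)), seed_seq.entropy


# The same explicit seed gives the same stream of random numbers
rng1, s1 = np_random_sketch(2023)
rng2, s2 = np_random_sketch(2023)
print(s1, rng1.integers(4, size=5).tolist() == rng2.integers(4, size=5).tolist())  # 2023 True

# Negative, non-integer and string seeds are all rejected
for bad in (-1, 1.5, "7"):
    try:
        np_random_sketch(bad)
    except ValueError as exc:
        print(exc)
```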

np_random returns two values: rng and np_seed.

Since RandomNumberGenerator is just an alias of np.random.Generator, the line rng = RandomNumberGenerator(np.random.PCG64(seed_seq)) is equivalent to:

rng = np.random.Generator(np.random.PCG64(seed_seq))

Likewise, the line np_seed = seed_seq.entropy is equivalent to:

np_seed = np.random.SeedSequence(seed).entropy
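The value of np_seed depends on whether a seed was passed. A quick check with plain NumPy: with an explicit seed, entropy is simply that seed; with seed=None, NumPy draws fresh entropy from the operating system. The 128-bit bound in the sketch assumes NumPy's default pool_size=4 (4 × 32 bits), which matches the size of the large integer printed during debugging below but should be treated as an assumption about NumPy's internals.

```python
import numpy as np

# With an explicit seed, entropy is just that seed
print(np.random.SeedSequence(42).entropy)  # 42

# With seed=None, fresh OS entropy is drawn, so the value changes on every run
fresh = np.random.SeedSequence().entropy
print(isinstance(fresh, int), 0 <= fresh < 2**128)  # True True
```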

Some background on NumPy random number generation is needed here.


References (blog posts):

Learning numpy: random number generation (1), qianerwauestc's blog, CSDN

Python numpy.random.SeedSequence explained with examples, 码农教程

Module involved:

numpy.random

How it works:

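NumPy's random number routines produce pseudo-random numbers by combining a BitGenerator, which creates the raw sequences, with a Generator, which uses those sequences to sample from different statistical distributions: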

  • BitGenerators

Objects that generate random numbers. These are typically unsigned integer words filled with sequences of either 32 or 64 random bits.

  • Generators

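Objects that transform the sequence of random bits from a BitGenerator into a sequence of numbers within a specified interval that follow a specific probability distribution (such as uniform, normal, or binomial).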
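The division of labor between the two object types can be seen directly. A minimal sketch using only public NumPy calls: random_raw() exposes the raw 64-bit words of a PCG64 BitGenerator, while a Generator built on another PCG64 turns bits into values from a chosen distribution. The seed 0 is an arbitrary choice for illustration.

```python
import numpy as np

bit_gen = np.random.PCG64(0)  # BitGenerator: produces raw random bits
raw = bit_gen.random_raw(3)  # three unsigned 64-bit words
print(raw.dtype, raw.shape)  # uint64 (3,)

rng = np.random.Generator(np.random.PCG64(0))  # Generator: bits -> distributions
print(rng.integers(4, size=3))  # uniform integers in [0, 4)
print(rng.normal(size=2))  # standard normal samples
```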

Concepts:

  • Random Generator

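Generator provides access to a wide range of distributions and serves as a replacement for RandomState. The main difference between the two is that Generator relies on an additional BitGenerator to manage state and generate the random bits, which are then transformed into random values from useful distributions. The default BitGenerator used by Generator is PCG64; it can be changed by passing an instantiated BitGenerator to Generator.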

Ways to create a Generator:

1) numpy.random.default_rng(seed=None)

Summary: constructs a new Generator with the default BitGenerator (PCG64).

2) numpy.random.Generator(bit_generator)

Summary: constructs a Generator that uses the given BitGenerator instance (bit_generator) as its source of random bits.
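The two creation methods can be compared side by side. This sketch assumes only the documented behavior that default_rng(seed) wraps a PCG64 seeded with the same value, so both generators should produce identical draws; the last lines show that passing a different BitGenerator (MT19937 here, chosen arbitrarily) swaps the bit source.

```python
import numpy as np

a = np.random.default_rng(42)  # default BitGenerator is PCG64
b = np.random.Generator(np.random.PCG64(42))  # the same thing, spelled out
print(a.integers(100, size=5).tolist() == b.integers(100, size=5).tolist())  # True
print(type(a.bit_generator).__name__)  # PCG64

# Passing a different BitGenerator instance changes the source of random bits
c = np.random.Generator(np.random.MT19937(42))
print(type(c.bit_generator).__name__)  # MT19937
```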

SeedSequence mixes sources of entropy in a reproducible way to set the initial states of independent and very probably non-overlapping BitGenerators.

Usage:

class numpy.random.SeedSequence(entropy=None, *, spawn_key=(), pool_size=4)
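The "independent, non-overlapping" initial states that SeedSequence provides are easiest to see with spawn(), which derives child sequences from a root. A minimal sketch with an arbitrary root entropy of 12345: each child seeds its own PCG64 stream, and recreating the root with the same entropy reproduces the children exactly. gym itself does not call spawn() in the code traced here; this only illustrates the class.

```python
import numpy as np

root = np.random.SeedSequence(12345)
children = root.spawn(2)  # two child sequences, deterministic given the root
rngs = [np.random.Generator(np.random.PCG64(s)) for s in children]
print([r.integers(4, size=5).tolist() for r in rngs])

# Recreating the root with the same entropy reproduces the same children
again = np.random.SeedSequence(12345).spawn(2)
print(children[0].generate_state(4).tolist() == again[0].generate_state(4).tolist())  # True
```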


Finally, the two values returned by np_random are as follows:

--Call--
> /home/penghao/.local/lib/python3.11/site-packages/gym/utils/seeding.py(9)np_random()
-> def np_random(seed: Optional[int] = None) -> Tuple[np.random.Generator, Any]:
(Pdb) n
> /home/penghao/.local/lib/python3.11/site-packages/gym/utils/seeding.py(21)np_random()
-> if seed is not None and not (isinstance(seed, int) and 0 <= seed):
(Pdb) n
> /home/penghao/.local/lib/python3.11/site-packages/gym/utils/seeding.py(24)np_random()
-> seed_seq = np.random.SeedSequence(seed)
(Pdb) n
> /home/penghao/.local/lib/python3.11/site-packages/gym/utils/seeding.py(25)np_random()
-> np_seed = seed_seq.entropy
(Pdb) n
> /home/penghao/.local/lib/python3.11/site-packages/gym/utils/seeding.py(26)np_random()
-> rng = RandomNumberGenerator(np.random.PCG64(seed_seq))
(Pdb) n
> /home/penghao/.local/lib/python3.11/site-packages/gym/utils/seeding.py(27)np_random()
-> return rng, np_seed
(Pdb) p rng
Generator(PCG64) at 0x7F7477B6AEA0
(Pdb) p np_seed
256403554427214301906482223862250111160
(Pdb) 

This means that the seed method in gym/spaces/space.py returns [256403554427214301906482223862250111160], as shown below:

> /home/penghao/.local/lib/python3.11/site-packages/gym/utils/seeding.py(27)np_random()
-> return rng, np_seed
(Pdb) p rng
Generator(PCG64) at 0x7F7477B6AEA0
(Pdb) p np_seed
256403554427214301906482223862250111160
(Pdb) n
--Return--
> /home/penghao/.local/lib/python3.11/site-packages/gym/utils/seeding.py(27)np_random()->(Generator(PCG...0x7F7477B6AEA0, 256403554427214301906482223862250111160)
-> return rng, np_seed
(Pdb) n
> /home/penghao/.local/lib/python3.11/site-packages/gym/spaces/space.py(106)seed()
-> return [seed]
(Pdb) n
--Return--
> /home/penghao/.local/lib/python3.11/site-packages/gym/spaces/space.py(106)seed()->[256403554427214301906482223862250111160]
-> return [seed]
(Pdb) 

Going up one more level, the np_random property in gym/spaces/space.py returns the following:

> /home/penghao/.local/lib/python3.11/site-packages/gym/spaces/space.py(106)seed()
-> return [seed]
(Pdb) n
--Return--
> /home/penghao/.local/lib/python3.11/site-packages/gym/spaces/space.py(106)seed()->[256403554427214301906482223862250111160]
-> return [seed]
(Pdb) n
> /home/penghao/.local/lib/python3.11/site-packages/gym/spaces/space.py(78)np_random()
-> return self._np_random  # type: ignore  ## self.seed() call guarantees right type.
(Pdb) p self._np_random
Generator(PCG64) at 0x7F7477B6AEA0
(Pdb) 

One level further up, self.np_random.integers(self.n) is therefore effectively:

np.random.Generator(np.random.PCG64(seed_seq)).integers(self.n)

Here self.n is 4, corresponding to FrozenLake's four actions: 0 = left, 1 = down, 2 = right, 3 = up.

The numpy.random.Generator.integers function is described as follows:


Reference (blog post):

Python numpy.random.Generator.integers usage and code examples, 纯净天空

Usage:

random.Generator.integers(low, high=None, size=None, dtype=np.int64, endpoint=False)

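Returns random integers from low (inclusive) to high (exclusive), or, if endpoint=True, from low (inclusive) to high (inclusive). This replaces RandomState.randint (with endpoint=False) and RandomState.random_integers (with endpoint=True).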

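The integers are drawn from the "discrete uniform" distribution of the specified dtype. If high is None (the default), the results range from 0 (inclusive) to low (exclusive).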
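The interval rules above are easy to check. With a single argument, as in self.np_random.integers(self.n), that argument acts as an exclusive upper bound, so integers(4) can only return 0, 1, 2 or 3; endpoint=True makes the upper bound inclusive. The seed 0 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

draws = rng.integers(4, size=1000)  # high=None: values from [0, 4)
print(sorted(set(draws.tolist())))  # a subset of [0, 1, 2, 3]

inclusive = rng.integers(1, 4, size=1000, endpoint=True)  # values from [1, 4]
print(inclusive.min() >= 1, inclusive.max() <= 4)  # True True
```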


In the end, in the line return int(self.start + self.np_random.integers(self.n)), the call self.np_random.integers(self.n) returned 3 in this run, so the overall return value is 0 + 3 = 3.

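Although it looks like a long chain of calls, all it actually does is pick one of the four actions (left, down, right, up) uniformly at random.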
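To confirm the "uniformly at random" conclusion, the sketch below repeats the final expression int(start + rng.integers(n)) 10,000 times with start=0 and n=4 and counts how often each action appears. ACTION_NAMES is a hypothetical lookup table built from FrozenLake's action encoding (0 = left, 1 = down, 2 = right, 3 = up); the Generator is created with default_rng instead of gym's lazy seeding, which the earlier sections showed builds the same kind of PCG64-backed Generator.

```python
import numpy as np

# Hypothetical lookup table for FrozenLake's action encoding
ACTION_NAMES = {0: "LEFT", 1: "DOWN", 2: "RIGHT", 3: "UP"}

rng = np.random.default_rng(7)
start, n = 0, 4  # FrozenLake's Discrete(4) has start=0
samples = [int(start + rng.integers(n)) for _ in range(10_000)]
counts = np.bincount(samples, minlength=n)
for action, count in enumerate(counts):
    print(ACTION_NAMES[action], count)  # each roughly 2500
```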


Reposted from blog.csdn.net/phmatthaus/article/details/131744764