OpenVINO 2022.3, Part 10: Adaptive Parameter Selection for Inference Optimization in OpenVINO™

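When you run inference with an AI model, OpenVINO™ provides several features that set parameters automatically.

Three of them are commonly used for model inference:

Auto-batching, which maximizes throughput when enough input data is available;
AUTO Plugin, which automatically selects the device to run inference on;
and Dynamic Shape, which supports models with dynamic inputs.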

1 Auto-batching

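Auto-batching is designed to let developers maximize inference throughput on Intel® GPUs with as little code as possible. When no batch size is specified and no limit is set, it sizes the batch (the number of inference requests it groups together) to the maximum throughput that the integrated or discrete GPU can sustain. Auto-batching is recommended when the application has a large amount of input data and submits inference requests continuously at a high rate.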

Enabling Auto-batching:

Setting the "device" parameter to "BATCH:GPU" activates the feature explicitly:

./benchmark_app -m <model> -d "BATCH:GPU"
./benchmark_app -m <model> -d "BATCH:GPU(16)"
./benchmark_app -m <model> -d "BATCH:CPU(16)"

When inference runs on a GPU and the performance hint is set to "THROUGHPUT", the feature is triggered automatically:

config = {"PERFORMANCE_HINT": "THROUGHPUT"}
compiled_model = core.compile_model(model, "GPU", config)

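The CPU device does not enable the BATCH device implicitly. A command such as

./benchmark_app -m <model> -d CPU -hint tput

does not apply the BATCH device implicitly, whereas

./benchmark_app -m <model> -d "BATCH:CPU(16)"

loads the BATCH device explicitly.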
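The device strings used above follow the pattern "BATCH:<device>(<batch size>)", where the batch size in parentheses is optional. As a rough illustration of what such a string encodes, here is a minimal sketch of a parser. `parse_batch_device` and `BatchDevice` are hypothetical names invented for this example; they are not part of the OpenVINO API, and OpenVINO does its own parsing internally.

```python
import re
from typing import NamedTuple, Optional


class BatchDevice(NamedTuple):
    device: str                # underlying device, e.g. "GPU"
    batch_size: Optional[int]  # None means the batch size is chosen automatically


# Hypothetical helper pattern: "BATCH:", a device name, then an optional "(N)".
_PATTERN = re.compile(r"^BATCH:(?P<device>[A-Za-z0-9_.]+)(?:\((?P<size>\d+)\))?$")


def parse_batch_device(spec: str) -> BatchDevice:
    """Split a "BATCH:<device>(<batch size>)" string into its parts."""
    match = _PATTERN.match(spec.strip())
    if match is None:
        raise ValueError(f"not a BATCH device string: {spec!r}")
    size = match.group("size")
    return BatchDevice(match.group("device"), int(size) if size else None)


if __name__ == "__main__":
    for spec in ("BATCH:GPU", "BATCH:GPU(16)", "BATCH:CPU(16)"):
        print(spec, "->", parse_batch_device(spec))
```

A plain device name such as "CPU" is rejected by this sketch, which matches the point made above: the BATCH device is only involved when it is requested explicitly or triggered implicitly on a GPU.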

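Limiting the number of inference requests:

There are two ways to set this limit in Auto-batching: specify it in the device string, as in BATCH:GPU(4), or set the ov::hint::num_requests hint. The call below uses the hint to set the limit to 4: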

auto compiled_model = core.compile_model(model, "GPU",
    ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT),
    ov::hint::num_requests(4));
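For readers working in Python, the two options can be written as a device name plus a config dictionary. The sketch below only builds those arguments and does not call OpenVINO. The function names are hypothetical, and the key "PERFORMANCE_HINT_NUM_REQUESTS" is assumed here to be the string form of ov::hint::num_requests; check it against the documentation for your release.

```python
from typing import Dict, Tuple


def limit_by_device_string(device: str, batch_size: int) -> Tuple[str, Dict[str, str]]:
    """Option 1: put the limit into the device string, e.g. "BATCH:GPU(4)"."""
    return f"BATCH:{device}({batch_size})", {}


def limit_by_hint(device: str, num_requests: int) -> Tuple[str, Dict[str, str]]:
    """Option 2: keep the plain device name and pass the limit as a hint."""
    config = {
        "PERFORMANCE_HINT": "THROUGHPUT",
        # Assumed string key for ov::hint::num_requests
        "PERFORMANCE_HINT_NUM_REQUESTS": str(num_requests),
    }
    return device, config


if __name__ == "__main__":
    print(limit_by_device_string("GPU", 4))
    print(limit_by_hint("GPU", 4))
```

Either result could then be passed on as `core.compile_model(model, device_name, config)`, assuming `core` and `model` have been created as in the other snippets in this article.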

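Auto_batch_timeout:

The Auto_batch_timeout parameter controls how long Auto-batching waits for incoming requests. Its default value is 1000, meaning that if no further input arrives within 1000 ms, the wait times out. Note that if inference requests are infrequent, or if this timeout is being hit, Auto-batching can be disabled manually: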

// disabling the automatic batching
auto compiled_model = core.compile_model(model, "GPU",
    ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT),
    ov::hint::allow_auto_batching(false));
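To see why a low request rate works against batching, the following sketch simulates a simplified collector: it gathers requests until the batch is full or until the timeout, measured from the first pending request, expires. This is an illustrative model only, not OpenVINO's implementation. `simulate_batching` and `Flush` are hypothetical names, and arrival times are assumed to be sorted in ascending order.

```python
from typing import List, NamedTuple


class Flush(NamedTuple):
    time_ms: float       # when the collected requests are released
    requests: List[int]  # indices of the requests in this group
    full_batch: bool     # True if the batch filled before the timeout


def simulate_batching(arrivals_ms: List[float], batch_size: int,
                      timeout_ms: float = 1000.0) -> List[Flush]:
    """Group requests by arrival time, releasing on a full batch or on timeout."""
    flushes: List[Flush] = []
    pending: List[int] = []
    deadline = 0.0
    for idx, t in enumerate(arrivals_ms):
        # A request arriving after the deadline means the previous group timed out.
        if pending and t > deadline:
            flushes.append(Flush(deadline, pending, False))
            pending = []
        if not pending:
            deadline = t + timeout_ms  # the wait starts with the first request
        pending.append(idx)
        if len(pending) == batch_size:
            flushes.append(Flush(t, pending, True))
            pending = []
    if pending:
        # Whatever is left over is released when its deadline passes.
        flushes.append(Flush(deadline, pending, False))
    return flushes


if __name__ == "__main__":
    print(simulate_batching([0, 10, 20, 30], batch_size=4))   # one full batch
    print(simulate_batching([0, 1500, 3000], batch_size=4))   # every group times out
```

With requests 10 ms apart the batch fills almost immediately. With requests 1500 ms apart every request waits out the full timeout alone, which is the situation where disabling Auto-batching makes sense.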

2 AUTO Plugin

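When choosing an inference plugin in the OpenVINO™ toolkit, you can use AUTO Plugin in addition to the regular CPU, iGPU, and Myriad plugins. It lets developers deploy AI models quickly and get good inference performance without having to decide which device to use. No device needs to be specified: AUTO Plugin configures the inference hardware automatically, and when several devices are present it can also use them together for inference.

The AUTO Plugin workflow is:

First, it detects all devices available in the current environment. It then selects an inference device according to preset hardware selection rules, optimizes the overall inference configuration, and finally runs the AI inference.

AUTO Plugin selects the inference device in the following order of priority, from highest to lowest: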

dGPU (e.g. Intel® Iris® Xe MAX)

iGPU (e.g. Intel® UHD Graphics 620 (iGPU))

Intel® Movidius™ Myriad™ X VPU(e.g. Intel® Neural Compute Stick 2 (Intel® NCS2))

Intel® CPU (e.g. Intel® Core™ i7-1165G7)
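The priority rule in the list above amounts to "take the first category in a fixed order that is actually present". A minimal sketch of that logic follows. The category labels and the `pick_device` function are illustrative assumptions, not OpenVINO device names or API, and the real plugin also takes model and precision support into account.

```python
from typing import Iterable, Optional

# Priority order taken from the list above, highest first.
# These labels are illustrative categories, not OpenVINO device names.
PRIORITY = ("dGPU", "iGPU", "VPU", "CPU")


def pick_device(available: Iterable[str]) -> Optional[str]:
    """Return the highest-priority category that is present, or None."""
    present = set(available)
    for category in PRIORITY:
        if category in present:
            return category
    return None


if __name__ == "__main__":
    print(pick_device(["CPU", "iGPU"]))         # a laptop with integrated graphics
    print(pick_device(["VPU", "CPU", "dGPU"]))  # a workstation with a discrete GPU
    print(pick_device(["CPU"]))                 # CPU-only machine
```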

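AUTO Plugin has three built-in modes to choose from:

1. THROUGHPUT: the default mode. It prioritizes high throughput while balancing latency and power, and is best suited to inference over many inputs, such as video streams or large numbers of images. Note: this mode only uses the CPU and GPU. If the GPU is used for inference in this mode, Auto-batching is triggered automatically.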
```
compiled_model = core.compile_model(model=model, device_name="AUTO",
                                    config={"PERFORMANCE_HINT": "THROUGHPUT"})
```
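2. LATENCY: this option prioritizes low latency, giving each inference task a short response time. It is well suited to tasks that run inference on a single input image, such as medical analysis of an ultrasound scan. It also fits real-time or near-real-time applications, such as an industrial robot responding to movement in its environment or obstacle avoidance in an autonomous vehicle. Note: this mode only uses the CPU and GPU.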
```
compiled_model = core.compile_model(model=model, device_name="AUTO",
                                    config={"PERFORMANCE_HINT": "LATENCY"})
```
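3. CUMULATIVE_THROUGHPUT: this mode runs inference on multiple devices at the same time to achieve higher throughput. With CUMULATIVE_THROUGHPUT, AUTO Plugin loads the network model onto every available device in the candidate list and then runs inference on them according to the default device priority.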
```
auto compiled_model = core.compile_model(model, "AUTO",
    ov::hint::performance_mode(ov::hint::PerformanceMode::CUMULATIVE_THROUGHPUT));
```

3 Dynamic Shape

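Dynamic Shape lets the model's input shape adjust dynamically to the input image. In recent OpenVINO™ releases, Model Optimizer (MO) supports Dynamic Shape, so the input size does not have to be specified when converting a model: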

mo --saved_model_dir ./data/lpdetector/tf2/models/saved_model/ --output_dir ./data/lpdetector/tf2/models/FP32/

In the generated XML file, you can see that the input shape is recognized as dynamic, with dynamic dimensions shown as "?" or "-1":

<?xml version="1.0" ?>
<net name="saved_model" version="11">
    <layers>
      <layer id="0" name="input" type="Parameter" version="opset1">
        <data shape="?,?,?,3" element_type="f32"/>
        <output>
          <port id="0" precision="FP32">
            <dim>-1</dim>
            <dim>-1</dim>
            <dim>-1</dim>
            <dim>3</dim>
          </port>
        </output>
      </layer>
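Whether an IR file has dynamic inputs can also be checked programmatically. The sketch below uses only Python's standard XML parser on an inline document modeled on the excerpt above, completed so that it is well-formed. It assumes the layout shown there (Parameter layers with `<output>/<port>/<dim>` entries); `input_shapes` and `has_dynamic_input` are hypothetical helper names, not OpenVINO functions.

```python
import xml.etree.ElementTree as ET

# Inline stand-in for a generated IR file, trimmed to a single input layer.
IR_XML = """<?xml version="1.0" ?>
<net name="saved_model" version="11">
    <layers>
        <layer id="0" name="input" type="Parameter" version="opset1">
            <data shape="?,?,?,3" element_type="f32"/>
            <output>
                <port id="0" precision="FP32">
                    <dim>-1</dim>
                    <dim>-1</dim>
                    <dim>-1</dim>
                    <dim>3</dim>
                </port>
            </output>
        </layer>
    </layers>
</net>
"""


def input_shapes(ir_xml: str) -> dict:
    """Map each Parameter layer name to its dims, using None for dynamic ones."""
    root = ET.fromstring(ir_xml)
    shapes = {}
    for layer in root.iter("layer"):
        if layer.get("type") != "Parameter":
            continue
        dims = [int(d.text) for d in layer.findall("./output/port/dim")]
        shapes[layer.get("name")] = [None if d < 0 else d for d in dims]
    return shapes


def has_dynamic_input(ir_xml: str) -> bool:
    """True if any input dimension is dynamic (written as -1 in the IR)."""
    return any(None in dims for dims in input_shapes(ir_xml).values())


if __name__ == "__main__":
    print(input_shapes(IR_XML))
    print(has_dynamic_input(IR_XML))
```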

If Model Optimizer does not recognize the model's dynamic input dimensions, Dynamic Shape can also be specified manually in code:

C++:

ov::Core core;
auto model = core.read_model("model.xml");

// Set first dimension as dynamic (ov::Dimension()) and remaining dimensions as static
model->reshape({{ov::Dimension(), 3, 224, 224}});  // {?,3,224,224}

// Or, set third and fourth dimensions as dynamic
model->reshape({{1, 3, ov::Dimension(), ov::Dimension()}});  // {1,3,?,?}

Python:

import openvino.runtime as ov
core = ov.Core()
model = core.read_model("model.xml")

# Set first dimension to be dynamic while keeping others static
model.reshape([-1, 3, 224, 224])

# Or, set third and fourth dimensions as dynamic
model.reshape([1, 3, -1, -1])

You can also specify bounds for the dynamic dimensions:

C++:

// Both dimensions are dynamic, first has a size within 1..10 and the second has a size within 8..512
model->reshape({{ov::Dimension(1, 10), ov::Dimension(8, 512)}});  // {1..10,8..512}

// Both dimensions are dynamic, first doesn't have bounds, the second is in the range of 8..512
model->reshape({{-1, ov::Dimension(8, 512)}});   // {?,8..512}

Python:

# Example 1 - set first dimension as dynamic (no bounds) and third and fourth dimensions to range of 112..448
model.reshape([-1, 3, (112, 448), (112, 448)])

# Example 2 - Set first dimension to a range of 1..8 and third and fourth dimensions to range of 112..448
model.reshape([(1, 8), 3, (112, 448), (112, 448)])
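To make the meaning of these shape specifications concrete, the following sketch checks whether a given input shape fits a specification written in the same list notation: an integer is a static dimension, -1 is an unbounded dynamic dimension, and a (min, max) tuple is a bounded one with inclusive bounds. `dim_accepts` and `shape_accepts` are hypothetical helpers written for illustration; OpenVINO performs its own shape validation at inference time.

```python
from typing import Sequence, Tuple, Union

# int = static dim, -1 = unbounded dynamic dim, (lo, hi) = bounded dynamic dim
DimSpec = Union[int, Tuple[int, int]]


def dim_accepts(spec: DimSpec, value: int) -> bool:
    """Check a single dimension value against its specification."""
    if isinstance(spec, tuple):
        lo, hi = spec
        return lo <= value <= hi  # bounds are inclusive
    if spec == -1:
        return value >= 0         # any non-negative size is allowed
    return spec == value          # static dims must match exactly


def shape_accepts(spec: Sequence[DimSpec], shape: Sequence[int]) -> bool:
    """Check a whole shape: same rank, and every dimension must fit."""
    return len(spec) == len(shape) and all(
        dim_accepts(s, v) for s, v in zip(spec, shape))


if __name__ == "__main__":
    spec = [(1, 8), 3, (112, 448), (112, 448)]
    for shape in [(4, 3, 224, 224), (16, 3, 224, 224), (1, 3, 100, 224)]:
        print(shape, "fits" if shape_accepts(spec, shape) else "does not fit")
```

Bounded ranges are the narrower choice when the application knows its limits in advance, since an input outside the range is rejected by the check.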