官方地址https://github.com/triton-inference-server/server
官方文档地址

start up triton

服务端配置

./fetch_models.sh
cd server/docs/examples
docker run --gpus=1 --rm --net=host -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:23.02-py3 tritonserver --model-repository=/models
docker run --gpus=1 --rm --net=host -p8000:8000 -p8001:8001 -p8002:8002 -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:23.02-py3 tritonserver --model-repository=/models

在这里插入图片描述

请求端配置

docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:23.02-py3-sdk
cd /workspace/install/bin
image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg

在这里插入图片描述

model_repository简单介绍

$ tree ${PWD}/model_repository
/home/pdd/Diffusion/server/docs/examples/model_repository
└── densenet_onnx
    ├── 1(version Directory)
    │   └── model.onnx（存放模型或者源文件）
    ├── config.pbtxt(模型配置文件)
    ├── densenet_labels.txt(标签文件，对于分类模型可自动转换结果到标签)
    └── model_repository

模型或者源文件

	后缀
TensorRT	.plan
ONNX	.onnx
TorchScripts	.pt
TensorFlow	.graphdef ,.savedmodel
Python	.py
DALI	.dali
OpenVINO	.xml , .bin
Custom	.so

模型配置

name: "fc_model_pt" # 模型名，也是目录名
platform: "pytorch_libtorch" # 模型对应的平台，本次使用的是torch，不同格式的对应的平台可以在官方文档找到
max_batch_size : 64 # 一次送入模型的最大bsz，防止oom
input [
  {
    name: "input__0" # 输入名字，对于torch来说名字于代码的名字不需要对应，但必须是<name>__<index>的形式，注意是2个下划线，写错就报错
    data_type: TYPE_INT64 # 类型，torch.long对应的就是int64,不同语言的tensor类型与triton类型的对应关系可以在官方文档找到
    dims: [ -1 ]  # -1 代表是可变维度,虽然输入是二维的，但是默认第一个是bsz,所以只需要写后面的维度就行(无法理解的操作，如果是[-1,-1]调用模型就报错)
  }
]
output [
  {
    name: "output__0" # 命名规范同输入
    data_type: TYPE_FP32
    dims: [ -1, -1, 4 ]
  },
  {
    name: "output__1"
    data_type: TYPE_FP32
    dims: [ -1, -1, 8 ]
  }
]
- [这个模型配置文件估计是整个triton最复杂的地方，上线模型的大部分工作估计都在写配置文件，](https://zhuanlan.zhihu.com/p/516017726)

(base) pdd@pdd-Dell-G15-5511:~/Diffusion/server/docs/examples$ tree ${PWD}/model_repository
/home/pdd/Diffusion/server/docs/examples/model_repository
├── densenet_onnx
│   ├── 1
│   │   └── model.onnx
│   ├── config.pbtxt
│   ├── densenet_labels.txt
│   └── model_repository
├── inception_graphdef
│   ├── 1
│   │   └── model.graphdef
│   ├── config.pbtxt
│   └── inception_labels.txt
├── simple
│   ├── 1
│   │   └── model.graphdef
│   └── config.pbtxt
├── simple_dyna_sequence
│   ├── 1
│   │   └── model.graphdef
│   └── config.pbtxt
├── simple_identity
│   ├── 1
│   │   └── model.savedmodel
│   │       └── saved_model.pb
│   └── config.pbtxt
├── simple_int8
│   ├── 1
│   │   └── model.graphdef
│   └── config.pbtxt
├── simple_sequence
│   ├── 1
│   │   └── model.graphdef
│   └── config.pbtxt
└── simple_string
    ├── 1
    │   └── model.graphdef
    └── config.pbtxt

18 directories, 18 files

Error实验时的错误与解决

Error:Unable to destroy DCGM group: Setting not configured

按照官方指示操作，出现了以下错误，解决方案，将模型集合中的其他模型资源先移出集合，防止triton对其解析造成的错误

$ docker run --gpus=1 --rm --net=host -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:23.02-py3 tritonserver --model-repository=/models
Unable to find image 'nvcr.io/nvidia/tritonserver:23.02-py3' locally
23.02-py3: Pulling from nvidia/tritonserver   b549f31133a9: Pull complete 

=============================
== Triton Inference Server ==
=============================

I0323 04:51:12.372561 1 server.cc:590] 
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend     | Path                                                            | Config                                                                                                                                                        |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} |
| tensorflow  | /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so | {"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0323 04:51:12.372618 1 server.cc:633] 
+----------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model                | Version | Status                                                                                                                                                                                                       |
+----------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| densenet_onnx        | 1       | READY                                                                                                                                                                                                        |
| inception_graphdef   | 1       | UNAVAILABLE: Internal: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version                                                                                          |
| simple               | 1       | UNAVAILABLE: Internal: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version                                                                                          |
| simple_dyna_sequence | 1       | UNAVAILABLE: Internal: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version                                                                                          |
| simple_identity      | 1       | UNAVAILABLE: Internal: unable to auto-complete model configuration for 'simple_identity', failed to load model: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version |
| simple_int8          | 1       | UNAVAILABLE: Internal: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version                                                                                          |
| simple_sequence      | 1       | UNAVAILABLE: Internal: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version                                                                                          |
| simple_string        | 1       | UNAVAILABLE: Internal: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version                                                                                          |
+----------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

W0323 04:51:12.488232 1 metrics.cc:848] Cannot get CUDA device count, GPU metrics will not be available
I0323 04:51:12.493861 1 metrics.cc:757] Collecting CPU metrics
I0323 04:51:12.493988 1 tritonserver.cc:2264] 
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                                |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                               |
| server_version                   | 2.31.0                                                                                                                                                                                               |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace logging |
| model_repository_path[0]         | /models                                                                                                                                                                                              |
| model_control_mode               | MODE_NONE                                                                                                                                                                                            |
| strict_model_config              | 0                                                                                                                                                                                                    |
| rate_limit                       | OFF                                                                                                                                                                                                  |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                                            |
| response_cache_byte_size         | 0                                                                                                                                                                                                    |
| min_supported_compute_capability | 6.0                                                                                                                                                                                                  |
| strict_readiness                 | 1                                                                                                                                                                                                    |
| exit_timeout                     | 30                                                                                                                                                                                                   |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0323 04:51:12.493993 1 server.cc:264] Waiting for in-flight requests to complete.
I0323 04:51:12.494000 1 server.cc:280] Timeout 30: Found 0 model versions that have in-flight inferences
I0323 04:51:12.494026 1 server.cc:295] All models are stopped, unloading models
I0323 04:51:12.494031 1 server.cc:302] Timeout 30: Found 1 live models and 0 in-flight non-inference requests
I0323 04:51:12.496973 1 onnxruntime.cc:2640] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0323 04:51:12.509553 1 onnxruntime.cc:2640] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0323 04:51:12.515835 1 onnxruntime.cc:2586] TRITONBACKEND_ModelFinalize: delete model state
I0323 04:51:12.515881 1 model_lifecycle.cc:579] successfully unloaded 'densenet_onnx' version 1
I0323 04:51:13.494168 1 server.cc:302] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
W0323 04:51:15.494565 1 metrics.cc:240] Unable to destroy DCGM group: Setting not configured

CG

git Git Clone错误解决：GnuTLS recv error (-110): The TLS connection was non-properly terminated.
通过curl命令验证Triton服务是否正常运行 curl -v localhost:8000/v2/health/ready
海卫:海王星有两个卫星. 大一点的卫星叫TRITON, 仅仅略大于地球的卫星.
https://blog.csdn.net/ResumeProject/article/details/127162169
- https://github.com/NVIDIA/TensorRT
https://github.com/huggingface/diffusers
现有几种搭建框架

Python：TF+Flask+Funicorn+Nginx
FrameWork：TF serving，TorchServe，ONNX Runtime
Intel：OpenVINO，mms，NVNN，QNNPACK（FB的）
NVIDIA：TensorRT Inference Server（Triton），DeepStream
————————————————
版权声明：本文为CSDN博主「ooMelloo」的原创文章，遵循CC 4.0 BY-SA版权协议，转载请附上原文出处链接及本声明。
原文链接：https://blog.csdn.net/Aidam_Bo/article/details/112791627

TensorRT triton start up

start up triton

服务端配置

请求端配置

model_repository简单介绍

模型或者源文件

模型配置

Error实验时的错误与解决

Error:Unable to destroy DCGM group: Setting not configured

相关学习资源

官方文档 https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html

其它

open source examples of triton

Triton Features & Overview:

Model Analyzer

ASR

Video Inferencing

NLP

CG

猜你喜欢