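Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/warrah/article/details/88863041
1 Getting GPU memory usage
For installing zabbix-agent, see Chapter 1.3, Raspberry Pi Environment Monitoring.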
cd /etc/zabbix
mkdir monitor
cd monitor
vi get_gpu_used.sh
#!/bin/bash
nvidia-smi -q | grep -A 3 "FB Memory Usage" | grep Used | awk '{print $3}'
chmod +x /etc/zabbix/monitor/get_gpu_used.sh
vi /etc/zabbix/zabbix_agentd.conf
UserParameter=gpu.used,/etc/zabbix/monitor/get_gpu_used.sh
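Restart the agent first so that it picks up the new UserParameter (service zabbix-agent restart), then on the zabbix server run ./zabbix_get -s 10.101.5.147 -p 10050 -k "gpu.used"
If a value comes back, the setup is working.
Add the item gpu.used in zabbix, then add the corresponding metric to a graph.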
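The grep -A 3 pipeline above relies on the Used line appearing within three lines of the FB Memory Usage header. As a hedged alternative, the sketch below tracks the section inside awk instead, so the BAR1 Memory Usage block (which also has a Used line) is never matched even if the layout shifts. The function name fb_used_mib is illustrative and not part of the original script; it reads nvidia-smi -q output on stdin and prints the value for the first GPU only.

```shell
#!/bin/bash
# Print the "Used" value (MiB) from the "FB Memory Usage" section, first GPU only.
# Expects the text of `nvidia-smi -q` on stdin.
fb_used_mib() {
    awk '
        /FB Memory Usage/ { in_fb = 1; next }        # entered the section we want
        in_fb && $1 == "Used" { print $3; exit }     # "Used : 7907 MiB" -> 7907
        in_fb && NF > 0 && $2 != ":" { in_fb = 0 }   # a line that is not "name : value" is the next header
    '
}

# Usage on a machine with the NVIDIA driver installed:
#   nvidia-smi -q | fb_used_mib
```

Because the function stops at the first match, it returns a single number even on a host with several GPUs, which is what a numeric zabbix item expects.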
2 GPU utilization
vi /etc/zabbix/monitor/get_gpu_util.sh
#!/bin/bash
nvidia-smi -q | grep -A 3 "Utilization" | grep Gpu | awk '{print $3}'
chmod +x /etc/zabbix/monitor/get_gpu_util.sh
Then add another UserParameter; the steps in zabbix are the same as above.
vi /etc/zabbix/zabbix_agentd.conf
UserParameter=gpu.util,/etc/zabbix/monitor/get_gpu_util.sh
service zabbix-agent restart
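As an alternative to parsing the human-readable report, nvidia-smi can print selected fields directly with --query-gpu and --format=csv,noheader,nounits. The following is a minimal sketch rather than the method used above: the function name gpu_query and the NVIDIA_SMI override variable are assumptions introduced here, the latter only so the command can be swapped out for testing. The optional second argument selects a GPU by index with -i, which keeps the output to a single number on multi-GPU hosts.

```shell
#!/bin/bash
# Command to run; override NVIDIA_SMI to point at another binary or a test stub.
NVIDIA_SMI="${NVIDIA_SMI:-nvidia-smi}"

# gpu_query FIELD [INDEX]
#   FIELD: a --query-gpu field name, e.g. memory.used, utilization.gpu, temperature.gpu
#   INDEX: GPU index passed to -i (defaults to 0)
gpu_query() {
    "$NVIDIA_SMI" -i "${2:-0}" --query-gpu="$1" --format=csv,noheader,nounits |
        tr -d ' '    # strip any padding so the agent receives a bare number
}

# Usage:
#   gpu_query memory.used         # MiB used on GPU 0
#   gpu_query utilization.gpu 1   # utilization percent on GPU 1
```

If this were saved as a script ending with the line gpu_query "$@" (for example under the hypothetical path /etc/zabbix/monitor/gpu_query.sh), a flexible key such as UserParameter=gpu.query[*],/etc/zabbix/monitor/gpu_query.sh $1 $2 could probably serve several metrics from one entry.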
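The GPU metric scripts are really quite simple. Running nvidia-smi -q produces the snapshot shown in full further below. If you want the GPU temperature, build the pipeline following the same pattern as before.
In {print $5}, the number is the position of the whitespace-separated field within the line, counting from 1.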
root@147:/etc/zabbix# nvidia-smi -q | grep -A 3 "Temperature"
Temperature
GPU Current Temp : 51 C
GPU Shutdown Temp : 99 C
GPU Slowdown Temp : 96 C
root@147:/etc/zabbix# nvidia-smi -q | grep -A 3 "Temperature" | grep "GPU Current Temp"
GPU Current Temp : 51 C
root@147:/etc/zabbix# nvidia-smi -q | grep -A 3 "Temperature" | grep "GPU Current Temp" | awk '{print $5}'
51
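To make the field numbering concrete: awk splits each line on whitespace and numbers the pieces from 1, so in the GPU Current Temp line above the colon is itself field 4 and the value is field 5. The small sketch below (the helper name show_fields is made up for illustration) prints every field with its index, which is a quick way to find the right number for any other line of the report.

```shell
#!/bin/bash
# Print each whitespace-separated field of every input line with its awk index.
show_fields() {
    awk '{ for (i = 1; i <= NF; i++) printf "$%d=%s\n", i, $i }'
}

# Usage:
#   echo "GPU Current Temp : 51 C" | show_fields
# prints:
#   $1=GPU
#   $2=Current
#   $3=Temp
#   $4=:
#   $5=51
#   $6=C
```

The same check explains why the memory and utilization scripts use $3: in lines such as Used : 7907 MiB the label is a single word, so the value sits two fields earlier.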
==============NVSMI LOG==============
Timestamp : Thu Mar 28 10:52:46 2019
Driver Version : 390.116
Attached GPUs : 1
GPU 00000000:01:00.0
Product Name : GeForce GTX 1070
Product Brand : GeForce
Display Mode : Enabled
Display Active : Enabled
Persistence Mode : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-85df7bd7-a582-33c9-988d-c0ed0e332493
Minor Number : 0
VBIOS Version : 86.04.50.00.63
MultiGPU Board : No
Board ID : 0x100
GPU Part Number : N/A
Inforom Version
Image Version : G001.0000.01.04
OEM Object : 1.1
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization mode : None
PCI
Bus : 0x01
Device : 0x00
Domain : 0x0000
Device Id : 0x1B8110DE
Bus Id : 00000000:01:00.0
Sub System Id : 0x85991043
GPU Link Info
PCIe Generation
Max : 3
Current : 3
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays since reset : 3415
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : 0 %
Performance State : P2
Clocks Throttle Reasons
Idle : Not Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 8114 MiB
Used : 7907 MiB
Free : 207 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 6 MiB
Free : 250 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending : N/A
Temperature
GPU Current Temp : 51 C
GPU Shutdown Temp : 99 C
GPU Slowdown Temp : 96 C
GPU Max Operating Temp : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 35.21 W
Power Limit : 166.10 W
Default Power Limit : 166.10 W
Enforced Power Limit : 166.10 W
Min Power Limit : 75.00 W
Max Power Limit : 200.00 W
Clocks
Graphics : 1632 MHz
SM : 1632 MHz
Memory : 3802 MHz
Video : 1468 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 2037 MHz
SM : 2037 MHz
Memory : 4004 MHz
Video : 1708 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes
Process ID : 1207
Type : G
Name : /usr/lib/xorg/Xorg
Used GPU Memory : 14 MiB
Process ID : 1265
Type : G
Name : /usr/bin/gnome-shell
Used GPU Memory : 63 MiB
Process ID : 5225
Type : G
Name : /usr/lib/xorg/Xorg
Used GPU Memory : 74 MiB
Process ID : 5401
Type : G
Name : /usr/bin/gnome-shell
Used GPU Memory : 97 MiB
Process ID : 6031
Type : C
Name : python
Used GPU Memory : 7653 MiB
3 Monitoring with the GPU's own tools
nvidia-smi
# refresh every 2 seconds
watch -n 2 nvidia-smi
You can also install gpustat with pip install gpustat
and then run watch --color -n1 gpustat -cpu