Preparation
CUDA version: 10.0
Installation packages
- cuda_10.0.130_410.48_linux.run
- cudnn-10.0-linux-x64-v7.5.0.56.tgz
Check the hardware environment
- Check whether a GPU is installed in the system by running:
lspci | grep -i nvidia
Output similar to the following indicates that a GPU is installed:
03:00.0 VGA compatible controller: NVIDIA Corporation Device 1b06 (rev a1)
03:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
- Check the kernel version
uname -r
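# If the kernel version is older than 3.10.0-957, upgrade the kernel, then reboot:
yum install kernel
- Set the default boot kernel manually:
grub2-set-default "CentOS Linux (3.10.0-957.1.3.el7.x86_64) 7 (Core)"
Reboot the system again and run uname -r to confirm that the change took effect.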
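The kernel check above can also be scripted. The following is a minimal sketch, assuming GNU sort with the -V (version sort) option is available, as it is on CentOS 7. The names MIN_KERNEL and kernel_at_least are hypothetical helpers introduced here for illustration; they are not part of any NVIDIA or CentOS tool.

```shell
#!/usr/bin/env bash
# Minimum kernel release, taken from the step above.
MIN_KERNEL="3.10.0-957"

# kernel_at_least MIN CURRENT
# Returns 0 when CURRENT sorts at or after MIN under `sort -V`,
# and 1 otherwise. Hypothetical helper for illustration only.
kernel_at_least() {
    local min="$1" current="$2"
    # If MIN is the first line after a version sort, CURRENT is >= MIN.
    [ "$(printf '%s\n%s\n' "$min" "$current" | sort -V | head -n 1)" = "$min" ]
}
```

A possible call is: kernel_at_least "$MIN_KERNEL" "$(uname -r)" || echo "kernel too old, run: yum install kernel"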
Check the software environment
- Install the kernel headers
yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
- Install GCC, the EPEL repository, and DKMS
yum install gcc gcc-c++
yum install epel-release
yum install --enablerepo=epel dkms
- Disable the Nouveau driver
vi /etc/modprobe.d/blacklist-nouveau.conf
# Add the following lines to the file
blacklist nouveau
options nouveau modeset=0
Save the file, then regenerate the initramfs:
dracut --force
# Run the following command to check whether Nouveau has been disabled
lsmod | grep nouveau
# No output means Nouveau is disabled. If the command prints anything, the module is still loaded; reboot and run lsmod | grep nouveau again.
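The Nouveau step above can be done without an interactive editor. This is a minimal sketch under the assumption that the file only needs the two lines shown in the step; write_nouveau_blacklist and nouveau_loaded are hypothetical helper names, and the path argument exists only so the function can be tried against a scratch file first.

```shell
#!/usr/bin/env bash
# write_nouveau_blacklist [PATH]
# Writes the two blacklist lines from the step above to PATH
# (default: /etc/modprobe.d/blacklist-nouveau.conf).
# Overwrites PATH if it already exists. Hypothetical helper.
write_nouveau_blacklist() {
    local conf="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$conf"
}

# nouveau_loaded
# Returns 0 when a module named nouveau appears in `lsmod`, 1 otherwise.
nouveau_loaded() {
    lsmod | grep -q '^nouveau'
}
```

As root, one possible sequence is: write_nouveau_blacklist && dracut --force, followed by a reboot and then nouveau_loaded && echo "nouveau is still loaded".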
Installation
- Switch to text mode (runlevel 3)
init 3
- Run the installer script
# sh cuda_10.0.130_410.48_linux.run
Do you accept the previously read EULA?
accept/decline/quit: accept
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 410.48?
(y)es/(n)o/(q)uit: y
Do you want to install the OpenGL libraries?
(y)es/(n)o/(q)uit [ default is yes ]: n
Do you want to run nvidia-xconfig?
(y)es/(n)o/(q)uit [ default is no ]: n
Install the CUDA 10.0 Toolkit?
(y)es/(n)o/(q)uit: y
Enter Toolkit Location
[ default is /usr/local/cuda-10.0 ]: /home/default
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y
Install the CUDA 10.0 Samples?
(y)es/(n)o/(q)uit: n
===========
= Summary =
===========
Driver: Installed
Toolkit: Installed
Samples: Not Selected
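- Check whether the installation succeeded
# Show the GPU device status; output that lists the GPU and the driver version indicates the driver was installed successfully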
nvidia-smi
- Run the test program to check whether the CUDA Toolkit was installed successfully:
$ /usr/local/cuda/extras/demo_suite/deviceQuery
/usr/local/cuda/extras/demo_suite/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1080 Ti"
CUDA Driver Version / Runtime Version 10.0 / 10.0
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 11178 MBytes (11721506816 bytes)
(28) Multiprocessors, (128) CUDA Cores/MP: 3584 CUDA Cores
GPU Max Clock rate: 1582 MHz (1.58 GHz)
Memory Clock rate: 5505 Mhz
......
......
......
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.0, CUDA Runtime Version = 10.0, NumDevs = 1, Device0 = GeForce GTX 1080 Ti
Result = PASS
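- Install the cuDNN library
# Extract the archive first; it unpacks into a cuda/ directory
tar -xzvf cudnn-10.0-linux-x64-v7.5.0.56.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*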
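After copying the cuDNN files, it can be useful to confirm which version ended up in the toolkit directory. The sketch below assumes the header defines CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL as plain #define lines, which is how the cuDNN 7.x cudnn.h is laid out; cudnn_version is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# cudnn_version [HEADER]
# Prints MAJOR.MINOR.PATCHLEVEL parsed from the CUDNN_MAJOR, CUDNN_MINOR
# and CUDNN_PATCHLEVEL macros in HEADER
# (default: /usr/local/cuda/include/cudnn.h). Hypothetical helper.
cudnn_version() {
    local header="${1:-/usr/local/cuda/include/cudnn.h}"
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$header"
}
```

Running cudnn_version with no argument reads the header copied in the step above; for the package listed under Preparation the expected major and minor numbers are 7 and 5.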
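Uninstallation
To upgrade CUDA or to remove it, follow the steps below.
Uninstall the CUDA Toolkit first, then the NVIDIA driver.
# The uninstall script is in the bin directory of the toolkit location chosen during installation; adjust the path to match
/icooper/tools/cuda-10.0/bin/uninstall_cuda_10.0.pl
# Remove the nvidia module from DKMS; the version must match the installed driver (check with dkms status)
dkms remove nvidia/410.48 -k 3.10.0-957.1.3.el7.x86_64
nvidia-uninstall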
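The version passed to dkms remove has to match what DKMS actually has registered. The following sketch extracts it from dkms status output, assuming the comma-separated line format printed by the DKMS 2.x releases packaged in EPEL for CentOS 7 (for example "nvidia, 410.48, 3.10.0-957.1.3.el7.x86_64, x86_64: installed"); other DKMS versions may format the line differently. nvidia_dkms_versions is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# nvidia_dkms_versions
# Reads `dkms status` text on stdin and prints the version of every
# entry whose module name is exactly "nvidia", one per line, deduplicated.
# Assumes lines shaped like:
#   nvidia, 410.48, 3.10.0-957.1.3.el7.x86_64, x86_64: installed
nvidia_dkms_versions() {
    awk -F', *' '$1 == "nvidia" { print $2 }' | sort -u
}
```

One way to use it is: dkms status | nvidia_dkms_versions, then pass the printed version to dkms remove nvidia/VERSION -k "$(uname -r)".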
References
NVIDIA CUDA installation guide:
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#runfile-installation
NVIDIA cuDNN installation guide:
https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html