GPU-server

A guide to setting up a GPU server

Environment: Ubuntu 22.04, GTX 950

Installing the GPU driver

Open the software installer

Select a suitable driver, install it, and reboot the machine

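Another option is to download the driver from the official NVIDIA website; see https://blog.csdn.net/qq_49323609/article/details/130310522

In my testing, that tutorial installs the NVIDIA driver correctly, but CUDA cannot be installed by following it

If you get a black screen on boot, reboot and press Shift repeatedly during the BIOS screen to bring up the GRUB boot menu, choose recovery mode, and open a root shell

Recovery mode is offline by default, but its menu has an option to enable networking. Enabling it is recommended so that you can install any required dependencies

Verifying the GPU driver

Run nvidia-smi; output like the following means the driver is working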

Wed May  8 08:42:43 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 950         Off | 00000000:06:10.0 Off |                  N/A |
|  0%   32C    P8              13W /  75W |      5MiB /  2048MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A       989      G   /usr/lib/xorg/Xorg                            2MiB |
+---------------------------------------------------------------------------------------+

Installing CUDA

This guide uses CUDA 11.8 as the example

Download the CUDA installer

https://developer.nvidia.com/cuda-11-8-0-download-archive?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=runfile_local

wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
sudo sh cuda_11.8.0_520.61.05_linux.run

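The system disk needs enough free space, at least 13 GB

Remember to uncheck the driver option in the installer, because the driver is already installed

When the installer finishes, you should see the following output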

===========
= Summary =
===========

Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-11.8/

Please make sure that
- PATH includes /usr/local/cuda-11.8/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-11.8/lib64, or, add /usr/local/cuda-11.8/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.8/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 520.00 is required for CUDA 11.8 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run --silent --driver

Logfile is /var/log/cuda-installer.log
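
The warning in the summary says CUDA 11.8 needs a driver of at least version 520.00. As a rough sketch of that check in Python (the helper names parse_version and driver_supports_cuda_11_8 are made up for this example, and the version string is the one nvidia-smi reported earlier):

```python
# Minimum driver version quoted in the installer summary for CUDA 11.8.
MIN_DRIVER_FOR_CUDA_11_8 = (520, 0)


def parse_version(text):
    """Turn a dotted version string such as '535.171.04' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports_cuda_11_8(driver_version):
    """Hypothetical helper: compare the installed driver against the minimum."""
    return parse_version(driver_version) >= MIN_DRIVER_FOR_CUDA_11_8


if __name__ == "__main__":
    # 535.171.04 is the driver version shown in the nvidia-smi output above.
    print(driver_supports_cuda_11_8("535.171.04"))
```

Comparing tuples of integers avoids the pitfalls of comparing version strings character by character.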

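Adding the environment variables

export PATH=$PATH:/usr/local/cuda-11.8/bin
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

These exports only last for the current shell session; append them to ~/.bashrc to make them permanent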
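
To double-check that both variables took effect, a small Python sketch like the following can be run from the same shell. The function name missing_cuda_entries is invented for this example, and the paths assume the default install location reported by the installer.

```python
import os

CUDA_HOME = "/usr/local/cuda-11.8"  # default location from the installer summary


def missing_cuda_entries(env):
    """Return the names of variables that lack the expected CUDA directory."""
    expected = {
        "PATH": CUDA_HOME + "/bin",
        "LD_LIBRARY_PATH": CUDA_HOME + "/lib64",
    }
    missing = []
    for name, directory in expected.items():
        # Both variables are colon-separated lists of directories.
        entries = [entry.rstrip("/") for entry in env.get(name, "").split(":")]
        if directory not in entries:
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_cuda_entries(os.environ)
    print("OK" if not problems else "missing in: " + ", ".join(problems))
```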

Verifying the CUDA toolkit

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

Installing Anaconda

On the Sun Yat-sen University (中大) campus network you can download it from the matrix mirror

wget https://mirrors.matrix.moe/anaconda/archive/Anaconda3-2022.05-Linux-x86_64.sh

The download ran at 110 MB/s, which is quite fast

chmod +x Anaconda3-2022.05-Linux-x86_64.sh
./Anaconda3-2022.05-Linux-x86_64.sh

When prompted with "Do you wish the installer to initialize Anaconda3 by running conda init? [yes|no]", enter yes

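Installing PyTorch

Create an environment. The general form is conda create -n <environment name> python=<Python version>

conda create -n pytorch python=3.11

Activate the environment named pytorch

conda activate pytorch

Then install PyTorch. The official site generates the install command for your specific setup

https://pytorch.org/

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia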

Verifying PyTorch

(pytorch) node1@vmGPU:~$ python
Python 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>>
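
The same check can be wrapped in a short script that also reports which GPU PyTorch sees. This is only a sketch: describe_cuda is a name chosen for this example, and device index 0 is assumed because the machine in this guide has a single GPU.

```python
def describe_cuda():
    """Summarize what PyTorch sees; returns a short human-readable string."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        return "PyTorch " + torch.__version__ + ": CUDA is not available"
    # Device 0 is assumed here because the machine has a single GPU.
    name = torch.cuda.get_device_name(0)
    return "PyTorch {} (built for CUDA {}): {}".format(
        torch.__version__, torch.version.cuda, name
    )


if __name__ == "__main__":
    print(describe_cuda())
```

Run it inside the activated pytorch environment; outside of it the script simply reports that PyTorch is missing.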

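Querying detailed GPU information

Query detailed information: nvidia-smi -q

Query detailed information for a specific GPU: nvidia-smi -q -i 0

Show only one category of information for a GPU: nvidia-smi -q -i 0 -d MEMORY

Help: nvidia-smi -h

nvidia-smi can also configure GPU modes, such as persistence mode (-pm) and compute mode (-c)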
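
For scripting, nvidia-smi also has a CSV query mode (--query-gpu with --format=csv). The sketch below parses that output in Python; the function names and the dictionary keys are choices made for this example, and the sample line is assembled from the values in the nvidia-smi output shown earlier rather than captured from a real run.

```python
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=name,memory.total,memory.used",
         "--format=csv,noheader,nounits"]


def parse_gpu_csv(text):
    """Parse 'name, total, used' lines (memory in MiB) into dictionaries."""
    gpus = []
    for line in text.strip().splitlines():
        name, total, used = [field.strip() for field in line.split(",")]
        gpus.append({"name": name, "total_mib": int(total), "used_mib": int(used)})
    return gpus


def query_gpus():
    """Run nvidia-smi and parse its CSV output (requires a working driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    # Illustrative sample line; call query_gpus() on a machine with a GPU.
    print(parse_gpu_csv("NVIDIA GeForce GTX 950, 2048, 5"))
```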

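Installing cuDNN

https://developer.nvidia.com/rdp/cudnn-archive

Choose the version you need here; you must log in to download

Default install (cuDNN 9):

Official documentation for installing cuDNN on Linux: https://docs.nvidia.com/deeplearning/cudnn/latest/installation/linux.html

You can also skip the documentation and get the install commands directly from https://developer.nvidia.com/cudnn-downloads

Custom install

Packages such as onnxruntime builds that do not support cuDNN 9 need cuDNN 8 instead

Recommended reference: https://docs.nvidia.com/deeplearning/cudnn/archives/cudnn-860/install-guide/index.html#package-manager-ubuntu-install

Decide on the versions in advance, then substitute them into the install commands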

CUDA programming

(To be continued; a dedicated series on CUDA programming will follow)

hello_world

Note: inside a kernel function you can only use printf, not std::cout

hello.cu

#include <stdio.h>

__global__ void hello_from_gpu()
{
    printf("Hello World from the GPU\n");
}


int main(void)
{
    // Launch 4 x 4 = 16 threads (4 thread blocks, 4 threads per block)
    hello_from_gpu<<<4, 4>>>();
    // Wait for the kernel to finish so its output is flushed before exit
    cudaDeviceSynchronize();

    return 0;
}

Compile: nvcc ./hello.cu -o hello

Running ./hello prints 16 lines of Hello World

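Kernel functions

Kernel functions execute in parallel on the GPU

Requirements:

  1. A kernel must be declared with the __global__ qualifier
  2. Its return type must be void

__global__ void hello_from_gpu()
{
    printf("Hello World from the GPU\n");
}

Notes:

  1. Kernels can only access GPU memory
  2. Kernels cannot take a variable number of arguments
  3. Kernels cannot use static variables
  4. Kernels cannot use function pointers obtained on the host (the address of a __device__ function cannot be taken in host code)
  5. Kernel launches are asynchronous: the host continues without waiting for the kernel to finish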

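Structure of a CUDA program

int main(void) {
    // Host code: set up the GPU (allocate device memory, copy inputs over)
    // Kernel launch
    // Host code: copy the results from the GPU back to the host
    return 0;
}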

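Thread model

Key concepts of the thread model:

(1) grid

(2) block (thread block)

Dividing threads into blocks is a logical partition; physically, threads are not divided into blocks

Launch configuration: <<<grid_size, block_size>>>

Maximum block size: 1024 threads

Maximum grid size: 2^31 - 1 blocks (for a one-dimensional grid)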
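
The arithmetic behind a launch configuration can be sketched on the CPU in plain Python. Nothing here runs on the GPU, and the names grid_size_for and global_indices are made up for this example; the index formula is the usual blockIdx.x * blockDim.x + threadIdx.x for a one-dimensional launch.

```python
def grid_size_for(n, block_size):
    """Smallest number of blocks whose threads cover n elements (ceiling division)."""
    return (n + block_size - 1) // block_size


def global_indices(grid_size, block_size):
    """List blockIdx.x * blockDim.x + threadIdx.x for every launched thread."""
    return [
        block * block_size + thread
        for block in range(grid_size)
        for thread in range(block_size)
    ]


if __name__ == "__main__":
    # The hello-world launch <<<4, 4>>> starts 4 * 4 = 16 threads.
    print(len(global_indices(4, 4)))
    # Covering 1000 elements with 256-thread blocks needs 4 blocks (1024 threads),
    # so the last 24 threads must be masked out with a bounds check.
    print(grid_size_for(1000, 256))
```

Because the grid size is rounded up, a kernel usually compares its global index against the element count before touching memory.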

Matrix multiplication

#include <iostream>
#include <vector>

// CUDA kernel for matrix multiplication
__global__ void matrixMultiply(float* A, float* B, float* C, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;

    if (row < N && col < N) {
        float sum = 0.0f;
        for (int k = 0; k < N; ++k) {
            sum += A[row * N + k] * B[k * N + col];
        }
        C[row * N + col] = sum;
    }
}

int main() {
    const int N = 3; // Matrix size (3x3 in this case)

    // Initialize matrices A and B
    std::vector<float> h_A = {1, 2, 3, 4, 5, 6, 7, 8, 9};
    std::vector<float> h_B = {13, 12, 11, 10, 9, 8, 7, 6, 5};
    std::vector<float> h_C(N * N, 0);

    // Allocate device memory
    float* d_A, *d_B, *d_C;
    cudaMalloc(&d_A, N * N * sizeof(float));
    cudaMalloc(&d_B, N * N * sizeof(float));
    cudaMalloc(&d_C, N * N * sizeof(float));

    // Copy data from host to device
    cudaMemcpy(d_A, h_A.data(), N * N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, h_B.data(), N * N * sizeof(float), cudaMemcpyHostToDevice);

    // Define grid and block dimensions
    dim3 threadsPerBlock(16, 16);
    dim3 numBlocks((N + threadsPerBlock.x - 1) / threadsPerBlock.x,
                   (N + threadsPerBlock.y - 1) / threadsPerBlock.y);

    // Launch the kernel
    matrixMultiply<<<numBlocks, threadsPerBlock>>>(d_A, d_B, d_C, N);

    // Copy result back to host
    cudaMemcpy(h_C.data(), d_C, N * N * sizeof(float), cudaMemcpyDeviceToHost);

    // Print the result
    std::cout << "Matrix C (result of A * B):" << std::endl;
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            std::cout << h_C[i * N + j] << " ";
        }
        std::cout << std::endl;
    }

    // Clean up
    cudaFree(d_A);
    cudaFree(d_B);
    cudaFree(d_C);

    return 0;
}

Output

Matrix C (result of A * B):
54 48 42
144 129 114
234 210 186
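
To cross-check this result without a GPU, the kernel's indexing can be replayed on the CPU. The Python sketch below is an emulation written for this purpose, not part of the CUDA program: emulate_matrix_multiply is a made-up name, and each innermost loop iteration stands in for one GPU thread of the 16 x 16 block used above.

```python
def emulate_matrix_multiply(a, b, n, block=(16, 16)):
    """Replay the kernel's indexing on the CPU, one loop iteration per GPU thread."""
    grid_x = (n + block[0] - 1) // block[0]
    grid_y = (n + block[1] - 1) // block[1]
    c = [0.0] * (n * n)
    launched = active = 0
    for block_y in range(grid_y):
        for block_x in range(grid_x):
            for thread_y in range(block[1]):
                for thread_x in range(block[0]):
                    launched += 1
                    row = block_y * block[1] + thread_y
                    col = block_x * block[0] + thread_x
                    if row < n and col < n:  # same bounds check as the kernel
                        active += 1
                        c[row * n + col] = sum(
                            a[row * n + k] * b[k * n + col] for k in range(n)
                        )
    return c, launched, active


if __name__ == "__main__":
    h_a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    h_b = [13, 12, 11, 10, 9, 8, 7, 6, 5]
    c, launched, active = emulate_matrix_multiply(h_a, h_b, 3)
    print(c)
    print(launched, active)
```

For the 3 x 3 case a single 16 x 16 block is launched, so 256 threads start but only 9 pass the bounds check and write an element of C.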

GPU-server
https://blog.algorithmpark.xyz/2024/05/08/GPU-server/index/
Author: CJL
Published: May 8, 2024
Updated: August 29, 2024
License