This page documents the GPU settings and modes for the Lilac cluster in more detail than the Lilac Cluster Intro page provides. It looks specifically at the new GTX 1080 GPUs, although many of the options apply to other nVidia GPUs as well. Where applicable, this primer discusses how these properties interact with the Lilac cluster.
Looking at the status of the GPUs on a node through nvidia-smi
Important terminology notes
Process Exclusive and Shared mode on nVidia GPUs

nvidia-smi
nVidia provides a tool to view the status of GPUs, such as their current memory load, temperature, and operating mode. The command is nvidia-smi; here is sample output from a Lilac node with 4 GPUs.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48                 Driver Version: 367.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:02:00.0     Off |                  N/A |
|  0%   39C    P8     6W / 180W |      2MiB /  8113MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 0000:03:00.0     Off |                  N/A |
|  0%   41C    P8    11W / 180W |      2MiB /  8113MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 1080    Off  | 0000:83:00.0     Off |                  N/A |
|  0%   36C    P8     9W / 180W |      2MiB /  8113MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 1080    Off  | 0000:84:00.0     Off |                  N/A |
|  0%   35C    P8    11W / 180W |      2MiB /  8113MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
The header entries map to each GPU's entries by position; e.g., the Bus-Id label in the top center cell corresponds to 0000:02:00.0 in the center cell of the first GPU's block. There is a lot to digest here, so we cover only the items most likely to matter for your jobs.
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:02:00.0     Off |                  N/A |
|  0%   39C    P8     6W / 180W |      2MiB /  8113MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
The individual items of note discussed here refer to the shortened example above.
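For scripts, the same fields can be requested in machine-readable form instead of being read off the table. The sketch below assumes the `--query-gpu` and `--format=csv` options of nvidia-smi are available on the node. The helper names `parse_gpu_csv` and `query_gpus` are illustrative and not part of any Lilac tooling, and the sample line is a hypothetical rendering of GPU 0 from the table.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi; these mirror columns in the table above.
FIELDS = ["index", "name", "compute_mode", "memory.used", "memory.total"]


def parse_gpu_csv(text):
    """Turn `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    # Blank lines come back as empty rows and are skipped.
    return [dict(zip(FIELDS, (cell.strip() for cell in row))) for row in rows if row]


def query_gpus():
    """Run nvidia-smi on the current node (needs an NVIDIA driver installed)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]
    return parse_gpu_csv(subprocess.check_output(cmd, text=True))


# Hypothetical query output for the first GPU shown in the table above.
sample = "0, GeForce GTX 1080, Exclusive_Process, 2, 8113\n"
print(parse_gpu_csv(sample))
```

On a GPU node, calling `query_gpus()` would return one dictionary per GPU. The parser is kept separate so it can be exercised on saved output without a GPU present.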
The Compute M. entry shows the compute mode of each GPU. The values seen here are E. Process and Default, which correspond to exclusive process and shared mode as referenced in the Intro to Lilac documentation.

Important terminology notes

nvidia-smi uses E. Process, or "exclusive process", to describe the Compute Mode of the GPUs; the Compute M. = Default case will be referred to as shared mode throughout this document.
Shared in this document refers only to the compute mode of the GPU, in that multiple Contexts can share this GPU. Shared does not mean that a GPU is shared between different jobs.
Exclusive process will be shortened to exclusive, since there are not multiple modes of exclusivity on the GTX 1080s.
Exclusive in this document refers only to the compute mode of the GPU.

Process Exclusive and Shared mode on nVidia GPUs

Each physical machine has some number of physical GPUs associated with it. When you start a process on a GPU, which physical GPU it starts on depends on a few factors:
Whether the GPU is in shared or exclusive mode
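Since the compute mode is one of these factors, a script may want to translate the label nvidia-smi reports into the shared and exclusive terms used on this page. The lookup below is a minimal sketch: the function name `doc_mode` is hypothetical, and the `Exclusive_Process` spelling is assumed to be what nvidia-smi prints in its query output, alongside the E. Process abbreviation shown in the table.

```python
# Map the labels nvidia-smi prints to the terms used on this page.
# "E. Process" is the table abbreviation; "Exclusive_Process" is assumed
# to be the spelling used in nvidia-smi's query output.
MODE_NAMES = {
    "Default": "shared",
    "E. Process": "exclusive",
    "Exclusive_Process": "exclusive",
}


def doc_mode(compute_mode):
    """Return 'shared' or 'exclusive' for a compute-mode label.

    Raises ValueError for modes this page does not cover.
    """
    label = compute_mode.strip()
    try:
        return MODE_NAMES[label]
    except KeyError:
        raise ValueError(f"compute mode not covered here: {label!r}") from None


print(doc_mode("E. Process"))
print(doc_mode("Default"))
```

Raising an error on unknown labels, rather than guessing, keeps a script from silently treating an unexpected mode as shared.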