Troubleshooting
Common problems you may run into when using the agent, along with their solutions.
Failed to initialize NVML: Unknown Error
Quick solution
Proper solution
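The configuration for this fix sets Docker's cgroup driver to cgroupfs through the exec-opts entry in /etc/docker/daemon.json. As a minimal sketch, the helper below checks whether a config file contains that setting. The function name uses_cgroupfs_driver is hypothetical and not part of the agent, and the default path is an assumption based on the usual Docker location.

```shell
# Sketch: report whether a Docker daemon config selects the cgroupfs driver.
# Assumption: the setting appears literally as native.cgroupdriver=cgroupfs.
uses_cgroupfs_driver() {
    # $1: path to the daemon config (defaults to the usual location)
    config="${1:-/etc/docker/daemon.json}"
    grep -q 'native\.cgroupdriver=cgroupfs' "$config" 2>/dev/null
}

# Example: print a hint when the setting is missing.
if ! uses_cgroupfs_driver; then
    echo "cgroupfs driver not configured in /etc/docker/daemon.json"
fi
```

A change to daemon.json usually takes effect only after the Docker daemon is restarted, for example with `sudo systemctl restart docker` on a systemd host.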
~$ cat /etc/docker/daemon.json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    },
    "exec-opts": ["native.cgroupdriver=cgroupfs"]
}

CUDA Out Of Memory Error
Solution
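When the GPU runs out of memory, it often helps to first see which processes are holding it. The following is a minimal sketch, assuming nvidia-smi is installed on the host. The helper name sort_by_gpu_memory is hypothetical. It expects lines of the form "pid, name, used memory in MiB".

```shell
# Sort "pid, name, used_memory_MiB" lines by memory use, largest first.
sort_by_gpu_memory() {
    sort -t, -k3,3nr
}

# Only query the GPU when nvidia-smi is actually available.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-compute-apps=pid,process_name,used_memory \
               --format=csv,noheader,nounits | sort_by_gpu_memory
fi
```

The process at the top of the list is usually the first candidate to stop or reconfigure.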
Additional: stop a process via docker.
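If the process occupying the GPU runs inside a container, stopping the container releases its memory. A minimal sketch follows. The wrapper name stop_gpu_container is hypothetical, and the container name or ID is something you look up yourself, for example with `docker ps`.

```shell
# Sketch: stop a container by name or ID, refusing to run without an argument.
stop_gpu_container() {
    if [ -z "${1:-}" ]; then
        echo "usage: stop_gpu_container <container-name-or-id>" >&2
        return 2
    fi
    # docker stop sends SIGTERM first and SIGKILL after a grace period.
    docker stop "$1"
}
```

For example, `stop_gpu_container my_container` stops the container called my_container (a placeholder name).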
Can't start the docker container. Trying to use another runtime.
Solution
Fast solution
Without rebooting
Additional: disable automatic kernel updates.
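One possible way to keep the kernel from being upgraded automatically is to put the kernel packages on hold. This is a minimal sketch that assumes a Debian or Ubuntu host using apt. The package names are the usual Ubuntu meta-packages and may differ on your system, and hold_kernel_packages is a hypothetical helper name.

```shell
# Sketch: pin kernel packages so automatic upgrades leave the kernel alone.
# Assumption: Debian/Ubuntu with apt; adjust the package names to your system.
hold_kernel_packages() {
    cmd="apt-mark hold linux-image-generic linux-headers-generic"
    # With DRY_RUN=1 (the default here) the command is only printed.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "sudo $cmd"
    else
        sudo $cmd
    fi
}
```

Run `DRY_RUN=0 hold_kernel_packages` to apply the hold. The same packages can be released later with `apt-mark unhold`.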