Troubleshooting

Some of the problems you could run into when using the agent, along with their solutions.

Failed to initialize NVML: Unknown Error

This error affects any utility or library that uses NVML, such as PyTorch, nvidia-smi, or pynvml. It often shows up unexpectedly and prevents applications from using the GPU until it is fixed.

Quick solution

If the error appears only inside the container, restarting the container fixes it quickly. If the error also appears on the host when you run nvidia-smi, reboot the host with the reboot command.
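The choice between the two quick fixes can be sketched in Python. This is only an illustration and not part of the Supervisely tooling: the function name quick_fix is hypothetical, and it assumes you have already captured the text that nvidia-smi printed on the host and inside the container.

```python
NVML_ERROR = "Failed to initialize NVML"


def quick_fix(host_output: str, container_output: str) -> str:
    """Suggest a quick fix from nvidia-smi output on the host and in the container."""
    host_broken = NVML_ERROR in host_output
    container_broken = NVML_ERROR in container_output
    if host_broken:
        # The host itself cannot talk to the driver, so restarting a container will not help.
        return "reboot the host"
    if container_broken:
        # Only the container lost access to the GPU.
        return "restart the container"
    return "no NVML error detected"


if __name__ == "__main__":
    # Example: the host is healthy, the container reports the error.
    print(quick_fix("GPU 0 is visible", "Failed to initialize NVML: Unknown Error"))
```

The host is checked first because a broken host makes the container result irrelevant.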

Proper solution

  1. If the error has not appeared yet, you can check whether your system is affected by this problem:

    • run the agent Docker container on the host PC;

    • run sudo systemctl daemon-reload on the host;

    • run nvidia-smi inside the agent container and check whether it reports the NVML initialization error.

  2. Set the parameter "exec-opts": ["native.cgroupdriver=cgroupfs"] in the /etc/docker/daemon.json file.

~$ cat /etc/docker/daemon.json 
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    },
    "exec-opts": ["native.cgroupdriver=cgroupfs"]
}
  3. Restart Docker with sudo systemctl restart docker
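If you prefer to script the daemon.json change rather than edit the file by hand, the merge step might look like the sketch below. The helper name ensure_cgroupfs is hypothetical, and the sketch only transforms an already loaded configuration: reading and writing /etc/docker/daemon.json (which needs root) and restarting Docker are left to you.

```python
import json

CGROUP_OPT = "native.cgroupdriver=cgroupfs"


def ensure_cgroupfs(config: dict) -> dict:
    """Return a copy of a daemon.json config with the cgroupfs exec-opt set."""
    updated = dict(config)
    # Drop any other cgroup driver setting and keep unrelated exec-opts.
    opts = [
        opt for opt in updated.get("exec-opts", [])
        if not opt.startswith("native.cgroupdriver=")
    ]
    opts.append(CGROUP_OPT)
    updated["exec-opts"] = opts
    return updated


if __name__ == "__main__":
    # Hypothetical starting config without the exec-opts key.
    current = {"default-runtime": "nvidia"}
    print(json.dumps(ensure_cgroupfs(current), indent=4))
```

The function returns a new dictionary and leaves its input untouched, so you can compare the two before writing anything to disk.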

CUDA Out Of Memory Error

This error can appear in any training app.

Solution

  1. Check the amount of free GPU memory by running the nvidia-smi command in your machine's terminal. This shows how much GPU memory you need to free in order to train your machine learning model.

  2. Stop unnecessary app sessions in Supervisely: START button → App Sessions → click the Stop button next to every app session you no longer need.

  3. Stop unnecessary processes in your machine's terminal by running sudo kill <put_your_process_id_here>

  4. Select a lighter machine learning model. Check the "Memory" column in the model table: it shows how much GPU memory the model requires for training. If this information is not provided, use a simple rule: the higher the model is in the table, the lighter it is.

  5. Reduce the batch size or the model input resolution:

    • MMsegmentation image resolution/batch size

    • MMdetection v3 image resolution/batch size
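To make step 1 scriptable, you could query nvidia-smi in CSV form and compare the free memory with what your model needs. The sketch below assumes the output of nvidia-smi --query-gpu=memory.total,memory.free --format=csv,noheader,nounits (one line per GPU, values in MiB). The function names and the sample numbers are hypothetical.

```python
def parse_gpu_memory(csv_text: str) -> list:
    """Parse lines like '24576, 20000' (total MiB, free MiB), one line per GPU."""
    gpus = []
    lines = [line for line in csv_text.splitlines() if line.strip()]
    for index, line in enumerate(lines):
        total, free = (int(value.strip()) for value in line.split(","))
        gpus.append({"gpu": index, "total_mib": total, "free_mib": free})
    return gpus


def fits(gpus: list, required_mib: int) -> bool:
    """Return True if at least one GPU has enough free memory."""
    return any(gpu["free_mib"] >= required_mib for gpu in gpus)


if __name__ == "__main__":
    # Hypothetical output for a machine with two GPUs.
    sample = "24576, 20000\n24576, 512\n"
    gpus = parse_gpu_memory(sample)
    print(gpus)
    print("8 GiB model fits:", fits(gpus, 8192))
```

If fits returns False, continue with steps 2 to 5 to free memory or lower the model's requirements.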

Additional: stop a process via docker.

  1. Run docker ps. It prints a table of all Docker containers running on this machine.

  2. Run docker stop <put_your_container_id_here>
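When the docker ps table is long, filtering it can help you find the container to stop. The sketch below assumes tab-separated output produced by docker ps --format "{{.ID}}\t{{.Image}}\t{{.Names}}". The function name find_containers and the sample container data are hypothetical.

```python
def find_containers(ps_output: str, keyword: str) -> list:
    """Return IDs of containers whose image or name contains the keyword."""
    matches = []
    for line in ps_output.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        container_id, image, name = parts
        if keyword in image or keyword in name:
            matches.append(container_id)
    return matches


if __name__ == "__main__":
    # Hypothetical docker ps output with two running containers.
    sample = "abc123\texample/train-app:1.0\ttrain_session\ndef456\tredis:7\tcache\n"
    for container_id in find_containers(sample, "train"):
        print("docker stop", container_id)
```

The script only prints the docker stop commands, so you can review them before running anything.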

Can't start the docker container. Trying to use another runtime.

This message indicates that there was a problem using the Nvidia runtime. Most likely, the Nvidia driver failed after an automatic kernel update, a feature that is enabled by default on Ubuntu. Such an update often fails because the driver cannot be unloaded while it is in use.

Solution

Fast solution

The simplest way to fix this problem is to reboot the machine. After the reboot, the Nvidia driver is reloaded and the problem is fixed. If you don't want to reboot the machine, use the second solution.

Without rebooting

If nvidia-smi reports a driver/library version mismatch:

~$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch

You can fix it without rebooting or reinstalling the driver using these commands:

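# Kill every process that holds an Nvidia device file open (lsof -t prints PIDs only)
kill -9 $(lsof -t /dev/nvidia*)
sleep 5
# Unload the driver modules, then load them again (--all treats every argument as a module name)
modprobe --remove nvidia_uvm nvidia_drm nvidia_modeset nvidia drm_kms_helper drm
modprobe --all nvidia_uvm nvidia_drm nvidia_modeset nvidia drm_kms_helper drm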

The first command kills all processes that use the Nvidia driver, and the two modprobe commands unload and reload the driver. You might need to redeploy your agent on the machine after running these commands.

Additional: disable automatic kernel updates.

You can also run this command to disable automatic kernel updates:

sudo apt remove unattended-upgrades

If the driver reload commands above don't work for you (for example, some process keeps restarting automatically and prevents the driver from unloading properly), you can simply reboot the machine.

For the "Failed to initialize NVML: Unknown Error" problem, you can also try other official NVIDIA fixes to solve it for a specific Docker container, or dig deeper into it by reading this discussion or this official description.

MMsegmentation required memory
The lightest YOLOv8 model