Advanced: Export

This advanced tutorial will guide you through various methods of downloading images and annotations from Supervisely. We'll cover everything from basic downloads to optimized approaches for large projects, with performance benchmarks to illustrate the benefits of different techniques.

Basic Downloads

Project Metadata

The easiest way to authenticate is to create a .env file that stores your SERVER_ADDRESS and API_TOKEN. This makes it simpler to initialize the API client as you work through the code snippets in this tutorial. You can learn more about this in this section.
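As a rough illustration of what such a file contains and how its values reach the environment, here is a minimal standard-library sketch. The helper `load_env_file`, the server URL, and the token are hypothetical placeholders; in practice a package such as python-dotenv is commonly used for this step instead of a hand-written parser.

```python
import os
import tempfile


def load_env_file(path):
    """Parse KEY=VALUE lines from a file and export them to os.environ."""
    loaded = {}
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            # Skip blank lines, comments, and lines without an assignment
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            loaded[key.strip()] = value.strip().strip('"').strip("'")
    os.environ.update(loaded)
    return loaded


# Write a throwaway .env file with placeholder values (not real credentials)
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as tmp:
    tmp.write('SERVER_ADDRESS="https://supervisely.example.com"\n')
    tmp.write('API_TOKEN="your-token-here"\n')
    env_path = tmp.name

env = load_env_file(env_path)
os.remove(env_path)
print(sorted(env))
```

Once both variables are present in the environment, sly.Api.from_env() can pick them up as in the snippets that follow.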

Project metadata contains essential information about your project, including classes, tags, and other configurations.

import supervisely as sly

# Initialize API client
api = sly.Api.from_env()

# Get project metadata with project settings by ID
project_id = 12345
project_meta_json = api.project.get_meta(project_id, with_settings=True)
project_meta = sly.ProjectMeta.from_json(project_meta_json)

# Display project classes and tags
print("Project Classes:")
for obj_class in project_meta.obj_classes:
    print(f"- {obj_class.name} ({obj_class.geometry_type.name()})")

print("\nProject Tags:")
for tag_meta in project_meta.tag_metas:
    print(f"- {tag_meta.name}")

Single Image and Annotation

Here's how to download a single image and its annotation:

import os
import supervisely as sly

api = sly.Api.from_env()

# Define project and dataset IDs
project_id = 12345
dataset_id = 67890

# Get first image info
image_infos = api.image.get_list(dataset_id)
image_info = image_infos[0]
image_id = image_info.id

# Download numpy array
img_np = api.image.download_np(image_id)
print(f"Downloaded image numpy array: {image_info.name}, shape: {img_np.shape}")

# or download image and save locally if needed
save_dir = "downloaded_data"
sly.fs.mkdir(save_dir)
save_path = os.path.join(save_dir, image_info.name)
api.image.download(image_id, save_path)
print(f"Downloaded image file: {image_info.name}, path: {save_path}")

# Download annotation in JSON format and save
ann_json = api.annotation.download_json(image_id)
save_path = os.path.join(save_dir, image_info.name + ".json")
sly.json.dump_json_file(ann_json, save_path)
print(f"Downloaded annotation file: {save_path}")

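# or convert to Annotation object (this requires the project metadata)
project_meta = sly.ProjectMeta.from_json(api.project.get_meta(project_id))
ann = sly.Annotation.from_json(ann_json, project_meta)
print(f"Downloaded annotation object with {len(ann.labels)} labels")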

Annotation JSON Format

Supervisely allows you to download annotations in JSON format, which is particularly useful for custom processing or integration with other tools.

  1. Flexibility: JSON format provides the raw data structure, allowing you to parse and process it according to your specific needs.

  2. Completeness: JSON format includes all metadata and additional information that might be stripped in specific export formats.

  3. Interoperability: JSON is a universal format that can be easily converted to other formats or used directly in various applications.

To learn more about Supervisely image annotation format, read the Image Annotation docs.
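To make the flexibility point concrete, the sketch below walks a small annotation dictionary and counts objects per class and per geometry type. The dictionary is a hand-written, trimmed example rather than real server output, and the helper `summarize_annotation` is hypothetical; real annotation files carry more fields than shown here.

```python
from collections import Counter

# Hand-written, trimmed example of an annotation; real files contain more fields
ann_json = {
    "description": "",
    "size": {"height": 375, "width": 500},
    "tags": [],
    "objects": [
        {"classTitle": "person", "geometryType": "rectangle",
         "points": {"exterior": [[10, 20], [110, 220]], "interior": []}},
        {"classTitle": "person", "geometryType": "polygon",
         "points": {"exterior": [[5, 5], [50, 5], [50, 60]], "interior": []}},
        {"classTitle": "dog", "geometryType": "rectangle",
         "points": {"exterior": [[200, 100], [300, 180]], "interior": []}},
    ],
}


def summarize_annotation(ann):
    """Count labeled objects by class title and by geometry type."""
    objects = ann.get("objects", [])
    class_counts = Counter(obj["classTitle"] for obj in objects)
    type_counts = Counter(obj["geometryType"] for obj in objects)
    return dict(class_counts), dict(type_counts)


class_counts, type_counts = summarize_annotation(ann_json)
print(class_counts)
print(type_counts)
```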

Batch Downloads

Multiple Images and Annotations

For better performance, download multiple images and annotations in batches. Almost all our methods that download multiple images or annotations at once use batches at a low level. The batch size is optimized for efficient operation across different instances and is set to 50.

import supervisely as sly
from tqdm import tqdm

api = sly.Api.from_env()

project_id = 12345
dataset_id = 67890

# Get image IDs from dataset
image_infos = api.image.get_list(dataset_id)
image_ids = [image_info.id for image_info in image_infos[:100]]  # First 100 images

# Download images. This method is optimized and will download all images batch by batch.
images_progress = tqdm(total=len(image_ids), desc="Downloading images")
images = api.image.download_nps(dataset_id, image_ids, progress_cb=images_progress)

# Download annotations for all images
annotation_progress = tqdm(total=len(image_ids), desc="Downloading annotations")
batch_anns = api.annotation.download_batch(dataset_id, image_ids, progress_cb=annotation_progress)
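The batching that these methods perform at a low level can be pictured with a small stand-alone helper. This is only a sketch of the idea, not the SDK's implementation; the function name `batched` and the ID range are illustrative assumptions.

```python
def batched(items, batch_size=50):
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# 120 placeholder image IDs split into batches of 50
image_ids = list(range(1, 121))
batches = list(batched(image_ids, batch_size=50))
batch_lengths = [len(batch) for batch in batches]
print(batch_lengths)  # [50, 50, 20]
```

Each batch corresponds to one request, so 120 images at a batch size of 50 need three requests instead of 120.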

Setting Batch Size

Batch size significantly affects download performance. Here's how to set it and understand its impact:

import time

from tqdm import tqdm

import supervisely as sly

api = sly.Api.from_env()

dataset_id = 67890

image_infos = api.image.get_list(dataset_id)
image_ids = [image_info.id for image_info in image_infos[:1000]]  # Test with 1000 images

# Test different batch sizes
batch_sizes = [10, 50, 100, 200]

for batch_size in batch_sizes:
    sly.api_constants.DOWNLOAD_BATCH_SIZE = batch_size
    progress_cb = tqdm(total=len(image_ids), desc=f"Download images: {batch_size}")
    start_time = time.monotonic()
    img_nps = api.image.download_nps(dataset_id, image_ids, progress_cb=progress_cb)
    elapsed_time = time.monotonic() - start_time
    print(f"Batch size: {batch_size}, Time: {elapsed_time:.2f} seconds")

The following results were obtained on the Pascal VOC 2012 dataset, which you can download from datasetninja.com.

| Batch size | Time (seconds) |
| --- | --- |
| 10 | 210 |
| 50 | 44 |
| 100 | 33 |
| 200 | 31 |

Batch size affects:

  • Network efficiency: Larger batches reduce overhead from multiple requests

  • Memory usage: Very large batches consume more RAM

  • Error handling: Smaller batches are easier to retry if errors occur

The optimal batch size depends on your network conditions, server load, and image sizes. Generally, batch sizes between 50 and 100 work well for most cases.
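The timings reported above can be turned into request counts, throughput, and relative speedup with a few lines of arithmetic. The sketch assumes the benchmark covered 1000 images, as in the test script; that figure is an assumption about the measured run, not something the results table states.

```python
import math

num_images = 1000  # assumption: the benchmark used the 1000 images from the script
timings = {10: 210, 50: 44, 100: 33, 200: 31}  # batch size -> seconds (reported above)

baseline = timings[10]
rows = []
for batch_size, seconds in timings.items():
    requests = math.ceil(num_images / batch_size)  # number of batch requests
    throughput = round(num_images / seconds, 1)    # images per second
    speedup = round(baseline / seconds, 2)         # relative to batch size 10
    rows.append((batch_size, requests, throughput, speedup))
    print(f"batch={batch_size:>3}  requests={requests:>3}  "
          f"img/s={throughput:>5}  speedup={speedup}x")
```

Under that assumption, going from 10 to 50 cuts the request count from 100 to 20 and gives most of the gain, while doubling from 100 to 200 improves the time only from 33 to 31 seconds.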

Entire Project Downloads

Downloading in Supervisely Format

To download a complete project, you can use the convenient download_fast function that handles all the details for you.

This function provides significant advantages over manual download approaches:

  • Automatically chooses between asynchronous downloading and the standard method

  • Downloads the complete structure with all metadata

  • Preserves the Supervisely format for easy re-import later

  • Automatically handles batching and resource management

  • Provides options for customizing exactly what gets downloaded

import supervisely as sly

api = sly.Api.from_env()

project_id = 12345
save_path = 'Pascal_VOC_2012'
sly.fs.mkdir(save_path)
sly.download_fast(
    api=api,
    project_id=project_id,
    dest_dir=save_path,
)

Read the signature of the download_fast function in the Python SDK

Downloading in Specific Formats

Data downloaded from Supervisely is initially exported in the native Supervisely format. For projects with thousands of small images, Supervisely also offers an optimized variant of this format that uses "blob files".

After downloading, you can convert data from the classic Supervisely format to other popular formats. The SDK provides built-in conversion utilities that transform your data into formats such as COCO, YOLO, Pascal VOC, and more.

Extended Supervisely Format with Blobs

This download method is only available for projects that were originally uploaded using the blob format.

import supervisely as sly

api = sly.Api.from_env()

project_id = 12345
output_dir = "blob_project"
sly.fs.mkdir(output_dir)

# Download project with blob files (much faster for projects with many small images)
sly.download_fast(
    api=api,
    project_id=project_id,
    dest_dir=output_dir,
    download_blob_files=True  # Important for blob images
)

The blob approach packages many small images into a single archive file, reducing filesystem operations and network requests.

For more detailed information about working with blob files, including how to upload and process blob-based projects, please refer to documentation on working with blob files.

You can also download a project in this format with the ecosystem application Export to Supervisely format: Blob.

After downloading classic Supervisely format, you can convert the data to popular formats like COCO, YOLO, or Pascal VOC:

import supervisely as sly

api = sly.Api.from_env()

project_id = 12345

# Define download path
output_dir = "downloaded_project"
sly.fs.mkdir(output_dir)

# Download the project in Supervisely format
print("Downloading project in Supervisely format...")
sly.download_fast(
    api=api,
    project_id=project_id,
    dest_dir=output_dir,
)
print(f"Project saved to: {output_dir}")

# After downloading, you can open the project to access its contents
project_fs = sly.Project(output_dir, sly.OpenMode.READ)

Then you can convert the project to other formats in two ways:

  • Using the sly.convert functions

  • Using the Project object

COCO format supports geometry types like rectangles, bitmaps, polygons, and graph nodes.

# 1. Convert to COCO format
print("\nConverting to COCO format...")
coco_output_dir = "coco_format"

sly.convert.project_to_coco(output_dir, coco_output_dir)
# or
project_fs.to_coco(coco_output_dir)
print(f"COCO format saved to: {coco_output_dir}")

You can also convert specific datasets:

# For example, to convert each dataset to Pascal VOC format:
pascal_output_dir = "pascal_voc_format"

for ds in project_fs.datasets:
    sly.convert.dataset_to_pascal_voc(dataset=ds, meta=project_fs.meta, dest_dir=pascal_output_dir)
    # or
    ds.to_pascal_voc(project_fs.meta, dest_dir=pascal_output_dir)

These conversion utilities make it easy to use your Supervisely data with other frameworks and tools without needing to implement custom converters.

Working with Datasets

Dataset Hierarchy

Supervisely supports hierarchical dataset structures. For details on working with projects that have hierarchical datasets, see the dedicated article Iterate over a project.

Here's how to navigate and work with them:

import supervisely as sly

api = sly.Api.from_env()

project_id = 172

# Use the tree() method to efficiently iterate through the dataset hierarchy
print("Dataset Hierarchy (using tree method):")
for parents, dataset in api.dataset.tree(project_id):
    # parents is a list of parent dataset names, empty list for root datasets
    indent = "  " * len(parents)
    if not parents:
        print(f"{indent}- {dataset.name} (ID: {dataset.id})")
    else:
        parent_path = " > ".join(parents)
        print(f"{indent}|- {dataset.name} (ID: {dataset.id}, Path: {parent_path})")

# Output will look like this:
# - DS1 (ID: 328)
#   |- DS1-1 (ID: 383, Path: DS1)
#     |- DS1-1-1 (ID: 384, Path: DS1 > DS1-1)
# - DS2 (ID: 382)
#   |- DS2-1 (ID: 385, Path: DS2)
#     |- DS2-1-1 (ID: 774, Path: DS2 > DS2-1)
#       |- DS2-1-1-1 (ID: 775, Path: DS2 > DS2-1 > DS2-1-1)

Downloading Specific Datasets

To download specific datasets from a project, you can use the convenient download_fast mentioned above.

When you specify a dataset ID, the function will:

  • Create the folder structure up to the parent dataset level

  • Download only the images and annotations for the specified dataset

  • Skip downloading images from parent datasets in the hierarchy

  • Skip downloading any nested child datasets that might exist under your specified dataset

If you need to download an entire branch of the dataset hierarchy (a dataset and all its nested children), you would need to provide all the relevant dataset IDs in the dataset_ids parameter.

import supervisely as sly

api = sly.Api.from_env()

project_id = 10
dataset_ids = [39]
save_path = 'Pascal_VOC_2012/train'
sly.fs.mkdir(save_path)
sly.download_fast(
    api=api,
    project_id=project_id,
    dest_dir=save_path,
    dataset_ids=dataset_ids,
)
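Since a whole branch requires every dataset ID in it, one way to gather those IDs is to walk the parent-child links. The sketch below works on plain records and assumes each dataset exposes an `id` and a `parent_id`; the `Dataset` tuple and the helper `collect_branch_ids` are hypothetical stand-ins, and the IDs mirror the example hierarchy printed earlier.

```python
from collections import namedtuple

# Hypothetical stand-in for dataset records; only id and parent_id matter here
Dataset = namedtuple("Dataset", ["id", "name", "parent_id"])

datasets = [
    Dataset(328, "DS1", None),
    Dataset(383, "DS1-1", 328),
    Dataset(384, "DS1-1-1", 383),
    Dataset(382, "DS2", None),
    Dataset(385, "DS2-1", 382),
    Dataset(774, "DS2-1-1", 385),
    Dataset(775, "DS2-1-1-1", 774),
]


def collect_branch_ids(datasets, root_id):
    """Return root_id followed by the IDs of all datasets nested under it."""
    children = {}
    for ds in datasets:
        children.setdefault(ds.parent_id, []).append(ds.id)

    branch, stack = [], [root_id]
    while stack:
        current = stack.pop()
        branch.append(current)
        # reversed() keeps siblings in their original order when popped
        stack.extend(reversed(children.get(current, [])))
    return branch


branch_ids = collect_branch_ids(datasets, root_id=385)
print(branch_ids)  # [385, 774, 775]
```

The resulting list could then be passed as the dataset_ids argument of download_fast.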

Dataset Images Asynchronous Downloads

Download Methods

For better performance, you can use asynchronous methods even in a synchronous context:

import supervisely as sly

api = sly.Api.from_env()
img_ids = [1, 2, 3]  # replace with real image IDs

coroutine = api.image.download_nps_async(img_ids)
img_nps = sly.run_coroutine(coroutine)

The table below lists various asynchronous methods available in the Supervisely SDK for downloading images in different formats and output types. These methods can significantly improve download performance compared to their synchronous counterparts, especially when working with large datasets.

| Method | Description |
| --- | --- |
| download_np_async | Downloads a single image as numpy array |
| download_nps_async | Downloads multiple images as numpy arrays |
| download_path_async | Downloads a single image to a specified path |
| download_paths_async | Downloads multiple images to specified paths |
| download_bytes_single_async | Downloads a single image as bytes |
| download_bytes_many_async | Downloads multiple images as bytes in parallel (one request per image) |
| download_bytes_generator_async | Downloads multiple images as bytes using a single batch request, yielding results through an async generator |
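For readers new to async generators, the pattern used by the generator-based method is consumed with `async for`. The sketch below uses a hypothetical stand-in, `fake_bytes_generator`, that yields (image ID, bytes) pairs; the exact shape of the items yielded by the SDK method is an assumption here and should be checked against the SDK reference.

```python
import asyncio


async def fake_bytes_generator(image_ids):
    """Hypothetical stand-in: yield (image_id, bytes) pairs one at a time."""
    for image_id in image_ids:
        await asyncio.sleep(0)  # pretend to wait for the next part of the response
        yield image_id, f"image-{image_id}".encode()


async def collect_sizes(image_ids):
    """Consume the generator and record the payload size per image."""
    sizes = {}
    async for image_id, data in fake_bytes_generator(image_ids):
        sizes[image_id] = len(data)  # each item can be processed as soon as it arrives
    return sizes


sizes = asyncio.run(collect_sizes([1, 22, 333]))
print(sizes)  # {1: 7, 22: 8, 333: 9}
```

The benefit of this style is that each image can be handled as it arrives rather than after the whole batch has been received.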

Performance Comparison

Let's compare synchronous and asynchronous download methods, using numpy array downloads as an example:

import supervisely as sly
from time import monotonic
from tqdm import tqdm

api = sly.Api.from_env()

dataset_id = 1037458

# Get image IDs for testing
image_infos = api.image.get_list(dataset_id)
img_ids = [info.id for info in image_infos[:1000]]  # Using first 1000 images

print(f"Testing download performance with {len(img_ids)} images...")

# 1. Synchronous download (one by one)
start_time = monotonic()
img_nps = []
progress = tqdm(total=len(img_ids), desc="Sync download")
for img_id in img_ids:
    img_nps.append(api.image.download_np(img_id))
    progress.update(1)
sync_time = monotonic() - start_time
print(f"Synchronous download took {sync_time:.2f} seconds")

Results were obtained on the Pascal VOC 2012 dataset, which you can download from datasetninja.com.

| Method | Description | Pros | Cons | Best For |
| --- | --- | --- | --- | --- |
| Single download | Download one image at a time | Simple to implement, minimal memory usage | Very slow for many images | Small projects, debugging |
| Batch download | Download images in groups | Better network utilization, simple API | Blocking operation | Medium-sized projects |
| Asynchronous download | Non-blocking parallel downloads | Highest performance, efficient resource usage | Limited by network/system performance | Large projects |

Using asynchronous downloads with proper concurrency control (via semaphores) enables you to get the best possible performance while managing system resource usage.
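The effect of a semaphore on concurrency can be seen in isolation with plain asyncio. In this sketch `fake_download` is a hypothetical coroutine that sleeps instead of calling the network, and the limit of 4 is an arbitrary example value; it shows the mechanism only and is not how the SDK implements its downloads.

```python
import asyncio


async def fake_download(image_id, semaphore, stats):
    """Simulate one download while tracking how many run at the same time."""
    async with semaphore:  # waits here when `limit` downloads are already active
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # simulated network latency
        stats["active"] -= 1
        return image_id * 2  # placeholder payload


async def download_all(image_ids, limit):
    semaphore = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}
    tasks = [fake_download(i, semaphore, stats) for i in image_ids]
    results = await asyncio.gather(*tasks)  # results keep the input order
    return results, stats["peak"]


results, peak = asyncio.run(download_all(list(range(20)), limit=4))
print(f"Downloaded {len(results)} items, peak concurrency: {peak}")
```

Raising the limit increases parallelism at the cost of more simultaneous connections and memory, which is why the value should match what your instance can handle.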

Advanced Annotation Downloads

Synchronous Annotation Downloads

First, let's measure the speedup from downloading annotations in batches. The batch size here is fixed at 50, a value chosen to work efficiently on any instance configuration.

import supervisely as sly
from time import monotonic
from tqdm import tqdm

api = sly.Api.from_env()

dataset_id = 1037458

# Get image IDs
image_infos = api.image.get_list(dataset_id)
img_ids = [info.id for info in image_infos[:1000]]  # Using first 1000 images

print(f"Testing annotation download performance with {len(img_ids)} images...")

# 1. Synchronous download (one by one)
start_time = monotonic()
anns = []
progress = tqdm(total=len(img_ids), desc="Sync download")
for img_id in img_ids:
    anns.append(api.annotation.download(img_id))
    progress.update(1)
sync_time = monotonic() - start_time
print(f"Synchronous annotations download took {sync_time:.2f} seconds")

Asynchronous Annotation Downloads

There are two methods for asynchronous annotation downloading that are used depending on the types of annotations in your dataset images.

| Method | Best For |
| --- | --- |
| download_batch_async | Standard annotations for normal-sized images |
| download_bulk_async | Small or simple annotations for smaller-sized images |

To apply these methods effectively, you can separate images into different lists based on their size information.
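A size-based split might look like the following sketch. It assumes each image record exposes `width` and `height`; the `ImageRecord` tuple, the helper `split_by_area`, and the 640x640 threshold are hypothetical choices for illustration, not values recommended by the SDK.

```python
from collections import namedtuple

# Hypothetical stand-in for image records; only id, width and height are used
ImageRecord = namedtuple("ImageRecord", ["id", "width", "height"])

SMALL_AREA_LIMIT = 640 * 640  # hypothetical threshold in pixels


def split_by_area(images, area_limit=SMALL_AREA_LIMIT):
    """Split image IDs into (small, regular) lists by pixel area."""
    small_ids, regular_ids = [], []
    for img in images:
        target = small_ids if img.width * img.height <= area_limit else regular_ids
        target.append(img.id)
    return small_ids, regular_ids


images = [
    ImageRecord(1, 320, 240),
    ImageRecord(2, 1920, 1080),
    ImageRecord(3, 640, 640),
    ImageRecord(4, 4000, 3000),
]
small_ids, regular_ids = split_by_area(images)
print(small_ids, regular_ids)  # [1, 3] [2, 4]
```

Following the table above, the small list would be a candidate for download_bulk_async and the regular list for download_batch_async.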

import supervisely as sly
from time import monotonic
from tqdm import tqdm
import asyncio

api = sly.Api.from_env()

dataset_id = 1037458

# Get image IDs
image_infos = api.image.get_list(dataset_id)
img_ids = [info.id for info in image_infos[:1000]]  # Using first 1000 images

print(f"Testing annotation download performance with {len(img_ids)} images...")

# Download multiple annotations in parallel (one request per image)
# Adjust the number of concurrent requests depending on your instance limitations
semaphore = asyncio.Semaphore(4)

progress = tqdm(total=len(img_ids), desc="Async batch download")
download_coroutine = api.annotation.download_batch_async(
    dataset_id,
    img_ids,
    progress_cb=progress,
    semaphore=semaphore,
)
start_time = monotonic()
anns = sly.run_coroutine(download_coroutine)
async_batch_time = monotonic() - start_time
print(f"Downloaded {len(anns)} annotations in {async_batch_time:.2f} seconds")

The following results were obtained on the Pascal VOC 2012 dataset, which you can download from datasetninja.com.

The benefits of asynchronous downloading:

  1. Parallel processing: Multiple batches can be downloaded simultaneously

  2. Better resource utilization: Network I/O doesn't block the application

  3. Improved throughput: Especially noticeable with many small files

  4. Reduced total processing time: Significant reduction for large datasets

Choosing the Right Async Method

For best performance, consider these guidelines:

  1. Use a semaphore to control concurrency (typically 5-20 concurrent downloads)

  2. The download_bulk_async method is generally fastest for datasets with many small annotations

  3. For complex annotations with alpha masks or large bitmaps, download_batch_async with a smaller semaphore value may work better

  4. When using ApiContext, the methods automatically use the project metadata to avoid redundant API calls:

# Optimize downloads with ApiContext
# (api, dataset_id and img_ids are defined as in the previous snippet)
project_id = api.dataset.get_info_by_id(dataset_id).project_id
project_meta = sly.ProjectMeta.from_json(api.project.get_meta(project_id))

with sly.ApiContext(api, dataset_id=dataset_id, project_id=project_id, project_meta=project_meta):
    annotations = sly.run_coroutine(api.annotation.download_bulk_async(dataset_id, img_ids))

Figures Download

When working with datasets that have large numbers of annotations, downloading figures in bulk can significantly improve performance. Supervisely provides dedicated API methods for this purpose.

The bulk figure download approach is particularly effective when:

  • You need to analyze annotation distribution without loading full data

  • You're developing a custom export pipeline to another format

  • You need to visualize or process specific types of annotations

  • Your dataset contains many images with hundreds or thousands of annotations

Understanding Figures vs Annotations

In Supervisely's data model:

  • Annotations contain all information about labeled objects, including tags and metadata

  • Figures represent the geometric shapes that define objects in images (rectangles, polygons, bitmaps, etc.)

For many ML tasks, you might only need the geometric information without all the associated metadata.

Basic Figures Download

FigureInfo represents detailed information about a figure: geometry, tags, metadata etc. Here's how to get FigureInfo for images in a dataset.

import supervisely as sly
from tqdm import tqdm

api = sly.Api.from_env()

# Define dataset ID
dataset_id = 254737

# Download all figures for a dataset
# Returns a dictionary where keys are image IDs and values are lists of figures
figures_dict = api.image.figure.download(dataset_id)
figure_ids = []
# Process each image's figures
for image_id, figures in figures_dict.items():
    print(f"Image ID: {image_id}, Number of figures: {len(figures)}")
    for figure in figures:
        print(f"  - Figure ID: {figure.id}, Class ID: {figure.class_id}, Type: {figure.geometry_type}")
        figure_ids.append(figure.id)

Optimized Figures Download

For large datasets, you can skip downloading the geometry data initially to speed up the process.

This is useful, for example, when you need to filter figures by class or geometry type: download the lightweight FigureInfos first, process them, and collect the IDs of the figures you actually need.

import supervisely as sly
from tqdm import tqdm
from supervisely.geometry.alpha_mask import AlphaMask

api = sly.Api.from_env()

# Define dataset ID
dataset_id = 254737

# Download only figures info without geometries
figures_dict = api.image.figure.download(dataset_id, skip_geometry=True)
# Collect figure IDs
figures_ids = []
for image_id, figures in figures_dict.items():
    for figure in figures:
        if figure.geometry_type == AlphaMask.name():
            figures_ids.append(figure.id)

print(f"Found {len(figures_ids)} AlphaMask figures in the dataset")
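The same lightweight records can also be used to look at the annotation distribution before fetching any geometry. The sketch below runs on hand-made records; the `FigureRecord` tuple, the helper `figure_distribution`, the IDs, and the geometry type strings are illustrative assumptions rather than real API output.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for lightweight figure records (no geometry attached)
FigureRecord = namedtuple("FigureRecord", ["id", "class_id", "geometry_type"])

# Same shape as the dictionary described above: image ID -> list of figures
figures_dict = {
    101: [FigureRecord(1, 7, "rectangle"), FigureRecord(2, 7, "alpha_mask")],
    102: [FigureRecord(3, 9, "polygon")],
    103: [FigureRecord(4, 7, "alpha_mask"), FigureRecord(5, 9, "alpha_mask")],
}


def figure_distribution(figures_by_image):
    """Count figures per geometry type and per class ID."""
    by_type, by_class = Counter(), Counter()
    for figures in figures_by_image.values():
        for fig in figures:
            by_type[fig.geometry_type] += 1
            by_class[fig.class_id] += 1
    return by_type, by_class


by_type, by_class = figure_distribution(figures_dict)
print(dict(by_type))
print(dict(by_class))
```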

Working with AlphaMask Geometries

For advanced cases like AlphaMask geometries, you'll need to handle the download separately:

# Then download only the geometries you need in batches
progress = tqdm(total=len(figures_ids), desc="Downloading geometries")
geometries = api.image.figure.download_geometries_batch(figures_ids, progress_cb=progress)

# Process geometries
for figure_id, geometry in zip(figures_ids, geometries):
    # Your processing code here
    pass

The bulk geometry download offers several advantages:

  1. Reduced network overhead: Only essential figure data is transferred

  2. Faster processing: Server-side filtering minimizes data transfer

  3. Lower memory usage: Only relevant geometry information is returned

  4. Simplified post-processing: Data is already in the required format

Advanced: Asynchronous Downloads

For even better performance with large datasets (for example, around 1.2 million figures in total), you can use asynchronous downloads:

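import supervisely as sly
from tqdm import tqdm
from supervisely.geometry.alpha_mask import AlphaMask

api = sly.Api.from_env()

# Define dataset ID
dataset_id = 254737

figures_dict = api.image.figure.download_fast(dataset_id)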

alpha_ids = []
for image_id, figures in figures_dict.items():
    for figure in figures:
        if figure.geometry_type == AlphaMask.name():
            alpha_ids.append(figure.id)

progress = tqdm(total=len(alpha_ids), desc="Downloading AlphaMask geometries")

# Download geometries asynchronously
download_coroutine = api.image.figure.download_geometries_batch_async(alpha_ids, progress_cb=progress)
geometries = sly.run_coroutine(download_coroutine)

print(f"Downloaded {len(geometries)} geometries")

Performance Tips for Figure Downloads

  1. Use skip_geometry=True when you only need figure metadata initially

  2. Process figures by type - some geometry types might need special handling

  3. Download geometries in batches (optimal batch size is typically 50-200)

  4. Use asynchronous methods for large datasets with many figures

  5. Consider memory constraints when downloading many complex geometries

Conclusion

When downloading data from Supervisely, choosing the right method can dramatically impact performance. Single downloads are simple but inefficient for large datasets, suitable only for debugging or working with a few images. Batch downloads offer a good balance of simplicity and performance for medium-sized projects, improving network utilization while remaining easy to implement. For large-scale projects with thousands of images or annotations, asynchronous downloads deliver the best performance - up to ~20x faster than sequential downloads - by efficiently utilizing network resources and processing multiple requests in parallel.

Remember to use semaphores to control concurrency and consider the specific characteristics of your data (image sizes, annotation complexity) when selecting a download method. By implementing the appropriate download strategy for your project's scale, you can significantly reduce processing time and improve overall workflow efficiency.
