Advanced: Export

This advanced tutorial will guide you through various methods of downloading images and annotations from Supervisely. We'll cover everything from basic downloads to optimized approaches for large projects, with performance benchmarks to illustrate the benefits of different techniques.

This tutorial uses Supervisely Python SDK version v6.73.349. The code examples provided are compatible with this specific version. Using the exact or newer version ensures you'll get the expected results. You can install it using:

pip install supervisely==6.73.349

Basic Downloads

Project Metadata

The easiest way is to create a .env file that stores your SERVER_ADDRESS and API_TOKEN. This makes it simpler to initialize the api client as you work through code snippets in this tutorial. You can learn more about this in this section

Project metadata contains essential information about your project, including classes, tags, and other configurations.

import supervisely as sly

# Initialize API client
api = sly.Api.from_env()

# Get project metadata with project settings by ID
project_id = 12345
project_meta_json = api.project.get_meta(project_id, with_settings=True)
project_meta = sly.ProjectMeta.from_json(project_meta_json)

# Display project classes and tags
print("Project Classes:")
for obj_class in project_meta.obj_classes:
    print(f"- {obj_class.name} ({obj_class.geometry_type.name()})")

print("\nProject Tags:")
for tag_meta in project_meta.tag_metas:
    print(f"- {tag_meta.name}")

Single Image and Annotation

Here's how to download a single image and its annotation:

import os
import supervisely as sly

api = sly.Api.from_env()

# Define project and dataset IDs
project_id = 12345
dataset_id = 67890

# Get first image info
image_infos = api.image.get_list(dataset_id)
image_info = image_infos[0]
image_id = image_info.id

# Download numpy array
img_np = api.image.download_np(image_id)
print(f"Downloaded image numpy array: {image_info.name}, shape: {img_np.shape}")

# or download image and save locally if needed
save_dir = "downloaded_data"
sly.fs.mkdir(save_dir)
save_path = os.path.join(save_dir, image_info.name)
api.image.download(image_id, save_path)
print(f"Downloaded image file: {image_info.name}, path: {save_path}")

# Download annotation in JSON format and save
ann_json = api.annotation.download_json(image_id)
save_path = os.path.join(save_dir, image_info.name + ".json")
sly.json.dump_json_file(ann_json, save_path)
print(f"Downloaded annotation file: {save_path}")

# or convert to Annotation object
ann = sly.Annotation.from_json(ann_json, project_meta)
print(f"Downloaded annotation object with {len(ann.labels)} labels")

Annotation JSON Format

Supervisely allows you to download annotations in JSON format, which is particularly useful for custom processing or integration with other tools.

Flexibility: JSON format provides the raw data structure, allowing you to parse and process it according to your specific needs.
Completeness: JSON format includes all metadata and additional information that might be stripped in specific export formats.
Interoperability: JSON is a universal format that can be easily converted to other formats or used directly in various applications.

To learn more about Supervisely image annotation format, read the Image Annotation docs.

Batch Downloads

Multiple Images and Annotations

For better performance, download multiple images and annotations in batches. Almost all our methods that download multiple images or annotations at once use batches at a low level. The batch size is optimized for efficient operation across different instances and is set to 50.

import supervisely as sly
from tqdm import tqdm

api = sly.Api.from_env()

project_id = 12345
dataset_id = 67890

# Get image IDs from dataset
image_infos = api.image.get_list(dataset_id)
image_ids = [image_info.id for image_info in image_infos[:100]]  # First 100 images

# Download images. This method is optimized and will download all images batch by batch.
images_progress = tqdm(total=len(image_ids), desc="Downloading images")
images = api.image.download_nps(dataset_id, image_ids, progress_cb=images_progress)

# Download annotations for all images,
annotation_progress = tqdm(total=len(image_ids), desc="Downloading annotations")
batch_anns = api.annotation.download_batch(dataset_id, image_ids, progress_cb=annotation_progress)

Setting Batch Size

Batch size significantly affects download performance. Here's how to set it and understand its impact:

import re
import time
from typing import List, Optional

from requests_toolbelt import MultipartDecoder
from tqdm import tqdm

import supervisely as sly
from supervisely.api.annotation_api import ApiField
from supervisely.imaging import image

api = sly.Api.from_env()

dataset_id = 67890

image_infos = api.image.get_list(dataset_id)
image_ids = [image_info.id for image_info in image_infos[:1000]]  # Test with 1000 images

# Test different batch sizes
batch_sizes = [10, 50, 100, 200]

for batch_size in batch_sizes:
    sly.api_constants.DOWNLOAD_BATCH_SIZE = batch_size
    progress_cb = tqdm(total=len(image_ids), desc=f"Download images: {batch_size}")
    start_time = time.monotonic()
    img_nps = api.image.download_nps(dataset_id, image_ids, progress_cb=progress_cb)
    elapsed_time = time.monotonic() - start_time
    print(f"Batch size: {batch_size}, Time: {elapsed_time:.2f} seconds")

Following results was obtained on Pascal VOC 2012 dataset which you could download from datasetninja.com

Batch size

Time (seconds)

210

100

200

Batch size affects:

Network efficiency: Larger batches reduce overhead from multiple requests
Memory usage: Very large batches consume more RAM
Error handling: Smaller batches are easier to retry if errors occur

The optimal batch size depends on your network conditions, server load, and image sizes. Generally, batch sizes between 50-100 work well for most cases.

Entire Project Downloads

Downloading in Supervisely Format

To download a complete project, you can use the convenient download_fast function that handles all the details for you.

This function provides significant advantages over manual download approaches:

Uses a smart approach to choose between asynchronous downloading or standard method
Downloads the complete structure with all metadata
Preserves the Supervisely format for easy re-import later
Automatically handles batching and resource management
Provides options for customizing exactly what gets downloaded

🚀 It works ~8x faster than the standard download method

import supervisely as sly

api = sly.Api.from_env()

project_id = 12345
save_path = 'Pascal_VOC_2012'
sly.fs.mkdir(save_path)
sly.download_fast(
    api=api,
    project_id=project_id,
    dest_dir=save_path,
)

Read the signature of the download_fast function in the Python SDK

Downloading in Specific Formats

When downloading data from Supervisely, it is initially exported in the native Supervisely format. For projects with thousands of small images, Supervisely offers an optimized approach using "blob files".

However, you can easily convert data from the classic Supervisely format to other popular formats immediately after downloading. The SDK provides built-in conversion utilities that make it simple to transform your data into formats like COCO, YOLO, Pascal VOC, and more.

Extended Supervisely Format with Blobs

This download method is only available for projects that were originally uploaded using the blob format.

import supervisely as sly

api = sly.Api.from_env()

project_id = 12345
output_dir = "blob_project"
sly.fs.mkdir(output_dir)

# Download project with blob files (much faster for projects with many small images)
sly.download_fast(
    api=api,
    project_id=project_id,
    dest_dir=output_dir,
    download_blob_files=True  # Important for blob images
)

The blob approach packages many small images into a single archive file, reducing filesystem operations and network requests.

☄️ Can be up to ~22x faster than standard downloads for projects with thousands of small images (under 100KB each).

For more detailed information about working with blob files, including how to upload and process blob-based projects, please refer to documentation on working with blob files.

You can also use the application from the ecosystem that will download a project of this format: Export to Supervisely format: Blob

Popular formats like COCO, YOLO, Pascal VOC etc.

After downloading classic Supervisely format, you can convert the data to popular formats like COCO, YOLO, or Pascal VOC:

import supervisely as sly

api = sly.Api.from_env()

project_id = 12345

# Define download path
output_dir = "downloaded_project"
sly.fs.mkdir(output_dir)

# Download the project in Supervisely format
print("Downloading project in Supervisely format...")
sly.download_fast(
    api=api,
    project_id=project_id,
    dest_dir=output_dir,
)
print(f"Project saved to: {output_dir}")

# After downloading, you can open the project to access its contents
project_fs = sly.Project(output_dir, sly.OpenMode.READ)

Then you can convert the project to other formats in two ways:

Using the sly.convert functions
Using the Project object

COCO format supports geometry types like rectangles, bitmaps, polygons, and graph nodes

# 1. Convert to COCO format
print("\nConverting to COCO format...")
coco_output_dir = "coco_format"

sly.convert.project_to_coco(output_dir, coco_output_dir)
# or
project_fs.to_coco(coco_output_dir)
print(f"COCO format saved to: {coco_output_dir}")

YOLO format supports:

Detection task: rectangles, bitmaps, polygons, graph nodes, polylines, alpha masks
Segmentation task: polygons, bitmaps, alpha masks - Pose estimation task: graph nodes

# 2. Convert to YOLO format for detection
print("\nConverting to YOLO format for detection...")
yolo_output_dir = "yolo_format"

sly.convert.project_to_yolo(output_dir, yolo_output_dir, task_type="detect")
# or
project_fs.to_yolo(yolo_output_dir, task_type="detect")

print(f"YOLO format saved to: {yolo_output_dir}")

Pascal VOC format supports standard Pascal VOC annotation structure

# 3. Convert to Pascal VOC format
print("\nConverting to Pascal VOC format...")
pascal_output_dir = "pascal_voc_format"

sly.convert.project_to_pascal_voc(output_dir, pascal_output_dir)
# or
project_fs.to_pascal_voc(pascal_output_dir)
print(f"Pascal VOC format saved to: {pascal_output_dir}")

You can also convert specific datasets

# For example, to convert a specific dataset to Pascal VOC format:
for ds in project_fs.datasets:
    sly.convert.dataset_to_pascal_voc(dataset=ds, meta=project_fs.meta, dest_dir=pascal_output_dir)
    # or
    ds.to_pascal_voc(project_fs.meta, dest_dir=pascal_output_dir)

These conversion utilities make it easy to use your Supervisely data with other frameworks and tools without needing to implement custom converters.

Working with Datasets

Dataset Hierarchy

Supervisely supports hierarchical dataset structures. See the special article that explains how to work with projects that have hierarchical datasets - Iterate over a project

Here's how to navigate and work with them:

import supervisely as sly

api = sly.Api.from_env()

project_id = 172

# Use the tree() method to efficiently iterate through the dataset hierarchy
print("Dataset Hierarchy (using tree method):")
for parents, dataset in api.dataset.tree(project_id):
    # parents is a list of parent dataset names, empty list for root datasets
    indent = "  " * len(parents)
    if not parents:
        print(f"{indent}- {dataset.name} (ID: {dataset.id})")
    else:
        parent_path = " > ".join(parents)
        print(f"{indent}|- {dataset.name} (ID: {dataset.id}, Path: {parent_path})")

# Output will look like this:
# - DS1 (ID: 328)
#   |- DS1-1 (ID: 383, Path: DS1)
#     |- DS1-1-1 (ID: 384, Path: DS1 > DS1-1)
# - DS2 (ID: 382)
#   |- DS2-1 (ID: 385, Path: DS2)
#     |- DS2-1-1 (ID: 774, Path: DS2 > DS2-1)
#       |- DS2-1-1-1 (ID: 775, Path: DS2 > DS2-1 > DS2-1-1)

import json
import supervisely as sly

api = sly.Api.from_env()

project_id = 172

# Get the dataset tree as a dictionary structure
original_tree = api.dataset.get_tree(project_id)
# Convert tree to use dataset `[ID] Name` as keys instead of DatasetInfo objects for better representation
def convert_tree_to_id_keys(tree: dict) -> dict:
    id_tree = {}
    for dataset_info, children in tree.items():
        id_tree[f"[{dataset_info.id}] {dataset_info.name}"] = {
            "children": convert_tree_to_id_keys(children) if children else {}
        }
    return id_tree
dataset_tree = convert_tree_to_id_keys(original_tree)
# You can now navigate this tree structure programmatically
print("\nDataset Tree Structure (using get_tree method): ")
print(json.dumps(dataset_tree, indent=2))

# Output will look like this:
# {
#   "[328] DS1": {
#     "children": {
#       "[383] DS1-1": {
#         "children": {
#           "[384] DS1-1-1": {
#             "children": {}
#           }
#         }
#       }
#     }
#   },
#   "[382] DS2": {
#     "children": {
#       "[385] DS2-1": {
#         "children": {
#           "[774] DS2-1-1": {
#             "children": {
#               "[775] DS2-1-1-1": {
#                 "children": {}
#               }
#             }
#           }
#         }
#       }

import supervisely as sly

api = sly.Api.from_env()

project_id = 172

# Get all datasets including nested ones with recursive=True
print("\nAll Datasets (Flat List):")
all_datasets = api.dataset.get_list(project_id, recursive=True)
for ds in all_datasets:
    parent_info = f"(Parent ID: {ds.parent_id})" if ds.parent_id is not None else "(Root)"
    print(f"- [{ds.id}] {ds.name} {parent_info}")

# Output will look like this:
# - [328] DS1 (Root)
# - [382] DS2 (Root)
# - [383] DS1-1 (Parent ID: 328)
# - [384] DS1-1-1 (Parent ID: 383)
# - [385] DS2-1 (Parent ID: 382)
# - [774] DS2-1-1 (Parent ID: 385)
# - [775] DS2-1-1-1 (Parent ID: 774)

import supervisely as sly

api = sly.Api.from_env()

project_id = 172
dataset_id = 385  # Specify parent dataset ID

# Get only nested datasets of a specific parent dataset
print(f"\nNested Datasets for Dataset ID {dataset_id}:")
nested_datasets = api.dataset.get_nested(project_id, dataset_id)
for ds in nested_datasets:
    print(f"- [{ds.id}] {ds.name} (Parent ID: {ds.parent_id})")

# Output will look like this:
# - [774] DS2-1-1 (Parent ID: 385)
# - [775] DS2-1-1-1 (Parent ID: 774)

Downloading Specific Datasets

To download specific datasets from a project, you can use the convenient download_fast mentioned above.

When you specify a dataset ID, the function will:

Create the folder structure up to the parent dataset level
Download only the images and annotations for the specified dataset
Skip downloading images from parent datasets in the hierarchy
Skip downloading any nested child datasets that might exist under your specified dataset

If you need to download an entire branch of the dataset hierarchy (a dataset and all its nested children), you would need to provide all the relevant dataset IDs in the dataset_ids parameter.

import supervisely as sly

api = sly.Api.from_env()

project_id = 10
dataset_ids = [39]
save_path = 'Pascal_VOC_2012/train'
sly.fs.mkdir(save_path)
sly.download_fast(
    api=api,
    project_id=project_id,
    dest_dir=save_path,
    dataset_ids=dataset_ids,
)

Dataset Images Asynchronous Downloads

Download Methods

For better performance, you can use asynchronous methods even in a synchronous context:

import supervisely as sly

coroutine = download_nps_async(img_ids)
img_nps = sly.run_coroutine(coroutine)

The table below lists various asynchronous methods available in the Supervisely SDK for downloading images in different formats and output types. These methods can significantly improve download performance compared to their synchronous counterparts, especially when working with large datasets.

Method

Description

download_np_async

Downloads a single image as numpy array

download_nps_async

Downloads multiple images as numpy arrays

download_path_async

Downloads a single image to a specified path

download_paths_async

Downloads multiple images to specified paths

download_bytes_single_async

Downloads a single image as bytes

download_bytes_many_async

Downloads multiple images as bytes in parallel (one request per image)

download_bytes_generator_async

Downloads multiple images as bytes using a single batch request, yielding results through an async generator

Performance Comparison

Let's compare synchronous and asynchronous download methods, for example as numpy array:

import supervisely as sly
from time import monotonic
from tqdm import tqdm

api = sly.Api.from_env()

dataset_id = 1037458

# Get image IDs for testing
image_infos = api.image.get_list(dataset_id)
img_ids = [info.id for info in image_infos[:1000]]  # Using first 1000 images

print(f"Testing download performance with {len(img_ids)} images...")

# 1. Synchronous download (one by one)
start_time = monotonic()
img_nps = []
progress = tqdm(total=len(img_ids), desc="Sync download")
for img_id in img_ids:
    img_nps.append(api.image.download_np(img_id))
    progress.update(1)
sync_time = monotonic() - start_time
print(f"Synchronous download took {sync_time:.2f} seconds")

import supervisely as sly
from time import monotonic
from tqdm import tqdm

api = sly.Api.from_env()

dataset_id = 1037458

# Get image IDs for testing
image_infos = api.image.get_list(dataset_id)
img_ids = [info.id for info in image_infos[:1000]]  # Using first 1000 images

print(f"Testing download performance with {len(img_ids)} images...")

# 2. Batch synchronous download with predefined batch sizes: 50
start_time = monotonic()
progress = tqdm(total=len(img_ids), desc=f"Batch download")
img_nps = api.image.download_nps(dataset_id, img_ids, progress_cb=progress)
batch_time = monotonic() - start_time
print(f"Batch download took {batch_time:.2f} seconds")

import supervisely as sly
from time import monotonic
from tqdm import tqdm

api = sly.Api.from_env()

dataset_id = 1037458

# Get image IDs for testing
image_infos = api.image.get_list(dataset_id)
img_ids = [info.id for info in image_infos[:1000]]  # Using first 1000 images

print(f"Testing download performance with {len(img_ids)} images...")

# 3. Asynchronous download
start_time = monotonic()
progress = tqdm(total=len(img_ids), desc="Async download")
# Run async function in synchronous context
img_nps = sly.run_coroutine(api.image.download_nps_async(img_ids))
async_time = monotonic() - start_time
print(f"Asynchronous download took {async_time:.2f} seconds")

# Calculate speedups
print("\nSpeedup compared to synchronous download:")

batch_speedup = sync_time / batch_time
print(f"Batch: {batch_speedup:.2f}x faster")

async_speedup = sync_time / async_time
print(f"Asynchronous: {async_speedup:.2f}x faster")

Images

The performance improvement from synchronous to batch to asynchronous methods can be dramatic:

Batch: ~2.5x speedup
🚀 Asynchronous: ~15x speedup

Results was obtained on Pascal VOC 2012 dataset which you could download from datasetninja.com

Method

Description

Pros

Cons

Best For

Single download

Download one image at a time

Simple to implement, minimal memory usage

Very slow for many images

Small projects, debugging

Batch download

Download images in groups

Better network utilization, simple API

Blocking operation

Medium-sized projects

Asynchronous download

Non-blocking parallel downloads

Highest performance, efficient resource usage

Limited by network/system performance

Large projects

Using asynchronous downloads with proper concurrency control (via semaphores) enables you to get the best possible performance while managing system resource usage.

Advanced Annotation Downloads

Synchronous Annotation Downloads

First, we'll compare what speed increase we get when downloading annotations in batches with a fixed size of 50. This size remains constant since an optimal value has been chosen that will work efficiently for any instance configuration.

import supervisely as sly
from time import monotonic
from tqdm import tqdm

api = sly.Api.from_env()

dataset_id = 1037458

# Get image IDs
image_infos = api.image.get_list(dataset_id)
img_ids = [info.id for info in image_infos[:1000]]  # Using first 1000 images

print(f"Testing annotation download performance with {len(img_ids)} images...")

# 1. Synchronous download (one by one)
start_time = monotonic()
anns = []
progress = tqdm(total=len(img_ids), desc="Sync download")
for img_id in img_ids:
    anns.append(api.annotation.download(img_id))
    progress.update(1)
sync_time = monotonic() - start_time
print(f"Synchronous annotations download took {sync_time:.2f} seconds")

import supervisely as sly
from time import monotonic
from tqdm import tqdm

api = sly.Api.from_env()

dataset_id = 1037458

# Get image IDs
image_infos = api.image.get_list(dataset_id)
img_ids = [info.id for info in image_infos[:1000]]  # Using first 1000 images

print(f"Testing annotation download performance with {len(img_ids)} images...")

# 2. Batch synchronous download with predefined batch sizes: 50
start_time = monotonic()
progress = tqdm(total=len(img_ids), desc=f"Batch download")
anns = api.annotation.download_batch(dataset_id, img_ids, progress_cb=progress)
batch_time = monotonic() - start_time
print(f"Batch download took {batch_time:.2f} seconds")

Asynchronous Annotation Downloads

There are two methods for asynchronous annotation downloading that are used depending on the types of annotations in your dataset images.

Method

Best For

download_batch_async

Standard annotations for normal-sized images

download_bulk_async

Small or simple annotations for smaller-sized images

To apply these methods effectively, you can separate images into different lists based on their size information.

import supervisely as sly
from time import monotonic
from tqdm import tqdm
import asyncio

api = sly.Api.from_env()

dataset_id = 1037458

# Get image IDs
image_infos = api.image.get_list(dataset_id)
img_ids = [info.id for info in image_infos[:1000]]  # Using first 1000 images

print(f"Testing annotation download performance with {len(img_ids)} images...")

# Method 3: Download multiple annotations in parallel (one request per image)
# Adjust the number of concurrent requests depending on your instance limitations
semaphore = asyncio.Semaphore(4)

progress = tqdm(total=len(img_ids), desc=f"Async Batch download")
download_coroutine = api.annotation.download_batch_async(
    dataset_id,
    img_ids,
    progress_cb=progress,
    semaphore=semaphore,
)
start_time = monotonic()
anns = sly.run_coroutine(download_coroutine)
async_batch_time = monotonic() - start_time
print(f"Downloaded {len(anns)} annotations in {async_batch_time:.2f} seconds")

import supervisely as sly
from time import monotonic
from tqdm import tqdm
import asyncio

api = sly.Api.from_env()

dataset_id = 1037458

# Get image IDs
image_infos = api.image.get_list(dataset_id)
img_ids = [info.id for info in image_infos[:1000]]  # Using first 1000 images

print(f"Testing annotation download performance with {len(img_ids)} images...")

# Method 4: Download multiple annotations in parallel in batches
tasks = []
progress = tqdm(total=len(img_ids), desc=f"Async Bilk download")
for batch in sly.batched(img_ids):
    download_coroutine = api.annotation.download_bulk_async(
        dataset_id,
        batch,
        progress_cb=progress,
    )
    tasks.append(download_coroutine)

start_time = monotonic()
anns_batches = sly.run_coroutine(asyncio.gather(*tasks))
anns = []
for batch_result in anns_batches:
    anns.extend(batch_result)
async_bulk_time = monotonic() - start_time
print(f"Downloaded {len(anns)} annotations in {async_bulk_time:.2f} seconds")

# Calculate speedups
async_batch_speedup = sync_time / async_batch_time
print(f"Speedup of async batch download: {async_batch_speedup:.2f}x faster")
async_bulk_speedup = sync_time / async_bulk_time
print(f"Speedup of async batch download: {async_bulk_speedup:.2f}x faster")

Following results was obtained on Pascal VOC 2012 dataset which you could download from datasetninja.com

Annotations

The performance improvement from synchronous to batch to asynchronous methods:

Batch: ~3.5x speedup
Asynchronous: ~8x speedup
🧨 Asynchronous batch: ~19x speedup

Your specific speedup may differ from these benchmarks depending on: number of annotations on images, complexity of annotations, image size (annotation size), network conditions, server load.

The benefits of asynchronous downloading:

Parallel processing: Multiple batches can be downloaded simultaneously
Better resource utilization: Network I/O doesn't block the application
Improved throughput: Especially noticeable with many small files
Reduced total processing time: Significant reduction for large datasets

Choosing the Right Async Method

For best performance, consider these guidelines:

Use semaphore to control concurrency (typically 5-20 concurrent downloads)
The download_bulk_async method is generally fastest for datasets with many small annotations
For complex annotations with alpha masks or large bitmaps, download_batch_async with a smaller semaphore value may work better
When using ApiContext, the methods automatically use the project metadata to avoid redundant API calls:

# Optimize downloads with ApiContext
project_id = api.dataset.get_info_by_id(dataset_id).project_id
project_meta = api.project.get_meta(project_id)

with sly.ApiContext(api, dataset_id=dataset_id, project_id=project_id, project_meta=project_meta):
    annotations = sly.run_coroutine(download_bulk_async())

Figures Download

When working with datasets that have large numbers of annotations, downloading figures in bulk can significantly improve performance. Supervisely provides dedicated API methods for this purpose.

The bulk figure download approach is particularly effective when:

You need to analyze annotation distribution without loading full data
You're developing a custom export pipeline to another format
You need to visualize or process specific types of annotations
Your dataset contains many images with hundreds or thousands of annotations

Understanding Figures vs Annotations

In Supervisely's data model:

Annotations contain all information about labeled objects, including tags and metadata
Figures represent the geometric shapes that define objects in images (rectangles, polygons, bitmaps, etc.)

For many ML tasks, you might only need the geometric information without all the associated metadata.

Basic Figures Download

FigureInfo represents detailed information about a figure: geometry, tags, metadata etc. Here's how to get FigureInfo for images in a dataset.

import supervisely as sly
from tqdm import tqdm

api = sly.Api.from_env()

# Define dataset ID
dataset_id = 254737

# Download all figures for a dataset
# Returns a dictionary where keys are image IDs and values are lists of figures
figures_dict = api.image.figure.download(dataset_id)
figure_ids = []
# Process each image's figures
for image_id, figures in figures_dict.items():
    print(f"Image ID: {image_id}, Number of figures: {len(figures)}")
    for figure in figures:
        print(f"  - Figure ID: {figure.id}, Class ID: {figure.class_id}, Type: {figure.geometry_type}")
        figure_ids.append(figure.id)

Optimized Figures Download

For large datasets, you can skip downloading the geometry data initially to speed up the process.

For example, when you need to filter figures by class. You download lightweight FigureInfos, process it, and get a list of figures you need.

import supervisely as sly
from tqdm import tqdm
from supervisely.geometry.alpha_mask import AlphaMask

api = sly.Api.from_env()

# Define dataset ID
dataset_id = 254737

# Download only figures info without geometries
figures_dict = api.image.figure.download(dataset_id, skip_geometry=True)
# Collect figure IDs
figures_ids = []
for image_id, figures in figures_dict.items():
    for figure in figures:
        if figure.geometry_type == AlphaMask.name():
            figures_ids.append(figure.id)

print(f"Found {len(figures_ids)} AlphaMask figures in the dataset")

Working with AlphaMask Geometries

For advanced cases like AlphaMask geometries, you'll need to handle the download separately:

# Then download only the geometries you need in batches
progress = tqdm(total=len(figures_ids), desc="Downloading geometries")
geometries = api.image.figure.download_geometries_batch(figures_ids, progress_cb=progress)

# Process geometries
for figure_id, geometry in zip(figures_ids, geometries):
    # Your processing code here
    pass

The bulk geometry download offers several advantages:

Reduced network overhead: Only essential figure data is transferred
Faster processing: Server-side filtering minimizes data transfer
Lower memory usage: Only relevant geometry information is returned
Simplified post-processing: Data is already in the required format

Advanced: Asynchronous Downloads

For even better performance with large datasets (containing approximately 1.2 million figures in total), you can use asynchronous downloads:

import supervisely as sly

api = sly.Api.from_env()

figures_dict = api.image.figure.download_fast(dataset_id)

alpha_ids = []
for image_id, figures in figures_dict.items():
    for figure in figures:
        if figure.geometry_type == AlphaMask.name():
            alpha_ids.append(figure.id)

progress = tqdm(total=len(alpha_ids), desc="Downloading AlphaMask geometries")

# Download geometries asynchronously
download_coroutine = api.image.figure.download_geometries_batch_async(alpha_ids, progress_cb=progress)
geometries = sly.run_coroutine(download_coroutine)

print(f"Downloaded {len(geometries)} geometries")

Figures

The performance improvement from synchronous to asynchronous method:

Synchronous without geometries: ~1.3x
💪 Asynchronous: ~5x
💪 Asynchronous without geometries: ~6x

Performance Tips for Figure Downloads

Use skip_geometry=True when you only need figure metadata initially
Process figures by type - some geometry types might need special handling
Download geometries in batches (optimal batch size is typically 50-200)
Use asynchronous methods for large datasets with many figures
Consider memory constraints when downloading many complex geometries

Conclusion

When downloading data from Supervisely, choosing the right method can dramatically impact performance. Single downloads are simple but inefficient for large datasets, suitable only for debugging or working with a few images. Batch downloads offer a good balance of simplicity and performance for medium-sized projects, improving network utilization while remaining easy to implement. For large-scale projects with thousands of images or annotations, asynchronous downloads deliver the best performance - up to ~20x faster than sequential downloads - by efficiently utilizing network resources and processing multiple requests in parallel.

Remember to use semaphores to control concurrency and consider the specific characteristics of your data (image sizes, annotation complexity) when selecting a download method. By implementing the appropriate download strategy for your project's scale, you can significantly reduce processing time and improve overall workflow efficiency.

PreviousAdvanced: Optimized Import NextAI Search

Last updated 7 months ago

Was this helpful?