Advanced: Export
This advanced tutorial will guide you through various methods of downloading images and annotations from Supervisely. We'll cover everything from basic downloads to optimized approaches for large projects, with performance benchmarks to illustrate the benefits of different techniques.
This tutorial uses Supervisely Python SDK version 6.73.349, and the code examples were written against that version. Use this version or a newer one to get the expected results. You can install it using:
pip install supervisely==6.73.349
Basic Downloads
Project Metadata
Project metadata contains essential information about your project, including classes, tags, and other configurations.
import supervisely as sly
# Initialize API client
api = sly.Api.from_env()
# Get project metadata with project settings by ID
project_id = 12345
project_meta_json = api.project.get_meta(project_id, with_settings=True)
project_meta = sly.ProjectMeta.from_json(project_meta_json)
# Display project classes and tags
print("Project Classes:")
for obj_class in project_meta.obj_classes:
    print(f"- {obj_class.name} ({obj_class.geometry_type.name()})")
print("\nProject Tags:")
for tag_meta in project_meta.tag_metas:
    print(f"- {tag_meta.name}")
Single Image and Annotation
Here's how to download a single image and its annotation:
Annotation JSON Format
Supervisely allows you to download annotations in JSON format, which is particularly useful for custom processing or integration with other tools.
Flexibility: JSON format provides the raw data structure, allowing you to parse and process it according to your specific needs.
Completeness: JSON format includes all metadata and additional information that might be stripped in specific export formats.
Interoperability: JSON is a universal format that can be easily converted to other formats or used directly in various applications.
To learn more about Supervisely image annotation format, read the Image Annotation docs.
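As a rough illustration of the custom processing that raw JSON allows, the sketch below counts labeled objects per class. The sample dictionary is hand-written for this example and is not a real export. It assumes the annotation layout described in the Image Annotation docs, with an `objects` list whose entries carry `classTitle` and `geometryType`. The helper name `count_objects_by_class` is hypothetical.

```python
from collections import Counter
from typing import Any, Dict

# Hand-written sample that mimics the assumed annotation JSON layout.
sample_annotation: Dict[str, Any] = {
    "description": "",
    "tags": [],
    "size": {"height": 375, "width": 500},
    "objects": [
        {
            "classTitle": "person",
            "geometryType": "rectangle",
            "points": {"exterior": [[10, 20], [110, 220]], "interior": []},
        },
        {
            "classTitle": "dog",
            "geometryType": "polygon",
            "points": {"exterior": [[5, 5], [50, 5], [50, 60]], "interior": []},
        },
        {
            "classTitle": "person",
            "geometryType": "rectangle",
            "points": {"exterior": [[200, 30], [300, 330]], "interior": []},
        },
    ],
}


def count_objects_by_class(ann_json: Dict[str, Any]) -> Counter:
    """Count labeled objects per class title in one annotation dictionary."""
    return Counter(obj["classTitle"] for obj in ann_json.get("objects", []))


class_counts = count_objects_by_class(sample_annotation)

if __name__ == "__main__":
    for class_title, count in sorted(class_counts.items()):
        print(f"{class_title}: {count}")
```

Because the annotation is a plain dictionary, the same pattern extends to filtering by `geometryType` or collecting tags without any SDK-specific objects.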
Batch Downloads
Multiple Images and Annotations
For better performance, download multiple images and annotations in batches. Almost all our methods that download multiple images or annotations at once use batches at a low level. The batch size is optimized for efficient operation across different instances and is set to 50.
Setting Batch Size
Batch size significantly affects download performance. Here's how to set it and understand its impact:
The following results were obtained on the Pascal VOC 2012 dataset, which you can download from datasetninja.com.
Batch size | Download time
10 | 210
50 | 44
100 | 33
200 | 31
Batch size affects:
Network efficiency: Larger batches reduce overhead from multiple requests
Memory usage: Very large batches consume more RAM
Error handling: Smaller batches are easier to retry if errors occur
The optimal batch size depends on your network conditions, server load, and image sizes. Generally, batch sizes between 50 and 100 work well for most cases.
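To make the trade-off concrete, here is a minimal, SDK-independent sketch. It splits a list of image IDs into batches, counts the requests each batch size would need, and derives relative speedups from the four measurements quoted above. The helpers `batched` and `request_count` and the image IDs are hypothetical names introduced for illustration.

```python
import math
from typing import Iterator, List, Sequence


def batched(items: Sequence[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def request_count(total_items: int, batch_size: int) -> int:
    """Number of batch requests needed to cover `total_items`."""
    return math.ceil(total_items / batch_size)


# Measurements quoted above: batch size -> download time.
measured = {10: 210, 50: 44, 100: 33, 200: 31}
baseline = measured[10]
speedups = {size: round(baseline / t, 2) for size, t in measured.items()}

if __name__ == "__main__":
    image_ids = list(range(1, 121))  # hypothetical image IDs
    for size in measured:
        requests = request_count(len(image_ids), size)
        print(f"batch={size:>3}  requests={requests:>2}  "
              f"speedup vs batch=10: {speedups[size]}x")
```

With these numbers, moving from a batch size of 10 to 50 is about 4.8x faster, while doubling from 100 to 200 only improves the time from 33 to 31. That diminishing return is why a moderate batch size is usually sufficient.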
Entire Project Downloads
Downloading in Supervisely Format
To download a complete project, you can use the convenient download_fast function that handles all the details for you.
This function provides significant advantages over manual download approaches:
Uses a smart approach to choose between asynchronous downloading or standard method
Downloads the complete structure with all metadata
Preserves the Supervisely format for easy re-import later
Automatically handles batching and resource management
Provides options for customizing exactly what gets downloaded
It works ~8x faster than the standard download method
Read the signature of the download_fast function in the Python SDK
Downloading in Specific Formats
Data downloaded from Supervisely is initially exported in the native Supervisely format. For projects with thousands of small images, Supervisely also offers an optimized variant of this format based on "blob files".
You can convert data from the classic Supervisely format to other popular formats immediately after downloading. The SDK provides built-in conversion utilities for formats such as COCO, YOLO, Pascal VOC, and others.
Extended Supervisely Format with Blobs
This download method is only available for projects that were originally uploaded using the blob format.
The blob approach packages many small images into a single archive file, reducing filesystem operations and network requests.
Can be up to ~22x faster than standard downloads for projects with thousands of small images (under 100KB each).
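The general idea of packing many small files into one archive can be sketched with the standard library alone. This is only an illustration of the technique using an in-memory tar archive. It does not reproduce Supervisely's actual blob layout, and the helpers `pack_images` and `unpack_images` are hypothetical.

```python
import io
import tarfile
from typing import Dict


def pack_images(images: Dict[str, bytes]) -> bytes:
    """Pack many small files into a single uncompressed tar archive in memory."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, data in images.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def unpack_images(blob: bytes) -> Dict[str, bytes]:
    """Read every regular file back out of the archive."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as archive:
        return {
            member.name: archive.extractfile(member).read()
            for member in archive.getmembers()
            if member.isfile()
        }


if __name__ == "__main__":
    # Fake payloads standing in for small image files.
    small_images = {f"img_{i:04d}.jpg": bytes([i % 256]) * 32 for i in range(1000)}
    blob = pack_images(small_images)
    print(f"{len(small_images)} files packed into one blob of {len(blob)} bytes")
```

One archive means one transfer and one file on disk instead of thousands, which is where the savings in network requests and filesystem operations come from.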
For more detailed information about working with blob files, including how to upload and process blob-based projects, please refer to documentation on working with blob files.
You can also use the Ecosystem application "Export to Supervisely format: Blob" to download a project in this format.
Popular formats like COCO, YOLO, Pascal VOC etc.
After downloading a project in the classic Supervisely format, you can convert it to popular formats such as COCO, YOLO, or Pascal VOC in two ways:
Using the sly.convert functions
Using the Project object
COCO format supports geometry types like rectangles, bitmaps, polygons, and graph nodes
YOLO format supports:
Detection task: rectangles, bitmaps, polygons, graph nodes, polylines, alpha masks
Segmentation task: polygons, bitmaps, alpha masks
Pose estimation task: graph nodes
Pascal VOC format supports standard Pascal VOC annotation structure
You can also convert specific datasets
These conversion utilities make it easy to use your Supervisely data with other frameworks and tools without needing to implement custom converters.
Working with Datasets
Dataset Hierarchy
Supervisely supports hierarchical dataset structures. For details, see the article "Iterate over a project", which explains how to work with projects that have hierarchical datasets.
Here's how to navigate and work with them:
Downloading Specific Datasets
To download specific datasets from a project, you can use the download_fast function mentioned above.
When you specify a dataset ID, the function will:
Create the folder structure up to the parent dataset level
Download only the images and annotations for the specified dataset
Skip downloading images from parent datasets in the hierarchy
Skip downloading any nested child datasets that might exist under your specified dataset
If you need to download an entire branch of the dataset hierarchy (a dataset and all its nested children), you would need to provide all the relevant dataset IDs in the dataset_ids parameter.
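Collecting the IDs for a whole branch is a small tree traversal. The sketch below works on plain records rather than SDK objects. `DatasetRecord` and `collect_branch_ids` are hypothetical names, and the code assumes each dataset record exposes its own ID and the ID of its parent (or `None` for top-level datasets).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional


class DatasetRecord(NamedTuple):
    """Hypothetical stand-in for a dataset entry with a parent link."""
    id: int
    name: str
    parent_id: Optional[int]


def collect_branch_ids(datasets: Iterable[DatasetRecord], root_id: int) -> List[int]:
    """Return `root_id` followed by the IDs of all nested children (depth-first)."""
    children: Dict[Optional[int], List[int]] = defaultdict(list)
    for ds in datasets:
        children[ds.parent_id].append(ds.id)

    branch: List[int] = []
    stack = [root_id]
    while stack:
        current = stack.pop()
        branch.append(current)
        # Reverse so that children are visited in their original order.
        stack.extend(reversed(children.get(current, [])))
    return branch


datasets = [
    DatasetRecord(1, "train", None),
    DatasetRecord(2, "train/day", 1),
    DatasetRecord(3, "train/night", 1),
    DatasetRecord(4, "train/day/rain", 2),
    DatasetRecord(5, "val", None),
]

if __name__ == "__main__":
    print(collect_branch_ids(datasets, 1))
```

The resulting list is what you would pass as the dataset_ids value when the whole branch should be downloaded.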
Dataset Images Asynchronous Downloads
Download Methods
For better performance, you can use asynchronous methods even in a synchronous context:
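One generic way to call a coroutine from synchronous code is sketched below using only the standard library. It is not the SDK's own helper: `run_sync` is a hypothetical name, and `fetch_sizes` is a stand-in coroutine used in place of a real asynchronous download method.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Coroutine, Dict, List


def run_sync(coro: Coroutine[Any, Any, Any]) -> Any:
    """Run a coroutine to completion from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No event loop is running in this thread: start one directly.
        return asyncio.run(coro)
    # A loop is already running (for example in a notebook), so execute the
    # coroutine in a fresh loop on a helper thread and wait for the result.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()


async def fetch_sizes(image_ids: List[int]) -> Dict[int, int]:
    """Stand-in coroutine that pretends to fetch something per image."""
    await asyncio.sleep(0)
    return {image_id: image_id * 100 for image_id in image_ids}


if __name__ == "__main__":
    print(run_sync(fetch_sizes([1, 2, 3])))
```

Note that the fallback branch blocks the calling thread until the coroutine finishes, which is acceptable for scripts but not inside latency-sensitive async code.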
The table below lists various asynchronous methods available in the Supervisely SDK for downloading images in different formats and output types. These methods can significantly improve download performance compared to their synchronous counterparts, especially when working with large datasets.
Method | Description
download_np_async | Downloads a single image as a NumPy array
download_nps_async | Downloads multiple images as NumPy arrays
download_path_async | Downloads a single image to a specified path
download_paths_async | Downloads multiple images to specified paths
download_bytes_single_async | Downloads a single image as bytes
download_bytes_many_async | Downloads multiple images as bytes in parallel (one request per image)
download_bytes_generator_async | Downloads multiple images as bytes using a single batch request, yielding results through an async generator
Performance Comparison
Let's compare synchronous and asynchronous download methods, using downloads as NumPy arrays as the example:
Images
The performance improvement from synchronous to batch to asynchronous methods can be dramatic:
Batch: ~2.5x speedup
Asynchronous: ~15x speedup
These results were obtained on the Pascal VOC 2012 dataset, which you can download from datasetninja.com.
Method | Description | Pros | Cons | Best for
Single download | Download one image at a time | Simple to implement, minimal memory usage | Very slow for many images | Small projects, debugging
Batch download | Download images in groups | Better network utilization, simple API | Blocking operation | Medium-sized projects
Asynchronous download | Non-blocking parallel downloads | Highest performance, efficient resource usage | Limited by network/system performance | Large projects
Using asynchronous downloads with proper concurrency control (via semaphores) enables you to get the best possible performance while managing system resource usage.
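The semaphore pattern itself can be shown without touching the SDK. In this minimal sketch, `fake_download` is a hypothetical stand-in for a real asynchronous download call, and `download_all` caps the number of downloads in flight while also recording the peak concurrency it observed.

```python
import asyncio
from typing import List, Tuple


async def fake_download(image_id: int) -> bytes:
    """Stand-in for a real async download: waits briefly, returns fake bytes."""
    await asyncio.sleep(0.01)
    return f"image-{image_id}".encode()


async def download_all(image_ids: List[int], max_concurrency: int = 10) -> Tuple[List[bytes], int]:
    """Download all images with at most `max_concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0

    async def bounded(image_id: int) -> bytes:
        nonlocal in_flight, peak
        async with semaphore:  # waits here once the limit is reached
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_download(image_id)
            finally:
                in_flight -= 1

    # gather preserves the order of image_ids in its results.
    results = await asyncio.gather(*(bounded(i) for i in image_ids))
    return list(results), peak


if __name__ == "__main__":
    data, peak_concurrency = asyncio.run(download_all(list(range(50)), max_concurrency=5))
    print(f"downloaded {len(data)} items, peak concurrency {peak_concurrency}")
```

Lowering `max_concurrency` trades throughput for lower memory and server load, which is the knob to adjust first when downloads of large images start to fail or time out.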
Advanced Annotation Downloads
Synchronous Annotation Downloads
First, let's measure the speedup from downloading annotations in batches. The batch size is fixed at 50, a value chosen to work efficiently on any instance configuration.
Asynchronous Annotation Downloads
There are two methods for asynchronous annotation downloading that are used depending on the types of annotations in your dataset images.
Method | Best for
download_batch_async | Standard annotations for normal-sized images
download_bulk_async | Small or simple annotations for smaller-sized images
The following results were obtained on the Pascal VOC 2012 dataset, which you can download from datasetninja.com.
Annotations
The performance improvement from synchronous to batch to asynchronous methods:
Batch: ~3.5x speedup
Asynchronous: ~8x speedup
Asynchronous batch: ~19x speedup
Your specific speedup may differ from these benchmarks depending on: number of annotations on images, complexity of annotations, image size (annotation size), network conditions, server load.
The benefits of asynchronous downloading:
Parallel processing: Multiple batches can be downloaded simultaneously
Better resource utilization: Network I/O doesn't block the application
Improved throughput: Especially noticeable with many small files
Reduced total processing time: Significant reduction for large datasets
Choosing the Right Async Method
For best performance, consider these guidelines:
Use semaphore to control concurrency (typically 5-20 concurrent downloads)
The download_bulk_async method is generally fastest for datasets with many small annotations
For complex annotations with alpha masks or large bitmaps, download_batch_async with a smaller semaphore value may work better
When using ApiContext, the methods automatically use the project metadata to avoid redundant API calls:
Figures Download
When working with datasets that have large numbers of annotations, downloading figures in bulk can significantly improve performance. Supervisely provides dedicated API methods for this purpose.
The bulk figure download approach is particularly effective when:
You need to analyze annotation distribution without loading full data
You're developing a custom export pipeline to another format
You need to visualize or process specific types of annotations
Your dataset contains many images with hundreds or thousands of annotations
Understanding Figures vs Annotations
In Supervisely's data model:
Annotations contain all information about labeled objects, including tags and metadata
Figures represent the geometric shapes that define objects in images (rectangles, polygons, bitmaps, etc.)
For many ML tasks, you might only need the geometric information without all the associated metadata.
Basic Figures Download
FigureInfo represents detailed information about a figure: geometry, tags, metadata etc.
Here's how to get FigureInfo for images in a dataset.
Optimized Figures Download
For large datasets, you can skip downloading the geometry data initially to speed up the process.
This is useful, for example, when you need to filter figures by class: download lightweight FigureInfo objects, process them, and build the list of figures whose geometry you actually need.
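The filtering step in the middle of that workflow might look like the sketch below. `LightFigure` is a hypothetical, simplified stand-in for a figure record fetched without geometry, with field names chosen for illustration. The real FigureInfo carries more fields.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class LightFigure:
    """Hypothetical stand-in for a figure record fetched without geometry."""
    id: int
    image_id: int
    class_id: int
    geometry_type: str


def select_figure_ids(figures: Iterable[LightFigure], wanted_class_ids: Set[int]) -> List[int]:
    """Return IDs of figures whose class is in `wanted_class_ids`, keeping order."""
    return [figure.id for figure in figures if figure.class_id in wanted_class_ids]


# Sample records standing in for the lightweight download result.
figures = [
    LightFigure(id=101, image_id=1, class_id=7, geometry_type="rectangle"),
    LightFigure(id=102, image_id=1, class_id=9, geometry_type="bitmap"),
    LightFigure(id=103, image_id=2, class_id=7, geometry_type="polygon"),
    LightFigure(id=104, image_id=3, class_id=4, geometry_type="rectangle"),
]

# Only these figures need their geometry downloaded in the second step.
selected = select_figure_ids(figures, {7})

if __name__ == "__main__":
    print(f"{len(selected)} of {len(figures)} figures selected: {selected}")
```

The geometry request then covers only the selected IDs, so the heavy part of the download shrinks in proportion to how selective the filter is.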
Working with AlphaMask Geometries
For advanced cases like AlphaMask geometries, you'll need to handle the download separately:
The bulk geometry download offers several advantages:
Reduced network overhead: Only essential figure data is transferred
Faster processing: Server-side filtering minimizes data transfer
Lower memory usage: Only relevant geometry information is returned
Simplified post-processing: Data is already in the required format
Advanced: Asynchronous Downloads
For even better performance with large datasets (containing approximately 1.2 million figures in total), you can use asynchronous downloads:
Figures
The performance improvement from synchronous to asynchronous method:
Synchronous without geometries: ~1.3x
Asynchronous: ~5x
Asynchronous without geometries: ~6x
Performance Tips for Figure Downloads
Use skip_geometry=True when you only need figure metadata initially
Process figures by type - some geometry types might need special handling
Download geometries in batches (optimal batch size is typically 50-200)
Use asynchronous methods for large datasets with many figures
Consider memory constraints when downloading many complex geometries
Conclusion
When downloading data from Supervisely, choosing the right method can dramatically impact performance.
Single downloads are simple but inefficient for large datasets, suitable only for debugging or working with a few images.
Batch downloads offer a good balance of simplicity and performance for medium-sized projects, improving network utilization while remaining easy to implement. For large-scale projects with thousands of images or annotations, asynchronous downloads deliver the best performance - up to ~20x faster than sequential downloads - by efficiently utilizing network resources and processing multiple requests in parallel.
Remember to use semaphores to control concurrency and consider the specific characteristics of your data (image sizes, annotation complexity) when selecting a download method. By implementing the appropriate download strategy for your project's scale, you can significantly reduce processing time and improve overall workflow efficiency.