In this article, we will learn how to iterate through a project with annotated data in Python. It is one of the most frequent operations in Supervisely Apps and Python automation scripts.
Everything you need to reproduce this tutorial is on GitHub: source code, Visual Studio Code configuration, and a shell script for creating a venv.
In this guide we will go through the following steps:
If you don't have any projects yet, go to the ecosystem and add the demo project 🍋 Lemons annotated to your current workspace.
2. .env files
Create a file at ~/supervisely.env with the credentials for your Supervisely account. Learn more about environment variables here. The content should look like this:
# your API credentials, learn more here: https://developer.supervisely.com/getting-started/basics-of-authentication
SERVER_ADDRESS="https://app.supervisely.com" # ⬅️ change it if use Enterprise Edition
API_TOKEN="4r47N.....blablabla......xaTatb" # ⬅️ change it
Create a second file, local.env, and place it in the same directory as main.py. This file will contain the values we are going to use in the Python script.
# change the Project ID to your value
PROJECT_ID=12208 # ⬅️ change it
The reason why the variable for Project ID has such a strange name, modal.state.slyProjectId, will be explained in later tutorials. Let's just keep it this way for now.
3. Python script
This script illustrates only the basics. If your project is huge and has hundreds of thousands of images, downloading annotations one by one is inefficient. It is better to use batch (bulk) methods to reduce the number of API requests and significantly speed up your code. Learn more in the Optimizations section below.
Check that you have a ~/supervisely.env file with the correct values.
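A quick pre-flight check can catch a missing or incomplete credentials file before the script talks to the server. The sketch below is a standard-library stand-in, not part of the Supervisely SDK or python-dotenv: the helper names `parse_env_text` and `missing_keys` are hypothetical, and the parser only handles simple KEY=VALUE lines with optional quotes and trailing comments.

```python
import os

# Keys the credentials file in this tutorial is expected to define.
REQUIRED_KEYS = ("SERVER_ADDRESS", "API_TOKEN")


def parse_env_text(text):
    """Parse simple KEY=VALUE lines (a simplified stand-in for python-dotenv)."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        value = value.strip()
        if value[:1] in ('"', "'"):
            # quoted value: keep what sits between the first pair of quotes
            quote = value[0]
            end = value.find(quote, 1)
            value = value[1:end] if end != -1 else value[1:]
        else:
            # unquoted value: drop a trailing " # comment"
            value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def missing_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    values = parse_env_text(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    path = os.path.expanduser("~/supervisely.env")
    if not os.path.isfile(path):
        print(f"{path} not found")
    else:
        with open(path, encoding="utf-8") as f:
            missing = missing_keys(f.read())
        print("OK" if not missing else f"Missing keys: {missing}")
```

The check only confirms that the keys are present and non-empty; whether the token is actually valid can only be confirmed by the server.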
Source code:
import os
import supervisely as sly
from dotenv import load_dotenv

if sly.is_development():
    load_dotenv("local.env")
    load_dotenv(os.path.expanduser("~/supervisely.env"))
api = sly.Api.from_env()

project_id = sly.env.project_id()
project = api.project.get_info_by_id(project_id)
if project is None:
    raise KeyError(f"Project with ID {project_id} not found in your account")
print(f"Project info: {project.name} (id={project.id})")

# get project meta - collection of annotation classes and tags
meta_json = api.project.get_meta(project.id)
project_meta = sly.ProjectMeta.from_json(meta_json)
print(project_meta)

datasets = api.dataset.get_list(project.id)
print(f"There are {len(datasets)} datasets in project")

for dataset in datasets:
    print(f"Dataset {dataset.name} has {dataset.items_count} images")
    images = api.image.get_list(dataset.id)
    for image in images:
        ann_json = api.annotation.download_json(image.id)
        ann = sly.Annotation.from_json(ann_json, project_meta)
        print(f"There are {len(ann.labels)} objects on image {image.name}")
Output
The script above produces the following output:
Project info: Lemons (Annotated) (id=12208)
ProjectMeta:
Object Classes
+-------+--------+----------------+--------+
| Name  | Shape  | Color          | Hotkey |
+-------+--------+----------------+--------+
| kiwi  | Bitmap | [255, 0, 0]    |        |
| lemon | Bitmap | [81, 198, 170] |        |
+-------+--------+----------------+--------+
Tags
+------+------------+-----------------+--------+---------------+--------------------+
| Name | Value type | Possible values | Hotkey | Applicable to | Applicable classes |
+------+------------+-----------------+--------+---------------+--------------------+
+------+------------+-----------------+--------+---------------+--------------------+
There are 1 datasets in project
Dataset ds1 has 6 images
There are 3 objects on image IMG_1836.jpeg
There are 4 objects on image IMG_8144.jpeg
There are 4 objects on image IMG_3861.jpeg
There are 3 objects on image IMG_0748.jpeg
There are 5 objects on image IMG_4451.jpeg
There are 7 objects on image IMG_2084.jpeg
4. Optimizations
The bottleneck of this script is in these lines (27-28):
for image in images:
    ann_json = api.annotation.download_json(image.id)
If you have 1M images in your project, your code will send 🟡 1M requests to download annotations. This is inefficient because every request pays the Round Trip Time (RTT), and the Supervisely database has to serve a large number of similar tiny requests.
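To get a feel for the cost, here is a back-of-the-envelope estimate. The 50 ms round-trip time is purely an assumed figure for illustration, not a measurement of any Supervisely server, and the helper name `total_latency_seconds` is hypothetical. The estimate counts only network waiting for sequential requests and ignores server processing time.

```python
def total_latency_seconds(num_requests, rtt_seconds):
    """Lower bound on wall-clock time for sequential requests: one RTT each."""
    return num_requests * rtt_seconds


ASSUMED_RTT = 0.05  # seconds; hypothetical value, measure your own connection

one_by_one = total_latency_seconds(1_000_000, ASSUMED_RTT)
print(f"{one_by_one / 3600:.1f} hours spent waiting on the network")
# prints: 13.9 hours spent waiting on the network
```

Since the RTT is fixed by the network, the number of requests is the only factor in this product that the script controls.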
It can be optimized by using the batch API method:
The Supervisely API allows downloading annotations for multiple images in a single request. The code sample below sends ✅ 50x fewer requests, which leads to a significant speed-up over our original code:
for batch in sly.batched(images):
    image_ids = [image.id for image in batch]
    annotations = api.annotation.download_json_batch(dataset.id, image_ids)
    for image, ann_json in zip(batch, annotations):
        ann = sly.Annotation.from_json(ann_json, project_meta)
        print(f"There are {len(ann.labels)} objects on image {image.name}")
The optimized version of the original script is in main_optimized.py.
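To make the batching idea concrete, here is a minimal standard-library sketch of a chunking helper. It illustrates the technique and is not the SDK's implementation of `sly.batched`; the name `chunked` is hypothetical, and the default size of 50 is assumed from the "50x fewer requests" figure above.

```python
import math


def chunked(items, batch_size=50):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


image_ids = list(range(120))  # stand-in for real image IDs
batches = list(chunked(image_ids))
print([len(batch) for batch in batches])  # [50, 50, 20]

# One request per batch instead of one per image:
print(math.ceil(1_000_000 / 50))  # 20000
```

The last batch is simply shorter when the number of images is not a multiple of the batch size, so no image is skipped or requested twice.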