AI Search
Overview
This tutorial demonstrates how to use the Supervisely Python SDK to work with AI Search functionality. AI Search allows you to intelligently search for images within a project using semantic similarity, leveraging CLIP embeddings stored in a dedicated vector database (Qdrant).
Prerequisites
Instance Requirements
Minimum Instance Version: 6.14.4
For AI Search functionality to work properly, your Supervisely instance must have the following services running:
Embeddings Generator: handles the calculation of CLIP embeddings for images
Embeddings Auto-Updater: automatically updates embeddings when new images are added
Qdrant Vector Database: configured to store and retrieve the calculated embeddings for projects
CLIP Service Application: provides the neural network model for generating image embeddings
SDK Setup
Before starting, ensure you have set up your development environment and installed Supervisely SDK version 6.73.413 or higher.
To install or upgrade to the required SDK version, run: pip install --upgrade "supervisely>=6.73.413"
Initialize API Client
Once your credentials are configured (the SERVER_ADDRESS and API_TOKEN environment variables), initialize the API client with sly.Api.from_env().
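A minimal sketch of client initialization, assuming the credentials are exported as the SERVER_ADDRESS and API_TOKEN environment variables that sly.Api.from_env() reads. The helpers missing_credentials and create_api are hypothetical; they only add an early, readable error before the SDK is touched.

```python
import os
from typing import List, Mapping

# Environment variables that sly.Api.from_env() takes the credentials from
REQUIRED_ENV_VARS = ("SERVER_ADDRESS", "API_TOKEN")


def missing_credentials(env: Mapping[str, str]) -> List[str]:
    """Return the required credential variables that are unset or empty (hypothetical helper)."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def create_api():
    """Create a Supervisely API client from environment variables (hypothetical helper)."""
    missing = missing_credentials(os.environ)
    if missing:
        raise RuntimeError(f"Missing credentials: {', '.join(missing)}")
    # Imported lazily so this module loads even where the SDK is not installed
    import supervisely as sly

    return sly.Api.from_env()
```

Call api = create_api() once at the start of a script and pass the resulting client to the functions that need it.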
Core Methods
Calculate Embeddings
The calculate_embeddings method initiates an asynchronous calculation of CLIP embeddings for all images in the specified project. The embeddings are generated and stored in the Qdrant vector database for AI Search operations.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| id | int | Required. The unique identifier of the project for which to calculate embeddings. |
Returns: None
Notes:
Before calculating embeddings, ensure that the project has the embeddings_enabled flag set to True using the enable_embeddings() method. Otherwise, the calculation request will not be processed.
This method sends a request to the Embeddings Generator service and returns immediately. The actual calculation happens asynchronously in the background.
Embeddings must be calculated before AI Search can be performed on the project.
The calculation time depends on the number of images in the project and the performance of your instance.
Progress can be tracked through ProjectInfo: the calculation is complete when embeddings_in_progress is False and the embeddings_updated_at timestamp is present.
If the Embeddings Auto-Updater service is running, embeddings for new images added to the project are calculated automatically.
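Because calculate_embeddings returns immediately, a script that searches right afterwards has to wait for the background job. The sketch below polls the completion condition from the notes, assuming api.project.get_info_by_id returns a ProjectInfo exposing embeddings_in_progress and embeddings_updated_at. The function name and the timeout and poll defaults are hypothetical choices. Comparing against the timestamp read before the request guards against mistaking an older calculation for the new one.

```python
import time
from typing import Optional


def wait_for_embeddings(
    api,
    project_id: int,
    previous_updated_at: Optional[str] = None,
    timeout: float = 3600.0,
    poll_interval: float = 10.0,
) -> str:
    """Block until the project's embeddings are ready and return embeddings_updated_at.

    Done means: embeddings_in_progress is False and embeddings_updated_at is set
    and differs from the value observed before the calculation was requested.
    """
    deadline = time.monotonic() + timeout
    while True:
        info = api.project.get_info_by_id(project_id)
        updated_at = info.embeddings_updated_at
        if (
            not info.embeddings_in_progress
            and updated_at is not None
            and updated_at != previous_updated_at
        ):
            return updated_at
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Embeddings for project {project_id} not ready after {timeout} s")
        time.sleep(poll_interval)
```

Read embeddings_updated_at from the project info before calling calculate_embeddings, then pass it as previous_updated_at.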
Perform AI Search
The perform_ai_search method executes an AI-powered search within a project using one of three mutually exclusive search modes:
semantic text search (prompt)
image similarity search (image_id)
diverse sampling (method)
Parameters:
| Parameter | Type | Description |
|---|---|---|
| project_id | int | Required. Unique identifier of the project to search within. |
| dataset_id | Optional[int] | Optional. Restricts the search to images within this dataset. Default is None (searches the entire project). |
| image_id | Optional[List[int]] | Optional. IDs of one or more reference images for similarity search. Finds images visually similar to these images. |
| prompt | Optional[str] | Optional. Natural language text for semantic search. Finds images matching this description. |
| method | Optional[str] | Optional. Sampling method for diverse search: "centroids" (representative samples from clusters) or "random" (samples drawn evenly across clusters). |
| limit | Optional[int] | Optional. Maximum number of images to return. Default is 100. |
| clustering_method | Optional[Literal["kmeans", "dbscan"]] | Optional. Clustering method for results: "kmeans" or "dbscan". If None, no clustering is applied. |
| num_clusters | Optional[int] | Optional. Number of clusters to create if clustering_method is specified. Required for the "kmeans" method. |
| image_id_scope | Optional[List[int]] | Optional. List of image IDs to limit the search scope. If None, the search covers all images unless other filters are set. |
| threshold | Optional[float] | Optional. Similarity threshold. Only images with similarity above this value are returned. In search results, this value is referred to as the score. |
Returns:
Optional[int]: the ID of the created entities collection containing the search results, or None if no collection was created (e.g., no results found).
Raises:
ValueError: If more than one of prompt, image_id, or method is provided (they are mutually exclusive).
ValueError: If method is provided but is not one of the allowed values ("centroids" or "random").
Notes:
Only one search mode parameter (prompt, image_id, or method) can be used per search.
The returned collection ID can be used to retrieve the actual image results using the image API methods.
Collections created by AI Search are temporary and will be overwritten by subsequent searches unless explicitly saved.
Embeddings must be calculated for the project before performing any search operations.
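The rules under Raises and Notes can be checked locally before a request is sent. The sketch below is a hypothetical pre-flight helper that mirrors those documented rules, plus the num_clusters requirement for "kmeans" from the parameter list; it is not the SDK's own validation code.

```python
from typing import List, Optional

ALLOWED_METHODS = ("centroids", "random")


def search_mode(
    prompt: Optional[str] = None,
    image_id: Optional[List[int]] = None,
    method: Optional[str] = None,
    clustering_method: Optional[str] = None,
    num_clusters: Optional[int] = None,
) -> Optional[str]:
    """Return the search mode the arguments select; raise ValueError on invalid combinations."""
    given = {"prompt": prompt, "image_id": image_id, "method": method}
    selected = [name for name, value in given.items() if value is not None]
    # The three modes are mutually exclusive
    if len(selected) > 1:
        raise ValueError(f"Only one of prompt, image_id, method may be set, got: {', '.join(selected)}")
    if method is not None and method not in ALLOWED_METHODS:
        raise ValueError(f"method must be one of {ALLOWED_METHODS}, got {method!r}")
    if clustering_method == "kmeans" and num_clusters is None:
        raise ValueError('num_clusters is required when clustering_method="kmeans"')
    return selected[0] if selected else None
```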
Complete Examples
Enable AI Search for a Project
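A minimal sketch of the setup flow described under Calculate Embeddings: enable embeddings if they are off, then request the calculation. It assumes calculate_embeddings is exposed on api.project, like the methods in the Other Embeddings Methods table; enable_ai_search itself is a hypothetical name.

```python
def enable_ai_search(api, project_id: int) -> None:
    """Enable embeddings for a project and request their calculation (hypothetical helper)."""
    # Calculation requests are not processed unless embeddings_enabled is True
    if not api.project.is_embeddings_enabled(project_id):
        api.project.enable_embeddings(project_id)
    # Returns immediately; the Embeddings Generator computes embeddings in the background
    api.project.calculate_embeddings(project_id)
```

Searches only become meaningful once the background calculation has finished, so check embeddings_in_progress and embeddings_updated_at before searching.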
Text Prompt Search
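A sketch of a semantic text search, assuming perform_ai_search is available as api.project.perform_ai_search with the parameters documented above. The wrapper name is hypothetical, and any threshold you pass is a value to tune for your data rather than a recommended default.

```python
from typing import Optional


def search_by_text(
    api,
    project_id: int,
    prompt: str,
    dataset_id: Optional[int] = None,
    limit: int = 100,
    threshold: Optional[float] = None,
) -> Optional[int]:
    """Run a semantic text search; return the entities collection ID or None if nothing matched."""
    return api.project.perform_ai_search(
        project_id=project_id,
        dataset_id=dataset_id,  # None searches the whole project
        prompt=prompt,          # natural-language description of the target images
        limit=limit,
        threshold=threshold,    # keep only results scoring above this value
    )
```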
Image Similarity Search
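A sketch of image similarity search, assuming the method is reachable as api.project.perform_ai_search and that image_id takes a list of reference IDs, as its documented Optional[List[int]] type suggests. The helper name is hypothetical.

```python
from typing import List, Optional


def find_similar_images(
    api,
    project_id: int,
    reference_image_ids: List[int],
    limit: int = 100,
    image_id_scope: Optional[List[int]] = None,
) -> Optional[int]:
    """Find images visually similar to the reference images; return the collection ID or None."""
    if not reference_image_ids:
        raise ValueError("at least one reference image ID is required")
    return api.project.perform_ai_search(
        project_id=project_id,
        image_id=reference_image_ids,  # reference images to compare against
        limit=limit,
        image_id_scope=image_id_scope,  # None: consider all images in the project
    )
```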
Diverse Search
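A sketch of diverse sampling. It assumes api.project.perform_ai_search and that the sampling method is combined with k-means clustering, which needs num_clusters; the parameter list does not say whether diverse mode requires explicit clustering, so treat that pairing as an assumption. The helper name is hypothetical.

```python
from typing import Literal, Optional


def sample_diverse_images(
    api,
    project_id: int,
    num_clusters: int,
    limit: int = 100,
    method: Literal["centroids", "random"] = "centroids",
    dataset_id: Optional[int] = None,
) -> Optional[int]:
    """Pick a diverse subset of images by clustering their embeddings with k-means.

    "centroids" returns representative samples from each cluster;
    "random" draws samples evenly across clusters.
    """
    return api.project.perform_ai_search(
        project_id=project_id,
        dataset_id=dataset_id,
        method=method,
        clustering_method="kmeans",
        num_clusters=num_clusters,  # required when clustering_method="kmeans"
        limit=limit,
    )
```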
Other Embeddings Methods
The table below lists the other embeddings-related methods available in the Supervisely SDK:
| Method | Module | Description | Parameters | Returns |
|---|---|---|---|---|
| enable_embeddings() | api.project | Enable embeddings for the project | id: int (project ID), silent: bool = True | None |
| disable_embeddings() | api.project | Disable embeddings for the project | id: int (project ID), silent: bool = True | None |
| is_embeddings_enabled() | api.project | Check whether embeddings are enabled for the project | id: int (project ID) | bool |
| set_embeddings_in_progress() | api.project | Set the embeddings calculation status | id: int (project ID), in_progress: bool | None |
| get_embeddings_in_progress() | api.project | Get the embeddings calculation status | id: int (project ID) | bool |
| set_embeddings_updated_at() | api.project | Set the timestamp when embeddings were last updated | id: int (project ID), timestamp: Optional[str] = None, silent: bool = True | None |
| get_embeddings_updated_at() | api.project | Get the timestamp when embeddings were last updated | id: int (project ID) | str (YYYY-MM-DDTHH:MM:SS.fffZ) |
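The getters above can be combined into a single status check. This sketch parses the documented YYYY-MM-DDTHH:MM:SS.fffZ format into a timezone-aware UTC datetime; EmbeddingsStatus and both functions are hypothetical names, and the None handling is defensive in case no calculation has happened yet.

```python
from datetime import datetime, timezone
from typing import NamedTuple, Optional


def parse_updated_at(value: Optional[str]) -> Optional[datetime]:
    """Parse a YYYY-MM-DDTHH:MM:SS.fffZ timestamp into an aware UTC datetime."""
    if not value:
        return None
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


class EmbeddingsStatus(NamedTuple):
    enabled: bool
    in_progress: bool
    updated_at: Optional[datetime]


def embeddings_status(api, project_id: int) -> EmbeddingsStatus:
    """Collect a project's embeddings state using the api.project getters."""
    return EmbeddingsStatus(
        enabled=api.project.is_embeddings_enabled(project_id),
        in_progress=api.project.get_embeddings_in_progress(project_id),
        updated_at=parse_updated_at(api.project.get_embeddings_updated_at(project_id)),
    )
```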
Possible Use Cases
Integration with Annotation Workflow
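One way to feed AI Search into annotation is to run one text query per class and hand each result set to your labeling step. Because AI Search collections are temporary and a later search overwrites them, this sketch processes each collection immediately through a callback instead of collecting IDs for later. It assumes api.project.perform_ai_search; the function, the handle callback, and the "a photo of a {}" prompt template are hypothetical choices.

```python
from typing import Callable, Iterable


def search_each_class(
    api,
    project_id: int,
    class_names: Iterable[str],
    handle: Callable[[str, int], None],
    limit: int = 50,
    prompt_template: str = "a photo of a {}",
) -> int:
    """Run one text search per class and pass each collection ID to `handle` right away.

    Returns how many searches produced a collection.
    """
    handled = 0
    for name in class_names:
        collection_id = api.project.perform_ai_search(
            project_id=project_id,
            prompt=prompt_template.format(name),
            limit=limit,
        )
        if collection_id is None:  # no results for this class
            continue
        # Process now: the next search would overwrite this temporary collection
        handle(name, collection_id)
        handled += 1
    return handled
```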
Active Learning Sample Selection
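For active learning, diverse sampling can be restricted to the unlabeled pool with image_id_scope so that the labeling budget goes to representative, not-yet-annotated images. This sketch assumes api.project.perform_ai_search and ImageInfo-like objects (for example from api.image.get_list) that expose id and labels_count. Both helpers are hypothetical, and pairing "centroids" with k-means is an assumption.

```python
from typing import Iterable, List, Optional


def unlabeled_image_ids(image_infos: Iterable) -> List[int]:
    """IDs of images with no labels, given objects with `id` and `labels_count` attributes."""
    return [info.id for info in image_infos if info.labels_count == 0]


def select_for_labeling(
    api,
    project_id: int,
    candidate_ids: List[int],
    budget: int,
    num_clusters: int,
) -> Optional[int]:
    """Pick up to `budget` representative images from the candidates; return the collection ID."""
    if not candidate_ids:
        return None  # nothing left to label, skip the request
    return api.project.perform_ai_search(
        project_id=project_id,
        method="centroids",            # representative samples from each cluster
        clustering_method="kmeans",
        num_clusters=num_clusters,
        image_id_scope=candidate_ids,  # search only within the unlabeled pool
        limit=budget,
    )
```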
Quality Control in Manufacturing
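For quality control, known defect images can serve as similarity references against a new production batch, with threshold filtering out weak matches. This sketch assumes api.project.perform_ai_search and that reference images may live outside the dataset being searched. The helper name is hypothetical, and the threshold should be tuned on examples with known outcomes.

```python
from typing import List, Optional


def find_suspected_defects(
    api,
    project_id: int,
    defect_reference_ids: List[int],
    batch_dataset_id: int,
    threshold: float,
    limit: int = 500,
) -> Optional[int]:
    """Flag images in a batch that look like known defects; None when nothing passes the threshold."""
    return api.project.perform_ai_search(
        project_id=project_id,
        dataset_id=batch_dataset_id,    # inspect only the incoming batch
        image_id=defect_reference_ids,  # known defect images used as references
        threshold=threshold,            # keep only sufficiently similar images
        limit=limit,
    )
```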
Summary
The AI Search functionality in Supervisely provides powerful capabilities for:
Semantic Search: Find images based on natural language descriptions
Similarity Search: Locate visually similar images
Diverse Sampling: Get representative samples from your dataset
Dataset Exploration: Understand the diversity and structure of your data
Key points to remember
Always enable and calculate embeddings before using AI Search
Choose the search mode that fits the task: prompt for text queries, image_id for visual similarity, method for diverse sampling
Manage collections efficiently to avoid clutter
Leverage batch operations for large-scale tasks
Monitor embeddings status and update as needed
For more information, refer to the AI Search documentation, which provides a visual overview of how AI Search works.