AI Search

Overview

This tutorial demonstrates how to use the Supervisely Python SDK to work with AI Search functionality. AI Search allows you to intelligently search for images within a project using semantic similarity, leveraging CLIP embeddings stored in a dedicated vector database (Qdrant).

Prerequisites

Instance Requirements

Minimum Instance Version: 6.14.4

For AI Search functionality to work properly, your Supervisely instance must have the following services running:

  • Embeddings Generator - Handles the calculation of CLIP embeddings for images

  • Embeddings Auto-Updater - Automatically updates embeddings when new images are added

  • Qdrant Vector Database - Configured to store and retrieve the calculated embeddings for projects

  • CLIP Service Application - Provides the neural network model for generating image embeddings

These services are typically configured by your instance administrator. If AI Search is not working, contact your administrator to ensure all required services are properly deployed and running.

SDK Setup

Before starting, ensure you have set up your development environment and installed Supervisely SDK version 6.73.413 or higher.

To install the required SDK version:
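The SDK is published on PyPI as supervisely, so the install step is typically pip install "supervisely>=6.73.413". As a complement, the sketch below checks whether the installed version meets the minimum stated above. The helper names (parse_version, meets_minimum, installed_sdk_version) are illustrative and not part of the SDK, and the version parsing is deliberately simple.

```python
from importlib import metadata

MIN_SDK_VERSION = "6.73.413"  # minimum version stated in this tutorial


def parse_version(version: str) -> tuple:
    """Turn '6.73.413' into (6, 73, 413); trailing non-numeric suffixes are ignored."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_SDK_VERSION) -> bool:
    """Compare two dotted version strings component by component."""
    return parse_version(installed) >= parse_version(minimum)


def installed_sdk_version():
    """Return the installed supervisely version string, or None if it is not installed."""
    try:
        return metadata.version("supervisely")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_sdk_version()
    if version is None:
        print("supervisely is not installed")
    elif meets_minimum(version):
        print(f"supervisely {version} is recent enough")
    else:
        print(f"supervisely {version} is older than {MIN_SDK_VERSION}")
```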

Initialize API Client

Once you have your credentials configured, initialize the API client:
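A minimal sketch of client initialization, assuming the credentials are exposed as the SERVER_ADDRESS and API_TOKEN environment variables, which is what sly.Api.from_env() reads. The helpers missing_credentials and create_api are illustrative names, not SDK functions.

```python
import os

# Environment variables read by sly.Api.from_env()
REQUIRED_ENV_VARS = ("SERVER_ADDRESS", "API_TOKEN")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def create_api():
    """Build an API client from the environment, failing early if credentials are missing."""
    missing = missing_credentials()
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    # Imported here so the credential check above can run without the SDK installed.
    import supervisely as sly

    return sly.Api.from_env()
```

With credentials in place, api = create_api() returns the client object that the later examples pass around as api.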

Core Methods

Calculate Embeddings

The calculate_embeddings method initiates an asynchronous calculation of CLIP embeddings for all images in the specified project. The embeddings are generated and stored in the Qdrant vector database for AI Search operations.

Parameters:

| Argument | Type | Description |
| --- | --- | --- |
| id | int | Required. The unique identifier of the project for which to calculate embeddings. |

Returns:

  • None

Notes:

  • Before calculating embeddings, ensure that the project has the embeddings_enabled flag set to True using the enable_embeddings() method. Otherwise, the calculation request will not be processed.

  • This method sends a request to the Embeddings Generator service and returns immediately. The actual calculation happens asynchronously in the background.

  • Embeddings must be calculated before AI Search can be performed on the project.

  • The calculation time depends on the number of images in the project and the performance of your instance.

  • Progress can be tracked through ProjectInfo: the calculation is complete when embeddings_in_progress is False and the embeddings_updated_at timestamp is present.

  • If the Embeddings Auto-Updater service is running, new images added to the project will have embeddings calculated automatically.
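Because the calculation runs in the background, a caller usually has to poll for completion. The sketch below does this with get_embeddings_in_progress() and get_embeddings_updated_at() from api.project. The function wait_for_embeddings, its polling interval, its timeout, and the previous_updated_at argument are illustrative assumptions, not SDK features.

```python
import time


def wait_for_embeddings(api, project_id, previous_updated_at=None,
                        poll_interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll until embeddings are no longer in progress and a fresh timestamp is present.

    previous_updated_at: timestamp read before the calculation was requested, so a
    stale value from an earlier run is not mistaken for completion.
    Returns the embeddings_updated_at value; raises TimeoutError if it never appears.
    """
    deadline = time.monotonic() + timeout
    while True:
        in_progress = api.project.get_embeddings_in_progress(project_id)
        updated_at = api.project.get_embeddings_updated_at(project_id)
        if not in_progress and updated_at and updated_at != previous_updated_at:
            return updated_at
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Embeddings for project {project_id} not ready after {timeout}s")
        sleep(poll_interval)
```

A typical call reads the current timestamp first, requests the calculation, and then waits: before = api.project.get_embeddings_updated_at(project_id), followed by the calculate_embeddings request and wait_for_embeddings(api, project_id, previous_updated_at=before).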

Perform AI Search

The perform_ai_search method executes an AI-powered search within a project using one of three mutually exclusive search modes:

  • semantic text search

  • image similarity search

  • diverse sampling

Parameters:

| Argument | Type | Description |
| --- | --- | --- |
| project_id | int | Required. Unique identifier of the project to search within. |
| dataset_id | Optional[int] | Optional. Restricts the search to images within this dataset. Default is None (searches the entire project). |
| image_id | Optional[List[int]] | Optional. IDs of reference images for similarity search. Finds images visually similar to these images. |
| prompt | Optional[str] | Optional. Natural language text for semantic search. Finds images matching this description. |
| method | Optional[str] | Optional. Sampling method for diverse search: "centroids" (representative samples from clusters) or "random" (evenly across clusters). |
| limit | Optional[int] | Optional. Maximum number of images to return. Default is 100. |
| clustering_method | Optional[Literal["kmeans", "dbscan"]] | Optional. Clustering method for results: "kmeans" or "dbscan". If None, no clustering is applied. |
| num_clusters | Optional[int] | Optional. Number of clusters to create if clustering_method is specified. Required for the "kmeans" method. |
| image_id_scope | Optional[List[int]] | Optional. List of image IDs to limit the search scope. If None, the search is performed across all images unless other filters are set. |
| threshold | Optional[float] | Optional. Similarity threshold. Only images with similarity above this value are returned. In search results, this parameter is also referred to as score. |

Returns:

  • The ID (int) of the created entities collection containing the search results, or None if no collection was created (e.g., no results were found).

Raises:

  • ValueError: If more than one of prompt, image_id, or method is provided (they are mutually exclusive).

  • ValueError: If method is provided but is not one of the allowed values ("centroids" or "random").

Notes:

  • Only one search mode parameter (prompt, image_id, or method) can be used per search.

  • The returned collection ID can be used to retrieve the actual image results using the image API methods.

  • Collections created by AI Search are temporary and will be overwritten by subsequent searches unless explicitly saved.

  • Embeddings must be calculated for the project before performing any search operations.
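The mutual-exclusivity rule can also be checked on the client side before a request is sent. The sketch below mirrors the two documented ValueError conditions and then forwards the call. It assumes perform_ai_search is exposed on api.project like the other embeddings methods; check_search_mode and ai_search are hypothetical helper names.

```python
ALLOWED_METHODS = ("centroids", "random")


def check_search_mode(prompt=None, image_id=None, method=None):
    """Return the name of the selected search mode, or None if none was given.

    Raises ValueError under the same conditions the method documents.
    """
    modes = {"prompt": prompt, "image_id": image_id, "method": method}
    chosen = [name for name, value in modes.items() if value is not None]
    if len(chosen) > 1:
        raise ValueError("Mutually exclusive parameters provided: " + ", ".join(chosen))
    if method is not None and method not in ALLOWED_METHODS:
        raise ValueError(f"method must be one of {ALLOWED_METHODS}, got {method!r}")
    return chosen[0] if chosen else None


def ai_search(api, project_id, **kwargs):
    """Validate the search mode, then forward all arguments to perform_ai_search."""
    check_search_mode(kwargs.get("prompt"), kwargs.get("image_id"), kwargs.get("method"))
    # Assumption: perform_ai_search lives on api.project.
    return api.project.perform_ai_search(project_id=project_id, **kwargs)
```

For example, ai_search(api, project_id, prompt="a red car", limit=10) returns the collection ID (or None), while passing both prompt and method raises ValueError before any request is made.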

Complete Examples

Enable AI Search for a Project
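A minimal sketch of this step, combining the methods described in this tutorial: it checks the embeddings flag, enables it if needed, and then requests the calculation. It assumes calculate_embeddings is exposed on api.project alongside the other embeddings methods; enable_ai_search is an illustrative helper name.

```python
def enable_ai_search(api, project_id):
    """Make sure embeddings are enabled for the project, then request their calculation.

    Returns True if embeddings were already enabled before the call, False otherwise.
    """
    was_enabled = api.project.is_embeddings_enabled(project_id)
    if not was_enabled:
        # Without this flag the calculation request is not processed.
        api.project.enable_embeddings(project_id)
    # Returns immediately; the calculation itself runs in the background.
    api.project.calculate_embeddings(project_id)
    return was_enabled
```

Since the calculation is asynchronous, AI Search becomes usable only once embeddings_in_progress is False and embeddings_updated_at is set.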

Other Embeddings Methods

Here's a comprehensive table of all embeddings-related methods in the Supervisely SDK:

| Method | Module | Description | Parameters | Returns |
| --- | --- | --- | --- | --- |
| enable_embeddings() | api.project | Enable embeddings for the project | id: int - Project ID; silent: bool = True | None |
| disable_embeddings() | api.project | Disable embeddings for the project | id: int - Project ID; silent: bool = True | None |
| is_embeddings_enabled() | api.project | Check if embeddings are enabled for the project | id: int - Project ID | bool |
| set_embeddings_in_progress() | api.project | Set embeddings calculation status | id: int - Project ID; in_progress: bool | None |
| get_embeddings_in_progress() | api.project | Get embeddings calculation status | id: int - Project ID | bool |
| set_embeddings_updated_at() | api.project | Set the timestamp when embeddings were last updated | id: int - Project ID; timestamp: Optional[str] = None; silent: bool = True | None |
| get_embeddings_updated_at() | api.project | Get the timestamp when embeddings were last updated | id: int - Project ID | str - YYYY-MM-DDTHH:MM:SS.fffZ |
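The getter methods in the table can be combined into a single status snapshot. The sketch below also parses the YYYY-MM-DDTHH:MM:SS.fffZ timestamp into a datetime, assuming the trailing Z denotes UTC. The names parse_updated_at and embeddings_status are illustrative, not SDK functions.

```python
from datetime import datetime, timezone

# Matches the YYYY-MM-DDTHH:MM:SS.fffZ format listed in the table.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_updated_at(value):
    """Convert the timestamp string to a timezone-aware datetime; None or '' gives None."""
    if not value:
        return None
    # Assumption: the trailing "Z" means the timestamp is in UTC.
    return datetime.strptime(value, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


def embeddings_status(api, project_id):
    """Collect the embeddings-related state of a project into one dictionary."""
    return {
        "enabled": api.project.is_embeddings_enabled(project_id),
        "in_progress": api.project.get_embeddings_in_progress(project_id),
        "updated_at": parse_updated_at(api.project.get_embeddings_updated_at(project_id)),
    }
```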

Possible Use Cases

Integration with Annotation Workflow
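One way this use case could look is a text-prompt search restricted to a single dataset, producing a collection of candidate images to hand to annotators. This is a hypothetical sketch rather than the tutorial's original example: find_images_to_annotate is an illustrative name, and perform_ai_search is assumed to be available on api.project.

```python
def find_images_to_annotate(api, project_id, prompt, dataset_id=None, limit=100):
    """Run a semantic text search and return the resulting collection ID (or None)."""
    prompt = prompt.strip()
    if not prompt:
        raise ValueError("prompt must not be empty")
    # Assumption: perform_ai_search lives on api.project.
    return api.project.perform_ai_search(
        project_id=project_id,
        dataset_id=dataset_id,
        prompt=prompt,
        limit=limit,
    )
```

Because AI Search collections are temporary and overwritten by subsequent searches, the returned collection should be consumed or saved before the next search is issued.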

Active Learning Sample Selection
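For active learning, diverse sampling can pick a labeling batch that covers the dataset's clusters. The sketch below is hypothetical: it assumes perform_ai_search is available on api.project and that method="centroids" may be combined with clustering_method="kmeans" and num_clusters, as the parameter descriptions suggest. The max_clusters cap is an arbitrary illustrative choice.

```python
def select_diverse_samples(api, project_id, budget, max_clusters=20):
    """Request up to `budget` representative images via centroid-based diverse sampling."""
    if budget < 1:
        raise ValueError("budget must be at least 1")
    # Never ask for more clusters than images; cap at an illustrative maximum.
    num_clusters = min(budget, max_clusters)
    # Assumption: perform_ai_search lives on api.project.
    return api.project.perform_ai_search(
        project_id=project_id,
        method="centroids",
        limit=budget,
        clustering_method="kmeans",
        num_clusters=num_clusters,
    )
```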

Quality Control in Manufacturing
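In a quality-control setting, image similarity search can surface images that resemble known defect examples. The sketch below is hypothetical: find_similar_to_reference is an illustrative name, perform_ai_search is assumed to be available on api.project, and the 0.8 threshold is a placeholder to be tuned on real data.

```python
def find_similar_to_reference(api, project_id, reference_image_ids, threshold=0.8, limit=100):
    """Search for images visually similar to one or more reference images."""
    # Accept a single ID for convenience; the parameter itself expects a list.
    if isinstance(reference_image_ids, int):
        reference_image_ids = [reference_image_ids]
    ids = list(reference_image_ids)
    if not ids:
        raise ValueError("at least one reference image ID is required")
    # Assumption: perform_ai_search lives on api.project.
    return api.project.perform_ai_search(
        project_id=project_id,
        image_id=ids,
        threshold=threshold,
        limit=limit,
    )
```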

Summary

The AI Search functionality in Supervisely provides powerful capabilities for:

  1. Semantic Search: Find images based on natural language descriptions

  2. Similarity Search: Locate visually similar images

  3. Diverse Sampling: Get representative samples from your dataset

  4. Dataset Exploration: Understand the diversity and structure of your data

For more information, refer to the AI Search documentation, which provides a visual overview of how AI Search works.
