Custom 3D AI Assistant app
Supervisely's 3D AI Assistant is a universal tool for automating 3D point cloud labeling. It covers all types of labeling scenarios for 3D point clouds: 3D object detection, ground segmentation, 3D cuboid tracking, and transfer of 2D annotations from photo-context images to the original 3D point clouds. Sometimes, however, you may want to use your own algorithms for these tasks. In this tutorial, you will learn how to build a custom 3D AI Assistant app and release it to your instance as a private app.
This repository, custom-3d-ai-assistant, is a minimal end-to-end example. Refer to it to understand which endpoints have to be implemented, what information they receive from the server, and what they should return in response.
1. Overview
3D AI Assistant exposes six HTTP endpoints. Each one is invoked by a different action in the labeling tool:
POST /interactive_3d_detection: predicts a cuboid for the area circled with the Smart Lasso tool.
POST /track: propagates cuboids across consecutive frames of a point cloud episode.
POST /generate_clusters: pre-computes labeling-proposal clusters for a whole point cloud.
POST /get_labeling_proposal: automatically corrects a manually created cuboid (can use the clusters generated by the previous endpoint).
POST /segment_ground: returns indices of points belonging to the ground.
POST /transfer_masks_to_pcd: transfers 2D figures drawn on a photo-context image to 3D point clouds.
Before going further, read the Supervisely Point Cloud Annotation Format — it explains the geometries JSON shape, project structure and how photo-context images are tied to point clouds with extrinsic/intrinsic matrices.
2. Set up environment for development
The fastest way to get a reproducible dev environment is VS Code's Dev Containers extension. The repo ships a .devcontainer/ folder that is ready to use: open the project in VS Code and choose "Reopen in Container". Alternatively, you can use the Dockerfile provided below to prepare the development environment.
Development Dockerfile
Two things to notice:
The base image supervisely/base-py-sdk:6.73.560 already contains the Supervisely Python SDK.
wireguard and iproute2 are installed only because we want to run the app in advanced debug mode, which uses a WireGuard VPN tunnel into the Supervisely platform (see §4 Advanced debug). When you publish the released private app, you do not need these packages; see the slimmer production Dockerfile below.
Dev container run arguments
Whether or not you use VS Code's Dev Containers extension, if you develop your app inside a Docker container, remember to run it with specific arguments. See the devcontainer.json example below and pay attention to runArgs, which are simply the arguments used to run the Docker container:
Each runArgs entry matters:
--gpus all + --runtime=nvidia: give the container access to the host GPU. Required for any real deep-learning model (we do not use any neural networks in our example app, but you may want to use one in your implementation).
--ipc=host: share the host IPC namespace; needed by PyTorch DataLoaders that use shared memory.
--cap-add NET_ADMIN: required by WireGuard so the container can bring up the VPN tunnel during advanced debug.
Environment files
You will need two env files:
~/supervisely.env: your real credentials (SERVER_ADDRESS and API_TOKEN).
debug.env: placeholders for TEAM_ID and WORKSPACE_ID. Replace them with your own IDs before running advanced debug.
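Before launching, it can help to confirm that both files actually define the variables listed above. The following is a minimal pre-flight sketch using only the standard library; the helper names parse_env_file and missing_keys are hypothetical and are not part of the repo or the Supervisely SDK, and this is not how the app itself loads its environment.

```python
from pathlib import Path


def parse_env_file(path):
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

For example, missing_keys(parse_env_file("debug.env"), ["TEAM_ID", "WORKSPACE_ID"]) returns an empty list once both IDs are filled in.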
You can get information about environment variables here.
3. Create code base
You can find source code for implementing all endpoints in main.py.
The app is a standard FastAPI server bootstrapped through Supervisely:
Every handler receives a FastAPI Request whose state is populated by Supervisely's middleware. Three attributes matter:
request.state.api: an authenticated sly.Api you can use to download point clouds, create figures, notify progress, etc.
request.state.state: per-call payload from the labeling tool (e.g. pcd_id, click_coordinate).
request.state.context: per-call context for tracking jobs (e.g. trackId, pointCloudIds, objectIds).
The response convention is a JSON object with two keys: {"result": <payload>, "error": <null or string>}. Synchronous endpoints wrap their body in try/except and return {"result": None, "error": repr(e)} on failure so the UI can surface the error.
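The convention can be captured once in a small wrapper instead of repeating try/except in every handler. The sketch below is an illustration only: wrap_response and count_selected_points are hypothetical names, the handler is synchronous for brevity, and a SimpleNamespace stands in for the FastAPI Request that Supervisely's middleware would populate.

```python
import functools
from types import SimpleNamespace


def wrap_response(handler):
    """Wrap a handler so it always answers {"result": ..., "error": ...}."""

    @functools.wraps(handler)
    def wrapper(request):
        try:
            return {"result": handler(request), "error": None}
        except Exception as e:  # report any failure so the UI can surface it
            return {"result": None, "error": repr(e)}

    return wrapper


@wrap_response
def count_selected_points(request):
    # request.state.state carries the per-call payload from the labeling tool
    state = request.state.state
    return len(state["indices"])


# Stand-in for a real request object (assumption for this example)
fake_request = SimpleNamespace(
    state=SimpleNamespace(state={"pcd_id": 1, "indices": [3, 5, 8]})
)
print(count_selected_points(fake_request))  # {'result': 3, 'error': None}
```

If the payload lacks the indices key, the same call returns a null result and the error string "KeyError('indices')" instead of raising.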
Below is a per-endpoint reference. For full implementations including the random-stub bodies, see main.py.
POST /interactive_3d_detection
Called when the user circles a target object with the Smart Lasso tool.
Receives (under request.state.state):
pcd_id (int): ID of the point cloud the user is labeling.
indices (list[int]): Point indices of the user's mask.
Returns: {"result": <Cuboid3d JSON>, "error": null}. Use Cuboid3d / Vector3d from supervisely.geometry.cuboid_3d. Serialize with .to_json().
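As a placeholder for a real detector, a cuboid can be fitted around the selected points directly. This is a minimal sketch, not the repo's implementation: fit_axis_aligned_cuboid is a hypothetical helper, it ignores object orientation, and the position/rotation/dimensions dictionary layout is an assumption that should be checked against the Supervisely Point Cloud Annotation Format (in a real handler you would build a Cuboid3d and call .to_json()).

```python
def fit_axis_aligned_cuboid(points, indices):
    """Fit an axis-aligned box around points[i] for every i in indices.

    points: list of (x, y, z) tuples; indices: list[int].
    Returns a dict in an assumed position/rotation/dimensions layout.
    """
    selected = [points[i] for i in indices]
    if not selected:
        raise ValueError("no points selected")
    mins = [min(p[axis] for p in selected) for axis in range(3)]
    maxs = [max(p[axis] for p in selected) for axis in range(3)]
    center = [(lo + hi) / 2 for lo, hi in zip(mins, maxs)]
    size = [hi - lo for lo, hi in zip(mins, maxs)]
    keys = ("x", "y", "z")
    return {
        "position": dict(zip(keys, center)),
        "rotation": dict(zip(keys, (0.0, 0.0, 0.0))),  # no yaw estimation here
        "dimensions": dict(zip(keys, size)),
    }
```

A production model would typically also estimate the heading angle, since most labeled objects are not aligned with the sensor axes.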
The read_pcd function (in functions.py) downloads the .pcd file once and caches it on disk under the input_pcds/ directory.
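The download-once behaviour can be sketched as a small disk cache. This is not the code of read_pcd itself: cached_download is a hypothetical name, and the download argument is an assumed callable that writes the file for a given ID to a given path (in the real app it would wrap an SDK download call).

```python
from pathlib import Path


def cached_download(pcd_id, download, cache_dir="input_pcds"):
    """Return the local path for pcd_id, calling download() only on a cache miss."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    path = cache / f"{pcd_id}.pcd"
    if not path.exists():
        # First request for this point cloud: fetch it and keep it on disk
        download(pcd_id, str(path))
    return str(path)
```

Because every endpoint receives a pcd_id, a cache like this avoids re-downloading the same point cloud on each user action.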
POST /track
Called when the user starts a tracking job. The platform expects an immediate acknowledgement, so the actual work runs in a FastAPI BackgroundTasks worker. Progress and results are reported back over the Supervisely API as the work proceeds.
Receives (under request.state.context):
trackId (str): UUID of the tracking job; required for progress and error notifications.
datasetId (int): ID of the episode dataset.
pointCloudIds (list[int]): Frames to track through, in chronological order.
direction (str): "forward" or "backward"; reverse pointCloudIds if backward.
objectIds (list[int]): Supervisely object IDs being tracked.
figureIds (list[int]): Source figure IDs; used to size the progress bar.
Returns (immediate): {"message": "Track task started."}.
Background work must:
Load the source cuboids from the first frame via api.pointcloud.annotation.download(first_pcd_id), deserialized with sly.deserialize_geometry.
For each subsequent frame, predict a new cuboid per object and write it back via api.pointcloud_episode.figure.create(pcd_id, object_id, geom_json, "cuboid_3d", track_id).
Notify the UI after each step via api.pointcloud_episode.notify_progress(track_id, dataset_id, pcd_ids, current, total).
Catch all exceptions inside the background task and report them through point-clouds.episodes.notify-annotation-tool; otherwise the UI's tracking spinner will hang forever. The repo provides a send_error_data decorator that does this; reuse it.
See the full start_track handler, the track_cuboids background worker, and the send_error_data decorator in main.py.
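The control flow of the background work can be sketched independently of the SDK. In the example below every side effect is passed in as a callable, so the names predict, write_figure, notify_progress and notify_error are assumptions standing in for your model and for the API calls listed above; counting one progress step per written figure is also an assumption, not a documented rule.

```python
def order_frames(pcd_ids, direction):
    """Return frames in processing order; reversed for backward tracking."""
    return list(reversed(pcd_ids)) if direction == "backward" else list(pcd_ids)


def track(pcd_ids, direction, source_cuboids, predict,
          write_figure, notify_progress, notify_error):
    """Propagate cuboids frame by frame.

    source_cuboids: {object_id: geometry} loaded from the first processed frame.
    """
    try:
        frames = order_frames(pcd_ids, direction)
        current = dict(source_cuboids)
        total = (len(frames) - 1) * len(current)
        done = 0
        for pcd_id in frames[1:]:
            for object_id, geometry in list(current.items()):
                new_geometry = predict(pcd_id, object_id, geometry)
                write_figure(pcd_id, object_id, new_geometry)
                current[object_id] = new_geometry  # next frame starts from here
                done += 1
                notify_progress(done, total)
    except Exception as e:
        # Always report failures, otherwise the UI keeps waiting
        notify_error(repr(e))
```

Keeping the loop free of direct API calls also makes it easy to unit-test the tracker with stub callables before connecting it to the platform.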
POST /generate_clusters
Called once per point cloud, before the user starts requesting labeling proposals. It is a side-effect endpoint: it pre-computes proposal clusters and stores them in memory keyed by pcd_id.
Receives (under request.state.state):
pcd_id (int): ID of the point cloud to cluster.
Returns: nothing.
The clusters are kept in the module-level labeling_proposals dict. Your implementation of /get_labeling_proposal (next) can use these pre-computed clusters, but it does not have to. If you do not need them, you can skip implementing this endpoint.
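One very simple way to fill such a cache is to bucket points by the XY grid cell they fall in. This is only an illustrative stand-in for a real clustering algorithm: the cell size, the min_points threshold and the choice to store clusters as lists of point indices are all assumptions, and only the labeling_proposals name comes from the repo.

```python
from collections import defaultdict

labeling_proposals = {}  # pcd_id -> list of clusters (each a list of point indices)


def generate_clusters(pcd_id, points, cell=2.0, min_points=2):
    """Group points by XY grid cell and cache the result under pcd_id."""
    cells = defaultdict(list)
    for i, (x, y, _z) in enumerate(points):
        cells[(int(x // cell), int(y // cell))].append(i)
    # Keep only cells dense enough to be a plausible object
    labeling_proposals[pcd_id] = [
        indices for indices in cells.values() if len(indices) >= min_points
    ]
```

The function deliberately returns nothing, matching the side-effect nature of the endpoint.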
POST /get_labeling_proposal
Called when the user enables the Smart Auto-Fit option in the AI Assistance tools and then places a manually created cuboid. A request is sent to this endpoint to try to fit the cuboid to the target object automatically, instead of correcting it manually.
Receives (under request.state.state):
pcd_id (int): ID of the point cloud.
click_coordinate (list[float]): [x, y, z] of the user's click.
Returns: {"result": <Cuboid3d JSON> | null, "error": null}. Return null for result when no labeling proposals are available near the click.
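The "no proposal near the click" case can be sketched as a nearest-centroid lookup with a distance cut-off. The helper name nearest_cluster, the max_distance default and the cluster representation (lists of point indices) are assumptions made for this example.

```python
import math


def nearest_cluster(clusters, points, click, max_distance=3.0):
    """Return the cluster whose centroid is closest to click, or None if none is near."""
    best, best_dist = None, max_distance
    for cluster in clusters:
        centroid = [
            sum(points[i][axis] for i in cluster) / len(cluster) for axis in range(3)
        ]
        dist = math.dist(centroid, click)
        if dist <= best_dist:
            best, best_dist = cluster, dist
    return best


points = [(0.1, 0.1, 0), (0.5, 0.9, 0), (9, 9, 1), (9.5, 9.5, 1)]
clusters = [[0, 1], [2, 3]]
print(nearest_cluster(clusters, points, [9, 9, 1]))    # [2, 3]
print(nearest_cluster(clusters, points, [50, 50, 0]))  # None
```

When the lookup returns None, the handler would answer with a null result and a null error; otherwise it would fit a cuboid to the returned points.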
POST /segment_ground
Called when the user selects the Ground Segmentation option in the AI Assistance tools.
Receives (under request.state.state):
pcd_id (int): ID of the point cloud to segment.
Returns: {"result": [int, int, ...], "error": null} — a list of indices into the point cloud's points array that belong to the ground plane.
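To make the expected output concrete, here is a deliberately naive heuristic that treats every point close to the lowest z value as ground. It assumes a flat scene with z pointing up and is not the method used by the repo or the platform; real implementations usually fit a plane or use a learned model.

```python
def segment_ground(points, tolerance=0.2):
    """Return indices of points within `tolerance` of the lowest z value."""
    if not points:
        return []
    z_min = min(p[2] for p in points)
    return [i for i, p in enumerate(points) if p[2] - z_min <= tolerance]


cloud = [(0, 0, 0.0), (1, 1, 0.1), (2, 2, 1.5), (3, 3, 0.15)]
print({"result": segment_ground(cloud), "error": None})
# {'result': [0, 1, 3], 'error': None}
```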
POST /transfer_masks_to_pcd
Called when the user has drawn 2D annotations (rectangles, polygons, bitmaps) on a photo-context image and wants to transfer them to 3D space (e.g. bounding boxes -> cuboids).
Receives (under request.state.state):
pcd_id (int): ID of the target point cloud.
image_id (int): ID of the photo-context image carrying the 2D figures.
figure_ids (list[int], optional): If present, only transfer these figures. Otherwise transfer all figures on the image.
Returns: {"result": [<entry>, ...]} where each entry is
srcFigureId is mandatory — the labeling tool uses it to link the resulting 3D cuboid back to the originating 2D figure.
For real implementations you will probably need the photo-context image and its camera calibration, plus a way to rasterize the 2D figures. The repo ships two helpers in functions.py that the random-stub transfer_masks_to_pcd handler does not call but that you will almost certainly want.
load_photo_context_data downloads the related image and pulls the 3×4 extrinsic and 3×3 intrinsic matrices out of the point cloud's image metadata:
get_2d_anns turns each Supervisely 2D figure into a numpy mask (for bitmaps and polygons), a [left, top, right, bottom] bbox (for rectangles), or a list of [col, row] points (for polylines):
A real transfer_masks_to_pcd handler typically chains the two: call load_photo_context_data to get the image and matrices, call get_2d_anns to get rasterized 2D figures, then project each figure into the point cloud using the extrinsic/intrinsic matrices and fit a Cuboid3d around the resulting 3D points.
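The projection step of that chain can be sketched with plain nested lists. The example assumes an ideal pinhole camera without lens distortion and an extrinsic matrix that maps point cloud coordinates into the camera frame; both conventions should be verified against your calibration data. The helper names project_point and points_in_bbox are hypothetical.

```python
def project_point(point, extrinsic, intrinsic):
    """Project a 3D point to (col, row); return None if it is behind the camera.

    extrinsic: 3x4 [R|t] matrix, intrinsic: 3x3 matrix, both as nested lists.
    """
    x, y, z = point
    cam = [r[0] * x + r[1] * y + r[2] * z + r[3] for r in extrinsic]
    if cam[2] <= 0:
        return None
    u, v, w = (sum(k[j] * cam[j] for j in range(3)) for k in intrinsic)
    return u / w, v / w


def points_in_bbox(points, bbox, extrinsic, intrinsic):
    """Indices of points whose projection falls inside [left, top, right, bottom]."""
    left, top, right, bottom = bbox
    hits = []
    for i, p in enumerate(points):
        uv = project_point(p, extrinsic, intrinsic)
        if uv is not None and left <= uv[0] <= right and top <= uv[1] <= bottom:
            hits.append(i)
    return hits
```

The returned indices are the 3D points that fall under a 2D rectangle, which is the set a cuboid would then be fitted around; for bitmap or polygon masks the bounding-box test would be replaced by a mask lookup at (row, col).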
See also: the Supervisely Point Cloud Annotation Format for the canonical JSON shape of cuboid_3d, dataset structure, and photo-context fields.
4. Advanced debug
Once the code is written, it's time to test it right in the Supervisely platform as a debugging app. In advanced debug mode the app runs locally on your machine but is reachable from the platform through a WireGuard VPN tunnel — so you can hit the real labeling-tool requests against your local breakpoints.
The repo already ships the launch configuration. .vscode/launch.json:
You can read more about advanced debug mode here.
After that:
If you develop in a Docker container, run the container with --cap-add NET_ADMIN (already configured in devcontainer.json).
Install wireguard and iproute2 (already done in the Dockerfile). On macOS use brew install wireguard-tools.
Define your TEAM_ID (and WORKSPACE_ID) in debug.env. The other env variables that advanced debug needs are already set in .vscode/launch.json.
Switch the launch.json config to "Advanced debug".
Run the code.
✅ It will deploy the app in the Supervisely platform as a REST API service.
Here is what an advanced-debug launch looks like:
After the advanced debug launch, you should be able to debug your app via the Develop & Debug app (select the Develop & Debug session in the session selector).
5. App configuration
Use the config.json from the example repository as a starting point:
Two fields are critical for an app to be recognized as a 3D AI Assistant:
"is3DAIAssistant": true. Without this flag the platform will not surface your app as a 3D assistant in the labeling tool.
"session_tags": ["3d_smart_tool", "sly_point_cloud_tracking"]. The labeling tool and the tracker look up running sessions by these tags. Drop either tag and the corresponding feature (interactive smart tool or episode tracking) will not be wired to your app.
Other fields to keep aligned:
"headless": true. This is a serving app with no UI.
"port" and "entrypoint" must match the uvicorn command (8000 / src.main:app).
"docker_image" must point to a published production image. Do not forget to build and push it before releasing the app.
6. App release
Once you've tested the code, it's time to release it into the platform. It can be released as an App that is shared with the whole Supervisely community, or as your own private App.
Refer to How to Release your App for all releasing details. For a private app, also check the Private App Tutorial.