3D point cloud object segmentation based on sensor fusion and 2D mask guidance

How to create 3D segmentation masks in point clouds with 2D mask guidance and camera calibration data

Introduction

Nowadays there are hundreds of machine learning models which can perform instance segmentation on 2D images, even without fine-tuning. However, when it comes to segmentation of 3D point clouds, the choice of models becomes significantly narrower, and most of them require fine-tuning in order to work properly on custom data. Since instance segmentation on 2D images is relatively easily accessible nowadays, transferring segmentation masks from 2D to 3D space can significantly speed up 3D point cloud labeling for instance segmentation tasks.

In this tutorial we will learn how to create segmentation masks for a 3D point cloud using segmentation masks on a 2D photo context image and camera calibration data. We will take a point cloud, a photo context image and camera calibration parameters from the KITTI dataset as an example, but this approach can be generalized to any data. Supervisely's 3D Point Cloud labeling tool and Image labeling tool will be used for working with the point cloud and the photo context image respectively.

The main steps of this tutorial are the following:

  • prepare input data: a 3D point cloud, a reference image labeled with segmentation masks, KITTI's sensor calibration files

  • project LiDAR 3D points on 2D reference image

  • get LiDAR point projections located inside masks on photo context image

  • create segmentation masks for 3D point cloud

Everything you need to reproduce this tutorial is on GitHub: source code, Dockerfile, demo data.

Input data overview: 3D point cloud, photo context image with 2D masks, sensor calibration parameters

As mentioned above, the input data will be taken from the KITTI dataset. You can import point clouds from the KITTI dataset to your Supervisely account using the Import KITTI 3D app. We will take one point cloud as an example:

input point cloud

In the KITTI dataset, every point cloud has 4 reference images: 2 from grayscale cameras (left / right) and 2 from color cameras (left / right). We will take the image from the left color camera as an example and label it with 2D segmentation masks:

photo context image with 2D masks

Now let's check the sensor calibration parameters. The KITTI dataset provides several sensor calibration files:

  • calib_cam_to_cam.txt - contains matrices for camera-to-camera calibration

  • calib_velo_to_cam.txt - contains matrices for velodyne-to-camera registration

The file for camera-to-camera calibration contains the following data (source: KITTI README):

  • S_xx: 1x2 size of image xx before rectification

  • K_xx: 3x3 calibration matrix of camera xx before rectification

  • D_xx: 1x5 distortion vector of camera xx before rectification

  • R_xx: 3x3 rotation matrix of camera xx (extrinsic)

  • T_xx: 3x1 translation vector of camera xx (extrinsic)

  • S_rect_xx: 1x2 size of image xx after rectification

  • R_rect_xx: 3x3 rectifying rotation to make image planes co-planar

  • P_rect_xx: 3x4 projection matrix after rectification

For our task, we will need only P_rect_xx, R_rect_xx, R_xx and T_xx matrices.

The file for velodyne-to-camera registration contains the following data:

  • R: 3x3 rotation matrix

  • T: 3x1 translation vector

This data serves as a representation of the velodyne coordinate frame in camera coordinates. We will need the rotation matrix and translation vector in order to transform points in velodyne coordinates into the camera coordinate system.
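
To make this concrete, here is a minimal sketch (with placeholder values, not numbers from the calibration file) of how a 3x3 rotation matrix R and a 3x1 translation vector T can be combined into a 4x4 homogeneous transform that maps a LiDAR point into the camera reference frame:

import numpy as np

# placeholder values; the real R and T are parsed from calib_velo_to_cam.txt later in this tutorial
R = np.eye(3)            # 3x3 rotation (velodyne -> camera reference)
T = np.zeros((3, 1))     # 3x1 translation (velodyne -> camera reference)

# stack into a 4x4 homogeneous transformation matrix
T_velo_to_ref = np.vstack((np.hstack((R, T)), np.array([0, 0, 0, 1])))

# a LiDAR point in homogeneous coordinates [x, y, z, 1]
point_velo = np.array([5.0, 1.0, -0.5, 1.0])

# the same point expressed in the camera reference coordinate system
point_ref = T_velo_to_ref @ point_velo

This mirrors the way the calibration matrices are assembled in the code below.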

Environment preparation and libraries import

For running the code provided in this tutorial, you will need some Python modules: supervisely, open3d and plotly. You can use this Dockerfile for convenience:

FROM supervisely/base-py-sdk:6.73.45

RUN pip3 install open3d==0.13.0
RUN pip3 install executing==1.1.1
RUN pip3 install plotly==5.18.0
RUN pip3 install kaleido==0.2.1
RUN pip3 install jsonschema==4.20.0

We will also use the functions.py file to import visualization functions; you can find this file here.

Import necessary libraries, load Supervisely account credentials, and set image display parameters:

import supervisely as sly
from dotenv import load_dotenv
from PIL import Image
import os
import numpy as np
import matplotlib.pyplot as plt
import open3d as o3d
from functions import *
from supervisely.project.project_type import ProjectType
from supervisely.geometry.pointcloud import Pointcloud
from supervisely.pointcloud_annotation.pointcloud_tag_collection import (
    PointcloudTagCollection,
)
from supervisely.pointcloud_annotation.pointcloud_object_collection import (
    PointcloudObjectCollection,
)


# load credentials
load_dotenv("../supervisely.env")
api = sly.Api()

# set image display parameters
%matplotlib inline
plt.rcParams["figure.figsize"] = (20, 10)

Download photo context and visualize mask annotations

Let's download the photo context image and its annotations from the Supervisely platform to local storage and visualize the result. We will use the Supervisely Python SDK for this purpose; you can find more tutorials on how to use the Supervisely Python SDK here.

# define project id and photo context image id
photo_context_project_id = 43714
photo_context_image_id = 32382786

# download photo context image
photo_context_image_np = api.image.download_np(photo_context_image_id)
masked_photo_context_image_np = photo_context_image_np.copy()

# download photo context image annotation
project_meta_json = api.project.get_meta(photo_context_project_id)
project_meta = sly.ProjectMeta.from_json(project_meta_json)
ann_json = api.annotation.download(photo_context_image_id).annotation
ann = sly.Annotation.from_json(ann_json, project_meta)

# draw annotation on photo context image
ann.draw_pretty(masked_photo_context_image_np, thickness=1)

# create output directory
save_dir = "../tutorial_data/"
if not os.path.exists(save_dir):
    os.mkdir(save_dir)

# save result
masked_photo_context_path = os.path.join(save_dir, "masked_photo_context.png")
masked_photo_context_image = Image.fromarray(masked_photo_context_image_np)
masked_photo_context_image.save(masked_photo_context_path)

# display result
plt.axis("off")
plt.imshow(masked_photo_context_image);

masked photo context

Download input point cloud

The next step is to download the input point cloud to local storage:

# define point cloud id and download it to local storage
pcd_id = 32384623
local_pcd_path = os.path.join(save_dir, "lidar_data.pcd")
api.pointcloud.download_path(pcd_id, local_pcd_path)

# display point cloud on interactive plot
visualize_pcd(local_pcd_path)
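
The visualize_pcd helper comes from the functions.py file linked above and is not reproduced here. A minimal sketch of what such a helper could look like, assuming a plain open3d viewer instead of the interactive plotly figure used in the tutorial:

import open3d as o3d

def visualize_pcd(pcd_path: str) -> None:
    # read the .pcd file and open an interactive 3D viewer window
    pcd = o3d.io.read_point_cloud(pcd_path, format="pcd")
    o3d.visualization.draw_geometries([pcd])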

Get sensor calibration parameters

The KITTI paper describes the transformation from LiDAR to camera i as follows, where each transformation matrix has been converted to its homogeneous representation. The difference here is that we have changed the notation and added the transformation to the desired camera reference:

$\tilde{y} = P^{\text{cam}_i}_{\text{rect}_i} R^{\text{rect}_i}_{\text{ref}_i} T^{\text{ref}_i}_{\text{ref}_0} T^{\text{ref}_0}_{\text{velo}} \tilde{x}, \quad \text{where } \tilde{x} = [x, y, z, 1]^T$

$\tilde{y} = (\tilde{u}, \tilde{v}, z, 1)$

For convenience, we will denote the transformation from LiDAR to camera i like Isaac Berrios proposed in his sensor fusion tutorial:

$T^{\text{cam}_i}_{\text{velo}} = P^{\text{cam}_i}_{\text{rect}_i} R^{\text{rect}_i}_{\text{ref}_i} T^{\text{ref}_i}_{\text{ref}_0} T^{\text{ref}_0}_{\text{velo}}$

Where:

  • $T^{\text{ref}_0}_{\text{velo}}$ : LiDAR to camera reference → transforms a 3D point relative to the LiDAR into a 3D point relative to the camera.

  • $T^{\text{ref}_i}_{\text{ref}_0}$ : rigid body transformation from camera 0 to camera i.

  • $R^{\text{rect}_i}_{\text{ref}_i}$ : camera i to rectified camera i reference.

  • $P^{\text{cam}_i}_{\text{rect}_i}$ : rectified camera i to 2D camera i (u, v, z) coordinate space.

  • $T^{\text{cam}_i}_{\text{velo}}$ : 3D LiDAR space to 2D camera i (u, v, z) coordinate space.

Here (u, v, z) are the final camera coordinates after the rectification and projection transforms. In order to transform from homogeneous image coordinates $\tilde{y}$ to true (u, v, z) image coordinates $y$, we need to normalize by the depth and drop the trailing 1:

$y = \left( \dfrac{\tilde{u}}{z}, \dfrac{\tilde{v}}{z}, z \right)$
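
As a quick numeric illustration of this normalization step (made-up numbers, not values from the dataset):

import numpy as np

# homogeneous projection (u~, v~, z, 1) of a single 3D point
y_tilde = np.array([1500.0, 400.0, 10.0, 1.0])

u = y_tilde[0] / y_tilde[2]   # u = u~ / z -> 150.0 (pixel column)
v = y_tilde[1] / y_tilde[2]   # v = v~ / z -> 40.0  (pixel row)
z = y_tilde[2]                # depth is kept as-is -> 10.0

The vectorized version of exactly this operation is used below when projecting the whole point cloud.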

# define target camera number
camera_number = 2

# read calib_cam_to_cam.txt
calib_cam_to_cam_file = open("../tutorial_data/calib_cam_to_cam.txt")
calib_cam_to_cam_dict = {}
for line in calib_cam_to_cam_file.readlines():
    key, value = line.split(": ")
    calib_cam_to_cam_dict[key] = value.strip()
calib_cam_to_cam_file.close()

# read calib_velo_to_cam.txt
calib_velo_to_cam_file = open("../tutorial_data/calib_velo_to_cam.txt")
calib_velo_to_cam_dict = {}
for line in calib_velo_to_cam_file.readlines():
    key, value = line.split(": ")
    calib_velo_to_cam_dict[key] = value.strip()
calib_velo_to_cam_file.close()

# get projection matrices
P_rect = calib_cam_to_cam_dict[f"P_rect_0{camera_number}"]
P_rect = np.array([float(x) for x in P_rect.split(" ")]).reshape(
    (3, 4)
)

# get rectified rotation matrices
R_rect = calib_cam_to_cam_dict[f"R_rect_0{camera_number}"]
R_rect = np.array([float(x) for x in R_rect.split(" ")]).reshape(
    (3, 3)
)
# add (0, 0, 0) translation and convert to homogeneous coordinates
R_rect = np.insert(R_rect, 3, values=[0, 0, 0], axis=0)
R_rect = np.insert(R_rect, 3, values=[0, 0, 0, 1], axis=1)

# get rotation matrix from reference camera to target camera
R = calib_cam_to_cam_dict[f"R_0{camera_number}"]
R = np.array([float(x) for x in R.split(" ")]).reshape((3, 3))

# get translation vector from reference camera to target camera
t = calib_cam_to_cam_dict[f"T_0{camera_number}"]
t = np.array([float(x) for x in t.split(" ")]).reshape((3, 1))

# get reference camera to target camera rigid body transformation in homogeneous coordinates
T_ref_to_target = np.insert(np.hstack((R, t)), 3, values=[0, 0, 0, 1], axis=0)

# get lidar to camera reference transformation
R_velo = np.array([float(x) for x in calib_velo_to_cam_dict["R"].split(" ")]).reshape((3, 3))
t_velo = np.array([float(x) for x in calib_velo_to_cam_dict["T"].split(" ")])[:, None]
T_velo_ref0 = np.vstack((np.hstack((R_velo, t_velo)), np.array([0, 0, 0, 1])))

Project LiDAR 3D points on 2D photo context image

Now that we have all the necessary transformation matrices, we can obtain the matrix that transforms 3D LiDAR points into 2D camera coordinates.

# transform from velo (LiDAR) to target camera
T_velo_to_cam = P_rect @ R_rect @ T_ref_to_target @ T_velo_ref0

# get lidar points
pcd = o3d.io.read_point_cloud(local_pcd_path, format="pcd")
pcd_points = np.asarray(pcd.points)
xyzw = np.insert(pcd_points, 3, 1, axis=1).T

# get 3D points projections on photo context image
projections = T_velo_to_cam @ xyzw
projections[:2] /= projections[2, :]

# draw point projections on image and display result
save_path = os.path.join(save_dir, "projections.png")
draw_projections_on_image(projections.copy(), photo_context_image_np.copy(), save_path)
plt.axis("off")
plt.imshow(Image.open(save_path));

3D point projections on 2D photo context image

Get LiDAR point projections located inside masks on photo context image

The next step after getting the LiDAR point projections is to find the projections located inside the masks on the photo context image (in our case, we need to find the point projections which lie on the two cars in front of the ego vehicle).

# download photo context image annotation
project_meta_json = api.project.get_meta(photo_context_project_id)
project_meta = sly.ProjectMeta.from_json(project_meta_json)
ann_json = api.annotation.download(photo_context_image_id).annotation
ann = sly.Annotation.from_json(ann_json, project_meta)

# get binary masks
labels = ann.labels
bitmap = np.zeros(photo_context_image_np.shape, dtype=np.uint8)
for label in labels:
    geometry = label.geometry
    geometry.draw(bitmap=bitmap, color=[1, 1, 1])
bitmap = bitmap[:, :, 0]  # reduce to a single-channel 2D mask

# get indexes of projections which are located inside masks
u, v, z = projections
inside_masks = []
img_h, img_w, _ = photo_context_image_np.shape
for idx in range(len(pcd.points)):
    point = np.array([int(u[idx]), int(v[idx])])
    if (point[0] <= 0 or point[0] >= img_w) or (point[1] <= 0 or point[1] >= img_h):
        continue
    else:
        if np.all(bitmap[point[1], point[0]] == 1):
            inside_masks.append(idx)

print(f"{len(inside_masks)} point projections are located inside masks")

# display point projections located inside masks
masked_projections_save_path = os.path.join(save_dir, "masked_projections.png")
draw_projections_on_image(
    projections.copy(),
    photo_context_image_np.copy(),
    masked_projections_save_path,
    preprocess=False,
    indexes=inside_masks,
)
plt.axis("off")
plt.imshow(Image.open(masked_projections_save_path));

3D point projections located inside masks

Create 3D point cloud segmentation masks

Since 3D points and their projections have the same indexing, we can apply the indexes of the projections located inside masks on the photo context image to the original LiDAR points and get the masked part of the point cloud.

visualize_3d_masks(local_pcd_path, inside_masks)
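
The visualize_3d_masks helper also lives in functions.py and is not reproduced here. A minimal sketch of what it could look like, assuming an open3d viewer that simply highlights the masked points:

import numpy as np
import open3d as o3d

def visualize_3d_masks(pcd_path: str, indexes: list) -> None:
    # read the point cloud and color the masked points red, the rest gray
    pcd = o3d.io.read_point_cloud(pcd_path, format="pcd")
    colors = np.tile([0.6, 0.6, 0.6], (len(pcd.points), 1))
    colors[indexes] = [1.0, 0.0, 0.0]
    pcd.colors = o3d.utility.Vector3dVector(colors)
    o3d.visualization.draw_geometries([pcd])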

As you can see, not only the points of the target objects (cars) were segmented - there are also points in front of and behind the target objects whose projections fall on the target objects in the 2D image. This scenario was quite predictable, since points in 3D space can have similar x and y coordinates but different z coordinates (i.e. lie roughly on the same camera ray) - their projections will be almost the same, although in 3D space these points can be located in completely different parts of the point cloud.

In order to handle this corner case, we will have to apply some postprocessing to the masked LiDAR points in order to filter out points which do not belong to our target objects.

When it comes to postprocessing 2D-to-3D segmentation transfer results, in most cases the points which belong to the target objects (in our case, cars) will be in the majority, and other points (in our case, points of the surrounding environment, which we don't want to segment) will be in the minority. Additionally, we can state that points of the target objects and points of the surrounding environment are most likely to be distant from each other.

What can we do with this information? We can apply a clustering algorithm to the masked part of our point cloud and find the N biggest clusters (where N equals the number of target objects / segmentation masks, in our case 2). Since "target" points and "noise" points are likely to be distant from each other, and "target" points are likely to have higher density than other points, there is a high probability that the points of the target objects will belong to the biggest clusters. Let's apply the DBScan clustering algorithm and test this hypothesis:

# cluster masked part of point cloud
pcd = o3d.io.read_point_cloud(local_pcd_path, format="pcd")
masked_pcd = pcd.select_by_index(inside_masks)
cluster_labels = np.array(masked_pcd.cluster_dbscan(eps=1.5, min_points=100))
clusters, counts = np.unique(cluster_labels, return_counts=True)

# we recommend setting the number of clusters equal to the number of segmentation masks on the photo context image
n_biggest_clusters = len(ann.labels)
biggest_clusters = clusters[np.argsort(counts)][-n_biggest_clusters:]
biggest_cluster_indexes = []
for idx, label in enumerate(cluster_labels):
    if label in biggest_clusters:
        biggest_cluster_indexes.append(idx)
inside_masks_processed = [inside_masks[idx] for idx in biggest_cluster_indexes]
print(f"{len(inside_masks) - len(inside_masks_processed)} points were filtered using DBScan clustering")

# display point filtering results
visualize_3d_masks(local_pcd_path, inside_masks_processed)

As you can see, in our case DBScan perfectly filtered out the points which do not belong to the target objects.

Upload created 3D segmentation mask to Supervisely platform

Now that we have the indexes of the target objects' points in 3D space, we can create a 3D point cloud segmentation annotation and upload it to the Supervisely platform. We will create a new point cloud project and upload the input point cloud, the masked photo context image and the 3D segmentation mask to it.

# create output project
pcd_project = api.project.create(
    657,  # destination workspace id - replace with your own
    "Segmented point cloud",
    change_name_if_conflict=True,
    type=ProjectType.POINT_CLOUDS,
)
pcd_dataset = api.dataset.create(pcd_project.id, "ds_0", change_name_if_conflict=True)

# upload point cloud to output project
pcd_info = api.pointcloud.upload_path(pcd_dataset.id, name="scene.pcd", path=local_pcd_path)

# upload related image to output project
related_image_hash = api.pointcloud.upload_related_image(masked_photo_context_path)

# create dict with camera info
extrinsic_matrix = T_velo_ref0[:3, :4]
extrinsic_matrix = extrinsic_matrix.flatten().tolist()
intrinsic_matrix = calib_cam_to_cam_dict["P_rect_00"]
intrinsic_matrix = np.array(intrinsic_matrix.split(" "), dtype=np.float32).reshape(3, 4)
intrinsic_matrix = intrinsic_matrix[:3, :3].flatten().tolist()
cam_info = {
    "deviceId": "cam_0",
    "sensorsData": {
        "extrinsicMatrix": extrinsic_matrix,
        "intrinsicMatrix": intrinsic_matrix,
    },
}

# create dict with image info
related_image_info = {
    "entityId": pcd_info.id,
    "name": "img_0.png",
    "hash": related_image_hash,
    "meta": cam_info,
}

# upload related image info
api.pointcloud.add_related_images([related_image_info])

# upload point cloud segmentation mask to the platform
pcd_project_meta = sly.ProjectMeta.from_json(api.project.get_meta(pcd_project.id))

if not pcd_project_meta.get_obj_class("mask"):
    pcd_project_meta = pcd_project_meta.add_obj_class(sly.ObjClass("mask", Pointcloud))
    api.project.update_meta(pcd_project.id, pcd_project_meta.to_json())

ann_info = api.pointcloud.annotation.download(pcd_id)
pcd_objects = []
pcd_figures = []
geometry = Pointcloud(inside_masks_processed)
pcd_object = sly.PointcloudObject(pcd_project_meta.get_obj_class("mask"))
pcd_figure = sly.PointcloudFigure(pcd_object, geometry)
pcd_objects.append(pcd_object)
pcd_figures.append(pcd_figure)
pcd_objects = PointcloudObjectCollection(pcd_objects)
result_ann = sly.PointcloudAnnotation(
    pcd_objects, pcd_figures, PointcloudTagCollection([])
)
api.pointcloud.annotation.append(pcd_info.id, result_ann)

Data export

Now, when the point cloud has been uploaded to the Supervisely platform, you can easily export the labeled data in any suitable format using the corresponding apps: Export pointclouds project in Supervisely format, Export to KITTI 3D, Export Point Clouds to ROS Bag.

Conclusion

In this tutorial, we used 2D mask guidance, sensor calibration matrices and the DBScan clustering algorithm in order to transfer 2D segmentation masks to 3D space. This approach can be useful when there is a need for fast 3D point cloud labeling for instance segmentation tasks. For example, we could apply a YOLO11 instance segmentation model to each photo context image in the KITTI dataset and transfer the 2D masks to 3D space using sensor calibration data - this would make it possible to create a large labeled dataset for 3D instance segmentation without having to manually draw segmentation masks in every point cloud. Alternative ways of labeling 3D point clouds will be covered in future tutorials.
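
As a purely illustrative sketch of that idea (the ultralytics package, the checkpoint name and the image path below are assumptions and are not part of this tutorial's environment):

from ultralytics import YOLO  # assumption: ultralytics with YOLO11 support is installed

# load a pretrained YOLO11 instance segmentation checkpoint (name assumed)
model = YOLO("yolo11n-seg.pt")

# run inference on a photo context image (hypothetical local file)
results = model("photo_context_image.png")

# per-instance masks that could then be projected into 3D exactly as shown above
masks = results[0].masks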

Acknowledgement

This tutorial is based on the great work by Isaac Berrios.
