Previously, we made a tutorial on how to transfer segmentation masks from a 2D photo context image to a 3D point cloud. This time we will do the opposite: transfer segmentation masks from a 3D point cloud to a 2D photo context image.
In this tutorial, we will show an example of transferring segmentation masks from a 3D point cloud to a 2D photo context image. The segmented image will be uploaded to the Supervisely Platform, where the image segmentation masks can then be exported in any popular format. We will take a point cloud, a photo context image, and camera calibration parameters from the KITTI dataset as an example, but this approach can be generalized to any data. Supervisely's 3D Point Cloud labeling tool and Image labeling tool will be used for working with the point cloud and the photo context image, respectively.
The main steps of this tutorial are the following:
prepare input data: a 3D point cloud with a segmentation mask, a photo context image, and KITTI's sensor calibration files
project the LiDAR 3D points onto the photo context image and get the projections of the masked points
build a convex hull based on the projections of the masked points to create a 2D segmentation mask on the photo context image
Everything you need to reproduce this tutorial is on GitHub: source code, Dockerfile, demo data.
Input data overview: 3D point cloud with segmentation mask, photo context image, camera calibration parameters
Firstly, we will need a 3D point cloud with a segmentation mask:
Secondly, we will need a 2D photo context image related to this point cloud:
Finally, we will need camera calibration parameters to project LiDAR 3D points onto the 2D photo context image. The KITTI dataset provides several sensor calibration files:
calib_cam_to_cam.txt - contains matrices for camera-to-camera calibration
calib_velo_to_cam.txt - contains matrices for velodyne-to-camera registration
File for camera-to-camera calibration contains the following data (source - KITTI README):
S_xx: 1x2 size of image xx before rectification
K_xx: 3x3 calibration matrix of camera xx before rectification
D_xx: 1x5 distortion vector of camera xx before rectification
R_xx: 3x3 rotation matrix of camera xx (extrinsic)
T_xx: 3x1 translation vector of camera xx (extrinsic)
S_rect_xx: 1x2 size of image xx after rectification
R_rect_xx: 3x3 rectifying rotation to make image planes co-planar
P_rect_xx: 3x4 projection matrix after rectification
For our task, we will need only the P_rect_xx, R_rect_xx, R_xx and T_xx matrices.
File for velodyne-to-camera registration contains the following data:
R: 3x3 rotation matrix
T: 3x1 translation vector
This data serves as a representation of the velodyne coordinate frame in camera coordinates. We will need the rotation matrix and translation vector in order to transform points from velodyne coordinates into the camera coordinate system.
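In homogeneous coordinates, the rotation matrix and translation vector combine into a single 4x4 rigid body transform; the sketch below follows the standard construction and corresponds to the T_velo_ref0 matrix built later in the code:

$$T_{velo}^{ref_0} = \begin{bmatrix} R & t \\ \mathbf{0}^\top & 1 \end{bmatrix}, \qquad \begin{bmatrix} x_{cam} \\ 1 \end{bmatrix} = T_{velo}^{ref_0} \begin{bmatrix} x_{velo} \\ 1 \end{bmatrix}$$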
Environment preparation and libraries import
To run the code provided in this tutorial, you will need a few Python modules: supervisely, open3d and alphashape. For convenience, you can use the Dockerfile from the GitHub repository linked above. Start by importing the libraries and defining the input parameters:
```python
# import necessary libraries
import supervisely as sly
from dotenv import load_dotenv
from PIL import Image
import os
import matplotlib.pyplot as plt
import open3d as o3d
import json
import numpy as np
import cv2
import alphashape
from matplotlib import cm

# load credentials
load_dotenv("../supervisely.env")
api = sly.Api()

# define input parameters
pcd_id = 316219  # your pcd id
photo_context_image_path = "../tutorial_data/photo_context.png"
cam_to_cam_file_path = "../tutorial_data/calib_cam_to_cam.txt"
velo_to_cam_file_path = "../tutorial_data/calib_velo_to_cam.txt"

# set image display parameters
%matplotlib inline
plt.rcParams["figure.figsize"] = (20, 10)
```
Download input point cloud and its annotation
Download input point cloud and get indexes of masked points:
```python
# download input point cloud to local storage
save_dir = "../tutorial_data/"
local_pcd_path = os.path.join(save_dir, "lidar_data.pcd")
api.pointcloud.download_path(pcd_id, local_pcd_path)

# download point cloud mask created with pen tool
ann_info = api.pointcloud.annotation.download(pcd_id)
mask_indexes = ann_info["figures"][0]["geometry"]["indices"]
```
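As an optional sanity check (a small illustrative sketch, reusing the variables defined above), you can verify that the downloaded mask indexes fit inside the point cloud:

```python
# optional sanity check: the mask indexes must address existing points in the cloud
pcd_check = o3d.io.read_point_cloud(local_pcd_path, format="pcd")
num_points = np.asarray(pcd_check.points).shape[0]
print(f"{len(mask_indexes)} of {num_points} points belong to the 3D mask")
assert max(mask_indexes) < num_points
```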
Get sensor calibration parameters
We already covered the topic of sensor calibration parameters in our previous tutorial, but we will repeat it here for convenience.
The KITTI paper describes the transformation from LiDAR to camera $i$ as follows, where each transformation matrix has been converted to its homogeneous representation. The difference here is that we have changed the notation and added the transformation to the desired camera reference.
$T_{velo}^{ref_0}$ - LiDAR to camera reference: transforms a 3D point relative to the LiDAR into a 3D point relative to the reference camera.
$T_{ref_0}^{ref_i}$ - rigid body transformation from camera 0 to camera $i$.
$R_{ref_i}^{rect_i}$ - camera $i$ to rectified camera $i$ reference.
$P_{rect_i}^{cam_i}$ - rectified camera $i$ to 2D camera $i$ $(u, v, z)$ coordinate space.
$T_{velo}^{cam_i}$ - 3D LiDAR space to 2D camera $i$ $(u, v, z)$ coordinate space.
Combining these matrices gives the full projection from LiDAR coordinates to the image plane of camera $i$:
$$T_{velo}^{cam_i} = P_{rect_i}^{cam_i} \, R_{ref_i}^{rect_i} \, T_{ref_0}^{ref_i} \, T_{velo}^{ref_0}$$
Here $(u, v, z)$ are the final camera coordinates after the rectification and projection transforms. In order to transform from homogeneous image coordinates $\tilde{y} = (\tilde{u}, \tilde{v}, z)$ to true $(u, v, z)$ image coordinates $y$, we need to normalize by the depth:
$$y = \left(\frac{\tilde{u}}{z},\ \frac{\tilde{v}}{z},\ z\right)$$
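For example, if a point projects to homogeneous coordinates $(\tilde{u}, \tilde{v}, z) = (7300,\ 1850,\ 10)$, then after dividing by the depth it lands at pixel $(u, v) = (730, 185)$ at a distance of 10 meters; this is exactly what the division by projections[2, :] does in the code below.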
```python
# define target camera number
camera_number = 2

# read calib_cam_to_cam.txt
calib_cam_to_cam_file = open(cam_to_cam_file_path)
calib_cam_to_cam_dict = {}
for line in calib_cam_to_cam_file.readlines():
    key, value = line.split(": ")
    calib_cam_to_cam_dict[key] = value.strip()
calib_cam_to_cam_file.close()

# read calib_velo_to_cam.txt
calib_velo_to_cam_file = open(velo_to_cam_file_path)
calib_velo_to_cam_dict = {}
for line in calib_velo_to_cam_file.readlines():
    key, value = line.split(": ")
    calib_velo_to_cam_dict[key] = value.strip()
calib_velo_to_cam_file.close()

# get projection matrix
P_rect = calib_cam_to_cam_dict[f"P_rect_0{camera_number}"]
P_rect = np.array([float(x) for x in P_rect.split(" ")]).reshape((3, 4))

# get rectified rotation matrix
R_rect = calib_cam_to_cam_dict[f"R_rect_0{camera_number}"]
R_rect = np.array([float(x) for x in R_rect.split(" ")]).reshape((3, 3))

# add (0, 0, 0) translation and convert to homogeneous coordinates
R_rect = np.insert(R_rect, 3, values=[0, 0, 0], axis=0)
R_rect = np.insert(R_rect, 3, values=[0, 0, 0, 1], axis=1)

# get rotation matrix from reference camera to target camera
R = calib_cam_to_cam_dict[f"R_0{camera_number}"]
R = np.array([float(x) for x in R.split(" ")]).reshape((3, 3))

# get translation vector from reference camera to target camera
t = calib_cam_to_cam_dict[f"T_0{camera_number}"]
t = np.array([float(x) for x in t.split(" ")]).reshape((3, 1))

# get reference camera to target camera rigid body transformation in homogeneous coordinates
T_ref_to_target = np.insert(np.hstack((R, t)), 3, values=[0, 0, 0, 1], axis=0)

# get lidar to camera reference transformation
R_velo = np.array([float(x) for x in calib_velo_to_cam_dict["R"].split(" ")]).reshape((3, 3))
t_velo = np.array([float(x) for x in calib_velo_to_cam_dict["T"].split(" ")])[:, None]
T_velo_ref0 = np.vstack((np.hstack((R_velo, t_velo)), np.array([0, 0, 0, 1])))
```
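As a quick illustrative check, the parsed matrices should have the shapes expected by the projection chain described above:

```python
# sanity check: shapes expected by the chain P_rect @ R_rect @ T_ref_to_target @ T_velo_ref0
print(P_rect.shape)           # (3, 4) projection matrix
print(R_rect.shape)           # (4, 4) after homogeneous padding
print(T_ref_to_target.shape)  # (4, 4) rigid body transform between cameras
print(T_velo_ref0.shape)      # (4, 4) LiDAR to reference camera transform
```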
Project LiDAR 3D points on 2D photo context image
The next step is to project the LiDAR 3D points onto the photo context image and get the projections of the points that belong to the segmented area of the point cloud (mask projections).
```python
# transform from velo (LiDAR) to target camera
T_velo_to_cam = P_rect @ R_rect @ T_ref_to_target @ T_velo_ref0

# get lidar points
pcd = o3d.io.read_point_cloud(local_pcd_path, format="pcd")
pcd_points = np.asarray(pcd.points)
xyzw = np.insert(pcd_points, 3, 1, axis=1).T

# get 3D points projections on photo context image
projections = T_velo_to_cam @ xyzw
projections[:2] /= projections[2, :]


def draw_projections_on_image(velo_uvz, image, save_path, preprocess=True, indexes=None):
    """Draw LiDAR point projections on the image"""
    if preprocess:
        # remove points located behind the camera
        velo_uvz = np.delete(velo_uvz, np.where(velo_uvz[2, :] < 0)[0], axis=1)
        # remove points that fall outside the image
        u, v, z = velo_uvz
        img_h, img_w, _ = image.shape
        u_out = np.logical_or(u < 0, u > img_w)
        v_out = np.logical_or(v < 0, v > img_h)
        outliers = np.logical_or(u_out, v_out)
        velo_uvz = np.delete(velo_uvz, np.where(outliers), axis=1)

    # create color map
    u, v, z = velo_uvz
    rainbow_r = cm.get_cmap("rainbow_r", lut=100)
    color_map = lambda z: [255 * val for val in rainbow_r(int(z.round()))[:3]]

    # draw LiDAR point projections on the image
    if indexes:
        for i in indexes:
            cv2.circle(image, (int(u[i]), int(v[i])), 1, color_map(z[i]), -1)
    else:
        for i in range(len(u)):
            cv2.circle(image, (int(u[i]), int(v[i])), 1, color_map(z[i]), -1)

    # save result
    image = Image.fromarray(image)
    image.save(save_path)


# get projections of masked points
mask_projections = projections.T[mask_indexes].T

# read photo context image
photo_context_image = sly.image.read(photo_context_image_path)

# draw point projections on image and display result
save_path = os.path.join(save_dir, "projections.png")
draw_projections_on_image(mask_projections.copy(), photo_context_image.copy(), save_path)
plt.axis("off")
plt.imshow(Image.open(save_path));
```
Build 2D segmentation mask from 3D point projections
In order to create a segmentation mask from the 3D point projections, we are going to build a convex hull - the smallest convex set that encloses all the points, forming a convex polygon. We found the alphashape implementation of the convex hull to be the most effective, but it is also possible to use the cv2 and scipy convex hull implementations.
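Below is a minimal sketch of this step, reusing the mask_projections and photo_context_image variables from the previous blocks; the exact code, including the upload of the result to the platform, is in the GitHub repository:

```python
# build a convex hull around the 2D projections of the masked points;
# with alpha=0 alphashape returns the convex hull as a shapely polygon
# (for simplicity, this sketch skips filtering of points behind the camera)
points_2d = list(zip(mask_projections[0], mask_projections[1]))
hull = alphashape.alphashape(points_2d, 0)

# rasterize the hull into a binary mask of the photo context image size
hull_vertices = np.array(hull.exterior.coords, dtype=np.int32)
mask = np.zeros(photo_context_image.shape[:2], dtype=np.uint8)
cv2.fillPoly(mask, [hull_vertices], 1)

# the boolean mask can then be wrapped into a Supervisely bitmap geometry
mask_geometry = sly.Bitmap(data=mask.astype(bool))
```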
In this tutorial, we used 3D mask guidance, sensor calibration matrices and a convex hull algorithm in order to project a 3D segmentation mask onto a 2D photo context image. This approach can be useful when there is a need to label both 3D point clouds and their corresponding photo context images: instead of manually labeling both, you can label only the point clouds and transfer the 3D mask annotations to the images.
Acknowledgement
This tutorial is based on great work by Isaac Berrios.