Gabriel Chan


3DSSD: Point-based 3D Single Stage Object Detector

Last updated Sep 9, 2022

arXiv link github #todo

# Main Ideas:

# Feature-Farthest Point Sampling (F-FPS)

# Fusion Sampling

# Network walkthrough

Using the following network config:

# Network Input:

# Assumptions

# syntax

$B$: batch size

# 3DSSD Backbone

# SA_Layer 1 (D-FPS)

Input:

Operations

  1. D-FPS to sample 512 points

  2. Grouping: create new_feature_list

    1. ball query with r=0.2, nsample=32, MLP=[16,16,32]

      • new_feature: (B,32,512,32)
    2. Maxpool and squeeze last channel [-1]

      • new_feature: (B,32,512), append to new_feature_list
    3. ball query with r=0.4, nsample=32, MLP=[16,16,32]

      • new_feature: (B,32,512,32)
    4. Maxpool and squeeze last channel [-1]

      • new_feature: (B,32,512), append to new_feature_list
    5. ball query with r=0.8, nsample=64, MLP=[16,16,32]

      • new_feature: (B,32,512,64)
    6. Maxpool and squeeze last channel [-1]

      • new_feature: (B,32,512), append to new_feature_list
  3. Aggregation Channel:

    1. torch.cat all features along dim=1
      • new_feature: (B,32+32+64,512)
    2. Conv1d with in_channel=128, out_channel=64, kernel_size = 1, batchnorm1d and ReLU
      • new_feature: (B,64,512)
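The ball query → shared MLP → maxpool pattern used in every grouping step can be sketched in plain PyTorch. This is a minimal single-scale sketch, not the CUDA ops OpenPCDet actually uses; `new_xyz` simply reuses the first 512 points as a stand-in for the D-FPS samples:

```python
import torch

B, N, S, nsample, r = 2, 1024, 512, 32, 0.2
xyz = torch.rand(B, N, 3)            # raw point cloud
new_xyz = xyz[:, :S, :]              # stand-in for the 512 D-FPS samples

# ball query: indices of up to `nsample` neighbors within radius r
dist = torch.cdist(new_xyz, xyz)                  # (B, S, N)
idx = dist.argsort(dim=-1)[..., :nsample]         # (B, S, nsample)
in_ball = dist.gather(-1, idx) <= r
idx = torch.where(in_ball, idx, idx[..., :1])     # pad misses with the nearest point

# group xyz and center each group on its query point -> (B, 3, S, nsample)
grouped = torch.gather(
    xyz.unsqueeze(1).expand(B, S, N, 3), 2,
    idx.unsqueeze(-1).expand(B, S, nsample, 3))
grouped = (grouped - new_xyz.unsqueeze(2)).permute(0, 3, 1, 2)

# shared MLP [16,16,32] as 1x1 Conv2d, then maxpool over the neighbor axis
mlp = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 1), torch.nn.ReLU(),
    torch.nn.Conv2d(16, 16, 1), torch.nn.ReLU(),
    torch.nn.Conv2d(16, 32, 1), torch.nn.ReLU())
new_feature = mlp(grouped).max(dim=-1)[0]         # (B, 32, 512)
```

The maxpool over the last axis is exactly the "Maxpool and squeeze last channel [-1]" step above.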

Output:

# SA_Layer 2 (FS)

Input

Operations

  1. Sample 512 points each via D-FPS and F-FPS, then concatenate the two index sets (total pts=1024)

    • Note: unlike sampling via [F-FPS, D-FPS] (see next layer), it seems like FS may select the same point more than once.
  2. Grouping

    1. ball query with r=0.4, nsample=32, then MLP=[64,64,128]

      • new_feature: (B,128,1024,32) <- the MLP in_channels include +3 for the xyz of each grouped point
    2. Maxpool and squeeze last channel [-1]

      • new_feature: (B,128,1024), append to new_feature_list
    3. ball query with r=0.8, nsample=32, then MLP=[64,64,128]

      • new_feature: (B,128,1024,32)
    4. Maxpool and squeeze last channel [-1]

      • new_feature: (B,128,1024), append to new_feature_list
    5. ball query with r=1.6, nsample=64, then MLP=[64,96,128]

      • new_feature: (B,128,1024,64)
    6. Maxpool and squeeze last channel [-1]

      • new_feature: (B,128,1024), append to new_feature_list
  3. Aggregation Channel:

    1. torch.cat all features along dim=1
      • new_feature: (B,128+128+128,1024)
    2. Conv1d with in_channel=384, out_channel=128, kernel_size = 1, batchnorm1d and ReLU
      • new_feature: (B,128,1024)
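F-FPS runs the same greedy farthest-point loop as D-FPS, just with distances measured in feature space rather than xyz (the paper actually fuses feature and xyz distances with a λ weight). A minimal generic sketch of the loop and of fusion sampling, not the CUDA kernel used in the repo:

```python
import torch

def fps(points, npoint):
    # Generic farthest point sampling over (N, D) vectors:
    #   D = 3 (xyz)       -> D-FPS
    #   D = feature dims  -> F-FPS (3DSSD fuses feature + xyz distances)
    N = points.shape[0]
    idx = torch.zeros(npoint, dtype=torch.long)
    min_dist = torch.full((N,), float('inf'))
    farthest = 0
    for i in range(npoint):
        idx[i] = farthest
        d = ((points - points[farthest]) ** 2).sum(dim=-1)
        min_dist = torch.minimum(min_dist, d)   # distance to nearest selected point
        farthest = int(min_dist.argmax())       # pick the point farthest from the set
    return idx

# Fusion Sampling (FS): run both samplers independently and concatenate.
# Because the two index sets are independent, the same point can appear twice.
xyz = torch.rand(2048, 3)
feats = torch.rand(2048, 64)
fs_idx = torch.cat([fps(xyz, 512), fps(feats, 512)])   # 1024 indices
```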

Output

# SA_Layer 3 (F-FPS, D-FPS)

Input

Operations

  1. Sample 256 points each via D-FPS and F-FPS, then concatenate them (total pts=512)

  2. Grouping

    1. ball query with r=1.6, nsample=32, then MLP=[128,128,256]

      • new_feature: (B,256,512,32) <- the MLP in_channels include +3 for the xyz of each grouped point
    2. Maxpool and squeeze last channel [-1]

      • new_feature: (B,256,512), append to new_feature_list
    3. ball query with r=3.2, nsample=32, then MLP=[128,196,256]

      • new_feature: (B,256,512,32)
    4. Maxpool and squeeze last channel [-1]

      • new_feature: (B,256,512), append to new_feature_list
    5. ball query with r=4.8, nsample=32, then MLP=[128,256,256]

      • new_feature: (B,256,512,32)
    6. Maxpool and squeeze last channel [-1]

      • new_feature: (B,256,512), append to new_feature_list
  3. Aggregation Channel:

    1. torch.cat all features along dim=1
      • new_feature: (B,256+256+256,512)
    2. Conv1d with in_channel=768, out_channel=256, kernel_size = 1, batchnorm1d and ReLU
      • new_feature: (B,256,512)

Output

# SA_Layer 4 (F-FPS, D-FPS)

THIS IS THE FIRST PART OF CANDIDATE GENERATION

Input:

Operations

Output:

# Vote_Layer (n/a)

THIS IS THE SECOND PART OF CANDIDATE GENERATION

Input:

Output:

# SA_Layer5 (D-FPS)

Input

Operations

  1. Sample 128 points via D-FPS (total pts=128)

  2. Grouping

    1. ball query with r=4.8, nsample=16, then MLP=[256,256,512]
      • new_feature: (B,512,128,16) <- the MLP in_channels include +3 for the xyz of each grouped point
    2. Maxpool and squeeze last channel [-1]
      • new_feature: (B,512,128), append to new_feature_list
    3. ball query with r=4.8, nsample=32, then MLP=[256,512,1024]
      • new_feature: (B,1024,128,32)
    4. Maxpool and squeeze last channel [-1]
      • new_feature: (B,1024,128), append to new_feature_list
  3. Aggregation Channel:

    1. torch.cat all features along dim=1
      • new_feature: (B,512+1024,128)
    2. Conv1d with in_channel=1536, out_channel=512, kernel_size = 1, batchnorm1d and ReLU
      • new_feature: (B,512,128)
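The aggregation channel above is just a concat along the channel dim followed by a 1×1 Conv1d with BN and ReLU. A minimal sketch with random stand-in features for this layer's two scales:

```python
import torch

# pooled per-scale features for SA_Layer 5 (values are random stand-ins)
f1 = torch.rand(2, 512, 128)    # scale 1: (B, 512, npoint)
f2 = torch.rand(2, 1024, 128)   # scale 2: (B, 1024, npoint)

cat = torch.cat([f1, f2], dim=1)               # (B, 512+1024, 128)
agg = torch.nn.Sequential(
    torch.nn.Conv1d(1536, 512, kernel_size=1),
    torch.nn.BatchNorm1d(512),
    torch.nn.ReLU())
out = agg(cat)                                 # (B, 512, 128)
```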

Output

# Detection head

# Box prediction

  1. FC: in_channel=512, out_channel=256
  2. BN then ReLU
  3. FC: in_channel=256, out_channel=256
  4. BN then ReLU
  5. FC: in_channel=256, out_channel=30

Output:

note

# Box classification

  1. FC: in_channel=512, out_channel=256
  2. BN then ReLU
  3. FC: in_channel=256, out_channel=256
  4. BN then ReLU
  5. FC: in_channel=256, out_channel=3 <- number of classes

Output
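Both heads above share the same FC → BN → ReLU stack and differ only in the final layer's width. A minimal sketch, with `torch.nn.Linear` standing in for the FC layers (the actual repo may implement them as shared Conv1d, and the names here are hypothetical):

```python
import torch

def make_head(out_channels):
    # FC -> BN -> ReLU -> FC -> BN -> ReLU -> FC, as listed above
    return torch.nn.Sequential(
        torch.nn.Linear(512, 256), torch.nn.BatchNorm1d(256), torch.nn.ReLU(),
        torch.nn.Linear(256, 256), torch.nn.BatchNorm1d(256), torch.nn.ReLU(),
        torch.nn.Linear(256, out_channels))

box_head = make_head(30)   # box regression output
cls_head = make_head(3)    # one logit per class

x = torch.rand(2 * 128, 512)                   # (B*128, 512) candidate features
box_pred, cls_pred = box_head(x), cls_head(x)  # (B*128, 30), (B*128, 3)
```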

# Target assignment

  1. Take GT boxes and enlarge them by GT_EXTRA_WIDTH: [0.2, 0.2, 0.2]
  2. call assign_stack_targets() with params:
    • points = centers: predicted centers with shape (B*128,4)
    • gt_boxes = gt_boxes: GT boxes with shape (B,G,8) <- G = number of boxes per scene, 8: x,y,z,dx,dy,dz,angle,class
    • extend_gt_boxes = extend_gt_boxes: enlarged GT boxes with shape (B,G,8)
    • set_ignore_flag = True: not sure what this is used for atm…
    • use_ball_constraint = False: not sure
    • ret_part_labels = False: not sure
    • ret_box_labels = True: but not sure why
    1. For each scene in the batch, run roiaware_pool3d_utils.points_in_boxes_gpu() to find the predicted centroids that lie inside a gt box.
      • return shape is (128): each value is either the idx of the gt box the point lies in, or -1 if it's background
      • create flag: box_fg_flag = (box_idxs_of_pts >= 0)
    2. if set_ignore_flag = True, then do the same for the extended gt boxes.

      • fg_flag = box_fg_flag
      • ignore_flag = fg_flag ^ (extend_box_idxs_of_pts >= 0)
        • note that ^ is XOR; in other words, we only ignore points that lie in the extended gt box but NOT in the original gt box (the extended box contains the original, so the reverse case cannot happen)
      • then point_cls_labels_single[ignore_flag] = -1
      • gt_box_of_fg_points = gt_boxes[k][box_idxs_of_pts[fg_flag]]
        • box_idxs_of_pts[fg_flag] is a 1D tensor with the idx of the gt box each fg point lies in, e.g. [0,4,2,4,0,1,…]
        • so gt_box_of_fg_points is a 2D tensor of shape (M,8), where M is the number of pts inside a gt box, holding each point's associated gt box.
      • point_cls_labels_single[fg_flag] = 1 if self.num_class == 1 else gt_box_of_fg_points[:, -1].long()
        • [:, -1] is the class of the gt box, so this gets the label and puts it into a 1D tensor.
    3. if ret_box_labels and gt_box_of_fg_points.shape[0] > 0, i.e. if there are points that lie inside gt boxes,

      • call fg_point_box_labels = self.box_coder.encode_torch() with params:
        • gt_boxes=gt_box_of_fg_points[:, :-1]: just the boxes, with x,y,z,dx,dy,dz,angle
        • points=points_single[fg_flag]: all predicted centers that lie inside a gt box
        • gt_classes=gt_box_of_fg_points[:, -1].long(): class of the gt boxes
          • but not used if use_mean_size is False, which is set via 'use_mean_size': False under BOX_CODER_CONFIG
      • the function basically assigns each point a regression target made of residuals w.r.t. the gt box it lies in:
        • the offset between the point (x,y,z) and the box center, +
        • log() of dx,dy,dz of the gt box, +
        • angle bin + residual
        • total length = 8
    4. point_box_labels_single[fg_flag] = fg_point_box_labels
      • assign the points their new labels
    5. point_box_labels[bs_mask] = point_box_labels_single
      • assign it to the “outer” list (where all labels for each sample in the batch will be)
    6. after doing this for all samples in batch: concat all gt_box_of_fg_points into a tensor gt_boxes_of_fg_points
    7. Return the following dict.
```python
targets_dict = {
    'point_cls_labels': point_cls_labels,          # (B*128)
    'point_box_labels': point_box_labels,          # (B*128,8)
    'point_part_labels': point_part_labels,        # None
    'box_idxs_of_pts': box_idxs_of_pts,            # (128)
    'gt_box_of_fg_points': gt_boxes_of_fg_points,  # (M,8); M is the number of pts
                                                   # inside a gt box, so this varies
                                                   # for every batch
}
```
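The XOR ignore-flag logic from step 2 can be checked with toy values (the flags below are hypothetical, for 8 candidate centers in one scene):

```python
import torch

# idx of the gt box each point lies in, or -1 for background (hypothetical)
box_idxs_of_pts        = torch.tensor([0, -1, 2, -1, -1, 1, -1, -1])
extend_box_idxs_of_pts = torch.tensor([0, -1, 2,  1, -1, 1,  0, -1])

fg_flag = box_idxs_of_pts >= 0                        # inside a real gt box
ignore_flag = fg_flag ^ (extend_box_idxs_of_pts >= 0) # only inside the enlarged box

# points 3 and 6 lie only in the *enlarged* box -> excluded from the cls loss
point_cls_labels = torch.zeros(8, dtype=torch.long)
point_cls_labels[ignore_flag] = -1
```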