Gabriel Chan

PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection

Last updated Jul 17, 2022

# Main Ideas

# Network Diagram

# Feature Encoding

# Voxel Set Abstraction Module

For each keypoint $p_i$, we find the neighboring non-empty voxels at the $k$th level within radius $r_k$. The resulting set of voxel-wise feature vectors is:

$$S_{i}^{(l_k)} = \left\{ \left[f_j^{(l_k)};\ \underbrace{v_{j}^{(l_k)} - p_i}_{\text{relative coords}}\right]^T \;\middle|\; \begin{array}{l} \lVert v_{j}^{(l_k)} - p_i \rVert^2 < r_{k}, \\[1ex] \forall v_{j}^{(l_k)} \in \mathcal{V}^{(l_k)}, \\[1ex] \forall f_{j}^{(l_k)} \in \mathcal{F}^{(l_k)} \end{array} \right\}$$

where $f_j^{(l_k)} \in \mathcal{F}^{(l_k)}$ is the feature vector of the $j$th non-empty voxel at level $k$, and $v_j^{(l_k)} \in \mathcal{V}^{(l_k)}$ is that voxel's 3D coordinate.
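The neighbor-gathering step can be illustrated with a minimal pure-Python sketch. The function name `gather_voxel_set` and the toy coordinates and features are made up for illustration; the radius test follows the formula as written here (squared distance compared against $r_k$). A real implementation would use a batched GPU ball query rather than a Python loop.

```python
from typing import List, Sequence

Vec = Sequence[float]


def gather_voxel_set(keypoint: Vec,
                     voxel_coords: List[Vec],
                     voxel_feats: List[Vec],
                     r_k: float) -> List[List[float]]:
    """Build the set S_i^(l_k) for one keypoint at one voxel level.

    Each returned row is [f_j ; v_j - p_i]: the voxel feature followed
    by the voxel's coordinates relative to the keypoint.
    """
    neighbours = []
    for v, f in zip(voxel_coords, voxel_feats):
        rel = [vc - pc for vc, pc in zip(v, keypoint)]  # v_j - p_i
        sq_dist = sum(d * d for d in rel)               # ||v_j - p_i||^2
        if sq_dist < r_k:                               # test as in the formula
            neighbours.append(list(f) + rel)
    return neighbours


# Toy data (hypothetical): three non-empty voxels with 2-channel features.
p_i = (0.0, 0.0, 0.0)
coords = [(0.5, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, -0.5, 0.5)]
feats = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]

S = gather_voxel_set(p_i, coords, feats, r_k=1.0)
for row in S:
    print(row)
```

With these toy values, only the first and third voxels fall inside the radius, so `S` has two rows of length 5 (2 feature channels plus 3 relative coordinates).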

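Then, for each keypoint $p_i$, we concatenate its features from the different levels:

$$f_i^{(pv)}=\left[f_{i}^{(pv_1)},f_{i}^{(pv_2)},f_{i}^{(pv_3)},f_{i}^{(pv_4)}\right], \quad \text{for } i=1,\ldots,n$$

where $f_i^{(pv_k)}$ is the feature aggregated for keypoint $p_i$ from the set $S_i^{(l_k)}$ at level $k$, and $n$ is the number of keypoints.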
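As a small sketch of the multi-level concatenation, assume each keypoint already has one aggregated feature vector per level. The helper name `concat_levels`, the channel counts, and the numbers below are placeholders, not values from the paper.

```python
from typing import List, Sequence


def concat_levels(level_feats: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate per-level features [f^(pv_1), ..., f^(pv_4)] into f^(pv)."""
    out: List[float] = []
    for f in level_feats:
        out.extend(f)
    return out


# Hypothetical per-level features for n = 2 keypoints, 4 levels each.
per_keypoint_levels = [
    [[0.1, 0.2], [0.3, 0.4, 0.5], [0.6], [0.7, 0.8]],   # keypoint 1
    [[1.1, 1.2], [1.3, 1.4, 1.5], [1.6], [1.7, 1.8]],   # keypoint 2
]

# One concatenated vector f_i^(pv) per keypoint, i = 1..n.
f_pv = [concat_levels(levels) for levels in per_keypoint_levels]
print(f_pv[0])
```

Each resulting vector has as many channels as the per-level channel counts summed together (here 2 + 3 + 1 + 2 = 8).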

# Extended VSA Module

# Predicted Keypoint Weighting

# RoI-grid Pooling via Set Abstraction

# Training Loss