Recovering the underlying structure from images
Pose Estimation
RGB(D) Image -> 3D structure -> 3D understanding
1. 3D Representations
Geometric representations in deep learning can broadly be divided into three classes:
- voxel-based: limited in resolution, offer no topological guarantees, cannot represent sharp features
- point-based: memory issues, do not capture manifold connectivity
- mesh-based: irregular connectivity; most deep methods deform a fixed template, which restricts topology
State-of-the-art deep learning systems output 3D geometry as point clouds, triangle meshes, voxel grids, or implicit surfaces.
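As a rough illustration of how these representations are typically stored in memory, a minimal NumPy sketch (array sizes, names, and the example SDF are illustrative, not tied to any particular system):

```python
import numpy as np

# Voxel grid: dense occupancy on a regular lattice (resolution-limited).
voxels = np.zeros((32, 32, 32), dtype=bool)          # True = occupied cell

# Point cloud: unordered set of N surface samples (no connectivity).
points = np.random.rand(1024, 3).astype(np.float32)  # (N, 3) xyz coordinates

# Triangle mesh: vertices plus faces indexing into them (explicit topology).
vertices = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], np.float32)
faces = np.array([[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]])  # (F, 3) vertex indices

# Implicit surface: a function f(x) whose zero level set is the surface;
# here a unit sphere as a signed distance function.
def sdf_sphere(x, radius=1.0):
    return np.linalg.norm(x, axis=-1) - radius
```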
1.1. Volumetric
3D Voxel Grids
- 3D ShapeNets, Deep Belief Network, 2015
- 3D GAN, 2016
- Sparse 3D Convolutional Networks, 2016
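A minimal sketch of a volumetric encoder over an occupancy grid, assuming PyTorch. A dense `nn.Conv3d` is used for clarity; sparse 3D convolution libraries instead compute only at occupied voxels. The class name and layer sizes are illustrative.

```python
import torch
import torch.nn as nn

# Tiny volumetric encoder over a 32^3 occupancy grid.
class VoxelEncoder(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, stride=2, padding=1),   # 32 -> 16
            nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=3, stride=2, padding=1),  # 16 -> 8
            nn.ReLU(),
        )
        self.classifier = nn.Linear(32 * 8 * 8 * 8, num_classes)

    def forward(self, vox):                  # vox: (B, 1, 32, 32, 32)
        h = self.features(vox)
        return self.classifier(h.flatten(1))

logits = VoxelEncoder()(torch.zeros(2, 1, 32, 32, 32))  # (2, 10)
```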
1.2. Light Field Descriptor (LFD)
1.3. Multi-view
1.4. Triangular mesh
1.5. Point cloud
- PointNet, PointNet++
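The core PointNet idea, sketched in PyTorch: a shared per-point MLP followed by a permutation-invariant max pool over points. The input/feature transform networks and PointNet++'s hierarchical grouping are omitted; sizes and names are illustrative.

```python
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.point_mlp = nn.Sequential(        # applied identically to every point
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU(),
        )
        self.head = nn.Linear(256, num_classes)

    def forward(self, pts):                    # pts: (B, N, 3), unordered
        feat = self.point_mlp(pts)             # (B, N, 256) per-point features
        global_feat = feat.max(dim=1).values   # symmetric max pool over points
        return self.head(global_feat)

logits = TinyPointNet()(torch.rand(4, 1024, 3))  # (4, 10)
```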
1.6. Implicit representation
- represents surfaces of arbitrary topology
- the surface is the decision boundary of a learned function (see the sketch after this list)
- Occupancy Network, CVPR 2019
- octree
- DeepSDF, signed distance field, CVPR 2019
- shape completion: noisy point cloud -> mesh
- why can it model sharp edges? ReLU networks are piecewise linear, hence piecewise smooth, so the level set can contain creases
- Pixel2Mesh, graph convolutional network, ECCV 2018
- graph unpooling layer: mesh subdivision plays the role of deconvolution (upsampling)
- motivated by geometric mesh flow
- perceptual feature pooling
- regularization: Laplacian smoothness (see the sketch after this list)
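A minimal sketch of a coordinate MLP in the DeepSDF / Occupancy Network spirit, assuming PyTorch: a latent shape code and a 3D query point map to a signed distance (or occupancy logit), and the surface is the decision boundary of this function. Class and variable names are illustrative, not the papers' implementations.

```python
import torch
import torch.nn as nn

class ImplicitDecoder(nn.Module):
    def __init__(self, latent_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),        # signed distance (or occupancy logit)
        )

    def forward(self, x, z):             # x: (N, 3) query points, z: (latent_dim,)
        z = z.expand(x.shape[0], -1)
        return self.net(torch.cat([x, z], dim=-1)).squeeze(-1)

# Evaluate on a dense grid; the zero level set is the reconstructed surface
# (a mesh would then be extracted, e.g. with marching cubes / octree refinement).
decoder = ImplicitDecoder()
z = torch.randn(128)
grid = torch.stack(torch.meshgrid(
    *[torch.linspace(-1, 1, 32)] * 3, indexing="ij"), dim=-1).reshape(-1, 3)
sdf = decoder(grid, z).reshape(32, 32, 32)
```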
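And a sketch of a uniform Laplacian smoothness term of the kind used to regularize mesh deformation; `laplacian_smoothness` and `edges` are illustrative names, not Pixel2Mesh's actual code.

```python
import torch

# Penalize each vertex's offset from the mean of its neighbors so the
# deformed mesh stays smooth. `edges` is an (E, 2) tensor of vertex-index pairs.
def laplacian_smoothness(vertices, edges):
    V = vertices.shape[0]
    i, j = edges[:, 0], edges[:, 1]
    neighbor_sum = torch.zeros_like(vertices)
    degree = torch.zeros(V, device=vertices.device)
    for a, b in ((i, j), (j, i)):        # accumulate both edge directions
        neighbor_sum.index_add_(0, a, vertices[b])
        degree.index_add_(0, a, torch.ones_like(a, dtype=vertices.dtype))
    lap = vertices - neighbor_sum / degree.clamp(min=1).unsqueeze(-1)
    return (lap ** 2).sum(dim=-1).mean()

# Tetrahedron example: 4 vertices, 6 edges.
verts = torch.tensor([[0., 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
edges = torch.tensor([[0, 1], [0, 2], [0, 3], [1, 2], [1, 3], [2, 3]])
loss = laplacian_smoothness(verts, edges)
```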