Variational Shape Learner: A Hierarchical Latent-Variable Model of 3D Shapes

By Shikun Liu, C. Lee Giles and Alexander G. Ororbia II. [3DV 2018, Oral]
[paper] [supplementary] [arxiv] [code]

Abstract

We propose the Variational Shape Learner (VSL), a hierarchical latent-variable model for 3D shape learning. VSL employs an unsupervised approach to learning and inferring the underlying structure of voxelized 3D shapes. Through the use of skip-connections, our model can successfully learn a latent, hierarchical representation of objects. Furthermore, realistic 3D objects can be easily generated by sampling the VSL's latent probabilistic manifold. We show that our generative model can be trained end-to-end from 2D images to perform single image 3D model retrieval. Experiments show, both quantitatively and qualitatively, the improved performance of our proposed model over a range of tasks.

Model Architecture (VSL)

What's wrong with the current methods?

Many previous methods require multiple images and/or additional human-provided information. Of the few unsupervised neural-based approaches that exist, most either need a multi-stage training procedure, since jointly training the system components proves too difficult, or employ an adversarial learning scheme. However, GANs are notoriously difficult to train, often due to ill-designed loss functions and a higher chance of zero gradients.

Why hierarchical?

It is well known that generative models learned through variational inference are excellent at reconstructing complex data but tend to produce blurry samples. This happens because there is uncertainty in the model's predictions when we reconstruct the data from the latent space. Previous approaches to 3D object modelling have focused on learning a single latent representation of the data. However, this simple latent structure might hinder the underlying model's ability to extract richer structure from the input distribution and thus lead to blurrier reconstructions.
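
For reference, a single-latent model of this kind is typically trained by maximising the standard variational evidence lower bound (ELBO); the notation below is generic and not taken from our paper:

$$\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] \;-\; D_{\mathrm{KL}}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)$$

The reconstruction term is averaged over the approximate posterior q_phi(z|x), which is where the uncertainty, and hence the blur, enters when all of the data's structure must pass through a single latent code.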

Model Design

As can be seen above, we start from a global latent variable layer (depicted horizontally) that is hardwired to a set of local latent-variable layers (depicted vertically), each tasked with representing one level of feature abstraction. The skip-connections tie the latent codes together in a top-down, directed fashion: local codes closer to the input tend to represent lower-level features, while local codes farther from the input tend to represent higher-level features. The global latent vector can be thought of as a large pool of command units that ensures each local code extracts information appropriate to its position in the hierarchy, forming an overall coherent structure. This explicit global-local form, and the way it constrains how information flows through the model, lends itself to a straightforward parametrisation of the generative model and furthermore ensures robustness, dramatically cutting down on over-fitting.
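
A minimal, illustrative sketch of this global-local latent structure is given below (PyTorch; the class name, dimensions, and the simplification that all codes are inferred from one shared feature vector are ours for illustration, not the released TensorFlow implementation):

    # Illustrative sketch only: a global latent code plus a chain of local
    # latent codes, each conditioned on the global code (the skip-connection)
    # and on the local code one level below it.
    import torch
    import torch.nn as nn

    class GlobalLocalLatents(nn.Module):
        def __init__(self, feat_dim=512, global_dim=20, local_dim=10, n_local=3):
            super().__init__()
            # Global latent code inferred from the encoder features.
            self.global_head = nn.Linear(feat_dim, 2 * global_dim)
            # Each local head sees the global code, plus the previous local code
            # for every level after the first.
            self.local_heads = nn.ModuleList([
                nn.Linear(global_dim + (local_dim if i > 0 else 0), 2 * local_dim)
                for i in range(n_local)
            ])

        @staticmethod
        def reparameterise(stats):
            # Split predicted statistics into mean and log-variance, then sample.
            mu, log_var = stats.chunk(2, dim=-1)
            return mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)

        def forward(self, features):
            z_global = self.reparameterise(self.global_head(features))
            local_codes, prev = [], None
            for head in self.local_heads:
                inp = z_global if prev is None else torch.cat([z_global, prev], dim=-1)
                prev = self.reparameterise(head(inp))
                local_codes.append(prev)
            # The voxel decoder would consume the concatenation of all codes.
            return torch.cat([z_global] + local_codes, dim=-1)

    # Example: latent code for a batch of 8 encoder feature vectors.
    z = GlobalLocalLatents()(torch.randn(8, 512))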

Experiments and Visualisations

(*Note: None of the generated results shown are cherry-picked. The model was trained on the ModelNet40 dataset, except for single image shape reconstruction, which uses the PASCAL3D+ dataset. In our visualisations, brighter colour corresponds to higher occupancy probability. Please consult our original implementation on GitHub and our paper for more details.)
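
As a rough sketch of this visualisation convention, the following renders a grid of predicted occupancy probabilities with brighter colours for higher values (matplotlib; the grid size, the 0.5 threshold, and the colour map are illustrative choices, not taken from our code):

    import numpy as np
    import matplotlib.pyplot as plt

    probs = np.random.rand(32, 32, 32)      # stand-in for model output in [0, 1]
    occupied = probs > 0.5                   # choose which voxels to draw

    colors = plt.cm.viridis(probs)           # brighter colour = higher probability
    ax = plt.figure().add_subplot(projection='3d')
    ax.voxels(occupied, facecolors=colors)
    plt.show()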

Single Shape Generation

Single Shape Reconstruction

Shape Morphing / Intra Class

Shape Morphing / Inter Class

Single Image Shape Reconstruction

Shape Arithmetics

Citation

If you find this work useful in your own research, please consider citing the following.

@inproceedings{liu2018learning,
  title={Learning a hierarchical latent-variable model of 3d shapes},
  author={Liu, Shikun and Giles, Lee and Ororbia, Alexander},
  booktitle={2018 International Conference on 3D Vision (3DV)},
  pages={542--551},
  year={2018},
  organization={IEEE}
}