We propose a cross-domain image-based 3D shape retrieval method, which learns a joint embedding space for natural images and 3D shapes in an end-to-end manner. The similarities between images and 3D shapes can be computed as the distances in this embedding space. To better encode a 3D shape, we propose a new feature aggregation method, Cross-View Convolution (CVC), which models a 3D shape as a sequence of rendered views. For bridging the gaps between images and 3D shapes, we propose a Cross-Domain Triplet Neural Network (CDTNN) that incorporates an adaptation layer to match the features from different domains better and can be trained end-to-end. In addition, we speed up the triplet training process by presenting a new fast cross-domain triplet neural network architecture. We evaluate our method on a new image to 3D shape dataset for category-level retrieval and ObjectNet3D for instance-level retrieval. Experimental results demonstrate that our method outperforms the state-of-the-art approaches in terms of retrieval performance. We also provide in-depth analysis of various design choices to further reduce the memory storage and computational cost.
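For readers unfamiliar with triplet training, the following is a minimal sketch (not the paper's implementation) of a margin-based cross-domain triplet loss, where an image embedding serves as the anchor and is compared against a positive and a negative 3D shape embedding. The function name, embedding values, and margin are illustrative assumptions.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Margin-based triplet loss: pull the positive shape embedding
    toward the image anchor, push the negative one away."""
    d_pos = np.sum((anchor - positive) ** 2)  # squared distance to positive
    d_neg = np.sum((anchor - negative) ** 2)  # squared distance to negative
    return max(0.0, d_pos - d_neg + margin)

# Illustrative 2-D embeddings (image anchor vs. two 3D-shape embeddings).
anchor = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])   # shape of the same category
negative = np.array([-1.0, 0.0])  # shape of a different category

loss = triplet_loss(anchor, positive, negative)
```

In a cross-domain setting such as the one described, the anchor comes from the image branch while the positive and negative come from the shape branch, so an adaptation layer would map both into a shared embedding space before this loss is applied.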