Chao-Te Chou, Cheng-Han Lee, Kaipeng Zhang, Hu-Cheng Lee, Winston H. Hsu
Asian Conference on Computer Vision (ACCV)
Publication year: 2018

Virtual try-on – synthesizing a realistic image of a person wearing a target fashion item, given a source photo of that person – is in growing demand due to the prevalence of e-commerce and the development of deep learning technologies. However, existing deep-learning virtual try-on methods focus on clothing replacement, owing to the lack of datasets, and handle only flat body segments in frontal poses with the front view of the target fashion item provided. In this paper, we present the pose-invariant virtual try-on shoe (PIVTONS) framework to address virtual shoe try-on. We collect the first paired feet-and-shoes virtual try-on dataset, Zalando-shoes, containing 14,062 shoes across 11 shoe categories. The shoe image contains only a single view of the shoe, but the try-on result should show other views of the shoe depending on the original feet pose. We formulate this as an automatic, labor-free image completion task and design an end-to-end neural network that incorporates a feature point detector. By combining three losses for image generation, we synthesize realistic results. Through extensive experiments and ablation studies, we demonstrate the performance of the proposed framework and investigate the factors that govern this challenging optimization problem.
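
The abstract states that three losses are combined for image generation but does not name them here. The PyTorch sketch below illustrates one common way such a weighted combination is wired up for image synthesis, assuming an L1 reconstruction term, a VGG-based perceptual term, and an adversarial term; the specific loss choices, layer cut-off, and weights are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch of combining three weighted image-generation losses.
# The loss choices (L1 + VGG perceptual + adversarial) and the weights
# are assumptions for illustration, not taken from the PIVTONS paper.
import torch
import torch.nn as nn
import torchvision.models as models


class CombinedGenerationLoss(nn.Module):
    def __init__(self, w_l1=1.0, w_perc=1.0, w_adv=0.01):
        super().__init__()
        self.w_l1, self.w_perc, self.w_adv = w_l1, w_perc, w_adv
        # Frozen VGG16 features up to relu2_2 for the perceptual term.
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:9].eval()
        for p in vgg.parameters():
            p.requires_grad = False
        self.vgg = vgg
        self.l1 = nn.L1Loss()
        self.bce = nn.BCEWithLogitsLoss()

    def forward(self, fake, real, disc_logits_on_fake):
        # Pixel-level reconstruction loss between generated and target image.
        loss_l1 = self.l1(fake, real)
        # Feature-level (perceptual) loss on frozen VGG activations.
        loss_perc = self.l1(self.vgg(fake), self.vgg(real))
        # Adversarial loss: the generator wants fakes classified as real.
        loss_adv = self.bce(disc_logits_on_fake,
                            torch.ones_like(disc_logits_on_fake))
        return (self.w_l1 * loss_l1
                + self.w_perc * loss_perc
                + self.w_adv * loss_adv)
```

In practice the adversarial weight is kept small relative to the reconstruction terms so that the pixel- and feature-level losses anchor the output to the target while the adversarial term sharpens texture; the exact balance would need to be tuned per dataset.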