Wen-Hua Lin, Kuan-Ting Chen, Winston Hsu
WWW 2018 Cognitive Computing Track
Publication year: 2018

Recently, deep neural network models have achieved promising results on the image captioning task. However, current work has several deficiencies. Its utility is low because it simply generates “vanilla” sentences that describe only shallow appearances (e.g., types, colors) in the photo, lacking engagement, context, and user intention. To tackle this problem, we propose a novel approach, Netizen Style Commenting (NSC), to automatically generate characteristic comments on a user-contributed fashion photo. We are devoted to modulating the comments in a vivid “netizen” style that reflects the culture of the designated social community, and we hope to facilitate more user engagement. In this work, we design a novel framework that consists of three major components: (1) We construct a brand-new large-scale clothing dataset, NetiLook, which contains 300K posts (photos), in order to discover netizen-style comments. (2) We propose three novel measures to estimate the diversity of comments. (3) To automatically generate diverse comments, we investigate and leverage the merits of the latent topic model, which is able to keep long-range dependencies. Furthermore, we bring freshness and diversity by marrying a topic discovery model (i.e., latent Dirichlet allocation) with neural networks to make up for the insufficiency of conventional image captioning work. Experimenting on Flickr30k and our NetiLook dataset, we demonstrate that our proposed approach significantly benefits fashion photo commenting and improves the image captioning task in both accuracy and diversity.