[Introduction]

Course Goal: Cognitive computing refers to systems that learn at scale, reason with purpose, and interact with humans naturally, drawing on numerous emerging sensors and signals. Cognitive computing systems are trained to sense, predict, infer, and, in some ways, reason, using machine learning algorithms that operate over large-scale, noisy, and unstructured data streams.

The topic is essential for current and future industrial needs and for academic research opportunities.

We aim to introduce the state of the art and the essential machine learning algorithms for the core problems in cognitive computing. We investigate methods for machine perception and the subsequent action planning, and for handling noisy, unstructured, high-dimensional data in a rigorous and efficient manner.

We emphasize hands-on experience through programming and experimental assignments, the midterm exam, and the final project. The lecture content will be organized around the state of the art, and the reading materials will be drawn mostly from the literature of top conferences.

Lecturer: Winston Hsu (office: R512, CSIE Building)

TA: Jia-Fong Yeh <jiafongyeh@ieee.org> (office hour: 10am – 12pm, Tuesday)

Time:  2:20pm – 5:10pm, Tuesday

Location: R107, CSIE Building

Info: course number CSIE 5420; course identification code 922 U4460

Assessment (tentative):

  • Assignments: 20-30%
  • Midterm Exam: 30-40%
    • November 5, 2019
  • Final Project: 30-40%

Resources:

  • The lecture slides, homework descriptions, and datasets will be posted in NTU CEIBA. Only the students registered for the lecture can access them.
  • Discussions and questions will also be hosted in NTU CEIBA.
  • The icon next to an item indicates that the related resources are ready for download.

Requirements: Background in image processing (or signal processing related courses), probability, and linear algebra. Experience with machine learning or statistical pattern recognition will be useful but not required.

Textbook: None. We will cover active research areas not yet included in any mature textbook. Nevertheless, we will provide a rich set of papers and reference books.

[Course Outline]

  • [09/15] Introduction to cognitive computing and machine perception (PDF)
    • HW#1 – Python warmup for image processing; due at 12pm (noon), September 21, 2020 (Monday); digital submission in CEIBA (or email). See details in PDF; a minimal warm-up sketch also follows this session's readings.
    • Readings:
      • How to read a paper. S Keshav, SIGCOMM Comput. Commun. Rev. 37, 3 (Jul. 2007), 83-84. [m, must read]
      • Deep learning. Yann LeCun, Yoshua Bengio, Geoffrey Hinton, Nature 521, 436–444 (28 May 2015)
      • Harnessing A.I. for Augmenting Creativity: Application to Movie Trailer Creation. John R Smith, Dhiraj Joshi, Benoit Huet, Winston Hsu, and Zef Cota. ACM Multimedia 2017. (PDF)
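As a hedged illustration of the kind of Python warm-up HW#1 asks for (the authoritative details are in the assignment PDF), here is a minimal sketch assuming OpenCV and NumPy; the file name is a placeholder:

```python
# Minimal Python image-processing warm-up sketch (assumptions: OpenCV + NumPy;
# "sample.jpg" is a placeholder file name, not part of HW#1).
import cv2
import numpy as np

img = cv2.imread("sample.jpg")                   # BGR image as a NumPy array
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)     # convert to grayscale

# 256-bin intensity histogram, normalized to sum to 1
hist = cv2.calcHist([gray], [0], None, [256], [0, 256]).ravel()
hist /= hist.sum()

# A simple global operation: histogram equalization for contrast enhancement
equalized = cv2.equalizeHist(gray)
cv2.imwrite("equalized.jpg", equalized)

entropy = float(-(hist[hist > 0] * np.log2(hist[hist > 0])).sum())
print("image shape:", img.shape, "| intensity entropy (bits):", round(entropy, 3))
```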
  • [09/22] Understanding image, video and sensors
    • image sensor, video, compression, and shot detection (a toy shot-detection sketch follows the readings)
    • Readings:
      • D. Le Gall, “MPEG: A Video Compression Standard for Multimedia Applications,” Communications of the ACM, April 1991, Vol 34, No. 4, pp. 46-58. [m]
      • R. Ramanath et al., “Color Image Processing Pipeline: a general survey of digital still cameras,” IEEE Signal Processing Magazine, Jan 2005. [m]
      • S. Uchihashi and J. Foote, “Summarizing Video Using a Shot Importance Measure and a Frame-Packing Algorithm,” ICASSP 1999. 
      • Michael Brown, “Understanding the In-Camera Image Processing Pipeline for Computer Vision,” CVPR 2016. A tutorial.
      • Matthew D. Zeiler, Rob Fergus. Visualizing and Understanding Convolutional Networks, ECCV 2014. (optional for CNN)
      • Chang et al. Free-form Video Inpainting with 3D Gated Convolution and Temporal Patch GAN. ICCV 2019. (GitHub)
      • I. Koprinska, S. Carrato, “Temporal video segmentation: a survey,” Signal Processing: Image Communication, vol. 16, pp. 477–500, 2001. (Sec. 3.2 & 3.3, skipped)
      • Sony. The Basics of Camera Technology. (PDF)
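The shot-detection readings above all build on frame-to-frame dissimilarity. A toy hard-cut detector based on color-histogram differences might look like the following sketch (assuming OpenCV; the video path and threshold are illustrative, not from the course materials):

```python
# Toy hard-cut (shot boundary) detector via color-histogram differences.
# Assumptions: OpenCV is available; "input.mp4" and the 0.6 threshold are illustrative.
import cv2
import numpy as np

def frame_hist(frame, bins=16):
    """3-D color histogram of a BGR frame, L1-normalized."""
    h = cv2.calcHist([frame], [0, 1, 2], None, [bins] * 3,
                     [0, 256, 0, 256, 0, 256]).ravel()
    return h / (h.sum() + 1e-8)

cap = cv2.VideoCapture("input.mp4")
prev_hist, cuts, idx = None, [], 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    h = frame_hist(frame)
    if prev_hist is not None:
        # L1 distance between consecutive histograms; a large jump suggests a cut
        if np.abs(h - prev_hist).sum() > 0.6:
            cuts.append(idx)
    prev_hist, idx = h, idx + 1
cap.release()
print("candidate shot boundaries at frames:", cuts)
```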
  • [09/29] visual feature: color + texture + shape (I)
    • low-level visual features: color, texture, and shape
    • Readings:
      • Kieran McDonald, Alan F. Smeaton, “A Comparison of Score, Rank and Probability-Based Fusion Methods for Video Shot Retrieval,” CIVR 2005. (multimodal fusion)
      • “Texture features for browsing and retrieval of image data,” B. S. Manjunath and W.Y. Ma, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol.18, no.8, pp.837-42, Aug 1996. [m]
      • “Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories,” S. Lazebnik, C. Schmid, and J. Ponce, CVPR 2006. [m]
      • “Color and Texture Descriptors,” B. S. Manjunath, Jens-Rainer Ohm, Vinod V. Vasudevan, Akio Yamada, IEEE Transactions on Circuits and Systems for Video Technology, Vol 11, No. 6, June 2001.
      • “MPEG-7 visual shape descriptors,” Miroslaw Bober, IEEE Transactions on Circuits and Systems for Video Technology, Vol 11, No. 6, June 2001. [m]
      • “Representing shape with a spatial pyramid kernel,” A. Bosch, et al., CIVR 2007.
  • [10/06] visual feature: color + texture + shape (II)
    • continuing from the prior topic
    • HW #2: visual features for image retrieval; due at 10pm, Monday, October 26 (submission through CEIBA)
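The HW#2 spec in CEIBA is authoritative; purely as a sketch of the retrieval step that the low-level visual features feed into, here is a nearest-neighbor ranking by cosine similarity over precomputed feature vectors (NumPy only; shapes and data are illustrative):

```python
# Sketch of nearest-neighbor image retrieval over precomputed feature vectors
# (e.g., color/texture histograms). NumPy only; shapes and data are illustrative.
import numpy as np

def rank_by_cosine(query_feat, db_feats):
    """Return database indices sorted by cosine similarity to the query."""
    q = query_feat / (np.linalg.norm(query_feat) + 1e-8)
    db = db_feats / (np.linalg.norm(db_feats, axis=1, keepdims=True) + 1e-8)
    sims = db @ q
    return np.argsort(-sims), sims

# Toy usage: 1000 database images with 64-dim features, one query.
rng = np.random.default_rng(0)
db_feats = rng.random((1000, 64))
query = rng.random(64)
order, sims = rank_by_cosine(query, db_feats)
print("top-5 matches:", order[:5], "scores:", np.round(sims[order[:5]], 3))
```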
  • [10/13] visual feature: local features
    • Local features and visual words (a bag-of-visual-words sketch follows the readings)
    • Readings:
      • “Video Google: A Text Retrieval Approach to Object Matching in Videos,” J. Sivic, and A. Zisserman, ICCV, 2003. [m]
      • “Aggregating local descriptors into a compact image representation,” H. Jégou, M. Douze, C. Schmid, and P. Pérez, CVPR 2010. [m]
      • “Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories,” S. Lazebnik, C. Schmid, and J. Ponce, CVPR 2006.
      • “Distinctive Image Features from Scale-Invariant Keypoints,” Lowe, IJCV, 2004.
      • “A Performance Evaluation of Local Descriptors,” Mikolajczyk, PAMI 2005.
      • “A Comparison of Affine Region Detectors,” Mikolajczyk, IJCV, 2004.
      • “Scale & Affine Invariant Interest Point Detectors,” Mikolajczyk, IJCV, 2004.
      • “ContextSeer: Context Search and Recommendation at Query Time for Shared Consumer Photos,” Yi-Hsuan Yang, Po-Tun Wu, Ching-Wei Lee, Kuan-Hung Lin, Winston H. Hsu, ACM Multimedia 2008.
      • “Scalable Face Image Retrieval using Attribute-Enhanced Sparse Codewords,” Bor-Chun Chen, Yan-Ying Chen, Yin-Hsi Kuo, Winston H. Hsu, IEEE Transactions on Multimedia, 2013.
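The bag-of-visual-words idea in the readings above quantizes local descriptors into a codebook and represents each image as a word histogram. A minimal sketch follows, using ORB rather than SIFT only because it ships with base OpenCV, and scikit-learn's KMeans for the codebook; the image paths are placeholders:

```python
# Minimal bag-of-visual-words sketch: local descriptors -> k-means codebook ->
# per-image word histogram. Assumes OpenCV and scikit-learn; paths are placeholders.
import cv2
import numpy as np
from sklearn.cluster import KMeans

paths = ["img1.jpg", "img2.jpg", "img3.jpg"]      # placeholder image list
orb = cv2.ORB_create(nfeatures=500)

all_desc, per_image_desc = [], []
for p in paths:
    gray = cv2.imread(p, cv2.IMREAD_GRAYSCALE)
    _, desc = orb.detectAndCompute(gray, None)
    if desc is None:                              # skip images with no keypoints
        continue
    per_image_desc.append(desc)
    all_desc.append(desc)

codebook = KMeans(n_clusters=64, n_init=10, random_state=0)
codebook.fit(np.vstack(all_desc).astype(np.float32))

def bovw_histogram(desc, k=64):
    """Histogram of visual-word assignments, L1-normalized."""
    words = codebook.predict(desc.astype(np.float32))
    hist = np.bincount(words, minlength=k).astype(np.float32)
    return hist / (hist.sum() + 1e-8)

hists = [bovw_histogram(d) for d in per_image_desc]
print("BoVW histogram of first image:", np.round(hists[0], 3))
```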
  • [10/20] visual feature: face
    • early face detection, facial attributes, and face recognition; the newer deep learning paradigm will be introduced in the CNN sessions (a Viola-Jones detection sketch follows the readings)
    • Readings:
      • Wang et al. Deep Face Recognition: A Survey. arXiv 2019
      • “Robust real-time face detection,” P. Viola and M. Jones, IJCV 57(2), 2004.
      • “An extended set of Haar-like features for rapid object detection,” Lienhart, R. and Maydt, J., ICIP, 2002.
      • “FaceTracer: A Search Engine for Large Collections of Images with Faces,” Neeraj Kumar, Peter N. Belhumeur, Shree K. Nayar, ECCV, 2008. [m]
      • Bor-Chun Chen, Chu-Song Chen, Winston H. Hsu, “Cross-Age Reference Coding for Age-Invariant Face Recognition and Retrieval,” ECCV 2014 [m]
      • Wu et al., A Light CNN for Deep Face Representation With Noisy Labels. IEEE Transactions on Information Forensics and Security, 2018
      • “Face recognition using eigenfaces,” M. A. Turk, A.P. Pentland, CVPR, 1991.
      • “Face Recognition with Local Binary Patterns,” Timo Ahonen, ECCV, 2004.
      • “Scalable Face Image Retrieval using Attribute-Enhanced Sparse Codewords,” Bor-Chun Chen, Yan-Ying Chen, Yin-Hsi Kuo, Winston H. Hsu, IEEE Transactions on Multimedia, 2013.
      • “Toward Large-Scale Face Recognition Using Social Network Context,” Z. Stone, T. Zickler, and T. Darrell, Proceedings of the IEEE, 2010.
      • “Discovering Informative Social Subgraphs and Predicting Pairwise Relationships from Group Photos,” Yan-Ying Chen, Winston H. Hsu, Hong-Yuan Mark Liao, ACM Multimedia 2012.
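The Viola-Jones detector from the readings is available in OpenCV as a pretrained Haar cascade; here is a minimal detection sketch (the image path is a placeholder, and the cascade parameters are illustrative):

```python
# Viola-Jones-style face detection using OpenCV's pretrained Haar cascade.
# Assumption: OpenCV with its bundled cascade files; the image path is a placeholder.
import cv2

cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

img = cv2.imread("group_photo.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# scaleFactor / minNeighbors trade off recall vs. false positives
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                  minSize=(30, 30))
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("faces.jpg", img)
print("detected", len(faces), "face(s)")
```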
  • [10/27] learning to hash
    • State-of-the-art hash-based indexing methods (an LSH sketch follows the readings)
    • Readings:
      • Kristen Grauman. Efficiently Searching for Similar Images. Communications of the ACM, 2009. (good paper) [m]
      • Malcolm Slaney and Michael Casey, “Locality-Sensitive Hashing for Finding Nearest Neighbors,” IEEE Signal Processing Magazine, 2008. [m]
      • Alexandr Andoni and Piotr Indyk, “Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions,” Communications of the ACM, 2008. (good paper)
      • Wang et al., “Learning to Hash for Indexing Big Data—A Survey,” Proceedings of the IEEE, 2016. 
      • M. Datar, et al. Locality-sensitive hashing scheme based on p-stable distributions. SoCG 2004.
      • P. Indyk et al. Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. STOC 1998
      • Scalable object detection by filter compression with regularized sparse coding. Ting-Hsuan Chao, Yen-Liang Lin, Yin-Hsi Kuo, Winston H. Hsu. CVPR 2015.
      • Junfeng He et al. Mobile Product Search with Bag of Hash Bits and Boundary Reranking. CVPR 2012.
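Random-hyperplane LSH, one of the schemes surveyed in the readings above, can be sketched in a few lines of NumPy; similar vectors collide in most bits, which is what makes hash-table lookup work (the dimensionality and code length are illustrative, and table lookup plus re-ranking are omitted):

```python
# Random-hyperplane LSH sketch for cosine similarity: features -> binary codes.
# NumPy only; the dimensionality (128) and code length (32 bits) are illustrative.
import numpy as np

rng = np.random.default_rng(0)
dim, n_bits = 128, 32
hyperplanes = rng.standard_normal((n_bits, dim))   # one random hyperplane per bit

def hash_code(x):
    """Sign of the projection onto each hyperplane gives one bit per plane."""
    return (hyperplanes @ x > 0).astype(np.uint8)

def hamming(a, b):
    return int(np.count_nonzero(a != b))

# Similar vectors tend to agree in more bits than dissimilar ones.
x = rng.standard_normal(dim)
x_near = x + 0.1 * rng.standard_normal(dim)        # small perturbation of x
x_far = rng.standard_normal(dim)                   # unrelated vector
print("Hamming(x, near):", hamming(hash_code(x), hash_code(x_near)))
print("Hamming(x, far): ", hamming(hash_code(x), hash_code(x_far)))
```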
  • [11/03] feature reduction and manifold
    • Readings:
      • Ella Bingham and Heikki Mannila. “Random projection in dimensionality reduction: Applications to image and text data.” KDD 2001. (a very good paper)
      • “Graph Embedding and Extensions: A General Framework for Dimensionality Reduction,” Shuicheng Yan et al., PAMI 2007. (a very good paper)
      • “Eigenfaces for recognition,” M Turk, A Pentland – Journal of Cognitive Neuroscience, 1991. (page 72, The Eigenface Approach, to page 76 ONLY)
      • “Nonlinear dimensionality reduction by locally linear embedding,” Roweis & Saul, Science, 2000.
      • Vittorio Castelli, “Multidimensional Indexing Structures for Content-based Retrieval,” IBM Research Report, 2001. (overview paper, section 1 & 2 ONLY) [problems in high-dim features listed here]
      • Laurens van der Maaten, Geoffrey Hinton. Visualizing Data using t-SNE; The Journal of Machine Learning Research, 2008. [m]
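As a concrete anchor for the linear case (eigenfaces are PCA applied to face images), here is a minimal PCA-via-SVD sketch in NumPy; nonlinear methods such as t-SNE would typically come from a library like scikit-learn, and the data here is synthetic:

```python
# PCA via SVD: project mean-centered data onto the top-k principal directions.
# NumPy only; the synthetic data and k are illustrative.
import numpy as np

def pca(X, k):
    """Return the k-dim projection of X (n_samples x n_features), the basis, and the mean."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are principal directions, sorted by singular value.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                     # (k, n_features)
    return Xc @ components.T, components, mean

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 64)) @ rng.standard_normal((64, 64))  # correlated data
Z, components, mean = pca(X, k=2)
print("projected shape:", Z.shape)          # (500, 2), ready for 2-D visualization
```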
  • [11/10] midterm
    • Coverage: TBA
    • Closed book
    • 2:20pm, R107 (+ R103)
  • [11/17] convolutional neural networks (I)
    • Paper reading:
      • Zeiler et al. Visualizing and Understanding Convolutional Networks. ECCV 2014.
      • Zhao et al., Pyramid Scene Parsing Network. CVPR 2017
      • Johnson et al.. Perceptual Losses for Real-Time Style Transfer and Super-Resolution. ECCV 2016.  [m]
      • He et al., Deep Residual Learning for Image Recognition, CVPR 2016
      • Ren et al., Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, CVPR 2015
      • “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size”, Iandola et al., arXiv 2016
      • “An Analysis of Deep Neural Network Models for Practical Applications”, Canziani et al., arXiv 2016
      • “Accelerating Very Deep Neural Networks for Classification and Detection”, Zhang et al. TPAMI 2015
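The architecture papers above read more easily next to a concrete model definition. A minimal convolutional classifier sketch in PyTorch follows; the framework choice is an assumption (the course does not prescribe one), and the input size and class count are illustrative:

```python
# Minimal convolutional classifier in PyTorch (framework choice is an assumption).
# Input size (3 x 32 x 32) and the 10-way output are illustrative.
import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                            # 32x32 -> 16x16
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                            # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(64 * 8 * 8, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(start_dim=1))

model = SmallConvNet()
logits = model(torch.randn(4, 3, 32, 32))   # a batch of 4 random "images"
print(logits.shape)                          # torch.Size([4, 10])
```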
  • [11/24] convolutional neural networks (II)
    • Readings (to be revised):
      • Tseng et al. Joint Sequence Learning and Cross-Modality Convolution for 3D Biomedical Segmentation. CVPR 2017
      • Zhang, et al. “PANDA: Pose Aligned Networks for Deep Attribute Modeling,” CVPR 2014
      • Sun, et al. Deep learning face representation by joint identification-verification. NIPS 2014
      • Wen, et al. A Discriminative Feature Learning Approach for Deep Face Recognition. ECCV 2016
      • Hsieh, et al. Drone-based Object Counting by Spatially Regularized Regional Proposal Network. ICCV 2017.
  • [12/08] 3D (point cloud) learning
  • [12/15] aesthetic learning (I)
  • [12/22] aesthetic learning (II)
  • [12/29] sentiment/emotion learning
    • What is sentiment? Emotion? Opinion?
    • How do we conduct robust learning? Where are the datasets?
    • Readings:
      • Borth et al. Large-scale visual sentiment ontology and detectors using adjective noun pairs. ACM Multimedia 2013. [m]
      • Chen et al. DeepSentiBank: Visual Sentiment Concept Classification with Deep Convolutional Neural Networks. CoRR abs/1410.8586 (2014)
      • Yang et al. Weakly Supervised Coupled Networks for Visual Sentiment Analysis. CVPR 2018.
      • Soleymani et al. A survey of multimodal sentiment analysis. Image Vision Comput. 65: 3-14 (2017)
      • Ronen Feldman. Techniques and applications for sentiment analysis. Comm. of the ACM. April 2013.
      • You et al. Building a Large Scale Dataset for Image Emotion Recognition: The Fine Print and the Benchmark. AAAI 2016.
      • Giachanou et al. Like It or Not: A Survey of Twitter Sentiment Analysis Methods. ACM Computing Surveys, June 2016.
  • [01/05] backup; project meetings
  • [01/12] final project presentation
    • Workshop-style presentations; drinks and snacks will be provided (but bring your own mugs)
    • Each team will have 8 – 10 mins (to be finalized)
    • Two Best Awards (Technical and Presentation) will be selected
    • The lecture will start earlier to accommodate all the teams!