[Introduction]

This course focuses on recent developments in machine learning techniques that are promising for solving practical problems in image/video indexing, recognition, and comprehension. In particular, we will focus on the emerging needs and solutions for very large-scale and diverse media types, sensors, and deep neural networks. The goal is for students to become familiar with the state of the art, learn how to formulate and solve practical image/video learning and indexing problems, and acquire hands-on experience through actual experiments. The course will cover selected topics in depth, such as:

  • Data augmentation strategies for deep neural networks
  • Advanced Face Recognition and Beyond
  • Advanced hash learning and optimization
  • Manifold learning
  • Sparse coding and solvers
  • Latent semantic analysis
  • Automatic neural network learning (autoML) + structure learning
  • Explainable AI
  • Deep comprehension and question answering
  • Efficient computation for neural networks
  • Few (zero) shot learning
  • Reinforcement learning from sensors and data
  • Experience sharing from the domain experts

Course Goals:

  • Extending the breadth and depth of essential technical components for cognitive computation: feature representations, learning, and scalable computation.
  • Gaining practical experience through assignments and experiments.
  • Practicing paper critiques, summarization, and presentations.

Lecturer: Winston Hsu (office: R512, CSIE Building)

TA: Jia-Fong Yeh <jiafongyeh@ieee.org> (office hour: TBA)

Time: 14:20 ~ 17:20, Thursday

Location: R544, CSIE Building

Info: Course ID (課程識別碼): 922 U3710

Assessment (tentative):

Requirements: Background in image processing (or signal processing related courses), probability, and linear algebra. Experience with machine learning or statistical pattern recognition will be useful but not required.

Textbook: none. We will cover active research areas not yet included in mature textbooks; instead, we will provide a rich set of papers and reference books.

Resources: The lecture slides, homework descriptions, and datasets will be posted on NTU CEIBA. Only students registered for the course can access them. Discussions and questions will also be hosted on NTU CEIBA.

[Course Outline]

  • Week 01 – Introduction for the course and topics (02/25)
    • Basic paper reading, critique, and presentation techniques
    • Readings:
      • “How to Read a Paper,” Keshav, ACM SIGCOMM Computer Communication Review 2007. [m – must] (no summary)
      • “How to Give a Good Research Talk,” Jones et al. [m] (no summary)
      • “Writing Technical Articles,” Henning Schulzrinne. [o – optional]
      • “Image Retrieval: Ideas, Influences, and Trends of the New Age,” Datta, 2008 (comprehensive and long) [o]
      • O. Russakovsky, et al., “ImageNet Large Scale Visual Recognition Challenge,” International Journal of Computer Vision 115(3), 2015
      • Sze, et al. Hardware for Machine Learning: Challenges and Opportunities. CICC 2017.
      • “Artificial Intelligence and Big Data,” D. E. O’Leary, Intelligent Systems, IEEE, vol. 28, no. 2, pp. 96–99, 2013.
      • Stoica et al. A Berkeley View of Systems Challenges for AI. arXiv, 2017. [o]
  • Week 02 – Data Augmentation Strategies for Machine (Deep) Learning (03/04)
    • Investigating different data augmentation strategies for providing sufficient (& quality) training data for deep neural networks.
    • Papers:
      • Xie et al. Self-training with Noisy Student improves ImageNet classification. CVPR 2020. [m]
      • Shorten, C., Khoshgoftaar, T.M. A survey on Image Data Augmentation for Deep Learning. Journal of Big Data 6, 60 (2019). [m]
      • He et al. Rethinking ImageNet Pre-training. ICCV 2019.
      • Wang et al., ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. CVPR 2017.
      • Veit et al. Learning From Noisy Large-Scale Datasets With Minimal Supervision. CVPR 2017.
      • Mahajan et al. Exploring the Limits of Weakly Supervised Pretraining. ECCV 2018.
      • others in the lecture slides
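To make the week's topic concrete, here is a minimal numpy sketch of three classic label-preserving augmentations from the survey above (random flip, CIFAR-style pad-and-crop, additive noise). The `augment` helper and its parameters are illustrative assumptions, not course-provided code:

```python
import numpy as np

def augment(image, rng):
    """Apply simple label-preserving augmentations to an HxWxC image:
    random horizontal flip, CIFAR-style pad-and-crop, additive noise."""
    h, w, c = image.shape
    out = image.copy()
    if rng.random() < 0.5:                    # horizontal flip, p = 0.5
        out = out[:, ::-1, :]
    pad = 4                                   # pad, then crop back to HxW
    padded = np.zeros((h + 2 * pad, w + 2 * pad, c), dtype=out.dtype)
    padded[pad:pad + h, pad:pad + w, :] = out
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    out = padded[top:top + h, left:left + w, :]
    return out + rng.normal(0.0, 0.01, size=out.shape)  # small noise

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))
aug = augment(img, rng)
```

In practice such transforms are applied on the fly during training so every epoch sees a different perturbed copy of each image.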
  • Week 03 – Advanced Face Recognition and Beyond (03/11)
    • Advanced face recognition models and the effective cost functions
    • HW #1 — face recognition with varying cost functions (due: 12pm, April 8)
    • Readings:
      • Ranjan et al., “Deep Learning for Understanding Faces: Machines May Be Just as Good, or Better, than Humans”. IEEE Signal Processing Magazine 2018. [m]
      • Wang et al. Deep Face Recognition: A Survey. arXiv 2019
      • Bansal et al. The Do’s and Don’ts for CNN-based Face Verification. CVPR 2017
      • Sun et al. Deep Learning Face Representation by Joint Identification-Verification. NIPS 2014
      • Wen et al. A Discriminative Feature Learning Approach for Deep Face Recognition. ECCV 2016
      • Liu et al. Large-Margin Softmax Loss for Convolutional Neural Networks. ICML 2016
      • Liu et al. SphereFace: Deep Hypersphere Embedding for Face Recognition. CVPR 2017. [m]
      • Wang et al. Additive margin softmax for face verification, IEEE Signal Processing Letters 2018
      • Deng et al. ArcFace: Additive Angular Margin Loss for Deep Face Recognition. CVPR 2019.
      • Zhang et al. Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks. IEEE Signal Processing Letters, 2016.
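The margin-based losses above (SphereFace, additive-margin softmax, ArcFace) all reshape the softmax logits. As a hedged illustration, here is a numpy sketch of ArcFace-style logits with an additive angular margin; the `arcface_logits` helper, the inputs, and the defaults s=30, m=0.5 are assumptions for illustration:

```python
import numpy as np

def arcface_logits(features, weights, labels, s=30.0, m=0.5):
    """ArcFace-style logits: L2-normalize embeddings and class centers so
    dot products are cosines, then add an angular margin m (radians) to
    the angle of each example's true class before rescaling by s."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = f @ w.T                                    # (N, C) cosines
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    idx = np.arange(len(labels))
    logits = s * cos
    logits[idx, labels] = s * np.cos(theta[idx, labels] + m)
    return logits

rng = np.random.default_rng(1)
feats = rng.normal(size=(4, 8))    # fabricated embeddings
W = rng.normal(size=(3, 8))        # fabricated class centers
labels = np.array([0, 1, 2, 0])
logits = arcface_logits(feats, W, labels)
```

Feeding these logits to a standard cross-entropy loss forces larger angular separation between classes than plain softmax.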
  • Week 04 – Varying Effective Cost Functions (03/18)
    • Introducing other (potential) cost functions motivated via prior ML or CV methods
    • Readings
      • P. F. Felzenszwalb et al., Object Detection with Discriminatively Trained Part Based Models. TPAMI 2010
      • X. Zhu et al., Face Detection, Pose Estimation, and Landmark Localization in the Wild. CVPR 2012
      • Bojan Pepik et al., Teaching 3D Geometry to Deformable Part Models. CVPR 2012
      • Mohsen Hejrat et al., Analyzing 3D Objects in Cluttered Images. NIPS 2012
      • Arandjelović et al., NetVLAD: CNN architecture for weakly supervised place recognition. CVPR 2016
      • Perronnin et al., Improving the Fisher Kernel for Large-Scale Image Classification. ECCV 2010
      • Jaakkola et al., Exploiting Generative Models in Discriminative Classifiers. NIPS 1998. [m]
      • Jegou et al., Aggregating local descriptors into a compact image representation. CVPR 2010
      • J. Philbin et al., Descriptor Learning for Efficient Retrieval. ECCV 2010
      • Tsung-Yi Lin et al. Learning Deep Representations for Ground-to-Aerial Geolocalization. CVPR 2015
  • Week 05 – Advanced Hash Learning and Optimization (03/25)
    • Advanced (supervised, semi-supervised, and unsupervised) hashing methods and optimization packages
    • Readings:
      • J. Wang, T. Zhang, J. Song, N. Sebe, and H. T. Shen. A Survey on Learning to Hash. IEEE Transactions on Pattern Analysis and Machine Intelligence, April 2018.
      • Wang et al. Learning to Hash for Indexing Big Data – A Survey. Proceedings of the IEEE, 2016. [m]
      • Yunchao Gong and S. Lazebnik. Iterative Quantization: A Procrustean Approach to Learning Binary Codes. CVPR 2011.
      • Yair Weiss, Antonio Torralba, and Robert Fergus. Spectral Hashing. Neural Information Processing Systems, 2008.
      • Ruslan Salakhutdinov and Geoffrey Hinton. Semantic hashing. International Journal of Approximate Reasoning, 50(7): 969-978, 2009.
      • Brian Kulis, and Trevor Darrell. Learning to Hash with Binary Reconstructive Embeddings. Neural Information Processing Systems, 2009.
      • Mohammad Norouzi, and David Fleet. Minimal Loss Hashing for Compact Binary Codes. International Conference in Machine Learning, 2011.
      • Christoph Strecha, Alex M. Bronstein, Michael M. Bronstein, and Pascal Fua. LDAHash: Improved matching with smaller descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 1, pp. 66-78, January, 2012.
      • Jun Wang, Sanjiv Kumar, and Shih-Fu Chang. Semi-Supervised Hashing for Large-Scale Search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(12): 2393-2406, 2012.
      • K. Grauman and R. Fergus. Learning binary hash codes for large-scale image search. Machine Learning for Computer Vision, pages 49-87, 2013.
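As background for the learning-to-hash papers above, here is a minimal numpy sketch of the unsupervised baseline they improve on, random-hyperplane LSH; the helper names and sizes are illustrative assumptions:

```python
import numpy as np

def lsh_hash(X, projections):
    """Random-hyperplane LSH: each bit is the sign of a projection, so
    points at small angles tend to get similar binary codes."""
    return (X @ projections.T > 0).astype(np.uint8)   # (N, bits)

def hamming(a, b):
    return int(np.sum(a != b))

rng = np.random.default_rng(0)
P = rng.normal(size=(16, 8))            # 16 random hyperplanes in R^8
x = rng.normal(size=8)
near = x + 0.01 * rng.normal(size=8)    # a close neighbor of x
far = rng.normal(size=8)                # an unrelated point
codes = lsh_hash(np.stack([x, near, far]), P)
```

Learned hashing methods replace the random hyperplanes with data- or label-dependent projections while keeping this same binary indexing scheme.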
  • Week 06 – Spring Break; no lecture (04/01)
  • Week 07 – Manifold Learning and Optimization (04/08)
    • Readings:
      • “Graph Embedding and Extensions: A General Framework for Dimensionality Reduction,” Shuicheng Yan et al., PAMI 2007. [m]
      • Laurens van der Maaten, Geoffrey Hinton. Visualizing Data using t-SNE; The Journal of Machine Learning Research, 2008. 
      • “Nonlinear dimensionality reduction by locally linear embedding,” Roweis & Saul, Science, 2000. [o]
      • “An Introduction to Locally Linear Embedding,” Saul & Roweis. [o]
      • “The Manifold Ways of Perception,” Seung & Lee, Science, 2000. [o]
      • “Linear Discriminant Analysis in Document Classification,” Torkkola. [o]
      • “Think Globally, Fit Locally: Unsupervised Learning of Low Dimensional Manifolds,” LK Saul, ST Roweis – Journal of Machine Learning Research, 2004. (an extended version for LLE) [o]
      • J.B. Tenenbaum, “A global geometric framework for nonlinear dimensionality reduction,” Science, 2000 (ISOMAP)
      • Supplemental materials: rich resources for Locally Linear Embedding, including papers, examples, and sample code.
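A minimal numpy sketch of one spectral method in the graph-embedding framework above, Laplacian eigenmaps applied to a 1-D curve embedded in 3-D; the helper, the k=2 neighborhood size, and the toy data are assumptions for illustration:

```python
import numpy as np

def laplacian_eigenmap(X, k=2, dim=1):
    """Minimal Laplacian-eigenmaps embedding with an unnormalized graph
    Laplacian: build a symmetric k-NN graph, form L = D - W, and embed
    with the eigenvectors of the smallest nonzero eigenvalues."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    W = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(d2[i])[1:k + 1]:  # index 0 is the point itself
            W[i, j] = W[j, i] = 1.0
    L = np.diag(W.sum(axis=1)) - W
    vals, vecs = np.linalg.eigh(L)            # ascending eigenvalues
    return vecs[:, 1:1 + dim]                 # drop the constant eigenvector

# A 1-D curve (helix segment) embedded in 3-D; the embedding should
# recover the single intrinsic coordinate.
t = np.linspace(0.0, 3.0, 40)
X = np.stack([np.cos(2 * t), np.sin(2 * t), t], axis=1)
Y = laplacian_eigenmap(X)
```

LLE and ISOMAP follow the same recipe of a neighborhood graph plus an eigenproblem, differing in how edge weights and the objective are defined.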
  • Week 08 – Sparse Coding and Solvers (04/15)
    • HW #2 — application for sparse coding (due: 12pm, May 6)
    • Readings:
      • Mairal et al. Online dictionary learning for sparse coding. ICML 2009.
      • Yang et al. Robust Sparse Coding for Face Recognition. CVPR 2011.
      • Bor-Chun Chen, Yan-Ying Chen, Yin-Hsi Kuo, Winston H. Hsu. Scalable Face Image Retrieval Using Attribute-Enhanced Sparse Codewords. IEEE Trans. Multimedia 15(5): 1163-1173 (2013)
      • B.-C. Chen, Y.-H. Kuo, Y.-Y. Chen, K.-Y. Chu, and W. Hsu. Semi-supervised face image retrieval using sparse coding with identity constraint. In Proceedings of the 19th ACM International Conference on Multimedia, MM ’11, pages 1369–1372, New York, NY, USA, 2011. ACM. [o]
      • Elad and Aharon. Image Denoising Via Sparse and Redundant Representations Over Learned Dictionaries. IEEE Transactions on Image Processing, vol. 15, no. 12, pp. 3736–3745, 2006. [m]
      • J Wright, et al., Robust face recognition via sparse representation. IEEE transactions on pattern analysis and machine intelligence 31 (2), 210-227
      • Shenghua Gao et al. Local features are not lonely – Laplacian sparse coding for image classification. CVPR 2010.
      • Jianchao Yang et al. Linear spatial pyramid matching using sparse coding for image classification. CVPR 2009.
      • Lee et al., Efficient sparse coding algorithms, NIPS 2007.
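To illustrate the kind of solver covered this week (cf. Lee et al.), here is a minimal ISTA sketch in numpy for the lasso formulation of sparse coding; the dictionary size, λ, and step count are arbitrary illustrative choices:

```python
import numpy as np

def ista(D, x, lam=0.1, steps=500):
    """Solve min_a 0.5*||x - D a||^2 + lam*||a||_1 with ISTA (iterative
    soft-thresholding). D: (d, k) dictionary with unit-norm atoms."""
    a = np.zeros(D.shape[1])
    L = np.linalg.norm(D, 2) ** 2              # Lipschitz constant of grad
    for _ in range(steps):
        z = a - D.T @ (D @ a - x) / L          # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # shrink
    return a

rng = np.random.default_rng(0)
D = rng.normal(size=(20, 50))
D /= np.linalg.norm(D, axis=0)                 # unit-norm atoms
a_true = np.zeros(50)
a_true[[3, 17, 41]] = [1.0, -2.0, 1.5]
x = D @ a_true                                 # a 3-sparse signal
a = ista(D, x)
```

Dictionary-learning methods such as Mairal et al.'s alternate this coding step with updates to the atoms of D.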
  • Week 09 – Midterm week; no lecture (04/22)
  • Week 10 – Latent Semantic Analysis (04/29)
    • Readings:
      • Yishu Miao, Edward Grefenstette, and Phil Blunsom. Discovering Discrete Latent Topics with Neural Variational Inference. ICML 2017.
      • Li Wan et al. “A Hybrid Neural Network-Latent Topic Model”, JMLR 2012. [o]
      • “Probabilistic latent semantic indexing,” T. Hofmann, SIGIR, 1999. [o]
      • “Unsupervised Learning by Probabilistic Latent Semantic Analysis,” T Hofmann – Machine Learning, 2001. (an extended version) [o]
      • “Document Clustering using Word Clusters via the Information Bottleneck Method,” Noam Slonim and Naftali Tishby, SIGIR 2000. [o]
      • “Indexing by Latent Semantic Analysis,” Deerwester, 1990. [o]
      • “Image retrieval on large-scale image databases,” Eva Horster, Rainer Lienhart, Malcolm Slaney, CIVR 2007. [o]
      • “A Bayesian Hierarchical Model for Learning Natural Scene Categories,” Fei-Fei Li, CVPR 2005.
      • “Latent Dirichlet allocation,” D. Blei, A. Ng, and M. Jordan. Journal of Machine Learning Research, 3:993–1022, January 2003. [o]
      • “Efficient Indexing for Large Scale Visual Search,” Xiao Zhang et al., ICCV 2009.  [m]
      • Lin et al., Netizen-Style Commenting on Fashion Photos – Dataset and Diversity Measures, WWW 2018. [o]
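Classical LSA (Deerwester et al.) is a truncated SVD of the term-document matrix; here is a minimal numpy sketch on a toy two-topic corpus (the tiny count matrix is fabricated purely for illustration):

```python
import numpy as np

# Tiny term-document matrix: rows are terms, columns are documents.
# Docs 0-1 are about a "sea" topic, docs 2-3 about a "forest" topic.
X = np.array([
    [2, 1, 0, 0],   # ship
    [1, 1, 0, 0],   # boat
    [1, 1, 0, 0],   # ocean
    [0, 0, 1, 1],   # tree
    [0, 0, 1, 0],   # wood
    [0, 0, 2, 1],   # forest
], dtype=float)

# LSA = truncated SVD: keep the top-k singular triplets as "latent topics".
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
docs = (np.diag(s[:k]) @ Vt[:k]).T   # documents in the k-dim latent space

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
```

pLSA and LDA replace this linear-algebraic decomposition with probabilistic topic models, but retain the same idea of a low-dimensional latent semantic space.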
  • Week 11 – Deep Comprehension and Question and Answering (05/06)
    • Speaker: TA, Hung-Ting Su
    • Readings:
      • A Neural Probabilistic Language Model, JMLR, 2003
      • Attention is All You Need, NeurIPS 2017
      • Deep contextualized word representations, NAACL 2018
      • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, NAACL 2019
      • Bidirectional Attention Flow for Machine Comprehension, ICLR 2017
      • To Test Machine Comprehension, Start by Defining Comprehension, ACL 2020.  [m]
      • Situation and Behavior Understanding by Trope Detection on Films, WWW 2021
      • From System 1 Deep Learning to System 2 Deep Learning, NeurIPS 2019
      • Language Models are Few-Shot Learners, NeurIPS 2020
      • MovieQA: Understanding Stories in Movies through Question-Answering, CVPR 2016
      • Video Question Answering via Gradually Refined Attention over Appearance and Motion, ACMMM 2017
      • ReasoNet: Learning to Stop Reading in Machine Comprehension, KDD 2017
      • Know What You Don’t Know: Unanswerable Questions for SQuAD, ACL 2018
      • Denoising Distantly Supervised Open-Domain Question Answering, ACL 2018
      • Densely Connected Attention Propagation for Reading Comprehension, NeurIPS 2018
      • TVQA: Localized, Compositional Video Question Answering, EMNLP 2018
      • Asking Questions the Human Way: Scalable Question-Answer Generation from Text Corpus, WWW 2020
      • Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-based Question Answering, ACL 2020
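Several of the readings above (the Transformer, BERT, GPT-3) build on scaled dot-product attention; a minimal numpy sketch of that single operation, with illustrative shapes and random inputs:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V -- the core Transformer operation."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])       # (n_q, n_k)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))   # 3 queries in an 8-dim head
K = rng.normal(size=(5, 8))   # 5 keys
V = rng.normal(size=(5, 8))   # 5 values
out, attn = scaled_dot_product_attention(Q, K, V)
```

Multi-head attention simply runs several of these in parallel on learned linear projections of the inputs and concatenates the results.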
  • Week 12 – Automatic Neural Network Learning (autoML) + Structure Learning (05/13)
    • Readings:
      • Dong et al. DPP-Net – Device-Aware Progressive Search for Pareto-Optimal Neural Architectures, ECCV 2018.
      • Elsken et al. Neural Architecture Search: A Survey. JMLR 2019. [m]
      • Zoph et al. Learning Transferable Architectures for Scalable Image Recognition. CVPR 2018
      • Pham et al. Efficient Neural Architecture Search via Parameter Sharing. ICML 2018
      • Yao et al. Taking the Human out of Learning Applications: A Survey on Automated Machine Learning. arXiv 2019
      • Neural Architecture Search with Reinforcement Learning, ICLR 2017. 
      • Large-Scale Evolution of Image Classifiers, ICML 2017
      • Regularized Evolution for Image Classifier Architecture Search, arXiv.
  • Week 13 – Few (Zero)-Shot Learning (05/20)
    • Final: Cross-domain few-shot learning (doc)
      • Presentation (tentative): June 17, 2021
    • Readings:
      • Chen et al. A Closer Look at Few-shot Classification. ICLR 2019.
      • Finn et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. ICML 2017
      • Rusu et al. Meta-Learning with Latent Embedding Optimization. ICLR 2019
      • Vinyals et al. Matching Networks for One Shot Learning. NIPS 2016
      • Snell et al, Prototypical Networks for Few-shot Learning. NIPS 2017
      • Li et al. Meta-SGD: Learning to Learn Quickly for Few-Shot Learning. arXiv 2017
      • Sung et al. Learning to Compare: Relation Network for Few-Shot Learning. CVPR 2018
      • Wang et al. Large Margin Meta-Learning for Few-Shot Classification. NIPS 2018
      • Karlinsky et al. RepMet: Representative-Based Metric Learning for Classification and Few-Shot Object Detection. CVPR 2019
      • Simon et al., Adaptive Subspaces for Few-Shot Learning, CVPR 2020. [m]
      • Li et al., Boosting Few-Shot Learning With Adaptive Margin Loss, CVPR 2020.
      • Yang et al., DPGN: Distribution Propagation Graph Network for Few-Shot Learning, CVPR 2020.
      • Yao et al., Graph Few-Shot Learning via Knowledge Transfer, AAAI 2020.
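Prototypical networks (Snell et al.) reduce few-shot inference to nearest-prototype classification; a minimal numpy sketch of one episode, where the "embeddings" are fabricated Gaussian clusters standing in for a learned backbone:

```python
import numpy as np

def proto_classify(support, support_labels, queries):
    """Prototypical-network inference: class prototypes are the means of
    the support embeddings; each query goes to the nearest prototype."""
    classes = np.unique(support_labels)
    protos = np.stack([support[support_labels == c].mean(axis=0)
                       for c in classes])
    d2 = ((queries[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[np.argmin(d2, axis=1)]

# A 2-way, 3-shot episode with fabricated, well-separated "embeddings".
rng = np.random.default_rng(0)
support = np.concatenate([rng.normal(0.0, 0.1, (3, 4)),
                          rng.normal(5.0, 0.1, (3, 4))])
labels = np.array([0, 0, 0, 1, 1, 1])
queries = np.concatenate([rng.normal(0.0, 0.1, (2, 4)),
                          rng.normal(5.0, 0.1, (2, 4))])
pred = proto_classify(support, labels, queries)
```

Matching networks and relation networks vary the same recipe by changing the distance or similarity function between queries and the support set.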
  • Week 14 – Efficient Computation for Deep Learning Methods (05/27)
    • Readings:
      • Molchanov et al. Importance Estimation for Neural Network Pruning. CVPR 2019. [m]
      • Gamanayake et al. Cluster Pruning: An Efficient Filter Pruning Method for Edge AI Vision Applications. IEEE Journal of Selected Topics in Signal Processing, May 2020
      • Lemaire et al. Structured Pruning of Neural Networks with Budget-Aware Regularization. CVPR 2019
      • Howard et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017.
      • Chen et al. Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution. CVPR 2019
      • Cheng et al. Model Compression and Acceleration for Deep Neural Networks: The Principles, Progress, and Challenges. IEEE Signal Processing Magazine 2018.
      • Sze et al. Efficient Processing of Deep Neural Networks: A Tutorial and Survey. Proceedings of the IEEE. 2017.
      • Yim et al. A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning. CVPR 2017
      • Bianco et al. Benchmark Analysis of Representative Deep Neural Network Architectures. IEEE Access 2018.
      • Kim et al. Compression of deep convolutional neural networks for fast and low power mobile applications. ICLR 2016.
      • Ding et al. A Compact CNN-DBLSTM Based Character Model For Offline Handwriting Recognition with Tucker Decomposition. ICDAR 2017
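The pruning papers above start from the simplest baseline, unstructured magnitude pruning; a minimal numpy sketch (the `prune_by_magnitude` helper and the 90% sparsity target are illustrative):

```python
import numpy as np

def prune_by_magnitude(W, sparsity):
    """Zero out the smallest-magnitude weights (unstructured magnitude
    pruning), keeping the largest (1 - sparsity) fraction."""
    k = int(np.ceil(sparsity * W.size))   # number of weights to remove
    if k == 0:
        return W.copy()
    thresh = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    return W * (np.abs(W) > thresh)       # keep only strictly larger weights

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))             # a fabricated weight matrix
Wp = prune_by_magnitude(W, 0.9)           # remove ~90% of the weights
```

Structured pruning (filter or channel level, as in Gamanayake et al. and Lemaire et al.) applies the same idea to whole groups of weights so that the speedup is realized on standard hardware.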
  • Week 15 – Autonomous Driving: Perception, Prediction, and Planning (I) (06/03)
    • Readings:
      • Luo et al. Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net. CVPR 2020. [m]
      • CVPR 2020 Tutorial: All About Self-Driving, Ersin Yumer, Raquel Urtasun (Uber ATG)
      • Lefèvre et al. A survey on motion prediction and risk assessment for intelligent vehicles. ROBOMECH Journal 2014.
      • Chai et al. MultiPath: Multiple Probabilistic Anchor Trajectory Hypotheses for Behavior Prediction. CoRL 2019.
      • Kim et al. Probabilistic Vehicle Trajectory Prediction over Occupancy Grid Map via Recurrent Neural Network. ITSC 2017.
      • Wu et al. MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird’s Eye View Maps. CVPR 2020
      • Hoermann et al. Dynamic Occupancy Grid Prediction for Urban Autonomous Driving: A Deep Learning Approach with Fully Automatic Labeling. ICRA 2018.
      • Gao et al. VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation. CVPR 2020.
      • Djuric, et al. Uncertainty-aware Short-term Motion Prediction of Traffic Actors for Autonomous Driving. WACV 2020
      • Phan-Minh et al. CoverNet: Multimodal Behavior Prediction using Trajectory Sets. CVPR 2020.
      • Casas et al. IntentNet: Learning to Predict Intention from Raw Sensor Data. CoRL 2018
      • Cui, et al. Multimodal Trajectory Predictions for Autonomous Driving using Deep Convolutional Networks. ICRA 2019
      • Phillips et al. Deep Multi-Task Learning for Joint Localization, Perception, and Prediction. CVPR 2021.
      • Alahi et al. Social LSTM: Human Trajectory Prediction in Crowded Spaces. CVPR 2016.
      • Hong et al. Rules of the Road: Predicting Driving Behavior with a Convolutional Model of Semantic Interactions. CVPR 2019.
      • Ivanovic and Pavone. The Trajectron: Probabilistic Multi-Agent Trajectory Modeling With Dynamic Spatiotemporal Graphs, ICCV 2019
      • Liang et al. PnPNet: End-to-End Perception and Prediction with Tracking in the Loop. CVPR 2020.
      • Bansal et al. ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst. 2019 ICML Workshop on AI for Autonomous Driving.
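A common reference point for the learned trajectory predictors above is the constant-velocity baseline; a minimal numpy sketch (the helper and the 2-D track are illustrative assumptions):

```python
import numpy as np

def constant_velocity_forecast(track, horizon):
    """Extrapolate a 2-D track with its last observed displacement --
    the constant-velocity baseline for motion prediction."""
    v = track[-1] - track[-2]                 # last per-step displacement
    steps = np.arange(1, horizon + 1)[:, None]
    return track[-1] + steps * v              # (horizon, 2) future positions

track = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]])   # observed history
future = constant_velocity_forecast(track, 3)            # 3-step forecast
```

Learned models such as MultiPath or CoverNet must beat this simple extrapolation, which is surprisingly competitive over short horizons.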
  • Week 16 – Autonomous Driving: Perception, Prediction, and Planning (II) (06/10)
    • Readings:
      • Ren et al. MP3: A Unified Model to Map, Perceive, Predict and Plan. CVPR 2021. [m]
      • Katrakazas et al. Real-time motion planning methods for autonomous on-road driving: State-of-the-art and future research directions. Transportation Research Part C: Emerging Technologies, 2015.
      • Paden, et al. A survey of motion planning and control techniques for self-driving urban vehicles. IEEE Transactions on Intelligent Vehicles, 2016.
      • Claussmann et al. A Review of Motion Planning for Highway Autonomous Driving. IEEE Trans. on Intelligent Transportation Systems, 2020.
      • Ziegler et al. Trajectory Planning for BERTHA – a Local, Continuous Method. IV 2014.
      • Zeng et al. End-to-end Interpretable Neural Motion Planner. CVPR 2019
      • Perceive, Predict, and Plan: Safe Motion Planning Through Interpretable Semantic Representations. ECCV 2020
      • Cui et al. LookOut: Diverse Multi-Future Prediction and Planning for Self-Driving. arXiv 2021.
  • Week 17 – Final Project Demo & Presentation (06/17)