Stochastic approximation algorithms for large-scale unsupervised learning

Raman Arora, Toyota Technological Institute at Chicago
Host: Jason Eisner

The nature of signal processing and machine learning has evolved dramatically over the years as we investigate increasingly intricate, dynamic, and large-scale systems. This development is accompanied by an explosion of massive, unlabeled, multimodal, corrupted and very high-dimensional “big data”, which poses new challenges for efficient analysis and learning. To address these challenges, I will advocate a learning approach based on “stochastic approximation”, wherein a single data point is processed at each iteration using a computationally simple update. I will start by presenting a stochastic approximation (SA) meta-algorithm for unsupervised learning with large high-dimensional datasets. I will then describe the application of the SA algorithm to a multiview learning framework, where multiple modalities are available at training time but not for prediction at test time, and to a similarity-based learning framework, where data is observed only in the form of pairwise similarities. I will conclude with a theoretical analysis of the SA algorithm and a discussion of the pitfalls of SA approaches and their remedies.
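To make the style of update the abstract describes concrete, here is a minimal, hedged sketch of a stochastic-approximation update for unsupervised learning: Oja's rule for streaming PCA, which touches one data point per iteration with a cheap rank-one update. This is only an illustrative stand-in; the talk's actual meta-algorithm, step sizes, and function names below are assumptions, not the speaker's method.

```python
# Illustrative sketch only: a stochastic-approximation (SA) style update.
# Oja's rule for streaming PCA is used as a stand-in example; it is not
# necessarily the meta-algorithm presented in the talk.
import numpy as np

def oja_streaming_pca(data_stream, k, dim, lr0=0.1):
    """Estimate a k-dimensional principal subspace from a stream of points."""
    rng = np.random.default_rng(0)
    W = np.linalg.qr(rng.standard_normal((dim, k)))[0]  # orthonormal init
    for t, x in enumerate(data_stream, start=1):
        eta = lr0 / np.sqrt(t)            # decaying step size
        W += eta * np.outer(x, x @ W)     # rank-one update from one data point
        W, _ = np.linalg.qr(W)            # re-orthonormalize (projection step)
    return W

# Usage: recover the top-2 principal subspace of synthetic 50-dimensional data.
rng = np.random.default_rng(1)
dirs = np.linalg.qr(rng.standard_normal((50, 50)))[0]
scales = np.array([10.0, 5.0] + [0.1] * 48)
samples = (rng.standard_normal((20000, 50)) * scales) @ dirs.T
W_hat = oja_streaming_pca(samples, k=2, dim=50)
# Frobenius norm of the 2x2 alignment matrix approaches sqrt(2) when the
# estimated subspace matches the true top-2 directions.
print(np.linalg.norm(dirs[:, :2].T @ W_hat))
```

The key point of the sketch is that each iteration costs only a matrix-vector product and a small orthonormalization, so memory and per-step compute stay modest even when the dataset is large and high-dimensional.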

Speaker Biography

Raman Arora received his B.E. degree from NSIT, Delhi, India, in 2001, and M.S. and Ph.D. degrees from the University of Wisconsin-Madison in 2005 and 2009, respectively. He worked as a Research Associate at the University of Washington, Seattle, from 2009 to 2011 and was a visiting researcher at Microsoft Research (MSR) during the summer of 2011. He is currently a Postdoctoral Researcher at the Toyota Technological Institute at Chicago. His research interests include online learning, large-scale machine learning, speech recognition and statistical signal processing.