Algorithmic Challenges in Machine Learning

Kamalika Chaudhuri, Univeristy of California, San Diego

In this talk, we address two algorithmic challenges in machine learning.

First, with the increase in electronic record-keeping, many datasets that learning algorithms work with relate to sensitive information about individuals. Thus the problem of privacy-preserving learning – how to design learning algorithms that operate on the sensitive data of individuals while still guaranteeing the privacy of individuals in the training set – has achieved great practical importance. In this talk, we address the problem of privacy-preserving classification, and we present an efficient classifier which is private in the differential privacy model of Dwork et al. Our classifier works in the ERM (empirical loss minimization) framework, and includes privacy preserving logistic regression and privacy preserving support vector machines. We show that our classifier is private, provide analytical bounds on the sample requirement of our classifier, and evaluate it on some real data.

Second, we address the problem of clustering, when data is available from multiple domains or views. For example, when clustering a document corpus such as Wikipedia, we have access to the contents of the documents and their link structure. In this talk, we address this problem of Multiview Clustering, and show how to use information from multiple views to improve clustering performance. We present an algorithm for multiview clustering, provide analytical bounds on the performance of our algorithm under certain statistical assumptions, and finally evaluate our algorithm on some real data.

Based on joint work with Sham Kakade (UPenn), Karen Livescu (TTI Chicago), Claire Monteleoni (CCLS Columbia), Anand Sarwate (ITA UCSD), and Karthik Sridharan (TTI Chicago).

Speaker Biography

Kamalika Chaudhuri received a Bachelor of Technology degree in Computer Science and Engineering in 2002 from the Indian Institute of Technology, Kanpur, and a PhD in Computer Science from UC Berkeley in 2007. She is currently a postdoctoral researcher at the Computer Science and Engineering Department at UC San Diego. Kamalika’s research is on the design and analysis of machine-learning algorithms and their applications. In particular, her interests lie in clustering, online learning, and privacy-preserving machine-learning, and applications of machine-learning and algorithms to practical problems in other areas.