Date: Friday, November 14, 2014
Location: 1084 East Hall (3:00 PM to 4:00 PM)
Title: Locating Outliers in Large Matrices with Adaptive Compressive Sampling
Abstract: Fueled by our increasingly information-based and data-driven society, there is a growing need for computational methods capable of "making sense" of volumes of data that may be very high-dimensional, corrupted, or even largely incomplete. A unifying theme in many modern investigations into problems of this form is to exploit intrinsic low-dimensional structure to facilitate inference. For example, a number of recent efforts have shown that computationally tractable "robust" variants of principal component analysis methods can provably succeed in recovering data vectors originating from some common low-dimensional subspace, even when the data are corrupted by outliers and/or have many missing elements. These tools have found utility in a number of applications in machine learning, bioinformatics, image processing, and collaborative filtering, to name a few.
In this talk, we discuss some of our initial work on a related, but complementary, task -- rather than estimate vectors in the presence of corruption or outliers, we consider the problem of identifying the locations of corrupted data points or outliers in a large, otherwise low-rank, matrix. Such problems may arise, for example, when identifying malicious responses in survey data, detecting anomalous patterns in network traffic, or in visual surveillance applications where one seeks to locate anomalies in a scene. We propose a simple two-step adaptive sensing and inference approach to these problems, and provide theoretical guarantees quantifying its performance. Our results show that under somewhat mild assumptions, accurate outlier identification is achievable using very few linear summaries of the rows and columns of the original data matrix (as few as the squared rank of the low-rank component plus the number of outliers, times constant and logarithmic factors). More generally, our investigation shows that the sample complexity and computational burden associated with the task of identifying outliers against a structured background can be significantly less than for methods that seek to recover or estimate the structured background itself in the presence of outliers.
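To make the two-step idea concrete, the following is a rough illustrative sketch, not the speaker's actual procedure or guarantees: compress the matrix columns with a few random linear summaries, estimate the low-rank subspace from columns that appear uncorrupted (here via a crude smallest-norm trimming heuristic, chosen only for illustration), and flag columns whose compressed residual off that subspace is large. All dimensions, thresholds, and variable names below are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, k = 200, 5, 3  # matrix size, rank of background, number of outlier columns

# Exactly rank-r background, with k columns corrupted by large additive outliers
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
outliers = rng.choice(n, size=k, replace=False)
M[:, outliers] += 10.0 * rng.standard_normal((n, k))

# Step 1: compress each column to m linear summaries (m is small relative to n,
# on the order of the rank plus the number of outliers)
m = 4 * (r + k)
A = rng.standard_normal((m, n)) / np.sqrt(m)
Y = A @ M  # m x n sketch of the data matrix

# Estimate the compressed low-rank subspace from 20 columns with the smallest
# sketch norms (an illustrative trimming heuristic: strong outlier columns
# have much larger norms, so these columns are very likely inliers)
idx = np.argsort(np.linalg.norm(Y, axis=0))[:20]
U = np.linalg.svd(Y[:, idx], full_matrices=False)[0][:, :r]

# Step 2: flag columns whose sketch has a large residual outside that subspace;
# inlier columns lie (numerically) exactly in it, outliers do not
resid = np.linalg.norm(Y - U @ (U.T @ Y), axis=0)
flagged = np.flatnonzero(resid > 0.5 * resid.max())
```

Note that only m summaries per column are observed rather than the full matrix, which is the source of the sample-complexity savings the abstract describes; a faithful implementation would also need a principled threshold and subspace-estimation step with the guarantees discussed in the talk.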
We demonstrate the performance of our approach experimentally on synthetic data, and in an application motivated by saliency map estimation tasks arising in computer vision and automated surveillance.
Speaker: Jarvis Haupt
Institution: University of Minnesota