New Algorithms for High-Dimensional, Inconsistent Datasets

Nir Ailon

The information revolution has made massive amounts of data available from a myriad of sources such as biology, finance, e-commerce, advertising and the web. Handling these high-dimensional, often noisy and inconsistent datasets has become an important focus of algorithmic research. (1) Rank Aggregation deals with combining inconsistent rankings suggested by different voters. This old problem from social choice theory has recently attracted the attention of computer scientists in surprising new contexts such as search engines and information retrieval systems.I will present new, improved algorithms, and discuss related results and follow-up work on cluster aggregation, hierarchical clustering and phylogeny. (2) Sublinear algorithms compute functions by considering only a small sample of the input. This is especially important when the input is massive. I will discuss new results in sublinear algorithms in the context of property testing, property-preserving data reconstruction and nearest-neighbor searching. Time permitting, I will briefly touch upon other topics I am interested in such as self-improving algorithms, computational geometry and complexity.