CAREER: Scaling Up Knowledge Discovery in High-Dimensional Data Via Nonconvex Statistical Optimization


The past decade has witnessed a surge of research activities on knowledge discovery in high-dimensional data, among which convex optimization-based methods are widely used. While convex optimization algorithms enjoy global convergence guarantees, they are not always scalable to high-dimensional massive data. Motivated by the empirical success of nonconvex methods such as matrix factorization, the objective of this project is to develop a new generation of principled nonconvex statistical optimization algorithms to scale up high-dimensional machine learning methods. This project amplifies the utility of high-dimensional knowledge discovery methods in various fields such as computational genomics and recommendation systems. It incorporates the resulting research outcomes into curriculum development and online courses, to train a new generation of machine learning and data mining practitioners. In addition, special training is provided to K-12 students and community college students for a broader education of modern data analysis techniques. This project consists of three synergistic research thrusts. First, it develops a family of nonconvex algorithms for structured sparse learning, including extensions to both parallel computing and distributed computing. Second, it devises a unified nonconvex optimization framework for low-rank matrix estimation, which covers a wide range of low-rank matrix learning problems such as matrix completion and preference learning. Several acceleration techniques are also explored. Third, it develops a family of alternating optimization algorithms, to solve the bi-convex optimization problem for estimating various complex statistical models. This project integrates modern optimization techniques with model-based statistical thinking, and provides a systematic way to design nonconvex high-dimensional machine learning methods with strong theoretical guarantees. The targeted applications include but not limited to computational genomics, neuroscience, and recommendation systems.


Funding Source

Project Period