CAREER: Scaling Up Knowledge Discovery in High-Dimensional Data Via Nonconvex Statistical Optimization


The past decade has witnessed a surge of research on knowledge discovery in high-dimensional data, in which convex optimization-based methods are widely used. While convex optimization algorithms enjoy global convergence guarantees, they do not always scale to massive high-dimensional data. Motivated by the empirical success of nonconvex methods such as matrix factorization, the objective of this project is to develop a new generation of principled nonconvex statistical optimization algorithms to scale up high-dimensional machine learning methods. This project amplifies the utility of high-dimensional knowledge discovery methods in various fields such as computational genomics and recommendation systems. It incorporates the resulting research outcomes into curriculum development and online courses, to train a new generation of machine learning and data mining practitioners. In addition, special training is provided to K-12 students and community college students to broaden education in modern data analysis techniques.

This project consists of three synergistic research thrusts. First, it develops a family of nonconvex algorithms for structured sparse learning, including extensions to both parallel computing and distributed computing. Second, it devises a unified nonconvex optimization framework for low-rank matrix estimation, which covers a wide range of low-rank matrix learning problems such as matrix completion and preference learning. Several acceleration techniques are also explored. Third, it develops a family of alternating optimization algorithms to solve the bi-convex optimization problems that arise in estimating various complex statistical models. This project integrates modern optimization techniques with model-based statistical thinking, and provides a systematic way to design nonconvex high-dimensional machine learning methods with strong theoretical guarantees. The targeted applications include, but are not limited to, computational genomics, neuroscience, and recommendation systems.



