Research Statement
The long-term goal of research in my group is to create provable machine learning algorithms that facilitate efficient and accurate data analysis. In the “big data” era, despite the plethora of data in both public and private sectors, most datasets are incompletely observed. The target structure is often missing, and the salient structures are obscured by the raw feature representation. Although machine learning is evolving to tackle these challenges with increasingly intricate models, the training schemes have relied on heuristics that provide no theoretical guarantees, such as global optimality. Over the past few years, we have focused on addressing this issue by jointly inferring latent representations and learning predictors for massive datasets through a single convex optimization. The results were analyzed theoretically and validated on real applications. In the pursuit of representation learning, it is critical to conjoin it with prediction model learning, so that the latent features discovered have predictive power. Such joint learning poses significant challenges because it leads to non-convex problems where finding a jointly optimal solution is intractable. This forms a major obstacle to the analysis of deep learning despite its empirical success. Instead of superseding deep learning, our goal is to complement it with analyzable tools that learn predictive representations with improved scalability, modularity, reliability, and/or flexibility. Towards this end, we have made the following progress:
Using a similar approach, we convexified multiview learning with a single hidden layer [36, 29], and we are extending it to multiple layers and multiple modalities. Similar extensions are under way for convex modeling of temporal, spatial, and multi-way correlations by leveraging reproducing kernel Hilbert space embeddings of distributions [40, 41] and our work on tensor trace norm relaxation [19]. The technique leverages the polar operator on the set of representations with unit invariant complexity, and it has been extended to multiple layers [7], allowing deep regularization in neural networks to be accomplished much more effectively for, e.g., multiview learning and robust sequential modeling.
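To make the role of the polar operator concrete, here is its generic form as used in atomic-norm and generalized conditional-gradient methods; the symbols (the complexity measure Ω, atoms a, and gradient direction g) are illustrative notation, not taken from the cited papers:

\[
\operatorname{polar}_{\Omega}(g) \;=\; \max_{\Omega(a)\,\le\, 1}\ \langle g,\, a\rangle,
\qquad
a^{\star} \;\in\; \operatorname*{arg\,max}_{\Omega(a)\,\le\, 1}\ \langle g,\, a\rangle,
\]

where \(\Omega\) is an invariant complexity measure over candidate representations (e.g., a trace norm) and \(g\) is the negative gradient of the training loss. At each iteration, the maximizing atom \(a^{\star}\) — a representation of unit complexity most aligned with the descent direction — is added to the model, which is what keeps the overall optimization convex and tractable.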