Recent Advances in Non-convex Optimization Research

The Alfred P. Sloan Foundation has announced the recipients of the 2019 Sloan Research Fellowships, and Chinese-American scholar Rong Ge (鬲融) is among the winners. Below is a summary of the research for which he was recognized:

Rong Ge's research areas are theoretical computer science and machine learning. On his personal homepage he writes: "Modern machine learning algorithms such as deep learning try to automatically learn useful hidden representations from the data. How can we formalize hidden structures in the data, and how do we design efficient algorithms to find them? My research aims to answer these questions by studying problems that arise in analyzing text, images, and other forms of data, using non-convex optimization and tensor decompositions as tools."

Ge's research has three main themes: representation learning, non-convex optimization, and tensor decompositions. The Sloan Research Fellowship recognizes his work on non-convex optimization. In his own words: "Machine learning today mostly relies on deep learning algorithms, which find the optimal neural network parameters by solving non-convex optimization problems. In theory, non-convex optimization is very hard in the worst case, yet in practice even very simple algorithms (such as gradient descent) perform well. My recent work analyzes some simple non-convex optimization problems and proves that all of their local optima are also global optima."
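
To make this claim concrete, here is a minimal illustrative sketch (an editor's toy example assuming numpy, not code from Ge's papers): gradient descent on the non-convex function f(x) = (x^2 - 1)^2, whose two local minima at x = -1 and x = +1 are both global.

```python
# A toy illustration (assumes numpy; not from Ge's papers):
# f(x) = (x^2 - 1)^2 is non-convex, but its two local minima
# at x = -1 and x = +1 are both global (f = 0 at each).
import numpy as np

def f(x):
    return (x**2 - 1)**2

def grad_f(x):
    return 4 * x * (x**2 - 1)

rng = np.random.default_rng(0)
for trial in range(5):
    x = rng.uniform(-2, 2)       # random starting point
    for _ in range(2000):
        x -= 0.01 * grad_f(x)    # plain gradient descent
    print(f"trial {trial}: x = {x:+.4f}, f(x) = {f(x):.2e}")
# Different runs may stop at -1 or at +1, but the objective value is
# identical: every local minimum here is a global minimum.
```

Which of the two minima a run finds depends only on the initialization, and that is exactly why it is harmless: both are equally good.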

He adds: "In research, some problems look completely intractable at first, but for a few problems I find especially interesting I usually come back to them every once in a while. Theoretical machine learning is developing quickly, and often after some time there are many new techniques to try. In fact, I originally started working on non-convex optimization in order to solve tensor decomposition problems (my earlier research), but once I started I found that the tools we used are also very effective on many other problems."

The award-winning conclusion, that in some simple non-convex optimization problems all local optima are global optima, is a reassuring result for researchers in machine learning. Beyond it, Ge has published many more papers on other topics at top AI conferences such as NIPS, ICML, and ICLR. A selection is listed below.

  • Learning Two-layer Neural Networks with Symmetric Inputs. ICLR 2019. https://arxiv.org/abs/1810.06793

  • Understanding Composition of Word Embeddings via Tensor Decomposition. ICLR 2019. https://openreview.net/forum?id=H1eqjiCctX

  • Stronger generalization bounds for deep nets via a compression approach. ICML 2018. https://arxiv.org/abs/1802.05296

  • Minimizing Nonconvex Population Risk from Rough Empirical Risk. NeurIPS 2018. https://arxiv.org/abs/1803.09357

  • Beyond Log-concavity: Provable Guarantees for Sampling Multi-modal Distributions using Simulated Tempering Langevin Monte Carlo. NIPS 2017 Bayesian Inference Workshop; NeurIPS 2018. https://arxiv.org/abs/1812.00793

  • Global Convergence of Policy Gradient Methods for Linearized Control Problems. ICML 2018. https://arxiv.org/abs/1801.05039

  • Learning One-hidden-layer Neural Networks with Landscape Design. ICLR 2018. https://arxiv.org/abs/1711.00501

  • Generalization and Equilibrium in Generative Adversarial Nets (GANs). ICML 2017. https://arxiv.org/abs/1703.00573

  • No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis. ICML 2017. https://arxiv.org/abs/1704.00708

  • How to Escape Saddle Points Efficiently. ICML 2017. https://arxiv.org/abs/1703.00887

  • On the Optimization Landscape of Tensor Decompositions. Best theory paper award at the NIPS 2016 workshop on non-convex optimization. https://sites.google.com/site/nonconvexnips2016/files/Paper8.pdf

  • Matrix Completion has No Spurious Local Minimum. NIPS 2016 best student paper award. http://arxiv.org/abs/1605.07272

  • Provable Algorithms for Inference in Topic Models. ICML 2016. http://arxiv.org/abs/1605.08491

  • Efficient Algorithms for Large-scale Generalized Eigenvector Computation and Canonical Correlation Analysis. ICML 2016. http://arxiv.org/abs/1604.03930

  • Rich Component Analysis. ICML 2016. http://arxiv.org/abs/1507.03867

  • Intersecting Faces: Non-negative Matrix Factorization With New Guarantees. ICML 2015. http://arxiv.org/abs/1507.02189

  • Un-regularizing: approximate proximal point and faster stochastic algorithms for empirical risk minimization. ICML 2015. http://arxiv.org/abs/1506.07512

  • In addition, he has published multiple papers over the years at COLT (Conference on Learning Theory), a top venue for computational learning theory.

Below is the original English announcement from the Duke University website:

March 04, 2019


By Matt Hartman

Rong Ge, 2019 Sloan Research Fellow and Duke University Assistant Professor of Computer Science

Before the promises of artificial intelligence can be realized, the theoretical problems with machine learning algorithms must be solved. Fortunately, Duke University Assistant Professor of Computer Science Rong Ge has been making headway on them. In recognition of that work, he has been awarded a prestigious Sloan Research Fellowship.

Awarded to 126 scholars each year, the Sloan Fellowships provide support to promising early-career scientists and researchers in the United States and Canada. Candidates are nominated by their peers, and the winners are selected by a panel of senior scholars on the basis of their accomplishments and potential to become leaders in their fields.

“Sloan Research Fellows are the best young scientists working today,” says Adam F. Falk, president of the Alfred P. Sloan Foundation, which awards the fellowships. “Sloan Fellows stand out for their creativity, for their hard work, for the importance of the issues they tackle, and for the energy and innovation with which they tackle them. To be a Sloan Fellow is to be in the vanguard of twenty-first century science.”

“The award means a lot to me,” Ge says. “I’m happy that people like the work I’ve been doing. There are a lot of open problems in it still, and I’m just hoping to continue working on them.”

Ge’s research focuses on how “neural networks” are trained. These networks are essential to machine learning; they are what allow machines to make decisions about new cases without human input. Facial recognition technology is one example. In order to determine whether a photo includes a human face, much less identify whose face it is, the machine needs a framework for analyzing the photo. The neural network provides that framework.

But before that can happen, the neural network must be uncovered. Which way of organizing the network, from all the countless possibilities, will allow the machine to generate the desired prediction about a new case? In facial recognition, the machine could take into account the colors of the photo, the size and shape of the photographed objects, the angle the photo was taken, and more. To guide the machine, computer scientists typically use manually coded examples, providing millions of photos labeled with specific information (like “includes a human face” and “does not include a human face”). As the machine sees more examples, it can create a more accurate neural network.
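
For a rough sense of what learning from labeled examples looks like in code, here is a hypothetical toy sketch (assuming numpy, and vastly simplified compared to any real face-recognition system) that fits a tiny two-layer network to four labeled points by gradient descent:

```python
# A toy sketch (assumes numpy; vastly simplified compared to real
# face recognition): fit a tiny two-layer network to labeled examples.
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([0.0, 1.0, 1.0, 0.0])                           # labels (XOR)

W1 = rng.normal(size=(2, 4))   # first-layer weights
W2 = rng.normal(size=4)        # second-layer weights

for step in range(5000):
    h = np.tanh(X @ W1)        # hidden representation
    pred = h @ W2              # network output
    err = pred - y             # prediction error
    # Gradients of the mean squared error, computed by backpropagation.
    gW2 = h.T @ err / len(y)
    gW1 = X.T @ (np.outer(err, W2) * (1 - h**2)) / len(y)
    W2 -= 0.5 * gW2
    W1 -= 0.5 * gW1

print(np.round(np.tanh(X @ W1) @ W2, 3))  # should approach [0, 1, 1, 0]
```

Even this four-example problem is already a non-convex optimization over the weights W1 and W2, the same kind of problem Ge's theory addresses at scale.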

Finding the right parameters for the network so that it makes the best prediction is an optimization problem. “You want to find the best set of parameters given the data you have,” Ge says. He explains that most research on the issue focuses on a special kind of optimization problem called convex optimization. But machine learning is a non-convex optimization problem, which is more complicated because there can be more than one optimal solution.

In practice, however, very simple algorithms can solve these very complex problems. Ge’s work is focused on understanding why that is the case. Summarizing his work, Ge asks, “Why do we have a problem that is theoretically very difficult that we should not be able to solve, but in practice is solved by a very simple algorithm?”

While he has not yet discovered a complete solution to the dilemma, Ge has found why it works in some cases, like Amazon or Netflix recommendations. In those cases, even though the optimization problems are non-convex, all of the different optimal solutions are equally good, so it doesn’t matter which you find.
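
The recommendation setting corresponds to matrix completion, the subject of Ge's "Matrix Completion has No Spurious Local Minimum" paper. Here is a minimal sketch of that phenomenon on synthetic data (an illustrative toy assuming numpy, not code from the paper): gradient descent on the non-convex objective f(U) = ||P_Omega(U U^T - M)||_F^2, started from several random initializations, reaches different factorizations with equally good objective values.

```python
# A toy sketch (assumes numpy; synthetic data, not code from the paper):
# recover a rank-2 matrix M from about half of its entries by running
# gradient descent on the loss f(U) = ||P_Omega(U U^T - M)||_F^2.
import numpy as np

rng = np.random.default_rng(0)
n, r = 20, 2
U_true = rng.normal(size=(n, r))
M = U_true @ U_true.T                # ground-truth low-rank matrix
mask = rng.random((n, n)) < 0.5      # Omega: which entries are observed
mask = mask | mask.T                 # keep the observation pattern symmetric

def loss(U):
    return np.sum((mask * (U @ U.T - M))**2)

for trial in range(3):
    U = rng.normal(size=(n, r))      # a fresh random initialization
    for _ in range(3000):
        G = 4 * (mask * (U @ U.T - M)) @ U   # gradient of f at U
        U -= 0.005 * G
    print(f"trial {trial}: final loss = {loss(U):.3e}")
# Each trial ends at a different U (the factorization is only unique up
# to rotation), yet the loss drops to near zero every time: all local
# minima of this problem are global, so any of them will do.
```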

Going forward, Ge’s work will focus on expanding our understanding of these problems. Sloan Fellows receive a two-year, $70,000 award to fund their research, which Ge says he will use to hire a postdoctoral researcher to help apply his findings to more complicated forms of machine learning.

“We will first try to understand why the current algorithms are working so well, and then hopefully we can design new algorithms that work even better,” he says.

If he succeeds, he will help open the pathway to even more advanced kinds of artificial intelligence technology, like self-driving cars and personalized medicine.

References

AI 科技评论 (AI Tech Review) WeChat official account article

Duke University Official News

Reposted from blog.csdn.net/zbbmm/article/details/88418625