This is the PhD thesis of Ilya Sutskever at the University of Toronto, Canada (101 pages).

Recurrent Neural Networks (RNNs) are powerful sequence models that were believed to be difficult to train, and as a result they were rarely used in machine learning applications. This thesis presents methods that overcome the difficulty of training RNNs, and applications of RNNs to challenging problems. We first describe a new probabilistic sequence model that combines Restricted Boltzmann Machines and RNNs. The new model is more powerful than similar models while being less difficult to train. Next, we present a new variant of the Hessian-free (HF) optimizer and show that it can train RNNs on tasks that have extreme long-range temporal dependencies, which were previously considered to be impossibly hard. We then apply HF to character-level language modelling and get excellent results. We also apply HF to optimal control and obtain RNN control laws that can successfully operate under conditions of delayed feedback and unknown disturbances. Finally, we describe a random parameter initialization scheme that allows gradient descent with momentum to train RNNs on problems with long-term dependencies. This directly contradicts widespread beliefs about the inability of first-order methods to do so, and suggests that previous attempts at training RNNs failed partly due to flaws in the random initialization.
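The abstract's last result concerns gradient descent with momentum. The snippet below is a minimal sketch, not code from the thesis. It assumes the standard textbook update rules and compares classical momentum with Nesterov's accelerated gradient on a toy, badly conditioned quadratic. The objective, the step size `lr`, the momentum coefficient `mu`, and the helper names `f`, `grad` and `momentum_descent` are hypothetical choices for illustration. The thesis's random initialization scheme and its RNN experiments are not reproduced here.

```python
# Minimal sketch (hypothetical helpers, not the thesis's code): classical
# momentum (CM) and Nesterov's accelerated gradient (NAG) on a toy quadratic.

def f(theta):
    # Toy objective f(x, y) = 0.5 * (x^2 + 100 * y^2); minimum at (0, 0).
    x, y = theta
    return 0.5 * (x * x + 100.0 * y * y)

def grad(theta):
    # Exact gradient of the toy objective above.
    x, y = theta
    return [x, 100.0 * y]

def momentum_descent(theta, lr=0.005, mu=0.9, steps=300, nesterov=False):
    """Gradient descent with momentum (mu=0.0 gives plain gradient descent).

    CM:  v <- mu * v - lr * grad(theta);           theta <- theta + v
    NAG: v <- mu * v - lr * grad(theta + mu * v);  theta <- theta + v
    """
    theta = list(theta)
    v = [0.0] * len(theta)
    for _ in range(steps):
        if nesterov:
            # NAG evaluates the gradient at the look-ahead point theta + mu * v.
            point = [t + mu * vi for t, vi in zip(theta, v)]
        else:
            point = theta
        g = grad(point)
        v = [mu * vi - lr * gi for vi, gi in zip(v, g)]
        theta = [t + vi for t, vi in zip(theta, v)]
    return theta

if __name__ == "__main__":
    start = [10.0, 1.0]
    for name, nag in [("classical momentum", False), ("Nesterov", True)]:
        print(name, f(momentum_descent(start, nesterov=nag)))
    print("plain gradient descent", f(momentum_descent(start, mu=0.0)))
```

Setting `mu=0.0` recovers plain gradient descent. With the same step size, it makes much slower progress along the low-curvature `x` direction, because its step size is limited by the high-curvature `y` direction. Momentum is meant to help with exactly this kind of ill-conditioning. How well it works on real RNNs also depends on the initialization, which is the point the abstract makes.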

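1 Introduction
2 Background
3 The Recurrent Temporal Restricted Boltzmann Machine
4 Training RNNs with Hessian-Free Optimization
5 Language Modelling with RNNs
6 Learning Control Laws with RNNs
7 Momentum Methods for Well-Initialized RNNs
8 Conclusions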

Download the original English thesis:

http://page5.dfpan.com/fs/3l9caj5242711269163/