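I've been studying this for about two and a half days now. I have the overall structure more or less figured out, I've run the example from the Theano LSTM tutorial, and I'm currently reading through the code.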
1. Concepts:
Long short-term memory (LSTM) is a recurrent neural network (RNN) architecture (an artificial neural network) published [1] in 1997 by Sepp Hochreiter and Jürgen Schmidhuber. Like most RNNs, an LSTM network is universal in the sense that given enough network units it can compute anything a conventional computer can compute, provided it has the proper weight matrix, which may be viewed as its program. Unlike traditional RNNs, an LSTM network is well-suited to learn from experience to classify, process and predict time series when there are very long time lags of unknown size between important events. This is one of the main reasons why LSTM outperforms alternative RNNs and Hidden Markov Models and other sequence learning methods in numerous applications.
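3.1 Feed-forward neural networks vs. recurrent (feedback) neural networks
In deep learning, the traditional feed-forward neural network (FNN) has performed remarkably well and achieved many successes. It has set records on a number of different tasks, including handwritten digit recognition and object classification. Even today, FNNs remain slightly ahead of other methods on classification tasks.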
3.2 CNN vs RNN
"Convolutional Networks for Images, Speech, and Time Series", by Yann LeCun & Yoshua Bengio
While characters or short spoken words can be size-normalized and fed to a fixed-size network, more complex objects such as written or spoken words and sentences have inherently variable size. One way of handling such a composite object is to segment it heuristically into simpler objects that can be recognized individually, e.g. characters, phonemes. However, reliable segmentation heuristics do not exist for speech or cursive handwriting. A brute-force solution.....
Task 1 - Sentiment analysis: You're given some review, and you want to predict the rating of the review.
Task 2 - Machine translation: Translate a sentence from some source language to target language.
Now, the basic difference in terms of applicability of conv-net and RNN is that conv-nets (like most other machine learning algorithms) take a fixed-size input and generate fixed-size outputs. RNN, on the other hand, can handle arbitrary input/output lengths, but would typically require much more data compared to conv-nets because it is a more complex model.
Using this insight, we see that task 2 cannot be performed by conv-nets, since inputs and outputs are not fixed-length. So RNNs for task 2.
For task 1, however, you can use RNN if you have a lot of data. But you can also use conv-nets - fix the length of the input, and adjust the input length by truncating or padding the actual input. Note that this will not affect the sentiment of the review much, so this is a reasonable approach. And since it's a 1D convolution, which is typically used on sequences, it is called temporal convolution. Conceptually, it is similar to 2D spatial convolution.
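To make the "truncate or pad, then apply a 1D convolution" idea concrete, here is a minimal pure-Python sketch. Everything in it is my own illustration rather than anything from the quoted answer: the helper names `pad_or_truncate` and `temporal_conv1d`, the integer token ids, the fixed length of 5, and the filter weights are all assumptions. A real model would convolve over learned word embeddings with learned filters, not over raw ids.

```python
def pad_or_truncate(tokens, length, pad_value=0):
    """Return exactly `length` items: cut long inputs, pad short ones."""
    clipped = list(tokens[:length])
    return clipped + [pad_value] * (length - len(clipped))


def temporal_conv1d(signal, kernel):
    """'Valid' 1D convolution along the sequence axis.

    Implemented as cross-correlation (no kernel flip), which is the
    usual convention in deep learning libraries.
    """
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


# Two "reviews" of different lengths, already mapped to hypothetical ids.
short_review = [4, 7, 1]
long_review = [3, 9, 2, 8, 5, 6, 1]

FIXED_LEN = 5  # assumed fixed input size for the conv-net
batch = [pad_or_truncate(r, FIXED_LEN) for r in (short_review, long_review)]

kernel = [1.0, 0.0, -1.0]  # illustrative weights, not learned
features = [temporal_conv1d(row, kernel) for row in batch]

# Both rows now have the same number of outputs:
# FIXED_LEN - len(kernel) + 1 == 3
print(batch)
print(features)
```

The point is only that after padding or truncation every input has the same shape, so a fixed-size network can consume it; the filter slides along the time axis in the same way a 2D filter slides over an image.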
3.3 LSTM vs (traditional) RNNs
1. Alex Graves. "Supervised Sequence Labelling with Recurrent Neural Networks". Textbook, Studies in Computational Intelligence, Springer, 2012.
“Long Short-term Memory (LSTM) is an RNN architecture designed to be better at storing and accessing information than standard RNNs. LSTM has recently given state-of-the-art results in a variety of sequence processing tasks, including speech and handwriting recognition.”
2. The review article "Deep Learning", co-authored by Yann LeCun, Yoshua Bengio and Geoffrey Hinton
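To address this problem (the difficulty standard RNNs have in storing information over long periods), one idea is to augment the network with an explicit memory. The first proposal of this kind was the LSTM (long short-term memory) network, which uses special hidden units whose natural behavior is to remember inputs for a long time. A special unit called the memory cell acts like an accumulator or a gated neuron: it has a connection to itself at the next time step with a weight of one, so it copies its own real-valued state and accumulates the external signal. This self-connection, however, is multiplicatively gated by another unit that learns to decide when to clear the content of the memory.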
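The memory cell described above can be sketched in a few lines of Python. This is a simplified scalar version under my own assumptions, not the formulation from the review article or from the Theano tutorial: the function name `memory_cell_step` is hypothetical, and the gate pre-activations are passed in directly, whereas in a real LSTM they are computed from the current input and the previous hidden state through learned weights.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def memory_cell_step(c_prev, x, forget_logit, input_logit):
    """One time step of a scalar memory cell (illustrative only).

    c_prev       : cell state carried over from the previous step
    x            : external signal arriving at this step
    forget_logit : pre-activation of the gate on the self-connection
    input_logit  : pre-activation of the gate on the external signal
    """
    f = sigmoid(forget_logit)  # close to 1 keeps the memory, close to 0 clears it
    i = sigmoid(input_logit)   # how much of the new signal is accumulated
    return f * c_prev + i * math.tanh(x)


# Accumulate three identical signals with the self-connection held open.
c = 0.0
for x in [1.0, 1.0, 1.0]:
    c = memory_cell_step(c, x, forget_logit=10.0, input_logit=10.0)
kept = c

# Now close the gate on the self-connection: the stored value is wiped.
cleared = memory_cell_step(kept, 0.0, forget_logit=-10.0, input_logit=10.0)

print(kept, cleared)
```

With the gate open the cell behaves like the accumulator in the text (the state grows by roughly tanh(1) per step); with the gate closed the state drops to almost zero, which is the "decide when to clear the memory" behavior.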
Model / Task | classification | sentiment analysis | machine translation | dialog | language generation | QA | total |
Google Scholar search results, 2006 onward | | | | | | | |
LSTM | 1900 | 148 | 616 | 373 | 27 | 59 | 3690 |
CNN | 5060 | 179 | 247 | 304 | 30 | 100 | 5670 |
Web of Science topic search (all years) | | | | | | | |
LSTM | 56 | 0 | 1 | 0 | 6 | 2 | 248 |
CNN | 373 | 2 | 13 | 0 | 25 | 2 | 1064 |
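The search data has its limitations. In particular, the articles indexed in the WOS (Web of Science) database probably represent only a subset of higher-quality papers, and publication counts do not map exactly onto research effort. Still, the numbers do give some indication of the trend.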
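LSTM is an excellent variant of the RNN. It inherits most of the properties of the RNN model while addressing the vanishing gradient problem, which arises because the gradient shrinks step by step as it is back-propagated through time. For language processing in particular, LSTM is well suited to problems that depend heavily on sequence order, such as machine translation, dialog generation, and encoding/decoding.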
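A rough numeric sketch of why the gradient vanishes, and why a gated memory cell helps. The setup is entirely my own simplification: a scalar RNN h_t = tanh(w * h_{t-1} + x_t) with an assumed weight w = 0.9, compared against the direct cell-to-cell path c_t = f_t * c_{t-1} + (new input) with an assumed constant gate value of 0.99. It ignores the fact that in a full LSTM the gates themselves depend on the hidden state.

```python
import math


def rnn_gradient(w, inputs, h0=0.0):
    """d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1} + x_t)."""
    h, grad = h0, 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)  # chain rule: one factor per time step
    return grad


def cell_gradient(forget_gates):
    """d c_T / d c_0 along the direct path c_t = f_t * c_{t-1} + (new input)."""
    grad = 1.0
    for f in forget_gates:
        grad *= f
    return grad


T = 100  # number of time steps to back-propagate through
rnn_grad = rnn_gradient(w=0.9, inputs=[0.1] * T)
lstm_grad = cell_gradient([0.99] * T)

print(f"plain RNN: {rnn_grad:.3e}   gated cell: {lstm_grad:.3e}")
```

Each factor in the plain RNN is at most 0.9 in magnitude, so after 100 steps the product is bounded by 0.9^100 and is effectively zero. Along the cell path the product is 0.99^100, about 0.37, so an error signal from 100 steps back still carries usable information.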
