Overview
1.DBN was used for ASR
[1]A. Mohamed, G. Dahl, and G. Hinton, “Deep belief networks for phone recognition,” in NIPS Workshop on Deep Learning for Speech Recognition and Related Applications, 2009, vol. 1, no. 9, p. 39.
DNN was first introduced into LVCSR (large vocabulary continuous speech recognition)
[1]G. E. Dahl et al., “Large vocabulary continuous speech recognition with context-dependent DBN-HMMs,” IEEE, 2011.
[2]G. E. Dahl, D. Yu, L. Deng, and A. Acero, “Context-dependent pretrained deep neural networks for large-vocabulary speech recognition,” IEEE Transactions on Audio, Speech and Language Processing, vol. 20, no. 1, pp. 30–42, 2012.
2.End-to-end deep learning models were first proposed
[3]Alex Graves and Navdeep Jaitly, “Towards end-to-end speech recognition with recurrent neural networks,” in International Conference on Machine Learning, 2014, pp. 1764–1772.
These models were later significantly improved
[4]Dario Amodei, Sundaram Ananthanarayanan, Rishita Anubhai, Jingliang Bai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Qiang Cheng, Guoliang Chen, et al., “Deep speech 2: End-to-end speech recognition in English and Mandarin,” in International Conference on Machine Learning, 2016, pp. 173–182.
3.CTC
(1)The connectionist temporal classification (CTC) loss function was proposed
[5]Alex Graves and Faustino Gomez, “Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks,” in International Conference on Machine Learning, 2006, pp. 369–376.
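CTC scores a transcript by summing the probability of every frame-level path that collapses to it (merge repeated symbols, then drop blanks). A minimal sketch of that sum using the forward (alpha) recursion, in plain Python and without the log-space scaling a real implementation needs for long inputs; the blank index 0 and the toy probabilities are assumptions for illustration. The CTC loss is the negative log of the returned probability.

```python
import math


def ctc_forward_prob(probs, labels, blank=0):
    """Return p(labels | x) under CTC via the forward recursion.

    probs:  T x K list of per-frame softmax outputs (each row sums to 1).
    labels: target label indices, without blanks.
    blank:  index of the blank symbol (assumed to be 0 in this sketch).
    """
    # Extended target: blank, l1, blank, l2, ..., blank (length 2L + 1).
    ext = [blank]
    for lab in labels:
        ext += [lab, blank]
    S, T = len(ext), len(probs)

    # alpha[s]: total probability of all path prefixes ending at ext[s].
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s]              # stay on the same symbol
            if s >= 1:
                a += alpha[s - 1]     # move forward by one symbol
            # Skip a blank, allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]
            new[s] = a * probs[t][ext[s]]
        alpha = new

    # A valid path ends on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)


def ctc_loss(probs, labels, blank=0):
    """Negative log-likelihood of the target label sequence."""
    return -math.log(ctc_forward_prob(probs, labels, blank))


if __name__ == "__main__":
    # Hypothetical 3-frame output over {blank, 1, 2}.
    probs = [[0.5, 0.4, 0.1], [0.3, 0.3, 0.4], [0.6, 0.2, 0.2]]
    print(ctc_loss(probs, [1, 2]))
```

The recursion visits each (frame, symbol) pair once, so it costs O(T·L) instead of enumerating the exponentially many paths.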
(2)CTC began to be applied to end-to-end speech recognition
[3]Alex Graves and Navdeep Jaitly, “Towards end-to-end speech recognition with recurrent neural networks,” in International Conference on Machine Learning, 2014, pp. 1764–1772.
[6]Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, and Adam Coates, “Deep speech: Scaling up end-to-end speech recognition,” Computer Science, 2014.
[7]Andrew Maas, Ziang Xie, Dan Jurafsky, and Andrew Ng, “Lexicon-free conversational speech recognition with neural networks,” in Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2015, pp. 345–354.
(3)A language model (LM) rescoring mechanism was introduced for CTC decoding
[6]Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, and Adam Coates, “Deep speech: Scaling up end-to-end speech recognition,” Computer Science, 2014.
(4)An alternative, lattice rescoring with a trained recurrent neural network LM (RNN-LM), was proposed
[8]X. Liu, Y. Wang, X. Chen, M. J. F. Gales, and P. C. Woodland, “Efficient lattice rescoring using recurrent neural network language models,” in IEEE International Conference on Acoustics, Speech and Signal Processing, 2014, pp. 4908–4912.
(5)Tuned weights balancing the acoustic model, the language model and the sentence length were applied during decoding to mitigate the weaknesses of raw CTC output
[4]Dario Amodei, Sundaram Ananthanarayanan, Rishita Anubhai, Jingliang Bai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Qiang Cheng, Guoliang Chen, et al., “Deep speech 2: End-to-end speech recognition in English and Mandarin,” in International Conference on Machine Learning, 2016, pp. 173–182.
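In Deep Speech 2 the tuned weights enter a combined decoding score, Q(y) = log p_ctc(y|x) + α log p_lm(y) + β wc(y), where wc(y) is the number of words in transcript y; the paper maximizes it inside beam search. The sketch below applies the same score to rerank a fixed n-best list instead, which is a simplification. The hypotheses, their log-probabilities and the α, β values are all made-up assumptions; in practice α and β are tuned on a development set.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    log_p_ctc: float  # acoustic (CTC) log-probability, log p_ctc(y|x)
    log_p_lm: float   # language-model log-probability, log p_lm(y)


def score(h: Hypothesis, alpha: float, beta: float) -> float:
    """Q(y) = log p_ctc(y|x) + alpha * log p_lm(y) + beta * wc(y)."""
    word_count = len(h.text.split())
    return h.log_p_ctc + alpha * h.log_p_lm + beta * word_count


def rescore(nbest, alpha, beta):
    """Return the n-best list sorted from best to worst combined score."""
    return sorted(nbest, key=lambda h: score(h, alpha, beta), reverse=True)


if __name__ == "__main__":
    # Hypothetical n-best list; the scores are invented for illustration.
    nbest = [
        Hypothesis("i red a book", -4.0, -14.0),
        Hypothesis("i read a book", -4.5, -9.0),
        Hypothesis("i read book", -4.2, -10.0),
    ]
    print(rescore(nbest, alpha=0.0, beta=0.0)[0].text)  # → i red a book
    print(rescore(nbest, alpha=0.5, beta=1.0)[0].text)  # → i read a book
```

The β term rewards longer transcripts, offsetting the language model's tendency to favour shorter outputs (every extra word lowers log p_lm).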
4.Attention-based
(0)Attention mechanisms had already shown good performance in handwriting synthesis
[1]A. Graves, “Generating sequences with recurrent neural networks,” arXiv preprint arXiv:1308.0850, Aug. 2013.
machine translation
[2]D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” in Proc. of the 3rd ICLR, 2015, arXiv:1409.0473.
image caption generation
[3]K. Xu, J. Ba, R. Kiros, et al., “Show, attend and tell: Neural image caption generation with visual attention,” in Proc. of the 32nd ICML, 2015, arXiv:1502.03044.
visual object classification
[4]V. Mnih, N. Heess, A. Graves, et al., “Recurrent models of visual attention,” in Proc. of the 27th NIPS, 2014, arXiv:1406.6247.
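What these attention models share is a soft alignment: at each output step the decoder scores every encoder state, normalizes the scores with a softmax, and uses the weighted sum as its context vector. A minimal sketch of the additive scoring from Bahdanau et al., e_i = vᵀ tanh(W s + U h_i), in plain Python; the weight matrices and toy vectors are arbitrary assumptions, not trained parameters.

```python
import math


def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def additive_attention(s, H, W, U, v):
    """Additive (Bahdanau-style) attention.

    s: decoder state (length d_s); H: encoder states (each length d_h)
    W: d_a x d_s matrix, U: d_a x d_h matrix, v: vector of length d_a
    Returns (weights over H, context vector of length d_h).
    """
    Ws = matvec(W, s)  # decoder term, shared by every encoder position
    scores = []
    for h in H:
        hidden = [math.tanh(a + b) for a, b in zip(Ws, matvec(U, h))]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    # Context = attention-weighted average of the encoder states.
    context = [sum(w * h[j] for w, h in zip(weights, H)) for j in range(len(H[0]))]
    return weights, context


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    U = [[0.5, -0.2], [0.1, 0.3]]
    v = [1.0, -1.0]
    weights, context = additive_attention([0.2, -0.4], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], W, U, v)
    print(weights, context)
```

Chorowski et al. extend this score with location features computed from the previous step's alignment, which suits the monotonic alignments of speech; that extension is not shown here.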
(1)In June 2015, an attention-based seq2seq system was first introduced into English speech recognition
[1]Jan K Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and Yoshua Bengio, “Attention-based models for speech recognition,” in Advances in neural information processing systems, 2015, pp. 577–585.
(2)In March 2016, LAS (Listen, Attend and Spell) was evaluated on a large-scale speech task
[2]William Chan, Navdeep Jaitly, Quoc Le, and Oriol Vinyals, “Listen, attend and spell: A neural network for large vocabulary conversational speech recognition,” in International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016, pp. 4960–4964.
Later shown to outperform a conventional hybrid system
[4]Chung-Cheng Chiu, Tara N Sainath, Yonghui Wu, Rohit Prabhavalkar, Patrick Nguyen, Zhifeng Chen, Anjuli Kannan, Ron J Weiss, Kanishka Rao, Ekaterina Gonina, et al., “State-of-the-art speech recognition with sequence-to-sequence models,” in International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018, pp. 4774–4778.
5.Transformer
(1)In June 2017, the Transformer was proposed for machine translation (MT)
[4]Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.
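The Transformer replaces recurrence with scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V, stacked into multi-head layers. A minimal sketch of that single operation in plain Python; it deliberately omits the learned Q/K/V projections, multiple heads, masking and positional encodings, and the toy matrices are assumptions for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V on lists of row vectors.

    Q: n_q x d_k, K: n_k x d_k, V: n_k x d_v. Returns n_q x d_v.
    """
    d_k = len(K[0])
    out = []
    for q in Q:
        # One score per key; dividing by sqrt(d_k) keeps the softmax from saturating.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out


if __name__ == "__main__":
    # Self-attention over a hypothetical 3-frame, 2-dimensional feature sequence.
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(scaled_dot_product_attention(X, X, X))
```

Because every output row is computed independently of the others, all positions can be processed in parallel, which is the property Speech-Transformer relies on to drop recurrence.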
(2)In April 2018, the Transformer was first introduced into ASR
[5]Linhao Dong, Shuang Xu, and Bo Xu, “Speech-transformer: A no-recurrence sequence-to-sequence model for speech recognition,” in International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018, pp. 5884–5888.
(3)This model was investigated on a Mandarin Chinese ASR task (the HKUST dataset) with different modeling units, and the character-based model was found to perform best.
In April 2018:
[6]Shiyu Zhou, Linhao Dong, Shuang Xu, and Bo Xu, “Syllable based sequence-to-sequence speech recognition with the transformer in mandarin chinese,” in Proc. Interspeech 2018, 2018, pp. 791–795.
In May 2018:
[7]Shiyu Zhou, Linhao Dong, Shuang Xu, and Bo Xu, “A comparison of modeling units in sequence-to-sequence speech recognition with the transformer on mandarin chinese,” arXiv preprint arXiv:1805.06239, 2018.
(4)In May 2019, a study of a large-scale Mandarin Chinese ASR task with 8,000 hours of data made three improvements to the SpeechTransformer model
[8]Y. Zhao et al., “The SpeechTransformer for large-scale Mandarin Chinese speech recognition,” in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019.
The above is a brief summary, compiled by 受伤云朵, of papers on the development of deep learning models for speech recognition.