References
[1] TURING A M. Computing machinery and intelligence[J]. Mind, 1950, 59(236): 433-460.
[2] SHANNON C E. A Mathematical Theory of Communication[J]. The Bell System Technical Journal, 1948, 27(3): 379-423.
[3] ZHU Y, KIROS R, ZEMEL R, et al. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 19-27.
[4] MASTERS D, LUSCHI C. Revisiting Small Batch Training for Deep Neural Networks[J]. arXiv preprint arXiv:1804.07612, 2018.
[5] CARLINI N, TRAMER F, WALLACE E, et al. Extracting Training Data from Large Language Models[J]. arXiv preprint arXiv:2012.07805, 2020.
[6] LUO R, XU J, ZHANG Y, et al. PKUSEG: A Toolkit for Multi-Domain Chinese Word Segmentation[J]. arXiv preprint arXiv:1906.11455, 2019.
[7] VASWANI A, SHAZEER N, PARMAR N, et al. Attention Is All You Need[J]. arXiv preprint arXiv:1706.03762, 2017.
[8] KINGMA D P, BA J. Adam: A Method for Stochastic Optimization[J]. arXiv preprint arXiv:1412.6980, 2014.
[9] LOSHCHILOV I, HUTTER F. Decoupled Weight Decay Regularization[J]. arXiv preprint arXiv:1711.05101, 2017.
[10] GAO J, LI M, HUANG C N, et al. Chinese Word Segmentation and Named Entity Recognition: A Pragmatic Approach[J]. Computational Linguistics, 2005, 31(4): 531-574.
[11] BENGIO Y, DUCHARME R, VINCENT P, et al. A Neural Probabilistic Language Model[J]. Journal of Machine Learning Research, 2003, 3: 1137-1155.
[12] MIKOLOV T, CHEN K, CORRADO G, et al. Efficient Estimation of Word Representations in Vector Space[J]. arXiv preprint arXiv:1301.3781, 2013.
[13] GRAVES A. Generating Sequences With Recurrent Neural Networks[J]. arXiv preprint arXiv:1308.0850, 2013.
[14] CHO K, MERRIENBOER B V, GULCEHRE C, et al. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation[J]. arXiv preprint arXiv:1406.1078, 2014.
[15] SUTSKEVER I, VINYALS O, LE Q V. Sequence to Sequence Learning with Neural Networks[J]. Advances in Neural Information Processing Systems, 2014, 27: 3104-3112.
[16] ITTI L, KOCH C, NIEBUR E. A Model of Saliency-Based Visual Attention for Rapid Scene Analysis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, 20(11): 1254-1259.