Author of the publication

LongNet: Scaling Transformers to 1,000,000,000 Tokens

(2023). arXiv:2307.02486. Comment: Work in progress.


Other publications of authors with the same name

Label Embedding Network: Learning Label Representation for Soft Training of Deep Networks. CoRR, (2017)

Lock-Free Parallel Perceptron for Graph-based Dependency Parsing. CoRR, (2017)

A Bilingual Parallel Corpus with Discourse Annotations. CoRR, (2022)

GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator. CoRR, (2022)

DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders. CoRR, (2021)

A Length-Extrapolatable Transformer. CoRR, (2022)

GTrans: Grouping and Fusing Transformer Layers for Neural Machine Translation. IEEE ACM Trans. Audio Speech Lang. Process., (2023)

LongNet: Scaling Transformers to 1,000,000,000 Tokens. (2023). arXiv:2307.02486. Comment: Work in progress.

meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting. ICML, volume 70 of Proceedings of Machine Learning Research, pages 3299-3308. PMLR, (2017)

Multimodal Matching Transformer for Live Commenting. ECAI, volume 325 of Frontiers in Artificial Intelligence and Applications, pages 1998-2005. IOS Press, (2020)