Autoregressive transformers are spectacular models for short sequences but
scale poorly to long sequences such as high-resolution images, podcasts, code,
or books. We propose Megabyte, a multi-scale decoder architecture that enables
end-to-end differentiable modeling of sequences of over one million bytes.
Megabyte segments sequences into patches and uses a local submodel within
patches and a global model between patches. This enables sub-quadratic
self-attention, much larger feedforward layers for the same compute, and
improved parallelism during decoding, unlocking better performance at reduced
cost for both training and generation. Extensive experiments show that Megabyte
allows byte-level models to perform competitively with subword models on
long-context language modeling, achieve state-of-the-art density estimation on
ImageNet, and model audio from raw files. Together, these results establish the
viability of tokenization-free autoregressive sequence modeling at scale.
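
To make the architecture above concrete, the following is a minimal PyTorch
sketch of a Megabyte-style decoder, not the authors' implementation: the class
name MegabyteSketch, the patch size, the layer counts, and all dimensions are
illustrative assumptions. It shows the structure the abstract describes: a
global transformer attends causally across the T/P patch embeddings, and a
small local transformer decodes the P bytes inside each patch, so
self-attention cost falls from O(T^2) to roughly O((T/P)^2) for the global
model plus O(T*P) for the locals.

import torch
import torch.nn as nn

def causal_mask(n):
    # Boolean mask: True above the diagonal blocks attention to future positions.
    return torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)

class MegabyteSketch(nn.Module):
    # Hypothetical Megabyte-style multiscale decoder; all sizes are assumptions.
    def __init__(self, vocab=256, patch=8, d_local=128, d_global=512):
        super().__init__()
        self.patch = patch
        self.embed = nn.Embedding(vocab, d_local)
        # Global model runs over one embedding per patch (bytes concatenated).
        self.to_global = nn.Linear(patch * d_local, d_global)
        g_layer = nn.TransformerEncoderLayer(d_global, nhead=8, batch_first=True)
        self.global_model = nn.TransformerEncoder(g_layer, num_layers=4)
        # Local model decodes the bytes of a single patch, conditioned on
        # the global representation of the preceding patches.
        self.from_global = nn.Linear(d_global, patch * d_local)
        l_layer = nn.TransformerEncoderLayer(d_local, nhead=4, batch_first=True)
        self.local_model = nn.TransformerEncoder(l_layer, num_layers=2)
        self.head = nn.Linear(d_local, vocab)

    def forward(self, x):  # x: (B, T) byte ids, T divisible by patch
        B, T = x.shape
        K, P = T // self.patch, self.patch
        h = self.embed(x).view(B, K, P, -1)              # bytes grouped by patch
        g = self.to_global(h.reshape(B, K, P * h.size(-1)))
        g = self.global_model(g, mask=causal_mask(K))    # (T/P)^2 attention
        # Shift right one patch so patch k sees only patches < k.
        g = torch.cat([torch.zeros_like(g[:, :1]), g[:, :-1]], dim=1)
        l = h + self.from_global(g).view(B, K, P, -1)
        l = self.local_model(l.reshape(B * K, P, -1),    # (T/P) * P^2 attention
                             mask=causal_mask(P))
        # Logits at position i score byte i+1; train with shifted targets.
        return self.head(l).view(B, T, -1)

model = MegabyteSketch()
logits = model(torch.randint(0, 256, (2, 64)))           # -> (2, 64, 256)

A real implementation would also shift the local inputs so the first byte of
each patch is predicted from a start-of-patch embedding; the sketch elides
that detail.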
Description
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
%0 Generic
%1 yu2023megabyte
%A Yu, Lili
%A Simig, Dániel
%A Flaherty, Colin
%A Aghajanyan, Armen
%A Zettlemoyer, Luke
%A Lewis, Mike
%D 2023
%K attention
%T MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
%U http://arxiv.org/abs/2305.07185
@misc{yu2023megabyte,
author = {Yu, Lili and Simig, Dániel and Flaherty, Colin and Aghajanyan, Armen and Zettlemoyer, Luke and Lewis, Mike},
keywords = {attention},
note = {arXiv:2305.07185},
title = {MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers},
url = {http://arxiv.org/abs/2305.07185},
year = 2023
}