Exploiting Subword Permutations to Maximize CNN Compute Performance and Efficiency

M. Beyer, S. Gesper, A. Guntoro, G. Payá-Vayá, and H. Blume.
2023 IEEE 34th International Conference on Application-specific Systems, Architectures and Processors (ASAP), pp. 61-68 (July 2023).
DOI: 10.1109/ASAP57973.2023.00023

Abstract

Neural networks (NNs) are quantized to decrease their computational demands and reduce their memory footprint. However, taking advantage of such optimizations requires specialized hardware that supports low-bit-width computations. In this work, we propose permutations on subword level that build on top of multi-bit-width multiply-accumulate operations to effectively support low-bit-width computations of quantized NNs. By applying this technique, we extend data reuse and further improve compute performance for convolution operations compared to simple vectorization using SIMD (single instruction, multiple data). We perform a design space exploration using cycle-accurate simulation with MobileNet and VGG16 on a vector-based processor. The results show a speedup of up to $3.7\times$ and a reduction of up to $1.9\times$ in required data transfers. Additionally, the control overhead for orchestrating the computation is decreased by up to $3.9\times$.
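The abstract does not spell out the permutation scheme itself. As a purely illustrative sketch, assuming a hypothetical 32-bit word that holds four unsigned 8-bit subwords, the Python below shows one generic way a subword permutation can raise data reuse in a 1D convolution: each input word is loaded once, and the shifted window that each kernel tap needs is produced by selecting subwords from two already-loaded words rather than by issuing a new unaligned load. The names `pack`, `unpack`, `permute_shift`, `subword_mac`, `conv1d_lanes`, and the `LANES`/`BITS` parameters are hypothetical and are not the authors' ISA or implementation.

```python
from typing import List

# Hypothetical packing: four unsigned 8-bit subwords per 32-bit word (assumption).
LANES = 4
BITS = 8
MASK = (1 << BITS) - 1
WORD_MASK = (1 << (LANES * BITS)) - 1


def pack(values: List[int]) -> int:
    """Pack LANES unsigned 8-bit values into one word, lane 0 in the low bits."""
    assert len(values) == LANES
    word = 0
    for lane, v in enumerate(values):
        word |= (v & MASK) << (lane * BITS)
    return word


def unpack(word: int) -> List[int]:
    """Split a packed word back into its LANES subwords."""
    return [(word >> (lane * BITS)) & MASK for lane in range(LANES)]


def permute_shift(lo: int, hi: int, k: int) -> int:
    """Subword permutation: take subwords k .. k+LANES-1 of the pair hi:lo.

    This is a funnel shift by k subwords; it reuses the two loaded words
    instead of fetching the shifted window from memory again.
    """
    combined = (hi << (LANES * BITS)) | lo
    return (combined >> (k * BITS)) & WORD_MASK


def subword_mac(acc: List[int], x_word: int, w: int) -> List[int]:
    """Lane-wise multiply-accumulate of each packed subword with a scalar weight."""
    return [a + xv * w for a, xv in zip(acc, unpack(x_word))]


def conv1d_lanes(x: List[int], w: List[int]) -> List[int]:
    """Compute y[j] = sum_k w[k] * x[j + k] for j = 0..LANES-1 from two loads."""
    assert len(x) == 2 * LANES and 1 <= len(w) <= LANES + 1
    lo = pack(x[:LANES])  # single load of x[0:4]
    hi = pack(x[LANES:])  # single load of x[4:8]
    acc = [0] * LANES
    for k, wk in enumerate(w):
        window = permute_shift(lo, hi, k)  # lanes now hold x[k] .. x[k+3]
        acc = subword_mac(acc, window, wk)
    return acc


def conv1d_reference(x: List[int], w: List[int]) -> List[int]:
    """Direct nested-loop convolution used to check the packed version."""
    return [sum(w[k] * x[j + k] for k in range(len(w))) for j in range(LANES)]


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    w = [1, 2, 3]
    print(conv1d_lanes(x, w))  # [14, 20, 26, 32]
```

On hardware with packed multi-bit-width MAC instructions, each `subword_mac` call would map to a single vector operation; here it is only emulated with Python integers. For the kernel `[1, 2, 3]` over the inputs `1..8`, the packed version matches the direct convolution, `[14, 20, 26, 32]`, while reading the input from memory only twice.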

BibTeX key: 10265694
entry type: inproceedings
booktitle: 2023 IEEE 34th International Conference on Application-specific Systems, Architectures and Processors (ASAP)
year: 2023
month: July
pages: 61-68
issn: 2160-052X
DOI: 10.1109/ASAP57973.2023.00023

Tags

myown

