Abstract
We consider the important task of effective and efficient semantic image
segmentation. In particular, we adapt a powerful semantic segmentation
architecture, RefineNet, into a more compact one that is suitable even for
tasks requiring real-time performance on high-resolution inputs. To this end,
we identify computationally expensive blocks in the original setup and propose
two modifications aimed at decreasing the number of parameters and floating point
operations. By doing so, we achieve a more than twofold model reduction while
keeping performance almost intact. Our fastest model speeds up significantly,
from 20 FPS to 55 FPS on a generic GPU card with 512x512 inputs, while reaching
a solid 81.1% mean IoU on the test set of PASCAL VOC; our slowest model, at
32 FPS (up from the original 17 FPS), reaches 82.7% mean IoU on the same
dataset. Alternatively, we show that our approach combines easily with
light-weight classification networks: we attain 79.2% mean IoU on PASCAL VOC
using a model that contains only 3.3M parameters and performs only 9.3B
floating point operations.
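
For readers unfamiliar with the accuracy metric quoted above, here is a minimal sketch of how mean intersection-over-union (mean IoU) is commonly computed from a confusion matrix. It is not the evaluation code behind the reported numbers: the function names and the toy labels are hypothetical, and details of the official PASCAL VOC protocol (such as how ignored pixels are handled) are left out.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count (true class, predicted class) pairs over flattened label lists."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix):
    """Average per-class IoU = TP / (TP + FP + FN), skipping absent classes."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                         # true c, predicted other
        fp = sum(matrix[r][c] for r in range(n)) - tp    # predicted c, true other
        denom = tp + fp + fn
        if denom > 0:  # class appears in ground truth or prediction
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with hypothetical per-pixel labels (3 classes, 6 pixels).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(confusion_matrix(y_true, y_pred, 3))
print(f"mean IoU = {score:.3f}")
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. A real evaluation would accumulate the confusion matrix over every pixel of every test image before averaging.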