Reversible Vision Transformers

https://arxiv.org/abs/2302.04869v1
https://i.imgur.com/XJlssKp.png

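A memory-efficient architecture design for visual recognition that decouples the GPU memory requirement from the depth of the model. We adapt two models, Vision Transformer and Multiscale Vision Transformers, to reversible variants and benchmark them extensively across model sizes and across image classification, object detection, and video classification. They show up to a 15.5x reduction in memory footprint at roughly identical model complexity, parameter count, and accuracy, at the cost of recomputing activations during the backward pass.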
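The memory saving comes from making each block invertible, so a block's inputs can be recomputed from its outputs instead of being cached for backpropagation. The sketch below illustrates this with a RevNet-style two-stream coupling, Y1 = X1 + F(X2) and Y2 = X2 + G(Y1), under loud assumptions: it is not the authors' implementation, `F` and `G` are hypothetical toy functions standing in for the attention and MLP sub-blocks, and it only shows activation reconstruction, not gradient computation.

```python
import math
from typing import Callable, List, Tuple

Vec = List[float]
Fn = Callable[[Vec], Vec]


def make_layer(i: int) -> Tuple[Fn, Fn]:
    # Hypothetical per-layer sub-blocks: F_i stands in for attention, G_i for the MLP.
    a, b = 0.5 + 0.1 * i, 0.3 + 0.05 * i
    f = lambda x: [math.tanh(a * v) for v in x]
    g = lambda x: [b * math.sin(v) for v in x]
    return f, g


def add(u: Vec, v: Vec) -> Vec:
    return [p + q for p, q in zip(u, v)]


def sub(u: Vec, v: Vec) -> Vec:
    return [p - q for p, q in zip(u, v)]


def rev_block(x1: Vec, x2: Vec, f: Fn, g: Fn) -> Tuple[Vec, Vec]:
    # Two-stream reversible coupling: Y1 = X1 + F(X2), Y2 = X2 + G(Y1).
    y1 = add(x1, f(x2))
    y2 = add(x2, g(y1))
    return y1, y2


def rev_block_inverse(y1: Vec, y2: Vec, f: Fn, g: Fn) -> Tuple[Vec, Vec]:
    # Inverse coupling: recover the block's inputs from its outputs alone.
    x2 = sub(y2, g(y1))
    x1 = sub(y1, f(x2))
    return x1, x2


def forward_stack(x1: Vec, x2: Vec, layers: List[Tuple[Fn, Fn]]) -> Tuple[Vec, Vec]:
    # Only the current pair is kept alive; no per-layer activations are cached,
    # so activation memory does not grow with the number of layers.
    for f, g in layers:
        x1, x2 = rev_block(x1, x2, f, g)
    return x1, x2


def reconstruct_inputs(y1: Vec, y2: Vec, layers: List[Tuple[Fn, Fn]]) -> Tuple[Vec, Vec]:
    # Walk the layers in reverse, recomputing each block's inputs from its outputs.
    # This recomputation is the extra compute traded for the memory saving.
    for f, g in reversed(layers):
        y1, y2 = rev_block_inverse(y1, y2, f, g)
    return y1, y2


if __name__ == "__main__":
    layers = [make_layer(i) for i in range(12)]
    x1, x2 = [0.3, -1.2, 2.0], [1.0, 0.5, -0.7]
    y1, y2 = forward_stack(x1, x2, layers)
    r1, r2 = reconstruct_inputs(y1, y2, layers)
    err = max(abs(a - b) for a, b in zip(r1 + r2, x1 + x2))
    print(f"max reconstruction error: {err:.2e}")
```

The reconstructed inputs match the originals up to floating-point rounding, which is why a reversible network can discard intermediate activations during the forward pass and still backpropagate through every layer.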

Reversible Vision Transformers | GPU memory requirement | Vision Transformer | Multiscale Vision Transformers | image classification | object detection | video classification | reduced memory footprint | additional computational burden