Abstract: Vision Transformers (ViTs) have become one of the dominant architectures in computer vision, and pre-trained ViT models are commonly adapted to new tasks via finetuning. Recent works ...