DeepMind Claims Image Captioner Alone Is Surprisingly Powerful then Previous Believed, Competing with CLIP

In a new paper Image Captioners Are Scalable Vision Learners Too, a DeepMind research team presents CapPa, a image captioning based pretraining strategy that and can compete CLIP and exhibit favora...

By · · 1 min read

Source: syncedreview.com

In a new paper Image Captioners Are Scalable Vision Learners Too, a DeepMind research team presents CapPa, a image captioning based pretraining strategy that and can compete CLIP and exhibit favorable model and data scaling properties, verifying that a plain image captioning can be a competitive pretraining strategy for vision backbones.