Generalized Visual Language Models - lilianweng

CLIP: Learning Transferable Visual Models From Natural Language Supervision

ALBEF: Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
