About This Codebase

ITRA is a codebase for flexible and efficient Image Text Representation Alignment…

Model Builder

Training Objectives

  • CLIP: InfoNCE, ProtoCLIP

  • Self-supervised KD: RKD, SEED, CompRess, ProtoCPC, SimReg

  • VICReg, BarlowTwins, DINO
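As a rough illustration of the InfoNCE objective listed above, here is a minimal, dependency-free sketch of a symmetric image-text contrastive loss. It is not taken from the ITRA code: the function names (`info_nce`, `l2_normalize`, `cross_entropy_diagonal`), the plain-list inputs, and the default temperature of 0.07 are assumptions made for illustration, and a real training loop would presumably operate on batched tensors instead.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def info_nce(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over paired embeddings (pair i = image i, text i).

    Hypothetical helper for illustration; `temperature` is an assumed default.
    """
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    # Cosine similarity between every image and every text, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))
```

With correctly paired embeddings the loss approaches zero, while a batch in which every embedding is identical gives log(N) for batch size N, since the pairs cannot be told apart.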

Downstream Evaluation

  • Image classification: zero-shot, linear probing, k-NN, and clustering evaluation (AMI, NMI), from ProtoCLIP

  • ELEVATER Image Classification Toolkit on 20 datasets

  • Image-text retrieval on the MS-COCO dataset

  • Sentence embeddings (SentEval)

  • Passage retrieval on MS-MARCO and Wiki Sections

  • Word embeddings: RG-65, SimLex-999, WordSim-353

  • Zero-shot VQA (TAP-C) and visual entailment
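To make the image-text retrieval item in the list above more concrete, the sketch below computes Recall@K from a query-by-candidate similarity matrix. It is an assumption-laden illustration, not the codebase's evaluation script: `recall_at_k` is a hypothetical helper, and it assumes exactly one correct candidate per query (candidate i for query i), whereas MS-COCO pairs each image with several captions.

```python
def recall_at_k(similarity, k):
    """Fraction of queries whose ground-truth candidate ranks in the top k.

    similarity[i][j] is the score between query i and candidate j.
    Assumes (for illustration) that the correct match for query i is candidate i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Candidate indices sorted by descending similarity, truncated to k.
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        if i in top_k:
            hits += 1
    return hits / len(similarity)


# Toy example: query 1 ranks the wrong candidate first but the right one second.
scores = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.2, 0.7],
]
r_at_1 = recall_at_k(scores, 1)
r_at_2 = recall_at_k(scores, 2)
```

In this toy example two of three queries are matched at rank 1 and all three by rank 2. Handling multiple ground-truth captions per image would mean checking the top-k set against a set of correct indices rather than a single one.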