About This Codebase
ITRA is a codebase for flexible and efficient Image Text Representation Alignment…
Model Builder
TorchHub
ChineseCLIP
…
Training Objectives
CLIP: InfoNCE, ProtoCLIP
Self-supervised KD: RKD, SEED, CompRess, ProtoCPC, SimReg
VICReg, BarlowTwins, DINO
Downstream Evaluation
Image classification: zero-shot, linear/k-NN, and clustering evaluation (AMI, NMI) (from ProtoCLIP)
ELEVATER Image Classification Toolkit on 20 datasets
Image-text retrieval on the MS-COCO dataset
Sentence embeddings (SentEval)
Passage retrieval on MS-MARCO and Wiki Sections
Word embeddings: RG65, Simlex999, WordSim353
Zero-shot VQA (TAP-C) and visual entailment
…