Welcome to the documentation of ITRA! 🎈

ITRA (an abbreviation for Image Text Representation Alignment) is a codebase for flexible and efficient vision-language learning. ITRA features a unified interface for easily accessing state-of-the-art pretrained models, adapters, and loss functions from various sources.

ITRA supports training, evaluation, and benchmarking on a rich variety of tasks, including zero-shot/k-NN/linear classification, retrieval, and word- and sentence-embedding evaluation. At the same time, ITRA is highly modular, extensible, and configurable, facilitating future development and customization.
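To illustrate one of the supported tasks: zero-shot classification with an aligned image-text model reduces to a nearest-neighbour search in the shared embedding space. The sketch below is illustrative only and does not use ITRA's actual API; the function name and toy embeddings are assumptions for demonstration.

```python
# Hedged sketch (not ITRA's API): zero-shot classification picks the class
# whose text-prompt embedding is most cosine-similar to the image embedding.
import numpy as np

def zero_shot_classify(image_emb, text_embs):
    """Return the index of the class text embedding closest (by cosine
    similarity) to the given image embedding."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = text_embs @ image_emb  # cosine similarities, one per class
    return int(np.argmax(sims))

# Toy 2-D embeddings: the image embedding is nearly parallel to class 1.
image = np.array([0.0, 1.0])
classes = np.array([[1.0, 0.0],    # class 0, e.g. "a photo of a dog"
                    [0.1, 0.9]])   # class 1, e.g. "a photo of a cat"
print(zero_shot_classify(image, classes))  # prints 1
```

In practice the text embeddings come from encoding prompts such as "a photo of a {class name}", and both encoders are the pretrained vision-language model's.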
Important

ITRA is an ongoing project developed by the Artificial Intelligence of Multi-modality Group (AIM Group, https://multimodality.group) at Hohai University, led by Prof. Fan Liu. A temporary repository of the codebase is located at https://github.com/ChenDelong1999/ITRA

Note

If you find any bugs or have any recommendations for improving ITRA, please open an issue in the repository, thanks~
Introduction
Getting Started
User Guide
Example Usage
Todo
New features incoming👇
Refactor main.py
Write help messages for arguments
Use YAML
- Project
install as package
Pypi package publishing
- Evaluation reports
zero-shot classification
linear/k-NN classification
clustering evaluation
SentEval
word embedding
MS Marco retrieval
Chinese CLIPs' Evaluation Reports (ImageNet-CN zero-shot, MS-COCO-CN retrieval)
- Implementations
UniCL-based image classification
Validate loss functions
Validate Adapters
SimCSE and PromptBERT re-implementation
Vision-to-language Knowledge Distillation
Language-to-vision Knowledge Distillation
Teacher selection based on Information Bottleneck Theory