Welcome to the documentation of ITRA! 🎈

ITRA (short for Image Text Representation Alignment) is a codebase for flexible and efficient vision-language learning. ITRA features a unified interface for easy access to state-of-the-art pretrained models, adapters, and loss functions from various sources.

_images/pipeline.png

ITRA supports training, evaluation, and benchmarking on a rich variety of tasks, including zero-shot/k-NN/linear classification, retrieval, and word embedding and sentence embedding evaluation. At the same time, ITRA is highly modular, extensible, and configurable, facilitating future development and customization.
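As a conceptual illustration only (this is not ITRA's actual API), zero-shot classification with an aligned vision-language model reduces to nearest-neighbour search: embed the image and a text prompt per class (e.g. "a photo of a {label}") into the shared space, then pick the class whose text embedding is most similar. A minimal sketch with toy embeddings:

```python
# Hypothetical sketch of the zero-shot classification idea, not ITRA code.
import numpy as np

def zero_shot_classify(image_emb, class_text_embs):
    """Return the index of the class text embedding closest (in cosine
    similarity) to the image embedding."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = class_text_embs / np.linalg.norm(class_text_embs, axis=1, keepdims=True)
    similarities = txt @ img  # one cosine similarity per class
    return int(np.argmax(similarities))

# Toy example: 3 made-up classes in a 4-d embedding space.
classes = ["cat", "dog", "bird"]
text_embs = np.array([[1.0, 0.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0, 0.0],
                      [0.0, 0.0, 1.0, 0.0]])
image_emb = np.array([0.1, 0.9, 0.1, 0.0])  # most aligned with "dog"
print(classes[zero_shot_classify(image_emb, text_embs)])  # dog
```

In practice the embeddings would come from a pretrained image encoder and text encoder (e.g. a CLIP-style model) rather than being hand-written.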

Important

ITRA is an ongoing project developed by the Artificial Intelligence of Multi-modality Group (AIM Group, https://multimodality.group) at Hohai University, led by Prof. Fan Liu. A temporary repository of the codebase is located at: https://github.com/ChenDelong1999/ITRA

_images/modular.png

Note

If you find any bugs or have any recommendations for building ITRA, please raise an issue in the repo, thanks~

Todo

New features incoming👇

  • Refactor main.py

  • Write help messages for arguments

  • Use YAML

  • Project
    • Install as a package

    • PyPI package publishing

  • Evaluation reports
    • zero-shot classification

    • linear/k-NN classification

    • clustering evaluation

    • SentEval

    • word embedding

    • MS Marco retrieval

    • Chinese CLIPs’ evaluation reports (ImageNet-CN zero-shot, MS-COCO-CN retrieval)

  • Implementations
    • UniCL-based image classification

    • Validate loss functions

    • Validate Adapters

    • SimCSE and PromptBERT re-implementation

    • Vision-to-language Knowledge Distillation

    • Language-to-vision Knowledge Distillation

    • Teacher selection based on Information Bottleneck Theory