Welcome to the documentation of ITRA! 🎈

ITRA (short for Image Text Representation Alignment) is a codebase for flexible and efficient vision-language learning. ITRA features a unified interface for easy access to state-of-the-art pretrained models, adapters, and loss functions from various sources.

_images/pipeline.png

ITRA supports training, evaluation, and benchmarking on a rich variety of tasks, including zero-shot/k-NN/linear classification, retrieval, and word embedding and sentence embedding evaluation. At the same time, ITRA is highly modular, extensible, and configurable, facilitating future development and customization.
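As a conceptual illustration only (this is not ITRA's actual API), zero-shot classification with an aligned vision-language model reduces to nearest-neighbour search: embed the image and a text prompt per class (e.g. "a photo of a {label}") into the shared space, then pick the class whose text embedding is most similar. A minimal sketch with toy embeddings:

```python
# Hypothetical sketch of the zero-shot classification idea, not ITRA code.
import numpy as np

def zero_shot_classify(image_emb, class_text_embs):
    """Return the index of the class text embedding closest (in cosine
    similarity) to the image embedding."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = class_text_embs / np.linalg.norm(class_text_embs, axis=1, keepdims=True)
    similarities = txt @ img  # one cosine similarity per class
    return int(np.argmax(similarities))

# Toy example: 3 made-up classes in a 4-d embedding space.
classes = ["cat", "dog", "bird"]
text_embs = np.array([[1.0, 0.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0, 0.0],
                      [0.0, 0.0, 1.0, 0.0]])
image_emb = np.array([0.1, 0.9, 0.1, 0.0])  # most aligned with "dog"
print(classes[zero_shot_classify(image_emb, text_embs)])  # dog
```

In practice the embeddings would come from a pretrained image encoder and text encoder (e.g. a CLIP-style model) rather than being hand-written.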

Important

ITRA is an ongoing project developed by the Artificial Intelligence of Multi-modality Group (AIM Group, https://multimodality.group) at Hohai University, led by Prof. Fan Liu. A temporary repository of the codebase is located at: https://github.com/ChenDelong1999/ITRA

_images/modular.png

Note

If you find any bugs or have any recommendations for building ITRA, please raise an issue in the repo, thanks~

Todo

New features incoming👇

  • Refactor main.py

  • Write help messages for arguments

  • Use YAML

  • Project
    • Install as a package

    • PyPI package publishing

  • Evaluation reports
    • zero-shot classification

    • linear/k-NN classification

    • clustering evaluation

    • SentEval

    • word embedding

    • MS Marco retrieval

    • Chinese CLIPs’ evaluation reports (ImageNet-CN zero-shot, MS-COCO-CN retrieval)

  • Implementations
    • UniCL-based image classification

    • Validate loss functions

    • Validate Adapters

    • SimCSE and PromptBERT re-implementation

    • Vision-to-language Knowledge Distillation

    • Language-to-vision Knowledge Distillation

    • Teacher selection based on Information Bottleneck Theory