CLIP Pretraining
First, make sure you have created an environment with the required dependencies and prepared the data for pre-training and downstream evaluation.
Then activate the environment and add the src directory to the PYTHONPATH variable so that the modules can be imported:
conda activate ITRA
export PYTHONPATH="$PYTHONPATH:$PWD/src"
Standard Contrastive Language Image Pretraining From Scratch
Training a CLIP model from scratch is the most straightforward use of ITRA. With --loss 'InfoNCE', the model contrasts image and text samples within each batch: every matched image-text pair is a positive, and all other pairings in the batch serve as negatives.
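To make the in-batch contrast concrete, here is a minimal, dependency-free sketch of a symmetric InfoNCE loss. It is an illustration only, not ITRA's implementation: the function names are hypothetical, and the temperature is fixed at 0.07 for simplicity, whereas CLIP-style training usually learns it.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def info_nce(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired image/text embeddings.

    image_embs[i] and text_embs[i] are assumed to come from the same sample.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    # logits[i][j]: similarity of image i and text j, scaled by 1 / temperature
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    # Transpose to get the text-to-image direction
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))

# Aligned pairs give a loss near zero; swapping the texts makes it large.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Because every other sample in the batch acts as a negative, the number of negatives per positive grows with the batch size.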
# Example command for an 8x2080Ti machine
torchrun --nproc_per_node 8 -m training.main \
--dataset-size 14000000 --episode-size 14000000 --train-data 'cache/yfcc_nori.csv' --nori-dataset \
--epochs 8 --save-frequency 8 --batch-size 64 --workers 8 \
--lr 5e-4 --warmup 2000 --wd 0.5 --max-grad-norm 5 \
--image-model 'RN50' --image-model-builder 'openclip' --text-model 'RN50' --text-model-builder 'openclip' \
--loss 'InfoNCE' \
--report-to tensorboard --logs 'logs/example-usage/clip-pretraining/YFCC14M-8_epoch-RN50'
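As a rough sanity check on the hyperparameters above, the sketch below estimates the global batch size and the number of optimizer steps. It rests on assumptions not stated in this document: that --batch-size is per GPU, that --warmup is counted in steps, and that an incomplete final batch is dropped. The helper name is hypothetical.

```python
def training_schedule(dataset_size, epochs, per_gpu_batch, num_gpus, warmup_steps):
    """Estimate step counts, assuming a per-GPU batch size and dropped last batch."""
    global_batch = per_gpu_batch * num_gpus
    steps_per_epoch = dataset_size // global_batch
    total_steps = steps_per_epoch * epochs
    return {
        "global_batch": global_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_fraction": warmup_steps / total_steps,
    }

# Values taken from the example command: 8 GPUs, batch size 64, 8 epochs, 14M samples
schedule = training_schedule(14_000_000, 8, 64, 8, 2000)
print(schedule)
```

Under these assumptions the global batch is 512 samples and the 2000 warmup steps cover less than 1% of training.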
Train a Tiny CLIP
AlexNet, MobileNet?
Small SBERT?
GloVe Embeddings?