To run these scripts, you need to clone the jean-zay-doc repository into your $WORK directory and go to the corresponding folder. Then follow the instructions detailed inside that folder.
cd "$WORK" && git clone https://github.com/jean-zay-users/jean-zay-doc.git
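Running the clone command a second time fails because the destination directory already exists. If your setup script may be re-run, a guarded variant avoids that. This is a minimal sketch; the function name `clone_jean_zay_doc` is hypothetical and the destination simply mirrors the default `git clone` directory name.

```shell
#!/bin/bash
# Hypothetical helper: clone jean-zay-doc into $WORK only if it is not there yet.
clone_jean_zay_doc() {
  # Abort with a message if $WORK is unset or empty.
  local dest="${WORK:?WORK is not set}/jean-zay-doc"
  if [ -d "$dest/.git" ]; then
    echo "already cloned: $dest"
  else
    git clone https://github.com/jean-zay-users/jean-zay-doc.git "$dest"
  fi
}
```

Call it as `clone_jean_zay_doc`, then `cd` into the example folder you need.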
In this tutorial, you will learn how to train a basic CNN model and tune its hyperparameters on Jean Zay using Slurm batch jobs and Slurm job arrays. Two implementations are available: single-GPU and multi-GPU.
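In a Slurm job array, every task receives its own index in the `SLURM_ARRAY_TASK_ID` environment variable, which a batch script can use to select one hyperparameter configuration per task. The sketch below assumes a hypothetical grid of four learning rates and a hypothetical `train.py --lr` entry point; neither comes from the tutorial's scripts, and Jean Zay-specific directives (account, partition or QoS, time limit) are left out.

```shell
#!/bin/bash
#SBATCH --job-name=cnn-hparams     # hypothetical job name
#SBATCH --array=0-3                # one array task per learning rate below
#SBATCH --gres=gpu:1               # one GPU per task (single-GPU variant)

# Hypothetical hyperparameter grid: array task i uses LEARNING_RATES[i].
LEARNING_RATES=(0.1 0.01 0.001 0.0001)

# Print the learning rate for a given array index, or fail if out of range.
pick_lr() {
  local idx="$1"
  if [ "$idx" -lt 0 ] || [ "$idx" -ge "${#LEARNING_RATES[@]}" ]; then
    echo "index $idx out of range" >&2
    return 1
  fi
  echo "${LEARNING_RATES[$idx]}"
}

# Outside Slurm the variable is unset, so default to task 0 for local testing.
LR=$(pick_lr "${SLURM_ARRAY_TASK_ID:-0}") || exit 1
echo "array task ${SLURM_ARRAY_TASK_ID:-0}: lr=$LR"
# srun python train.py --lr "$LR"   # hypothetical training command
```

Submitting this once with `sbatch` starts four independent tasks, so the grid is explored in parallel without writing one script per configuration.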
This example shows how to train a ResNet-18 in a distributed setting (multi-GPU and multi-node) on Jean Zay, launching one process per GPU with Slurm's srun command.
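When `srun` starts the tasks, Slurm exports `SLURM_PROCID` (global rank), `SLURM_LOCALID` (rank within the node) and `SLURM_NTASKS` (total number of tasks) in each one. A common pattern, shown here as an assumption rather than the tutorial's own code, is to copy them into `RANK` and `WORLD_SIZE`, which PyTorch's `env://` initialization reads, plus the conventional `LOCAL_RANK` for picking a GPU. The node and GPU counts and `train.py` are hypothetical, and `MASTER_ADDR`/`MASTER_PORT`, which `env://` also needs, are omitted.

```shell
#!/bin/bash
#SBATCH --nodes=2                  # hypothetical: 2 nodes
#SBATCH --ntasks-per-node=4        # one task per GPU
#SBATCH --gres=gpu:4               # hypothetical: 4 GPUs per node

# Map the per-task variables set by srun to the names used by PyTorch.
# Defaults make the function usable outside Slurm as a single process.
set_dist_env() {
  export RANK="${SLURM_PROCID:-0}"        # global rank across all nodes
  export LOCAL_RANK="${SLURM_LOCALID:-0}" # rank within this node (GPU index)
  export WORLD_SIZE="${SLURM_NTASKS:-1}"  # total number of processes
}

set_dist_env
echo "rank $RANK/$WORLD_SIZE (local rank $LOCAL_RANK)"

# The mapping must run inside every task, so export the function and call it
# from the command that srun launches (train.py is hypothetical):
# export -f set_dist_env
# srun bash -c 'set_dist_env && python train.py'
```

Each task then sees a distinct `RANK`, while `WORLD_SIZE` equals nodes × tasks per node (8 with the values above).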