Jean Zay users
Why this doc?
We are researchers and engineers in AI (a very vague term, but oh well...) who have managed to get access to Jean Zay, and we think it can be a very useful cluster for your AI research.
At the time of writing (November 2020), the GPU partition of Jean Zay is heavily underused. We think user-contributed documentation could help people navigate the access procedure and learn the tips and tricks needed to be productive on such a cluster.
We use Gitter for chat; don't hesitate to join and ask questions there!
- Access procedure. Getting access to Jean Zay takes roughly 3 weeks (add 1 to 2 months if you have to go through additional security background checks). It may seem long, but it is definitely worth it.
- Tips and tricks
- Example scripts: PyTorch examples, TensorFlow examples, TensorFlow distributed examples with MPI.
In the medium term, we would like to add more material on tips and tricks, limitations, workarounds, etc. on Jean Zay. In particular, feel free to share tutorials, tools and scripts that help users make more productive use of the Jean Zay cluster, for example:
- how to add checkpointing to your code so that long-running jobs can continue past the 20-hour wall-time limit;
- how to make sure your code uses the hardware optimally (e.g. with mixed precision and Tensor Cores);
- how to make sure your jobs are not bottlenecked by suboptimal data access patterns on the disks or by inefficient pre-processing on the CPUs;
- how to do efficient hyper-parameter tuning at scale;
- how to synchronize your code between your local computer and the cluster.
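Checkpointing is usually the first of these topics that users run into. Below is a minimal sketch of the resume-and-stop pattern written in plain Python. It assumes a training loop whose state fits in a small JSON dict. With PyTorch you would typically save `model.state_dict()` and `optimizer.state_dict()` with `torch.save` instead.

The function names (`save_checkpoint`, `load_checkpoint`, `train`) and the 30-minute safety margin are hypothetical choices for illustration. Only the 20-hour limit comes from this page. Each job resumes from the last checkpoint, works until it is close to the wall-time limit, saves, and returns `False` so that you know to resubmit it.

```python
import json
import os
import tempfile
import time

# The 20-hour limit comes from this page; the 30-minute safety margin is an
# arbitrary assumption, meant to leave time to write the final checkpoint.
WALL_TIME_LIMIT_S = 20 * 3600
SAFETY_MARGIN_S = 30 * 60


def save_checkpoint(path, state):
    """Write state to a temp file in the same directory, then rename it over `path`.

    os.replace is atomic on POSIX filesystems, so a job killed mid-write
    never leaves a half-written checkpoint behind.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "loss": None}


def train(path, total_steps, checkpoint_every=100,
          time_budget_s=WALL_TIME_LIMIT_S - SAFETY_MARGIN_S):
    """Resume from `path`, train until done or out of time, checkpointing regularly.

    Returns True if training finished, or False if it stopped early and the
    job should be resubmitted to continue from the saved checkpoint.
    """
    start = time.monotonic()
    state = load_checkpoint(path)
    while state["step"] < total_steps:
        # Placeholder for one real training step (forward, backward, optimizer).
        state["loss"] = 1.0 / (state["step"] + 1)
        state["step"] += 1

        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
        if time.monotonic() - start > time_budget_s:
            save_checkpoint(path, state)  # out of time: save and stop cleanly
            return False
    save_checkpoint(path, state)
    return True


if __name__ == "__main__":
    finished = train("checkpoint.json", total_steps=1000)
    print("finished" if finished else "stopped early, resubmit to resume")
```

Checking elapsed time inside the loop lets the job stop cleanly before the scheduler kills it. The periodic checkpoints limit how much work is lost if the job dies for another reason.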
- Traditional HPC (High Performance Computing) users and AI users work in very different ways. For example, most traditional "serious" HPC clusters have no internet access. Yes, you read that correctly: people in traditional HPC do not need internet access to work on their problems.
- So far, every interaction we have had with Jean Zay user support has been very positive. Even if there is some frustration (on both sides), try to be pedagogical and constructive when you email firstname.lastname@example.org.