Optimal Gradient Checkpointing for Sparse and Recurrent Architectures using Off-Chip Memory