Gradient checkpointing
DevOps
About
Gradient checkpointing is an optimization technique used when training large deep learning models to drastically reduce GPU memory footprints. Instead of caching all intermediate activation states during the forward pass, it retains a small selection and recalculates the rest on-the-fly during backpropagation.