Friendly #pytorch warning (TIL): when you train with automatic mixed precision and want to save a checkpoint, you should save and load the GradScaler state alongside the model and optimizer states.
This isn't mentioned on the standard checkpointing page, but it is covered in the AMP recipe:
https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html
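For concreteness, here's a minimal sketch of what the recipe describes. The names `model`, `optimizer`, and `"checkpoint.pt"` are just placeholders:

```python
import torch

# Placeholder model/optimizer; substitute your own training setup.
model = torch.nn.Linear(10, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

# ... training loop using autocast and scaler.scale(loss).backward() ...

# Save: include the scaler state, not just model and optimizer.
checkpoint = {
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "scaler": scaler.state_dict(),  # easy to forget, needed to resume AMP correctly
}
torch.save(checkpoint, "checkpoint.pt")

# Load: restore all three before resuming training.
checkpoint = torch.load("checkpoint.pt")
model.load_state_dict(checkpoint["model"])
optimizer.load_state_dict(checkpoint["optimizer"])
scaler.load_state_dict(checkpoint["scaler"])
```

Without the scaler state, a resumed run restarts with the default loss scale and warm-up behavior instead of the calibrated scale factor from the interrupted run.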