Reproducible Deep Learning Using PyTorch

Have you ever tried comparing the results of different model configurations, or of different models altogether? Maybe you refactored your deep learning (DL) code without intending to change its underlying functionality: rewriting messy code, integrating existing code with your task-specific requirements, optimizing specific code sections, and so on. How did you ensure that no bugs were introduced?
Neural network projects are full of non-deterministic processes that result in different outcomes in each execution. To perform a fair comparison, you need to enable reproducibility.
In this article, I will explain how to achieve it. Let's start!
“The assumption of an absolute determinism is the essential foundation of every scientific inquiry.” (Max Planck)
Getting reproducible functionality in the machine learning process involves several considerations: random seeds, data splitting, data loading, and deterministic operations.
Random Seeds
When you set random seeds, you ensure that the various pseudorandom number generators (PRNGs) produce reproducible sequences. Do it as follows:
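A minimal sketch of such a setup (the value 42 is only an example; any integer works):

```python
import random

import numpy as np
import torch
from numpy.random import MT19937, RandomState, SeedSequence

SEED = 42  # any integer number of your choice

random.seed(SEED)                              # Python's built-in RNG
rs = RandomState(MT19937(SeedSequence(SEED)))  # fresh NumPy BitGenerator (NumPy's recommended approach)
np.random.seed(SEED)                           # legacy global NumPy RNG, for libraries that rely on it
torch.manual_seed(SEED)                        # PyTorch RNG
```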
- SEED can be any integer number of your choice.
- RandomState(MT19937(SeedSequence(SEED))) creates a new BitGenerator; NumPy suggests this as the best practice.
- np.random.seed() reseeds NumPy's global BitGenerator and sets the seed for custom operators. If any of the libraries or your own code rely on NumPy, seed it one way or the other.
- random.seed() initializes Python's built-in RNG.
- torch.manual_seed() sets the seed for generating random numbers in PyTorch.
Data Splitting
When you "randomly" split your data to train and validation subsets, you must ensure that you can reproduce the same data split the next time you run and evaluate the model. For this, you set the seed as follows:
random_state controls the shuffling applied to the data before the split is made, enabling reproducible output across multiple function calls.
Data Loading
You need to make sure that your data loading process is reproducible: the data fed to your model in each execution of the whole algorithm should be the same, so that the results are comparable.
As suggested by NVIDIA on GitHub, to shuffle differently but reproducibly in every epoch, you should reset the generator at the start of each epoch: create an instance (self.g) of torch.Generator inside your torch.utils.data.DataLoader and use it as follows:
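A possible sketch of this pattern; the subclass name and the seed-plus-epoch scheme are illustrative choices, not the only way to do it:

```python
import torch
from torch.utils.data import DataLoader

class ReproducibleDataLoader(DataLoader):
    def __init__(self, dataset, seed=0, **kwargs):
        self.seed = seed
        self.g = torch.Generator()
        self.g.manual_seed(seed)
        # The generator drives the shuffling performed by the loader's sampler
        super().__init__(dataset, shuffle=True, generator=self.g, **kwargs)

    def set_epoch(self, epoch):
        # Re-seed before each epoch: every epoch shuffles differently,
        # but the sequence of shuffles is identical across runs
        self.g.manual_seed(self.seed + epoch)
```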
set_epoch should be called at the start of each epoch.
Note that the function here is implemented as a method inside the DataLoader class, but you can implement it in whatever way is more convenient for you.
Additionally, as suggested by the PyTorch documentation, in multi-process data loading the DataLoader should reseed its workers using worker_init_fn() to preserve reproducibility, in the following way:
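The relevant snippet from the PyTorch documentation looks roughly like this (train_dataset, batch_size, and num_workers are assumed to be defined elsewhere):

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader

def seed_worker(worker_id):
    # Each worker derives its seed from the base seed of the current run
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

g = torch.Generator()
g.manual_seed(0)

train_loader = DataLoader(
    train_dataset,
    batch_size=batch_size,
    num_workers=num_workers,
    worker_init_fn=seed_worker,  # reseeds the NumPy and Python RNGs in every worker
    generator=g,                 # makes the base seed itself reproducible
)
```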
Please note that when creating a torch.Generator() object, the device should be specified, as shown below:
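For example, for generators used by CPU and CUDA operations respectively:

```python
# The generator must live on the device that will consume its random numbers
g_cpu = torch.Generator(device='cpu')
g_cuda = torch.Generator(device='cuda')
```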
I encourage you to experiment with different random states for data split and loading and examine the differences in results.
Deterministic Operations
NVIDIA CUDA Deep Neural Network (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN is built on top of CUDA and optimized for it: cuDNN is a deep learning library that uses CUDA, and CUDA is the interface through which the GPU is accessed.
CUDA
Set the seed for generating random numbers for the current GPU (using the same SEED as before):
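```python
torch.cuda.manual_seed(SEED)  # seeds the RNG of the current GPU only
```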
or for all GPUs:
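```python
torch.cuda.manual_seed_all(SEED)  # seeds the RNGs of all visible GPUs
```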
cuDNN
As stated in the NVIDIA cuDNN documentation, most cuDNN routines belonging to the same version are designed to generate the same bit-wise results across runs when executed on GPUs of the same architecture. However, there are exceptions, such as ConvolutionBackwardFilter, ConvolutionBackwardData, PoolingBackward, SpatialTfSamplerBackward, CTCLoss, etc., that don't guarantee reproducible results even when running on the same architecture. The reason is their use of atomic operations (operations that complete as a single, indivisible step, unaffected by any other concurrent process), which makes the order of floating-point accumulation, and therefore the rounding, vary from run to run. Also, across different architectures, no cuDNN routines guarantee bit-wise reproducibility.
So, what should you do?
torch.backends.cudnn.deterministic = True causes cuDNN to use only deterministic convolution algorithms. It does not guarantee that your training process will be deterministic if other non-deterministic functions exist. torch.use_deterministic_algorithms(True), on the other hand, affects all the normally non-deterministic operations. As the documentation states, some of the listed operations have no deterministic implementation, and an error will be thrown for them. If you still need such operations, the solution is to write a custom deterministic implementation yourself.
torch.backends.cudnn.benchmark = True causes cuDNN to benchmark multiple convolution algorithms and select the fastest. Setting it to False disables this dynamic selection and ensures that the algorithm selection itself is reproducible. If your model does not change and your input sizes remain the same, you may benefit from setting torch.backends.cudnn.benchmark = True. However, if the input size changes, cuDNN will benchmark again every time a new input size appears, which leads to worse performance; so if your model changes input sizes, some layers are only activated under certain conditions, etc., setting it to True might slow the execution down.
Note:
- In case the aforementioned cuDNN settings do not reproduce your results, use torch.backends.cudnn.enabled = False. This flag controls whether cuDNN is enabled at all, and disabling cuDNN can solve the reproducibility issue.
- Pay attention to the specific cuDNN, CUDA, or Python library versions that are needed when using cuDNN with machine learning models. Using an incorrect version can cause issues in your project.
Conclusion
To be on the "reproducible side", keep your 'Data Splitting' and 'Data Loading' processes as described above.
To quickly achieve deterministic behavior in the rest of your flow, I suggest defining the following function:
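A sketch of such a function, combining the settings discussed above (the function name, the default seed, and the CUBLAS_WORKSPACE_CONFIG value are my own choices):

```python
import os
import random

import numpy as np
import torch

def set_deterministic(seed=42):
    # Python, NumPy and PyTorch PRNGs (CPU and all GPUs)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

    # cuDNN: deterministic convolution algorithms, no benchmarking
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

    # Raise an error whenever an operation has no deterministic implementation
    torch.use_deterministic_algorithms(True)

    # Some CUDA operations additionally require this environment variable to be set
    # before they are first used (see the PyTorch notes on reproducibility)
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
```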
And call it at the beginning of your algorithm.