Editors: Justin Zhao, Daniel Treiman, Piero Molino
A significant step was taken in Ludwig 0.6 to improve the code quality of Ludwig components, e.g., encoders, combiners, and decoders. All of these components may be implemented as deep neural networks and may have trainable or fixed parameters. Deep neural networks have many layers composed of a large number of parameters that must be updated to converge to a solution. Depending on the particular algorithm, the code for updating parameters during training can be quite complex. As a result, it is nearly impossible for a developer to confirm by inspection alone that model parameters are actually updated.
In this article, we'll show how we introduced a mechanism for testing weight updates into Ludwig, how it improved code quality, and how the broader PyTorch community can use it.
How Neural Networks Are Trained
At a high level, Figure 1 shows the typical training cycle for a neural network. Input features from the training data are fed into the neural network, where they are combined with the network parameters to produce predictions. These predictions are compared to the target output features to compute loss values; the lower the loss, the better the model's predictions match the training data. The backward pass computes the gradient of each parameter with respect to the total loss for each batch of training examples. These gradients are used by the optimizer to update the network parameters to minimize the total loss. This process is repeated many, many times until the model "converges" to a solution.
Figure 1: Neural Network Training Cycle
Detecting Errors in Neural Architectures
Computations performed in the forward pass range from simple, sequential processing to complex sets of computations that depend on various factors, such as the type of training data and pseudo-random number values. If these complex computations are not handled correctly, they can produce miscalculated gradients and inconsistent updating of network parameters during the optimization step.
Errors in neural network architectures can be harder to detect because, unlike regular code, they generate no hard errors. Subtle architecture issues may surface only during model training, manifesting as slightly reduced performance or slower convergence.
Introducing the check_module_parameters_updated utility
To address this difficulty, the Ludwig team developed a reusable utility called check_module_parameters_updated() to perform a quick sanity check on Ludwig encoders, combiners, and decoders, ensuring that parameters such as weights and biases are updated during one training cycle. This work was inspired by these earlier blog posts: How to unit test machine learning code and Testing Your PyTorch Models with Torcheck.
Before continuing, a note on terminology. For the remainder of the post, the term "parameter" will mean Torch's Parameter object. This should not be confused with the other use of the term "parameter", which refers to the individual values in the underlying torch.Tensor objects.
The following code fragment illustrates this difference. Consider a fully connected layer (the FCLayer module in Ludwig) named fc_layer, defined below.
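The definition below is a plain-PyTorch approximation of the original example rather than the exact FCLayer constructor call; it mirrors the 16-to-256 Linear layer with a ReLU activation discussed later in this section.

```python
from torch import nn

# Stand-in for Ludwig's FCLayer: a 16 -> 256 fully connected layer with ReLU.
# The exact FCLayer constructor arguments are not reproduced here.
fc_layer = nn.Sequential(
    nn.Linear(in_features=16, out_features=256),
    nn.ReLU(),
)
```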
For this layer, check_module_parameters_updated() reports 2 trainable Parameter objects, which represent the weights and bias of fc_layer. For the same object, torchinfo.summary() reports 4,352 parameters. These are the individual tensor values that make up the fc_layer weights (16 × 256 = 4,096) and bias (256), which add up to 4,352.

How does check_module_parameters_updated() work?
In a nutshell, the utility performs a minimal version of the core learning procedure in Figure 1 using synthetic data. As described in the function’s docstring, there are three required positional parameters:
- module: this is the instantiated Ludwig object to be tested, i.e., encoder, combiner, or decoder
- module_input_args: tuple of synthetic tensor data that is passed to the forward() method of the module being tested
- module_target: target values used to compute losses
The remaining function parameters are optional and have default values that can be modified if needed.
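To make the argument structure concrete, here is a hedged sketch of a call using the fc_layer module from the earlier example; the synthetic shapes, the way the utility is imported, and how its results are captured are assumptions for illustration, not the documented API.

```python
import torch

# Synthetic input: a batch of 8 rows of 16 features, matching fc_layer's input size.
module_input_args = (torch.randn(8, 16),)

# Synthetic target with the same shape as fc_layer's output (batch of 8, 256 units).
module_target = torch.randn(8, 256)

# check_module_parameters_updated ships with Ludwig's test utilities;
# the exact import path is omitted here.
results = check_module_parameters_updated(fc_layer, module_input_args, module_target)
```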
Let's now look at how the functionality is implemented. Before executing the steps shown in Figure 1, the function sets up some key objects, such as the loss function to use and the optimizer. As noted earlier, these key objects can be customized through the optional function arguments.
The next bit of setup involves initializing data structures to capture results of the Figure 1 process.
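A minimal sketch of what this setup might look like is shown below; the loss function and optimizer choices (MSE and SGD) are assumptions, since the actual defaults are controlled by the optional arguments, and `module` refers to the first positional argument described above.

```python
import torch

# Key objects for the minimal training cycle; the concrete defaults are
# configurable through the utility's optional arguments.
loss_function = torch.nn.MSELoss()
optimizer = torch.optim.SGD(module.parameters(), lr=0.001)

# Bookkeeping structures for the results of the Figure 1 cycle.
trainable_parameter_count = 0
frozen_parameter_count = 0
parameters_updated = []
parameters_not_updated = []
```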
These are the key code fragments of the function:
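The fragments that follow are illustrative sketches of each step rather than the exact Ludwig source; variable names follow the docstring described earlier.

```python
# Forward pass: feed the synthetic input through the module under test.
module_output = module(*module_input_args)
```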
This makes a forward pass with the synthetic input data through the Ludwig module under test.
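Next, a sketch of the loss computation and optimizer step:

```python
# Compute the loss between the module's output and the synthetic target,
# then run the backward pass and apply one optimizer step.
loss = loss_function(module_output, module_target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```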
After taking the forward pass through the module, the function computes the loss from the predicted value and target, after which it performs the backward pass to compute gradients and updates the Parameter objects.
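The update check itself can be sketched as a loop over the module's named parameters, relying only on each gradient tensor:

```python
# A trainable Parameter whose gradient tensor is non-zero was updated.
for name, parameter in module.named_parameters():
    if parameter.requires_grad:
        trainable_parameter_count += 1
        if parameter.grad is not None and not torch.all(parameter.grad == 0):
            parameters_updated.append(name)
    else:
        frozen_parameter_count += 1
```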
The function checks each Parameter. If a Parameter object contains a non-zero gradient tensor, it means it was updated. If it is updated, the Parameter is recorded in a list to keep track of which Parameters were updated.
As noted earlier, this work was inspired by two other blog posts. Those posts describe taking a copy of the parameters before the backward pass and optimizer step, then comparing the before values against the values after the optimizer step to detect changes. Our approach depends only on the gradient values, which eliminates the need to duplicate the model's parameters; depending on the model, that duplication could involve a large amount of memory.
After capturing all updated parameters, the utility now captures any parameter not updated.
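A sketch of that follow-up bookkeeping, continuing with the same assumed names:

```python
# Every trainable Parameter that was not flagged as updated is recorded
# so it can be reported back to the caller.
for name, parameter in module.named_parameters():
    if parameter.requires_grad and name not in parameters_updated:
        parameters_not_updated.append(name)
```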
Finally, the function returns the results of the above checks, which are:
- frozen_parameters: count of frozen parameters
- trainable_parameters: count of trainable parameters
- parameters_updated: count of updated parameters
- parameters_not_updated: the list of parameters that were not updated, if any
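Assuming the four values come back as a tuple in the order listed above, a caller might unpack and inspect them like this (the variable names mirror the counts used in the unit tests below):

```python
# Unpack the results; the tuple order is assumed to match the list above.
fpc, tpc, upc, not_updated = results

print(f"frozen: {fpc}, trainable: {tpc}, updated: {upc}")
if not_updated:
    print("Parameters that did not update:", not_updated)
```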
How is check_module_parameters_updated() used in unit tests?
First, a developer needs a good understanding of the PyTorch classes used to build neural networks. This knowledge is needed to understand how Parameter objects are used in those classes.
With this understanding, the developer needs to determine the number and types of Parameters contained in the Ludwig module to be tested. One method is to print the module. Using the fc_layer object from the earlier example, we see:
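With the stand-in definition used earlier in this post, printing the module yields output along these lines (the real Ludwig FCLayer prints a similar, slightly more nested structure):

```python
print(fc_layer)
# Sequential(
#   (0): Linear(in_features=16, out_features=256, bias=True)
#   (1): ReLU()
# )
```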
This shows that fc_layer is made up of torch.nn.Linear and torch.nn.ReLU classes. From the perspective of Parameter objects, fc_layer contains 2 Parameters (the weights and bias in torch.nn.Linear). torch.nn.ReLU is an activation function and does not contain any Parameters.
We'll now show two examples of unit tests using the check function. The Ludwig Developer Guide contains a section that describes in more detail how to incorporate parameter update checking into unit tests.
Simple Parameter update check
This unit test is composed of these steps (a sketch follows the list):
- Set the random seed to ensure repeatability
- Create a synthetic input tensor and instantiate the ResNetEncoder to test
- Confirm the output contains the expected content and is the correct shape
- Create a synthetic target tensor and use it to check for parameter updates
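Here is a hedged sketch of what such a test might look like. The import paths, the ResNetEncoder constructor arguments, the output dictionary key, and the synthetic tensor shapes are illustrative assumptions; the actual test in the Ludwig repository is the reference.

```python
import torch

# Import paths are assumptions for illustration.
from ludwig.encoders.image_encoders import ResNetEncoder
from tests.integration_tests.parameter_update_utils import check_module_parameters_updated

RANDOM_SEED = 1919


def test_resnet_encoder_parameter_updates():
    # Set the random seed to ensure repeatability.
    torch.manual_seed(RANDOM_SEED)

    # Create a synthetic input tensor and instantiate the encoder to test.
    # Constructor arguments are illustrative, not the encoder's documented API.
    encoder = ResNetEncoder(height=28, width=28, num_channels=3, output_size=128)
    inputs = torch.randn(4, 3, 28, 28)

    # Confirm the output contains the expected content and is the correct shape.
    outputs = encoder(inputs)
    encoder_output = outputs["encoder_output"]  # key name assumed
    assert encoder_output.shape[0] == 4

    # Create a synthetic target tensor and use it to check for parameter updates.
    target = torch.randn_like(encoder_output)
    fpc, tpc, upc, not_updated = check_module_parameters_updated(
        encoder, (inputs,), target
    )

    # In the simple case, every trainable parameter should have been updated.
    assert upc == tpc, f"Some parameters were not updated: {not_updated}"
```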
In this simple case, the number of updated parameters (upc) should equal the number of trainable parameters (tpc). If this is not the case, an AssertionError is raised because some parameters that should have been updated were not.
The situation where upc == tpc is, for the most part, the expected case and is the approach described in the two blog posts mentioned earlier. However, there are times when this is not the case and it is not an error. The next example illustrates this situation.
Complex Parameter update check
This covers one of the unit tests for the TabTransformerCombiner
. To understand circumstances when upc != tpc
one needs to understand how TabTransformer works. Figure 2, which comes from the paper, shows the TabTransformer
architecture. From this we see there are two processing paths through the network. One path involves categorical features and the other involves numerical (aka continuous) features.
Figure 2: TabTransformer Architecture
When it is not possible to know ahead of time whether categorical features will be in the input, the Parameters associated with the Transformer stack may not be used at all. While the count of trainable parameters includes all Parameters in the model, the Transformer-only Parameters will not be reflected in the count of updated parameters. For this situation, adjustments need to be made to account for this behavior. This is shown in the final assert check, where the number of Parameters in the Transformer stack is subtracted from the count of trainable parameters.
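As a hedged illustration of that adjustment, the final assertion might look something like the fragment below; combiner.transformer_stack and the flag indicating whether categorical features are present are illustrative names, not the actual attribute names in Ludwig.

```python
# Count the Parameter objects that belong only to the Transformer stack.
transformer_parameter_count = len(list(combiner.transformer_stack.parameters()))

if not input_has_categorical_features:
    # The Transformer stack never sees data, so its Parameters are not updated.
    assert upc == tpc - transformer_parameter_count
else:
    assert upc == tpc
```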
Summary
As neural network architectures become more sophisticated, their processing complexity increases. This makes it almost impossible to verify their correct operation through static analysis, such as code reviews. The check_module_parameters_updated() function was developed to help Ludwig developers confirm the correct operation of Ludwig components. Advanced users developing custom components can also use this capability in their own work.