Interpreting Loss Curves
Machine learning would be much simpler if all your loss curves looked like this the first time you trained your model:

Unfortunately, loss curves are often challenging to interpret. Use your intuition about loss curves to solve the exercises on this page.
Exercise 1. Oscillating loss curve

Which three of the following things could you do to try to improve the loss curve shown in Figure 21?
- Increase the number of examples in the training set.
- Reduce the training set to a tiny number of trustworthy examples.
- Reduce the learning rate.
- Check your data against a data schema to detect bad examples, and then remove the bad examples from the training set.
- Increase the learning rate.
Answer: 2, 3 & 4
- Reduce the training set to a tiny number of trustworthy examples.
Although this technique sounds artificial, it is actually a good idea. Assuming that the model converges on the small set of trustworthy examples, you can then gradually add more examples, perhaps discovering which examples cause the loss curve to oscillate.
- Reduce the learning rate.
Yes, reducing the learning rate is often a good idea when debugging a training problem.
- Check your data against a data schema to detect bad examples, and then remove the bad examples from the training set.
Yes, this is a good practice for all models.
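The schema check above can be sketched in a few lines. This is a minimal, hypothetical example (the `SCHEMA` range and `remove_bad_examples` helper are assumptions, not part of any framework): it drops rows that contain non-finite values or values outside the expected feature range.

```python
import numpy as np

# Hypothetical schema: the expected value range for every feature column.
SCHEMA = {"min": -10.0, "max": 10.0}

def remove_bad_examples(features, labels):
    """Keep only rows whose features are finite and within the schema's range."""
    finite = np.all(np.isfinite(features), axis=1)
    in_range = np.all((features >= SCHEMA["min"]) &
                      (features <= SCHEMA["max"]), axis=1)
    keep = finite & in_range
    return features[keep], labels[keep]

features = np.array([[1.0, 2.0],
                     [np.nan, 0.5],   # bad: NaN
                     [3.0, 99.0],     # bad: out of range
                     [-2.0, 4.0]])
labels = np.array([0, 1, 0, 1])

clean_x, clean_y = remove_bad_examples(features, labels)
# The two bad rows are dropped, leaving 2 clean examples.
```

In a real pipeline the schema would typically cover per-feature ranges, types, and allowed categories rather than one global range.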
Exercise 2. Loss curve with a sharp jump

Which two of the following statements identify possible reasons for the exploding loss shown in Figure 22?
- The input data contains one or more NaNs - for example, a value caused by a division by zero.
- The regularization rate is too high.
- The input data contains a burst of outliers.
- The learning rate is too low.
Answer: 1 & 3
- The input data contains one or more NaNs - for example, a value caused by a division by zero.
This is more common than you might expect.
- The input data contains a burst of outliers.
Sometimes, due to improper shuffling of batches, a batch might contain a lot of outliers.
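To see why a single NaN is so destructive, consider this small sketch: one NaN in a batch propagates through the arithmetic and turns the entire batch loss into NaN, which then poisons every subsequent weight update. The `assert_finite` guard is a hypothetical helper, not part of any framework.

```python
import numpy as np

# One NaN in the inputs propagates through any arithmetic,
# so a single bad example makes the whole batch loss NaN.
predictions = np.array([0.2, 0.5, np.nan, 0.9])
targets = np.array([0.0, 1.0, 1.0, 1.0])
mse = np.mean((predictions - targets) ** 2)
print(np.isnan(mse))  # True: one NaN poisons the mean

# A quick guard to run on each batch before training (assumed helper):
def assert_finite(batch):
    if not np.all(np.isfinite(batch)):
        raise ValueError("batch contains NaN or inf values")
```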
Exercise 3. Test loss diverges from training loss

Which one of the following statements best identifies the reason for this difference between the loss curves of the training and test sets?
- The model is overfitting the training set.
- The learning rate is too high.
Answer: The model is overfitting the training set.
Yes, it probably is. Possible solutions:
- Make the model simpler, possibly by reducing the number of features.
- Increase the regularization rate.
- Ensure that the training set and test set are statistically equivalent.
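The second solution, increasing the regularization rate, can be illustrated with a minimal sketch (the `l2_regularized_loss` helper and the example values are illustrative assumptions): L2 regularization adds a penalty proportional to the squared weights, discouraging the large weights that often accompany overfitting.

```python
import numpy as np

# Minimal sketch: L2 regularization adds reg_rate * sum(w^2) to the data
# loss, so raising reg_rate penalizes complex models more heavily.
def l2_regularized_loss(data_loss, weights, reg_rate):
    return data_loss + reg_rate * np.sum(weights ** 2)

weights = np.array([0.5, -1.0, 2.0])   # sum of squares = 5.25
base_loss = 0.3

print(l2_regularized_loss(base_loss, weights, reg_rate=0.01))  # 0.3525
print(l2_regularized_loss(base_loss, weights, reg_rate=0.1))   # 0.825
```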
Exercise 4. Loss curve gets stuck

Which one of the following statements is the most likely explanation for the erratic loss curve shown in Figure 24?
- The training set contains repetitive sequences of examples.
- The training set contains too many features.
- The regularization rate is too high.
Answer: The training set contains repetitive sequences of examples.
This is a possibility. Ensure that you are shuffling examples sufficiently.
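The shuffling fix can be sketched as follows. This is a minimal illustration (the `shuffled_epoch` helper is an assumption, not any framework's API): features and labels are permuted together before each epoch so that no batch sees a long repetitive run of similar examples.

```python
import numpy as np

# Minimal sketch: shuffle features and labels with the same permutation
# before each epoch so feature rows and labels stay aligned.
rng = np.random.default_rng(seed=42)

def shuffled_epoch(features, labels):
    permutation = rng.permutation(len(features))
    return features[permutation], labels[permutation]

features = np.arange(10).reshape(5, 2)  # row i is [2i, 2i+1]
labels = np.arange(5)                   # label i matches row i

x, y = shuffled_epoch(features, labels)
# After shuffling, x[i] is still the feature row for label y[i].
```

Frameworks usually provide this directly, for example a shuffle option on the training loop or an input-pipeline shuffle with a sufficiently large buffer.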