This is the second in a two-part series. Read Part 1 here.


Improving the Accuracy of the Cat & Dog Convolutional Neural Network


In Part 1 of this blog, I compared the training performance of CPU vs. GPU across several convolutional networks. I learned that the deeper and more complex the network, the greater the performance benefit of using a GPU.

In this second part, I describe the changes I made to the “Cat & Dog” Convolutional Neural Network (CNN) from Venkatesh’s tutorial, which improved the network’s validation accuracy from 80% to 94%. I also share the results of predictions from the trained network on random sets of images.

Figure 1: Original “Cat & Dog” Image Classification Convolutional Neural Network

Improvements to the Cat & Dog CNN

Add Convolutional and Max Pooling Layers

I added a pair of convolutional (64 filters) and max-pooling layers to the original network. The additional depth improves validation accuracy from 80% to 84%, with no noticeable change in training speed.

Figure 2: Add a pair of Convolutional and Max Pool Layers
Figure 3: Cat & Dog CNN with additional convolutional & max pool layers
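In Keras Sequential terms, the deepened network can be sketched roughly as follows. The 64×64 input size and 128-unit dense layer follow the original tutorial; treat the exact layer parameters as assumptions rather than the exact code from the figure.

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
# Original first convolutional block
model.add(Conv2D(32, (3, 3), input_shape=(64, 64, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# Newly added pair: a second convolution with 64 filters plus max pooling
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(1, activation='sigmoid'))  # single sigmoid output: cat vs. dog
```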

Add Dropout Layer

I added a dropout layer with a rate of 0.5, which randomly drops neurons during training to prevent overfitting. The dropout layer improves validation accuracy from 84% to 90%, with no noticeable change in training speed.

Figure 4: Add a Dropout Layer
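A minimal sketch of where a dropout layer sits in a Keras model. This is a standalone toy model for illustration only; in the real network, the dropout layer follows the convolutional blocks described above.

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(8192,)))
# Dropout(0.5): during training, each neuron's output is zeroed with
# probability 0.5; at inference time the layer passes values through.
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
```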

Data Augmentation

Data augmentation is a technique that generates variations of the original training images through shifts, rotations, zooms, shears, flips, etc. Check out the Keras documentation for the ImageDataGenerator class for more details. The original CNN already incorporates data augmentation, so this is not an improvement per se, but I was interested in understanding the effect of data augmentation on accuracy and training speed.

Figure 5: Data Augmentation
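The augmentation setup roughly follows the tutorial's generators; the specific range values here are assumptions in line with common Keras examples, not necessarily the exact figures from the original code.

```python
from keras.preprocessing.image import ImageDataGenerator

# Training generator: rescale pixel values to [0, 1] and apply random
# shear, zoom and horizontal flips to each image as it is read.
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

# Validation/test generator: rescale only, never augment.
test_datagen = ImageDataGenerator(rescale=1. / 255)
```

In the tutorial, these generators feed the model via `flow_from_directory`, which reads one subfolder per class (cats/, dogs/) and yields augmented batches on the fly, so the augmented images never need to be written to disk.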

The following are some examples of the augmented images.

Figure 6: Examples of Augmented Images

To test the effect of data augmentation, I removed the shear, zoom, and flip operations from the image data generator. Removing them decreased the validation accuracy from 90% to 85%. It is worth noting that data augmentation comes with a performance overhead: without it, training throughput on the GPU increases from 425 images/sec to 533 images/sec.

Increase the Target Image Resolution

The original CNN resizes all images to 64×64 before training the model. I increased the target resolution to 128×128 and added another pair of convolutional and max-pooling layers to the network to capture more detail in the images. I also increased the number of filters to 64 in all convolutional layers. The new CNN with the higher target image resolution and more layers improves validation accuracy from 90% to 94%. It also comes with a performance overhead, decreasing training throughput on the GPU from 425 images/sec to 333 images/sec, as shown in Part 1 of the blog.

Figure 7: Validation Accuracy of 94%
Figure 8: Improved Cat & Dog CNN
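Sketched in the same style, the improved network might look like this. The 128×128 input, three conv/max-pool pairs, and 64 filters per layer follow the text above; the dense layer size and optimizer are assumptions carried over from the tutorial.

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential()
# Three convolutional/max-pooling pairs, 64 filters each, on 128x128 input
model.add(Conv2D(64, (3, 3), input_shape=(128, 128, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))  # keep the dropout layer from the earlier change
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
```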


Now for the fun part: using the trained model to predict cat or dog images.

Errors in the Original CNN Code

I want to point out that the prediction code in Venkatesh’s tutorial is missing a critical line.

Figure 9: Original Prediction Example

Using this code will incorrectly tilt the prediction toward “dog.”  The reason is that the model was trained with RGB values rescaled from the 0-to-255 range to the 0-to-1 range.  For the model to predict correctly as trained, we must also rescale the input RGB values from 0-to-255 to 0-to-1.  Without rescaling, the input pixel values are 255 times larger than what the model expects, which incorrectly tilts the result toward 1 (i.e., dog).

Another observation is that result[0][0] can return values like 0.00000xxx for cat and 0.99999xxx for dog, rather than an exact 0 or 1.  So I also changed the check to “> 0.5” rather than “== 1”.  The modified and corrected code is shown in the figure below.

Figure 10: Corrected Prediction Code with Changes Highlighted
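The two fixes can be captured in a small helper along these lines. The function name is mine, not from the tutorial, and the trained model and image loading are assumed to come from the surrounding code.

```python
import numpy as np

def predict_cat_or_dog(model, img_array):
    """Classify one image given as a (height, width, 3) array of 0-255 values."""
    # The critical missing line: rescale pixel values from 0-255 to 0-1,
    # matching the rescale=1./255 applied to the training data.
    batch = np.expand_dims(img_array.astype('float32') / 255.0, axis=0)
    result = model.predict(batch)
    # Threshold at 0.5 instead of testing "== 1": the sigmoid output
    # is almost never an exact 0 or 1.
    return 'dog' if result[0][0] > 0.5 else 'cat'
```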

Predicting Images

So, how ‘intelligent’ is the CNN?  How well can it predict cat and dog images, beyond those in the training and validation sets?  Can it tell that Tom from “Tom & Jerry” is a cat?  How about Pluto and Goofy from Disney?  How about Cat Woman?  How about wolves, tigers, and even human faces?  Below are randomly downloaded images and the prediction results from the model.

Figure 11: Random images for the model to predict
Figure 12: Results of prediction of random images


My two-week journey with the GPU loaner quickly came to an end.  It was a fun and productive learning experience. Without the training speeds of the powerful NVIDIA V100 GPU card, all the changes, tweaks, and experiments with different network architectures, parameters, and techniques would not have been possible in such a short period.


Ready for a GPU to Enhance Your Workflow?

Contact phoenixNAP today.
