Deep Neural Networks: CS231n & Transfer Learning

Deep learning (also known as neural networks) has become a very powerful technique for dealing with very high dimensional data, i.e. images, audio, and video. As one example, automated image classification has become highly effective. This task consists of putting an image into one of a certain number of classes.

Look at the results of the ImageNet Large Scale Visual Recognition Challenge. This is an annual challenge for image classification, recognition and detection algorithms, for images from 1,000 categories. The top-5 error rate has decreased from 28% in 2010, to 26% in 2011, to 15% in 2012, then 11%, 7%, 3.5%, to 3% this year. (Classification error). Other tasks continue to improve with the use of neural networks, especially convolutional and recurrent neural networks.

The number of resources for learning about neural networks has also multiplied dramatically. One resource I can recommend firsthand is the CS231n Stanford course about image recognition. The course syllabus, including slides and lecture notes (and Jupyter notebook assignments) are available online. Additionally, the lectures are available on YouTube. I’m up to lecture 5 and I can highly recommend both the slides and YouTube lectures. There is really excellent theoretical background on topics like the history of image processing, loss functions, backpropagation, but also practical advice on weight initialization, transfer learning, learning rate, regularization and more.

Another resource that has been linked recently on HackerNews is the Yerevann guide to deep learning, which seems to be a good, deep source of information.


The folks at TensorFlow just keep improving their offerings. I recently implemented their transfer learning using Inception v3 using 8 categories I chose from ImageNet. It was surprisingly easy to train my own classifier, which was quite effective (top 1 error rate <10%)! The only real issue I had was some bazel errors which were resolved by upgrading my version of bazel. Training on roughly 8,000 images took about 12 hours on a MacBook Pro using CPU only. To be more specific, the bottleneck phase took about 12 hours, while the actual training about 20 minutes. Using ~1,000 images per category, is probably not totally necessary for an effective classifier, so you can likely cut down on this time dramatically.