An Idiot's Guide to "Deep Learning" (Part 1 - A Toe in the Water)

POSTED IN: Data Analytics & Visualization Blog


"Who moved my cheese?" asked someone in the 90's - and me today.  I've spent several months going deep on Machine Learning (especially random forest), dimensionality reduction, and Topological Data Analysis (TDA).  Fascinating stuff to learn about - though took me a while to get my head around a few of the concepts. Anyway, just when I was starting to feel pretty good about my progress...

BANG!  Deep Learning     ....zoink!

From three sources, in three days: "Deep Learning". It really feels like someone might have "moved my cheese". Anyway, I need to get my head around this, so I thought I'd do a little legwork and share my journey with other folks who are new to the subject or just getting their bearings (like me).

Goal - Define "Deep Learning" and Key Components

So what is Deep Learning and how does it differ from plain old ML and Neural Networks?   Is this old wine in new bottles, hype driven, or has something else changed? 

What I've learned so far

1) LAYERED: "The term 'deep learning' gained traction in the mid-2000s after a publication... showed how a many-layered feedforward neural network could be effectively pre-trained one layer at a time, treating each layer in turn as an unsupervised machine, then using supervised backpropagation for fine-tuning." "Data is generated by interactions of many different factors on different levels. Deep learning adds the assumption that these factors are organized into multiple levels, corresponding to different levels of abstraction or composition... Deep learning algorithms in particular exploit this idea of hierarchical explanatory factors." (From the Wikipedia page - link below)

'Deep learning refers to a relatively recently developed set of generative machine learning techniques that autonomously generate high-level representations from raw data sources, and using these representations can perform typical machine learning tasks such as classification, regression and clustering. Many of the most important deep learning techniques are extensions of neural network methods and a simple way to understand them is to think of multiple layers of neural networks linked together.'

'A deep neural network (DNN) is defined to be an artificial neural network with at least one hidden layer of units between the input and output layers - to achieve abstraction'
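To make the "layers" idea concrete, here's a toy sketch (my own, not from any of the sources above) of an artificial neural network with a single hidden layer, written in plain Python/NumPy and trained with supervised backpropagation on the XOR problem. The layer sizes, learning rate, and step count are arbitrary choices for illustration.

    import numpy as np

    # Toy data: XOR is not linearly separable, so at least one hidden layer is needed
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    rng = np.random.default_rng(0)
    W1 = rng.normal(scale=0.5, size=(2, 4))   # input -> hidden layer (the "abstraction" layer)
    b1 = np.zeros((1, 4))
    W2 = rng.normal(scale=0.5, size=(4, 1))   # hidden -> output layer
    b2 = np.zeros((1, 1))
    lr = 1.0                                  # learning rate (arbitrary)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for step in range(10000):
        # Forward pass, layer by layer
        h = sigmoid(X @ W1 + b1)              # hidden representation
        out = sigmoid(h @ W2 + b2)            # prediction

        # Backpropagation (the supervised fine-tuning step)
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)
        W2 -= lr * h.T @ d_out
        b2 -= lr * d_out.sum(axis=0, keepdims=True)
        W1 -= lr * X.T @ d_h
        b1 -= lr * d_h.sum(axis=0, keepdims=True)

    print(out.round(3))  # should approach [0, 1, 1, 0] (may vary with the random seed)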

2) UNSUPERVISED - "Unlike supervised machine learning, deep learning is mostly unsupervised. [It involves] large-scale neural nets that allow the computer to learn and 'think' by itself" - no "training" set, no feedback loop, no 'reward signal'. (Note that the quote above does mention "supervised backpropagation for fine-tuning", so it isn't purely unsupervised.)
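To illustrate the unsupervised side, here's another small sketch (again mine, purely illustrative) of a tiny autoencoder in NumPy: it is trained only to reconstruct its raw input, so the hidden layer learns a compressed representation without any labels, feedback loop, or reward signal. Stacking layers trained this way is roughly the idea behind the layer-at-a-time pre-training quoted in point 1.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.random((200, 8))       # unlabeled data: 200 examples, 8 features (made up)

    n_hidden = 3                   # size of the compressed representation (arbitrary)
    W_enc = rng.normal(scale=0.1, size=(8, n_hidden))
    W_dec = rng.normal(scale=0.1, size=(n_hidden, 8))
    lr = 0.05

    for step in range(2000):
        code = np.tanh(X @ W_enc)  # encode: a higher-level representation of the raw data
        recon = code @ W_dec       # decode: try to reproduce the raw input
        err = recon - X            # reconstruction error is the only learning signal

        # Gradient descent on the mean squared reconstruction error (biases omitted for brevity)
        g_dec = code.T @ err / len(X)
        g_code = (err @ W_dec.T) * (1 - code ** 2)
        g_enc = X.T @ g_code / len(X)
        W_dec -= lr * g_dec
        W_enc -= lr * g_enc

    print(np.mean(err ** 2))       # reconstruction error should have dropped
    # 'code' could now feed a supervised model (classification, regression) or clustering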

3) SUBSET OF MACHINE LEARNING - It's a subcategory of machine learning that uses neural networks to improve results in areas like speech recognition, computer vision, and natural language processing. Also, it's not new, but it seems to be getting a 'second wind'.

4) BLEEDING EDGE, depending on your definition - "Yoshua Bengio, an AI researcher at the University of Montreal, estimates that there are only about 50 experts worldwide in deep learning, many of whom are still graduate students. He estimated that DeepMind employed about a dozen of them on its staff." I suspect this is a point that many folks would take exception to.

5) TRENDING (or starting to) (Google Trends graph above) - Definitely on the rise, but given the light volume, it's still early to call it a hockey stick; it hasn't hit prime time/mainstream yet. Also interesting to see the "cross" of ML and NN in 2010.

6) HYPE AND MONEY - Google acquired DeepMind Technologies in January 2014, which got people's attention. For a sector (Data Science / ML) already bubbling, this turned up the gas even more.

In December 2013, Facebook announced that it had hired Yann LeCun to head its new artificial intelligence (AI) lab, with operations in California, London, and New York.

7) BUT IT WORKS - "In 2009, deep multidimensional LSTM networks demonstrated the power of deep learning with many nonlinear layers, by winning three ICDAR 2009 competitions in connected handwriting recognition, without any prior knowledge about the three different languages to be learned" - see wiki footnotes. 

In the 2012 Google/Stanford paper "Building High-level Features Using Large Scale Unsupervised Learning", they achieved a 70% improvement in cat-detection technology :) (the goal was to build high-level, class-specific feature detectors from unlabeled images - e.g. face detectors).

From Josh Bloom, CTO at Wise.IO: "The accuracy wise.io is seeing that DL provides for multi-class inference problems on sensor data, e.g. imaging, is remarkable. Now, we're trying to make predictions from all the algorithms we use in our machine learning driven applications more interpretable for the end business user."

And to exercise / test it: automatic speech recognition is one common application, and the popular TIMIT data set is often used for initial evaluations of deep learning architectures. The full set contains 630 speakers from eight major dialects of American English, each reading 10 sentences. (For images, there is MNIST, composed of handwritten digits, with 60,000 training examples and 10,000 test examples.)
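If you want to poke at MNIST yourself, here's a minimal baseline sketch, assuming a Python environment with TensorFlow/Keras installed (an assumption on my part; any framework that ships the standard 60,000/10,000 split would do). It's a small feedforward network, just enough to see the train/test evaluation loop in action.

    # Assumes TensorFlow/Keras is available: pip install tensorflow
    from tensorflow import keras

    # Keras ships MNIST with the standard split: 60,000 training and 10,000 test images (28x28)
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0   # scale pixel values to [0, 1]

    # A small feedforward network - sizes and epochs are arbitrary illustration choices
    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(28, 28)),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    model.fit(x_train, y_train, epochs=3, batch_size=128)
    print(model.evaluate(x_test, y_test))   # [test loss, test accuracy] on the held-out 10,000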

Recent Developments in DL (gets technical in spots)

8) APPLIED IN FUTURE - Effective learning from unlabeled data using unsupervised methods is the desired outcome, and it has lots of practical applications. Humans are visual creatures, cameras are everywhere, and capabilities that let firms excel at voice and visual ML will help them outcompete their peers.

9) METAPHYSICAL - Looking at some of the research papers and seeing the "Master Neuron" images of cats and faces - not any one cat or one face - I was struck by the parallels to Plato's Theory of Forms (Platonic realism usually refers to realism regarding the existence of universals or abstract objects).

Anyway, hope this helped a little.  I'm going to poke around and see if there are some examples I can work with for R. If I find any, will post them in this blog.  

Links & References


For the "R" Crowd

    R LIBRARIES & LINKS (Developing)


    Caffe Architecture, Install & Examples (have not tried yet)

    • Introductory slides: slides about the Caffe architecture, updated 03/14.
    • Installation: Instructions on installing Caffe (works on Ubuntu, Red Hat, OS X).
    • Pre-trained models: BVLC provides some pre-trained models for academic / non-commercial use.
    • Development: Guidelines for development and contributing to Caffe.



    Will keep working on this - let me know if you have any 'adds' or edits - Cheers! Ryan


    Comments

    Ryan Anderson (post author):
    Neural Networks Demystified [Part 1: Data and Architecture]
    (multi-part series)
    Ryan Anderson (post author), May 2014:

    REDDIT REPLY from Azsu via /r/MachineLearning/:
    "This is exactly what I've been researching! The following are about DNNs or related material like RBMs or CNNs (I left out the general AI and statistics stuff):
    (great for when you start coding) http://www.heatonresearch.com/content/non-mathematical-introduction-using-neural-networks
    http://www.youtube.com/playlist?list=PLaXTOtKxwtZu5invOHNPfYobsWl9p_0c-
    http://www.stanford.edu/class/cs294a/handouts.html
    (awesome) https://class.coursera.org/neuralnets-2012-001
    (awesome) http://yann.lecun.com/
    (awesome) https://class.coursera.org/ml-005/lecture
    From REDDIT user delarhi:

    "It's worth noting that there were two parts to the deep learning timeline.

    The first part started about mid 2000s with, I think, Ruslan Salakhutdinov and Hinton's work on deep restricted Boltzmann machines. These were pretrained in an unsupervised manner to find better starting parameters for optimization. This started off a few years of deep learning research based on unsupervised pretraining and such.

    The second part started with Alex Krizhevsky and Hinton's work on deep neural networks for ImageNet classification. Their result in the paper was so far ahead of anything else done in ImageNet that it caught a ton of attention. I believe most of the deep learning work now is based off of this which involves no pretraining. It really is a huge neural network with a lot of tricks like rectified linear units, dropout, convolutional layers, etc. along with a shitload of data.

    My impression is that unsupervised pretraining is not really pursued anymore. Instead people just modify Krizhevsky's network to suit their needs, push data through it, and extract the nice features out of it when they're done.

    Krizhevsky's code can be found here:
    I prefer a more recently made framework called Caffe:
    If you're a fan of LeCun his framework is Torch 7. His recent object detection framework is called OverFeat:
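    To put names to a couple of the "tricks" mentioned above, here's a tiny illustrative sketch (mine, not from the Reddit thread) of a rectified linear unit and dropout applied to one layer's activations in NumPy.

        import numpy as np

        rng = np.random.default_rng(0)

        def relu(z):
            # Rectified linear unit: passes positive values through, zeroes out negatives
            return np.maximum(0.0, z)

        def dropout(a, rate=0.5, training=True):
            # Randomly silences a fraction of units during training; the surviving
            # activations are scaled up so nothing changes at test time ("inverted" dropout)
            if not training:
                return a
            mask = rng.random(a.shape) >= rate
            return a * mask / (1.0 - rate)

        x = rng.normal(size=(4, 8))      # a batch of 4 examples with 8 features (made up)
        W = rng.normal(size=(8, 16))     # one layer's weights
        hidden = dropout(relu(x @ W))    # the kind of layer stacked many times in these nets
        print(hidden.shape)              # (4, 16)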
