Last week I blogged about using VGG16 to train a model that tells dogs and cats apart. Let’s dive slightly deeper this week and look at how we can build something similar from scratch.
Essentially, VGG16 is a Convolutional Neural Network (CNN), a special type of neural net that is extremely powerful for image and visual recognition. I won’t talk much in this post about the technical details of what a CNN is and how it works (probably because I am also not too sure LOL); instead we will discuss how to build a CNN from scratch using the Keras library. Maybe next week, or in one of the upcoming posts after I have studied the CNN architecture carefully, I will blog about how CNNs work 🙂
According to the Keras docs, Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK or Theano. It was developed with a focus on enabling fast experimentation.
As you can see, we don’t need to know how TensorFlow or Theano works. Instead, Keras wraps them in a high-level API, and within a few lines of code we can build a full-fledged, powerful neural network. Switching between backends is also easy, as it is just a matter of changing a configuration file. (Keras does not handle low-level operations such as tensor products and convolutions itself, so it needs a backend such as TensorFlow or Theano.)
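To make the backend switch concrete, here is a minimal sketch. It assumes the multi-backend Keras releases, where the setting is the `"backend"` field of a JSON config file (usually `~/.keras/keras.json`). The helper name `set_keras_backend` is my own and not part of Keras. If I remember right, the `KERAS_BACKEND` environment variable can override the file as well.

```python
import json
from pathlib import Path


def set_keras_backend(config_path, backend):
    """Set the "backend" field of a keras.json-style file and return the config.

    Hypothetical helper for illustration; Keras itself only reads the file.
    """
    path = Path(config_path)
    # Keep whatever other settings (floatx, epsilon, ...) are already there.
    config = json.loads(path.read_text()) if path.exists() else {}
    config["backend"] = backend  # e.g. "tensorflow" or "theano"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=4))
    return config


# Example call (not executed here, since it would edit your real config):
# set_keras_backend(Path.home() / ".keras" / "keras.json", "theano")
```

The change only takes effect the next time Keras is imported, so restart the Python session or notebook kernel after switching.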
Sounds pretty good! What are we working on today?
As the feature image suggests, we will be working with numbers this time! As mentioned above, I am not going to talk about how we and machines recognize that the number is “215” (probably next time); instead we are writing code so that our machine can identify the three digits 2, 1 and 5 separately. Speaking of which, I realized the code I am going to post doesn’t yet recognize a block of numbers, so that could be something I work on next.
You can look at the following Jupyter notebook on my GitHub, which contains comments and explanations along the way. I am not going to copy everything from there into this post haha. It’s a very basic CNN that recognizes a single greyscale digit in a 28×28 pixel image.
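If you just want a rough picture without opening the notebook, here is a minimal sketch of what such a network can look like in Keras. It is not the notebook’s code: the layer sizes, the optimizer and the function names `build_digit_cnn` and `conv_output_size` are my assumptions for illustration. The small helper traces how the 28×28 image shrinks through each layer.

```python
def conv_output_size(size, kernel, stride=1):
    """Side length after a 'valid' (no padding) convolution or pooling step."""
    return (size - kernel) // stride + 1


def build_digit_cnn():
    """Build a small CNN for 28x28 greyscale digits (layer sizes are assumptions)."""
    # Imported here so the shape helper above works without Keras installed.
    import keras
    from keras import layers

    model = keras.Sequential([
        keras.Input(shape=(28, 28, 1)),                # 28x28 pixels, 1 grey channel
        layers.Conv2D(32, (3, 3), activation="relu"),  # 28 -> 26
        layers.MaxPooling2D((2, 2)),                   # 26 -> 13
        layers.Conv2D(64, (3, 3), activation="relu"),  # 13 -> 11
        layers.MaxPooling2D((2, 2)),                   # 11 -> 5
        layers.Flatten(),                              # 5 * 5 * 64 values
        layers.Dense(10, activation="softmax"),        # one score per digit 0-9
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


# Trace the feature-map side length through the conv and pooling layers above.
side = 28
for kernel, stride in [(3, 1), (2, 2), (3, 1), (2, 2)]:
    side = conv_output_size(side, kernel, stride)
flattened = side * side * 64  # number of values fed into the Dense layer
```

With Keras installed, `build_digit_cnn().summary()` prints the layer shapes, and `model.fit` can then be called on images shaped `(n, 28, 28, 1)` with integer labels 0 to 9.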
Till next time, when we improve our code or talk more about CNNs!