Basics of Convolutional Neural Network (CNN) — Part 1

Rahul Bal
Jul 15, 2021

In this article, I will discuss some basics of CNNs that you need to know if you want to build accurate deep learning models. I hope you are excited about this.

Let’s start learning CNN!

First of all, let me tell you about one of the main applications of CNNs: classification.

Yes!! You heard it correctly. We use CNNs to classify things, especially images.

For example, suppose we want to identify whether a given image shows a table or a chair.

[Figure: example images of a table and a chair]

We can do this by feeding the image as input to our CNN model. The image can be either grayscale (a single channel) or RGB (three channels: red, green, and blue). Here both images (table and chair) are RGB.

What final result do we expect from our CNN model?

For any new image containing either a table or a chair, it should correctly tell us which of the two it is.

Now the question is: how do we build such a model? What is the basic idea behind it?

The answer to the above question is feature extraction.

Exactly!! It’s all about finding the important features in your image and passing these features on to a neural network (I will publish a separate article on neural networks soon), which will predict whether it’s a table or a chair. I hope you now have some idea of what exactly we want from a CNN.

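Why is it called a CNN?

As discussed earlier, we need to extract features from images, and for that we need to apply something to our image. Right?

What exactly is this ‘something’? It’s a filter.

Yes!! We apply this filter to our input image to get features from it. A filter is a small grid of numbers (weights). We slide it across the image, and at each position we multiply the filter values by the pixels underneath and add up the results. This operation of sliding a filter over an image is called ‘convolution’.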

So,

Convolution + Neural Network = Convolutional Neural Network

Many different filters can be used to extract different types of features from an image. For example, we can use a vertical edge filter to detect vertical edges in our image. (Edge detection is one of the feature extraction methods.)
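To make the vertical edge idea concrete, here is a minimal sketch in plain Python. The helper name `convolve2d`, the toy image, and the Prewitt-style filter values are my own choices for illustration, not something fixed by CNNs. The sketch assumes the filter moves one pixel at a time and the image is not padded.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (one pixel at a time, no padding).

    Like most deep learning libraries, this does not flip the kernel,
    so strictly speaking it computes a cross-correlation.
    """
    f = len(kernel)
    out_h = len(image) - f + 1
    out_w = len(image[0]) - f + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the f x f patch whose top-left corner is (i, j)
            total = 0
            for a in range(f):
                for b in range(f):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy 5x6 grayscale image: dark (0) on the left, bright (10) on the right,
# so there is a vertical edge between columns 2 and 3 (0-indexed).
image = [[0, 0, 0, 10, 10, 10] for _ in range(5)]

# Prewitt-style vertical edge filter (an illustrative choice)
vertical_filter = [[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]]

feature_map = convolve2d(image, vertical_filter)
for row in feature_map:
    print(row)  # each row is [0, 30, 30, 0]
```

The large values in the middle columns mark where the brightness changes from left to right, while the flat regions give 0.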

The result of this convolution is called a feature map, which contains the important features of that image, and the layer where the convolution happens is called a convolution layer.

So the bottom line is: when we feed our image to a convolution layer, it generates a feature map, which is again an image but with different dimensions. Now, what does this ‘dimension’ indicate?

Dimension is nothing but the size of the image. For example, if our input image is grayscale (1 channel), we can represent its size as W*H*1 (where W is the width and H is the height of the image in pixels, and 1 is the number of channels, N). If it’s an RGB image, its size is W*H*3. (W and H are often equal, since input images are commonly resized to a square.)
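If it helps, here is a small sketch of how these sizes look in code. It assumes an image is stored as nested Python lists (rows, then pixels, then channel values); the helper names `make_image` and `image_size` are made up for this illustration.

```python
def make_image(width, height, channels, value=0):
    """Build an image as nested lists: rows -> pixels -> channel values."""
    return [[[value] * channels for _ in range(width)] for _ in range(height)]


def image_size(image):
    """Return the size as (W, H, N)."""
    height = len(image)
    width = len(image[0])
    channels = len(image[0][0])
    return (width, height, channels)


gray = make_image(28, 28, 1)  # grayscale: one value per pixel
rgb = make_image(28, 28, 3)   # RGB: three values per pixel

rgb[0][0] = [255, 0, 0]       # make the top-left pixel pure red

print(image_size(gray))  # (28, 28, 1)
print(image_size(rgb))   # (28, 28, 3)
```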

Note that we generally use a filter of size 3*3*N (or 5*5*N for bigger images), where N is the same as the number of channels in the input image, because the filter has to cover the complete depth of the image (all channels). For example, for a 1-channel input image we need a 3*3*1 filter, and for an RGB image we need a 3*3*3 filter.
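Here is a rough sketch of what "through the complete depth" means at a single filter position. The function name `apply_filter_at` and the toy values are assumptions for illustration. The point is that an F*F*N filter covers all N channels at once and produces one number per position.

```python
def apply_filter_at(image, kernel, top, left):
    """Weighted sum of an F x F x N filter over the patch starting at (top, left)."""
    f = len(kernel)        # filter height and width (F)
    n = len(kernel[0][0])  # filter depth (N), must match the image channels
    total = 0
    for a in range(f):
        for b in range(f):
            for c in range(n):
                total += image[top + a][left + b][c] * kernel[a][b][c]
    return total


# Toy 4x4 RGB image where every pixel is (1, 2, 3)
image = [[[1, 2, 3] for _ in range(4)] for _ in range(4)]

# 3x3x3 filter filled with ones, so it simply adds up the patch
kernel = [[[1, 1, 1] for _ in range(3)] for _ in range(3)]

value = apply_filter_at(image, kernel, 0, 0)
print(value)  # 9 pixels * (1 + 2 + 3) = 54
```

Even though the patch has 27 values (3*3*3), the output at this position is a single number, which is why one filter produces a one-channel feature map.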

Now, can we use more than one filter to extract features?

A BIG YES, we can!!

This is because we can extract a variety of features if we apply different filters to an input image. (Here K represents the number of filters.)

[Figure: convolutional layer]

Here we are convolving a W*H*N image with K filters of size F*F*N, which results in a feature map (convolved image) of size (W-F+1)*(H-F+1)*K. This assumes the filter moves one pixel at a time and the image is not padded.

Let’s understand this using an example. Consider an input image of size 28*28*3, convolved with two filters of size 3*3*3. The result has size (28-3+1)*(28-3+1)*2, that is, a 26*26*2 convolved image.
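The same example can be checked with a short sketch in plain Python. The names `output_size` and `conv_layer` and the toy filter values are mine, and the code assumes the filter moves one pixel at a time with no padding. Real CNN libraries do this far more efficiently, so treat it only as an illustration of the sizes.

```python
def output_size(w, h, f, k):
    """Feature map size for a W*H*N image and K filters of size F*F*N."""
    return (w - f + 1, h - f + 1, k)


def conv_layer(image, filters):
    """Apply every filter to the image and stack the results along the depth."""
    h, w, n = len(image), len(image[0]), len(image[0][0])
    f = len(filters[0])
    out_w, out_h, k = output_size(w, h, f, len(filters))
    out = [[[0] * k for _ in range(out_w)] for _ in range(out_h)]
    for idx, kernel in enumerate(filters):
        for i in range(out_h):
            for j in range(out_w):
                total = 0
                for a in range(f):
                    for b in range(f):
                        for c in range(n):
                            total += image[i + a][j + b][c] * kernel[a][b][c]
                out[i][j][idx] = total  # one output channel per filter
    return out


# 28x28 RGB image where every channel value is 1 (toy data)
image = [[[1, 1, 1] for _ in range(28)] for _ in range(28)]

# Two 3x3x3 filters: one of all ones, one of all zeros (toy values)
filters = [
    [[[1, 1, 1] for _ in range(3)] for _ in range(3)],
    [[[0, 0, 0] for _ in range(3)] for _ in range(3)],
]

fmap = conv_layer(image, filters)
print(output_size(28, 28, 3, 2))                  # (26, 26, 2)
print(len(fmap[0]), len(fmap), len(fmap[0][0]))   # 26 26 2
print(fmap[0][0])                                 # [27, 0]
```

Each filter contributes one channel to the output, so two filters give a depth of 2.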

Convolution is the very first operation in a CNN architecture. We will discuss the remaining operations of a CNN in the next article. Stay tuned!!

Read: Basics of Convolutional Neural Network (CNN) — Part 2

Happy learning :-)

Rahul Bal

AI-ML researcher, blogger, writer, content developer