Basics of Convolutional Neural Network (CNN)- Part 2

Jul 16, 2021

In my previous article (Basics of Convolutional Neural Network (CNN)- Part 1), we looked at the very first layer in a CNN, the convolution layer.

Before moving to the next layer, there are a few operations in the convolution layer itself that we have not discussed yet: padding and applying a stride to an image. Let’s look at them one by one.

What is Padding?

In padding, we add zero-valued pixels around the border of the image so that the information in its edge and corner pixels is retained even after convolution.

We know that convolution reduces the image size: it drops from W*H to (W-F+1)*(H-F+1), where F is the filter size. Edge and corner pixels are also covered by fewer filter positions than central pixels, so part of their information is lost. (Note: padding does not always have to be applied.)

With a suitable amount of padding (denoted by ‘P’), the convolved output image keeps the same size as the input image.

Let’s take an N*N input image that is convolved with a filter of size F*F.

Before padding: an N*N image convolved with an F*F filter gives an (N-F+1)*(N-F+1) output.

After padding: an (N+2P)*(N+2P) image convolved with an F*F filter gives an (N+2P-F+1)*(N+2P-F+1) output.

For the output to have the same size as the input, we need (N+2P-F+1)*(N+2P-F+1) = N*N.

Thus we can write N+2P-F+1 = N, and solving for P gives P = (F-1)/2 (a whole number when F is odd).

Now let’s move a 3*3 (F*F) filter over an image of size 26*26*3. Here we will perform the convolution with padding P=1.

(Why is its value 1? Because (F-1)/2 = (3-1)/2 = 1.) (Note: P=0 means that no padding is applied.)
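To make the padding step concrete, here is a minimal NumPy sketch. It assumes the image is stored as a height * width * channels array; the dummy pixel values and the variable names are only illustrative. np.pad adds P rows and columns of zeros on every side and leaves the channel axis alone.

```python
import numpy as np

F = 3                 # filter size
P = (F - 1) // 2      # padding that keeps the size unchanged (odd F, stride 1)

image = np.ones((26, 26, 3))   # dummy 26*26*3 image (illustrative values)

# Pad height and width with P zero-valued pixels on each side; channels are not padded.
padded = np.pad(image, ((P, P), (P, P), (0, 0)), mode="constant", constant_values=0)

out_size = padded.shape[0] - F + 1   # N + 2P - F + 1

print(padded.shape)   # (28, 28, 3)
print(out_size)       # 26
```

The padded image is 28*28*3, and sliding the 3*3 filter over it brings the width and height back to 26.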

Now what about stride? It is the step by which your filter moves. In general, the filter moves over the image one pixel at a time; in this case the stride value (denoted by ‘S’) is 1.

Sometimes we move the filter two pixels at a time (especially when adjacent pixels are similar, so one of them can be skipped). In this case, S is 2.

That’s it! Adding padding and stride completes the convolution layer operation.

Let’s summarize the convolutional layer calculations:
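A small sketch can show what the stride does to the filter positions along one row of pixels. The helper name filter_positions and the 7-pixel row are made up for illustration.

```python
def filter_positions(n, f, s):
    """Starting indices of an f-wide filter sliding over n pixels with stride s."""
    return list(range(0, n - f + 1, s))

print(filter_positions(7, 3, 1))   # [0, 1, 2, 3, 4]
print(filter_positions(7, 3, 2))   # [0, 2, 4]
```

With S=1 the filter visits five positions; with S=2 it skips every other one and visits only three, so the output gets smaller.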

Input Image — W1 * H1* D1

Filter — F*F

Number of Filters — K

Padding — P

Stride — S

Output (convolved) image — W2 * H2 * D2

where W2 = ((W1 - F + 2P)/S) + 1, H2 = ((H1 - F + 2P)/S) + 1 and D2 = K

(here D1 and D2 indicate the number of channels in the input and output image respectively)
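The relation above can be written as a small helper. This is only a sketch: the function name conv_output_shape is hypothetical, K=8 in the usage lines is an arbitrary choice, and the use of integer (floor) division for cases where (W1 - F + 2P) is not an exact multiple of S is an assumption not stated in the formula.

```python
def conv_output_shape(w1, h1, f, k, p=0, s=1):
    """Return (W2, H2, D2) for a convolution layer.

    Floor division is an assumption for the case where
    (W1 - F + 2P) is not an exact multiple of S.
    """
    w2 = (w1 - f + 2 * p) // s + 1
    h2 = (h1 - f + 2 * p) // s + 1
    return w2, h2, k   # output depth D2 equals the number of filters K

print(conv_output_shape(26, 26, f=3, k=8, p=1, s=1))   # (26, 26, 8)
print(conv_output_shape(26, 26, f=3, k=8, p=0, s=1))   # (24, 24, 8)
```

Note that the input depth D1 does not appear: each filter spans all input channels, so only K decides the output depth.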

Let’s understand this using an example. Consider a 5*5*3 input image with K=2, F=3, S=2 and P=1. Find the output image size.

**After applying ‘padding’ and ‘stride’**

Applying the relation discussed above, we get the output image size as

[((5-3+2)/2)+1] * [((5-3+2)/2)+1] * 2 = 3*3*2

The convolved image is then processed further by a pooling layer.
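The same result can be checked with a naive, loop-based convolution in NumPy. This is a sketch for illustration, not an efficient implementation: the function name conv2d, the array layout (height * width * channels for the image, K * F * F * D1 for the filters), and the random values are all assumptions.

```python
import numpy as np

def conv2d(image, filters, p=0, s=1):
    """Naive convolution (cross-correlation, as used in CNNs).

    image:   H x W x D1 array
    filters: K x F x F x D1 array
    returns: H2 x W2 x K array
    """
    k, f, _, _ = filters.shape
    padded = np.pad(image, ((p, p), (p, p), (0, 0)))   # zero padding on height and width
    h2 = (padded.shape[0] - f) // s + 1
    w2 = (padded.shape[1] - f) // s + 1
    out = np.zeros((h2, w2, k))
    for n in range(k):                # one output channel per filter
        for i in range(h2):
            for j in range(w2):
                patch = padded[i * s:i * s + f, j * s:j * s + f, :]
                out[i, j, n] = np.sum(patch * filters[n])
    return out

rng = np.random.default_rng(0)
image = rng.random((5, 5, 3))         # 5*5*3 input
filters = rng.random((2, 3, 3, 3))    # K=2 filters of size 3*3, each spanning 3 channels

result = conv2d(image, filters, p=1, s=2)
print(result.shape)   # (3, 3, 2)
```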

What exactly does pooling do? It reduces the spatial dimensions, that is, it divides the width and height of the input image by a certain size (we call it the ‘pool size’; this holds when the stride equals the pool size, which is the usual choice).

Where should we place the pooling layer in a CNN architecture? We typically insert a pooling layer between two successive convolutional layers. (Note: a CNN architecture has multiple convolutional and pooling layers.)

What exactly do we use in a pooling layer? Generally we use a 2*2 filter and move it over the image with stride 2 (S=2).

We use max pooling to get the maximum value from four adjacent pixels

As described above, max pooling takes the maximum value out of four adjacent pixel values (why four? Because a 2*2 filter covers four pixels, and with stride 2 the windows do not overlap). Similarly, average pooling takes the average of the adjacent pixels.

Note: pooling does not change the depth of your image, which means a 26*26*8 image becomes 13*13*8 after pooling (only the width and height change here).
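Both pooling methods can be sketched in a few lines of NumPy. The function name pool2d, its parameters, and the small 4*4 test image are assumptions made for illustration; the image is assumed to be stored as height * width * channels.

```python
import numpy as np

def pool2d(image, size=2, stride=2, mode="max"):
    """Max or average pooling over an H x W x D array; depth is left unchanged."""
    h, w, d = image.shape
    h2 = (h - size) // stride + 1
    w2 = (w - size) // stride + 1
    reducer = np.max if mode == "max" else np.mean
    out = np.zeros((h2, w2, d))
    for i in range(h2):
        for j in range(w2):
            window = image[i * stride:i * stride + size,
                           j * stride:j * stride + size, :]
            out[i, j, :] = reducer(window, axis=(0, 1))   # reduce each channel separately
    return out

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 1],
              [3, 4, 5, 6]], dtype=float).reshape(4, 4, 1)

max_pooled = pool2d(x, mode="max")[:, :, 0]       # rows: [6, 4] and [7, 9]
avg_pooled = pool2d(x, mode="average")[:, :, 0]   # rows: [3.75, 2.25] and [4.0, 5.25]
print(max_pooled)
print(avg_pooled)
print(pool2d(np.zeros((26, 26, 8))).shape)        # (13, 13, 8)
```

Each 2*2 window collapses to a single value per channel, so the width and height are halved while the depth of 8 stays the same.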

Ok!! I think we should take one example here !!

Consider an input of size 64*64*3 with K=6, F=5 and S=1 for the first convolution layer, and K=10, F=5 and S=1 for the second convolution layer. Find the output image size when each convolutional layer is followed by a pooling layer. Take the default parameters for pooling (F=2 and S=2) and use no padding (P=0).

Congratulations! You are almost done with the basics of CNN now. At the next level, the output obtained after multiple convolutional and pooling layers is passed to the neural network part (input layer, hidden layer or layers, and output layer), where a Softmax function at the output layer turns the scores into class probabilities that are used to classify the object in your image.

There are still many things to discuss about the neural network part of a CNN, which I will cover in coming articles on CNN. Stay tuned!
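One way to check your answer is to trace the shape layer by layer with the output-size relation from earlier. The helper names conv_shape and pool_shape are hypothetical, and floor division is assumed, although every division in this example happens to be exact.

```python
def conv_shape(w, h, k, f, s=1, p=0):
    """Shape after a convolution layer: (W2, H2, K)."""
    return (w - f + 2 * p) // s + 1, (h - f + 2 * p) // s + 1, k

def pool_shape(w, h, d, f=2, s=2):
    """Shape after a pooling layer: width and height shrink, depth is unchanged."""
    return (w - f) // s + 1, (h - f) // s + 1, d

trace = [(64, 64, 3)]                                              # input image
trace.append(conv_shape(trace[-1][0], trace[-1][1], k=6, f=5))     # first convolution
trace.append(pool_shape(*trace[-1]))                               # first pooling
trace.append(conv_shape(trace[-1][0], trace[-1][1], k=10, f=5))    # second convolution
trace.append(pool_shape(*trace[-1]))                               # second pooling

for shape in trace:
    print(shape)
# (64, 64, 3)
# (60, 60, 6)
# (30, 30, 6)
# (26, 26, 10)
# (13, 13, 10)
```

So the final output is 13*13*10: each convolution removes F-1 = 4 pixels from the width and height, and each pooling layer halves them.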

Read Basics of Convolutional Neural Network (CNN)- Part 1

Happy learning :-)


Written by Rahul Bal