Activation function in CNN


New member
Mar 6, 2020
I know that using activation functions in a CNN helps solve the linearity problem, and while I can understand it for simple perceptrons, I can't quite get how it solves the linearity problem in a CNN (especially when we apply max pooling). Can anyone please give me a mathematical example of how linearity is preserved into the next convolutional layer if I don't apply an activation function?

Thank you in advance!


Full Member
Oct 30, 2021
A convolution layer in a CNN is just a special case of a regular "fully connected" linear layer, which means that having convolution layers does not eliminate the need for non-linearities, a.k.a. activation layers. Composing two linear maps gives another linear map, so two convolution layers stacked without an activation in between collapse into a single (linear) convolution layer.
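To make the collapse concrete, here is a small NumPy sketch (a 1-D toy example, not an actual CNN framework): applying kernel k1 and then kernel k2 with no activation in between gives exactly the same result as a single convolution with the combined kernel k1 * k2.

```python
import numpy as np

# Two stacked convolution "layers" with kernels k1, k2 and no activation
# in between collapse into one convolution with the composed kernel.
x = np.array([1.0, -2.0, 3.0, 0.5])   # toy 1-D input signal
k1 = np.array([1.0, 2.0])             # kernel of layer 1
k2 = np.array([0.5, -1.0, 1.5])       # kernel of layer 2

two_layers = np.convolve(np.convolve(x, k1), k2)  # layer 1, then layer 2
one_layer = np.convolve(x, np.convolve(k1, k2))   # single equivalent layer

print(np.allclose(two_layers, one_layer))  # True: the stack is still linear
```

So without activations, depth buys you nothing: the whole network is equivalent to one linear layer. That is exactly why an activation is inserted after each convolution.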

While max pooling is not a linear operation, it is "kind of linear": sums are not mapped to sums, but multiplication by a nonnegative scalar produces a multiplied output ([imath]P(\alpha x) = \alpha P(x)[/imath], where [imath]P[/imath] is the max pooling function, [imath]x[/imath] is its input, and [imath]\alpha \ge 0[/imath] is an arbitrary nonnegative scalar; for negative [imath]\alpha[/imath] the max turns into a min, so the identity fails). You could say that max pooling "isn't non-linear enough" to warrant eliminating the standard non-linearities.