
I am asking this question for a better understanding of the concept of channels in images.

I am aware that a convolutional layer generates feature maps from a given image. We can adjust the size of the output feature map by proper padding and regulating strides.

But I am not sure whether there exist kernels for a single convolution layer that can change an {RGBA, RGB, Grayscale, binary} image into (any) other {RGBA, RGB, Grayscale, binary} image.

For example, if I have a binary image of a cat, can a single convolution layer convert it into an RGBA image of the same cat? If not, can it at least convert the binary cat image into some RGBA image?

I am asking only from a theoretical perspective.

desertnaut
hanugm
  • You say "I am not sure whether there exist kernels for a single convolution layer", but note that the kernels in a CNN are usually learned. Note also that you don't need CNNs, for example, to convert RGB images into grayscale/binary ones. There are algorithms to convert RGB images into grayscale images. I'm not fully sure how this question that you're asking is related to your confusion about channels. – nbro Jul 31 '21 at 12:55
  • I'm also not sure why you're so confused about this concept, as it's not really anything special (usually it just refers to the 3rd dimension of the image or feature map, i.e. would be a synonym for depth, although in the case of the images the depth has some meaning to us, as each slice, for example, in RGB images, corresponds to the values of the red, green and blue color, hence the name RGB) – nbro Jul 31 '21 at 12:56
  • Please check [here](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html). It has two arguments, `in_channels` and `out_channels`. What is their purpose? The example shows 16 input channels and 33 output channels. I am aware of images with 1 channel, 3 channels and 4 channels. @nbro – hanugm Jul 31 '21 at 12:58
  • Someone in the other answer provided [a link to an answer on Data Science SE by an anonymous user (who was me before I deleted the account there)](https://datascience.stackexchange.com/a/54032). It explains these parameters. – nbro Jul 31 '21 at 13:01
  • In the input layer of the CNN, typically, you will have `in_channels == 1` or `in_channels == 3`. However, in hidden layers of CNNs, you can have `in_channels == K` for $K > 1$, because this corresponds to the depth of the feature map that you produced in the previous convolutional layer, which corresponds to the number of kernels that you applied to the input of the previous layer (I'm assuming a 2d convolution). – nbro Jul 31 '21 at 13:03
  • Oh! So, the number of input channels is the depth of the image and the number of output channels is the number of kernels. Why does the example show `in_channels = 16` then? Is it just for the sake of example, or an example of its usage as a hidden layer, or are there sensible images with 16 channels? @nbro – hanugm Jul 31 '21 at 13:08
  • Yes, I think it's just an example. – nbro Jul 31 '21 at 13:10
  • @nbro now I am liberated from this channel jargon. – hanugm Jul 31 '21 at 13:11
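The point made in the comments (the output depth equals the number of kernels, and each kernel spans the full input depth) can be sketched in plain NumPy. This is a hypothetical illustration of the PyTorch example's shapes (`in_channels=16`, `out_channels=33`), not code from the thread:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.random((16, 10, 10))          # feature map: in_channels=16, spatial size 10x10
kernels = rng.random((33, 16, 3, 3))  # 33 kernels, each spanning all 16 input channels

def conv(x, k):
    """Valid 2d convolution of a multi-channel input with one kernel."""
    c, h, w = x.shape
    _, kh, kw = k.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # each output value sums over ALL input channels in a local window
            out[i, j] = np.sum(x[:, i:i + kh, j:j + kw] * k)
    return out

# stacking one output map per kernel gives the output depth
y = np.stack([conv(x, k) for k in kernels])
assert y.shape == (33, 8, 8)          # out_channels == number of kernels
```

So `out_channels` is simply how many kernels the layer learns, and each produces one slice of the output feature map.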

1 Answer


No, because each output value of a convolution layer only looks at a local region of the image. A convolution layer cannot perform global transformations, only local ones. Convolution layers are also translation-equivariant: if a layer converts an eyeball to a tail at one position, it will convert the same eyeball to the same tail wherever it appears. If it's not overfitted, it will also convert similar eyeballs to similar tails. If you want only some eyeballs to become tails, you can't do that without introducing overfitting, or enlarging the kernel until the layer sees enough context to distinguish which eyeballs should become tails and which shouldn't.
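The translation property can be checked numerically. Below is a minimal NumPy sketch (not part of the original answer): shifting the input and then convolving gives the same result as convolving and then shifting the output, away from the border:

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Naive single-channel 'valid' 2d convolution."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
img = rng.random((8, 8))
kernel = rng.random((3, 3))

shifted = np.roll(img, 1, axis=1)                              # shift input right by one pixel
out_then_shift = np.roll(conv2d_valid(img, kernel), 1, axis=1) # convolve, then shift the output
shift_then_out = conv2d_valid(shifted, kernel)                 # shift, then convolve

# away from the wrap-around border, the two orders agree
assert np.allclose(out_then_shift[:, 1:], shift_then_out[:, 1:])
```

Whatever transformation the kernel applies at one location, it applies identically at every other location; that is exactly why it cannot treat two identical eyeballs differently.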

If you want to change one image into a specific other image, and don't care what happens to all other images, it might be possible to create a convolution layer that does this transformation. The input image has to be different wherever the output image is different, or else the convolution layer won't be able to produce that difference in the output image. You would be teaching it to recognize the specific pixel patterns in the input image and generate the specific pixels in the output image. This would be an extreme case of overfitting and wouldn't work for any other input images.

The number of channels in the input and output image is irrelevant, except that more channels means the network has more data to learn from, obviously.
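To see that channel counts pose no obstacle, a 1x1 convolution already changes the number of channels: it is just a per-pixel linear mix of the input channels. A NumPy sketch (the RGB-to-gray weights are the common luma coefficients, used here only as an illustration):

```python
import numpy as np

rgb = np.random.default_rng(1).random((4, 4, 3))  # H x W x 3 image
weights = np.array([0.299, 0.587, 0.114])         # one 1x1 kernel: 3 channels in, 1 out
gray = rgb @ weights                              # H x W, single channel
assert gray.shape == (4, 4)

# going the other way (1 channel -> 4 channels) just needs four 1x1 kernels
w_up = np.ones((1, 4))                            # out_channels = 4
rgba_like = gray[..., None] @ w_up                # H x W x 4
assert rgba_like.shape == (4, 4, 4)
```

Of course, expanding channels this way adds no new information; it only re-represents the existing values, which matches the point above.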

user253751
  • **The number of channels in the input and output image is irrelevant, except that more channels means the network has more data to learn from, obviously.** Does this mean it is possible to convert an image with $k$ channels into an image with $n$ channels, where $k \ne n$? – hanugm Jul 30 '21 at 22:33
  • @hanugm Yes, you can do this in any image processing software. Of course, if you change to a better format, it will not magically improve the data, and if you go to a worse format, the quality of the image will decrease. For example, you can convert a colour image to greyscale. – user253751 Jul 31 '21 at 00:02
  • If possible, please add to the answer that an image with $k_1$ channels can be converted by a convolution layer into an image with $k_2$ channels, where $k_1 \ne k_2$. – hanugm Jul 31 '21 at 00:17
  • @hanugm You think there is something magical about channels for some reason. There is not. – user253751 Jul 31 '21 at 08:28
  • True. Due to my misconception about the word channels I am facing complications. – hanugm Jul 31 '21 at 08:59
  • @hanugm In an image, Red is a channel. Green is a channel. Blue is a channel. You should have no problem understanding that images are made of red, green and blue components. [here is a little visualization](https://commons.wikimedia.org/wiki/File:RGB_channels_separation.png) They are fully separate to the computer. Or if you are asking about channels in intermediate layers (are you?), those are the same thing but with a different purpose. Those won't be red, green or blue. They'll be more like, amount of horizontal lines, amount of vertical lines, amount of circles. – user253751 Jul 31 '21 at 09:57
  • I personally know that an RGB image has 3 matrices, one per color dimension. But, in the literature, the word channel is confusing for me. For example, [Conv3d](https://pytorch.org/docs/stable/generated/torch.nn.Conv3d.html) has two parameters, `in_channels` and `out_channels`. I am not sure whether these channels are the image dimensions I am thinking of. If so, in many programs I saw their value go beyond 3, while working on RGB images only. #1/#2 – hanugm Jul 31 '21 at 10:04
  • Similarly, I encountered some [other explanations using the word "channels"](https://ai.stackexchange.com/questions/29936/what-is-an-additional-channel-dimension-contain-in-batch-normalization) which I cannot reconcile with the interpretation I had. #2/#2 – hanugm Jul 31 '21 at 10:04
  • You can see the [confusions I have](https://ai.stackexchange.com/questions/tagged/channel) due to the word **channel**. In fact, I created the tag channel due to my misconceptions or some other reason. – hanugm Jul 31 '21 at 10:07
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/128072/discussion-between-hanugm-and-user253751). – hanugm Jul 31 '21 at 10:12