
In practical applications, we generally talk about three types of convolution layers: 1-dimensional, 2-dimensional, and 3-dimensional. Most popular packages, such as PyTorch and Keras, provide Conv1d, Conv2d, and Conv3d.

What is the deciding factor for the dimensionality of the convolution layers provided in these packages?

nbro
  • 39,006
  • 12
  • 98
  • 176
hanugm
  • 3,571
  • 3
  • 18
  • 50
  • What do you mean by "dimensionality of the convolution layer"? A convolutional layer is just **an abstraction to perform the convolution** (i.e. a function). So, to me, it doesn't make sense to ask "What is the dimensionality of a function?". So, you should start by clarifying this before proceeding. Also, I don't understand this sentence "Neither the shape of the input nor the shape of the kernel decides the dimensionality of the convolution operation." because I don't understand what you mean by "dimensionality of a layer". So, you should also clarify that sentence and why you claim that. – nbro Oct 01 '21 at 15:23
  • @nbro Is it fine now? I am asking about the dimensionality of the layers that are discussed in packages. – hanugm Oct 07 '21 at 10:09
  • To be honest, your last edit doesn't clarify what was in my opinion unclear. As I said, convolutional layers should be seen more as functions rather than tensors (indeed, they are just an implementation of a function, i.e. the convolution). So, I don't understand what you mean by "dimensionality of a function here". Maybe you mean "the dimensionality of the input to and output of the convolutional layer" That would make more sense. Like in the function $f : \mathbb{R}^n \rightarrow \mathbb{R}$. If that's not what you mean, I don't understand what you mean. – nbro Oct 07 '21 at 14:51
  • If you're asking about whether we use 3d or 2d convolutions, then that's a different story than what your post seems to suggest. If we use 3d or 2d convolutions, it's a matter of choice of the programmer (usually). Anyway, please, edit your post to clarify what you really mean by "dimensionality of the convolutional layer". – nbro Oct 07 '21 at 14:55
  • Are you now asking "What do people refer to when they use the word 'dimensionality' in the context of convolutional layer"? If yes, I would suggest that you remove the sentences "The dimensionality I am referring to is obvious from this context." and "Neither the shape of the input nor the shape of the kernel decides the dimensionality of the convolution operation.". It may also be a good idea to quote someone that used this terminology, i.e. "dimensionality of convolutional layer". The context will probably be very useful to answer that question. – nbro Oct 08 '21 at 13:17
  • Ok, now, the question is clearer to me, but still not completely clear. Are you asking 1. how we choose between conv1d, conv2d, and conv3d, or 2. what makes, for example, a conv2d a 2d convolution rather than a 1d or 3d convolution, i.e. why is e.g. a conv2d a convolution in 2d? These questions are related, but different, I guess. – nbro Oct 08 '21 at 20:05
  • I am asking for 2. Which element/entity decides 1d, 2d, or 3d? @nbro – hanugm Oct 08 '21 at 21:42

2 Answers

1

The kernel dimensionality, together with the presence of multiple filters, decides the dimension of the convolution operator. An N-dimensional convolution has an N-dimensional kernel. For example, from the Keras documentation on 2-dimensional convolutions:

kernel_size: An integer or tuple/list of 2 integers, specifying the height and width of the 2D convolution window. Can be a single integer to specify the same value for all spatial dimensions.

If you have more than one filter in the layer, that also adds another dimension to the layer. So we can say that a 2D convolutional layer is, in general, 3-dimensional, where the 3rd dimension is the number of filters: (k, k, F). For the special case of a single filter, F = 1, and we can treat it as 2-dimensional.
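
As a concrete check (a minimal sketch, assuming PyTorch rather than Keras, though the shapes are analogous), the weight tensor of each layer type shows the N-dimensional spatial kernel, plus one axis for the input channels and one for the number of filters:

```python
import torch.nn as nn

# Sketch (assuming PyTorch): an N-d convolution layer stores an N-d spatial kernel,
# plus one axis for the number of filters (out_channels) and one for input channels.
conv1d = nn.Conv1d(in_channels=3, out_channels=8, kernel_size=5)
conv2d = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=5)
conv3d = nn.Conv3d(in_channels=3, out_channels=8, kernel_size=5)

print(conv1d.weight.shape)  # torch.Size([8, 3, 5])       -> 1 spatial dimension
print(conv2d.weight.shape)  # torch.Size([8, 3, 5, 5])    -> 2 spatial dimensions
print(conv3d.weight.shape)  # torch.Size([8, 3, 5, 5, 5]) -> 3 spatial dimensions
```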

serali
  • 890
  • 6
  • 16
  • Please check [this](https://ai.stackexchange.com/a/29977/18758) answer – hanugm Oct 01 '21 at 11:35
  • @hanugm Yes, that is the case in the first layer for a colour image: a separate kernel is used for each colour. Still, each kernel is 2-dimensional. And if we have an additional convolutional layer with 16 filters, we still have 2-dimensional kernels, but now 16 of them. – serali Oct 01 '21 at 11:58
  • 1
    Worth noting that for efficiency purposes a Conv2D layer is very likely running a 3D convolution operation when feeding forward. This is used to sum over the input channels. However, it is still useful to think of this as conceptually a sum of 2D convolutions when looking from the perspective of signal processing from the input. – Neil Slater Oct 08 '21 at 13:40
1

The dimensionality used to describe convolutional layers in CNNs is based on the dimensionality of the input, not counting channels (a shape sketch follows the list below).

  • 1D CNNs might process raw audio sources (mono or stereo), text sequences, or IR spectrometry from a single sample point
  • 2D CNNs can process photographic images (regardless of colour/depth information), audio spectrograms, or grid-based board games
  • 3D CNNs can process voxels from Minecraft, image sequences from videos, etc.
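
Here is a quick shape sketch (assuming PyTorch; the batch, channel, and spatial sizes are arbitrary) of how such inputs are typically laid out, with channels kept separate from the spatial/temporal axes:

```python
import torch
import torch.nn as nn

# Channels (stereo, RGB, ...) are one axis; the layer's "dimensionality"
# counts only the remaining spatial/temporal axes.
stereo_audio = torch.randn(1, 2, 16000)         # (batch, 2 channels, samples)   -> Conv1d
rgb_image    = torch.randn(1, 3, 224, 224)      # (batch, 3 channels, H, W)      -> Conv2d
video_clip   = torch.randn(1, 3, 16, 112, 112)  # (batch, 3 channels, T, H, W)   -> Conv3d

print(nn.Conv1d(2, 8, kernel_size=3)(stereo_audio).shape)  # torch.Size([1, 8, 15998])
print(nn.Conv2d(3, 8, kernel_size=3)(rgb_image).shape)     # torch.Size([1, 8, 222, 222])
print(nn.Conv3d(3, 8, kernel_size=3)(video_clip).shape)    # torch.Size([1, 8, 14, 110, 110])
```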

It is often possible to perform signal processing that changes the dimensions of signal sources. Whether that adds "channels" or adds a dimension can be a matter of convenience to fit a particular approach. In terms of defining an n-dimensional array, the addition of channels is just another dimension. In terms of the signal processing performed in CNNs, we care about the distinction between channels and the rest of the space that the signal exists in.

One way to decide whether something is considered a channel or a CNN layer dimension is whether there is an ordering or metric that consistently separates measurements over that dimension. If a metric such as space, time or frequency applies, then that dimension can be considered part of the "core" dimensionality that defines the problem, whilst a more arbitrary set of features (e.g. each entry in the vector embedding of a word) is more channel-like.
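
As an illustration of the word-embedding case (a sketch, assuming PyTorch; the embedding size and sequence length are arbitrary): word position is the ordered axis that the convolution slides over, while the embedding entries behave like channels:

```python
import torch
import torch.nn as nn

# A sentence of 10 word embeddings (dim 300): one ordered axis (word position)
# and one unordered, channel-like axis (embedding entries) -> 1D convolution.
embeddings = torch.randn(1, 300, 10)   # (batch, channels = embedding dim, sequence length)
conv = nn.Conv1d(in_channels=300, out_channels=64, kernel_size=3)
print(conv(embeddings).shape)          # torch.Size([1, 64, 8])
```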

Standard CNN design sums over all input channels to create each output feature/channel, which is mathematically the same as increasing the convolution dimension by one (with the kernel size in that extra dimension equal to the number of channels). So, in practice, the convolution operation implemented in a CNN layer of a particular dimensionality can be one dimension higher. E.g. a layer class labelled "Conv1D" will perform a 2D convolution operation, with the added dimension size matching exactly the number of input channels. However, conceptually it makes sense to view this as a sum of lower-dimension convolutions, because of the need to exactly match the dimension size. The extra dimension is a convenience for calculation, not part of the definition.
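
This equivalence can be checked directly (a minimal sketch, assuming PyTorch): a Conv1d over C input channels gives the same result as a 2D convolution whose kernel height is exactly C:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, in_channels, length = 4, 3, 32
x = torch.randn(batch, in_channels, length)

# "1D" convolution over a 3-channel signal...
conv1d = nn.Conv1d(in_channels, out_channels=8, kernel_size=5, bias=False)

# ...matches a 2D convolution whose kernel height equals the number of input channels.
conv2d = nn.Conv2d(1, out_channels=8, kernel_size=(in_channels, 5), bias=False)
with torch.no_grad():
    conv2d.weight.copy_(conv1d.weight.unsqueeze(1))  # (8, 3, 5) -> (8, 1, 3, 5)

out1 = conv1d(x)               # shape (4, 8, 28)
out2 = conv2d(x.unsqueeze(1))  # shape (4, 8, 1, 28)
print(torch.allclose(out1, out2.squeeze(2), atol=1e-6))  # True
```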

Neil Slater
  • 28,678
  • 3
  • 38
  • 60