
Most practical AI research involving neural networks deals with higher-dimensional tensors. It is easy to imagine tensors of up to three dimensions.

When I asked the question *How do researchers imagine vector space?* on Mathematics Stack Exchange, I received the following responses:

Response #1:

I personally view vector spaces as just another kind of algebraic object that we sometimes do analysis with, along the lines of groups, rings, and fields.

Response #2:

In research mathematics, linear algebra is used mostly as a fundamental tool, often in settings where there is no geometric visualization available. In those settings, it is used in the same way that basic algebra is, to do straightforward calculations.

Response #3:

Thinking of vectors as tuples or arrows or points and arrows... is rather limiting. I generally do not bother imagining anything visual or specific about them beyond what is required by the definition... they are objects that I can add to one another and that I can "stretch" and "reverse" by multiplying by a scalar from the scalar field.

In short, mathematicians generally treat vectors as objects in a vector space rather than relying on popular beginner visualizations such as points or arrows in space.

A similar question on our site also recommends not imagining higher dimensions and instead treating dimensions as degrees of freedom.

I know only two kinds of treatments regarding tensors:

  1. Imagining tensors of up to three dimensions spatially.

  2. Treating tensors as objects with a shape attribute of the form $n_1 \times n_2 \times n_3 \times \cdots \times n_d$ (see the short sketch after this list).
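
For concreteness, here is a minimal PyTorch sketch of the second treatment; the tensor and its sizes are arbitrary examples:

```python
import torch

# A 4-dimensional tensor treated purely through its shape attribute
t = torch.zeros(2, 3, 4, 5)
print(t.shape)   # torch.Size([2, 3, 4, 5]), i.e. n1 x n2 x n3 x n4
```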

Most of the time I prefer the first approach. But the first approach becomes difficult when I try to understand code (programs) that uses higher-dimensional tensors. I am not accustomed to the second approach, although I think it is sufficient for understanding all the required operations on tensors.

I want to know:

  • How do researchers generally treat tensors?
  • If it is the second approach I mentioned: is it sufficient for understanding all high-dimensional tensor-related tasks?
hanugm
  • I am curious why, while you posted [How do researchers imagine vector space?](https://math.stackexchange.com/questions/4185297/how-do-researchers-imagine-vector-space) to Math SE (where it arguably belongs indeed), you decided to post a similar question about a different mathematical object (tensor) *here* instead of there as well. – desertnaut Aug 04 '21 at 19:13
  • 2
    @desertnaut It is because I found the usage of tensors in AI at some places are difficult for me to understand. Suppose [Unsqueeze](https://pytorch.org/docs/stable/generated/torch.unsqueeze.html) operation in PyTorch. These type of operations on tensors may be new for mathematicians. So, I choose to ask in our site. – hanugm Aug 04 '21 at 22:47
  • I agree. Understanding and manipulating tensors is a crucial skill, especially for working with neural networks. – Andre Goulart Aug 23 '21 at 18:32
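
For reference, a minimal sketch of the unsqueeze operation mentioned above; the values and shapes are illustrative:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])  # shape: torch.Size([3])

# unsqueeze inserts a dimension of size 1 at the given position
row = torch.unsqueeze(x, 0)        # shape: torch.Size([1, 3])
col = torch.unsqueeze(x, 1)        # shape: torch.Size([3, 1])
```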

1 Answer


I would say they are treated as multidimensional arrays of numbers. They are not visualized in their actual dimension. Sometimes small ones will be visualized when someone is trying to explain a concept that requires it.

You may have, for example, a variable `uint8 training_batch[100][200][400][3];`. This is a batch of 100 RGB images with 200x400 pixels in each image. A pixel is an array of `[3]` numbers; an image is an array of `[200][400]` pixels; a batch is an array of `[100]` images. There's no more structure than that. You don't have to try to imagine a 4D array of numbers. (In this particular case, you could easily imagine an array of images, though.)

What is useful to imagine is what each dimension means. The first dimension is the image within the batch. The 2nd and 3rd dimensions are the pixel position in the image. The 4th dimension is the R/G/B channel.
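
A minimal PyTorch sketch of this layout; the variable name and sizes simply mirror the example above:

```python
import torch

# A batch of 100 RGB images, 200x400 pixels each (channels-last layout)
training_batch = torch.zeros(100, 200, 400, 3, dtype=torch.uint8)

print(training_batch.shape)  # torch.Size([100, 200, 400, 3])
# dim 0: image within the batch
# dims 1, 2: pixel position in the image
# dim 3: R/G/B channel
```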

If I reduce a tensor along a dimension, I wouldn't think of it as flattening, but rather as using up a dimension. If I want to compute the average colour of each image, I reduce the 2nd and 3rd dimensions and get another tensor of shape `[100][3]`. Now there's no width or height dimension anymore, just image and channel.
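
A short sketch of that reduction (using a float tensor, since the mean of the `uint8` batch above would require a conversion first):

```python
import torch

batch = torch.rand(100, 200, 400, 3)   # float batch, same layout as above

# reduce ("use up") the height and width dimensions
avg_colour = batch.mean(dim=(1, 2))    # shape: torch.Size([100, 3])
```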

If you reshape the tensor to `[100][240000]` so you can compute a matrix multiplication for a dense layer, the 1st dimension is still the batch number, and the 2nd dimension is essentially meaningless: you have 240000 arbitrarily-indexed numbers per image. You could also reshape it to `[100][80000][3]` and have 80000 arbitrarily-indexed pixels, but still be able to use the channel number.
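
A quick sketch of both reshapes, continuing the assumed 100x200x400x3 batch:

```python
import torch

batch = torch.rand(100, 200, 400, 3)

flat = batch.reshape(100, 240000)       # 200 * 400 * 3 = 240000 numbers per image
pixels = batch.reshape(100, 80000, 3)   # 200 * 400 = 80000 pixels, channel kept
```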

Disclaimer: I'm not actually a researcher.

user253751