
I am training a neural network where the target data is a vector of angles in radians (between $0$ and $2\pi$).

I am looking for study material on how to encode this data.

Can you supply me with a book or research paper that covers this topic comprehensively?

  • Many suggestions here: https://stats.stackexchange.com/questions/565038/why-doesnt-mean-square-error-work-in-case-of-angular-data/565057#565057 – Sycorax Nov 27 '22 at 22:19

2 Answers


The main problem with simply using the raw values $\alpha \in [0, 2\pi]$ is that semantically $0 = 2\pi$, while numerically $0$ and $2\pi$ are maximally far apart. A common way to encode this is as a vector of $\sin$ and $\cos$ values. It perfectly conveys the fact that $0 = 2\pi$, because:

$$ \begin{bmatrix} \sin(0)\\ \cos(0) \end{bmatrix} = \begin{bmatrix} \sin(2\pi)\\ \cos(2\pi) \end{bmatrix} $$

This encoding essentially maps the angle values onto the 2D unit circle. To decode it, you can calculate $$\alpha = \operatorname{atan2}(a_1, a_2),$$

where $a_1 = \sin(\alpha)$ and $a_2 = \cos(\alpha)$.
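
For concreteness, here is a minimal NumPy sketch of this encode/decode round trip (the function names are illustrative, not from any library):

```python
import numpy as np

def encode_angle(alpha):
    """Map angles in [0, 2*pi) onto the 2D unit circle as (sin, cos) pairs."""
    return np.stack([np.sin(alpha), np.cos(alpha)], axis=-1)

def decode_angle(encoded):
    """Recover the angle from its (sin, cos) encoding via atan2."""
    a1, a2 = encoded[..., 0], encoded[..., 1]
    # arctan2 returns values in (-pi, pi]; wrap them back into [0, 2*pi)
    return np.arctan2(a1, a2) % (2 * np.pi)

alphas = np.array([0.0, np.pi / 2, 3.0, 2 * np.pi - 0.01])
assert np.allclose(decode_angle(encode_angle(alphas)), alphas)
```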

Here is a nice detailed explanation, and here are two references where this is applied.

EDIT: As noted in the comments, the values $\sin(\alpha)$ and $\cos(\alpha)$ are not independent; the identity $\sqrt{\sin(\alpha)^2 + \cos(\alpha)^2} = 1$ always holds, i.e. the Euclidean norm is one. When your neural network predicts the sin and cos values, however, this condition isn't necessarily satisfied. Therefore, you should consider adding a regularization term to the loss that guides the network toward outputting valid values (with unit norm), which could look like this:

$$ r_\lambda\left(\hat{y}_1, \hat{y}_2\right) = \lambda \left(1 - \sqrt{\hat{y}_1^2 + \hat{y}_2^2}\right), $$

where $\hat{y}_1$ and $\hat{y}_2$ are the sin and cos outputs of the network, respectively, and $\lambda$ is a scalar that weights the regularization term against the loss. I found this paper, where such a regularization term is used (see Sec. 3.2) to obtain valid quaternions (quaternions must also have unit norm). The authors found that many values of $\lambda$ work and settled on $\lambda = 0.1$.
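
As an illustration, here is a minimal PyTorch sketch of such a regularized loss. The function name and the plain MSE task loss are my own assumptions, not from the paper, and I penalize the absolute deviation from unit norm so the term stays nonnegative when the predicted norm exceeds one:

```python
import torch

def angle_loss(pred, target, lam=0.1):
    """MSE on (sin, cos) targets plus a unit-norm regularizer.

    pred, target: tensors of shape (batch, 2) holding (sin, cos) pairs.
    lam: weight of the regularization term.
    """
    mse = torch.mean((pred - target) ** 2)    # task loss on the encoding
    norm = torch.linalg.norm(pred, dim=1)     # ||(y_hat_1, y_hat_2)||_2 per sample
    reg = torch.mean(torch.abs(1.0 - norm))   # deviation from the unit circle
    return mse + lam * reg
```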

Chillston
  • Do you know if there is a general approach for that? The closest thing that comes to my mind is group convolutional networks. Personally, though, I would prefer a simpler, less computation-heavy method – Imago Nov 27 '22 at 13:12
  • I'm not sure what exactly you are looking for when you ask for a general approach - on the one hand, there is general angular data (which you can encode with the method described). And then there is the context of those values (i.e. the domain). If the domain involves rotational symmetry then you should look into group equivariant convolutional networks (which is probably what you mean by group convolutional network, right?). Without knowing the domain, it is not clear if group equivariant convolutions are a reasonable thing to use. What is the dataset you are using? – Chillston Nov 27 '22 at 13:31
  • You only need to use $\sin(\alpha)$ (or $\cos(\alpha)$) as the label, right? No need (and undesirable) to have both as multilabel outputs, given their strict dependence. – Snehal Patel Nov 27 '22 at 15:23
  • Actually, you should use both, otherwise you get singularities; e.g. for $\sin$ this happens at $\sin(0) = \sin(\pi) = \sin(2\pi)$. Whereas, if you encode it as the tuple, you get a unique vector for all $\alpha \in [0, 2\pi)$ with the property that the encoding is the same at $0$ and $2\pi$. You are right that the values are dependent. I've often seen a regularization term in NN training like $\|[x_1, x_2]^T\|_2 = 1$, which guides the network toward outputting valid numbers for the two outputs $x_1, x_2$ where the norm is one. – Chillston Nov 27 '22 at 16:45
  • Thanks for pointing that out @SnehalPatel, I added the regularization term to the answer – Chillston Nov 27 '22 at 17:02
  • @Chillston I didn't have a specific dataset in mind. I was wondering how one would handle the case where the given data has some symmetries, structure, or invariances toward specific transformations. In the case of angular data as above, a neural network can learn to treat the regions around angle $0$ and angle $2\pi$ equally, but it is not guaranteed that the network will pick up on that structure. So, how can one simplify this process in general, as done with the angle problem above? – Imago Nov 30 '22 at 14:57
  • @Imago The general form of that would be group-specific neural networks. So when you have a symmetry w.r.t. a specific group, you can design neural operations that obey these symmetries. For 2D rotation-invariant data, you'd want a kernel that is itself invariant to the orthogonal group O(2). However, this is a different concept from what the question is about. Proper angle encoding doesn't necessarily mean that you have rotational symmetry. Maybe you'll find the [proto-book on Geometric Deep Learning](https://geometricdeeplearning.com/) a good read (or you already know it). – Chillston Dec 01 '22 at 12:49

You might want to look at the von Mises distribution, which defines a probability distribution over angles.

See *Pattern Recognition and Machine Learning*, Christopher Bishop, Appendix B (p. 693); alternatively, Wikipedia has an article on it.

You could certainly use this as a loss function in a neural network. My only reservation is that the location parameter $\theta_0$ is itself periodic, which might not play well with standard neural network architectures. Therefore, the previous answers are also worth considering.

I mention this only because, if you are interested in distributions over angles, the von Mises distribution is something you should probably be aware of.

  • I'm not sure I understand what you mean by "You could certainly use this as a loss function in a neural network." Can you explain how the probability density function can be used as a loss function? – Snehal Patel Nov 28 '22 at 14:47
  • The von Mises distribution defines a probability density. You take the logarithm of this function and then take the negative of that; that is the loss function. This provides what I think is a pretty good explanation of the basic approach: [link](https://goodboychan.github.io/python/coursera/tensorflow_probability/icl/2021/08/19/01-Maximum-likelihood-estimation.html#The-negative-log-likelihood) – Julian Francis Nov 29 '22 at 15:15
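
To make the comment above concrete, here is a hedged PyTorch sketch of such a loss, written directly from the von Mises density $f(\theta \mid \mu, \kappa) = \exp(\kappa \cos(\theta - \mu)) \,/\, (2\pi I_0(\kappa))$; the function name is illustrative:

```python
import math
import torch

def von_mises_nll(theta, mu, kappa):
    """Negative log-likelihood of observed angles theta under a von Mises
    distribution with predicted mean mu and concentration kappa > 0."""
    # Log normalizer log(2*pi*I0(kappa)); torch.special.i0 is the
    # modified Bessel function of the first kind, order zero.
    log_norm = math.log(2 * math.pi) + torch.log(torch.special.i0(kappa))
    return torch.mean(-kappa * torch.cos(theta - mu) + log_norm)
```

For large $\kappa$, $I_0(\kappa)$ overflows; a stabler variant uses the exponentially scaled Bessel function `torch.special.i0e`, via $\log I_0(\kappa) = \kappa + \log I_0^{e}(\kappa)$.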