I am interested in what insights can be gained about the mathematical class of auto-regressive encoder-decoders (LLMs) by comparing them to topological neural networks.
Specifically, I am looking for similarities and differences in their structures, behaviors, and mathematical properties.
For the purposes of this question, an LLM is a neural network designed to generate sequences of data, such as sentences in a language, by learning to predict the next item in a sequence from the preceding items.
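To make the auto-regressive setup concrete, here is a minimal sketch of next-item prediction and generation. It uses a toy bigram count model in place of a neural network (the corpus, greedy decoding, and `generate` helper are illustrative assumptions, not how any particular LLM is implemented), but the generation loop has the same shape: each step conditions on the sequence produced so far.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for training data; a real LLM learns these
# statistics with a deep network rather than raw counts.
corpus = "the cat sat on the mat the cat ran".split()

# Count bigram transitions: frequency of each next token given the previous one.
transitions = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev][nxt] += 1

def generate(start, length):
    """Auto-regressively extend a sequence: each step predicts the next
    item from the items generated so far (here, only the last one)."""
    seq = [start]
    for _ in range(length):
        options = transitions[seq[-1]]
        if not options:
            break  # no observed continuation
        seq.append(options.most_common(1)[0][0])  # greedy next-item choice
    return seq

print(generate("the", 4))
```

An actual LLM replaces the count table with a learned conditional distribution over the whole preceding context, but the "predict, append, repeat" loop is the same.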
Any insights, references, or resources that could help clarify this would be greatly appreciated.