I am currently trying to implement an end-to-end speech recognition system from scratch, that is, without using any of the existing frameworks (like TensorFlow, Keras, etc.). I am building my own library, where I am trying to do a polynomial approximation of functions (like exponential, log, sigmoid, ReLU, etc). I would like to have access to a nice description of the neural networks involved in an end-to-end speech recognition system, where the architecture (the layers, activation functions, etc.) is clearly laid out, so that I can implement it.
I find most of the academic or industry papers citing various previous works, toolkits or papers, making it tedious for me. I am new to the field, so I am having more difficulty, so looking for some help here.