As you know, an LSTM language model takes in the previous words and tries to predict the next one, repeating this in a loop. A sentence is first divided into tokens, and different methods divide it into tokens differently. Some models are character-based: they simply use each character as input and output. In that case you can treat each punctuation mark as one more character and run the model as normal. In word-based models, which many systems use, punctuation gets its own token, and the sentence-final period effectively acts as an end-of-sentence token. There is usually also a special end-of-output token, which lets the system know when to finish and stop predicting.
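To make that concrete, here is a toy word-level tokenizer (my own sketch, not any particular library's API) that keeps punctuation as separate tokens and appends an end marker; the `<eos>` name is just a convention I picked:

```python
import re

EOS = "<eos>"  # illustrative end-of-sentence/output marker

def tokenize(sentence):
    # \w+ grabs words, [^\w\s] grabs each punctuation mark as its own token.
    tokens = re.findall(r"\w+|[^\w\s]", sentence.lower())
    tokens.append(EOS)
    return tokens

print(tokenize("Hello, world! How are you?"))
# ['hello', ',', 'world', '!', 'how', 'are', 'you', '?', '<eos>']
```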
Also, just so you know: a language model generating original text feeds each output back in as the input for the next step, but the output it picks is not necessarily the single most probable token. Instead of always taking the top prediction, it samples from the predicted distribution, often after applying a threshold (such as keeping only the k most likely tokens). This introduces diversity, so that even when the starting word is the same, the generated sentence or paragraph comes out different rather than identical every time.
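As an illustration, here is one common way such sampling is done (a sketch; the exact thresholding scheme varies between systems, and the `temperature` and `top_k` knobs are my own additions):

```python
import numpy as np

def sample_next_token(probs, temperature=1.0, top_k=3):
    """Sample a token id from the model's softmax output.

    Lower temperature sharpens the distribution; top_k is the
    threshold that keeps only the k most probable candidates.
    """
    logits = np.log(probs + 1e-9) / temperature
    top = np.argsort(logits)[-top_k:]  # indices of the k best tokens
    p = np.exp(logits[top] - logits[top].max())
    p /= p.sum()                       # renormalize over the top k
    return int(np.random.choice(top, p=p))

# Toy distribution over a 5-token vocabulary: usually picks token 0,
# but sometimes 1 or 2, which is where the diversity comes from.
probs = np.array([0.5, 0.2, 0.15, 0.1, 0.05])
print(sample_next_token(probs, temperature=0.8))
```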
For a state-of-the-art model, you can try GPT-2, as mentioned by @jdleoj23 . It uses attention and the Transformer architecture, and it operates on bytes rather than whole words: its tokenizer is a byte-level byte-pair encoding (BPE), so any Unicode string can be encoded. The advantage of working below the word level is that inputs containing spelling errors can still be fed to the model, and new words not in any dictionary can be handled.
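If you just want to play with it, here is a minimal sketch using the Hugging Face `transformers` library (my suggestion, not something from the original question; install `transformers` and `torch` first):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Encode a prompt and sample a continuation.
input_ids = tokenizer.encode("The meaning of life is", return_tensors="pt")
output = model.generate(
    input_ids,
    max_length=40,
    do_sample=True,  # sample instead of always taking the top token
    top_k=50,        # threshold: keep only the 50 most probable tokens
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```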
However, if you want to learn how language models work, rather than just chase the best performance, you should try implementing a simple one yourself. You can follow this article, which uses Keras to build a word-based neural language model:
https://machinelearningmastery.com/develop-word-based-neural-language-models-python-keras/
The advantage of building a simple one is that you actually understand the encoding, the tokenization, the model underneath, and so on, instead of relying on other people's code. The article uses the Keras Tokenizer, but you could try writing your own using regex and simple string processing, then plug it into a small model like the sketch below.
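To give a rough idea of what the whole pipeline looks like, here is a sketch loosely following the article's one-word-in, one-word-out approach (the training text, layer sizes, and epoch count are arbitrary toy choices):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import to_categorical

text = "the cat sat on the mat . the dog sat on the rug ."

# filters="" keeps the "." as its own token instead of stripping it.
tokenizer = Tokenizer(filters="")
tokenizer.fit_on_texts([text])
vocab_size = len(tokenizer.word_index) + 1
ids = tokenizer.texts_to_sequences([text])[0]

# Training pairs: predict ids[i + 1] from ids[i].
X = np.array(ids[:-1]).reshape(-1, 1)
y = to_categorical(ids[1:], num_classes=vocab_size)

model = Sequential([
    Embedding(vocab_size, 10),                # word id -> dense vector
    LSTM(50),                                 # encodes the context
    Dense(vocab_size, activation="softmax"),  # distribution over next word
])
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.fit(X, y, epochs=300, verbose=0)

# Greedily pick the word most likely to follow "sat".
probs = model.predict(np.array([[tokenizer.word_index["sat"]]]), verbose=0)
id_to_word = {i: w for w, i in tokenizer.word_index.items()}
print(id_to_word.get(int(np.argmax(probs)), "?"))  # most likely "on"
```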
Hope this helps!