So I understand how a language model could scan a large dataset like the internet and produce text that mimics the statistical properties of the input data, eg completing a sentence like "eggs are healthy because ...", or producing text that sounds like the work of a certain author.
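To be concrete, here's the kind of toy model I have in mind: just counting which word tends to follow which and sampling from those counts. (This is a made-up sketch in Python, obviously not how ChatGPT actually works, just my mental picture of "mimicking statistical properties".)

```python
from collections import defaultdict, Counter
import random

# Tiny corpus standing in for "the internet"
corpus = "eggs are healthy because eggs are rich in protein and eggs are cheap"

# Count, for each word, which words follow it and how often
following = defaultdict(Counter)
words = corpus.split()
for current, nxt in zip(words, words[1:]):
    following[current][nxt] += 1

def complete(prompt, length=5):
    """Extend the prompt by repeatedly sampling a statistically likely next word."""
    out = prompt.split()
    for _ in range(length):
        candidates = following.get(out[-1])
        if not candidates:
            break
        # Pick the next word in proportion to how often it followed this one
        choices, counts = zip(*candidates.items())
        out.append(random.choices(choices, weights=counts)[0])
    return " ".join(out)

print(complete("eggs are"))  # eg "eggs are rich in protein and ..."
```

That kind of thing I can wrap my head around.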
However, what I don't get about ChatGPT is that it seems to understand the commands it's given, even when those commands weren't part of its training data, and it can perform tasks that go well beyond extrapolating more data from the given dataset. My (admittedly imperfect) understanding of machine learning doesn't really account for how such a model could follow novel instructions without some kind of authentic understanding of the writer's intentions, which ChatGPT supposedly doesn't have.
A clear example: if I ask "write me a story about a cat who wants to be a dentist", I'm pretty sure there are zero examples of that in the training data. So even if it has a lot of training data, how does that help it produce an answer that combines the cat and dentist elements in novel ways? Eg:
Despite his passion and talent, Max faced many challenges on his journey to become a dentist. For one thing, he was a cat, and most people didn't take him seriously when he told them about his dream. They laughed and told him that only humans could be dentists, and that he should just stick to chasing mice and napping in the sun.
But Max refused to give up. He knew that he had what it takes to be a great dentist, and he was determined to prove everyone wrong. He started by offering his services to his feline friends, who were more than happy to let him work on their teeth. He cleaned and polished their fangs, and he even pulled a few pesky cavities.
In the above text, the bot is writing things about a cat dentist that wouldn't appear in any training-data stories about cats or about dentists.
Similarly, how can any amount of training data on computer code help a language model debug novel code examples? If the system isn't actually accumulating conceptual understanding the way a person would, what is it accumulating from its training data that lets it solve novel prompts? It doesn't seem possible to me that you could look at the linguistic content of many programs and come away with a function that maps queries to correct explanations unless you were actually modeling conceptual understanding.
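To make that concrete, here's the kind of snippet I have in mind (the function and the bug are invented for illustration, so this exact code is surely not in the training data), yet ChatGPT will typically spot that the loop skips the last element and explain the fix:

```python
# Made-up buggy code: the loop stops one element early
def sum_of_squares(numbers):
    total = 0
    for i in range(len(numbers) - 1):   # bug: should be range(len(numbers))
        total += numbers[i] ** 2
    return total

print(sum_of_squares([1, 2, 3]))  # prints 5, but the correct answer is 14
```

How does predicting likely next words over other people's code get you to "your loop is off by one"? That's the part I can't square with my word-frequency mental model.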
Does anyone have a way of understanding this at a high level for someone without extensive technical knowledge?