I can't speak to wit.ai specifically, but I can tell you a bit about how similar applications work. In particular, I can describe Apache Stanbol, which also converts free text into structured data. That said, I should preface this by saying there isn't just one way to "get there from here"; many different techniques could be part of a stack that accomplishes this goal.
Anyway, in the case of Stanbol, the text is run through multiple processing engines sequentially, with each engine contributing to the final output. One engine does Named Entity Recognition (NER) using OpenNLP. This identifies discrete named "things" - people, places, companies, etc. Another engine matches those entities against a pre-established database - in the out-of-the-box configuration, a dump of entities from DBpedia. Where a match is found, the text from the original input is assigned to the entity. In the case of a collision (multiple candidate entities for the same text), each mapping is assigned a weight so that downstream consumers can use probabilistic techniques to select the "correct" one.
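To make the NER step concrete, here's a minimal sketch of what that first engine does internally, using OpenNLP directly. It assumes you've downloaded one of OpenNLP's pre-trained models (the `en-ner-person.bin` file name matches the standard English person-name model download, but your path will differ):

```java
import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.SimpleTokenizer;
import opennlp.tools.util.Span;

public class NerSketch {
    public static void main(String[] args) throws Exception {
        // Load a pre-trained model; the path is an assumption -- adjust to
        // wherever you saved the OpenNLP model file.
        try (InputStream modelIn = new FileInputStream("en-ner-person.bin")) {
            TokenNameFinderModel model = new TokenNameFinderModel(modelIn);
            NameFinderME finder = new NameFinderME(model);

            // NER operates on tokens, so tokenize first.
            String[] tokens = SimpleTokenizer.INSTANCE.tokenize(
                    "Barack Obama visited Berlin last week.");

            // Each Span marks a token range the model thinks is a named
            // entity, along with a probability.
            for (Span span : finder.find(tokens)) {
                StringBuilder name = new StringBuilder();
                for (int i = span.getStart(); i < span.getEnd(); i++) {
                    name.append(tokens[i]).append(" ");
                }
                System.out.printf("%s -> %s (p=%.2f)%n",
                        name.toString().trim(), span.getType(), span.getProb());
            }
        }
    }
}
```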
There are, of course, more details that I left out. Before NER can happen, the text has to be parsed, tokenized, and run through other NLP preprocessing. But a big part of the basic process is doing NER and then doing the entity matching.
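The entity-matching step can be sketched in the same spirit. The lookup table and weights below are purely illustrative (Stanbol actually queries an indexed DBpedia dump rather than an in-memory map), but they show the shape of the collision handling: a recognized surface form maps to multiple candidate entities, each carrying a weight for downstream consumers:

```java
import java.util.List;
import java.util.Map;

public class EntityMatchSketch {
    // A candidate mapping: an entity URI plus a confidence weight.
    record Candidate(String uri, double weight) {}

    public static void main(String[] args) {
        // Toy stand-in for the DBpedia-backed index. "Paris" collides:
        // it could refer to more than one entity.
        Map<String, List<Candidate>> index = Map.of(
                "Berlin", List.of(
                        new Candidate("http://dbpedia.org/resource/Berlin", 0.95)),
                "Paris", List.of(
                        new Candidate("http://dbpedia.org/resource/Paris", 0.80),
                        new Candidate("http://dbpedia.org/resource/Paris_Hilton", 0.20)));

        // For each recognized entity, emit every candidate with its weight;
        // a downstream consumer picks the "correct" one probabilistically.
        for (String surfaceForm : new String[] {"Berlin", "Paris"}) {
            for (Candidate c : index.getOrDefault(surfaceForm, List.of())) {
                System.out.printf("%s -> %s (weight=%.2f)%n",
                        surfaceForm, c.uri(), c.weight());
            }
        }
    }
}
```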
And in the case of Stanbol, you can add your own entities and corresponding structured data, as well as your own engines. So, for example, if you wanted to write an engine based on neural networks / deep learning and plug it in, you could.
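Custom engines implement Stanbol's EnhancementEngine interface. The sketch below shows the rough shape of such an engine; the method names follow the documented interface, but the bodies are placeholders, and a real engine involves more ceremony (OSGi component registration, writing RDF enhancements onto the ContentItem), so treat this as an outline rather than a drop-in implementation:

```java
import org.apache.stanbol.enhancer.servicesapi.ContentItem;
import org.apache.stanbol.enhancer.servicesapi.EngineException;
import org.apache.stanbol.enhancer.servicesapi.EnhancementEngine;

/**
 * Outline of a custom enhancement engine. A real engine is also an OSGi
 * component and records its results as RDF metadata on the ContentItem;
 * those details are omitted here.
 */
public class NeuralNerEngine implements EnhancementEngine {

    @Override
    public String getName() {
        return "neural-ner"; // hypothetical engine name
    }

    @Override
    public int canEnhance(ContentItem ci) throws EngineException {
        // Inspect the content type and declare whether (and how) this
        // engine can process the item.
        return ENHANCE_SYNCHRONOUS;
    }

    @Override
    public void computeEnhancements(ContentItem ci) throws EngineException {
        // 1. Pull the plain text out of the ContentItem.
        // 2. Run it through your neural NER model.
        // 3. Attach the predicted entities (with confidence weights) to the
        //    ContentItem's metadata so downstream engines can use them.
    }
}
```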