I think the issue here is that the chatbots you're using aren't very good at "short-term memory". What I mean by is that the bots construct responses that are slowly and incrementally tuned according to the overall usage of the chat bot, from every user. The bots are responding to each message based on how a new user would expect them to. As Alan1 notes, "Men are Man". It's making this response based solely off your single most recent message.
Instead, you are looking for a bot who focuses moreso on persistent memory of the individual conversation. The problem here becomes you're now almost asking for a Natural Language Parser, a big problem many people are working on and something that's years away from existing as robustly as you suggest.
The chat bot not only has to recognize the words 'Socrates', 'Name', and 'Dog'; but that in this sentence, it's the dog's name that is Socrates. That's a lot of information to gain beyond just the words. Which is why from a server / implementation standpoint, the above method is also a lot easier to program (every message just query your server, no need to maintain state - that is, memory of the conversation).
The chat bots can't possibly get enough information from one person to train how to speak and respond, so they 'crowd-source' that information for training. But that means that Clever Bot (or any similar caliber chat bot) won't respond in terms of parsing the meaning of what you're asking.
Taking this even further, one can consider the notion of such a program being Turing-Complete. Supposing we had a chat bot like you're suggesting, we could perhaps show equivalence to a Turing-Machine, or even perhaps show we can do something like decide the halting problem. Off the top of my head I imagine the procedure being basically showing you would be able to decide halting given initial conditions. E.g. Given "Socrates is a man" and "All men die" can we decide if the chat bot will ever be able to deduce if Socrates dies?
I'll work on a formal proof from the latter and post it if it works out.