I want to build a kind of robotic brain, i.e. one big neural network that includes an NLP model (for understanding human speech), a real-time object recognition system (for identifying particular objects), a face recognition model (for identifying faces), and so on.
Is it possible to build one huge neural network that combines all of these separate models, so that the capabilities of all three can be used at the same time, in parallel?
For example, if I ask the robot through the microphone, "Can you see that table or that boy?", the robot would start recognizing objects and faces, then answer by speaking, telling me whether or not it could identify them.
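To make the idea concrete, here is a rough sketch of the flow I have in mind. The three functions (`understand_speech`, `detect_objects`, `recognize_faces`) are hypothetical placeholders that return fixed values, not calls into any real library; in a real system each would wrap a trained model. It also assumes the models stay separate and are only coordinated by ordinary code, which may or may not be the right design.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the three models (assumptions for illustration only).
def understand_speech(audio):
    # Pretend the NLP model extracted the things the user asked about.
    return ["table", "boy"]

def detect_objects(frame):
    # Pretend the object recognition model found these objects in the frame.
    return {"table", "chair"}

def recognize_faces(frame):
    # Pretend the face recognition model identified this person.
    return {"boy"}

def answer(audio, frame):
    targets = understand_speech(audio)
    # Run both vision models at the same time on the same camera frame.
    with ThreadPoolExecutor(max_workers=2) as pool:
        objects_future = pool.submit(detect_objects, frame)
        faces_future = pool.submit(recognize_faces, frame)
        seen = objects_future.result() | faces_future.result()
    # Report, for each thing asked about, whether it was identified.
    return {target: target in seen for target in targets}

result = answer(audio=None, frame=None)
print(result)  # {'table': True, 'boy': True}
```

The returned dictionary would then be turned into a spoken reply by a text-to-speech step, which is left out here.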
If this is possible, could you share your ideas on how I can implement it? Or is there a better way to build such an AI (e.g. in TensorFlow)?