
In computer vision, it is very common to use supervised tasks, where datasets have to be manually annotated by humans. Some examples are object classification (class labels), detection (bounding boxes) and segmentation (pixel-level masks). These datasets are essentially input-output pairs, which are used to train Convolutional Neural Networks to learn the mapping from inputs to outputs via gradient descent optimization. But animals don't need anybody to show them bounding boxes or masks on top of things in order for them to learn to detect objects and make sense of the visual world around them. This leads me to think that brains must be performing some sort of self-supervision to train themselves to see.

What does current research say about the learning paradigm brains use to achieve such an outstanding level of visual competence? Which tasks do brains use to train themselves to be so good at processing visual information and making sense of the visual world around them? Or, in other words: how does the brain manage to train its neural networks without having access to manually annotated datasets like ImageNet, COCO, etc. (i.e., what does the brain use as ground truth, and what loss function is it optimizing)? Finally, can we apply these insights in computer vision?
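For concreteness, the computer-vision literature already has a family of techniques that matches this intuition: self-supervised pretext tasks, where the "ground truth" is derived from the data itself rather than from human annotators. A toy sketch in NumPy (the rotation-prediction idea is a known pretext task; this minimal version is illustrative only, not any particular paper's implementation):

```python
import numpy as np

def make_rotation_batch(images):
    """Build a self-supervised training batch: each image is rotated by a
    random multiple of 90 degrees, and the rotation index (0-3) becomes the
    'ground truth' label -- no human annotation required."""
    xs, ys = [], []
    for img in images:
        k = np.random.randint(4)     # 0, 90, 180, or 270 degrees
        xs.append(np.rot90(img, k))  # pretext input
        ys.append(k)                 # pretext label, derived from the data itself
    return np.stack(xs), np.array(ys)

# Example: 8 random "images" of shape 32x32
images = np.random.rand(8, 32, 32)
x, y = make_rotation_batch(images)
# A network trained to predict y from x must pick up on orientation and
# object-structure cues, even though no human ever labeled anything.
```

A network trained on such a task can then be fine-tuned on a downstream task with far fewer manual labels, which is the rough analogy people draw to how brains might bootstrap vision.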


Update: I posted a related question on Psychology & Neuroscience Stack Exchange, which I think complements the question I posted here: check it out.

Pablo Messina
  • Cross-posted: https://psychology.stackexchange.com/q/21965/11209, https://ai.stackexchange.com/q/11666/1794, https://datascience.stackexchange.com/q/48645/8560. Please [do not post the same question on multiple sites](https://meta.stackexchange.com/q/64068). Each community should have an honest shot at answering without anybody's time being wasted. – D.W. Apr 05 '19 at 17:20
  • Your brain is not a computer (see [this](https://aeon.co/essays/your-brain-does-not-process-information-and-it-is-not-a-computer) or [this](https://psychcentral.com/blog/your-brain-is-not-a-computer/)). Your brain doesn't optimize a loss function. Also, even if I reluctantly accept this analogy, I feel like the visual system has plenty of "external supervision" from the other senses, etc. – sfmiller940 Apr 10 '19 at 22:25
  • @sfmiller940 if the brain is not optimizing a loss, then what mechanism does the brain use to learn new things? Regarding other senses, can you elaborate more on how they exert supervision on the visual system? – Pablo Messina Apr 11 '19 at 03:17
  • @PabloMessina I don't think anybody knows exactly how the brain learns. But I'm not an expert, so I imagine that neuroscientists (or similar) would know much more about what is and isn't known. As far as supervision, let's consider the senses of touch and hearing. If we see a ball, we don't necessarily know if it's a discrete object but, if we can touch it, then we get additional information about its shape and relation to other objects. Similarly we can hear sounds and voices that can instruct us directly or indirectly. Other senses seem a form of supervision that guides visual learning. – sfmiller940 Apr 11 '19 at 16:59
  • @sfmiller940 "Other senses seem a form of supervision that guides visual learning", probably, but how? – Pablo Messina Apr 11 '19 at 21:00
  • @PabloMessina That's a question more for neuroscientists than data scientists. – sfmiller940 Apr 12 '19 at 22:22

1 Answer


I think you are slightly conflating two problems: one is the classification of high-level visual elements, and the other is the visual system itself.

Our visual system, when it comes to processing information, has had billions of years of iteration (training), so that at birth (and before) we are already tuned for processing visual stimuli, and already have the mechanisms to decipher objects in our spatial field of view.

These two papers (L1, L2) have a great deal of information about the evolution of our visual system and its processing. The second speculates on the connection between that evolution and the construction of "seeing systems", which is very interesting.

For further inquiry on this in particular, check out David Marr. He was probably the most influential early computer vision mind, and he is still mentioned in many top-down AGI and computer vision research projects to this day.

hisairnessag3
  • Do you mean babies are born right off the bat already identifying people, dogs, cats, mountains, water, intuitive physics, predicting object trajectories, etc.? Or do their brains learn this over time instead? If your answer to the last question is yes, then how do brains manage to find training examples to train their neural networks without having access to manually annotated datasets like ImageNet, COCO, etc.? – Pablo Messina Apr 05 '19 at 13:56
  • @PabloMessina while tests on infants are difficult for obvious reasons, yes. We're born with ingrained classification functionality and can identify spatial forms after, and likely before, we are out of the womb. Physics is evolutionary as well (at least for examples relevant to what our ancestors would have faced: lions, snakes, etc.). Some things get learned over time as well, like object permanence. Trying to compare our visual system to convolutional neural networks is a very rough analogy and usually is only used for instructive reasons. Our visual system is more likely than not convolutional. – hisairnessag3 Apr 05 '19 at 14:09
  • Likely not convolutional* – hisairnessag3 Apr 07 '19 at 06:03
  • Hey @hisairnessag3, your answer motivated me to ask [this question](https://psychology.stackexchange.com/questions/21971/is-the-visual-cortex-of-a-newborn-baby-immediately-capable-of-object-detection-o) in Psychology & Neuroscience, feel free to answer it if you want – Pablo Messina Apr 07 '19 at 14:26