6

YouTube has a huge number of videos, many of which also contain various spoken languages. This would presumably provide something like the data that a "challenged" baby would experience - "challenged" here meaning a baby born without arms or legs (unfortunately, many people are born that way).

Would this not allow unsupervised learning in a deep learning system that has both vision and audio capabilities? The neural network would presumably learn correlations between words and images, and could perhaps even learn rudimentary language skills, all without human supervision. I believe that the individual components needed to do this already exist.
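For concreteness, here is a rough sketch (in PyTorch, with placeholder encoders and feature sizes, not a tested recipe) of the kind of self-supervised objective I have in mind: each clip's own soundtrack acts as the "label" for its frames, so no human annotation is needed.

```python
# Minimal sketch of an audio-visual correspondence objective: two encoders map
# video-frame features and audio features into a shared embedding space, and a
# contrastive loss pulls together embeddings from the same moment of the same video.
# Layer sizes and input dimensions are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, in_dim, emb_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(), nn.Linear(512, emb_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

video_enc = Encoder(in_dim=2048)   # e.g. pooled frame features
audio_enc = Encoder(in_dim=128)    # e.g. log-mel spectrogram features

def contrastive_loss(video_feats, audio_feats, temperature=0.07):
    # Similarity between every video clip and every audio clip in the batch.
    logits = video_enc(video_feats) @ audio_enc(audio_feats).T / temperature
    # The "label" for clip i is its own soundtrack i: no human annotation required.
    targets = torch.arange(logits.size(0))
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2

# Dummy batch of 8 co-occurring (video, audio) feature pairs.
loss = contrastive_loss(torch.randn(8, 2048), torch.randn(8, 128))
loss.backward()
```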

Has this been tried, and if not, why?

god of llamas
  • 332
  • 1
  • 6
Wolphram jonny
  • 284
  • 2
  • 14
  • In this day and age, you can expect someone has already done what you want to do. – FreezePhoenix Apr 09 '18 at 22:57
  • 1
    The underlying issue seems to be that integrated circuits aren't nearly as powerful as the human brain, and we're not even sure how the brain can store and correlate all of that data. I usually recommend taking a quick look at computational complexity theory, to get a sense of the problems sizes for problems where all of the parameters are known and information is perfect and complete. Compare to nature, where this is not the case. It's a very challenging problem. Good question, though. Welcome to AI! – DukeZhou Apr 10 '18 at 20:18
  • you should edit your question if your phrasing is not the way you intend it to be – k.c. sayz 'k.c sayz' Apr 14 '18 at 00:50
  • @k.c.sayz'k.csayz' Feel free to edit it, I will reverse it only in case the edited meaning disagrees with the intended one – Wolphram jonny Apr 14 '18 at 15:51
  • check https://arxiv.org/pdf/1112.6209.pdf – thecomplexitytheorist Jul 16 '18 at 16:42

3 Answers

2

Yes, it is possible, and yes, it probably has been done before. Odds are, however, the person(s) who tried were disappointed with the results and forgot to tell others.

The reasons they might be disappointed could be any of the following:

  • Took too long to train
  • Even when fully trained (or when it appeared to be), it did not give the expected output
  • Too sensitive to variations in the input
FreezePhoenix
  • 422
  • 3
  • 20
2

Yes! Unsupervised machine learning has absolutely been applied to YouTube videos... To recognize cats!

Here's an article about it in Wired. One of the leading ML researchers on the project was Andrew Ng.
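For a rough idea of the setup, here is a toy Python (PyTorch) sketch, loosely in the spirit of that experiment; it is not the actual Google Brain code, and the real system was vastly larger and used sparsity, local receptive fields, and pooling. It only illustrates that the training signal comes from reconstruction, not labels.

```python
# Toy autoencoder trained on unlabeled image patches (random data standing in
# for YouTube frame patches). No labels are used anywhere.
import torch
import torch.nn as nn

patches = torch.rand(1000, 32 * 32)          # stand-in for 32x32 grayscale frame patches

model = nn.Sequential(
    nn.Linear(32 * 32, 256), nn.ReLU(),      # encoder: compress each patch to 256 features
    nn.Linear(256, 32 * 32), nn.Sigmoid(),   # decoder: reconstruct the patch
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(10):
    recon = model(patches)
    loss = nn.functional.mse_loss(recon, patches)   # purely unsupervised objective
    opt.zero_grad()
    loss.backward()
    opt.step()
```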

Charles
  • 281
  • 2
  • 6
2

The answer is very much yes; have a look at what Google has done around this:

Google Cloud Video Intelligence makes videos searchable, and discoverable, by extracting metadata with an easy to use REST API. You can now search every moment of every video file in your catalog. It quickly annotates videos stored in Google Cloud Storage, and helps you identify key entities (nouns) within your video; and when they occur within the video.

https://cloud.google.com/video-intelligence/

So Google can recognize all kinds of data from a video: it classifies the whole content into tags.
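As a rough illustration, a label-detection request to the Video Intelligence REST API looks something like the sketch below; see the linked docs for the authoritative request schema. The bucket path and token handling are placeholders.

```python
# Sketch of a label-detection request to the Video Intelligence REST API.
import requests

ACCESS_TOKEN = "..."                               # e.g. from `gcloud auth print-access-token`
body = {
    "inputUri": "gs://my-bucket/my-video.mp4",     # hypothetical video in Cloud Storage
    "features": ["LABEL_DETECTION"],               # ask for entity (noun) labels over time
}
resp = requests.post(
    "https://videointelligence.googleapis.com/v1/videos:annotate",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json=body,
)
# The call is asynchronous: the response names a long-running operation that
# can be polled for the per-segment labels once annotation finishes.
print(resp.json())
```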

What about Humanoid Robot Sophia?

Cameras within Sophia's eyes combined with computer algorithms allow her to see. She can follow faces, sustain eye contact, and recognize individuals. She is able to process speech and have conversations using a natural language subsystem.

https://en.wikipedia.org/wiki/Sophia_(robot)

These point in the direction of understanding (Google) and producing (Sophia) language from sounds and images. Machines are still not ready to learn to think by themselves. If you look into these two cases more closely, you will see that they are still quite mechanical and manual (requiring human pre-effort).

It is said that machines are now at the stage of a toddler who can ask the names of things around her and name them. Give it a few more years and maybe the abilities will be more advanced ;)

edit:

You asked about unsupervised learning. There is a video of a talk by an MIT researcher who ran experiments on text and images, and in his closing notes he mentioned that it would be nice to do the same with videos, for essentially the same reason you give: to learn a language. He promised to keep it in mind with his colleagues; maybe some of them are already working on it.

edit2:

An interesting research paper on the topic can be found at this link:

We address the problem of automatically learning the main steps to complete a certain task, such as changing a car tire, from a set of narrated instruction videos. The contributions of this paper are three-fold. [..] Third, we experimentally demonstrate that the proposed method can automatically discover, in an unsupervised manner, the main steps to achieve the task and locate the steps in the input videos.

mico
  • 927
  • 1
  • 8
  • 16