How to figure out which words have the same meaning in two different languages?

Question

Imagine two languages that each have only four words. English:

Man = 1,
deer = 2, 
eat = 3,
grass = 4

Now form all the sentences that are possible with these words:

Man eats deer.
Deer eats grass.
Man eats.
Deer eats.

German:

Mensch = 5,
Gras = 6, 
isst = 7, 
Hirsch = 8

Possible German sentences:

Mensch isst Hirsch.
Hirsch isst Gras.
Mensch isst.
Hirsch isst.

How would you write a program that would figure out which words have the same meaning in English and German?

It is possible.

Every word gets its meaning from the sentences in which it can be used. Its connections with other words define its meaning.

We need to write a program that would recognize that a word is connected to other words in the same way in both languages. Then it would know those two words must have the same meaning.

If we take the word "deer" (2), it appears in these structures in English:

1-3-2
2-3-4

In German, "Hirsch" (8) appears in:

5-7-8
8-7-6

We get the same structure (pattern) in both languages: 2 and 8 each occur once in the last position and once in the first position, the middle word is the same in both sentences (3 in English, 7 in German), and the remaining word differs between the two sentences. So we can conclude that 8 = 2, because both elements are connected with other elements in the same way.
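A minimal sketch of this structure matching, assuming the sentences are encoded with the word numbers given above (so "Mensch isst Hirsch" is 5-7-8 and "Hirsch isst Gras" is 8-7-6). The function name `find_mappings` is made up for illustration. It simply tries every one-to-one pairing of the two vocabularies and keeps those that turn the English sentence set into exactly the German one:

```python
from itertools import permutations

# Sentences written with the numeric word IDs from the question.
english = {(1, 3, 2), (2, 3, 4), (1, 3), (2, 3)}
german = {(5, 7, 8), (8, 7, 6), (5, 7), (8, 7)}

def find_mappings(src_sentences, tgt_sentences):
    """Return every one-to-one word mapping that turns the source
    sentence set into exactly the target sentence set."""
    src_vocab = sorted({w for s in src_sentences for w in s})
    tgt_vocab = sorted({w for s in tgt_sentences for w in s})
    if len(src_vocab) != len(tgt_vocab):
        return []
    matches = []
    # Brute force: try each way of pairing source words with target words.
    for perm in permutations(tgt_vocab):
        mapping = dict(zip(src_vocab, perm))
        translated = {tuple(mapping[w] for w in s) for s in src_sentences}
        if translated == tgt_sentences:
            matches.append(mapping)
    return matches

mappings = find_mappings(english, german)
print(mappings)  # [{1: 5, 2: 8, 3: 7, 4: 6}]
```

For this toy example only one mapping survives, and it pairs deer (2) with Hirsch (8). The search is factorial in the vocabulary size, so it only illustrates the idea and would not scale to a real language.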

Maybe we just need to write a very good program for recognizing analogies and we will be on the right track to creating AI?

This is similar to the way that [example-based machine translation](https://en.wikipedia.org/wiki/Example-based_machine_translation) systems are designed. — Anderson Green, Jan 25 '18 at 08:23
This is advanced NLP, and a lot of research is going on in this area. No researcher has been able to come up with a flawless theory that describes how humans develop language among themselves, so writing such programs still remains a distant dream. In my view, all NLP programs are somewhat superficial in nature. — , Jan 25 '18 at 11:14
Welcome to AI! My sense is that language development is partly creative, so it may require algorithms that can think abstractly in a general semantic context. Combinatorially speaking, I think you'd also need to include ["grass eats", "grass eats deer", "deer eats man"] and give a distinct value to "eats" as a verb, to distinguish from nouns (possibly this could be approached logarithmically.) You'll also have to contend with poetic constructions "Deer, man eats", etc., nonsense such as "deer eats man", and abstractions "grass eats man" (in the sense of "dust to dust", as fertilizer.) — DukeZhou, Jan 25 '18 at 18:33
Yes, we can add all those metaphorical sentences: basically every sentence that is syntactically correct and not total nonsense. I left them out for the sake of simplicity. In any case, the program would work by counting how often sentences occur in large amounts of text. Your examples would have a very low frequency, so the connections between those words would contribute much less to a word's definition than frequent sentences would. — Tone Škoda, Jan 27 '18 at 21:56

score 2 · Answer 1 · answered Jan 28 '20 at 19:22

You are implying that such ideas are novel, and that such tools do not exist. But the idea is very popular, and there are numerous tools.

> We need to write a program that would recognize that a word is connected to other words in the same way in both languages. Then it would know those two words must have the same meaning.

You are describing the essence of known natural language processing (NLP) tasks such as word alignment (link words in different languages that have the same meaning) and, of course, machine translation.

While learning a machine translation model, we actually do discover which words (or parts of words, or sequences of words) in different languages have the same meaning.

Here are some concepts I would recommend for further study of this subject:

Word alignment: a well-known and popular tool is fast_align
Word embeddings: word2vec is a widely used tool
Modern machine translation with sequence-to-sequence models: well-known tools are fairseq and Sockeye
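
To make the word alignment idea concrete, here is a rough sketch of IBM Model 1, a classic statistical alignment model, trained with expectation-maximization on the toy corpus from the question. It is not the fast_align implementation (fast_align builds on a reparameterization of the related IBM Model 2). It also assumes that the sentences are given as translation pairs, which the question does not state explicitly. The names `ibm_model1` and `best_translations` are made up for illustration.

```python
from collections import defaultdict

# Assumption: sentence i in English is the translation of sentence i in German.
pairs = [
    ("man eats deer".split(), "Mensch isst Hirsch".split()),
    ("deer eats grass".split(), "Hirsch isst Gras".split()),
    ("man eats".split(), "Mensch isst".split()),
    ("deer eats".split(), "Hirsch isst".split()),
]

def ibm_model1(pairs, iterations=20):
    """Estimate t[(g, e)], the probability that English word e
    translates to German word g, using EM (no NULL word)."""
    cooc = {(g, e) for en, de in pairs for e in en for g in de}
    tgt_vocab = {g for g, _ in cooc}
    # Uniform start over all word pairs that ever share a sentence pair.
    t = {ge: 1.0 / len(tgt_vocab) for ge in cooc}
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: spread each German word over the English words in its pair.
        for en, de in pairs:
            for g in de:
                norm = sum(t[(g, e)] for e in en)
                for e in en:
                    frac = t[(g, e)] / norm
                    count[(g, e)] += frac
                    total[e] += frac
        # M-step: renormalize the expected counts per English word.
        t = {(g, e): c / total[e] for (g, e), c in count.items()}
    return t

def best_translations(t):
    """Pick the most probable German word for each English word."""
    best = {}
    for (g, e), p in t.items():
        if e not in best or p > best[e][1]:
            best[e] = (g, p)
    return {e: g for e, (g, _) in best.items()}

print(best_translations(ibm_model1(pairs)))
```

On this tiny corpus, the most probable German word for each English word should come out as the intended translation. Real tools add much more (a NULL word, position information, smoothing, large corpora).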

score 1 · Answer 2 · answered Jan 25 '18 at 11:07

Isn't this what Word2Vec and other word-embedding techniques already do? The idea that you know a word by the company it keeps has been around for some time now.
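
As a rough illustration of "the company it keeps", the sketch below builds simple co-occurrence count vectors from the English toy sentences and compares them with cosine similarity. This is a count-based stand-in, not Word2Vec itself (which learns dense vectors with a neural model). The helper names `context_vectors` and `cosine` are made up for illustration.

```python
from collections import Counter
from math import sqrt

sentences = [
    "man eats deer".split(),
    "deer eats grass".split(),
    "man eats".split(),
    "deer eats".split(),
]

def context_vectors(sentences):
    """Count, for every word, how often each other word shares a sentence with it."""
    vectors = {}
    for sentence in sentences:
        for i, word in enumerate(sentence):
            counts = vectors.setdefault(word, Counter())
            for j, other in enumerate(sentence):
                if i != j:
                    counts[other] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

vectors = context_vectors(sentences)
print(cosine(vectors["man"], vectors["deer"]))  # two nouns: higher similarity
print(cosine(vectors["man"], vectors["eats"]))  # noun vs. verb: lower similarity
```

Even with four sentences, "man" ends up closer to "deer" than to "eats", because the two nouns keep similar company.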

Krishna Sangeeth K S


Never heard of it. I'll read about it. – Tone Škoda Jan 27 '18 at 22:07

score -1 · Answer 3 · answered Jan 28 '20 at 05:49

It is wrong to assume that just the connections to other words define their meaning.

Give an AI a hundred novels and it would still not know what the word "cat" means.

Show the AI a picture of a cat with the word "cat" underneath it and it would know straight away.

In this way an AI needs to know a minimum number of words through experience other than combinations of other words. From then it may be able to deduce meanings of new words.

Just like, if I gave you a hundred novels in Chinese you would never be able to understand Chinese. I show you a picture book in Chinese and maybe you have a chance.

zooby


If your theory was true then blind people wouldn't understand words. – Tone Škoda Jan 28 '20 at 15:21
My apologies, but this answer is completely inaccurate. It is absolutely true that words can be defined by their context; most modern language technology tools are based on this idea (word embeddings). It is not true that in order to learn the meaning of words they need to be presented together with images. – Mathias Müller Jan 28 '20 at 19:14
@tone Nonsense. Blind people use their other senses, like touch and hearing, to associate the word "cat" with cats. – zooby Jan 28 '20 at 22:33
@mathias. It is true that after you have learnt a minimum number of words you can use these to define other words... But first these minimum words need to be learnt from association with the real world... Unless you are learning a second language. Also, word embedding AIs have no concept of what a cat is. – zooby Jan 28 '20 at 22:35
You haven't heard of Helen Mirren? She had no hearing or sight. She had touch, but she used it to communicate, not to touch things in order to understand them. You have never seen an atom and still know what it is. – Tone Škoda Jan 29 '20 at 07:44
@zooby "It is true that after you have learnt a minimum number of words you can use these to define other words..." - I can only repeat myself: It is not true for modern NLP tools that in order to learn the meaning of words they need to be presented together with images or other kinds of sensory input. – Mathias Müller Jan 29 '20 at 08:12
@zooby "Also, word embedding AIs have no concept of what a cat is." - Which facts are you basing this observation on? Please back up with some sources. – Mathias Müller Jan 29 '20 at 08:18
I think this answer could be greatly improved by emphasizing the term "meaning", because some people will claim that AI really understands the meaning and others will say that it only performs pattern recognition (or whatever). In other words, just say that this is also a philosophical issue. – nbro Jan 29 '20 at 11:35
@ToneŠkoda I assume you mean Helen Keller and not the actress XD – Fnguyen Jan 29 '20 at 12:28
yes you are correct :) – Tone Škoda Jan 29 '20 at 13:05

score -1 · Accepted Answer · answered Jan 28 '18 at 15:05

For this example, the function below will do:

TSAI.Analogies.FindAnalogy(List ex1, List ex2, List ex3, out List ex4)

ex1 is to ex2 as ex3 is to ex4. Figure out ex4.

Start by filling ex4 with the values from ex2. Then, for every value in ex3, look at the value in ex1 at the same position, find the positions where that ex1 value is repeated in ex2, and copy the ex3 value to those positions in ex4.
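
The steps above can be sketched roughly as follows. This is one reading of the description in Python, not the actual TSAI.Analogies.FindAnalogy code, which is not shown here. The name `find_analogy` and the choice to return ex4 instead of using an out parameter are assumptions made for illustration.

```python
def find_analogy(ex1, ex2, ex3):
    """ex1 is to ex2 as ex3 is to ex4; return ex4.

    Assumes ex1 and ex3 have the same length, so that positions line up.
    """
    if len(ex1) != len(ex3):
        raise ValueError("ex1 and ex3 must have the same length")
    # Step 1: start with a copy of ex2.
    ex4 = list(ex2)
    # Step 2: for each value in ex3, find where the ex1 value at the same
    # position is repeated in ex2, and copy the ex3 value to those positions.
    for i, value in enumerate(ex3):
        for j, item in enumerate(ex2):
            if item == ex1[i]:
                ex4[j] = value
    return ex4

# "man eats deer" is to "deer eats grass" as "Mensch isst Hirsch" is to ?
print(find_analogy([1, 3, 2], [2, 3, 4], [5, 7, 8]))  # [8, 7, 4]
```

In the result, 8 (Hirsch) and 7 (isst) are filled in at the positions of 2 (deer) and 3 (eats). The 4 (grass) is left unchanged because ex1 contains no counterpart for it, so this single call cannot resolve it.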
