3

Suppose I have context-free grammar (CFG) $L$ that pumps out words in that language. I want a machine learning system that can detect if a word $w$ came from $L$ or not. It has access to a stream from $L$ that is constantly producing words in $L$ at random. In this case, the system can only be trained with positive examples. The system also has access to a verifier for $L$. The system (if it chooses to) can generate strings and verify that the string is in $L$ or not. However, the system could just cheat and just use the verifier, so during the test phase, it gets disabled (during the learning phase, it uses the verifier as much as it wants). Moreover, $L$ can be augmented and changed, and the system should adapt to that change.

For instance, let's consider this regex (a regex is a CFG): $$(aa)*$$

This language only produces a string of $aa$'s of even length. Suppose I modified the regex to $(aa|bb)*$. This system should adapt to recognize this new language.

What kind of methods/approaches should I consider in my design?

nbro
  • 39,006
  • 12
  • 98
  • 176
lee
  • 141
  • 5

1 Answers1

2

The process of automatically learning a grammar from examples of a language is called grammar induction. Since you mention that L can be "augmented and changed", it might be feasible to solve this problem using an adaptive parser.

  • Has this been applied to NLP purposes, after POS tagging? – lee Mar 27 '18 at 19:22
  • Grammar induction has also been [used in some machine-translation systems](https://www.google.com/search?q=%22grammar+induction%22+machine+translation), – Anderson Green Mar 29 '18 at 22:24
  • Do you happen to have any idea how successful these methods were compared to the typical approaches? – lee Mar 30 '18 at 15:17