
I am comparing a few methods of word segmentation for an artificial language, without a dictionary or a "golden" (gold-standard) segmentation. For example, "idolikecats" is split by three different algorithms into "i do like cats", "ido li kecats" and "ido lik cat s".

  1. Is there a measure for comparing the quality of the segmentations?
  2. Is it a good idea to compare them using character-level perplexity?
  3. Would that require building an n-gram model for each segmentation method on the training set and comparing the results on the test set?
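
Questions 2 and 3 can be combined into one procedure, sketched below under the assumption that each method's output is available as space-separated lines for both a training and a test split. One character-level n-gram model is trained per method on that method's segmented training text (spaces become an explicit `_` boundary symbol), and the model is scored on the same method's segmented test text. The total is divided by the number of original characters, which is identical for every method, so the figures stay comparable even though methods insert different numbers of boundaries. Add-one smoothing is used only to keep the example short; `CharNgramLM`, `bits_per_char` and the `_`, `^`, `$` symbols are hypothetical choices.

```python
import math
from collections import Counter

BOUNDARY = "_"  # assumed not to occur in the language's alphabet
START = "^"     # history padding at the start of a line
END = "$"       # end-of-line symbol, predicted like any other symbol


def to_symbols(segmented_lines):
    """Replace spaces with an explicit boundary symbol, one symbol list per line."""
    return [list(line.strip().replace(" ", BOUNDARY)) for line in segmented_lines]


class CharNgramLM:
    """Character n-gram model with add-one smoothing (kept crude for brevity)."""

    def __init__(self, n=3):
        self.n = n
        self.ngrams = Counter()
        self.contexts = Counter()
        self.vocab = set()

    def _events(self, seq):
        """Yield (history, symbol) pairs for one padded line."""
        padded = [START] * (self.n - 1) + seq + [END]
        for i in range(self.n - 1, len(padded)):
            yield tuple(padded[i - self.n + 1:i]), padded[i]

    def fit(self, sequences):
        for seq in sequences:
            for ctx, sym in self._events(seq):
                self.ngrams[ctx + (sym,)] += 1
                self.contexts[ctx] += 1
                self.vocab.add(sym)
        return self

    def bits(self, ctx, sym):
        """-log2 P(sym | ctx); the extra +1 in v reserves mass for unseen symbols."""
        v = len(self.vocab) + 1
        return -math.log2((self.ngrams[ctx + (sym,)] + 1) / (self.contexts[ctx] + v))

    def total_bits(self, sequences):
        return sum(self.bits(ctx, sym)
                   for seq in sequences for ctx, sym in self._events(seq))


def bits_per_char(train_lines, test_lines, n=3):
    """Cross-entropy of a method's segmented test text per original (non-boundary) character."""
    model = CharNgramLM(n).fit(to_symbols(train_lines))
    test = to_symbols(test_lines)
    n_chars = sum(1 for seq in test for sym in seq if sym != BOUNDARY)
    return model.total_bits(test) / n_chars
```

Calling `bits_per_char(train_lines, test_lines)` once per method, with that method's own segmented splits, gives one number per method (lower is better), and `2 ** bits_per_char(...)` is the corresponding per-character perplexity. Because every boundary is an extra symbol the model must predict, scoring the unsegmented text the same way is a useful reference point. For a real comparison, a better-smoothed model (e.g. Kneser-Ney) would be preferable to add-one smoothing.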
Hiren Namera
dobrowol

0 Answers