I am comparing several word segmentation methods on an artificial language, with no dictionary and no gold-standard segmentation available.
For example, the string "idolikecats" is split by three different algorithms into "i do like cats", "ido li kecats", and "ido lik cat s".
- Is there a measure for comparing the quality of the segmentations?
- Is it a good idea to compare them using perplexity normalized at the character level?
- Would that require building an n-gram model for each segmentation method on the training set and comparing the results on the test set?
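
To make the last point concrete, here is a minimal sketch of the procedure I have in mind. It makes several assumptions: a unigram model rather than a higher-order n-gram, add-alpha smoothing, and a single shared bucket for all unseen tokens. The function names `train_unigram` and `bits_per_char` are just illustrative. The score is normalized by the number of characters, not tokens, so that segmentations producing different token counts stay comparable.

```python
import math
from collections import Counter


def train_unigram(segmented_corpus):
    """Count tokens in an iterable of segmented lines (tokens separated by spaces)."""
    counts = Counter()
    for line in segmented_corpus:
        counts.update(line.split())
    return counts


def bits_per_char(counts, segmented_test, alpha=1.0):
    """Cross-entropy of the test segmentation under an add-alpha unigram
    model, divided by the number of non-space characters.

    Assumption: all unseen tokens share one extra vocabulary slot.
    """
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 slot for the unseen-token bucket
    log_prob = 0.0
    n_chars = 0
    for line in segmented_test:
        for tok in line.split():
            p = (counts.get(tok, 0) + alpha) / (total + alpha * vocab)
            log_prob += math.log2(p)
            n_chars += len(tok)
    if n_chars == 0:
        raise ValueError("test set contains no characters")
    return -log_prob / n_chars


# Hypothetical toy data: the same raw text as segmented by one method.
train = ["i do like cats", "i do like dogs"]
test = ["i like cats"]
model = train_unigram(train)
score = bits_per_char(model, test)
print(f"{score:.4f} bits per character")
```

Each segmentation method would get its own model, trained on its own output for the training text and scored on its own output for the test text; lower bits per character (equivalently, lower per-character perplexity `2 ** score`) would count as better. One caveat I am unsure about: lumping all unseen tokens into one bucket makes long unseen tokens unrealistically cheap, so a character-level fallback for unseen tokens may be needed for a fair comparison.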