How to compare word segmentation methods?

Asked Jun 06 '23 at 12:30

Active Jun 09 '23 at 06:24

I am comparing a few word segmentation methods for an artificial language, with no dictionary and no gold-standard segmentation. Let's say "idolikecats" is split by three different algorithms into "i do like cats", "ido li kecats" and "ido lik cat s".

Is there a measure for comparing the quality of the segmentations?
Is it a good idea to compare them using character-level perplexity?
Would that require training an n-gram model on each method's segmentation of the training set and comparing the results on the test set?
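
To make the last question concrete, here is a minimal sketch of what I have in mind. It assumes a unigram model with add-one smoothing, where all unseen tokens share one extra vocabulary slot; both are illustrative simplifications, and the function names are hypothetical. The token-level cross-entropy is divided by the number of characters rather than tokens, so methods that produce different numbers of tokens stay comparable.

```python
import math
from collections import Counter


def train_unigram(segmented_lines):
    """Count tokens in whitespace-segmented training lines."""
    counts = Counter()
    for line in segmented_lines:
        counts.update(line.split())
    return counts


def bits_per_char(counts, segmented_lines):
    """Cross-entropy of the test tokens in bits, normalised by character count."""
    total = sum(counts.values())
    vocab = len(counts) + 1  # assumption: one extra slot shared by all unseen tokens
    neg_log2 = 0.0
    n_chars = 0
    for line in segmented_lines:
        for tok in line.split():
            # Add-one smoothed unigram probability of this token.
            p = (counts.get(tok, 0) + 1) / (total + vocab)
            neg_log2 -= math.log2(p)
            n_chars += len(tok)  # spaces are not counted
    if n_chars == 0:
        raise ValueError("test set contains no characters")
    return neg_log2 / n_chars


def char_perplexity(counts, segmented_lines):
    """Per-character perplexity, i.e. 2 ** bits_per_char."""
    return 2 ** bits_per_char(counts, segmented_lines)
```

Each method would segment the same training and test text, then get its own model and score, e.g. `char_perplexity(train_unigram(train_a), test_a)` for method A, with lower values being better. One caveat I am unsure about: because unseen tokens share a single slot, this scoring may unfairly favour methods that produce long, rare tokens.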

edited Jun 09 '23 at 06:24

Hiren Namera

asked Jun 06 '23 at 12:30

dobrowol

0 Answers