There are many ways to decide how many words to include in the dictionary. I ran an experiment comparing two dictionaries: one with the words that appeared in the corpus 10 or more times, and one with the words that appeared 40 or more times. After decoding, both gave similar WERs, so in the end I chose the dictionary with fewer words.
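Selecting words by a minimum count can be sketched as follows. This is a minimal sketch, not the exact script used for the experiment. It assumes the corpus is plain text with whitespace-separated words, and the helper names `count_words` and `select_vocabulary` are hypothetical. It only produces the word list. Adding a pronunciation for each selected word is a separate step and is not shown.

```python
from collections import Counter


def count_words(lines):
    """Count word occurrences in an iterable of text lines (whitespace tokenization)."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def select_vocabulary(counts, min_count):
    """Return the sorted list of words that occur at least min_count times."""
    return sorted(word for word, c in counts.items() if c >= min_count)


# Toy corpus standing in for a real transcription corpus (hypothetical data).
corpus = ["the cat sat"] * 40 + ["a dog ran"] * 12 + ["rare word"] * 3
counts = count_words(corpus)
vocab_10 = select_vocabulary(counts, 10)  # words seen 10 or more times
vocab_40 = select_vocabulary(counts, 40)  # words seen 40 or more times
print(len(vocab_10), len(vocab_40))
```

With a real corpus file (the name `corpus.txt` is only an example), `count_words` can take the open file directly, since iterating over a file yields its lines: `with open("corpus.txt", encoding="utf-8") as f: counts = count_words(f)`.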
Future work: compare the decoding speed of the two dictionaries.
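Advice: Don't choose thresholds that are very close to each other for this kind of experiment. For example, counts of 10 or more and 15 or more would be a bad pair, because the two dictionaries would differ by only a few words and the comparison would tell you little. It is better to choose thresholds that are far apart, such as 10 and 40.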
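One cheap way to check whether two candidate thresholds are far enough apart, before spending time on decoding, is to compare the vocabulary sizes they produce. This is a minimal sketch. It assumes the word counts are already available as a `Counter` built from the corpus. The function name `vocab_sizes` and the toy counts are hypothetical.

```python
from collections import Counter


def vocab_sizes(counts, thresholds):
    """Map each threshold to the number of words whose count is >= that threshold."""
    return {t: sum(1 for c in counts.values() if c >= t) for t in thresholds}


# Hypothetical word counts; in practice these come from the real corpus.
counts = Counter({
    "the": 120, "of": 80, "and": 55, "cat": 45, "dog": 38, "ran": 30,
    "sat": 22, "mat": 14, "hat": 12, "bat": 11, "odd": 3,
})
sizes = vocab_sizes(counts, [10, 15, 40])
print(sizes)  # {10: 10, 15: 7, 40: 4}
```

If two thresholds give nearly the same size on your corpus, the resulting dictionaries are nearly the same, and decoding with both is unlikely to show a meaningful difference.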