Do the lexicon words in the dict folder become unigrams?

Typically the unigrams are just all the words in the dictionary, but it depends on whether the word list or dictionary was passed into the LM-building script.

If nothing was passed in, the unigrams might just be whatever words appeared in the training data, but it's normal to pass in the word list.

It's easy to check the number of unigrams: it's listed at the top of the ARPA file.
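As a rough sketch of that check, the snippet below reads the `ngram 1=N` line from the `\data\` header of an ARPA file. The function name `count_unigrams` and the path `lm.arpa` are made up for illustration, and it assumes a standard ARPA header; adapt it to however your LM is stored.

```python
import re


def count_unigrams(arpa_lines):
    """Return the unigram count declared in an ARPA header, or None if absent."""
    in_header = False
    for line in arpa_lines:
        line = line.strip()
        if line == "\\data\\":
            # The counts ("ngram 1=...", "ngram 2=...") follow this marker.
            in_header = True
            continue
        if in_header:
            match = re.match(r"ngram\s+1\s*=\s*(\d+)$", line)
            if match:
                return int(match.group(1))
            if line.startswith("\\"):
                # Reached the next section (e.g. \1-grams:), so stop looking.
                break
    return None


# Hypothetical usage; "lm.arpa" is a placeholder path:
# with open("lm.arpa", encoding="utf-8") as f:
#     print(count_unigrams(f))
```

If the file is gzipped, `gzip.open(path, "rt")` can be used in place of `open`. Comparing the result with the number of distinct words in the lexicon gives a quick indication of whether the word list was passed to the LM-building script.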
