Typically the unigram probabilities cover every word in the dictionary,
but that depends on whether a word-list or dictionary was passed into the LM-building script.
If nothing was passed in, the unigrams may just be whatever words appeared in the training data, though it's normal to pass in the word-list.
It's easy to check the number of unigrams: it's listed at the top of the ARPA file.
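If you'd rather not open the file by hand, here is a rough sketch of reading those counts in Python. It assumes the standard ARPA layout, where a `\data\` line is followed by lines like `ngram 1=<count>`; the function name `arpa_ngram_counts` and the sample text are made up for illustration, not part of any toolkit.

```python
import re

def arpa_ngram_counts(lines):
    # Collect {order: count} from the "\data\" header of an ARPA file.
    # `lines` can be any iterable of strings, e.g. an open file handle.
    counts = {}
    in_header = False
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            in_header = True
            continue
        if not in_header:
            continue
        m = re.match(r"ngram\s+(\d+)\s*=\s*(\d+)$", line)
        if m:
            counts[int(m.group(1))] = int(m.group(2))
        elif line.startswith("\\"):
            break  # reached "\1-grams:", so the header is over
    return counts

# Tiny made-up header, just to show the shape of the input.
sample = """\\data\\
ngram 1=4
ngram 2=3

\\1-grams:
-1.0 <s> -0.5
""".splitlines()

counts = arpa_ngram_counts(sample)
print("unigrams:", counts[1])
```

For a real model you would pass an open file instead of `sample`, for example `with open(path, encoding="utf-8") as f: arpa_ngram_counts(f)`, where `path` is wherever your ARPA file lives. The loop stops at the first n-gram section, so it doesn't read the whole file.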