Building a Statistical Language Model for ASR with Kaldi Script

Building a Statistical Language Model.

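The number on the right-hand side of an entry is the "backoff penalty" (the backoff weight, stored as a log10 value like the probability). It says: if you have just seen the history "job good" and the next word X has no explicit trigram "job good X", how much of the probability mass is handed down to the lower-order history, i.e. the state "good". If there is no such history, e.g. there are no trigrams "job arises X", there is no need for this number, and the file simply omits it (equivalent to 0.0 in log10, i.e. a weight of 1).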
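
To make the rule concrete, here is a minimal Python sketch of the standard ARPA back-off lookup: if the full n-gram is listed, use its log10 probability; otherwise add the history's backoff weight and retry with the history shortened by its oldest word. The dictionaries `LOGPROB` and `BACKOFF`, the function `log10_prob`, and every "news" entry are hypothetical toy data for illustration; only the two "job good" values come from the example below.

```python
# Minimal sketch of ARPA-style back-off lookup. All values are log10.
# Hypothetical toy tables: only the "job good" entries come from the
# example in this post; the "news" entries are made up for illustration.

# log10 P(last word | preceding words), keyed by the full n-gram tuple.
LOGPROB = {
    ("job", "good"): -3.11480,  # from the example
    ("good", "news"): -2.0,     # hypothetical
    ("news",): -3.5,            # hypothetical unigram
}

# log10 backoff weight, keyed by the history tuple it applies to.
BACKOFF = {
    ("job", "good"): -0.10237,  # from the example
}


def log10_prob(word, history):
    """Return log10 P(word | history), backing off to shorter histories."""
    history = tuple(history)
    ngram = history + (word,)
    if ngram in LOGPROB:
        return LOGPROB[ngram]
    if not history:
        raise KeyError(f"{word!r} is not in the vocabulary")
    # No explicit n-gram: add the history's backoff weight (0.0, i.e. a
    # weight of 1, when none is stored) and drop the oldest history word.
    return BACKOFF.get(history, 0.0) + log10_prob(word, history[1:])


# No trigram "job good news": back off through the state "good".
print(log10_prob("news", ("job", "good")))
# No backoff weight stored for "job arises": the penalty is simply 0.0.
print(log10_prob("news", ("job", "arises")))
```

The first call returns -0.10237 + (-2.0), roughly -2.10237; the second returns the unigram value -3.5 unchanged, because neither "job arises" nor "arises" carries a stored penalty in these toy tables.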

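Example from a 3-gram model, with log10 probabilities (not perplexities) on the left. Note that a 3-gram file also contains the lower-order n-grams; the entries below come from its bigram section.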

-4.19626 job arises
-4.19537 job workers'
-3.11480 job good -0.10237
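
The entries above can be read with a few lines of Python. This is a sketch, assuming whitespace-separated fields (ARPA files normally use tabs) and that the caller knows which n-gram section a line came from; `parse_arpa_ngram_line` and `perplexity` are hypothetical helper names, not Kaldi functions. The `perplexity` helper shows how the log10 values on the left relate to perplexity: 10 raised to minus the average log10 probability per scored token.

```python
def parse_arpa_ngram_line(line, order):
    """Split one ARPA n-gram line into (log10 prob, n-gram tuple, log10 backoff).

    `order` is the n of the section the line came from (2 for "\\2-grams:").
    The backoff field is optional; None is returned when it is absent.
    """
    fields = line.split()
    logprob = float(fields[0])
    ngram = tuple(fields[1:1 + order])
    rest = fields[1 + order:]
    backoff = float(rest[0]) if rest else None
    return logprob, ngram, backoff


def perplexity(log10_probs):
    """Perplexity = 10 ** (-(1/N) * sum of the N per-token log10 probabilities)."""
    return 10 ** (-sum(log10_probs) / len(log10_probs))


# The bigram entries from the example above, tab-separated as in an ARPA file.
EXAMPLE = [
    "-4.19626\tjob arises",
    "-4.19537\tjob workers'",
    "-3.11480\tjob good\t-0.10237",
]

for line in EXAMPLE:
    logprob, ngram, backoff = parse_arpa_ngram_line(line, order=2)
    weight = "none stored" if backoff is None else f"{10 ** backoff:.3f}"
    print(" ".join(ngram), f"P={10 ** logprob:.3g}", f"backoff weight={weight}")
```

For "job good" this gives P of about 7.7e-4 and a backoff weight of about 0.79; the other two entries have no stored weight.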

LM training script from the Kaldi LibriSpeech recipe: https://github.com/kaldi-asr/kaldi/blob/master/egs/librispeech/s5/local/lm/train_lm.sh

