When we build the LM using the train_lm.sh script with the -include_heldout flag, we should prune it with prune_lm.sh, because that script takes advantage of the way the LM was built.
The other option is to prune with SRILM, but if you built the LM with train_lm.sh, use prune_lm.sh for best results.
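SRILM way of doing it (ngram uses a maximum order of 3 unless told otherwise, so pass -order 4 for a 4-gram model):
ngram -order 4 -lm $lm_dir/lm_4gram.arpa.gz -prune 1.1e-8 -write-lm $lm_dir/lm_pruned_11e8.gz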
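The -prune value is a relative threshold. According to the SRILM ngram man page, an n-gram probability is removed if removing it increases the (training-set) perplexity of the model by less than that relative amount. Writing PP for the perplexity before removal and PP' for the perplexity after (notation introduced here for illustration), an n-gram is pruned when:

```latex
\frac{PP' - PP}{PP} < \theta, \qquad \theta = 1.1 \times 10^{-8}
```

A larger threshold therefore removes more n-grams and gives a smaller model.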
The recommended Kaldi way of doing it:
prune_lm.sh --arpa 120.0 $lm_dir/4gram-mincount
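To see how much a pruning run actually removed, you can compare the n-gram counts in the \data\ header of the ARPA file before and after pruning. Below is a minimal sketch that assumes the models are ARPA files, either plain or gzip-compressed. The function name count_ngrams is hypothetical and is not part of Kaldi or SRILM.

```shell
#!/usr/bin/env bash
# count_ngrams (hypothetical helper): print "order count" for each
# "ngram N=COUNT" line in the \data\ header of an ARPA LM file.
count_ngrams() {
  # -d decompress, -c write to stdout, -f pass non-gzip input through unchanged
  gzip -dcf -- "$1" | awk '
    /^\\data\\/ { in_header = 1; next }      # header starts at the \data\ line
    in_header && /^ngram [0-9]+=/ {          # e.g. "ngram 2=12345"
      split($2, kv, "="); print kv[1], kv[2]; next
    }
    in_header && /^\\/ { exit }              # first \N-grams: section ends the header
  '
}
```

For example, run count_ngrams $lm_dir/lm_4gram.arpa.gz and count_ngrams $lm_dir/lm_pruned_11e8.gz, then do the same for the ARPA file that prune_lm.sh produces. Reading only the header keeps the check fast, even for large models.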