Tested train_lm.sh with --include-heldout

 

In Kaldi we can create LMs using the script train_lm.sh. When building a language model with this script, one can pass the flag --include-heldout. If the flag is not used, Kaldi withholds some of the data to collect statistics about it and uses the rest to build the LM. If we have very little data, we should use this flag.



LM type         Setting                    Size  ngram 1  ngram 2  ngram 3  ngram 4
4gram-mincount  WITHOUT --include-heldout  6.7M  866,729  309,939  91,535   41,013
4gram-mincount  --include-heldout          7.0M  866,729  333,290  103,351  47,622
4gram           WITHOUT --include-heldout  20M   866,729  309,939  635,560  813,926
4gram           --include-heldout          22M   866,729  333,290  698,607  903,329
3gram-mincount  WITHOUT --include-heldout  6.2M  866,729  309,939  91,535   -
3gram-mincount  --include-heldout          6.5M  866,729  333,290  103,351  -
3gram           WITHOUT --include-heldout  12M   866,729  309,939  676,559  -
3gram           --include-heldout          13M   866,729  333,290  742,706  -
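
To make the effect of the flag easier to read off, the relative change per n-gram order can be computed from the counts above. The following is a small sketch using only the 4gram-mincount numbers from this post; the function name relative_growth is just an illustrative choice.

```python
# N-gram counts for the 4gram-mincount model, copied from the results above.
without_heldout = {1: 866_729, 2: 309_939, 3: 91_535, 4: 41_013}
with_heldout = {1: 866_729, 2: 333_290, 3: 103_351, 4: 47_622}


def relative_growth(before, after):
    """Return the percentage change per n-gram order, rounded to one decimal."""
    return {
        order: round(100.0 * (after[order] - before[order]) / before[order], 1)
        for order in before
    }


growth = relative_growth(without_heldout, with_heldout)
for order, pct in sorted(growth.items()):
    print(f"{order}-grams: {pct:+.1f}%")
```

The unigram count is identical in both runs, so any difference shows up only in the higher orders.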



Then used prune_lm.sh to prune the LMs:

Sizes are listed as LM / lm_pr6.0.gz / ngrams_disc_pr6.0.gz. N-gram counts are from head lm_pr6.0_copy. Only the LM size is listed for 4gram and 3gram.

LM type         Setting                    Sizes           ngram 1  ngram 2  ngram 3  ngram 4
4gram-mincount  WITHOUT --include-heldout  6.7M/3.4M/318K  866,729  31,046   8,212    856
4gram-mincount  --include-heldout          7.0M/3.5M/346K  866,729  34,343   9,637    1,047
4gram           WITHOUT --include-heldout  20M             -        -        -        -
4gram           --include-heldout          22M             -        -        -        -
3gram-mincount  WITHOUT --include-heldout  6.2M/3.4M/290K  866,729  28,901   7,681    -
3gram-mincount  --include-heldout          6.5M/3.4M/315K  866,729  32,055   9,024    -
3gram           WITHOUT --include-heldout  12M             -        -        -        -
3gram           --include-heldout          13M             -        -        -        -
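
The n-gram counts above were read off the top of the LM file with head. The same numbers could be collected with a short script, sketched here on the assumption that the LM is a standard ARPA file (optionally gzipped) whose header starts with \data\ followed by lines such as "ngram 1=866729". The function name arpa_ngram_counts is hypothetical and not part of Kaldi.

```python
import gzip
import re


def arpa_ngram_counts(path):
    """Return {order: count} parsed from the header of an ARPA LM file."""
    # Assumption: files ending in .gz are gzip-compressed, others are plain text.
    opener = gzip.open if path.endswith(".gz") else open
    counts = {}
    in_header = False
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line == "\\data\\":
                in_header = True
                continue
            if not in_header:
                continue  # skip anything before the \data\ marker
            match = re.fullmatch(r"ngram\s+(\d+)\s*=\s*(\d+)", line)
            if match:
                counts[int(match.group(1))] = int(match.group(2))
            elif line:
                break  # first non-empty line after the counts, e.g. \1-grams:
    return counts
```

Calling arpa_ngram_counts("lm_pr6.0.gz") would then return a dictionary keyed by n-gram order. Only the header is read, so this stays fast even for the larger unpruned LMs.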
