--include-heldout. If this flag is not used, Kaldi withholds some of the data to collect statistics about it and uses only the rest to build the LM. If we have very little data, we must use this flag so that the held-out portion also goes into the LM.
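
To make the effect of the flag concrete, here is a minimal sketch of the idea in Python. It is not Kaldi's actual code: the function name `split_heldout`, the `heldout_count` parameter, and the choice to take the held-out sentences from the start of the corpus are all assumptions made for illustration. It assumes the flag simply adds the held-out sentences back into the data used to build the LM.

```python
def split_heldout(sentences, heldout_count, include_heldout=False):
    """Return (train, heldout) lists of sentences.

    Hypothetical helper, not part of Kaldi: the first `heldout_count`
    sentences are set aside for collecting statistics. When
    `include_heldout` is True they are also kept in the training data.
    """
    heldout = list(sentences[:heldout_count])
    rest = list(sentences[heldout_count:])
    train = list(sentences) if include_heldout else rest
    return train, heldout


corpus = ["a b c", "b c d", "c d e", "d e f"]

# Without the flag: one sentence is withheld from LM training.
train, heldout = split_heldout(corpus, heldout_count=1)

# With the flag: the held-out sentence is used for training as well.
train_all, heldout_all = split_heldout(corpus, heldout_count=1, include_heldout=True)

print(len(train), len(train_all))
```

With a small corpus the withheld sentences are a noticeable share of the data, which is why the n-gram counts in the tables are higher with the flag.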
--include-heldout
Model | WITHOUT --include-heldout (LM size, n-gram counts) | --include-heldout (LM size, n-gram counts) |
---|---|---|
4gram-mincount | 6.7M ngram 1=866,729 ngram 2=309,939 ngram 3=91,535 ngram 4=41,013 | 7.0M ngram 1=866,729 ngram 2=333,290 ngram 3=103,351 ngram 4=47,622 |
4gram | 20M ngram 1=866,729 ngram 2=309,939 ngram 3=635,560 ngram 4=813,926 | 22M ngram 1=866,729 ngram 2=333,290 ngram 3=698,607 ngram 4=903,329 |
3gram-mincount | 6.2M ngram 1=866,729 ngram 2=309,939 ngram 3=91,535 | 6.5M ngram 1=866,729 ngram 2=333,290 ngram 3=103,351 |
3gram | 12M ngram 1=866,729 ngram 2=309,939 ngram 3=676,559 | 13M ngram 1=866,729 ngram 2=333,290 ngram 3=742,706 |

Model | WITHOUT --include-heldout (sizes: LM / lm_pr6.0.gz / ngrams_disc_pr6.0.gz) | --include-heldout (sizes: LM / lm_pr6.0.gz / ngrams_disc_pr6.0.gz) |
---|---|---|
4gram-mincount | 6.7M / 3.4M / 318K; head lm_pr6.0_copy: ngram 1=866,729 ngram 2=31,046 ngram 3=8,212 ngram 4=856 | 7.0M / 3.5M / 346K; head lm_pr6.0_copy: ngram 1=866,729 ngram 2=34,343 ngram 3=9,637 ngram 4=1,047 |
4gram | 20M | 22M |
3gram-mincount | 6.2M / 3.4M / 290K; head lm_pr6.0_copy: ngram 1=866,729 ngram 2=28,901 ngram 3=7,681 | 6.5M / 3.4M / 315K; head lm_pr6.0_copy: ngram 1=866,729 ngram 2=32,055 ngram 3=9,024 |
3gram | 12M | 13M |
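
The n-gram counts in the tables were read from the top of the LM files with head. As a rough sketch, the same numbers can be pulled out in Python by reading the header of an ARPA-format file. The helper names `read_arpa_counts` and `read_arpa_counts_gz` are made up for this example, and it assumes the file starts with the usual `\data\` section followed by `ngram N=count` lines.

```python
import gzip
import re


def read_arpa_counts(lines):
    """Return {order: count} parsed from the header of an ARPA LM.

    Hypothetical helper: expects a "\\data\\" line followed by
    "ngram N=count" lines, and stops at the first other non-empty line.
    """
    counts = {}
    in_data = False
    for line in lines:
        line = line.strip()
        if line == "\\data\\":
            in_data = True
            continue
        if not in_data:
            continue
        match = re.match(r"ngram (\d+)=(\d+)$", line)
        if match:
            counts[int(match.group(1))] = int(match.group(2))
        elif line:
            break  # e.g. "\1-grams:" marks the end of the header
    return counts


def read_arpa_counts_gz(path):
    """Same as read_arpa_counts, but for a gzip-compressed file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return read_arpa_counts(f)


# Header lines matching the pruned 4gram-mincount LM built without the flag.
sample = [
    "",
    "\\data\\",
    "ngram 1=866729",
    "ngram 2=31046",
    "ngram 3=8212",
    "ngram 4=856",
    "",
    "\\1-grams:",
]
counts = read_arpa_counts(sample)
print(counts)
```

For a real file, something like `read_arpa_counts_gz("4gram-mincount/lm_pr6.0.gz")` should work; the directory part of that path is an assumption about where the file lives.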