Is there a benefit of putting one sentence per line when building LMs for ASR?

Yes, we must put one sentence per line: LM toolkits typically treat each line as one sentence, so the line breaks directly affect the end-of-sentence (EOS) probability.
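To make the EOS effect concrete, here is a minimal sketch that estimates P(</s> | word) from bigram counts, assuming each input line is wrapped in `<s>` and `</s>` markers. The function name `eos_probability` and the toy corpus are made up for illustration; a real toolkit also applies smoothing, which this sketch leaves out.

```python
from collections import Counter

def eos_probability(lines, word):
    """Unsmoothed bigram estimate of P(</s> | word), one sentence per line."""
    bigrams = Counter()
    history = Counter()
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip empty lines
        # Each line is wrapped in sentence-boundary markers.
        tokens = ["<s>"] + tokens + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            bigrams[(prev, nxt)] += 1
            history[prev] += 1
    if history[word] == 0:
        return 0.0
    return bigrams[(word, "</s>")] / history[word]

# Toy corpus (hypothetical): the same text, split per sentence vs. on one line.
one_per_line = ["the cat sat down", "the dog sat down", "we sat down together"]
single_line = [" ".join(one_per_line)]

print(eos_probability(one_per_line, "down"))  # "down" ends 2 of its 3 occurrences
print(eos_probability(single_line, "down"))   # "down" never precedes </s>
```

With one sentence per line the model learns that "down" often ends a sentence. With everything on a single line, the only EOS event is after the very last word, so that information is lost.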

There are many tools out there that can split text into one sentence per line.

I tried the BlingFire tokenizer and liked it a lot.

I found a tutorial here: https://towardsdatascience.com/pre-processing-a-wikipedia-dump-for-nlp-model-training-a-write-up-3b9176fdf67
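As a rough sketch of the splitting step, the snippet below uses `text_to_sentences` from the `blingfire` Python package, which returns the sentences separated by newlines. The regex fallback `naive_split` and the sample text are my own assumptions, added only so the example still runs when BlingFire is not installed; the fallback is much cruder and will break on abbreviations.

```python
import re

def naive_split(text):
    """Crude fallback: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def split_sentences(text):
    """Return a list of sentences, using BlingFire when it is available."""
    try:
        from blingfire import text_to_sentences
    except ImportError:
        return naive_split(text)
    # text_to_sentences returns one string with a newline between sentences.
    return [s for s in text_to_sentences(text).split("\n") if s]

# Hypothetical sample text.
raw = ("Speech recognition needs a language model. "
       "The model is trained on text! Is the text clean?")

# One sentence per line, ready to feed to an LM toolkit.
lm_corpus = "\n".join(split_sentences(raw))
print(lm_corpus)
```

For a large dump you would run this over the file in chunks and write the result out line by line rather than holding it all in memory.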

