What is CUDA and decoding with Kaldi?

CUDA is an extension of C++ that we can use to program GPU cards. NVIDIA engineers contributed code to Kaldi that we can use to decode on GPUs. As far as I know, the standard Kaldi decoding scripts decode on CPUs. I personally have never tried decoding on GPUs, but I hope to try it soon.

How can I later evaluate which method is better?

  • Is latency the same?
  • Do I have a WER degradation?

I would want the WER to stay the same as the WER in batch mode [the current decoding scripts].

Graph search is done on the CPU, and the current CPU decoding is slow.

Kaldi has two different modes of decoding: online and offline. The first, online decoding, is when the speech is transmitted to the server chunk by chunk. The second, offline decoding, is when the whole audio file is available to us up front. When we do online decoding, latency is very important. For offline decoding the method is slightly different, and we normally get better results.
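To make the online/offline distinction concrete, here is a rough sketch of how the two modes are invoked in Kaldi. The binary names (`nnet3-latgen-faster`, `online2-wav-nnet3-latgen-faster`) are real Kaldi tools, but all paths, model names, and option values below are placeholders I made up for illustration, not from any actual recipe.

```shell
#!/usr/bin/env bash
# Sketch only: assumes a trained chain model and a compiled HCLG graph.
# All paths are hypothetical placeholders.

model=exp/chain/tdnn/final.mdl
graph=exp/chain/tdnn/graph/HCLG.fst

# Offline decoding: the full utterance is available, so features are
# pre-computed and the decoder runs over them in one pass.
nnet3-latgen-faster \
  $model $graph ark:feats.ark ark:offline_lat.ark

# Online decoding: this binary simulates chunk-by-chunk arrival of audio
# by streaming a wav file through the online2 feature/decoding pipeline.
online2-wav-nnet3-latgen-faster --online=true \
  --config=conf/online.conf \
  $model $graph ark:spk2utt scp:wav.scp ark:online_lat.ark
```

The offline path can look at the whole utterance (e.g. for feature normalization), which is part of why offline decoding normally gets better results.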

Future project:

  • time decoding in batch mode using CPUs
  • time decoding where the neural-net evaluation is done on the GPU and graph search is done on the CPU
  • time decoding done entirely on GPUs
  • compare latency and WERs [word error rates] for all three modes
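The three timing runs above could be sketched as follows. This is an assumption about how I would set it up, not a working script: the binaries (`nnet3-latgen-faster`, `nnet3-compute`, `latgen-faster-mapped`, and the NVIDIA-contributed `batched-wav-nnet3-cuda`) exist in the Kaldi tree, but every path and config name here is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch of the three timing runs; all paths are hypothetical.

model=exp/chain/tdnn/final.mdl
graph=exp/chain/tdnn/graph/HCLG.fst

# 1. Batch mode, everything on CPU.
time nnet3-latgen-faster \
  $model $graph ark:feats.ark ark:lat_cpu.ark

# 2. Hybrid: neural-net evaluation on the GPU, graph search on the CPU.
time bash -c "nnet3-compute --use-gpu=yes $model ark:feats.ark ark:- | \
  latgen-faster-mapped $model $graph ark:- ark:lat_hybrid.ark"

# 3. Fully on the GPU, using the cudadecoder binaries.
time batched-wav-nnet3-cuda --config=conf/online.conf \
  $model $graph scp:wav.scp ark:lat_gpu.ark
```

After each run, the lattices would be scored the usual way (e.g. with the recipe's scoring script) so the WERs of all three modes can be compared against the latency numbers.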

Training is different from decoding. Training scripts typically use more than one GPU. When training the Librispeech model, I noticed that the scripts were written for setups with more than one GPU. At the time I had only one GPU. With very little code modification, I was able to train the Librispeech model on a single GPU.
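For readers who hit the same single-GPU situation: in the nnet3 chain recipes, the number of parallel training jobs (and hence GPUs) is controlled by the `num-jobs-initial` / `num-jobs-final` trainer options passed to `steps/nnet3/chain/train.py`. A minimal sketch of the change, assuming a typical `local/chain/run_tdnn.sh`-style recipe (the surrounding options are elided and the exact recipe layout is an assumption):

```shell
# Sketch: force the chain training to use a single GPU by setting both
# job counts to 1. The remaining recipe options are left as they were.
steps/nnet3/chain/train.py \
  --trainer.optimization.num-jobs-initial 1 \
  --trainer.optimization.num-jobs-final 1 \
  ...  # rest of the recipe's options, unchanged
```

Note that changing the number of jobs can affect the effective learning-rate schedule, so results may differ slightly from the multi-GPU baseline.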

Here is my blog about it:

https://npovey.github.io/jekyll/update/2022/01/10/L.html
