"$train_cmd"

local/prepare_dict.sh --stage 3 --nj 30 --cmd "$train_cmd" \
  data/local/lm data/local/lm data/local/dict_nosp

This command is taken from the LibriSpeech recipe's run.sh:

https://github.com/kaldi-asr/kaldi/blob/master/egs/librispeech/s5/run.sh

When I first used the script, I noticed that $train_cmd is never declared in run.sh itself.

Later I learned that we should probably set it ourselves:

export train_cmd=run.pl

and do the same for decode_cmd. This can be done in run.sh.
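If you edit run.sh instead of sourcing cmd.sh, one option is to give both variables a default near the top. This is a minimal sketch and not what the official recipe does. It uses the shell's ${var:=default} expansion, so a value already exported in your environment still takes precedence over run.pl.

```shell
#!/usr/bin/env bash
# Near the top of run.sh (sketch): fall back to run.pl, which runs jobs on
# the local machine, only if train_cmd/decode_cmd are unset or empty.
: "${train_cmd:=run.pl}"
: "${decode_cmd:=run.pl}"
# Export them so that any child script reading them from the environment sees them too.
export train_cmd decode_cmd
```

With this in place, running train_cmd="run.pl --max-jobs-run 4" ./run.sh overrides the default for that one run.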

Alternatively,

train_cmd="run.pl --max-jobs-run 4"
decode_cmd="run.pl --max-jobs-run 4"

might be better. It limits how many jobs run at the same time, which in turn limits how many CPUs and how much memory the recipe tries to use.
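If you're not sure what number to pass to --max-jobs-run, you can derive a starting value from the CPU core count. The sketch below uses half of the cores. That ratio is an assumed rule of thumb, not a Kaldi recommendation. It asks nproc (GNU coreutils) for the core count and falls back to getconf _NPROCESSORS_ONLN. It ignores memory, so lower the value if jobs run out of RAM.

```shell
#!/usr/bin/env bash
# Count online CPU cores: nproc if available, otherwise getconf.
ncpu=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
# Assumed heuristic: use half of the cores, but always allow at least one job.
max_jobs=$(( ncpu / 2 ))
if [ "$max_jobs" -lt 1 ]; then
  max_jobs=1
fi
# run.pl's --max-jobs-run caps how many jobs one run.pl call runs in parallel.
export train_cmd="run.pl --max-jobs-run $max_jobs"
export decode_cmd="run.pl --max-jobs-run $max_jobs"
echo "train_cmd=$train_cmd"
```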

In the current recipe, these variables are declared in cmd.sh, which run.sh sources:

https://github.com/kaldi-asr/kaldi/blob/master/egs/librispeech/s5/cmd.sh

cmd.sh

# you can change cmd.sh depending on what type of queue you are using.
# If you have no queueing system and want to run on a local machine, you
# can change all instances 'queue.pl' to run.pl (but be careful and run
# commands one by one: most recipes will exhaust the memory on your
# machine). queue.pl works with GridEngine (qsub). slurm.pl works
# with slurm. Different queues are configured differently, with different
# queue names and different ways of specifying things like memory;
# to account for these differences you can create and edit the file
# conf/queue.conf to match your queue's configuration. Search for
# conf/queue.conf in http://kaldi-asr.org/doc/queue.html for more information,
# or search for the string 'default_config' in utils/queue.pl or utils/slurm.pl.
export train_cmd="queue.pl --mem 2G"
export decode_cmd="queue.pl --mem 4G"
export mkgraph_cmd="queue.pl --mem 8G"

Please read the comments; I found them very useful. I also noticed that when we are in a rush, we can sometimes drop those queue settings and run the job immediately. Do this only if no one else is actually using the machines.
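Following the advice in the cmd.sh comments, you could make the switch between the queue and the local machine explicit with a cmd.sh variant like the one below. This is a hypothetical sketch. The USE_LOCAL variable is made up for illustration. The sketch also assumes that if there is no qsub command, there is no GridEngine queue. The run.pl lines omit the --mem values, because those are intended for the queue.

```shell
#!/usr/bin/env bash
# Hypothetical cmd.sh variant: USE_LOCAL=1 (a made-up switch) forces local runs;
# otherwise use queue.pl only if a GridEngine qsub command is on the PATH.
if [ "${USE_LOCAL:-0}" = 1 ] || ! command -v qsub >/dev/null 2>&1; then
  # Local machine: run.pl, capped at 4 parallel jobs per call.
  export train_cmd="run.pl --max-jobs-run 4"
  export decode_cmd="run.pl --max-jobs-run 4"
  export mkgraph_cmd="run.pl"
else
  # Shared cluster: the original queue.pl settings with memory requests.
  export train_cmd="queue.pl --mem 2G"
  export decode_cmd="queue.pl --mem 4G"
  export mkgraph_cmd="queue.pl --mem 8G"
fi
```

Because run.sh sources cmd.sh, USE_LOCAL=1 ./run.sh would run every step locally with at most 4 parallel jobs per run.pl call. Without USE_LOCAL set, the original queue settings are kept wherever qsub is available.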