"$train_cmd"
local/prepare_dict.sh --stage 3 --nj 30 --cmd "$train_cmd" \
   data/local/lm data/local/lm data/local/dict_nosp
The command above is taken from the LibriSpeech recipe's run.sh:
https://github.com/kaldi-asr/kaldi/blob/master/egs/librispeech/s5/run.sh
When I first used the script, I noticed that $train_cmd is never declared in run.sh itself.
Later I learned that we should probably set it ourselves: export train_cmd=run.pl
and do the same for decode_cmd. Both can be set in run.sh.
Alternatively,
train_cmd="run.pl --max-jobs-run 4"
decode_cmd="run.pl --max-jobs-run 4"
might be better, since --max-jobs-run 4 limits how many jobs run at once, and with that the CPU usage and memory requirements.
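One way to make this robust is to give both variables a default near the top of run.sh, so the script still works when nothing else has set them. The snippet below is only a minimal sketch and is not part of the Kaldi recipe; the default values are simply the ones suggested above.

```shell
# Hypothetical snippet for the top of run.sh (not in the upstream recipe).
# ${var:=default} assigns the default only if the variable is unset or empty,
# so a value exported beforehand still wins.
: "${train_cmd:=run.pl --max-jobs-run 4}"
: "${decode_cmd:=run.pl --max-jobs-run 4}"

# Export so that child scripts such as local/prepare_dict.sh can see them.
export train_cmd decode_cmd

echo "train_cmd=$train_cmd"
echo "decode_cmd=$decode_cmd"
```

With this in place, a different value can still be supplied from the shell, for example by running export train_cmd=run.pl before starting run.sh.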
In the current recipe, run.sh sources a separate cmd.sh file (. ./cmd.sh), and the variables are declared there:
https://github.com/kaldi-asr/kaldi/blob/master/egs/librispeech/s5/cmd.sh
cmd.sh
# you can change cmd.sh depending on what type of queue you are using.
# If you have no queueing system and want to run on a local machine, you
# can change all instances 'queue.pl' to run.pl (but be careful and run
# commands one by one: most recipes will exhaust the memory on your
# machine). queue.pl works with GridEngine (qsub). slurm.pl works
# with slurm. Different queues are configured differently, with different
# queue names and different ways of specifying things like memory;
# to account for these differences you can create and edit the file
# conf/queue.conf to match your queue's configuration. Search for
# conf/queue.conf in http://kaldi-asr.org/doc/queue.html for more information,
# or search for the string 'default_config' in utils/queue.pl or utils/slurm.pl.
export train_cmd="queue.pl --mem 2G"
export decode_cmd="queue.pl --mem 4G"
export mkgraph_cmd="queue.pl --mem 8G"
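For a machine without a queueing system, the comments above suggest replacing queue.pl with run.pl. A rough sketch of a cmd.sh that supports both cases follows. The use_queue variable is hypothetical and does not exist in the Kaldi recipe, and the run.pl values are just the ones suggested earlier in this post.

```shell
# Sketch of a cmd.sh that switches between a queue and the local machine.
# "use_queue" is a made-up variable for illustration; set it to true
# only where GridEngine (qsub) is actually available.
use_queue=${use_queue:-false}

if [ "$use_queue" = true ]; then
  # Same values as the upstream cmd.sh.
  export train_cmd="queue.pl --mem 2G"
  export decode_cmd="queue.pl --mem 4G"
  export mkgraph_cmd="queue.pl --mem 8G"
else
  # Local machine: cap the number of jobs running at once.
  export train_cmd="run.pl --max-jobs-run 4"
  export decode_cmd="run.pl --max-jobs-run 4"
  export mkgraph_cmd="run.pl --max-jobs-run 4"
fi
```

Even with the job cap, the warning in the comments still applies: a recipe can exhaust the memory of a single machine.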
Please read the comments in cmd.sh; I found them very useful. I have noticed that, if there is a rush, we can sometimes remove those variables and run the job immediately. Do this only if no one else is actually using the machines.