Solution:
It turns out my disk was full: the root filesystem (/dev/nvme0n1p1) is at 100% usage, as df -h shows. I will resize the volume and run the script again.
ubuntu@ip-172-31-6-144:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             16G     0   16G   0% /dev
tmpfs           3.1G  104M  3.0G   4% /run
/dev/nvme0n1p1  388G  388G     0 100% /
tmpfs            16G     0   16G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/loop1       56M   56M     0 100% /snap/core18/2253
/dev/loop0       32M   32M     0 100% /snap/snapd/11036
/dev/loop3       56M   56M     0 100% /snap/core18/1988
/dev/loop2       25M   25M     0 100% /snap/amazon-ssm-agent/4046
/dev/loop5       44M   44M     0 100% /snap/snapd/14295
/dev/loop4       34M   34M     0 100% /snap/amazon-ssm-agent/3552
tmpfs           3.1G     0  3.1G   0% /run/user/1000
ubuntu@ip-172-31-6-144:~$
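A check like the one above can also be done from Python before starting a long run. The following is only a minimal sketch using the standard library's shutil.disk_usage; the function names and the 50 GiB threshold are my own assumptions for illustration, not anything Kaldi requires or provides.

```python
import shutil


def free_space_report(path="."):
    """Return (total, used, free) in GiB for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return usage.total / gib, usage.used / gib, usage.free / gib


def has_enough_space(path=".", min_free_gib=50.0):
    # min_free_gib is an assumed threshold for illustration,
    # not a documented Kaldi requirement.
    _, _, free = free_space_report(path)
    return free >= min_free_gib


if __name__ == "__main__":
    total, used, free = free_space_report(".")
    print(f"total={total:.1f} GiB used={used:.1f} GiB free={free:.1f} GiB")
    if not has_enough_space(".", min_free_gib=50.0):
        print("Not enough free space; resize the volume or clean up first.")
```

Running this from the recipe directory (here ~/kaldi/egs/librispeech/s5) reports on the filesystem that holds the exp/ directory, which is where the egs are written.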
I got the error below when running the LibriSpeech script on an AWS g4dn.2xlarge instance.
Showing the bottom few lines of the output:
run.pl: Error opening log file exp/chain_cleaned/tdnn_1d_sp/egs/log/get_egs.100.log at /home/ubuntu/kaldi/egs/librispeech/s5/utils/run.pl line 275.
run.pl: Error opening log file exp/chain_cleaned/tdnn_1d_sp/egs/log/get_egs.99.log at /home/ubuntu/kaldi/egs/librispeech/s5/utils/run.pl line 275.
run.pl: 76 / 100 failed, log is in exp/chain_cleaned/tdnn_1d_sp/egs/log/get_egs.*.log
Traceback (most recent call last):
  File "steps/nnet3/chain/train.py", line 644, in main
    train(args, run_opts)
  File "steps/nnet3/chain/train.py", line 405, in train
    stage=args.egs_stage)
  File "steps/libs/nnet3/train/chain_objf/acoustic_model.py", line 118, in generate_chain_egs
    egs_opts=egs_opts if egs_opts is not None else ''))
  File "steps/libs/common.py", line 129, in execute_command
    p.returncode, command))
Exception: Command exited with status 1: steps/nnet3/chain/get_egs.sh --frames-overlap-per-eg 0 --constrained false --cmd "run.pl --max-jobs-run 8" --cmvn-opts "--norm-means=false --norm-vars=false" --online-ivector-dir "exp/nnet3_cleaned/ivectors_train_960_cleaned_sp_hires" --left-context 41 --right-context 41 --left-context-initial -1 --right-context-final -1 --left-tolerance '5' --right-tolerance '5' --frame-subsampling-factor 3 --alignment-subsampling-factor 3 --stage -10 --frames-per-iter 2500000 --frames-per-eg 150,110,100 --srand 0 data/train_960_cleaned_sp_hires exp/chain_cleaned/tdnn_1d_sp exp/chain_cleaned/tri6b_cleaned_train_960_cleaned_sp_lats exp/chain_cleaned/tdnn_1d_sp/egs
close failed in file object destructor:
sys.excepthook is missing
lost sys.stderr
ubuntu@ip-172-31-6-144:~/kaldi/egs/librispeech/s5$ exit
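The "Error opening log file" messages mean run.pl could not create its log files, which is what happens when the target filesystem has no space left. One way to confirm this directly is to try writing a small file in the log directory. The snippet below is a rough sketch under that assumption; can_write is a hypothetical helper of mine, not part of Kaldi.

```python
import errno
import os
import tempfile


def can_write(directory):
    """Try to create and write a small temporary file in `directory`.

    Returns (True, None) on success, or (False, reason) on failure.
    """
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory) as f:
            f.write(b"probe")
            f.flush()
            os.fsync(f.fileno())
        return True, None
    except OSError as e:
        if e.errno == errno.ENOSPC:
            return False, "no space left on device"
        return False, os.strerror(e.errno) if e.errno else str(e)


if __name__ == "__main__":
    # Path taken from the error output above.
    ok, reason = can_write("exp/chain_cleaned/tdnn_1d_sp/egs/log")
    print("writable" if ok else f"not writable: {reason}")
```

A "no space left on device" result points at the disk rather than at the training script itself.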
For reference, this is the call that ends at line 405 in train.py:
# this is where get_egs.sh is called.
chain_lib.generate_chain_egs(
    dir=args.dir, data=args.feat_dir,
    lat_dir=args.lat_dir, egs_dir=default_egs_dir,
    left_context=egs_left_context,
    right_context=egs_right_context,
    left_context_initial=egs_left_context_initial,
    right_context_final=egs_right_context_final,
    run_opts=run_opts,
    left_tolerance=args.left_tolerance,
    right_tolerance=args.right_tolerance,
    frame_subsampling_factor=args.frame_subsampling_factor,
    alignment_subsampling_factor=args.alignment_subsampling_factor,
    frames_per_eg_str=args.chunk_width,
    srand=args.srand,
    egs_opts=args.egs_opts,
    cmvn_opts=args.cmvn_opts,
    online_ivector_dir=args.online_ivector_dir,
    frames_per_iter=args.frames_per_iter,
    stage=args.egs_stage)  # line 405