I am trying to understand the difference between the GMM-based systems and the nnet parts of a Kaldi recipe.
All the recipes start with the GMM-based systems and only later train the nnet parts.
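The GMM system gives us the best (most likely) alignment of the audio to the transcript. My understanding is that, with the default 10 ms frame shift, we get 100 feature frames per second of audio, and the alignment tells us which phone was active in each frame. Strictly speaking, Kaldi stores a transition-id per frame, and the phone can be derived from it.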
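To make the frame arithmetic concrete, here is a minimal Python sketch. It is not Kaldi code. It assumes 16 kHz audio, a 25 ms window, a 10 ms shift, and no padding at the edges of the signal. The function names and the toy phone labels are made up for illustration.

```python
from itertools import groupby

SAMPLE_RATE = 16000   # assumed: 16 kHz audio
FRAME_LENGTH = 400    # 25 ms window, in samples
FRAME_SHIFT = 160     # 10 ms shift, in samples, i.e. 100 frames per second


def num_frames(num_samples, frame_length=FRAME_LENGTH, frame_shift=FRAME_SHIFT):
    """Count the full windows that fit in the signal (no padding at the edges)."""
    if num_samples < frame_length:
        return 0
    return 1 + (num_samples - frame_length) // frame_shift


def phone_segments(frame_phones):
    """Collapse per-frame phone labels into (phone, start_frame, n_frames) runs.

    Simplification: two consecutive instances of the same phone are merged
    into one run, because only the labels are compared.
    """
    segments = []
    start = 0
    for phone, run in groupby(frame_phones):
        length = len(list(run))
        segments.append((phone, start, length))
        start += length
    return segments


# Hypothetical alignment for a very short utterance: one label per 10 ms frame.
toy_alignment = ["SIL"] * 3 + ["HH"] * 2 + ["AY"] * 4
for phone, start, length in phone_segments(toy_alignment):
    print(f"{phone}: starts at frame {start}, lasts {length} frames ({length * 10} ms)")
```

Under these assumptions one second of audio (16000 samples) yields 98 frames rather than exactly 100, because a window that does not fit completely at the end is dropped. "100 frames per second" is the frame rate, not the exact count for a one-second file.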
LibriSpeech ASR training starts by training the GMM systems: mono, tri1, tri2b, and so on. The successive GMM stages correspond to different feature transforms and to different rounds of alignment.
ubuntu@ip-172-31-6-144:~/kaldi/egs/librispeech/s5/exp$ ls
chain_cleaned mono nnet3_cleaned tri1_ali_10k tri2b_ali_10k tri3b_ali_clean_100 tri4b_ali_clean_460 tri5b_ali_960 tri6b_ali_cleaned tri6b_cleaned_ali_train_960_cleaned_sp
make_mfcc mono_ali_5k tri1 tri2b tri3b tri4b tri5b tri6b tri6b_cleaned tri6b_cleaned_work
ubuntu@ip-172-31-6-144:~/kaldi/egs/librispeech/s5/exp$ cd tri6b_cleaned
The exp/tri6b_cleaned directory contains the trained model in the final.mdl file.
ubuntu@ip-172-31-6-144:~/kaldi/egs/librispeech/s5/exp/tri6b_cleaned$ ls
35.alimdl ali.29.gz ali.51.gz ali.74.gz ali.97.gz final.mat fsts.29.gz fsts.51.gz fsts.74.gz fsts.97.gz trans.21 trans.44 trans.67 trans.9
35.mdl ali.3.gz ali.52.gz ali.75.gz ali.98.gz final.mdl fsts.3.gz fsts.52.gz fsts.75.gz fsts.98.gz trans.22 trans.45 trans.68 trans.90
35.occs ali.30.gz ali.53.gz ali.76.gz ali.99.gz final.occs fsts.30.gz fsts.53.gz fsts.76.gz fsts.99.gz trans.23 trans.46 trans.69 trans.91
ali.1.gz ali.31.gz ali.54.gz ali.77.gz cmvn_opts fsts.1.gz fsts.31.gz fsts.54.gz fsts.77.gz full.mat trans.24 trans.47 trans.7 trans.92
ali.10.gz ali.32.gz ali.55.gz ali.78.gz decode_dev_clean_fglarge fsts.10.gz fsts.32.gz fsts.55.gz fsts.78.gz graph_tgsmall trans.25 trans.48 trans.70 trans.93
ali.100.gz ali.33.gz ali.56.gz ali.79.gz decode_dev_clean_tglarge fsts.100.gz fsts.33.gz fsts.56.gz fsts.79.gz log trans.26 trans.49 trans.71 trans.94
ali.11.gz ali.34.gz ali.57.gz ali.8.gz decode_dev_clean_tgmed fsts.11.gz fsts.34.gz fsts.57.gz fsts.8.gz num_jobs trans.27 trans.5 trans.72 trans.95
ali.12.gz ali.35.gz ali.58.gz ali.80.gz decode_dev_clean_tgsmall fsts.12.gz fsts.35.gz fsts.58.gz fsts.80.gz phones.txt trans.28 trans.50 trans.73 trans.96
ali.13.gz ali.36.gz ali.59.gz ali.81.gz decode_dev_clean_tgsmall.si fsts.13.gz fsts.36.gz fsts.59.gz fsts.81.gz questions.int trans.29 trans.51 trans.74 trans.97
ali.14.gz ali.37.gz ali.6.gz ali.82.gz decode_dev_other_fglarge fsts.14.gz fsts.37.gz fsts.6.gz fsts.82.gz questions.qst trans.3 trans.52 trans.75 trans.98
ali.15.gz ali.38.gz ali.60.gz ali.83.gz decode_dev_other_tglarge fsts.15.gz fsts.38.gz fsts.60.gz fsts.83.gz splice_opts trans.30 trans.53 trans.76 trans.99
ali.16.gz ali.39.gz ali.61.gz ali.84.gz decode_dev_other_tgmed fsts.16.gz fsts.39.gz fsts.61.gz fsts.84.gz trans.1 trans.31 trans.54 trans.77 tree
ali.17.gz ali.4.gz ali.62.gz ali.85.gz decode_dev_other_tgsmall fsts.17.gz fsts.4.gz fsts.62.gz fsts.85.gz trans.10 trans.32 trans.55 trans.78
ali.18.gz ali.40.gz ali.63.gz ali.86.gz decode_dev_other_tgsmall.si fsts.18.gz fsts.40.gz fsts.63.gz fsts.86.gz trans.100 trans.33 trans.56 trans.79
ali.19.gz ali.41.gz ali.64.gz ali.87.gz decode_test_clean_fglarge fsts.19.gz fsts.41.gz fsts.64.gz fsts.87.gz trans.11 trans.34 trans.57 trans.8
ali.2.gz ali.42.gz ali.65.gz ali.88.gz decode_test_clean_tglarge fsts.2.gz fsts.42.gz fsts.65.gz fsts.88.gz trans.12 trans.35 trans.58 trans.80
ali.20.gz ali.43.gz ali.66.gz ali.89.gz decode_test_clean_tgmed fsts.20.gz fsts.43.gz fsts.66.gz fsts.89.gz trans.13 trans.36 trans.59 trans.81
ali.21.gz ali.44.gz ali.67.gz ali.9.gz decode_test_clean_tgsmall fsts.21.gz fsts.44.gz fsts.67.gz fsts.9.gz trans.14 trans.37 trans.6 trans.82
ali.22.gz ali.45.gz ali.68.gz ali.90.gz decode_test_clean_tgsmall.si fsts.22.gz fsts.45.gz fsts.68.gz fsts.90.gz trans.15 trans.38 trans.60 trans.83
ali.23.gz ali.46.gz ali.69.gz ali.91.gz decode_test_other_fglarge fsts.23.gz fsts.46.gz fsts.69.gz fsts.91.gz trans.16 trans.39 trans.61 trans.84
ali.24.gz ali.47.gz ali.7.gz ali.92.gz decode_test_other_tglarge fsts.24.gz fsts.47.gz fsts.7.gz fsts.92.gz trans.17 trans.4 trans.62 trans.85
ali.25.gz ali.48.gz ali.70.gz ali.93.gz decode_test_other_tgmed fsts.25.gz fsts.48.gz fsts.70.gz fsts.93.gz trans.18 trans.40 trans.63 trans.86
ali.26.gz ali.49.gz ali.71.gz ali.94.gz decode_test_other_tgsmall fsts.26.gz fsts.49.gz fsts.71.gz fsts.94.gz trans.19 trans.41 trans.64 trans.87
ali.27.gz ali.5.gz ali.72.gz ali.95.gz decode_test_other_tgsmall.si fsts.27.gz fsts.5.gz fsts.72.gz fsts.95.gz trans.2 trans.42 trans.65 trans.88
ali.28.gz ali.50.gz ali.73.gz ali.96.gz final.alimdl fsts.28.gz fsts.50.gz fsts.73.gz fsts.96.gz trans.20 trans.43 trans.66 trans.89
ubuntu@ip-172-31-6-144:~/kaldi/egs/librispeech/s5/exp/tri6b_cleaned$
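The listing above has one ali.N.gz (alignments), one fsts.N.gz (training graphs) and one trans.N (fMLLR transforms) per parallel job, and the num_jobs file records how many jobs there were. A small Python sketch for checking that no per-job file is missing might look like this. The function name and the suffix table are my own assumptions, not part of Kaldi.

```python
from pathlib import Path

# Assumed naming, taken from the listing above: ali and fsts are gzipped, trans is not.
PER_JOB_SUFFIX = {"ali": ".gz", "fsts": ".gz", "trans": ""}


def missing_job_files(exp_dir):
    """Return the per-job file names that num_jobs implies but that are absent."""
    exp_dir = Path(exp_dir)
    # num_jobs is a one-line text file holding the number of parallel jobs.
    num_jobs = int((exp_dir / "num_jobs").read_text().strip())
    missing = []
    for job in range(1, num_jobs + 1):
        for prefix, suffix in PER_JOB_SUFFIX.items():
            name = f"{prefix}.{job}{suffix}"
            if not (exp_dir / name).exists():
                missing.append(name)
    return missing
```

Calling missing_job_files("exp/tri6b_cleaned") from the s5 directory should return an empty list if the directory is complete.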
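The final GMM system (final.mdl in exp/tri6b_cleaned) is the starting point for training the neural net model, which can be found in exp/chain_cleaned/tdnn_1d_sp. As far as I understand, the neural net is not initialized from the GMM parameters. What it takes from the GMM system is the alignments.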