P3: Librispeech ASR Chain 1d

We need a trained ASR model to decode audio files. This tutorial looks at what is inside the exp folder that holds the trained ASR model.

https://kaldi-asr.org/models.html


Step 1: Download the Librispeech ASR Chain 1d model and the Librispeech i-vector extractor to an AWS EC2 instance
wget http://www.kaldi-asr.org/models/13/0013_librispeech_v1_chain.tar.gz
wget http://www.kaldi-asr.org/models/13/0013_librispeech_v1_extractor.tar.gz

I chose the Librispeech ASR model to decode some audio files.


Step 2: Extract tar.gz files

[ec2-user@ip-172-31-6-113 ~]$ ls
0013_librispeech_v1_chain.tar.gz 0013_librispeech_v1_extractor.tar.gz kaldi trash
[ec2-user@ip-172-31-6-113 ~]$

Extracting the first archive creates the exp folder:

[ec2-user@ip-172-31-6-113 ~]$ tar -xvzf 0013_librispeech_v1_chain.tar.gz  
exp/
exp/chain_cleaned/
exp/chain_cleaned/tdnn_1d_sp/
exp/chain_cleaned/tdnn_1d_sp/phones.txt
exp/chain_cleaned/tdnn_1d_sp/num_jobs
exp/chain_cleaned/tdnn_1d_sp/phone_lm.fst
exp/chain_cleaned/tdnn_1d_sp/tree
exp/chain_cleaned/tdnn_1d_sp/0.trans_mdl
exp/chain_cleaned/tdnn_1d_sp/den.fst
exp/chain_cleaned/tdnn_1d_sp/normalization.fst
exp/chain_cleaned/tdnn_1d_sp/cmvn_opts
exp/chain_cleaned/tdnn_1d_sp/final.ie.id
exp/chain_cleaned/tdnn_1d_sp/lda.mat
exp/chain_cleaned/tdnn_1d_sp/srand
exp/chain_cleaned/tdnn_1d_sp/accuracy.report
exp/chain_cleaned/tdnn_1d_sp/configs/
exp/chain_cleaned/tdnn_1d_sp/configs/network.xconfig
exp/chain_cleaned/tdnn_1d_sp/configs/xconfig
exp/chain_cleaned/tdnn_1d_sp/configs/xconfig.expanded.1
exp/chain_cleaned/tdnn_1d_sp/configs/xconfig.expanded.2
exp/chain_cleaned/tdnn_1d_sp/configs/init.config
exp/chain_cleaned/tdnn_1d_sp/configs/ref.config
exp/chain_cleaned/tdnn_1d_sp/configs/final.config
exp/chain_cleaned/tdnn_1d_sp/configs/init.raw
exp/chain_cleaned/tdnn_1d_sp/configs/ref.raw
exp/chain_cleaned/tdnn_1d_sp/configs/vars
exp/chain_cleaned/tdnn_1d_sp/configs/lda.mat
exp/chain_cleaned/tdnn_1d_sp/0.mdl
exp/chain_cleaned/tdnn_1d_sp/final.mdl
[ec2-user@ip-172-31-6-113 ~]$

Extract the second tar.gz file:

[ec2-user@ip-172-31-6-113 ~]$ tar -xvzf 0013_librispeech_v1_extractor.tar.gz 
exp/
exp/nnet3_cleaned/
exp/nnet3_cleaned/extractor/
exp/nnet3_cleaned/extractor/final.dubm
exp/nnet3_cleaned/extractor/final.mat
exp/nnet3_cleaned/extractor/global_cmvn.stats
exp/nnet3_cleaned/extractor/splice_opts
exp/nnet3_cleaned/extractor/online_cmvn.conf
exp/nnet3_cleaned/extractor/num_jobs
exp/nnet3_cleaned/extractor/final.ie
exp/nnet3_cleaned/extractor/final.ie.id
exp/nnet3_cleaned/extractor/10.ie
[ec2-user@ip-172-31-6-113 ~]$
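Both archives unpack into the same exp folder. To check which top-level folder an archive will create without extracting it, something like the following sketch could be used. The helper name archive_top_dirs is made up for illustration and is not part of Kaldi.

```shell
# archive_top_dirs is a hypothetical helper, not a Kaldi tool.
# It lists the distinct top-level entries of a .tar.gz without extracting it.
archive_top_dirs() {
  tar -tzf "$1" | cut -d/ -f1 | sort -u
}

# Example, run in the folder that holds the downloaded archives:
# archive_top_dirs 0013_librispeech_v1_chain.tar.gz
# archive_top_dirs 0013_librispeech_v1_extractor.tar.gz
```

Given the extraction output shown above, both calls should report a single entry, exp.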

The trained acoustic model is exp/chain_cleaned/tdnn_1d_sp/final.mdl. The full layout of the exp folder:

[ec2-user@ip-172-31-6-113 ~]$ tree exp
exp
├── chain_cleaned
│   └── tdnn_1d_sp
│       ├── 0.mdl
│       ├── 0.trans_mdl
│       ├── accuracy.report
│       ├── cmvn_opts
│       ├── configs
│       │   ├── final.config
│       │   ├── init.config
│       │   ├── init.raw
│       │   ├── lda.mat -> ../lda.mat
│       │   ├── network.xconfig
│       │   ├── ref.config
│       │   ├── ref.raw
│       │   ├── vars
│       │   ├── xconfig
│       │   ├── xconfig.expanded.1
│       │   └── xconfig.expanded.2
│       ├── den.fst
│       ├── final.ie.id
│       ├── final.mdl
│       ├── lda.mat
│       ├── normalization.fst
│       ├── num_jobs
│       ├── phone_lm.fst
│       ├── phones.txt
│       ├── srand
│       └── tree
└── nnet3_cleaned
    └── extractor
        ├── 10.ie
        ├── final.dubm
        ├── final.ie -> 10.ie
        ├── final.ie.id
        ├── final.mat
        ├── global_cmvn.stats
        ├── num_jobs
        ├── online_cmvn.conf
        └── splice_opts

5 directories, 34 files
[ec2-user@ip-172-31-6-113 ~]$
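final.mdl is a binary file, so cat or head will not show anything readable. Kaldi ships a tool called nnet3-am-info that prints a summary of such a model. The wrapper below is only a sketch: the function name show_model_info is made up, and it assumes the Kaldi binaries are already on PATH, which is not shown in this tutorial.

```shell
# show_model_info is a hypothetical wrapper around Kaldi's nnet3-am-info.
# Assumption: the Kaldi binaries are on PATH (for example after sourcing a
# recipe's path.sh). If they are not, it says so and returns non-zero.
show_model_info() {
  model=${1:-exp/chain_cleaned/tdnn_1d_sp/final.mdl}
  if ! command -v nnet3-am-info >/dev/null 2>&1; then
    echo "nnet3-am-info not found on PATH" >&2
    return 1
  fi
  nnet3-am-info "$model"
}

# Example, run from the folder that contains exp:
# show_model_info
```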

Size of the Librispeech ASR model, with a first look at what is in each folder:


[ec2-user@ip-172-31-6-113 ~]$ du -sh exp/
251M exp/
[ec2-user@ip-172-31-6-113 ~]$ cd exp/
[ec2-user@ip-172-31-6-113 exp]$ ls
chain_cleaned nnet3_cleaned
[ec2-user@ip-172-31-6-113 exp]$
[ec2-user@ip-172-31-6-113 exp]$ cd chain_cleaned/
[ec2-user@ip-172-31-6-113 chain_cleaned]$ ls
tdnn_1d_sp
[ec2-user@ip-172-31-6-113 chain_cleaned]$ cd tdnn_1d_sp/
[ec2-user@ip-172-31-6-113 tdnn_1d_sp]$ ls
0.mdl       accuracy.report configs final.ie.id lda.mat           num_jobs     phones.txt tree
0.trans_mdl cmvn_opts       den.fst final.mdl   normalization.fst phone_lm.fst srand
[ec2-user@ip-172-31-6-113 tdnn_1d_sp]$ ls -alh
total 172M
drwxr-xr-x 3 ec2-user ec2-user  263 Feb  2  2020 .
drwxr-xr-x 3 ec2-user ec2-user   24 Feb  2  2020 ..
-rw-r--r-- 1 ec2-user ec2-user 80M Feb  2  2020 0.mdl
-rw-r--r-- 1 ec2-user ec2-user 829K Feb  2  2020 0.trans_mdl
-rw-r--r-- 1 ec2-user ec2-user 22K Feb  2  2020 accuracy.report
-rw-r--r-- 1 ec2-user ec2-user   37 Feb  2  2020 cmvn_opts
drwxr-xr-x 2 ec2-user ec2-user  211 Feb  2  2020 configs
-rw-r--r-- 1 ec2-user ec2-user 2.7M Feb  2  2020 den.fst
-rw-r--r-- 1 ec2-user ec2-user   33 Feb  2  2020 final.ie.id
-rw-r--r-- 1 ec2-user ec2-user 81M Feb  2  2020 final.mdl
-rw-r--r-- 1 ec2-user ec2-user 190K Feb  2  2020 lda.mat
-rw-r--r-- 1 ec2-user ec2-user 3.0M Feb  2  2020 normalization.fst
-rw-r--r-- 1 ec2-user ec2-user    3 Feb  2  2020 num_jobs
-rw-r--r-- 1 ec2-user ec2-user 2.6M Feb  2  2020 phone_lm.fst
-rw-r--r-- 1 ec2-user ec2-user 3.2K Feb  2  2020 phones.txt
-rw-r--r-- 1 ec2-user ec2-user    1 Feb  2  2020 srand
-rw-r--r-- 1 ec2-user ec2-user 1.7M Feb  2  2020 tree
[ec2-user@ip-172-31-6-113 tdnn_1d_sp]$
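In the listing above, 0.mdl (80M) and final.mdl (81M) account for most of the 172M. To get the same kind of overview for a whole folder tree, a small helper like this sketch could be used. The name largest_files is made up for illustration.

```shell
# largest_files is a hypothetical helper: print the N largest regular files
# under a directory as "size_in_KiB<TAB>path", biggest first.
# Symlinks such as configs/lda.mat are skipped because of -type f.
largest_files() {
  dir=$1
  n=${2:-5}
  find "$dir" -type f -exec du -k {} + | sort -rn | head -n "$n"
}

# Example, run from the folder that contains exp:
# largest_files exp 5
```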

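phones.txt has 364 lines, one per symbol. The count includes the <eps> entry and the word-position variants (_B, _E, _I, _S) of each phone, so the number of distinct base phones is smaller.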

[ec2-user@ip-172-31-6-113 tdnn_1d_sp]$ wc -l phones.txt 
364 phones.txt
[ec2-user@ip-172-31-6-113 tdnn_1d_sp]$

The first 10 entries of phones.txt:

[ec2-user@ip-172-31-6-113 tdnn_1d_sp]$ head phones.txt 
<eps> 0
SIL 1
SIL_B 2
SIL_E 3
SIL_I 4
SIL_S 5
SPN 6
SPN_B 7
SPN_E 8
SPN_I 9
[ec2-user@ip-172-31-6-113 tdnn_1d_sp]$
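Because wc -l counts every line, including <eps> and the _B/_E/_I/_S variants, it overstates the number of base phones. The following sketch counts base phones instead. The helper name count_base_phones is made up, and skipping symbols that start with '#' is an assumption: Kaldi uses that prefix for disambiguation symbols, but this tutorial does not show whether this phones.txt contains any.

```shell
# count_base_phones is a hypothetical helper. It reads a Kaldi-style
# phones.txt ("symbol id" per line) and counts distinct base phones after
# dropping <eps>, symbols starting with '#', and the _B/_E/_I/_S suffix.
count_base_phones() {
  awk '
    $1 == "<eps>" || $1 ~ /^#/ { next }        # skip non-phone symbols
    { sub(/_[BEIS]$/, "", $1); seen[$1] = 1 }  # strip word-position suffix
    END { n = 0; for (p in seen) n++; print n }
  ' "$1"
}

# Example:
# count_base_phones phones.txt
```

For the ten entries shown above, this prints 2 (SIL and SPN).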

Contents of the configs folder:

[ec2-user@ip-172-31-6-113 configs]$ ls -alh
total 80M
drwxr-xr-x 2 ec2-user ec2-user  211 Feb  2  2020 .
drwxr-xr-x 3 ec2-user ec2-user  263 Feb  2  2020 ..
-rw-r--r-- 1 ec2-user ec2-user 22K Feb  2  2020 final.config
-rw-r--r-- 1 ec2-user ec2-user  493 Feb  2  2020 init.config
-rw-r--r-- 1 ec2-user ec2-user  230 Feb  2  2020 init.raw
lrwxrwxrwx 1 ec2-user ec2-user   10 Feb  2  2020 lda.mat -> ../lda.mat
-rw-r--r-- 1 ec2-user ec2-user 3.1K Feb  2  2020 network.xconfig
-rw-r--r-- 1 ec2-user ec2-user 23K Feb  2  2020 ref.config
-rw-r--r-- 1 ec2-user ec2-user 80M Feb  2  2020 ref.raw
-rw-r--r-- 1 ec2-user ec2-user   45 Feb  2  2020 vars
-rw-r--r-- 1 ec2-user ec2-user 3.4K Feb  2  2020 xconfig
-rw-r--r-- 1 ec2-user ec2-user 4.9K Feb  2  2020 xconfig.expanded.1
-rw-r--r-- 1 ec2-user ec2-user 5.1K Feb  2  2020 xconfig.expanded.2
[ec2-user@ip-172-31-6-113 configs]$

final.config

[ec2-user@ip-172-31-6-113 configs]$ head final.config  
# This file was created by the command:
# steps/nnet3/xconfig_to_configs.py --xconfig-file exp/chain_cleaned/tdnn_1d_sp/configs/network.xconfig --config-dir exp/chain_cleaned/tdnn_1d_sp/configs/
# It contains the entire neural network.

input-node name=ivector dim=100
input-node name=input dim=40
component name=lda type=FixedAffineComponent matrix=exp/chain_cleaned/tdnn_1d_sp/configs/lda.mat
component-node name=lda component=lda input=Append(Offset(input, -1), input, Offset(input, 1), ReplaceIndex(ivector, t, 0))
component name=tdnn1.affine type=NaturalGradientAffineComponent input-dim=220 output-dim=1536  max-change=0.75 l2-regularize=0.008
component-node name=tdnn1.affine component=tdnn1.affine input=lda
[ec2-user@ip-172-31-6-113 configs]$
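The lda node appends three frames of the 40-dimensional input (offsets -1, 0, 1) to the 100-dimensional i-vector. The arithmetic below checks that this gives the 220 that tdnn1.affine expects as input-dim. The size estimate for lda.mat is a rough sketch under assumptions that the excerpt does not confirm: a 220-row matrix with one extra bias column, stored as 4-byte floats.

```shell
# Values copied from the final.config excerpt above.
feat_dim=40       # input-node name=input dim=40
ivector_dim=100   # input-node name=ivector dim=100
num_frames=3      # Offset(input, -1), input, Offset(input, 1)

lda_input_dim=$(( num_frames * feat_dim + ivector_dim ))
echo "lda input dim: $lda_input_dim"

# Assumption: lda.mat has lda_input_dim rows and lda_input_dim + 1 columns
# (weights plus a bias column) of 4-byte floats.
lda_bytes=$(( lda_input_dim * (lda_input_dim + 1) * 4 ))
echo "estimated lda.mat size in bytes: $lda_bytes"
```

This prints 220 for the input dimension and 194480 bytes for the estimate, which is close to the 190K that ls -alh reports for lda.mat.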

network.xconfig

[ec2-user@ip-172-31-6-113 configs]$ head -20 network.xconfig 
input dim=100 name=ivector
input dim=40 name=input

 # please note that it is important to have input layer with the name=input
 # as the layer immediately preceding the fixed-affine-layer to enable
 # the use of short notation for the descriptor
fixed-affine-layer name=lda input=Append(-1,0,1,ReplaceIndex(ivector, t, 0)) affine-transform-file=exp/chain_cleaned/tdnn_1d_sp/configs/lda.mat

 # the first splicing is moved before the lda layer, so no splicing here
relu-batchnorm-dropout-layer name=tdnn1 l2-regularize=0.008 dropout-proportion=0.0 dropout-per-dim=true dropout-per-dim-continuous=true dim=1536
tdnnf-layer name=tdnnf2 l2-regularize=0.008 dropout-proportion=0.0 bypass-scale=0.75 dim=1536 bottleneck-dim=160 time-stride=1
tdnnf-layer name=tdnnf3 l2-regularize=0.008 dropout-proportion=0.0 bypass-scale=0.75 dim=1536 bottleneck-dim=160 time-stride=1
tdnnf-layer name=tdnnf4 l2-regularize=0.008 dropout-proportion=0.0 bypass-scale=0.75 dim=1536 bottleneck-dim=160 time-stride=1
tdnnf-layer name=tdnnf5 l2-regularize=0.008 dropout-proportion=0.0 bypass-scale=0.75 dim=1536 bottleneck-dim=160 time-stride=0
tdnnf-layer name=tdnnf6 l2-regularize=0.008 dropout-proportion=0.0 bypass-scale=0.75 dim=1536 bottleneck-dim=160 time-stride=3
tdnnf-layer name=tdnnf7 l2-regularize=0.008 dropout-proportion=0.0 bypass-scale=0.75 dim=1536 bottleneck-dim=160 time-stride=3
tdnnf-layer name=tdnnf8 l2-regularize=0.008 dropout-proportion=0.0 bypass-scale=0.75 dim=1536 bottleneck-dim=160 time-stride=3
tdnnf-layer name=tdnnf9 l2-regularize=0.008 dropout-proportion=0.0 bypass-scale=0.75 dim=1536 bottleneck-dim=160 time-stride=3
tdnnf-layer name=tdnnf10 l2-regularize=0.008 dropout-proportion=0.0 bypass-scale=0.75 dim=1536 bottleneck-dim=160 time-stride=3
tdnnf-layer name=tdnnf11 l2-regularize=0.008 dropout-proportion=0.0 bypass-scale=0.75 dim=1536 bottleneck-dim=160 time-stride=3
[ec2-user@ip-172-31-6-113 configs]$
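In the excerpt above the tdnnf layers differ only in name and time-stride: tdnnf2 to tdnnf4 use 1, tdnnf5 uses 0, and tdnnf6 to tdnnf11 use 3. To see that pattern for the whole file without reading every line, a small awk filter like this sketch could be used. The helper name layer_strides is made up for illustration.

```shell
# layer_strides is a hypothetical helper: for every tdnnf-layer line in an
# xconfig file, print "<name> <time-stride>".
layer_strides() {
  awk '
    $1 == "tdnnf-layer" {
      name = ""; stride = ""
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^name=/)        { name = substr($i, 6) }    # after "name="
        if ($i ~ /^time-stride=/) { stride = substr($i, 13) } # after "time-stride="
      }
      print name, stride
    }
  ' "$1"
}

# Example, run inside the configs folder:
# layer_strides network.xconfig
```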

xconfig

[ec2-user@ip-172-31-6-113 configs]$ head xconfig 
# This file was created by the command:
# steps/nnet3/xconfig_to_configs.py --xconfig-file exp/chain_cleaned/tdnn_1d_sp/configs/network.xconfig --config-dir exp/chain_cleaned/tdnn_1d_sp/configs/
# It is a copy of the source from which the config files in # this directory were generated.

input dim=100 name=ivector
input dim=40 name=input

# please note that it is important to have input layer with the name=input
# as the layer immediately preceding the fixed-affine-layer to enable
# the use of short notation for the descriptor
[ec2-user@ip-172-31-6-113 configs]$
