We need a trained ASR model to decode audio files. This tutorial looks at what is inside the exp folder that holds the trained ASR model.
Step 1: Download the tar.gz files
I chose the Librispeech ASR model to decode some audio files.
wget http://www.kaldi-asr.org/models/13/0013_librispeech_v1_chain.tar.gz
wget http://www.kaldi-asr.org/models/13/0013_librispeech_v1_extractor.tar.gz
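Both archives should now be sitting in the home directory; ls -lh (rather than plain ls) also shows how large each one is:
ls -lh *.tar.gz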
Step 2: Extract tar.gz files
[ec2-user@ip-172-31-6-113 ~]$ ls
0013_librispeech_v1_chain.tar.gz 0013_librispeech_v1_extractor.tar.gz kaldi trash
[ec2-user@ip-172-31-6-113 ~]$
Extracting the first archive gives us the exp folder:
[ec2-user@ip-172-31-6-113 ~]$ tar -xvzf 0013_librispeech_v1_chain.tar.gz
exp/
exp/chain_cleaned/
exp/chain_cleaned/tdnn_1d_sp/
exp/chain_cleaned/tdnn_1d_sp/phones.txt
exp/chain_cleaned/tdnn_1d_sp/num_jobs
exp/chain_cleaned/tdnn_1d_sp/phone_lm.fst
exp/chain_cleaned/tdnn_1d_sp/tree
exp/chain_cleaned/tdnn_1d_sp/0.trans_mdl
exp/chain_cleaned/tdnn_1d_sp/den.fst
exp/chain_cleaned/tdnn_1d_sp/normalization.fst
exp/chain_cleaned/tdnn_1d_sp/cmvn_opts
exp/chain_cleaned/tdnn_1d_sp/final.ie.id
exp/chain_cleaned/tdnn_1d_sp/lda.mat
exp/chain_cleaned/tdnn_1d_sp/srand
exp/chain_cleaned/tdnn_1d_sp/accuracy.report
exp/chain_cleaned/tdnn_1d_sp/configs/
exp/chain_cleaned/tdnn_1d_sp/configs/network.xconfig
exp/chain_cleaned/tdnn_1d_sp/configs/xconfig
exp/chain_cleaned/tdnn_1d_sp/configs/xconfig.expanded.1
exp/chain_cleaned/tdnn_1d_sp/configs/xconfig.expanded.2
exp/chain_cleaned/tdnn_1d_sp/configs/init.config
exp/chain_cleaned/tdnn_1d_sp/configs/ref.config
exp/chain_cleaned/tdnn_1d_sp/configs/final.config
exp/chain_cleaned/tdnn_1d_sp/configs/init.raw
exp/chain_cleaned/tdnn_1d_sp/configs/ref.raw
exp/chain_cleaned/tdnn_1d_sp/configs/vars
exp/chain_cleaned/tdnn_1d_sp/configs/lda.mat
exp/chain_cleaned/tdnn_1d_sp/0.mdl
exp/chain_cleaned/tdnn_1d_sp/final.mdl
[ec2-user@ip-172-31-6-113 ~]$
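As a side note, if you want to see what an archive contains without unpacking it, tar -tzf just lists the contents. For example, for the extractor archive we are about to extract:
tar -tzf 0013_librispeech_v1_extractor.tar.gz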
Extract the second tar.gz file:
[ec2-user@ip-172-31-6-113 ~]$ tar -xvzf 0013_librispeech_v1_extractor.tar.gz
exp/
exp/nnet3_cleaned/
exp/nnet3_cleaned/extractor/
exp/nnet3_cleaned/extractor/final.dubm
exp/nnet3_cleaned/extractor/final.mat
exp/nnet3_cleaned/extractor/global_cmvn.stats
exp/nnet3_cleaned/extractor/splice_opts
exp/nnet3_cleaned/extractor/online_cmvn.conf
exp/nnet3_cleaned/extractor/num_jobs
exp/nnet3_cleaned/extractor/final.ie
exp/nnet3_cleaned/extractor/final.ie.id
exp/nnet3_cleaned/extractor/10.ie
[ec2-user@ip-172-31-6-113 ~]$
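Notice that final.ie.id shows up in both the chain model folder and the extractor folder. It records which i-vector extractor the acoustic model was trained with, so as a quick sanity check (assuming both archives were unpacked into the same exp folder, as above) diff should print nothing if the two IDs match:
diff exp/chain_cleaned/tdnn_1d_sp/final.ie.id exp/nnet3_cleaned/extractor/final.ie.id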
The trained ASR model itself is final.mdl. The tree command shows the complete layout of the exp folder:
[ec2-user@ip-172-31-6-113 ~]$ tree exp
exp
├── chain_cleaned
│   └── tdnn_1d_sp
│       ├── 0.mdl
│       ├── 0.trans_mdl
│       ├── accuracy.report
│       ├── cmvn_opts
│       ├── configs
│       │   ├── final.config
│       │   ├── init.config
│       │   ├── init.raw
│       │   ├── lda.mat -> ../lda.mat
│       │   ├── network.xconfig
│       │   ├── ref.config
│       │   ├── ref.raw
│       │   ├── vars
│       │   ├── xconfig
│       │   ├── xconfig.expanded.1
│       │   └── xconfig.expanded.2
│       ├── den.fst
│       ├── final.ie.id
│       ├── final.mdl
│       ├── lda.mat
│       ├── normalization.fst
│       ├── num_jobs
│       ├── phone_lm.fst
│       ├── phones.txt
│       ├── srand
│       └── tree
└── nnet3_cleaned
    └── extractor
        ├── 10.ie
        ├── final.dubm
        ├── final.ie -> 10.ie
        ├── final.ie.id
        ├── final.mat
        ├── global_cmvn.stats
        ├── num_jobs
        ├── online_cmvn.conf
        └── splice_opts
5 directories, 34 files
[ec2-user@ip-172-31-6-113 ~]$
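If you have Kaldi compiled and its binaries on your PATH (for example by sourcing path.sh inside one of the egs recipe directories; the exact path depends on your setup), nnet3-am-info prints a human-readable summary of final.mdl, including its components, dimensions and parameter count:
nnet3-am-info exp/chain_cleaned/tdnn_1d_sp/final.mdl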
The Librispeech ASR model takes about 251M on disk. Below is some idea of what is in each folder.
[ec2-user@ip-172-31-6-113 ~]$ du -sh exp/
251M exp/
[ec2-user@ip-172-31-6-113 ~]$ cd exp/
[ec2-user@ip-172-31-6-113 exp]$ ls
chain_cleaned nnet3_cleaned
[ec2-user@ip-172-31-6-113 exp]$
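To see how that 251M splits between the two subfolders, run du on each of them from inside exp:
du -sh chain_cleaned/ nnet3_cleaned/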
[ec2-user@ip-172-31-6-113 exp]$ cd chain_cleaned/
[ec2-user@ip-172-31-6-113 chain_cleaned]$ ls
tdnn_1d_sp
[ec2-user@ip-172-31-6-113 chain_cleaned]$ cd tdnn_1d_sp/
[ec2-user@ip-172-31-6-113 tdnn_1d_sp]$ ls
0.mdl accuracy.report configs final.ie.id lda.mat num_jobs phones.txt tree
0.trans_mdl cmvn_opts den.fst final.mdl normalization.fst phone_lm.fst srand
[ec2-user@ip-172-31-6-113 tdnn_1d_sp]$ ls -alh
total 172M
drwxr-xr-x 3 ec2-user ec2-user 263 Feb 2 2020 .
drwxr-xr-x 3 ec2-user ec2-user 24 Feb 2 2020 ..
-rw-r--r-- 1 ec2-user ec2-user 80M Feb 2 2020 0.mdl
-rw-r--r-- 1 ec2-user ec2-user 829K Feb 2 2020 0.trans_mdl
-rw-r--r-- 1 ec2-user ec2-user 22K Feb 2 2020 accuracy.report
-rw-r--r-- 1 ec2-user ec2-user 37 Feb 2 2020 cmvn_opts
drwxr-xr-x 2 ec2-user ec2-user 211 Feb 2 2020 configs
-rw-r--r-- 1 ec2-user ec2-user 2.7M Feb 2 2020 den.fst
-rw-r--r-- 1 ec2-user ec2-user 33 Feb 2 2020 final.ie.id
-rw-r--r-- 1 ec2-user ec2-user 81M Feb 2 2020 final.mdl
-rw-r--r-- 1 ec2-user ec2-user 190K Feb 2 2020 lda.mat
-rw-r--r-- 1 ec2-user ec2-user 3.0M Feb 2 2020 normalization.fst
-rw-r--r-- 1 ec2-user ec2-user 3 Feb 2 2020 num_jobs
-rw-r--r-- 1 ec2-user ec2-user 2.6M Feb 2 2020 phone_lm.fst
-rw-r--r-- 1 ec2-user ec2-user 3.2K Feb 2 2020 phones.txt
-rw-r--r-- 1 ec2-user ec2-user 1 Feb 2 2020 srand
-rw-r--r-- 1 ec2-user ec2-user 1.7M Feb 2 2020 tree
[ec2-user@ip-172-31-6-113 tdnn_1d_sp]$
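Several of the entries here are tiny text files that just record training settings (cmvn_opts, num_jobs, srand, final.ie.id). head prints each file with a header, so from inside tdnn_1d_sp you can see them all at once:
head cmvn_opts num_jobs srand final.ie.id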
phones.txt has 364 entries (including <eps>):
[ec2-user@ip-172-31-6-113 tdnn_1d_sp]$ wc -l phones.txt
364 phones.txt
[ec2-user@ip-172-31-6-113 tdnn_1d_sp]$
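That count includes <eps>, the word-position variants (_B, _E, _I, _S) and any disambiguation symbols, so the number of distinct base phones is smaller. A rough way to count them is to strip the position suffixes before counting unique symbols (just a one-liner sketch, not an official Kaldi utility):
cut -d' ' -f1 phones.txt | sed 's/_[BEIS]$//' | sort -u | wc -l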
head shows the first 10 entries of phones.txt:
[ec2-user@ip-172-31-6-113 tdnn_1d_sp]$ head phones.txt
<eps> 0
SIL 1
SIL_B 2
SIL_E 3
SIL_I 4
SIL_S 5
SPN 6
SPN_B 7
SPN_E 8
SPN_I 9
[ec2-user@ip-172-31-6-113 tdnn_1d_sp]$
configs folder
[ec2-user@ip-172-31-6-113 configs]$ ls -alh
total 80M
drwxr-xr-x 2 ec2-user ec2-user 211 Feb 2 2020 .
drwxr-xr-x 3 ec2-user ec2-user 263 Feb 2 2020 ..
-rw-r--r-- 1 ec2-user ec2-user 22K Feb 2 2020 final.config
-rw-r--r-- 1 ec2-user ec2-user 493 Feb 2 2020 init.config
-rw-r--r-- 1 ec2-user ec2-user 230 Feb 2 2020 init.raw
lrwxrwxrwx 1 ec2-user ec2-user 10 Feb 2 2020 lda.mat -> ../lda.mat
-rw-r--r-- 1 ec2-user ec2-user 3.1K Feb 2 2020 network.xconfig
-rw-r--r-- 1 ec2-user ec2-user 23K Feb 2 2020 ref.config
-rw-r--r-- 1 ec2-user ec2-user 80M Feb 2 2020 ref.raw
-rw-r--r-- 1 ec2-user ec2-user 45 Feb 2 2020 vars
-rw-r--r-- 1 ec2-user ec2-user 3.4K Feb 2 2020 xconfig
-rw-r--r-- 1 ec2-user ec2-user 4.9K Feb 2 2020 xconfig.expanded.1
-rw-r--r-- 1 ec2-user ec2-user 5.1K Feb 2 2020 xconfig.expanded.2
[ec2-user@ip-172-31-6-113 configs]$
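ref.raw (the 80M file) is a raw nnet3 network written out by xconfig_to_configs.py when it generated these configs. With Kaldi binaries on the PATH, nnet3-info can summarize it from inside configs (again assuming your Kaldi build is available):
nnet3-info ref.raw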
final.config
[ec2-user@ip-172-31-6-113 configs]$ head final.config
# This file was created by the command:
# steps/nnet3/xconfig_to_configs.py --xconfig-file exp/chain_cleaned/tdnn_1d_sp/configs/network.xconfig --config-dir exp/chain_cleaned/tdnn_1d_sp/configs/
# It contains the entire neural network.
input-node name=ivector dim=100
input-node name=input dim=40
component name=lda type=FixedAffineComponent matrix=exp/chain_cleaned/tdnn_1d_sp/configs/lda.mat
component-node name=lda component=lda input=Append(Offset(input, -1), input, Offset(input, 1), ReplaceIndex(ivector, t, 0))
component name=tdnn1.affine type=NaturalGradientAffineComponent input-dim=220 output-dim=1536 max-change=0.75 l2-regularize=0.008
component-node name=tdnn1.affine component=tdnn1.affine input=lda
[ec2-user@ip-172-31-6-113 configs]$
[ec2-user@ip-172-31-6-113 configs]$ head -20 network.xconfig
input dim=100 name=ivector
input dim=40 name=input
# please note that it is important to have input layer with the name=input
# as the layer immediately preceding the fixed-affine-layer to enable
# the use of short notation for the descriptor
fixed-affine-layer name=lda input=Append(-1,0,1,ReplaceIndex(ivector, t, 0)) affine-transform-file=exp/chain_cleaned/tdnn_1d_sp/configs/lda.mat
# the first splicing is moved before the lda layer, so no splicing here
relu-batchnorm-dropout-layer name=tdnn1 l2-regularize=0.008 dropout-proportion=0.0 dropout-per-dim=true dropout-per-dim-continuous=true dim=1536
tdnnf-layer name=tdnnf2 l2-regularize=0.008 dropout-proportion=0.0 bypass-scale=0.75 dim=1536 bottleneck-dim=160 time-stride=1
tdnnf-layer name=tdnnf3 l2-regularize=0.008 dropout-proportion=0.0 bypass-scale=0.75 dim=1536 bottleneck-dim=160 time-stride=1
tdnnf-layer name=tdnnf4 l2-regularize=0.008 dropout-proportion=0.0 bypass-scale=0.75 dim=1536 bottleneck-dim=160 time-stride=1
tdnnf-layer name=tdnnf5 l2-regularize=0.008 dropout-proportion=0.0 bypass-scale=0.75 dim=1536 bottleneck-dim=160 time-stride=0
tdnnf-layer name=tdnnf6 l2-regularize=0.008 dropout-proportion=0.0 bypass-scale=0.75 dim=1536 bottleneck-dim=160 time-stride=3
tdnnf-layer name=tdnnf7 l2-regularize=0.008 dropout-proportion=0.0 bypass-scale=0.75 dim=1536 bottleneck-dim=160 time-stride=3
tdnnf-layer name=tdnnf8 l2-regularize=0.008 dropout-proportion=0.0 bypass-scale=0.75 dim=1536 bottleneck-dim=160 time-stride=3
tdnnf-layer name=tdnnf9 l2-regularize=0.008 dropout-proportion=0.0 bypass-scale=0.75 dim=1536 bottleneck-dim=160 time-stride=3
tdnnf-layer name=tdnnf10 l2-regularize=0.008 dropout-proportion=0.0 bypass-scale=0.75 dim=1536 bottleneck-dim=160 time-stride=3
tdnnf-layer name=tdnnf11 l2-regularize=0.008 dropout-proportion=0.0 bypass-scale=0.75 dim=1536 bottleneck-dim=160 time-stride=3
[ec2-user@ip-172-31-6-113 configs]$
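head -20 only shows the beginning of the stack of TDNN-F layers. To count how many tdnnf-layer lines the full file defines, grep for the layer type:
grep -c '^tdnnf-layer' network.xconfig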
xconfig
[ec2-user@ip-172-31-6-113 configs]$ head xconfig
# This file was created by the command:
# steps/nnet3/xconfig_to_configs.py --xconfig-file exp/chain_cleaned/tdnn_1d_sp/configs/network.xconfig --config-dir exp/chain_cleaned/tdnn_1d_sp/configs/
# It is a copy of the source from which the config files in
# this directory were generated.
input dim=100 name=ivector
input dim=40 name=input
# please note that it is important to have input layer with the name=input
# as the layer immediately preceding the fixed-affine-layer to enable
# the use of short notation for the descriptor
[ec2-user@ip-172-31-6-113 configs]$
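Since xconfig is described as a copy of the source, diffing it against network.xconfig should show little beyond the generated header comments, which is a quick way to confirm nothing was edited by hand:
diff network.xconfig xconfig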