Find data to decode using the ASR model, dict, and lm that we downloaded previously.
At this point we have the exp, dict, and lm folders.
[ec2-user@ip-172-31-6-113 ~]$ ls
exp kaldi trash
[ec2-user@ip-172-31-6-113 ~]$
It looks like there is a script that might download the LibriSpeech dataset for you:
[ec2-user@ip-172-31-6-113 s5]$ tree -L 1 local
local
├── chain
├── data_prep.sh
├── decode_example.sh
├── download_and_untar.sh
├── download_lm.sh
├── format_data.sh
├── format_lms.sh
├── g2p
├── g2p.sh
├── lm
├── lookahead
├── nnet2
├── nnet3
├── online
├── online_pitch
├── prepare_dict.sh
├── prepare_example_data.sh
├── rnnlm
├── run_cleanup_segmentation.sh
├── run_data_cleaning.sh
├── run_nnet2_clean_100.sh
├── run_nnet2_clean_460.sh
├── run_nnet2.sh
├── run_rnnlm.sh
└── score.sh
9 directories, 16 files
[ec2-user@ip-172-31-6-113 s5]$
The script below seems to do this for us:
download_and_untar.sh
For now, I decided to download the test set manually.
The plan is to decode the files in the test set. We should get good results, since the ASR model was trained on a similar dataset.
wget https://www.openslr.org/resources/12/test-clean.tar.gz
tar -xvzf test-clean.tar.gz
[ec2-user@ip-172-31-6-113 ~]$ ls
exp kaldi trash
[ec2-user@ip-172-31-6-113 ~]$ wget https://www.openslr.org/resources/12/test-clean.tar.gz
--2021-12-12 21:25:52-- https://www.openslr.org/resources/12/test-clean.tar.gz
Resolving www.openslr.org (www.openslr.org)... 46.101.158.64
Connecting to www.openslr.org (www.openslr.org)|46.101.158.64|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://us.openslr.org/resources/12/test-clean.tar.gz [following]
--2021-12-12 21:25:53-- http://us.openslr.org/resources/12/test-clean.tar.gz
Resolving us.openslr.org (us.openslr.org)... 46.101.158.64
Connecting to us.openslr.org (us.openslr.org)|46.101.158.64|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 346663984 (331M) [application/x-gzip]
Saving to: ‘test-clean.tar.gz’
100%[===============================================================================================================================================>] 346,663,984 8.44MB/s in 97s
2021-12-12 21:27:30 (3.40 MB/s) - ‘test-clean.tar.gz’ saved [346663984/346663984]
[ec2-user@ip-172-31-6-113 ~]$
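Before extracting, it can be worth checking that the archive arrived intact. Below is a minimal sketch, assuming the byte count that wget reported above (346663984) is the expected size. The helper name `archive_ok` is hypothetical and is not part of Kaldi or LibriSpeech.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if FILE exists, is exactly EXPECTED bytes
# long, and gzip can read it from start to end.
archive_ok() {
  local file="$1" expected="$2" actual
  [ -f "$file" ] || return 1
  # wc -c prints the size in bytes; tr strips the padding some systems add.
  actual=$(wc -c < "$file" | tr -d ' ')
  [ "$actual" -eq "$expected" ] || return 1
  # Read from stdin so the check does not depend on the file name suffix.
  gzip -t < "$file" 2>/dev/null
}
```

With that in place, the extraction step could be guarded like this: `archive_ok test-clean.tar.gz 346663984 && tar -xzf test-clean.tar.gz`.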
[ec2-user@ip-172-31-6-113 ~]$ ls
exp kaldi LibriSpeech trash
[ec2-user@ip-172-31-6-113 ~]$ cd LibriSpeech/
[ec2-user@ip-172-31-6-113 LibriSpeech]$ ls
BOOKS.TXT CHAPTERS.TXT LICENSE.TXT README.TXT SPEAKERS.TXT test-clean
[ec2-user@ip-172-31-6-113 LibriSpeech]$ cd test-clean/
[ec2-user@ip-172-31-6-113 test-clean]$ ls
1089 121 1284 1580 2094 237 2830 3570 3729 4446 4970 5105 5639 61 6829 7021 7176 8224 8455 8555
1188 1221 1320 1995 2300 260 2961 3575 4077 4507 4992 5142 5683 672 6930 7127 7729 8230 8463 908
[ec2-user@ip-172-31-6-113 test-clean]$
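Each directory in the listing above is one speaker. To get a quick feel for how much audio there is to decode, something like the sketch below could be used. The function name `corpus_summary` is hypothetical, and it assumes a `find` that supports `-mindepth` and `-maxdepth` (GNU and BSD find both do).

```shell
#!/usr/bin/env bash
# Hypothetical helper: count speaker directories and .flac utterances
# under a LibriSpeech subset directory such as LibriSpeech/test-clean.
corpus_summary() {
  local root="${1%/}" speakers utts
  # Speakers are the first-level directories (1089, 121, ...).
  speakers=$(find "$root" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' ')
  # Utterances are the .flac files at any depth below the root.
  utts=$(find "$root" -type f -name '*.flac' | wc -l | tr -d ' ')
  echo "speakers=$speakers utterances=$utts"
}
```

Run as `corpus_summary ~/LibriSpeech/test-clean`, it should report 40 speakers for the listing above.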
[ec2-user@ip-172-31-6-113 1089]$ tree 134686
134686
├── 1089-134686-0000.flac
├── 1089-134686-0001.flac
├── 1089-134686-0002.flac
├── 1089-134686-0003.flac
├── 1089-134686-0004.flac
├── 1089-134686-0005.flac
├── 1089-134686-0006.flac
├── 1089-134686-0007.flac
├── 1089-134686-0008.flac
├── 1089-134686-0009.flac
├── 1089-134686-0010.flac
├── 1089-134686-0011.flac
├── 1089-134686-0012.flac
├── 1089-134686-0013.flac
├── 1089-134686-0014.flac
├── 1089-134686-0015.flac
├── 1089-134686-0016.flac
├── 1089-134686-0017.flac
├── 1089-134686-0018.flac
├── 1089-134686-0019.flac
├── 1089-134686-0020.flac
├── 1089-134686-0021.flac
├── 1089-134686-0022.flac
├── 1089-134686-0023.flac
├── 1089-134686-0024.flac
├── 1089-134686-0025.flac
├── 1089-134686-0026.flac
├── 1089-134686-0027.flac
├── 1089-134686-0028.flac
├── 1089-134686-0029.flac
├── 1089-134686-0030.flac
├── 1089-134686-0031.flac
├── 1089-134686-0032.flac
├── 1089-134686-0033.flac
├── 1089-134686-0034.flac
├── 1089-134686-0035.flac
├── 1089-134686-0036.flac
├── 1089-134686-0037.flac
└── 1089-134686.trans.txt
0 directories, 39 files
[ec2-user@ip-172-31-6-113 1089]$
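The layout above (one .flac per utterance plus a single .trans.txt per chapter) maps fairly directly onto the files Kaldi expects in a data directory: wav.scp, text, and utt2spk. Here is a rough sketch of that mapping for one chapter. It is loosely modeled on what a script like local/data_prep.sh would do, but it is my own illustration and not a copy of it. The function name `prep_chapter`, the output directory, and the choice of `reader-chapter` as the speaker id are assumptions. It also assumes the `flac` command-line tool is installed when the resulting wav.scp is actually read.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append Kaldi-style wav.scp, text and utt2spk entries
# for one chapter directory, e.g. LibriSpeech/test-clean/1089/134686.
# Usage: prep_chapter <chapter-dir> <out-dir>
prep_chapter() {
  local chapter_dir out_dir="$2" chapter reader trans f
  # Resolve to an absolute path so wav.scp works from any directory.
  chapter_dir=$(cd "$1" 2>/dev/null && pwd) || return 1
  chapter=$(basename "$chapter_dir")
  reader=$(basename "$(dirname "$chapter_dir")")
  trans="$chapter_dir/$reader-$chapter.trans.txt"
  [ -f "$trans" ] || { echo "missing $trans" >&2; return 1; }
  mkdir -p "$out_dir"

  # wav.scp: "<utt-id> <command that writes WAV to stdout> |"
  for f in "$chapter_dir"/*.flac; do
    [ -e "$f" ] || continue
    printf '%s flac -c -d -s %s |\n' "$(basename "$f" .flac)" "$f"
  done >> "$out_dir/wav.scp"

  # text: each transcript line is already "<utt-id> <words...>".
  cat "$trans" >> "$out_dir/text"

  # utt2spk: "<utt-id> <speaker-id>", with reader-chapter as the speaker id
  # (an assumption of this sketch).
  awk -v spk="$reader-$chapter" '{print $1, spk}' "$trans" >> "$out_dir/utt2spk"
}
```

For example, `prep_chapter ~/LibriSpeech/test-clean/1089/134686 data/test_clean` would append 38 entries to each file for the chapter shown above. Kaldi expects these files to be sorted, so after processing several chapters they would still need a `LC_ALL=C sort` pass.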