P5: Resize the storage of an AWS EC2 t2.2xlarge instance from 8GB to 200GB

I had to resize the storage of my AWS EC2 instance because I didn't realize how much disk space the LibriSpeech model needs. I recommend using the mini LibriSpeech model instead.

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/recognize-expanded-volume-linux.html#extend-file-system

I started my Kaldi project on a t2.2xlarge instance, but I created it with only 8GB of storage, which turned out not to be enough later on.

At some point during the project I got an error while downloading a file:

Error: No space left on device

[ec2-user@ip-172-31-6-113 s5]$ wget https://www.openslr.org/resources/11/librispeech-lm-corpus.tgz
Cannot write to ‘librispeech-lm-corpus.tgz’ (No space left on device).

[ec2-user@ip-172-31-6-113 ~]$ df -h
Filesystem     Size Used Avail Use% Mounted on
devtmpfs         16G     0   16G   0% /dev
tmpfs           16G     0   16G   0% /dev/shm
tmpfs           16G 432K   16G   1% /run
tmpfs           16G     0   16G   0% /sys/fs/cgroup
/dev/xvda1      8.0G  8.0G   20K 100% /
tmpfs           3.2G     0  3.2G   0% /run/user/1000
[ec2-user@ip-172-31-6-113 ~]$
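
The failed download above could have been caught earlier by checking the free space first. Here is a minimal sketch of such a check. The function names avail_kb and has_room are made up for illustration, and the 2 GiB threshold in the usage comment is only an illustrative value, not the real size of the corpus.

```shell
#!/bin/sh
# Available space, in KiB, on the file system that holds a directory.
# df -P forces the one-line-per-file-system POSIX format, -k reports KiB.
avail_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed only if the directory has at least the requested KiB free.
has_room() {
  [ "$(avail_kb "$1")" -ge "$2" ]
}

# Hypothetical usage: require about 2 GiB free before starting the download.
# has_room . 2097152 && wget https://www.openslr.org/resources/11/librispeech-lm-corpus.tgz
```

With a check like this the download is simply skipped when the disk is nearly full, instead of failing part of the way through.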

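I added more storage after that by resizing the volume to 200GB.

Even after the resize, the extra space was not available for downloading more data. df still reports 8.0G for /, and lsblk shows a 200G disk with an 8G partition: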

[ec2-user@ip-172-31-6-113 ~]$ df -h
Filesystem     Size Used Avail Use% Mounted on
devtmpfs         16G     0   16G   0% /dev
tmpfs           16G     0   16G   0% /dev/shm
tmpfs           16G  1.1M   16G   1% /run
tmpfs           16G     0   16G   0% /sys/fs/cgroup
/dev/xvda1      8.0G  8.0G   16K 100% /
tmpfs           3.2G     0  3.2G   0% /run/user/0
tmpfs           3.2G     0  3.2G   0% /run/user/1000
[ec2-user@ip-172-31-6-113 ~]$ lsblk
NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda    202:0    0 200G  0 disk
└─xvda1 202:1    0   8G  0 part /
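
The lsblk output above shows the problem directly: the disk is 200G but the partition is still 8G. A minimal sketch of that comparison follows, assuming a disk with a single partition as in this post. The function name unused_gib is made up for illustration, and its input is the raw byte output of lsblk -b -n -r -o NAME,SIZE,TYPE.

```shell
#!/bin/sh
# Read "NAME SIZE TYPE" lines (sizes in bytes) on stdin and print how many
# GiB of the disk are not covered by its partition.
# Assumes exactly one disk with one partition, as in the transcript above.
unused_gib() {
  awk '$3 == "disk" { disk = $2 }
       $3 == "part" { part = $2 }
       END { printf "%.1f\n", (disk - part) / 1073741824 }'
}

# Hypothetical usage on the instance itself:
# lsblk -b -n -r -o NAME,SIZE,TYPE /dev/xvda | unused_gib
```

A result well above zero means the partition has not been grown to fill the disk yet.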

After resizing the storage, you must run the following two commands for the space to become available: growpart extends the partition, and xfs_growfs extends the XFS file system on it.

[ec2-user@ip-172-31-6-113 ~]$  sudo growpart /dev/xvda 1
CHANGED: partition=1 start=4096 old: size=16773087 end=16777183 new: size=419426271 end=419430367
[ec2-user@ip-172-31-6-113 ~]$
[ec2-user@ip-172-31-6-113 s5]$ sudo xfs_growfs -d /
meta-data=/dev/xvda1             isize=512    agcount=4, agsize=524159 blks
        =                       sectsz=512   attr=2, projid32bit=1
        =                       crc=1        finobt=1 spinodes=0
data     =                       bsize=4096   blocks=2096635, imaxpct=25
        =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
        =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 2096635 to 52428283
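
The sizes that growpart and xfs_growfs print above can be sanity-checked with a bit of arithmetic. This is a small sketch that assumes growpart counts 512-byte sectors (matching sectsz=512 in the xfs_growfs output) and that XFS blocks are 4096 bytes (bsize=4096). The helper name to_gib is made up for illustration.

```shell
#!/bin/sh
# Convert a count of fixed-size units to GiB, printed with one decimal place.
# $1 = number of units, $2 = unit size in bytes
to_gib() {
  awk -v n="$1" -v size="$2" 'BEGIN { printf "%.1f\n", n * size / 1073741824 }'
}

to_gib 16773087 512     # old partition size reported by growpart -> 8.0
to_gib 419426271 512    # new partition size reported by growpart -> 200.0
to_gib 2096635 4096     # old XFS data blocks -> 8.0
to_gib 52428283 4096    # new XFS data blocks -> 200.0
```

Both tools agree: the partition and the file system went from about 8 GiB to about 200 GiB.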

After both commands, df -hT finally shows the full 200G on /, matching the resize reported above:

[ec2-user@ip-172-31-6-113 s5]$ df -hT
Filesystem     Type     Size Used Avail Use% Mounted on
devtmpfs       devtmpfs   16G     0   16G   0% /dev
tmpfs         tmpfs     16G     0   16G   0% /dev/shm
tmpfs         tmpfs     16G 436K   16G   1% /run
tmpfs         tmpfs     16G     0   16G   0% /sys/fs/cgroup
/dev/xvda1     xfs       200G  7.5G 193G   4% /
tmpfs         tmpfs     3.2G     0  3.2G   0% /run/user/1000
[ec2-user@ip-172-31-6-113

lsblk now also shows the partition at the full 200G, unlike before:

[ec2-user@ip-172-31-6-113 ~]$ lsblk
NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda    202:0    0 200G  0 disk
└─xvda1 202:1    0 200G  0 part /
[ec2-user@ip-172-31-6-113 ~]$


Later I had to grow a partition a second time, this time on an Ubuntu instance with an NVMe root device, because 400GB wasn't enough to train LibriSpeech. I grew the disk to 600GB.

ubuntu@ip-172-31-6-144:~$ sudo growpart /dev/nvme0n1 1
CHANGED: partition=1 start=2048 old: size=838858719 end=838860767 new: size=1258289119,end=1258291167
ubuntu@ip-172-31-6-144:~$ lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0         7:0    0  31.1M  1 loop /snap/snapd/11036
loop1         7:1    0  55.5M  1 loop /snap/core18/2253
loop2         7:2    0    25M  1 loop /snap/amazon-ssm-agent/4046
loop3         7:3    0  55.5M  1 loop /snap/core18/1988
loop4         7:4    0  33.3M  1 loop /snap/amazon-ssm-agent/3552
loop5         7:5    0  43.3M  1 loop /snap/snapd/14295
nvme1n1     259:0    0 209.6G  0 disk
nvme0n1     259:1      600G  0 disk
└─nvme0n1p1 259:2      600G  0 part /
ubuntu@ip-172-31-6-144:~$

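But the new space still didn't show up as available. On this ext4 file system you MUST also run:

sudo resize2fs /dev/nvme0n1p1

Before resize2fs, df -hT still shows 388G for /; afterwards it shows 582G: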

ubuntu@ip-172-31-6-144:~/kaldi/egs/librispeech/s5$ df -hT
Filesystem     Type      Size  Used Avail Use% Mounted on
udev           devtmpfs   16G     0   16G   0% /dev
tmpfs          tmpfs     3.1G  820K  3.1G   1% /run
/dev/nvme0n1p1 ext4      388G  332G   57G  86% /
tmpfs          tmpfs      16G     0   16G   0% /dev/shm
tmpfs          tmpfs     5.0M     0  5.0M   0% /run/lock
tmpfs          tmpfs      16G     0   16G   0% /sys/fs/cgroup
/dev/loop1     squashfs   56M   56M     0 100% /snap/core18/2253
/dev/loop0     squashfs   32M   32M     0 100% /snap/snapd/11036
/dev/loop3     squashfs   56M   56M     0 100% /snap/core18/1988
/dev/loop2     squashfs   25M   25M     0 100% /snap/amazon-ssm-agent/4046
/dev/loop5     squashfs   44M   44M     0 100% /snap/snapd/14295
/dev/loop4     squashfs   34M   34M     0 100% /snap/amazon-ssm-agent/3552
tmpfs          tmpfs     3.1G     0  3.1G   0% /run/user/1000

ubuntu@ip-172-31-6-144:~/kaldi/egs/librispeech/s5$ sudo resize2fs /dev/nvme0n1p1
resize2fs 1.44.1 (24-Mar-2018)
Filesystem at /dev/nvme0n1p1 is mounted on /; on-line resizing required
old_desc_blocks = 50, new_desc_blocks = 75
The filesystem on /dev/nvme0n1p1 is now 157286139 (4k) blocks long.

ubuntu@ip-172-31-6-144:~/kaldi/egs/librispeech/s5$
ubuntu@ip-172-31-6-144:~/kaldi/egs/librispeech/s5$ df -hT
Filesystem     Type      Size  Used Avail Use% Mounted on
udev           devtmpfs   16G     0   16G   0% /dev
tmpfs          tmpfs     3.1G  820K  3.1G   1% /run
/dev/nvme0n1p1 ext4      582G  332G  251G  57% /
tmpfs          tmpfs      16G     0   16G   0% /dev/shm
tmpfs          tmpfs     5.0M     0  5.0M   0% /run/lock
tmpfs          tmpfs      16G     0   16G   0% /sys/fs/cgroup
/dev/loop1     squashfs   56M   56M     0 100% /snap/core18/2253
/dev/loop0     squashfs   32M   32M     0 100% /snap/snapd/11036
/dev/loop3     squashfs   56M   56M     0 100% /snap/core18/1988
/dev/loop2     squashfs   25M   25M     0 100% /snap/amazon-ssm-agent/4046
/dev/loop5     squashfs   44M   44M     0 100% /snap/snapd/14295
/dev/loop4     squashfs   34M   34M     0 100% /snap/amazon-ssm-agent/3552
tmpfs          tmpfs     3.1G     0  3.1G   0% /run/user/1000
ubuntu@ip-172-31-6-144:~/kaldi/egs/librispeech/s5$
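
The two resizes in this post differ only in the last step: xfs_growfs for the XFS root and resize2fs for the ext4 root. The sketch below picks between them by file system type. It only prints the command instead of running it, and the function name fs_grow_cmd is made up for illustration. It covers just the file system types seen here.

```shell
#!/bin/sh
# Print (do not run) the command that grows a mounted file system,
# chosen by file system type. growpart must already have been run.
# $1 = file system type, $2 = partition device, $3 = mount point
fs_grow_cmd() {
  case "$1" in
    xfs)            echo "sudo xfs_growfs -d $3" ;;
    ext2|ext3|ext4) echo "sudo resize2fs $2" ;;
    *)              echo "unsupported file system type: $1" >&2; return 1 ;;
  esac
}

fs_grow_cmd xfs  /dev/xvda1     /   # the first instance in this post
fs_grow_cmd ext4 /dev/nvme0n1p1 /   # the second instance in this post
```

The file system type is the Type column of df -hT; findmnt -n -o FSTYPE / should report the same value on systems that ship util-linux.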




-----------------------------------------------------------------------------------------------------------------------------
At some point, when I ran out of disk space, the editor lock files (.swp) of all the scripts that were open at that time got corrupted. So the error below is probably not something an ordinary user will find useful.

I was still getting an ERROR when I wanted to edit the file:

 [ Error reading lock file ./.run_edited.sh.swp: Not enough data read ]


It looks like the lock file for my run.sh got corrupted (the .swp files are 0 bytes in the listings below), so I had to manually delete .run.sh.swp, .run_tdnn.sh.swp and .train.py.swp.

ubuntu@ip-172-31-6-144:~/kaldi/egs/librispeech/s5$ ls -alh
total 18M
drwxrwxr-x  7 ubuntu ubuntu 4.0K Jan  1 08:49 .
drwxrwxr-x  3 ubuntu ubuntu 4.0K Dec 28 09:00 ..
-rw-rw-r--  1 ubuntu ubuntu    0 Jan  1 03:09 .run.sh.swp
-rw-rw-r--  1 ubuntu ubuntu    0 Jan  1 03:53 .run_edited.sh.swp
-rw-rw-r--  1 ubuntu ubuntu  71K Dec 28 09:00 RESULTS
-rw-rw-r--  1 ubuntu ubuntu 1.1K Dec 28 10:54 cmd.sh
drwxrwxr-x  2 ubuntu ubuntu 4.0K Dec 28 09:00 conf
drwxrwxr-x 34 ubuntu ubuntu 4.0K Dec 31 02:29 data
drwxrwxr-x 22 ubuntu ubuntu 4.0K Dec 31 02:29 exp
drwxrwxr-x 11 ubuntu ubuntu 4.0K Dec 28 09:00 local
drwxrwxr-x  2 ubuntu ubuntu  36K Dec 28 17:20 mfcc
-rw-rw-r--  1 ubuntu ubuntu 2.4M Dec 28 10:23 output.txt
-rw-rw-r--  1 ubuntu ubuntu  16M Dec 31 19:20 output_stage5.txt
-rw-rw-r--  1 ubuntu ubuntu  25K Dec 28 10:47 output_stage_1_to_4.txt
-rwxrwxr-x  1 ubuntu ubuntu 1.1K Dec 28 09:00 path.sh
lrwxrwxrwx  1 ubuntu ubuntu   23 Dec 28 09:00 rnnlm -> ../../../scripts/rnnlm/
-rwxrwxr-x  1 ubuntu ubuntu  13K Dec 28 09:00 run.sh
-rwxrwxr-x  1 ubuntu ubuntu  13K Dec 28 10:54 run_edited.sh
lrwxrwxrwx  1 ubuntu ubuntu   18 Dec 28 09:00 steps -> ../../wsj/s5/steps
lrwxrwxrwx  1 ubuntu ubuntu   18 Dec 28 09:00 utils -> ../../wsj/s5/utils
ubuntu@ip-172-31-6-144:~/kaldi/egs/librispeech/s5$ cp run.sh.swp run.sh
cp: cannot stat 'run.sh.swp': No such file or directory
ubuntu@ip-172-31-6-144:~/kaldi/egs/librispeech/s5$ ls




ubuntu@ip-172-31-6-144:~/kaldi/egs/librispeech/s5/local/chain$ ls -alh
total 56K
drwxrwxr-x  3 ubuntu ubuntu 4.0K Jan  1 08:55 .
drwxrwxr-x 11 ubuntu ubuntu 4.0K Dec 28 09:00 ..
-rw-rw-r--  1 ubuntu ubuntu    0 Jan  1 03:09 .run_tdnn.sh.swp
-rwxrwxr-x  1 ubuntu ubuntu 4.5K Dec 28 09:00 compare_wer.sh
-rwxrwxr-x  1 ubuntu ubuntu 2.9K Dec 28 09:00 run_chain_common.sh
lrwxrwxrwx  1 ubuntu ubuntu   25 Dec 28 09:00 run_cnn_tdnn.sh -> tuning/run_cnn_tdnn_1a.sh
lrwxrwxrwx  1 ubuntu ubuntu   21 Dec 28 09:00 run_tdnn.sh -> tuning/run_tdnn_1d.sh
-rwxrwxr-x  1 ubuntu ubuntu  17K Jan  1 08:55 run_tdnn2.sh
-rwxrwxr-x  1 ubuntu ubuntu 8.7K Dec 28 09:00 run_tdnn_discriminative.sh
lrwxrwxrwx  1 ubuntu ubuntu   26 Dec 28 09:00 run_tdnn_lstm.sh -> tuning/run_tdnn_lstm_1b.sh
drwxrwxr-x  2 ubuntu ubuntu 4.0K Dec 28 09:00 tuning
ubuntu@ip-172-31-6-144:~/kaldi/egs/librispeech/s5/local/chain$



ubuntu@ip-172-31-6-144:~/kaldi/egs/librispeech/s5/steps/nnet3/chain$ ls -alh
total 184K
drwxrwxr-x 4 ubuntu ubuntu 4.0K Jan  1 03:10 .
drwxrwxr-x 9 ubuntu ubuntu 4.0K Dec 28 09:00 ..
-rw-rw-r-- 1 ubuntu ubuntu    0 Jan  1 03:10 .train.py.swp
-rwxrwxr-x 1 ubuntu ubuntu 8.7K Dec 28 09:00 build_tree.sh
-rwxrwxr-x 1 ubuntu ubuntu  12K Dec 28 09:00 build_tree_multiple_sources.sh
drwxrwxr-x 2 ubuntu ubuntu 4.0K Dec 28 09:00 e2e
-rwxrwxr-x 1 ubuntu ubuntu 1.9K Dec 28 09:00 gen_topo.pl
-rwxrwxr-x 1 ubuntu ubuntu 2.4K Dec 28 09:00 gen_topo.py
-rwxrwxr-x 1 ubuntu ubuntu 2.7K Dec 28 09:00 gen_topo2.py
-rwxrwxr-x 1 ubuntu ubuntu 1.9K Dec 28 09:00 gen_topo3.py
-rwxrwxr-x 1 ubuntu ubuntu 2.2K Dec 28 09:00 gen_topo4.py
-rwxrwxr-x 1 ubuntu ubuntu 2.4K Dec 28 09:00 gen_topo5.py
-rwxrwxr-x 1 ubuntu ubuntu 2.6K Dec 28 09:00 gen_topo_orig.py
-rwxrwxr-x 1 ubuntu ubuntu  27K Dec 28 09:00 get_egs.sh
-rwxrwxr-x 1 ubuntu ubuntu 3.8K Dec 28 09:00 get_model_context.sh
-rwxrwxr-x 1 ubuntu ubuntu  11K Dec 28 09:00 get_phone_post.sh
-rwxrwxr-x 1 ubuntu ubuntu 6.0K Dec 28 09:00 make_weighted_den_fst.sh
drwxrwxr-x 2 ubuntu ubuntu 4.0K Dec 28 09:00 multilingual
-rwxrwxr-x 1 ubuntu ubuntu  31K Dec 28 09:00 train.py
-rwxrwxr-x 1 ubuntu ubuntu  29K Dec 28 09:00 train_tdnn.sh
ubuntu@ip-172-31-6-144:~/kaldi/egs/librispeech/s5/steps/nnet3/chain$
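
I hunted the empty .swp files down directory by directory, as the listings above show. A sketch of how the same search could be done in one pass is below. The function name list_empty_swp is made up for illustration, and the sketch only lists files, it does not delete anything.

```shell
#!/bin/sh
# List zero-byte editor lock files (hidden files ending in .swp) under a
# directory. Defaults to the current directory when no argument is given.
list_empty_swp() {
  find "${1:-.}" -type f -name '.*.swp' -size 0
}

# Hypothetical usage from the recipe directory:
# list_empty_swp ~/kaldi/egs/librispeech/s5
# After reviewing the list, the same find expression can remove the files:
# find ~/kaldi/egs/librispeech/s5 -type f -name '.*.swp' -size 0 -exec rm -- {} +
```

Restricting the match to zero-byte files leaves alone any .swp file that still holds data from a live editing session.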

