What is HCLG.fst?


Trying to understand what is needed to build the HCLG.fst file, the decoding graph used by Kaldi.


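H = the HMM structure, maps transition-ids (HMM states) to context-dependent phones

C = the phone context transducer, maps context-dependent phones (triphones) to context-independent phones (monophones)

L = the lexicon, maps phone sequences to words

G = the grammar or language model, an acceptor over word sequences, usually compiled from an ARPA n-gram model.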

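So, conceptually, H o C o L o G = HCLG.fst, where "o" is FST composition. In practice the build also determinizes and minimizes between the compositions.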
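
To make the "o" concrete, here is a minimal sketch of FST composition in plain Python, applied to a toy L and G only. It is not how Kaldi or OpenFst implement it: the Fst class, the compose and transduce helpers, and the two-word lexicon are all made up for illustration, weights are left out, and the second transducer is assumed to have no epsilon input labels.

```python
from collections import deque

EPS = "<eps>"  # epsilon label: consumes or emits nothing


class Fst:
    """Tiny unweighted transducer; arcs are (src, ilabel, olabel, dst)."""

    def __init__(self, start, finals, arcs):
        self.start = start
        self.finals = set(finals)
        self.arcs = {}
        for src, ilabel, olabel, dst in arcs:
            self.arcs.setdefault(src, []).append((ilabel, olabel, dst))


def compose(a, b):
    """Compose a o b, assuming b has no epsilon input labels."""
    start = (a.start, b.start)
    arcs, finals = [], set()
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        sa, sb = state
        if sa in a.finals and sb in b.finals:
            finals.add(state)
        for il, ol, da in a.arcs.get(sa, []):
            if ol == EPS:
                # a emits nothing, so only a moves; b stays where it is
                nexts = [(il, EPS, (da, sb))]
            else:
                # a's output label must match b's input label
                nexts = [(il, ol2, (da, db))
                         for il2, ol2, db in b.arcs.get(sb, []) if il2 == ol]
            for i, o, dst in nexts:
                arcs.append((state, i, o, dst))
                if dst not in seen:
                    seen.add(dst)
                    queue.append(dst)
    return Fst(start, finals, arcs)


def transduce(fst, inputs):
    """All output sequences for an input sequence (no input epsilons)."""
    results, stack = set(), [(fst.start, 0, ())]
    while stack:
        state, pos, out = stack.pop()
        if pos == len(inputs) and state in fst.finals:
            results.add(out)
        if pos < len(inputs):
            for il, ol, dst in fst.arcs.get(state, []):
                if il == inputs[pos]:
                    new_out = out if ol == EPS else out + (ol,)
                    stack.append((dst, pos + 1, new_out))
    return results


# Toy lexicon L: "hi" = hh ay, "a" = ah (phones in, words out)
L = Fst(0, [0], [(0, "hh", "hi", 1), (1, "ay", EPS, 0), (0, "ah", "a", 0)])
# Toy grammar G: accepts only the word sequence "hi a"
G = Fst(0, [2], [(0, "hi", "hi", 1), (1, "a", "a", 2)])

LG = compose(L, G)
print(transduce(LG, ["hh", "ay", "ah"]))  # {('hi', 'a')}
print(transduce(LG, ["ah", "hh", "ay"]))  # set(): "a hi" is not in G
```

The composed LG maps phone sequences straight to word sequences that G allows. H and C extend the same chain further down, to context-dependent phones and then to HMM states.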

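How do we get H for HCLG.fst?

The information for it originates in the topology file, data/lang_chain/topo or data/lang_chain_sp/topo, and reaches the graph through the trained model and its decision tree.

H is not stored as an FST in the lang directory; it is generated while the graph is being built.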

How do we get C for HCLG.fst?

C is also not stored as a physical FST.

Instead, the program fstcomposecontext composes with it programmatically, generating the context transducer on the fly.

How do we get L for HCLG.fst?

L.fst can be found in data/lang or data/lang_nosp (the "nosp" variant is built without silence probabilities).

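How do we get G for HCLG.fst?

The graph is built from G.fst, which sits in a test lang directory (data/lang_test_tgsmall in the LibriSpeech recipe, for example).

G.carpa, found in data/lang_test_tglarge or data/lang_nosp_test_fglarge, is the const-ARPA form of a larger language model. It is used for lattice rescoring, not for building HCLG.fst.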
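
The inputs listed above can be checked before attempting a build. This is a minimal sketch: the helper name missing_inputs and its default file list are hypothetical, it assumes a single lang directory holds L.fst, G.fst and topo, and it assumes G.fst is the grammar file the graph build reads.

```python
from pathlib import Path


def missing_inputs(lang_dir, names=("L.fst", "G.fst", "topo")):
    """Return the expected files that are absent from lang_dir.

    The default names are an assumption for illustration; adjust them
    to whatever your recipe's graph-building script actually requires.
    """
    root = Path(lang_dir)
    return [name for name in names if not (root / name).is_file()]
```

An empty list means every expected file is present. A non-empty list names what still has to be prepared.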



monophone - a phone modeled without any context

triphone - a phone modeled together with its left and right neighbours, i.e. left-context/phone/right-context
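
The left-context/phone/right-context idea can be illustrated with a small sketch that expands a monophone sequence into triphone labels. The function name to_triphones, the "left-phone+right" label format and the "<sil>" boundary symbol are assumptions chosen for readability, not Kaldi's internal representation.

```python
def to_triphones(phones, boundary="<sil>"):
    """Expand a monophone sequence into left-phone+right labels.

    The utterance edges are padded with a boundary symbol so that the
    first and last phones also get a left and right context.
    """
    padded = [boundary] + list(phones) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


print(to_triphones(["k", "ae", "t"]))
# ['<sil>-k+ae', 'k-ae+t', 'ae-t+<sil>']
```

There is still one label per phone; only the amount of context attached to each label changes.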


"In linguistics, a triphone is a sequence of three consecutive phonemes. Triphones are useful in models of natural language processing where they are used to establish the various contexts in which a phoneme can occur in a particular natural language." [Wikipedia, https://en.wikipedia.org/wiki/Triphone]
