SYNOPSIS
drecog [ -opts] params features_list
drecog [ -opts] params fsm_list features_list
DESCRIPTION
This command inputs speech acoustic features and outputs utterance
hypotheses based on the models used as components of the recognizer.
The acoustic model (see dmodel(3)) is used together with other informa-
tion sources -- grammar, pronunciation dictionary, phonemic context-
dependency specification -- to generate an overall set of transcription
hypotheses with corresponding probabilities. In this recognizer, the
grammars, lexicons, context-dependency specification and recognition
hypotheses are all represented as finite-state acceptors and transduc-
ers (see fsmintro(1)).
The two-argument form of drecog is for first-pass recognition. In this
mode, the recognition transducer (often called the recognition ''net-
work''), which combines the grammar, lexicon and context-dependency
specification, is the same for all utterances in an invocation. The
first argument is the params file which contains parameter-value pairs,
one per line, separated by white space. The legal parameters (some des-
ignated required and others optional) are described below. The second
argument, features_list, is a list of acoustic feature (e.g., cepstral)
files. Each line of this list should be an acoustic feature file cor-
responding to the format specified by the -i option described below.
These must be 'full' feature files; no further transformations (e.g.,
cepstral 'delta' computations) are done on these files by drecog. By
using the -F option described below, the features can instead be gener-
ated and returned on a UNIX pipe.
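As an illustration of the params file layout described above, the
following Python sketch parses parameter-value pairs, one per line,
separated by white space. The helper parse_params, the file names
(am.model, C.fsm, L.fsm, G.fsm) and the beam value are hypothetical,
and skipping blank lines is an assumption; drecog's own parser may
behave differently.

```python
def parse_params(text):
    """Parse parameter-value pairs, one per line, separated by white space.

    Returns a dict mapping each parameter name to its list of value
    strings, since some parameters (e.g. fsms) take several values.
    """
    params = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        params[fields[0]] = fields[1:]
    return params


# Hypothetical params file contents; all file names are made up.
example = """\
model am.model
fsms C.fsm L.fsm G.fsm
beam 12.0
response_type lattice
"""

params = parse_params(example)
```

Values are kept as strings here; converting them to int, float or bool
according to the parameter descriptions below is left out of the sketch.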
The three-argument form of drecog is for segmentation or second-pass
(''rescoring'') recognition. In this mode, the recognition transducer
is changed for each utterance as determined by a list of transducers
passed on the command line (see fsms parameter below). The first argu-
ment, params, is as before. The second argument, fsm_list, is a list
of FSM (see fsm(1)) or FSM archive (see far(1)) files. By using the -L
option described below, the FSM/FARs can instead be generated and
returned on a UNIX pipe. The third argument, features_list, is the
same as the second argument in first-pass mode. It is the user's
choice how many utterances are stored per file in each line of the
FSM/FAR and acoustic features file lists. By default, these two list
lengths needn't even match; the only requirement is that the number of
distinct FSMs in the FSM/FAR files match the total number of utterances
in the acoustic feature files. The alignment between these two lists
can be made more or less stringent with the -a option below.
The recognition transducer, once constructed according to the fsms
parameter, will have input labels that correspond to acoustic model
HMMs or HMM states (see the model_level parameter below).
OPTION ACTION
-a type alignment requirement with "type" one of: "file"
or "utt" (default)
The alignment type "utt" requires only that the number of distinct
FSMs in the FSM/FAR files match the total number of utterances in
the feature files. The alignment type "file" further requires that
the number of files in the FSM/FAR list match the number of files
in the acoustic features list. Using this option, when valid, gives
somewhat clearer error messages when the data are misaligned.
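The two alignment types can be sketched as a simple count check. This
is only an illustration in Python; the function check_alignment and
its list-of-counts inputs are hypothetical and do not reflect how
drecog reads its lists.

```python
def check_alignment(fsms_per_file, utts_per_file, align="utt"):
    """Return True if the two lists satisfy the given alignment type.

    fsms_per_file: number of distinct FSMs in each FSM/FAR file.
    utts_per_file: number of utterances in each acoustic feature file.
    """
    if align not in ("utt", "file"):
        raise ValueError("align must be 'utt' or 'file'")
    # Both types require the total FSM and utterance counts to match.
    if sum(fsms_per_file) != sum(utts_per_file):
        return False
    # "file" further requires the same number of files in each list.
    if align == "file" and len(fsms_per_file) != len(utts_per_file):
        return False
    return True
```

For example, one FAR holding three FSMs paired with three
single-utterance feature files passes the "utt" check but fails the
"file" check.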
-c
Normally a single FSM is output for each utterance. In this mode,
the response to an utterance may be cut into portions and output as
multiple FSMs. If the cut_label parameter is unspecified, output
occurs as soon as it is certain. If the cut_label parameter is
specified, a partial response occurs only when all final output
labels also match that label. Currently, this option requires
that response_type=onebest.
-i type acoustic features format with "type" one of: "blasr",
"raw", or "ssw" (default)
These must be 'full' acoustic feature sets (e.g., 'delta' cepstra
must be precomputed if used). But see -F option below.
-v verbose
In verbose mode, the following per-utterance information is output
to standard error: utt: utterance number, recog: recognition suc-
cessful (i.e., non-empty output)?, final: a path reached a recogni-
tion transducer-final state?, nframes: number of frames, nres: num-
ber of response FSMs, time: recognition time (secs), narcs: number
of active arcs per frame, cost: best-scoring path cost. After all
utterances, the recognition run timing information is output.
-F command acoustic features filter (default: none)
This option customizes the reading of acoustic features: one string
at a time is read from the features_list and is passed to the UNIX
command as arguments. The result of this command applied to this
string must be an acoustic feature file corresponding to the format
specified by the -i option.
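The filter mechanism can be pictured as follows: each string read from
the list is appended to the command as arguments, and the command's
standard output is taken as the file contents. This Python sketch is
an assumption about the general pattern, not drecog's implementation;
the function run_filter is hypothetical, splitting the string with
shlex is a guess at how arguments are formed, and the demo command
(the Python interpreter echoing its argument) merely stands in for a
real feature-generating command.

```python
import shlex
import subprocess
import sys


def run_filter(command, entry):
    """Run `command` with `entry` as its arguments and return its stdout."""
    # Assumption: both the command and the list entry are split shell-style.
    argv = shlex.split(command) + shlex.split(entry)
    result = subprocess.run(argv, check=True, stdout=subprocess.PIPE)
    return result.stdout


# Stand-in filter: echoes its first argument instead of producing features.
demo_filter = (shlex.quote(sys.executable)
               + ' -c "import sys; sys.stdout.write(sys.argv[1])"')
data = run_filter(demo_filter, "utt001.feat")
```

A real filter would write an acoustic feature file in the format
selected by -i to its standard output.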
-L command FSM/FAR filter (default: none)
This option customizes the reading of FSM/FARs: one string at a
time is read from the fsm_list and is passed to the UNIX command
as arguments. The result of this command applied to this string
must be a binary FSM or FSM archive file (see fsm(1), far(1)).
PARAMETER ACTION
arcs_max int (optional, default: INT_MAX)
During recognition, prune active arcs at the current frame so at
most the best arcs_max arcs are expanded in the next frame.
beam float
During recognition, prune active states at the current frame with
cost greater than beam relative to the best path so far.
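The two pruning parameters arcs_max and beam can be illustrated with a
small Python sketch. It assumes that lower cost is better and, as a
simplification, measures the beam against the best cost in the current
frame; the functions prune_states and prune_arcs and the dictionary
representation are hypothetical and say nothing about the recognizer's
actual data structures.

```python
import heapq


def prune_states(state_costs, beam):
    """Keep states whose cost is within `beam` of the best (lowest) cost."""
    if not state_costs:
        return {}
    best = min(state_costs.values())
    return {s: c for s, c in state_costs.items() if c - best <= beam}


def prune_arcs(arc_costs, arcs_max):
    """Keep at most the `arcs_max` lowest-cost arcs."""
    kept = heapq.nsmallest(arcs_max, arc_costs.items(), key=lambda kv: kv[1])
    return dict(kept)


frame = {"s1": 1.0, "s2": 5.0, "s3": 12.5}
survivors = prune_states(frame, beam=10.0)  # s3 is 11.5 above the best cost
```

A smaller beam or arcs_max prunes more aggressively, trading accuracy
for speed.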
cut_label int (optional, default: none)
Output a partial result only when all final output labels match
this label.
dur_mult float (optional, default: 0)
Duration cost multiplier (see gram_mult for further details).
eps_prune bool (optional, default: true)
During recognition, prune epsilon arcs as well.
final_state_mode ignore|prefer|require (optional, default: pre-
fer)
We describe this flag's operation when response_type=lattice; when
response_type=onebest, the lattice is simply reduced to its best
scoring path. If this flag is set to require, then only those
hypothesized paths that reach final states in the recognition
transducer are output, with the lattice final state costs set to the
corresponding recognition transducer final state costs. If this flag
is set to ignore, then all unpruned hypothesized paths, regardless of
whether they reach a recognition transducer final state, are output
with the final state costs in the lattice set to zero. When this flag
is set to prefer (the default), the output lattice is as if require
were set when some hypothesized path reaches a final state, and as if
ignore were set otherwise.
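The three settings of final_state_mode can be summarized in a short
Python sketch. Paths are reduced to (path_cost, final_cost) pairs,
with final_cost set to None when the path does not end in a final
state of the recognition transducer; this representation and the
function select_paths are hypothetical.

```python
def select_paths(paths, mode="prefer"):
    """Return (path_cost, lattice_final_cost) pairs kept under `mode`."""
    if mode not in ("ignore", "prefer", "require"):
        raise ValueError("mode must be ignore, prefer or require")
    finals = [(cost, final) for cost, final in paths if final is not None]
    if mode == "require" or (mode == "prefer" and finals):
        # Only paths reaching a final state, with the transducer's final costs.
        return finals
    # ignore, or prefer with no final path: keep everything, zero final cost.
    return [(cost, 0.0) for cost, _ in paths]


paths = [(3.0, 0.5), (2.0, None)]
```

With these paths, require and prefer keep only the first path, while
ignore keeps both with final cost zero.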
fsms FSM_filename1 FSM_filename2 ... (optional)
This specifies the recognition transducer (''network''). First,
assume model_fsm is false (the default). Then, for each utterance
in first-pass mode (the two-argument form of drecog), these FSMs
are composed on-the-fly in the order given to form the recognition
transducer (see fsm(3)). In this mode, this parameter is required.
For each utterance in segmentation/second-pass mode (the
three-argument form of drecog), these FSMs along with the next FSM from
the fsm_list command-line argument are composed on-the-fly in the
order given with the fsm_list FSM coming last. In this mode, this
parameter is optional. When omitted, the fsm_list FSM is used as
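the recognition transducer. When model_fsm is true, an HMM
state-to-model FSM is internally generated and prepended to these FSMs
(see model_fsm below). [...] (including the singleton case) should be
input-indexed (see fsmintro(1)).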
full_lattice bool (optional, default: false)
By default, lattice generation and duration modeling use a very
efficient and reliable approximation. When this flag is set to
true, this approximation is not used (see Ljolje, Pereira, and
Riley, ''Efficient General Lattice Generation and Rescoring'',
Eurospeech '99, Budapest, Hungary).
gc_period int (optional, default: 50)
Number of frames between the garbage collection of the partial
recognition lattice/parent pointers and cached active-state infor-
mation. Reducing this quantity can decrease memory usage but will
increase running time.
gram_mult float (optional, default: 1.0)
Grammar cost multiplier. The total recognition cost is computed as
total_recognition_cost = acoustic_cost / gram_mult
+ dur_cost * dur_mult / gram_mult + gram_cost.
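The cost combination can be written out directly. The Python function
below simply transcribes the formula, using the documented defaults
for gram_mult and dur_mult; the function itself and the numeric values
in the example are illustrative only.

```python
def total_recognition_cost(acoustic_cost, gram_cost, dur_cost=0.0,
                           gram_mult=1.0, dur_mult=0.0):
    """Combine acoustic, duration and grammar costs as in the formula above."""
    return (acoustic_cost / gram_mult
            + dur_cost * dur_mult / gram_mult
            + gram_cost)


# With the defaults, the duration cost is ignored (dur_mult is 0).
default_cost = total_recognition_cost(100.0, 5.0, dur_cost=4.0)

# A larger gram_mult shrinks the acoustic and duration terms relative
# to the grammar cost: 100/10 + 4*5/10 + 5.
scaled_cost = total_recognition_cost(100.0, 5.0, dur_cost=4.0,
                                     gram_mult=10.0, dur_mult=5.0)
```

Note that gram_mult divides the acoustic and duration terms rather
than multiplying the grammar term, so the relative weighting is the
same up to an overall scale.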
gram_prune bool (optional, default: true)
During recognition, prune the activation of an arc with the grammar
cost.
lattice_beam float (optional, default: beam)
Prune lattice paths with cost greater than lattice_beam relative to
the best path in the lattice. This threshold is on complete paths
unlike the recognizer beam.
model file (required)
The acoustic model file (see dmodel(3)).
model_fsm bool (optional, default: false)
By default, the recognition transducer has input labels corresponding
to those of the first FSM in the fsms parameter (or fsm_list if fsms is
unspecified). When model_fsm is true, an HMM state-to-model FSM is
internally generated from the acoustic model, and the recognizer behaves
precisely as if the pathname of a file representation of that FSM had
been prepended to the fsms parameter and the model_level parameter had
been set to state.
model_level state|hmm (optional, default: hmm)
By default, the input labels of the recognition transducer reference
acoustic model HMM IDs. If model_level is set to state, the input
labels of the recognition transducer reference acoustic model HMM state
IDs.
norm_costs bool (optional, default: false)
If a very long segment of speech (typically more than one hour at
100 frames/sec) is presented as a single utterance to the recog-
nizer, then significant errors may occur due to lack of floating-
point precision for the large accumulated path costs. If this flag
is set to true, the cost of the best scoring path at each frame is
subtracted from each path's cost, ensuring that no such problems
arise. This flag should not be set, of course, if the user wishes
to preserve the original path costs.
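The normalization performed by norm_costs can be sketched in Python as
follows. The function normalize_frame and the list-of-costs input are
hypothetical; the point is only that subtracting the frame's best cost
keeps the values small while preserving the differences between paths.

```python
def normalize_frame(path_costs):
    """Subtract the best (lowest) path cost in a frame from every path cost."""
    best = min(path_costs)
    return [cost - best for cost in path_costs]


# Large accumulated costs become small offsets from the best path.
normalized = normalize_frame([1000003.0, 1000001.0, 1000007.5])
```

The best path always ends up with cost zero, which is why the original
path costs are not recoverable from the output when this flag is set.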
pin_bounds int (optional, default: INT_MAX)
In general, each output FSM state is a pair (time, network_state).
If this parameter is set to an integer delta (less than INT_MAX),
each output state's time frame must be within delta frames of the
FSM state potential (see fsmaccess(3)) of the corresponding recog-
nition transducer state. Thus, by setting the recognition trans-
ducer state potentials appropriately (typically in segmenta-
tion/rescoring mode), the user can limit the search.
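The pin_bounds constraint amounts to a window test on each output
state. The Python sketch below assumes the bound is inclusive, which
the description does not state explicitly; the function
within_pin_bounds is hypothetical.

```python
def within_pin_bounds(time_frame, potential, delta):
    """True if an output state's frame lies within delta frames of the
    potential of the corresponding recognition transducer state."""
    # Assumption: the bound is inclusive on both sides.
    return abs(time_frame - potential) <= delta
```

For instance, with a state potential of 100 and delta set to 10, only
output states at frames 90 through 110 would be allowed for that
state.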
response_type onebest|lattice (optional, default: onebest)
By default, only the single best scoring path is output. When lat-
tice is specified for this parameter, all paths within the lat-
tice_beam are output.
segment_level ilabel|olabel (optional, default: olabel)
This parameter determines the output segmentation level. If set to
ilabel, the output is a transducer whose input labels and output
labels match the corresponding input and output labels of the
recognition transducer. The FSM state potentials in the output (see
fsmaccess(3)) will be set to the corresponding frame numbers from
recognition. If this parameter is set to olabel, an acceptor is
output whose labels match the output labels of the recognition
transducer (the FSM state potentials are not easily interpreted in
this mode).
self_loop bool (optional, default: true)
By default, each HMM state is implicitly given a self loop by the
recognizer. When this parameter is set to false, these self loops
are not simulated (see dmodel(3)).
suppress_labels int1 int2 ... (optional, default: none)
These output labels will be converted to epsilon labels in the
recognition output. For example, this is a convenient way of remov-
ing silence ''words'' from the output.
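The effect of suppress_labels on an output label sequence can be
sketched in Python. The sketch assumes that the epsilon label is 0 and
that labels are plain integers; the function suppress and the label
values are hypothetical.

```python
EPSILON = 0  # assumption: label 0 denotes epsilon


def suppress(labels, suppress_labels):
    """Replace every label listed in suppress_labels with epsilon."""
    hidden = set(suppress_labels)
    return [EPSILON if label in hidden else label for label in labels]


# Hypothetical output where label 2 is a silence "word".
cleaned = suppress([5, 2, 7, 2], suppress_labels=[2])
```

The suppressed positions remain in the path as epsilon labels; they
are not deleted from it.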
SEE ALSO
dutils(1) DCD utility user programs.
dutils(3) DCD utility C++ routines.
fsmintro(1) Introduction to the FSM finite-state
machine library.
fsm(1) FSM user commands.
fsmaccess(3) FSM C accessors.
far(1) FSM archive user commands.
FILES
/n/lvr/linux/bin/dcd-2 Distribution binaries.
/n/lvr/linux/src/cmd/dcd/dcd-2 Distribution sources.
/n/lvr/linux/include/dcd-2 Distribution DCD include files.
/n/lvr/linux/lib/libdcd-2.a Distribution DCD library.
AUTHORS
Michael Riley (riley@research.att.com)
Copyright (C) 2003 AT&T Corp. All rights reserved.
Version 2.0 DRECOG(1)