SYNOPSIS
       drecog [ -opts] params features_list

       drecog [ -opts] params fsm_list features_list


DESCRIPTION
       This  command  inputs  speech  acoustic  features and outputs utterance
       hypotheses based on the models used as components  of  the  recognizer.
       The acoustic model (see dmodel(3)) is used together with other informa-
       tion sources -- grammar, pronunciation  dictionary,  phonemic  context-
       dependency specification -- to generate an overall set of transcription
       hypotheses with corresponding probabilities.  In this  recognizer,  the
       grammars,  lexicons,  context-dependency  specification and recognition
       hypotheses are all represented as finite-state acceptors and  transduc-
       ers (see fsmintro(1)).

       The two-argument form of drecog is for first-pass recognition.  In this
       mode, the recognition transducer (often called the  recognition  ''net-
       work''),  which  combines  the  grammar, lexicon and context-dependency
       specification, is the same for all utterances in  an  invocation.   The
       first argument is the params file which contains parameter-value pairs,
       one per line, separated by white space. The legal parameters (some des-
       ignated  required  and others optional) are described below. The second
       argument, features_list, is a list of acoustic feature (e.g., cepstral)
       files.   Each line of this list should be an acoustic feature file cor-
       responding to the format specified by the -i  option  described  below.
       These  must  be 'full' feature files; no further transformations (e.g.,
       cepstral 'delta' computations) are done on these files by  drecog.   By
       using the -F option described below, the features can instead be gener-
       ated and returned on a UNIX pipe.
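
       As an illustration of the params file format, the following Python
       sketch parses parameter-value pairs, one per line, separated by white
       space.  It is not part of drecog: the function name parse_params, the
       file names, and the values shown are hypothetical, and it assumes that
       a multi-valued parameter such as fsms lists all of its values on one
       line.

```python
def parse_params(text):
    """Parse 'name value ...' lines into a dict of name -> list of values."""
    params = {}
    for line in text.splitlines():
        fields = line.split()      # split on any run of white space
        if not fields:             # skip blank lines
            continue
        params[fields[0]] = fields[1:]
    return params


# Hypothetical params file contents (names of files are made up).
sample = """\
model      acoustic.model
fsms       C.fsm L.fsm G.fsm
beam       12.5
"""

params = parse_params(sample)
print(params["fsms"])
```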

       The three-argument form of drecog is for segmentation or second-pass
       (''rescoring'')  recognition.  In this mode, the recognition transducer
       is changed for each utterance as determined by a  list  of  transducers
       passed on the command line (see fsms parameter below).  The first argu-
       ment, params, is as before.  The second argument, fsm_list, is  a  list
       of FSM (see fsm(1)) or FSM archive (see far(1)) files.  By using the -L
       option described below, the  FSM/FARs  can  instead  be  generated  and
       returned  on  a  UNIX  pipe.  The third argument, features_list, is the
       same as the second argument in  first-pass  mode.   It  is  the  user's
       choice  how  many  utterances  are  stored per file in each line of the
       FSM/FAR and acoustic features file lists.  By default, these  two  list
       lengths  needn't even match; the only requirement is that the number of
       distinct FSMs in the FSM/FAR files match the total number of utterances
       in  the  acoustic feature files.  The alignment between these two lists
       can be made more or less stringent with the -a option below.

       The recognition transducer, once constructed according to the fsms
       parameter, will have input labels that correspond to acoustic model

       OPTION         ACTION

       -a type        type of alignment requirement with "type" one of: "file"
                      or "utt" (default)

           The alignment type "utt" requires only that the number of  distinct
           FSMs  in  the FSM/FAR files match the total number of utterances in
           the feature files.  The alignment type "file" further requires that
           the  number  of files in the FSM/FAR list match the number of files
           in the acoustic features list. Using this option, when valid, gives
           somewhat clearer error messages when the data are misaligned.
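
           The two alignment types can be summarized by the following Python
           sketch.  It only restates the rule above and is not drecog's
           implementation; the function name check_alignment and the
           per-file count lists are assumptions made for illustration.

```python
def check_alignment(fsm_counts, utt_counts, align="utt"):
    """Check list alignment.

    fsm_counts[i] is the number of FSMs in the i-th FSM/FAR file;
    utt_counts[j] is the number of utterances in the j-th feature file.
    """
    # Both alignment types require matching totals.
    if sum(fsm_counts) != sum(utt_counts):
        return False
    # "file" additionally requires the same number of files in each list.
    if align == "file" and len(fsm_counts) != len(utt_counts):
        return False
    return True


# Five FSMs in two files against five utterances in one feature file.
print(check_alignment([3, 2], [5], align="utt"))
print(check_alignment([3, 2], [5], align="file"))
```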

       -c

           Normally a single FSM is output for each utterance.  In this mode,
           the response to an utterance may be cut into portions and output as
           multiple FSMs.  If the cut_label parameter is unspecified, output
           occurs as soon as it is certain.  If the cut_label parameter is
           specified, a partial response occurs only when all final output
           labels match that label as well.  Currently, this option requires
           that response_type=onebest.

       -i type        acoustic  features  format  with "type" one of: "blasr",
                      "raw", or "ssw" (default)

           These must be 'full' acoustic feature sets (e.g.,  'delta'  cepstra
           must be precomputed if used). But see -F option below.

       -v             verbose

           In  verbose mode, the following per utterance information is output
           to standard error: utt: utterance number, recog:  recognition  suc-
           cessful (i.e., non-empty output)?, final: a path reached a recogni-
           tion transducer-final state?, nframes: number of frames, nres: num-
           ber  of response FSMs, time: recognition time (secs), narcs: number
           of active arcs per frame, cost: best-scoring path cost.  After  all
           utterances, the recognition run timing information is output.

       -F command     acoustic features filter    (default: none)

           This option customizes the reading of acoustic features: one string
           at a time is read from the features_list and is passed to the  UNIX
           command  as  arguments.  The result of this command applied to this
           string must be an acoustic feature file corresponding to the format
           specified by the -i option.
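
           As a rough illustration, the Python sketch below forms the command
           line for one features_list entry.  How drecog itself invokes the
           command is not specified here; the function name
           filter_command_line, the command make_features, its -rate flag,
           and the entry utt0001.wav are all hypothetical, and splitting the
           entry on white space into several arguments is an assumption.

```python
import shlex


def filter_command_line(command, list_entry):
    """Append one features_list entry to the filter command as arguments."""
    return shlex.split(command) + shlex.split(list_entry)


# Hypothetical filter command and list entry.
argv = filter_command_line("make_features -rate 100", "utt0001.wav")
print(argv)
```

           One way to run such a command and collect its output from a pipe
           would be subprocess.run(argv, stdout=subprocess.PIPE).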

       -L command     FSM/FAR filter    (default: none)

           This option customizes the reading of FSM/FARs: one string at a
           time is read from the fsm_list and is passed to the UNIX command
           as arguments.  The result of this command applied to this string
           must be a binary FSM or FSM archive file (see fsm(1), far(1)).

       arcs_max              int    (default: INT_MAX, optional)

           During recognition, prune active arcs at the current  frame  so  at
           most the best arcs_max arcs are expanded in the next frame.

       beam                  float

           During  recognition,  prune active states at the current frame with
           cost greater than beam relative to the best path so far.
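
           The effect of beam and arcs_max at a single frame can be sketched
           in Python as follows.  This is a simplified illustration, not the
           recognizer's search code: it treats active states and active arcs
           alike as generic hypotheses with accumulated costs (lower is
           better), and the function name prune is an assumption.

```python
import heapq


def prune(active, beam, arcs_max=None):
    """Prune one frame's hypotheses.

    active maps a hypothesis id to its accumulated cost.
    """
    if not active:
        return {}
    best = min(active.values())
    # Beam pruning: drop anything costing more than beam above the best.
    kept = {h: c for h, c in active.items() if c <= best + beam}
    # Count pruning: keep at most the arcs_max cheapest survivors.
    if arcs_max is not None and len(kept) > arcs_max:
        kept = dict(heapq.nsmallest(arcs_max, kept.items(),
                                    key=lambda item: item[1]))
    return kept


frame = {"a": 10.0, "b": 14.0, "c": 25.0}
print(prune(frame, beam=5.0))
print(prune(frame, beam=5.0, arcs_max=1))
```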

       cut_label             int   (optional, default: none)

           Output a partial result only when all  final  output  labels  match
           this label.

       dur_mult              float    (optional, default: 0)

           Duration cost multiplier (see gram_mult for further details).

       eps_prune             bool    (optional, default: true)

           During recognition, prune epsilon arcs as well.

       final_state_mode      ignore|prefer|require    (optional, default: prefer)

           We describe this flag's operation when response_type=lattice; the
           operation when response_type=onebest is simply to reduce that
           lattice to the best-scoring path.  If this flag is set to require,
           only those hypothesized paths that reach final states in the
           recognition transducer are output, with the lattice final state
           costs set to the corresponding recognition transducer final state
           costs.  If this flag is set to ignore, all unpruned hypothesized
           paths, regardless of whether they reach a recognition transducer
           final state, are output, with the final state costs in the lattice
           set to zero.  If this flag is set to prefer (the default), the
           output lattice is as if require were set when some hypothesized
           path reaches a final state, and as if ignore were set otherwise.
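
           The three modes can be restated as the following Python sketch.
           It models each hypothesized path as a tuple rather than as part of
           a lattice, so it is only an illustration; the function name
           select_paths and the tuple layout are assumptions.

```python
def select_paths(paths, mode="prefer"):
    """Choose output paths and their final costs.

    paths is a list of (labels, reached_final, transducer_final_cost).
    Returns a list of (labels, final_cost_in_output).
    """
    # Paths ending in a transducer-final state keep that state's cost.
    finals = [(labels, cost) for labels, is_final, cost in paths if is_final]
    # With final states ignored, every path gets a zero final cost.
    everything = [(labels, 0.0) for labels, _, _ in paths]
    if mode == "require":
        return finals
    if mode == "ignore":
        return everything
    if mode == "prefer":
        return finals if finals else everything
    raise ValueError("unknown final_state_mode: " + mode)


hyps = [("a b", True, 1.5), ("a c", False, 0.0)]
print(select_paths(hyps, "prefer"))
```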

       fsms                  FSM_filename1 FSM_filename2 ...    (optional)

           This specifies the recognition transducer (''network'').  First,
           assume model_fsm is false (the default).  Then, for each utterance
           in first-pass mode (the two-argument form of drecog), these FSMs
           are composed on-the-fly in the order given to form the recognition
           transducer (see fsm(3)).  In this mode, this parameter is required.
           For each utterance in segmentation/second-pass mode (the
           three-argument form of drecog), these FSMs along with the next FSM
           from the fsm_list command-line argument are composed on-the-fly in
           the order given with the fsm_list FSM coming last.  In this mode,
           this parameter is optional.  When omitted, the fsm_list FSM is used as
           the recognition transducer.  When model_fsm is true, an HMM  state-
           ing  the singleton case) should be input-indexed (see fsmintro(1)).

       full_lattice          bool    (optional, default: false)

           By default, lattice generation and duration  modeling  use  a  very
           efficient  and  reliable  approximation.   When this flag is set to
           true, this approximation is not  used  (see  Ljolje,  Pereira,  and
           Riley,  ''Efficient  General  Lattice  Generation  and Rescoring'',
           Eurospeech '99, Budapest, Hungary.)

       gc_period             int    (optional, default: 50)

           Number of frames between the  garbage  collection  of  the  partial
           recognition  lattice/parent pointers and cached active-state infor-
           mation. Reducing this quantity can decrease memory usage  but  will
           increase running time.

       gram_mult             float    (optional, default: 1.0)

           Grammar cost multiplier. The total_recognition_cost = acoustic_cost
           / gram_mult + dur_cost * dur_mult / gram_mult + gram_cost.
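
           A worked instance of this formula, as a Python sketch.  The
           function name and the input costs are made up for illustration;
           the default arguments mirror the documented defaults of gram_mult
           (1.0) and dur_mult (0).

```python
def total_recognition_cost(acoustic_cost, gram_cost, dur_cost=0.0,
                           gram_mult=1.0, dur_mult=0.0):
    """Combine the three cost sources as in the formula above."""
    return (acoustic_cost / gram_mult
            + dur_cost * dur_mult / gram_mult
            + gram_cost)


# 200/10 + 10*2/10 + 30 = 20 + 2 + 30
print(total_recognition_cost(200.0, 30.0, dur_cost=10.0,
                             gram_mult=10.0, dur_mult=2.0))  # prints 52.0
```

           With the default multipliers the duration term vanishes and the
           total is simply acoustic_cost + gram_cost.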

       gram_prune            bool    (optional, default: true)

           During recognition, prune the activation of an arc with the grammar
           cost.

       lattice_beam          float    (optional, default: beam)

           Prune lattice paths with cost greater than lattice_beam relative to
           the best path in the lattice. This threshold is on  complete  paths
           unlike the recognizer beam.

       model                 file    (required)

           The acoustic model file (see dmodel(3))

       model_fsm             bool    (optional, default: false)

           By default, the recognition transducer has input labels
           corresponding to those of the first FSM in the fsms parameter (or
           fsm_list if unspecified).  When model_fsm is true, an HMM
           state-to-model FSM is internally generated from the acoustic model
           and the recognizer behaves precisely as if the pathname to a file
           representation of that FSM had been prepended to the fsms parameter
           and the model_level parameter were set to state.

       model_level           state|hmm    (optional, default: hmm)

           By default, the input labels of the recognition transducer
           reference acoustic model HMM IDs.  If model_level is set to state,
           the input labels of the recognition transducer reference acoustic
           model HMM state IDs.

       norm_costs            bool    (optional, default: false)

           If  a  very long segment of speech (typically more than one hour at
           100 frames/sec) is presented as a single utterance  to  the  recog-
           nizer,  then  significant errors may occur due to lack of floating-
           point precision for the large accumulated path costs.  If this flag
           is  set to true, the cost of the best scoring path at each frame is
           subtracted from each path's cost, ensuring that  no  such  problems
           arise.  This  flag should not be set, of course, if the user wishes
           to preserve the original path costs.
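
           The per-frame normalization described above amounts to the
           following Python sketch.  It is an illustration only; the function
           name normalize_frame and the state names are assumptions.  Cost
           differences between paths are preserved, but the absolute costs
           are not.

```python
def normalize_frame(costs):
    """Subtract the best (lowest) path cost at this frame from every path."""
    best = min(costs.values())
    return {state: cost - best for state, cost in costs.items()}


frame_costs = {"s1": 1000.5, "s2": 1003.0}
print(normalize_frame(frame_costs))
```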

       pin_bounds            int    (optional, default: INT_MAX)

           In general, each output FSM state is a pair (time,  network_state).
           If  this  parameter is set to an integer delta (less than INT_MAX),
           each output state's time frame must be within delta frames  of  the
           FSM  state potential (see fsmaccess(3)) of the corresponding recog-
           nition transducer state. Thus, by setting  the  recognition  trans-
           ducer   state  potentials  appropriately  (typically  in  segmenta-
           tion/rescoring mode), the user can limit the search.
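
           The constraint on each output state can be written as a simple
           test, sketched here in Python.  The function name
           within_pin_bounds is hypothetical, the value used for INT_MAX
           assumes a 32-bit int, and treating "within delta frames" as an
           inclusive bound is also an assumption.

```python
INT_MAX = 2**31 - 1  # assumption: 32-bit int


def within_pin_bounds(time_frame, potential, delta=INT_MAX):
    """True if an output state (time_frame, network_state) may be kept.

    potential is the FSM state potential of the network state.
    """
    return abs(time_frame - potential) <= delta


# A network state pinned near frame 120, with a window of 10 frames.
print(within_pin_bounds(125, 120, delta=10))
print(within_pin_bounds(140, 120, delta=10))
```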

       response_type         onebest|lattice    (optional, default: onebest)

           By default, only the single best scoring path is output. When  lat-
           tice  is  specified  for  this parameter, all paths within the lat-
           tice_beam are output.
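
           The difference between the two response types can be sketched in
           Python over a list of complete paths.  This is an illustration
           rather than the recognizer's lattice code: the function name
           respond and the example paths are made up, and lattice_beam is
           given an infinite default here only to keep the sketch short.

```python
def respond(paths, response_type="onebest", lattice_beam=float("inf")):
    """Select output paths.

    paths is a list of (labels, total_cost); lower cost is better.
    """
    best = min(paths, key=lambda p: p[1])
    if response_type == "onebest":
        return [best]
    # lattice: keep every complete path within lattice_beam of the best.
    return [p for p in paths if p[1] <= best[1] + lattice_beam]


hyps = [("hello world", 12.0), ("hello word", 13.5), ("yellow world", 20.0)]
print(respond(hyps))
print(respond(hyps, "lattice", lattice_beam=2.0))
```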

       segment_level         ilabel|olabel    (optional, default: olabel)

           This parameter determines the output segmentation level. If set  to
           ilabel,  the  output  is a transducer whose input labels and output
           labels match the corresponding  input  and  output  labels  of  the
           recognition transducer. The FSM state potentials in the output (see
           fsmaccess(3)) will be set to the corresponding frame  numbers  from
           recognition.   If  this  parameter is set to olabel, an acceptor is
           output whose labels match the  output  labels  of  the  recognition
           transducer  (the FSM state potentials are not easily interpreted in
           this mode).

       self_loop             bool    (optional, default: true)

           By default, each HMM state is implicitly given a self loop  by  the
           recognizer.  When  this parameter is set to false, these self loops
           are not simulated (see dmodel(3)).

       suppress_labels       int1 int2 ...    (optional, default: none)

           These output labels will be converted  to  epsilon  labels  in  the
           recognition output. For example, this is a convenient way of remov-
           ing silence ''words'' from the output.
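
           The label conversion can be pictured with the Python sketch below.
           It operates on a plain list of output labels rather than on an
           FSM, the function name suppress is hypothetical, and the use of 0
           as the epsilon label and 7 as a silence label are assumptions made
           for the example.

```python
EPSILON = 0  # assumption: label 0 denotes epsilon


def suppress(output_labels, suppress_labels):
    """Replace every label listed in suppress_labels with epsilon."""
    drop = set(suppress_labels)
    return [EPSILON if label in drop else label for label in output_labels]


# Suppose label 7 is the silence "word".
print(suppress([7, 12, 7, 31, 7], [7]))
```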

SEE ALSO
       dutils(1)                          DCD utility user programs.
       dutils(3)                          DCD utility C++ routines.
       fsmintro(1)                        Introduction to the FSM finite-state
                                          machine library.
       fsm(1)                             FSM user commands.
       fsmaccess(3)                       FSM C accessors.
       far(1)                             FSM archive user commands.
FILES
       /n/lvr/linux/bin/dcd-2             Distribution binaries.
       /n/lvr/linux/src/cmd/dcd/dcd-2     Distribution sources.
       /n/lvr/linux/include/dcd-2         Distribution DCD include files.
       /n/lvr/linux/lib/libdcd-2.a        Distribution DCD library.
AUTHORS
       Michael Riley (riley@research.att.com)
       Copyright (C) 2003 AT&T Corp. All rights reserved.



Version 2.0                                                          DRECOG(1)