SYNOPSIS
#include "dmodel.h"
typedef float Cost;
class DSearchModel {
protected:
vector< vector<int> > *_hmms; // HMM specification
int _nstates; // # of HMM states
bool _sdurs; // state durations?
bool _hdurs; // HMM durations?
public:
typedef enum { SELF, EXIT } TransitionType;
int numHmms() { return _hmms->size(); }
int numStates() { return _nstates; }
int hmmLength(int m) { return (*_hmms)[m-1].size(); }
int hmmState(int m, int i) { return (*_hmms)[m-1][i-1]; }
bool stateDurations() { return _sdurs; }
bool hmmDurations() { return _hdurs; }
virtual ~DSearchModel() {}
virtual Cost stateCost(int s) = 0; // acoustic model cost for HMM state s
virtual Cost stateDurationCost(int s, int d, TransitionType t) = 0;
// state transition costs for HMM state s, dur d
virtual Cost hmmDurationCost(int h, int d, TransitionType t) = 0;
// HMM transition cost for HMM h, dur d
};
class DModel : public DSearchModel {
public:
virtual void reset() {} // reset between utterances
virtual void reset(DModelParams &par) {} // reset with parameter update
virtual void nextFrame(vector<float*> &feats) = 0;
// advance model to next frame
virtual int numFeats() = 0; // acoustic feature vector dimension
virtual int numFrames() { return 1; } // number of acoustic frames to pass
};
class DModelParams {
public:
string type; // model type
float dur_mult; // duration cost multiplier (def: 0.0)
float gram_mult; // grammar cost multiplier (def: 1.0)
char *source; // model source description
map<string, DParamValue> ext; // extended parameters
char *find(char *var); // return extended parameter value
void warning(); // print warning message about unreferenced
// extended parameters
};
set of assumptions about the model; the recognizer executable, drecog
(drecog(1)) makes some additional assumptions. Each case will be
described in turn.
SEARCH MODULE ACOUSTIC MODEL
The search module assumes the acoustic model consists of a set of HMMs
consecutively numbered from 1. Each HMM h consists of a set of HMM
states. The initial HMM state of h is numbered 1. There is a transi-
tion between each HMM state i of HMM h and its successor state i+1 and,
if the DSearchParams self_loop flag is set, there is also a self-loop
transition on each HMM state. More general HMM topologies are cur-
rently not directly supported (but see Caveats). Associated with each
HMM state is a probability density. In addition, an acoustic model may
support an HMM duration model or an HMM state duration model.
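The linear topology described above can be sketched as a list of
transitions. The helper below is hypothetical and not part of the DCD
library; it only enumerates the (from, to) state positions that the
text says the search module assumes, with the self_loop argument
standing in for the DSearchParams flag of the same name.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper, not part of the DCD library.
// Returns the (from, to) state positions of every transition in a linear
// HMM with `length` states, positions numbered from 1.
std::vector<std::pair<int, int> > linearTransitions(int length, bool self_loop) {
  assert(length >= 1);
  std::vector<std::pair<int, int> > arcs;
  for (int i = 1; i <= length; ++i) {
    if (self_loop) arcs.push_back(std::make_pair(i, i));       // self-loop on i
    if (i < length) arcs.push_back(std::make_pair(i, i + 1));  // i to successor
  }
  return arcs;
}
```

A three-state HMM thus has two successor transitions, plus three
self-loops when self_loop is set.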
The class DSearchModel is the acoustic model interface for DSearch,
the DCD library search module (dsearch(3)). Any class correctly derived
from this abstract base class will work with DSearch (passed in the
DSearchParams parameter class). The derived class's constructor must
set _hmms to point to a vector of integer vectors. Each integer vector
represents the states in a particular HMM with the m-th HMM's states
being stored in the top-level vector's (m-1)-th location. The derived
class's constructor must also set the number of distinct HMM states in
_nstates, and the flags _sdurs and _hdurs according to which, if any,
types of duration model are provided (it would be odd but not prohib-
ited to provide both types). A data member interface rather than a
virtual function interface is used for these quantities for maximal
efficiency. These protected data members can then be accessed publicly
via several inline member functions that are provided.
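As a minimal sketch of such a derived class, the example below
restates the DSearchModel declaration from the synopsis as a local
stand-in (so that it compiles without dmodel.h) and derives a toy
model from it. The class ToyModel, its state IDs, and its flat costs
are all hypothetical; numbering state IDs from 1 is an assumption of
this sketch only.

```cpp
#include <cassert>
#include <vector>

typedef float Cost;

// Local stand-in for the DSearchModel declaration shown in the synopsis.
class DSearchModel {
protected:
  std::vector<std::vector<int> > *_hmms;  // HMM specification
  int _nstates;                           // # of HMM states
  bool _sdurs;                            // state durations?
  bool _hdurs;                            // HMM durations?
public:
  typedef enum { SELF, EXIT } TransitionType;
  int numHmms() { return _hmms->size(); }
  int numStates() { return _nstates; }
  int hmmLength(int m) { return (*_hmms)[m - 1].size(); }
  int hmmState(int m, int i) { return (*_hmms)[m - 1][i - 1]; }
  bool stateDurations() { return _sdurs; }
  bool hmmDurations() { return _hdurs; }
  virtual ~DSearchModel() {}
  virtual Cost stateCost(int s) = 0;
  virtual Cost stateDurationCost(int s, int d, TransitionType t) = 0;
  virtual Cost hmmDurationCost(int h, int d, TransitionType t) = 0;
};

// Hypothetical toy model: two HMMs that share state 2, no duration models.
class ToyModel : public DSearchModel {
  std::vector<std::vector<int> > _spec;  // owns the storage _hmms points to
public:
  ToyModel() {
    int h1[] = {1, 2, 3};  // HMM 1: states 1, 2, 3
    int h2[] = {4, 2};     // HMM 2: states 4, 2
    _spec.push_back(std::vector<int>(h1, h1 + 3));
    _spec.push_back(std::vector<int>(h2, h2 + 2));
    _hmms = &_spec;
    _nstates = 4;    // distinct state IDs: 1, 2, 3, 4
    _sdurs = false;  // no state duration model
    _hdurs = false;  // no HMM duration model
  }
  virtual Cost stateCost(int s) {
    assert(s >= 1 && s <= _nstates);
    return 1.0f;  // flat placeholder cost
  }
  virtual Cost stateDurationCost(int, int, TransitionType) { return 0.0f; }
  virtual Cost hmmDurationCost(int, int, TransitionType) { return 0.0f; }
};
```

With this model, hmmState(2, 1) returns 4 and hmmLength(1) returns 3.
Because _hmms points into the object itself, a ToyModel should not be
copied in this form.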
The stateCost member function returns the negative log acoustic
probability given the HMM state ID. The hmmDurationCost (stateDurationCost)
virtual member function returns the negative log probability given the
HMM ID (HMM state ID) and the model (state) duration in frames, and the
transition type, as appropriate; the transition costs may be duration-
dependent with DSearch. Note that caching of these model values,
important for efficient decoding, is assumed to be done by the derived
model class and not the search module.
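One way a derived model might do this caching is sketched below. The
base class is an abridged local stand-in for the synopsis declaration;
CachingModel, its setFrameProbs hook, and the evaluations counter are
hypothetical names used only for illustration, and state IDs are
assumed to start at 1.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

typedef float Cost;

// Abridged stand-in for the DSearchModel declaration in the synopsis.
class DSearchModel {
protected:
  std::vector<std::vector<int> > *_hmms;
  int _nstates;
  bool _sdurs;
  bool _hdurs;
public:
  typedef enum { SELF, EXIT } TransitionType;
  int numStates() { return _nstates; }
  virtual ~DSearchModel() {}
  virtual Cost stateCost(int s) = 0;
  virtual Cost stateDurationCost(int s, int d, TransitionType t) = 0;
  virtual Cost hmmDurationCost(int h, int d, TransitionType t) = 0;
};

// Hypothetical model that computes each state cost at most once per frame.
class CachingModel : public DSearchModel {
  std::vector<std::vector<int> > _spec;
  std::vector<float> _prob;  // per-state probability for the current frame
  std::vector<Cost> _cache;  // cached -log(_prob[s])
  std::vector<bool> _valid;  // which cache entries are current
public:
  int evaluations;           // counts real cost computations

  CachingModel() : evaluations(0) {
    _spec.push_back(std::vector<int>(1, 1));  // HMM 1: state 1
    _spec.push_back(std::vector<int>(1, 2));  // HMM 2: state 2
    _hmms = &_spec;
    _nstates = 2;
    _sdurs = false;
    _hdurs = false;
    _prob.assign(_nstates + 1, 1.0f);  // index 0 unused (1-based IDs)
    _cache.assign(_nstates + 1, 0.0f);
    _valid.assign(_nstates + 1, false);
  }

  // Hypothetical hook: install new probabilities and invalidate the cache.
  void setFrameProbs(const std::vector<float> &probs) {
    assert((int)probs.size() == _nstates + 1);
    _prob = probs;
    _valid.assign(_nstates + 1, false);
  }

  virtual Cost stateCost(int s) {
    assert(s >= 1 && s <= _nstates);
    if (!_valid[s]) {  // compute the negative log probability only on a miss
      _cache[s] = -std::log(_prob[s]);
      _valid[s] = true;
      ++evaluations;
    }
    return _cache[s];
  }
  virtual Cost stateDurationCost(int, int, TransitionType) { return 0.0f; }
  virtual Cost hmmDurationCost(int, int, TransitionType) { return 0.0f; }
};
```

Repeated stateCost calls for the same state within a frame then cost
one table lookup each, which matters because the search module may
query a state many times per frame.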
Also note that stateCost does not take the acoustic feature vector as
an argument. Instead, acoustic features are passed to the model via the
nextFrame member function of the DModel class in the next section. This
design choice makes the search module completely independent of the
acoustic front-end.
FULL RECOGNIZER ACOUSTIC MODEL
The full recognizer assumes additionally that the acoustic model: (1)
models n-dimensional floating-point feature vectors, (2) handles the
grammar and duration model scaling and (3) may require acoustic feature
frame lookahead.
tual member functions defined by the derived model class.
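A derived DModel might look like the following sketch. The base
classes are abridged local stand-ins for the synopsis declarations.
EnergyModel is hypothetical, and the sketch assumes that feats holds
numFrames() pointers, each to numFeats() floats, with feats[0] being
the current frame; that layout is an assumption, not something stated
here.

```cpp
#include <cassert>
#include <vector>

typedef float Cost;
class DModelParams;  // only referenced by the reset() signature

// Abridged stand-ins for the synopsis declarations.
class DSearchModel {
protected:
  std::vector<std::vector<int> > *_hmms;
  int _nstates;
  bool _sdurs;
  bool _hdurs;
public:
  typedef enum { SELF, EXIT } TransitionType;
  virtual ~DSearchModel() {}
  virtual Cost stateCost(int s) = 0;
  virtual Cost stateDurationCost(int s, int d, TransitionType t) = 0;
  virtual Cost hmmDurationCost(int h, int d, TransitionType t) = 0;
};

class DModel : public DSearchModel {
public:
  virtual void reset() {}
  virtual void reset(DModelParams &) {}
  virtual void nextFrame(std::vector<float*> &feats) = 0;
  virtual int numFeats() = 0;
  virtual int numFrames() { return 1; }
};

// Hypothetical model: one single-state HMM whose cost is 0.5 * |x|^2, the
// negative log of a unit-variance Gaussian up to an additive constant.
class EnergyModel : public DModel {
  std::vector<std::vector<int> > _spec;
  Cost _cost;  // cost of state 1 for the current frame
public:
  EnergyModel() : _cost(0.0f) {
    _spec.push_back(std::vector<int>(1, 1));
    _hmms = &_spec;
    _nstates = 1;
    _sdurs = false;
    _hdurs = false;
  }
  virtual int numFeats() { return 2; }
  virtual void nextFrame(std::vector<float*> &feats) {
    assert((int)feats.size() >= numFrames());
    const float *x = feats[0];  // assumed: current frame, numFeats() floats
    _cost = 0.0f;
    for (int k = 0; k < numFeats(); ++k) _cost += 0.5f * x[k] * x[k];
  }
  virtual Cost stateCost(int s) {
    assert(s == 1);
    return _cost;
  }
  virtual Cost stateDurationCost(int, int, TransitionType) { return 0.0f; }
  virtual Cost hmmDurationCost(int, int, TransitionType) { return 0.0f; }
};
```

The cost is computed once in nextFrame, so stateCost needs no feature
argument, matching the separation between search module and front-end
described earlier.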
DModelParams is a parameter class used to initialize and reset the
acoustic model. Its data members consist of a string type, which is
the name of the acoustic model type (cf. model_type in drecog(1));
gram_mult and dur_mult, which are the relative model scalings to be
used by the model (as described in drecog(1)); source, which is the
name, for example, of the model file (used in error messages); and ext,
which is a set of extended parameters passed as an STL map of string
variable and DParamValue pairs that can be used by the derived model.
For example, any parameters file parameters that are unrecognized by
drecog(1) are passed to the model via the ext map and marked unrefer-
enced. The model should use the convenience member function find to
inquire about the value of a given parameter. If found, it is marked
by find as referenced. If not found, find returns 0. The member func-
tion warning prints a message about any extended parameters that have
not been referenced.
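The referenced/unreferenced bookkeeping described above can be
sketched as follows. Everything here is a stand-in: the layout of
DParamValue is not documented in this page, so the value and
referenced fields are assumptions, as are the bodies of find and
warning and the countUnreferenced helper. The sketch also uses const
char * where the synopsis uses char *, so that string literals can be
passed directly.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical stand-in: the real DParamValue layout is not given here.
struct DParamValue {
  std::string value;
  bool referenced;
  DParamValue() : referenced(false) {}
  explicit DParamValue(const std::string &v) : value(v), referenced(false) {}
};

// Abridged stand-in for DModelParams with assumed member function bodies.
class DModelParams {
public:
  std::map<std::string, DParamValue> ext;  // extended parameters

  // Returns the value of var and marks it referenced, or 0 if not found.
  const char *find(const char *var) {
    assert(var != 0);
    std::map<std::string, DParamValue>::iterator it = ext.find(var);
    if (it == ext.end()) return 0;
    it->second.referenced = true;
    return it->second.value.c_str();
  }

  // Hypothetical helper: number of parameters never looked up via find.
  int countUnreferenced() const {
    int n = 0;
    std::map<std::string, DParamValue>::const_iterator it;
    for (it = ext.begin(); it != ext.end(); ++it)
      if (!it->second.referenced) ++n;
    return n;
  }

  // Prints one line on standard error per unreferenced parameter.
  void warning() {
    std::map<std::string, DParamValue>::const_iterator it;
    for (it = ext.begin(); it != ext.end(); ++it)
      if (!it->second.referenced)
        std::fprintf(stderr, "warning: unreferenced parameter %s\n",
                     it->first.c_str());
  }
};
```

A model constructor would typically call find for each parameter it
understands and let the caller invoke warning afterwards, so that
misspelled parameter names in the parameters file do not pass
silently.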
DYNAMIC-SHARED-OBJECT ACOUSTIC MODEL
The recognizer, drecog(1), is able to load new model type definitions
without recompilation using UNIX dynamic-shared objects (DSOs). To
use, for example, a model type "mymod" with drecog, a DSO libdmymod.so
needs to be created as described below, the directory where it is
stored needs to be placed in the LD_LIBRARY_PATH, and "mymod" must be
given as the value of the param model_type in the parameters file.
To create libdmymod.so, first create a derived class DMyModel of DModel
that correctly implements all of the functionality described above.
Place it in a DSO source file mymod.cc and include the header file
dmodel_so.h (dmodel.h has the same information, but dmodel_so.h has the
bare minimum needed for creating the stand-alone DSO, eliminating all
library dependencies).
#include "dmodel_so.h"
class DMyModel : public DModel {
public:
DMyModel(char *mod_file, DModelParams &params);
<definition of virtual member functions here>
};
Then place the following definition in the DSO source file, mymod.cc.
extern "C" {
DModel *dmymod_load(char *mod_file, DModelParams *params) {
return new DMyModel(mod_file, *params);
}
}
This load function will be called by drecog(1) with mod_file and params
correctly filled in. Note the name libd$MOD.so and the name, argument
types, and return type of d$MOD_load must be used exactly as written
for model type $MOD (which is "mymod" in this example) in order for the
recognizer to find the DSO and load the model. Note also that DModel-
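The naming convention for model type $MOD can be sketched as two small
string functions. These helpers are hypothetical and not part of the
DCD library; a loader would presumably pass the resulting names to
dlopen(3) and dlsym(3), but how drecog actually performs the lookup is
an assumption here.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers, not part of the DCD library.

// DSO file name expected for model type `mod`, e.g. "libdmymod.so".
std::string dsoFileName(const std::string &mod) {
  assert(!mod.empty());
  return "libd" + mod + ".so";
}

// Load function symbol expected for model type `mod`, e.g. "dmymod_load".
std::string loadSymbolName(const std::string &mod) {
  assert(!mod.empty());
  return "d" + mod + "_load";
}
```

The extern "C" linkage on the load function is what keeps its symbol
name unmangled, so that a lookup by the plain name "dmymod_load" can
succeed.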
CAVEATS
More general HMM topologies than described above can be implemented by
transferring some or all of the HMM structure into the search module's
recognition transducer (via DSearchParams netfsm). For example, the
acoustic model can be constructed with trivial one-state HMMs. Any
desired HMM structure can then be directly substituted into the
recognition transducer. The DSearchParams self_loop flag indicates whether
self-loops are to be simulated by the search module for each HMM state
or have instead been explicitly added to the recognition transducer.
In fact, HMMs could always be handled this way, except that the search
module gains time and space efficiency when the acoustic model gives
it simple "linear" HMM topologies.
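The trivial one-state construction mentioned above can be sketched as
follows. The helper is hypothetical; it builds an HMM specification of
the kind _hmms points to, with HMM m consisting of the single state m.
Numbering state IDs from 1 is an assumption of this sketch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not part of the DCD library.
// Builds an HMM specification with one single-state HMM per state, so that
// all real HMM structure can live in the recognition transducer instead.
std::vector<std::vector<int> > oneStateHmms(int nstates) {
  assert(nstates >= 0);
  std::vector<std::vector<int> > hmms;
  for (int s = 1; s <= nstates; ++s)
    hmms.push_back(std::vector<int>(1, s));  // HMM s holds only state s
  return hmms;
}
```

A derived model's constructor could store this vector as a data member
and set _hmms to point to it, with _nstates equal to nstates.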
DIAGNOSTICS
When an error occurs in the DCD library, a diagnostic message is printed
on standard error and then exit(1) is called. When a warning occurs in
the DCD library, a diagnostic message is printed on standard error.
SEE ALSO
drecog(1)   Transducer-based speech recognizer command.
drecog(3)   Transducer-based speech recognizer C++ class.
dsearch(3)  DCD library search module.
dutils(1)   DCD utility user programs.
dutils(3)   DCD utility C++ routines.
amintro(1)  Intro. to the AM acoustic model tools package.
FILES
/n/lvr/linux/bin/dcd-2 Distribution binaries.
/n/lvr/linux/src/cmd/dcd/dcd-2 Distribution sources.
/n/lvr/linux/include/dcd-2 Distribution DCD include files.
/n/lvr/linux/lib/libdcd-2.a Distribution DCD library.
AUTHORS
Michael Riley (riley@research.att.com)
Copyright (C) 2003 AT&T Corp. All rights reserved.
Version 2.0 DMODEL(3)