SYNOPSIS
#include "dutils.h"
typedef enum { DMAKE_NONE, DMAKE_BIDET,
DMAKE_ACYC } DMakeType;
typedef enum { DWEIGHT_NONE, DWEIGHT_REPLACE,
DWEIGHT_SUBTRACT } DWeightMode;
const int DMAKE_VERB = 0x1;
const int DMAKE_ENCMIN = 0x2;
Fsm DMake(vector<Fsm> &fsms, vector<int> &typs,
unsigned flags);
Fsm DFactor(Fsm fsm, AMIOHmms **hmms,
int maxlen = INT_MAX, int maxchains = INT_MAX);
Fsm DWeight(Fsm fsm, Fsm wfsm, Cost cthresh,
float m, float b, DWeightMode wmode);
struct DAlignParams {
Smr smr; // semiring (def: tropical)
Cost cthresh; // prune with cost threshold x (def: inf)
Cost subcost; // substitution cost (def: tropical: 1.0)
Cost delcost; // deletion cost (def: tropical: 1.0)
Cost inscost; // insertion cost (def: tropical: 1.0)
int nbest; // return only the n-best alignments
};
class DAlign {
public:
DAlign(DAlignParams& params);
Fsm next(Fsm ref, Fsm hyp);
};
class DScore {
public:
DScore(void);
void next(Fsm alig);
int numAlign(); // number of alignments (calls to next())
int numSub(); // number of substitutions in last alignment
int numDel(); // number of deletions in last alignment
int numIns(); // number of insertions in last alignment
int numRef(); // number of reference (non-zero input) labels in
              // last alignment
int totSub(); // total number of substitutions
int totDel(); // total number of deletions
int totIns(); // total number of insertions
determinizable, i.e., both the FSM and its inverse are determinizable
(where epsilons are treated as regular symbols). The DMAKE_NONE type
can be used to pass an arbitrary FSM.
The third argument to DMake passes control flags. The DMAKE_ENCMIN
flag minimizes the (encoded) intermediate and final results. The
DMAKE_VERB flag prints progress information to standard error. See
dutils(1) for a further explanation of these arguments and flags, which
have obvious analogs in the binary executable dmake. This operation is
destructive on all FSMs in its first argument (see fsm(3)).
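The two flag values are distinct bits, so they can be combined with a
bitwise OR before being passed as the flags argument. The sketch below
copies the constants from the SYNOPSIS; the helpers makeFlags and
hasFlag are hypothetical and are not part of dutils.h.

```cpp
// Flag values copied from the SYNOPSIS; the helpers below are illustrative only.
const int DMAKE_VERB   = 0x1;  // print progress information to standard error
const int DMAKE_ENCMIN = 0x2;  // minimize encoded intermediate and final results

// Hypothetical helper: build a flags word from two booleans.
unsigned makeFlags(bool verbose, bool encmin) {
    unsigned flags = 0;
    if (verbose) flags |= DMAKE_VERB;
    if (encmin)  flags |= DMAKE_ENCMIN;
    return flags;
}

// Hypothetical helper: test whether a given flag bit is set.
bool hasFlag(unsigned flags, int flag) {
    return (flags & static_cast<unsigned>(flag)) != 0;
}
```

With these helpers, makeFlags(true, true) yields the same word as
writing DMAKE_VERB | DMAKE_ENCMIN directly.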
DFactor factors a recognition transducer fsm by finding linear chains
of transitions, identifying each unique chain as an HMM, adding it to
a newly generated AMIOHmms class returned in the hmms argument, and
replacing those chains in the recognition transducer with single
transitions whose input labels are the corresponding HMM IDs. The
maxlen argument sets the maximum chain length (default: unlimited),
and the maxchains argument sets the maximum number of chains (default:
unlimited). This operation is destructive on its first argument (see
fsm(3)).
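The idea of giving each unique chain a single ID can be illustrated
without the library. The sketch below is a toy model, not DFactor's
implementation: a chain is represented as a plain sequence of labels,
and the ChainTable class, its method names, and the choice to start
IDs at 1 are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Stand-in for a linear chain: the sequence of labels along it (assumption).
using Chain = std::vector<int>;

// Hypothetical table that assigns one ID to each distinct chain.
class ChainTable {
public:
    // Return the ID for a chain, allocating a new ID the first time it is seen.
    int idFor(const Chain& chain) {
        auto it = ids_.find(chain);
        if (it != ids_.end()) return it->second;
        // IDs start at 1 here; reserving 0 is an assumption of this sketch.
        int id = static_cast<int>(ids_.size()) + 1;
        ids_.emplace(chain, id);
        return id;
    }

    // Number of distinct chains recorded so far.
    std::size_t size() const { return ids_.size(); }

private:
    std::map<Chain, int> ids_;
};
```

Repeated chains map to the same ID, so a transducer containing many
copies of one chain would refer to a single shared entry.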
DWeight modifies the weights on fsm. If wmode is DWEIGHT_REPLACE, the
weights in wfsm replace the weights on matching paths in fsm. A common
use of this mode is to replace the weights on word lattices entirely
with the weights from a word grammar. If wmode is DWEIGHT_SUBTRACT,
the weights in wfsm are instead subtracted from those on fsm. A common
use of this mode is to subtract the grammar used in recognition from
lattices containing both acoustic and grammar weights, so that the
resulting lattices carry only the acoustic scores. These operations
use determinization, which may take considerable time and space for
some inputs. Pruning can help: cthresh prunes the determinization to a
given cost threshold; m and b limit the determinization to nstates*m+b
states, where nstates is the number of states in fsm (see fsmprune in
fsm(1) for more information). This operation is destructive on its
first argument (see fsm(3)).
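As a worked instance of the nstates*m+b bound: with nstates = 1000,
m = 2.0 and b = 100, determinization would be limited to 2100 states.
The helper below is a hypothetical sketch of that arithmetic only;
rounding down and clamping negative results to zero are assumptions,
since the source does not say how the library treats them.

```cpp
#include <cstddef>

// Hypothetical helper: evaluate the nstates*m+b state limit described above.
// Truncation toward zero and clamping at zero are assumptions of this sketch.
std::size_t stateLimit(std::size_t nstates, float m, float b) {
    double limit = static_cast<double>(nstates) * m + b;
    if (limit <= 0.0) return 0;
    return static_cast<std::size_t>(limit);
}
```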
DAlign aligns reference and hypothesis FSMs. It is constructed by
passing a DAlignParams structure of parameter settings. See dutils(1)
for an explanation of the parameter structure fields that correspond
to the analogous flags in the binary executable dalign. The next()
member function is passed a reference and hypothesis FSM and returns
the corresponding alignment FSM.
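For intuition about what the substitution, deletion and insertion
costs control, the sketch below aligns two plain label sequences by
standard edit distance and counts the edits along one lowest-cost
alignment. It is not DAlign's algorithm, which operates on FSMs; the
EditCounts struct and alignCounts function are hypothetical, and the
default costs of 1.0 mirror the tropical defaults listed in
DAlignParams.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical result type: edit counts along one lowest-cost alignment.
struct EditCounts { int sub = 0; int del = 0; int ins = 0; };

// Edit-distance alignment of two label sequences (illustrative stand-in).
EditCounts alignCounts(const std::vector<int>& ref, const std::vector<int>& hyp,
                       double subcost = 1.0, double delcost = 1.0,
                       double inscost = 1.0) {
    const std::size_t n = ref.size(), m = hyp.size();
    // d[i][j] = lowest cost of aligning ref[0..i) with hyp[0..j).
    std::vector<std::vector<double>> d(n + 1, std::vector<double>(m + 1, 0.0));
    for (std::size_t i = 1; i <= n; ++i) d[i][0] = d[i - 1][0] + delcost;
    for (std::size_t j = 1; j <= m; ++j) d[0][j] = d[0][j - 1] + inscost;
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            double step = (ref[i - 1] == hyp[j - 1]) ? 0.0 : subcost;
            d[i][j] = std::min({d[i - 1][j - 1] + step,
                                d[i - 1][j] + delcost,
                                d[i][j - 1] + inscost});
        }
    }
    // Backtrace from the end, preferring match/substitution, then deletion.
    EditCounts c;
    std::size_t i = n, j = m;
    while (i > 0 || j > 0) {
        if (i > 0 && j > 0) {
            bool match = (ref[i - 1] == hyp[j - 1]);
            double step = match ? 0.0 : subcost;
            if (d[i][j] == d[i - 1][j - 1] + step) {
                if (!match) ++c.sub;
                --i; --j;
                continue;
            }
        }
        if (i > 0 && d[i][j] == d[i - 1][j] + delcost) { ++c.del; --i; continue; }
        ++c.ins; --j;  // only an insertion can explain this cell
    }
    return c;
}
```

Raising one of the three costs makes alignments that use that edit
type less likely to be chosen as the lowest-cost path.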
DScore computes statistics about alignments. It is passed alignments
via the next() member function. The numAlign() member function returns
the number of alignments passed so far. There are also member functions
that return the number of substitutions, insertions, deletions, and
reference (non-zero input) labels for each new alignment passed (e.g.,
numSub()) and that return the cumulative number of substitutions,
insertions, deletions, and reference labels for all alignments passed
(e.g., totSub()).
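One common use of such cumulative counts is an error rate of the form
(substitutions + deletions + insertions) / reference labels. The
source does not state that DScore computes this, so the sketch below
is a hypothetical stand-in: the ScoreTotals struct, its add() and
errorRate() members, and the totRef field name are assumptions that
only mirror the counters described above.

```cpp
// Hypothetical stand-in that accumulates the counters described above.
struct ScoreTotals {
    int numAlign = 0;  // number of alignments added so far
    int totSub = 0;    // cumulative substitutions
    int totDel = 0;    // cumulative deletions
    int totIns = 0;    // cumulative insertions
    int totRef = 0;    // cumulative reference labels (field name is an assumption)

    // Add the per-alignment counts of one alignment to the running totals.
    void add(int sub, int del, int ins, int ref) {
        ++numAlign;
        totSub += sub;
        totDel += del;
        totIns += ins;
        totRef += ref;
    }

    // (S + D + I) / N over everything added; 0 when there are no reference labels.
    double errorRate() const {
        if (totRef == 0) return 0.0;
        return static_cast<double>(totSub + totDel + totIns) / totRef;
    }
};
```

For example, adding one alignment with 1 substitution, 1 deletion and
4 reference labels, then another with 1 insertion and 2 reference
labels, gives 3 errors over 6 reference labels.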
tools package.
FILES
/n/lvr/linux/bin/dcd-2 Distribution binaries.
/n/lvr/linux/src/cmd/dcd/dcd-2 Distribution sources.
/n/lvr/linux/include/dcd-2 Distribution DCD include files.
/n/lvr/linux/lib/libdcd-2.a Distribution DCD library.
AUTHORS
Cyril Allauzen (allauzen@research.att.com)
Mehryar Mohri (mohri@research.att.com)
Michael Riley (riley@research.att.com)
Copyright (C) 2003 AT&T Corp. All rights reserved.
Version 2.0 DMODEL(3)