fsmconnect, fsmdeterminize, fsmpush, fsmminimize, fsmencode, fsmquan-
tize, fsmequiv, fsmrmepsilon, fsmsynchronize, fsmepsnormalize, fsmbest-
path, fsmprune, fsmconvert, fsmarcsort, fsmtopsort, fsminfo - finite-
state machine utilities
SYNOPSIS
fsmcompile [ -i symbols ] [ -o symbols ] [ -S symbols ] [ -t ] [ -f
fsmclass ] [ -s semiring ] [ file ]
fsmprint [ -i symbols ] [ -o symbols ] [ -s symbols ] [ -l ] [ fsm ]
fsmdraw [ -i symbols ] [ -o symbols ] [ -s symbols ] [ -w x ] [ -h y ]
[ -f font ] [ -F n ] [ -lpv ] [ fsm ]
fsmunion [ fsm1 ... ]
fsmconcat [ fsm1 ... ]
fsmclosure [ -p ] [ fsm ]
fsmreverse [ fsm ]
fsminvert [ fsm ]
fsmproject [ -io12 ] [ fsm ]
fsmintersect [ -c cthresh ] [ -n nthresh ] [ -1d ] fsm1 fsm2 [ fsm3
...]
fsmcompose [ -c cthresh ] [ -n nthresh ] [ -1d ] fsm1 fsm2 [ fsm3
...]
fsmdifference [ -c cthresh ] [ -n nthresh ] [ -1d ] fsm1 fsm2
fsmconnect [ -t ] [ fsm ]
fsmdeterminize [ -c cthresh ] [ -n nthresh ] [ fsm ]
fsmpush [ -clifrIkd ] [ fsm ]
fsmminimize [ fsm ]
fsmencode [ -cld ] fsm key
fsmquantize [ fsm ]
fsmequiv [ -v ] fsm1 fsm2
fsmrmepsilon [ -c cthresh ] [ -n nthresh ] [ -1d ] [ fsm ]
fsmsynchronize [ -c cthresh ] [ -n nthresh ] [ -d ] [ fsm ]
DESCRIPTION
These commands construct, display, combine, minimize and search
weighted finite-state machines (FSMs). Two types of machines are sup-
ported, acceptors and transducers. See fsmaccess(3) for definitions of
the components of an FSM: state, start state, arc (transition), input
symbol, output symbol, arc cost, final cost and final state.
Except as described for the commands that compile, print and display
FSMs, all the commands take as input one or more binary-encoded FSMs
and send to standard output one binary-encoded FSM. For commands that
take one input FSM, the FSM can be given as standard input or on a file
specified as a command-line argument. For commands that take more than
one FSM, all inputs are specified by command-line arguments. A command-
line argument of ''-'' specifies that the corresponding FSM will be
provided on standard input. A command-line argument of -? will print
usage information.
FSM COMPILATION AND DISPLAY
fsmcompile takes input representing an FSM from file file or standard
input, and sends to standard output its binary encoding. The input
should be the textual representation of an FSM (see fsm(5)). FSM
states, input symbols and output symbols are represented in the input
by non-negative numbers, unless the options -S symbols, -i symbols, -o
symbols are used. These options allow state, input symbols and output
symbols, respectively, to be given textual names, where symbols files
give the translation from those names and numbers (see fsm(5)). The
input should be an acceptor, unless the -t option is given, in which
case it should be a transducer. The -f option specifies the FSM repre-
sentation class (see fsmclass(3)); by default, it is FSMBasicClass.
The -s option specifies the cost semiring (see fsmcost(3)); by default,
it is SMRTropical. Note fsmcompile will internally renumber the states
from 0 to the number of states minus 1.
fsmprint prints the input FSM on standard output using same textual
format as fsmcompile accepts as input (see fsm(5)). States, input sym-
bols and output symbols are printed in numeric form, unless the options
-s symbols, -i symbols, -o symbols are used to provide textual names
for states, input symbols and output symbols, respectively (see
fsm(5)). Note that the states may be renumbered by fsmcompile or
other operations. With the -l option, additional information is output
that when read by fsmcompile preserves the state potentials, the FSM
representation class, the cost semiring, and whether it is an acceptor
or transducer.
fsmdraw sends to standard output a dot(1) graph representation of the
input FSM (The command dot -Tps can be used to convert from dot format
to PostScript.) States, input symbols and output symbols are displayed
in numeric form, unless the options -s symbols, -i symbols, -o symbols
are used to provide textual names for states, input symbols and output
symbols, respectively (see fsm(5)). The options -w x and -h x set the
page width and height (in inches), -f fontname sets the font name
fsmreverse reverses the input FSM, that is, the input FSM accepts
string a1 ... aN with a certain cost iff the output FSM accepts aN ...
a1 with the same cost (and similarly for transducers).
fsminvert inverts a transducer, that is, the input FSM transduces
string s1 to s2 iff the output machine transduces string s2 to s1 with
the same cost. It does so by tranposing the input and output symbols
on each transition.
fsmproject converts a transducer into an acceptor by retaining only the
input (with -i or -1) or output (with -o or -2) label on each transi-
tion.
fsmintersect returns the intersection of two or more acceptors. Each
input FSM accepts string s iff the output FSM accepts s with the costs
combined by the SMRTimes operation (see fsmcost(3)).
fsmcompose returns the relational composition of the input FSMs, in the
order given in the command line. With two input FSMs, for example, if
the first machine transduces string s1 to s2 and the second machine
transduces s2 to s3, then the output machine will transduce s1 to s3
with the two costs combined by the SMRTimes operation (see fsmcost(3)).
If an input machine is an acceptor, it is treated as a transducer from
the language it accepts to itself.
fsmdifference returns the intersection of the acceptor fsm1 with the
complement of the costless, deterministic, epsilon-free acceptor fsm2.
FSM MINIMIZATION AND EQUIVALENCE
Two acceptors are equivalent if they accept the same strings with the
same costs; two transducers are equivalent if they transduce the same
input strings to the same output strings with the same costs.
fsmconnect returns an equivalent FSM from which any states and arcs in
the input that do not lie on a path from the start state to a final
state have been removed. With the -t option, it returns exit status 1
if the output has no states, which is useful for testing the input for
emptiness.
fsmdeterminize returns a deterministic FSM that is equivalent to the
input, which must be determinizable. Epsilon arcs are treated the same
as other symbols.
fsmpush returns a pushed FSM equivalent to the input. With the -c
option, the topology of the input FSM is unchanged and the SMRPlus-sum
of the costs of the outgoing arcs (with -i) or incoming arcs (with -f)
at each state equals SMROne. By default, the residual cost (the SMR-
Plus-sum of costs of all complete paths) is placed final -- the origi-
nal final costs are SMRTimes-multiplied by this residual cost. With -I,
the residual cost is instead placed initial -- the cost of each arc
leaving the initial state is SMRTimes-multiplied by this cost. With the
over the alphabet of pairs of input labels and output labels by encod-
ing each distinct input label and output label pair of an arc as a new
label in the output FSM. fsmencode with the -c option represents a
weighted FSM as an unweighted one by encoding each input label and cost
pair of an arc as a new label in the output FSM. With the -cl options
together, weighted transducers can be represented as unweighted accep-
tors. In each case, the mapping from each input label and output label
and/or cost to its encoding is written as an FSM to the key file. If
the key file already exists, fsmencode will first use the encodings
specified in the key fsm for any matching arc labelings. fsmencode
with the -d option, decodes an encoded FSM if the same set of -l and -c
options and key FSM is given as used in its encoding.
fsmquantize replaces each cost in an FSM with the nearest element in
its delta-quantization (see fsmcost(3)).
fsmequiv exits with zero status if fsm1 and fsm2 are equivalent. With
the -v option, a text message is printed if the machines are not equiv-
alent. The inputs must be deterministic, epsilon-free acceptors.
FSM EPSILON OPERATIONS
fsmrmepsilon returns an FSM equivalent to the input FSM that is epsilon
removed. For an acceptor, this means there are no epsilon transitions.
For a transducer, this means there are no paired input-output epsilon
transitions.
fsmsynchronize returns an FSM equivalent to the input FSM in which the
input and output labels are synchronized. For a transducer, this means
as any path is traversed, its delay is either zero or increases
strictly monotonically -- the delay of a path is the difference between
the number of non-epsilon labels encountered on the output side and
those encountered on the input side. The input FSM must have bounded
delays, that is the delay of any cycle must be zero. All acceptors are
already synchronized. The worst case time and space complexity of the
algorithm is O(O((|Q| + |E|)(|A|^|d| + |B|^|d|))), where Q is the set
of states of the input FSM, E its transition set, A its input alphabet,
and B its output alphabet and |d| the maximum delay (in absolute value)
of a path of the input FSM.
fsmepsnormalize returns an equivalent FSM that is epsilon normalized.
For an acceptor, this is the same as being epsilon removed (see fsm-
rmepsilon in fsm(1)). For a transducer, it is epsilon removed and has
the following additional property. With the -i or -1 flag, a transi-
tion with an epsilon input label is never followed on a path by a tran-
sition with an non-epsilon input label. With the -o or -2 flag, the
corresponding property is true for the output labels. The input FSM
must be epsilon-normalizable.
FSM SEARCH
fsmbestpath returns the lowest-cost path from the start state of the
input FSM to a final state. The path is encoded as a (single path) FSM.
With the .I -n nbest option, the nbest lowest-cost paths are returned.
and a particular semiring. The -f option specifies the FSM representa-
tion class (see fsmclass(3)); by default, it is the same as the input.
The -s option specifies the cost semiring (see fsmcost(3)); by default,
it is the same as the input. Note the costs are not modified when
changing semirings, they are simply reinterpreted in the new semiring.
fsmarcsort sorts the arcs of the input FSM according to the option
given: -i sorts by the arc input label, -o sorts by the arc output
label, and -c sorts by the arc cost. All sort from least to greatest.
fsmtopsort sorts the input FSM according to the option given: -t does a
full topological sort; -e topologically sorts with respect to (i/o)
epsilons, and -i topologically sorts with respect to the input
epsilons. In each case, if there is a cycle with respect to the sort-
ing criterion, fsmtopsort returns the input FSM unsorted.
fsminfo sends to standard output the following information about the
input FSM -- its FSM representation class, its cost semiring and
whether it is an acceptor or transducer. With the -n option, various
numeric information is printed, including the number of states, number
of transitions, final states, epsilon transitions, strongly-connected
components, accessible states, and co-accessible states. With the -p
option, FSMProps (see fsmaccess(3)) is called on the input, which will
return pre-computed information about the FSM, such as whether it is
cyclic, costless, non-negative, or deterministic. If pre-computed
information about a property is not supported by the FSM class, a ''?''
is printed for it. With the -t option, values for all FSM properties
are printed (by explicit tests run on the FSM if needed). See "fsm-
props.h" for the set of defined FSM properties. With the -c option,
the FSM class properties are printed, which include the FSM operations
supported by that class. See "fsmprops.h" for the set of defined FSM
class properties. With the -q w option, quantiles in intervals of
width w are printed for various data including state in-degree, state
out-degree, input label, output label, and arc cost. With the -b q1''
option, the quantiles begin at q1 (default: 0.0), and with the -e q2''
option, the quantiles end at p2 (default: 1.0). The -v option is
equivalent to -tcn -q4. With the -P flag, information is sent instead
to standard error and a copy of the input is sent to standard output.
CAVEATS
To allow limiting the size of their results, fsmintersect, fsmdiffer-
ence, fsmcompose fsmrmepsilon, fsmsynchronize fsmepsnormalize and
fsmdeterminize each accept the -c cthresh option and -n nthresh, which
call FSMPrune (see fsm(3)) prior to output. For fsmrmepsilon, fsminter-
sect, fsmdifference, and fsmcompose, if the -c or -n option is not
given, then FSMConnect (see fsm(3)) is instead called on the output to
remove any states that are not on a path from the start state to a
final state. However, if the -d option is given, FSMConnect will not be
called. In any case, memory size is, by default, kept to a mininum by
using a two-pass expansion strategy. With the '-1' option, only one
expansion pass is used, which can halve the execution time in certain
circumstances, but increase memory usage.
Some commands take additional options not specified here. The -?
option describes the complete set of options available for each com-
mand.
BUGS
On non-POSIX compliant systems, binary data directed to standard output
may become corrupted. A command-line argument of -F can instead be used
to redirect the output FSM to a specified file.
SEE ALSO
fsmintro(1) Intro. to the FSM programs and
library.
fsm(3) FSM C library.
fsmaccess(3) FSM C accessors.
fsmcost(3) FSM cost definitions.
fsmclass(3) FSM class description.
fsmobject(3) FSM object definition.
fsm(5) FSM file formats.
http://www.research.att.com/sw/tools/fsm
FSM home page -- software, documen-
tation and references.
FILES
/n/lvr/linux/bin/fsm-4 Distribution binaries.
/n/lvr/linux/src/cmd/fsm/fsm-4 Distribution sources.
AUTHORS
Cyril Allauzen (allauzen@research.att.com)
Mehryar Mohri (mohri@research.att.com)
Fernando Pereira (pereira@cis.upenn.edu)
Michael Riley (riley@research.att.com)
Copyright (C) 1998-2003 AT&T Corp. All rights reserved.
Version 4.0 FSM(1)