The GRM Library (Grammar Library) is a set of general-purpose
software tools for constructing, modifying, and compiling
grammars. The GRM functionalities include:
Context-Dependent Rules: compilation of weighted
context-dependent rules into weighted finite-state transducers defined
over general semirings.

Context-Free Grammars: approximation and compilation of
weighted context-free grammars into weighted automata. Dynamic
modification of compiled context-free grammar automata.

Text and Grammar Processing Utilities: general text and
grammar processing utilities (e.g. construction of weighted suffix
automata, counting sequences appearing in weighted automata,
construction of local grammars).

Statistical Language Models: creation and modification of
statistical language models derived from input text or input weighted
automata.
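As a rough illustration of the kind of counting that underlies statistical language modeling from input text, the following Python sketch counts n-grams in tokenized sentences and derives unsmoothed bigram estimates. It is not GRM code and does not call any GRM command; the function names `count_ngrams` and `bigram_probability`, the `<s>` and `</s>` boundary markers, and the toy corpus are assumptions made for this example.

```python
from collections import Counter


def count_ngrams(sentences, order):
    """Count every n-gram of length 1..order in tokenized sentences."""
    counts = Counter()
    for tokens in sentences:
        # Hypothetical boundary markers; a real toolkit may use others.
        padded = ["<s>"] + list(tokens) + ["</s>"]
        for n in range(1, order + 1):
            for i in range(len(padded) - n + 1):
                counts[tuple(padded[i:i + n])] += 1
    return counts


def bigram_probability(counts, history, word):
    """Unsmoothed maximum-likelihood estimate of P(word | history)."""
    history_count = counts[(history,)]
    if history_count == 0:
        return 0.0
    return counts[(history, word)] / history_count


# Toy corpus, invented for this example.
corpus = [["a", "b", "a"], ["a", "b"]]
counts = count_ngrams(corpus, order=2)
p_b_given_a = bigram_probability(counts, "a", "b")  # 2 of the 3 "a" tokens precede "b"
```

A practical model would also smooth these estimates so that unseen sequences receive nonzero probability; the sketch omits that step for brevity.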
The original goal of the GRM library was to provide the essential
algorithms and representations for the creation and compilation of
large statistical or rule-based grammars for use in large-vocabulary
speech recognition. This led to the following requirements:
Generality: to support the representation and use of the
various grammars in dynamic speech recognition.

Efficiency: to support competitive large-vocabulary
dynamic recognition using automata of several hundred million states
and transitions.
The mathematical foundation of the library is the theory of
rational power series, which supplies the semantics for the
objects and operations and creates opportunities for optimizations
such as determinization and minimization.
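To make the power-series view concrete: a weighted automaton over a semiring assigns to each string the semiring sum, over all accepting paths labeled with that string, of the semiring product of the weights along the path. The Python sketch below evaluates that quantity for a small hand-built automaton in two semirings. It is illustrative only and is not the library's representation or API; the `Semiring` class, the `string_weight` function, and the example automaton are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[float, float], float]   # combines alternative paths
    times: Callable[[float, float], float]  # combines weights along one path
    zero: float                             # identity for plus
    one: float                              # identity for times


REAL = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
TROPICAL = Semiring(min, lambda a, b: a + b, math.inf, 0.0)


def string_weight(arcs, initial, finals, symbols, sr):
    """Weight the automaton assigns to `symbols` in semiring `sr`.

    arcs:   dict mapping (state, symbol) -> list of (next_state, weight)
    finals: dict mapping final state -> final weight
    """
    forward = {initial: sr.one}  # state -> accumulated weight of paths so far
    for symbol in symbols:
        reached = {}
        for state, weight in forward.items():
            for dest, arc_weight in arcs.get((state, symbol), []):
                extended = sr.times(weight, arc_weight)
                reached[dest] = sr.plus(reached.get(dest, sr.zero), extended)
        forward = reached
    total = sr.zero
    for state, weight in forward.items():
        if state in finals:
            total = sr.plus(total, sr.times(weight, finals[state]))
    return total


# Hypothetical automaton with two paths labeled "ab" from state 0 to state 3.
ARCS = {
    (0, "a"): [(1, 0.5), (2, 0.25)],
    (1, "b"): [(3, 0.5)],
    (2, "b"): [(3, 1.0)],
}

real_weight = string_weight(ARCS, 0, {3: 1.0}, "ab", REAL)          # sums both paths
tropical_weight = string_weight(ARCS, 0, {3: 0.0}, "ab", TROPICAL)  # keeps the cheapest path
```

The same automaton yields a total over both paths in the real semiring and the cost of the best path in the tropical semiring. Because this string weight is what an optimization must leave unchanged, an operation such as determinization can restructure the automaton freely as long as every string keeps its weight.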
System Components
The GRM library includes more than a dozen stand-alone
commands to construct, modify, and compile grammars. These commands
manipulate FSMs by reading from and writing to files or pipelines.

The FSM library: the GRM library uses many of the
functions of the FSM library.

Dot and Dotty: Graphviz programs used by the FSM library and the
GRM library to visualize graph representations of grammars or
statistical language models.