AT&T AST (AT&T Software Technology) OpenSource software is a collection of libraries and commands for UNIX and Windows.
Included are re-implementations of many POSIX and X/Open APIs and utilities.
It provides a portable and efficient environment that behaves consistently across a range of operating system and hardware implementations.
It is portable because one collection of source builds unattended on all target architectures, and efficient because the underlying algorithms are continually updated to match best-in-class performance.
Some of the more popular components include: cdt, dss, ksh93, nmake, pax, sfio, vcodex, and vmalloc.
AST has been used internally in AT&T since the mid-1980s and was released as OpenSource in 1999.
Documentation, source and binaries are currently available under the OpenSource Eclipse Public License 1.0 from the AT&T AST/UWIN OpenSource downloads site.
The AT&T AST OpenSource software collection started in the mid-1980s.
Much of the early work was described in the book Practical Reusable UNIX Software.
The software collection was made available as Open Source in 1999.
A paper presented at USENIX 2000 gave the reasons for doing so as follows:
- to increase influence on national and international standardization efforts,
- to create and support alternatives to closed systems,
- to attract vendors to distribute and support the software,
- to improve software quality due to widespread use, and
- to increase visibility of AT&T Research in the research community.
These reasons continue to hold true today.
AST has been well received by both the external and internal AT&T software communities.
It has influenced a number of important national and international standardization efforts.
The KornShell ksh(1), the AST functions fts(3), ftwalk(3) and regex(3), the shell programming language ksh93(1), and the archival tool pax(1) have contributed to the POSIX 1003.1 and 1003.2 standards.
The VCDIFF Generic Differencing and Compression Data Format, part of vcodex(3), was approved as RFC 3284 by the IETF (Internet Engineering Task Force).
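RFC 3284 describes each target window as a sequence of three instruction types. ADD appends literal bytes, RUN repeats a single byte, and COPY copies bytes from an address in the source segment or in the part of the target already decoded. The C sketch below applies such a sequence using a hypothetical in-memory vc_inst record. It deliberately ignores the actual wire format, which includes variable-length integers, the instruction code table, separate data, instruction and address sections, and address caches. It is not the vcodex(3) implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified instruction record. Real VCDIFF packs these
   into separate data, instruction and address sections (RFC 3284). */
typedef enum { VC_ADD, VC_COPY, VC_RUN } vc_op;

typedef struct {
    vc_op op;
    size_t size;               /* number of target bytes produced */
    size_t addr;               /* COPY: address in source, then target */
    const unsigned char *data; /* ADD: literal bytes; RUN: data[0] repeated */
} vc_inst;

/* Apply instructions to build one target window.
   Returns the number of bytes written, or (size_t)-1 on error. */
size_t vc_apply(const unsigned char *src, size_t srclen,
                const vc_inst *inst, size_t ninst,
                unsigned char *tgt, size_t tgtcap)
{
    size_t here = 0; /* bytes of target produced so far */
    for (size_t i = 0; i < ninst; i++) {
        const vc_inst *in = &inst[i];
        if (in->size > tgtcap - here)
            return (size_t)-1;
        switch (in->op) {
        case VC_ADD:
            memcpy(tgt + here, in->data, in->size);
            break;
        case VC_RUN:
            memset(tgt + here, in->data[0], in->size);
            break;
        case VC_COPY:
            /* Copy byte by byte so a COPY may overlap the bytes it is
               producing, which is how repeated patterns are expressed. */
            for (size_t k = 0; k < in->size; k++) {
                size_t a = in->addr + k;
                if (a < srclen)
                    tgt[here + k] = src[a];
                else if (a - srclen < here + k)
                    tgt[here + k] = tgt[a - srclen];
                else
                    return (size_t)-1; /* address not decoded yet */
            }
            break;
        default:
            return (size_t)-1;
        }
        here += in->size;
    }
    return here;
}
```

Because a COPY may overlap its own output, a two-byte ADD followed by a longer COPY from target address 0 expands into a repeated pattern such as "abababab".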
The Fowler-Noll-Vo hash function is used in a variety of well-known applications, including Twitter, and is being advanced as another IETF standard for hashing small strings.
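For reference, here is the FNV-1a variant of the function with the published 32-bit offset basis and prime. This is a minimal sketch. The text does not say which FNV variant or width any particular AST component uses, and fnv1a_32 is a name chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Published 32-bit FNV parameters. */
#define FNV32_OFFSET_BASIS 2166136261u /* 0x811c9dc5 */
#define FNV32_PRIME        16777619u   /* 0x01000193 */

/* FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
   Unsigned arithmetic wraps modulo 2^32. */
uint32_t fnv1a_32(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint32_t h = FNV32_OFFSET_BASIS;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= FNV32_PRIME;
    }
    return h;
}
```

The XOR-then-multiply order, which distinguishes FNV-1a from FNV-1, gives slightly better mixing of the final byte. It also costs nothing extra, which suits the short keys mentioned above.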
There have been between 50,000 and 100,000 downloads of the AST software per year in each of the last 10 years.
Many vendors, including Red Hat, SUSE, and Apple, ship at least parts of the AST software.
Solaris 11 uses ksh93 as its /bin/sh and distributes most of the libraries.
Within AT&T, many major projects are based on the AST software collection.
Most notable are two independent database systems that underlie various crucial operations systems for network engineering, monitoring, fraud detection, and data analysis.
AST now includes most standard POSIX utilities and popular tools such as ksh93(1), nmake(1), pax(1), and other commands, all written on top of the AST libraries.
Each command can generate its own documentation, in several formats, thanks to the option parser available as an AST library routine.
This option parser handles both standard POSIX command argument syntax and GNU long-form options.
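The idea of deriving both parsing and documentation from one option description can be sketched as follows. The opt_desc table and the opt_match and opt_usage functions are hypothetical and much simpler than the AST option parser. They handle only the "-v" and "--verbose" forms, without option arguments or bundled flags such as "-vq".

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical option table: one description drives both parsing and
   generated usage text. This is not the AST option parser interface. */
typedef struct {
    char        shortname; /* e.g. 'v' for -v */
    const char *longname;  /* e.g. "verbose" for --verbose */
    const char *help;      /* one-line description for the usage text */
} opt_desc;

/* Return the index of the option matched by arg, or -1.
   Accepts the POSIX "-v" form and the GNU "--verbose" form. */
int opt_match(const opt_desc *tab, int n, const char *arg)
{
    for (int i = 0; i < n; i++) {
        if (arg[0] == '-' && arg[1] == tab[i].shortname && arg[2] == '\0')
            return i;
        if (strncmp(arg, "--", 2) == 0 && strcmp(arg + 2, tab[i].longname) == 0)
            return i;
    }
    return -1;
}

/* Generate documentation from the same table used for parsing. */
void opt_usage(FILE *fp, const char *cmd, const opt_desc *tab, int n)
{
    fprintf(fp, "Usage: %s [options]\n", cmd);
    for (int i = 0; i < n; i++)
        fprintf(fp, "  -%c, --%-10s %s\n",
                tab[i].shortname, tab[i].longname, tab[i].help);
}
```

Because the help text lives next to the option it describes, the generated documentation cannot drift out of date when an option is added or renamed.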
AST also provides many algorithms and data structures not available in any other reusable library.
For example, vcodex(3) is a platform providing various data transformation techniques, including compression, encryption and transcoding methods.
Its Vctable data transform embodies the fastest and best-performing known algorithm for transforming and compressing tabular or relational data.
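One basic step behind many table compressors is to store fixed-width records column by column, because bytes in the same field tend to resemble each other. The sketch below shows only that reversible transposition, using hypothetical function names. It is not the Vctable algorithm itself, whose details are not given here.

```c
#include <stddef.h>

/* Rearrange nrec fixed-width records (row-major) into column-major order:
   all bytes of column 0, then column 1, and so on. Similar bytes end up
   adjacent, which a later compression stage can exploit. */
void table_to_columns(const unsigned char *rows, size_t nrec, size_t width,
                      unsigned char *cols)
{
    for (size_t r = 0; r < nrec; r++)
        for (size_t c = 0; c < width; c++)
            cols[c * nrec + r] = rows[r * width + c];
}

/* Inverse transform: restore the original row-major record layout. */
void columns_to_table(const unsigned char *cols, size_t nrec, size_t width,
                      unsigned char *rows)
{
    for (size_t r = 0; r < nrec; r++)
        for (size_t c = 0; c < width; c++)
            rows[r * width + c] = cols[c * nrec + r];
}
```

For the three records "AB1", "AB2" and "AB3", the column-major form is "AAABBB123". A general-purpose compressor can exploit this layout more easily than the interleaved original.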
In turn, vcodex and other tools are based on libraries such as
sfio(3), for safe and fast I/O operations;
vmalloc(3), for managing memory allocation in virtual regions, including shared and persistent memory;
cdt(3), a comprehensive set of container data types (binary trees, hash tables, linked lists) that handles ordered and unordered sets and multisets through a unified, simple API; and
aso(3), a collection of fast, portable synchronization primitives for controlling data and process concurrency.
The AT&T AST OpenSource software collection has grown to about a million lines of C code.
Despite this size, most of the software was created and maintained by a small group of researchers and, for the past several years, primarily by three members of AT&T Research.
The collection continues to grow, driven by contributors both within and outside AT&T.
This is possible thanks to a few guiding principles:
- reuse first
When a need for new software features arises, the features are first examined for generality and then, if appropriate, encapsulated in reusable APIs.
For example, new concurrency algorithms in the cdt(3) and vmalloc(3) libraries require atomic scalar operations such as the hardware compare-and-swap instruction and atomic integer addition and subtraction.
These could have been implemented on an as-needed basis in each software component.
However, that would expose the code base to the varied, ad hoc interfaces of different compiler and operating system implementations, producing an inelegant, unmaintainable mess.
Instead, a new library, aso(3), was designed and built to provide a portable API for all required atomic scalar operations.
This allowed new algorithms to be designed and implemented on top of the portable API.
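The benefit of a portable atomic layer is that algorithms are written once against its interface. The sketch below assumes a C11 compiler and defines a tiny hypothetical API, myaso_cas32 and myaso_add32, on top of stdatomic.h. The names are illustrative and are not the real aso(3) interface. The function atomic_max32 then uses only compare-and-swap to raise a shared value without locks.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical portable API (not the aso(3) interface), mapped here
   onto C11 <stdatomic.h>. */
typedef _Atomic uint32_t asoint32;

/* Compare-and-swap: if *p == old, store new. Returns the value seen in *p. */
static inline uint32_t myaso_cas32(asoint32 *p, uint32_t old, uint32_t new)
{
    atomic_compare_exchange_strong(p, &old, new); /* on failure old = *p */
    return old;
}

/* Atomic add; returns the value before the addition. */
static inline uint32_t myaso_add32(asoint32 *p, uint32_t v)
{
    return atomic_fetch_add(p, v);
}

/* An algorithm written only against the portable API:
   atomically raise *p to at least v and return the resulting value. */
uint32_t atomic_max32(asoint32 *p, uint32_t v)
{
    uint32_t cur = atomic_load(p);
    while (cur < v) {
        uint32_t seen = myaso_cas32(p, cur, v);
        if (seen == cur) /* our store succeeded */
            return v;
        cur = seen;      /* another thread changed *p; re-check */
    }
    return cur;
}
```

If the underlying primitive changes, for example on a compiler without C11 atomics, only the two wrapper functions need to change. The atomic_max32 algorithm is unaffected.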
- single source
The entire collection is built from a single source base, even though it is portable across virtually all UNIX environments. These include IBM UNIX System Services on MVS, which uses EBCDIC, and Windows systems via a suitable UNIX layer such as UWIN or Cygwin.
A large part of this achievement is due to the effort to build portable APIs for any significant set of related features.
At some layer, e.g., aso(3), the varied operating system interfaces cannot be avoided.
In such cases, iffe(1) is used to probe the local compilation environment and define abstract symbols that encapsulate the specific features.
In this way the logical data structures and algorithms can be written at a high level, largely free of conditional code.
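In this pattern, a probe writes feature symbols into a generated header, and the code tests those symbols rather than operating system names. In the sketch below, the macro _lib_strlcpy stands in for such a generated symbol; its exact naming is an assumption. The function copy_bounded is a hypothetical helper that uses the native strlcpy() when the probe found it and a portable fallback otherwise.

```c
#include <string.h>

/* A feature probe such as iffe(1) would emit a header that defines a
   symbol like _lib_strlcpy when the local C library has strlcpy().
   The macro name is assumed for illustration. Callers are written once;
   only the selected branch below differs between systems. */
#ifdef _lib_strlcpy
#define copy_bounded(dst, src, size) strlcpy((dst), (src), (size))
#else
/* Portable fallback with strlcpy() semantics: copy at most size-1 bytes,
   always NUL-terminate when size > 0, and return strlen(src). */
static size_t copy_bounded(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);
    if (size > 0) {
        size_t n = len < size - 1 ? len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}
#endif
```

The conditional code is confined to this one small block. Every caller simply uses copy_bounded().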
- extensibility
The software collection is designed for extensibility.
Since the Open Source release in 1999, many new functions, libraries and tools have been added.
Despite such changes, the APIs of the original libraries have remained stable.
This is possible due to the discipline & method library architecture, which is used extensively in our library design.
In this architecture, a discipline consists of callback functions that capture the required aspects of the underlying resource to be managed (e.g., I/O functions, object attributes) and provide exception handling. A method defines a standard set of operations.
For example, a cdt(3) dictionary managing an unordered set of objects can be constructed from a discipline defining object attributes and object comparison, together with the library-provided Dtset method.
Dtset is implemented as a hash table, giving average constant-time insertion, deletion and search.
The parallel Dtoset method provides ordered sets and is based on splay trees, guaranteeing amortized logarithmic access time.
New data structures and algorithms can easily be added as new methods without disturbing the API or application code.
In fact, carefully written applications can use new methods with no code change at all.
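A miniature of the discipline & method architecture can be sketched in plain C. The types and functions below (dict_disc, dict_method, dict_open, ListMethod, SortedMethod, IntDisc) are hypothetical and are not the cdt(3) API. For brevity, the two methods use arrays rather than the hash tables and splay trees behind Dtset and Dtoset. The discipline supplies object comparison and each method supplies the operations. Application code calls only dict_insert and dict_search, so switching methods requires no application change.

```c
#include <stdlib.h>

/* Discipline: how to compare objects (hypothetical, not cdt(3)). */
typedef struct { int (*compare)(const void *a, const void *b); } dict_disc;

typedef struct dict dict;
/* Method: the set of operations a data structure implements. */
typedef struct {
    int   (*insert)(dict *d, void *obj);
    void *(*search)(dict *d, const void *key);
} dict_method;

struct dict { const dict_disc *disc; const dict_method *meth; void **obj; size_t n, cap; };

static int grow(dict *d)
{
    if (d->n < d->cap) return 0;
    size_t cap = d->cap ? 2 * d->cap : 8;
    void **p = realloc(d->obj, cap * sizeof *p);
    if (!p) return -1;
    d->obj = p; d->cap = cap;
    return 0;
}

/* Method 1: unordered array with linear search. */
static int list_insert(dict *d, void *o)
{
    if (grow(d)) return -1;
    d->obj[d->n++] = o;
    return 0;
}
static void *list_search(dict *d, const void *k)
{
    for (size_t i = 0; i < d->n; i++)
        if (d->disc->compare(d->obj[i], k) == 0) return d->obj[i];
    return NULL;
}

/* Method 2: sorted array with binary search. */
static int sorted_insert(dict *d, void *o)
{
    if (grow(d)) return -1;
    size_t i = d->n;
    while (i > 0 && d->disc->compare(d->obj[i - 1], o) > 0) {
        d->obj[i] = d->obj[i - 1]; /* shift larger objects right */
        i--;
    }
    d->obj[i] = o;
    d->n++;
    return 0;
}
static void *sorted_search(dict *d, const void *k)
{
    size_t lo = 0, hi = d->n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = d->disc->compare(d->obj[mid], k);
        if (c == 0) return d->obj[mid];
        if (c < 0) lo = mid + 1; else hi = mid;
    }
    return NULL;
}

const dict_method ListMethod   = { list_insert, list_search };
const dict_method SortedMethod = { sorted_insert, sorted_search };

/* Example discipline for int objects. */
int int_compare(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}
const dict_disc IntDisc = { int_compare };

/* Application-facing API: the same calls whatever the method. */
dict *dict_open(const dict_disc *disc, const dict_method *meth)
{
    dict *d = calloc(1, sizeof *d);
    if (d) { d->disc = disc; d->meth = meth; }
    return d;
}
int   dict_insert(dict *d, void *o)       { return d->meth->insert(d, o); }
void *dict_search(dict *d, const void *k) { return d->meth->search(d, k); }
void  dict_close(dict *d)                 { if (d) { free(d->obj); free(d); } }
```

Opening the same data with &ListMethod or &SortedMethod gives the same search results; only performance differs.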
- run-time extension
The discipline & method library architecture is ideally suited to run-time plugins.
An AST plugin is simply a method implemented in a DLL or shared library.
An API is provided to locate a plugin by name at run time, load it, and then execute it.
Methods and plugins lead to a stable development environment.
Once the main application is coded and tested, it rarely changes.
Each new plugin can then be coded and tested independently of any other plugin and without rebuilding the main application.
The plugin API enables ksh(1) to load plugins as built-in commands at run time, eliminating the need to create a new process to execute those commands.
The pax(1) archiver uses plugins to handle various archive and compression formats.
Plugins can also enable an OpenSource command framework that transparently restricts access to proprietary software components while exposing others.
For example, certain compression transforms in vcodex(3) are still proprietary to AT&T and cannot be distributed in the Open Source collection. These transforms are implemented as plugins accessible to the vcodex(3) compressor vczip(1).
In this way the compressor itself remains single source for both internal and external use.
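Loading a method from a shared library at run time can be sketched with the POSIX dlopen() and dlsym() calls. The plugin_method table, the exported symbol name plugin_lib, and the plugin_load function are hypothetical conventions for this example. The sketch also takes a file path directly, whereas the AST API locates a plugin by name. On some older systems the program must be linked with -ldl.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical convention: every plugin exports plugin_lib(), which
   returns a pointer to the method table the plugin implements. */
typedef struct {
    const char *name;                  /* method name */
    int (*run)(int argc, char **argv); /* the method's entry point */
} plugin_method;

typedef const plugin_method *(*plugin_entry)(void);

/* Load the shared library at path and return its method table, or NULL. */
const plugin_method *plugin_load(const char *path)
{
    void *dll = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (!dll) {
        fprintf(stderr, "plugin_load: %s\n", dlerror());
        return NULL;
    }
    dlerror(); /* clear any stale error before calling dlsym() */
    /* POSIX allows converting the dlsym() result to a function pointer. */
    plugin_entry entry = (plugin_entry)dlsym(dll, "plugin_lib");
    const char *err = dlerror();
    if (err || !entry) {
        fprintf(stderr, "plugin_load: %s\n", err ? err : "plugin_lib is NULL");
        dlclose(dll);
        return NULL;
    }
    /* The library stays loaded because the caller will use its code. */
    return entry();
}
```

A caller that receives a non-NULL table can invoke its run function directly. New plugins can be added without rebuilding the main program, as long as each one follows the same convention.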
Some of the components available in the AST software collection are:
- POSIX commands
Most of the standard POSIX commands are available in the AST collection.
Many are coded as library functions that can be added to ksh as built-in commands, which dramatically improves performance.
- libast
This library is the porting base that includes
headers providing common data types unified across platforms.
Similarly, functions are provided either to fill in standard
ones missing from the local OS implementation
or to replace existing ones that are incorrect or inefficient.
- sfio
This I/O library provides a robust interface that implements efficient buffering and data formatting algorithms.
It can be used transparently in place of the standard I/O library, stdio.
- vmalloc
This memory allocation library manages regions created from the heap or from shared or mapped memory.
A recent extension supports concurrent access by threads and processes.
A backward-compatible malloc(3) interface is provided.
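The region idea can be illustrated with a minimal arena allocator. Objects are carved out of large blocks, and the whole region is released at once. This is a single-threaded sketch with hypothetical names. It is not the vmalloc(3) interface, and it supports neither freeing individual objects nor shared or mapped memory.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct region_block {
    struct region_block *next;
    size_t used;        /* bytes already handed out from data */
    size_t size;        /* capacity of data in bytes */
    max_align_t data[]; /* storage, aligned for any object type */
} region_block;

typedef struct { region_block *head; } region;

#define REGION_BLOCK 4096 /* default block size in bytes */

/* Round n up to a multiple of sizeof(max_align_t) to keep alignment. */
static size_t round_up(size_t n)
{
    size_t a = sizeof(max_align_t);
    return (n + a - 1) / a * a;
}

/* Allocate n bytes from region r, adding a new block when needed. */
void *region_alloc(region *r, size_t n)
{
    n = round_up(n ? n : 1);
    region_block *b = r->head;
    if (!b || b->size - b->used < n) {
        size_t size = n > REGION_BLOCK ? n : REGION_BLOCK;
        b = malloc(sizeof *b + size);
        if (!b) return NULL;
        b->next = r->head;
        b->used = 0;
        b->size = size;
        r->head = b;
    }
    void *p = (char *)b->data + b->used;
    b->used += n;
    return p;
}

/* Release every object in the region at once. */
void region_close(region *r)
{
    region_block *b = r->head;
    while (b) {
        region_block *next = b->next;
        free(b);
        b = next;
    }
    r->head = NULL;
}
```

Releasing a region costs one free() per block rather than one per object. This is the main appeal of region-based allocation for data with a common lifetime.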
- cdt
This container data type library provides general operations for ordered and unordered sets and multisets, lists, stacks and queues.
Efficient data structures such as hash tables and splay trees underlie the abstract containers.
A recent extension adds lockless hash tables suitable for applications using concurrent threads or processes.
- vcodex
This software platform provides a comprehensive set of data transforms for compression, encryption, checksumming and data transcoding.
The vczip(1) command applies common compositions of transforms for both encoding and decoding.
The transform equivalent to the popular bzip2 compressor is faster than bzip2 and achieves a better compression rate.
Other transforms compress various forms of relational and network data (e.g., Cisco NetFlow) better than all known general-purpose compressors.
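bzip2-style compression chains several transforms, including the Burrows-Wheeler transform, move-to-front coding, and Huffman entropy coding. As an example of one such reversible stage, here is a move-to-front encoder and decoder. It is a generic sketch with hypothetical function names, not the vcodex(3) transform or its API.

```c
#include <stddef.h>

/* Move-to-front encoding: replace each byte by its position in a recency
   list, then move that byte to the front. Recently seen bytes encode as
   small numbers, and runs of one byte become runs of zeros. */
void mtf_encode(const unsigned char *in, size_t n, unsigned char *out)
{
    unsigned char list[256];
    for (int i = 0; i < 256; i++) list[i] = (unsigned char)i;
    for (size_t k = 0; k < n; k++) {
        unsigned char c = in[k];
        int pos = 0;
        while (list[pos] != c) pos++;
        out[k] = (unsigned char)pos;
        for (; pos > 0; pos--) list[pos] = list[pos - 1]; /* shift right */
        list[0] = c;
    }
}

/* Decoding replays the same list updates, driven by the positions. */
void mtf_decode(const unsigned char *in, size_t n, unsigned char *out)
{
    unsigned char list[256];
    for (int i = 0; i < 256; i++) list[i] = (unsigned char)i;
    for (size_t k = 0; k < n; k++) {
        int pos = in[k];
        unsigned char c = list[pos];
        out[k] = c;
        for (; pos > 0; pos--) list[pos] = list[pos - 1];
        list[0] = c;
    }
}
```

For example, "aaabbb" encodes as 97, 0, 0, 98, 0, 0. The runs of zeros are what the following entropy-coding stage compresses well.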
- retrie/LPM
This software platform provides the fastest known algorithm for longest prefix matching of integers viewed as sequences of digits in some base.
Popular applications include matching prefixes of IPv4 addresses or telephone numbers.
Larger IPv6 addresses are handled by a different algorithm in the AST iv library.
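To make the problem concrete, longest prefix matching over IPv4 addresses can be solved with a plain binary trie. The lookup walks the address one bit at a time and remembers the deepest stored prefix. This sketch only defines what the lookup computes. It is not the retrie algorithm, and a one-bit-at-a-time trie is not a fast LPM method. The lpm_* names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct lpm_node {
    struct lpm_node *child[2];
    int has_value;
    int value; /* e.g. a next-hop or route identifier */
} lpm_node;

lpm_node *lpm_new(void) { return calloc(1, sizeof(lpm_node)); }

/* Insert prefix/len (len in 0..32) with a value. Returns 0 on success. */
int lpm_insert(lpm_node *root, uint32_t prefix, int len, int value)
{
    lpm_node *n = root;
    for (int i = 0; i < len; i++) {
        int bit = (prefix >> (31 - i)) & 1;
        if (!n->child[bit] && !(n->child[bit] = lpm_new()))
            return -1;
        n = n->child[bit];
    }
    n->has_value = 1;
    n->value = value;
    return 0;
}

/* Follow the address bits, remembering the deepest stored prefix.
   Returns 1 and sets *value if some prefix matches, otherwise 0. */
int lpm_lookup(const lpm_node *root, uint32_t addr, int *value)
{
    const lpm_node *n = root;
    int found = 0;
    for (int i = 0; n; i++) {
        if (n->has_value) { *value = n->value; found = 1; }
        if (i == 32) break;
        n = n->child[(addr >> (31 - i)) & 1];
    }
    return found;
}

void lpm_free(lpm_node *n)
{
    if (!n) return;
    lpm_free(n->child[0]);
    lpm_free(n->child[1]);
    free(n);
}
```

With 10.0.0.0/8 and 10.1.0.0/16 stored, 10.1.2.3 matches the /16 entry, 10.2.3.4 falls back to the /8 entry, and 11.0.0.1 matches nothing.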
- dss
This software platform provides a command and a library repository of techniques to perform data stream scanning.
The framework supports describing, transforming, reading, querying, and writing streams of record-oriented data.
The API is extended by plugins that define data-domain-specific I/O, type and query functions.
dss(1) is used in various AT&T projects to monitor and analyze network traces and logs.
- aso
This library provides a portable common interface to popular atomic scalar operations.
If a compilation or operating system environment provides the proper primitives, the library
simply maps the standardized API to those. Otherwise it implements the API based on standard
system calls such as semaphores or other locking mechanisms.
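The mapping-or-fallback strategy can be sketched with conditional compilation. When the compiler provides C11 atomics (__STDC_NO_ATOMICS__ undefined), the API maps directly to them. Otherwise, the same semantics are enforced with a POSIX mutex. The names pa_u32, pa_cas32 and pa_get32 are hypothetical and are not the aso(3) interface. A single global lock is the simplest fallback but also the least scalable.

```c
#include <stdint.h>

#ifndef __STDC_NO_ATOMICS__
#include <stdatomic.h>
typedef _Atomic uint32_t pa_u32;

/* Native path: map directly onto the compiler's atomic primitive. */
uint32_t pa_cas32(pa_u32 *p, uint32_t old, uint32_t new)
{
    atomic_compare_exchange_strong(p, &old, new); /* on failure old = *p */
    return old;                                   /* value observed in *p */
}
uint32_t pa_get32(pa_u32 *p) { return atomic_load(p); }

#else
#include <pthread.h>
typedef uint32_t pa_u32;
static pthread_mutex_t pa_lock = PTHREAD_MUTEX_INITIALIZER;

/* Fallback path: identical semantics, enforced by one global lock. */
uint32_t pa_cas32(pa_u32 *p, uint32_t old, uint32_t new)
{
    pthread_mutex_lock(&pa_lock);
    uint32_t seen = *p;
    if (seen == old)
        *p = new;
    pthread_mutex_unlock(&pa_lock);
    return seen;
}
uint32_t pa_get32(pa_u32 *p)
{
    pthread_mutex_lock(&pa_lock);
    uint32_t v = *p;
    pthread_mutex_unlock(&pa_lock);
    return v;
}
#endif
```

Callers see the same pa_cas32 contract in both cases: it returns the value observed in *p and stores the new value only when that observed value equals old.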
The AT&T AST/UWIN OpenSource downloads site also hosts a Git repository and active mailing lists for AST and UWIN users and developers.