Overview

AT&T AST (AT&T Software Technology) OpenSource software is a collection of libraries and commands for UNIX and Windows. Included are re-implementations of many POSIX and X/Open APIs and utilities. It provides a portable and efficient environment that behaves consistently across a range of operating system and hardware implementations. It is portable because one collection of source builds unattended on all target architectures, and efficient because the underlying algorithms are continually updated to match best-in-class performance.

Some of the more popular components include cdt, dss, ksh93, nmake, pax, sfio, vcodex, and vmalloc. AST has been used internally at AT&T since the mid-1980s and was released as OpenSource in 1999. Documentation, source and binaries are currently available under the OpenSource Eclipse Public License 1.0 at AT&T AST/UWIN OpenSource downloads.


Background

The AT&T AST OpenSource software collection started in the mid-1980s. Much of the early work was described in the book Practical Reusable UNIX Software. The software collection was made available as Open Source in 1999. A paper presented at Usenix 2000 gave the reasons for doing so as follows:
  • to increase influence on national and international standardization efforts,
  • to create and support alternatives to closed systems,
  • to attract vendors to distribute and support the software,
  • to improve software quality due to widespread use, and
  • to increase visibility of AT&T Research in the research community.
These reasons continue to hold true today. AST has been well received by software communities both inside and outside AT&T. It has influenced a number of important national and international standardization efforts. The KornShell ksh(1), AST functions fts(3), ftwalk(3) and regex(3), the shell programming language ksh93(1), and the archival tool pax(1) have contributed to the POSIX 1003.1 and 1003.2 standards. The VCDIFF Generic Differencing and Compression Data Format, part of vcodex(3), was approved as RFC 3284 by the IETF (Internet Engineering Task Force). The Fowler-Noll-Vo hash function is used in a variety of well-known applications, including Twitter, and is being advanced as another IETF standard for hashing small strings.
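As an illustration of the Fowler-Noll-Vo (FNV) hash mentioned above, the following is a minimal sketch of the widely published 32-bit FNV-1a variant. FNV exists in several variants and widths, and this text does not say which one the AST libraries use, so the choice of FNV-1a and the function name fnv1a_32 are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * 32-bit FNV-1a: for each input byte, XOR it into the hash and then
 * multiply by the FNV prime. The function name is hypothetical; it is
 * not taken from the AST source.
 */
uint32_t fnv1a_32(const void *data, size_t len)
{
    assert(data != NULL || len == 0);

    const unsigned char *p = data;
    uint32_t h = 2166136261u;          /* 32-bit FNV offset basis */

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];                     /* mix in the next byte */
        h *= 16777619u;                /* 32-bit FNV prime; wraps modulo 2^32 */
    }
    return h;
}
```

Hashing an empty input returns the offset basis unchanged, which is a quick sanity check for any implementation. The per-byte cost is one XOR and one multiply, which is why the function suits short keys.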

There have been between 50,000 and 100,000 downloads of the AST software per year in each of the last 10 years. Many vendors, including Red Hat, SUSE, and Apple, ship at least parts of the AST software. Solaris 11 uses ksh93 as its /bin/sh and distributes most of the libraries. Within AT&T, many major projects are based on the AST software collection. Most notable are two independent database systems that underlie various crucial operations systems for network engineering, monitoring, fraud detection, and data analysis.

AST now includes most standard POSIX utilities and popular tools such as ksh93(1), nmake(1), pax(1), and other commands, all written on top of the AST libraries. Each command can generate its own documentation (in several formats) thanks to the option parser available as an AST library routine. This option parser handles both standard POSIX command argument syntax and GNU long-form options.

AST also provides many algorithms and data structures not available in any other reusable libraries. For example, vcodex(3) is a platform providing various data transformation techniques, including compression, encryption and transcoding methods. Its Vctable data transform embodies the fastest and best-known algorithm for transforming and compressing tabular or relational data. In turn, vcodex and other tools are based on libraries such as sfio(3), for safe and fast I/O operations; vmalloc(3), for managing memory allocation in virtual regions, including shared and persistent memory; cdt(3), a comprehensive set of container data types (binary trees, hash tables, linked lists) that handles ordered and unordered sets and multisets through a simple, unified API; and aso(3), a collection of fast, portable synchronization primitives for controlling data and process concurrency.


Guiding Principles

The AT&T AST OpenSource software collection has grown to about a million lines of C code. Despite this large size, most of the software was created and maintained by a small group of researchers, and, for the past several years, primarily by three members of AT&T Research. The collection continues to grow, driven from both within and outside AT&T. This is possible thanks to a few guiding principles:
reuse first

When a need for a new software feature arises, the feature is always examined first for generality and then, if appropriate, encapsulated in a reusable API. For example, new concurrency algorithms in the cdt(3) and vmalloc(3) libraries require atomic scalar operations such as the hardware compare-and-swap instruction and atomic integer addition and subtraction. This could have been done on a per-need basis in the different software components. However, that would expose the code base to the varied and ad hoc interfaces across different compiler and operating system implementations, producing an inelegant, unmaintainable mess. Instead, a new library, aso(3), was designed and built to provide a portable API for all required atomic scalar operations. This allowed new algorithms to be designed and implemented on top of the portable API.
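To make the idea concrete, here is a minimal sketch of how a small portable layer of atomic scalar operations lets a lock-free algorithm be written once. It uses the standard C11 <stdatomic.h> facilities rather than aso(3) itself, and the names demo_cas32, demo_add32 and demo_store_max32 are hypothetical; they are not the aso(3) API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compare-and-swap: if *p equals `expected`, store `desired`.
 * Returns the value that was in *p before the call in either case.
 */
uint32_t demo_cas32(_Atomic uint32_t *p, uint32_t expected, uint32_t desired)
{
    assert(p != NULL);
    /* On failure the library writes the current value into `expected`;
     * on success `expected` already equals the old value. */
    atomic_compare_exchange_strong(p, &expected, desired);
    return expected;
}

/* Atomic addition; returns the value held before the addition. */
uint32_t demo_add32(_Atomic uint32_t *p, uint32_t delta)
{
    assert(p != NULL);
    return atomic_fetch_add(p, delta);
}

/*
 * An algorithm built only on the two primitives' semantics: raise *p
 * to `v` if `v` is larger, without taking a lock. The loop retries
 * whenever another thread changes *p between the load and the swap.
 */
void demo_store_max32(_Atomic uint32_t *p, uint32_t v)
{
    assert(p != NULL);
    uint32_t cur = atomic_load(p);
    while (cur < v && !atomic_compare_exchange_weak(p, &cur, v))
        ;   /* `cur` was refreshed by the failed exchange; try again */
}
```

Callers such as demo_store_max32 never mention a compiler intrinsic or an operating system call, so only the two small wrappers would need attention when porting to a new platform.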
single source

The entire collection is built from a single source base despite the fact that it is portable across virtually all UNIX environments, including IBM UNIX System Services on MVS (which uses EBCDIC) and Windows systems via a suitable UNIX layer such as UWIN or Cygwin. A large part of this achievement is due to the effort to build portable APIs for any significant set of related features. At some layers, e.g., aso(3), the varied operating system interfaces cannot be avoided. In such cases, iffe(1) is used to probe local compilation environments and define abstract symbols that encapsulate the specific features. In this way the logical data structures and algorithms can be written at a high level, largely devoid of conditional code.
extensibility

The software collection is designed for extensibility. Since the Open Source release in 1999, many new functions, libraries and tools have been added. Despite such changes, the APIs of the original libraries have remained stable. This is possible due to the discipline & method library architecture, used extensively in our library design. In this architecture, a discipline consists of callback functions that capture required aspects of the underlying resource to be managed (e.g., I/O functions, object attributes, etc.) and provide for exception handling, while a method defines a standard set of operations. For example, a cdt(3) dictionary managing an unordered set of objects can be constructed from a discipline defining object attributes and object comparison along with the library-provided Dtset method. Underlying Dtset is a hash table for average constant-time insertion, deletion and search operations. A parallel Dtoset provides for ordered sets and is based on splay trees to guarantee amortized logarithmic access time. New data structures and algorithms can easily be added as new methods without any disturbance to the API or application code. In fact, carefully written applications can make use of new methods with no code change whatsoever.
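The separation between discipline and method can be sketched in a few dozen lines of plain C. This is a toy model written for this overview, not the cdt(3) API: the types Disc, Method and Dict, the methods HashSet and OrderedSet, and the dict_* functions are all hypothetical names, and the ordered method uses a sorted list as a stand-in where cdt(3) is described as using splay trees.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Discipline: callbacks that describe the objects being stored. */
typedef struct {
    int      (*compare)(const void *a, const void *b);
    unsigned (*hash)(const void *obj);
} Disc;

typedef struct Node { void *obj; struct Node *next; } Node;
typedef struct Dict Dict;

/* Method: the fixed set of operations each container type implements. */
typedef struct {
    void  (*insert)(Dict *d, void *obj);
    void *(*search)(Dict *d, const void *obj);
} Method;

#define NBUCKET 64
struct Dict { const Disc *disc; const Method *meth; Node *bucket[NBUCKET]; };

static Node *node_new(void *obj, Node *next)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL) abort();
    n->obj = obj;
    n->next = next;
    return n;
}

/* Unordered method: chained hash table. */
static void *hash_search(Dict *d, const void *obj)
{
    for (Node *n = d->bucket[d->disc->hash(obj) % NBUCKET]; n; n = n->next)
        if (d->disc->compare(n->obj, obj) == 0)
            return n->obj;
    return NULL;
}
static void hash_insert(Dict *d, void *obj)
{
    if (hash_search(d, obj) != NULL) return;     /* set: no duplicates */
    Node **head = &d->bucket[d->disc->hash(obj) % NBUCKET];
    *head = node_new(obj, *head);
}

/* Ordered method: one sorted list kept in bucket[0]. */
static void *list_search(Dict *d, const void *obj)
{
    for (Node *n = d->bucket[0]; n; n = n->next) {
        int c = d->disc->compare(n->obj, obj);
        if (c == 0) return n->obj;
        if (c > 0) break;                        /* passed its position */
    }
    return NULL;
}
static void list_insert(Dict *d, void *obj)
{
    Node **pp = &d->bucket[0];
    while (*pp && d->disc->compare((*pp)->obj, obj) < 0)
        pp = &(*pp)->next;
    if (*pp && d->disc->compare((*pp)->obj, obj) == 0) return;
    *pp = node_new(obj, *pp);
}

const Method HashSet    = { hash_insert, hash_search };
const Method OrderedSet = { list_insert, list_search };

/* Application-facing API: identical whichever method is chosen. */
Dict *dict_open(const Disc *disc, const Method *meth)
{
    assert(disc != NULL && meth != NULL);
    Dict *d = calloc(1, sizeof *d);
    if (d == NULL) abort();
    d->disc = disc;
    d->meth = meth;
    return d;
}
void  dict_insert(Dict *d, void *obj)       { d->meth->insert(d, obj); }
void *dict_search(Dict *d, const void *obj) { return d->meth->search(d, obj); }
void  dict_close(Dict *d)
{
    for (int i = 0; i < NBUCKET; i++)
        for (Node *n = d->bucket[i], *next; n; n = next) {
            next = n->next;
            free(n);
        }
    free(d);
}

/* An example discipline for NUL-terminated strings. */
int str_compare(const void *a, const void *b) { return strcmp(a, b); }
unsigned str_hash(const void *obj)
{
    unsigned h = 0;
    for (const unsigned char *p = obj; *p; p++) h = h * 31 + *p;
    return h;
}
const Disc StrDisc = { str_compare, str_hash };
```

An application that calls dict_open(&StrDisc, &HashSet) can switch to &OrderedSet by changing that one argument; every later dict_insert and dict_search call stays the same, which is the property the paragraph above attributes to the architecture.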
run-time extension

The discipline & method library architecture is ideally suited to run-time plugins. An AST plugin is simply a method implemented in a DLL or shared library. An API is provided to locate a plugin by name at run-time, load it, and then execute it. Methods and plugins lead to a stable development environment. Once the main application is coded and tested, it rarely changes. Each new plugin can then be coded and tested independently of any other plugin and without re-building the main application. The plugin API enables ksh(1) to load plugins as built-in commands at run-time, eliminating the need to create a new process to execute those commands. The pax(1) archiver uses plugins to handle various archive and compression formats. Plugins can also be used to enable an OpenSource command framework that transparently restricts access to proprietary software components while allowing other components to be exposed. For example, certain compression transforms in vcodex(3) are still proprietary to AT&T, so they cannot be distributed in the Open Source collection; these are implemented as plugins accessible to the vcodex(3) compressor vczip(1). In this way the compressor itself remains single source for both internal and external use.
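The locate, load and execute sequence can be sketched with the POSIX dlopen(3) and dlsym(3) calls. This is not the AST plugin API, which is not shown in this overview; the function plugin_call and the entry-point type plugin_main_fn are hypothetical, and the sketch takes a full library path instead of searching for a plugin by name.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical entry-point signature, modeled on a command's main(). */
typedef int (*plugin_main_fn)(int argc, char **argv);

/*
 * Load the shared library at `path`, look up `symbol`, and call it.
 * Returns the plugin's own result, or -1 if loading or lookup fails
 * (a plugin that itself returns -1 is indistinguishable here).
 */
int plugin_call(const char *path, const char *symbol, int argc, char **argv)
{
    assert(path != NULL && symbol != NULL);

    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "plugin load failed: %s\n", dlerror());
        return -1;
    }

    /* POSIX allows the void * returned by dlsym to be converted
     * to a function pointer. */
    plugin_main_fn fn = (plugin_main_fn)dlsym(handle, symbol);
    if (fn == NULL) {
        fprintf(stderr, "plugin lookup failed: %s\n", dlerror());
        dlclose(handle);
        return -1;
    }

    int rc = fn(argc, argv);   /* runs inside the calling process */
    dlclose(handle);
    return rc;
}
```

Because the plugin runs in the caller's address space, no new process is created, which is the saving described above for ksh(1) built-ins. On some systems the program must be linked with -ldl.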


Components

Some of the components available in the AST software collection are:
POSIX commands

Most of the standard POSIX commands are available in the AST collection. Many are coded as library functions that can be added to ksh as built-in commands, which dramatically improves performance.
libast

This library is the porting base that includes headers providing common data types unified across platforms. Similarly, functions are provided either to fill in standard ones missing from the local OS implementation or to replace existing ones that are incorrect or inefficient.
sfio

This I/O library provides a robust interface that implements efficient buffering and data formatting algorithms. It can be used transparently in place of the standard I/O library, stdio.
vmalloc

This memory allocation library manages regions created from the heap or from shared or mapped memory. A recent extension supports concurrent access by threads and processes. A backward-compatible malloc(3) interface is provided.
cdt

This container data type library provides general operations for ordered/unordered sets/multisets, lists, stacks and queues. Efficient data structures such as hash tables and splay trees underlie the abstract containers. A recent extension includes less-lock hash tables suitable for applications using concurrent threads or processes.
vcodex

This software platform provides a comprehensive set of data transforms for compression, encryption, checksumming and data transcoding. The vczip(1) command applies common compositions of transforms for both encoding and decoding. The transform that is equivalent to the popular compressor bzip2 is faster and has a better compression rate. Other transforms compress various forms of relational and network data (e.g., Cisco netflow) better than all known general purpose compressors.
retrie/LPM

This software platform provides the fastest known algorithm for Longest Prefix Matching of integers, where each integer is treated as a string of digits in some base. Popular applications include matching prefixes of IPv4 addresses or telephone numbers. Larger IPv6 addresses are handled by a different algorithm in the AST iv library.
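For readers unfamiliar with the problem, the following sketch shows longest prefix matching on IPv4 addresses using a plain binary trie, with one level per address bit. It is a textbook structure chosen for clarity, not the retrie algorithm, and the names TrieNode, trie_new, trie_insert and trie_lookup are hypothetical. Freeing the trie is omitted to keep the sketch short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct TrieNode {
    struct TrieNode *child[2];  /* next node for a 0 bit or a 1 bit */
    int has_value;              /* nonzero if a prefix ends here */
    int value;                  /* data attached to that prefix */
} TrieNode;

TrieNode *trie_new(void)
{
    TrieNode *n = calloc(1, sizeof *n);
    if (n == NULL) abort();
    return n;
}

/* Register the prefix made of the top `len` bits of `addr`. */
void trie_insert(TrieNode *root, uint32_t addr, int len, int value)
{
    assert(root != NULL && len >= 0 && len <= 32);

    TrieNode *n = root;
    for (int i = 0; i < len; i++) {
        int bit = (addr >> (31 - i)) & 1;   /* most significant bit first */
        if (n->child[bit] == NULL)
            n->child[bit] = trie_new();
        n = n->child[bit];
    }
    n->has_value = 1;
    n->value = value;
}

/*
 * Walk the address bits from the top, remembering the last node that
 * carried a value. Returns that value, or `fallback` if no registered
 * prefix matches.
 */
int trie_lookup(const TrieNode *root, uint32_t addr, int fallback)
{
    int best = fallback;
    const TrieNode *n = root;

    for (int i = 0; n != NULL; i++) {
        if (n->has_value)
            best = n->value;                /* a longer match than before */
        if (i == 32)
            break;                          /* all 32 bits consumed */
        n = n->child[(addr >> (31 - i)) & 1];
    }
    return best;
}
```

With 10.0.0.0/8 and 10.1.0.0/16 both registered, a lookup of 10.1.2.3 returns the value for the /16 entry, since it is the longer of the two matching prefixes. The trie needs up to 32 steps per IPv4 lookup; faster algorithms reduce that number of steps.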
dss

This software platform provides a command and a library repository of techniques to perform data stream scanning. The framework supports describing, transforming, reading, querying, and writing streams of record oriented data. The API is extended by plugins that define data domain specific I/O, type and query functions. dss(1) is used in various AT&T projects to monitor and analyze network traces and logs.
aso

This library provides a portable common interface to popular atomic scalar operations. If a compilation or operating system environment provides the proper primitives, the library simply maps the standardized API to those. Otherwise it implements the API based on standard system calls such as semaphores or other locking mechanisms.
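The map-or-emulate design can be sketched as one function with two interchangeable bodies selected at compile time. The sketch assumes standard C11 feature macros as the probe and a POSIX mutex as the fallback lock; the names port_u32 and port_fetch_add32 are hypothetical and are not the aso(3) API, and the AST build is described above as using iffe(1) rather than these macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
    !defined(__STDC_NO_ATOMICS__)

/* Native path: the compiler provides atomics, so map directly to them. */
#include <stdatomic.h>
typedef _Atomic uint32_t port_u32;

uint32_t port_fetch_add32(port_u32 *p, uint32_t delta)
{
    assert(p != NULL);
    return atomic_fetch_add(p, delta);
}

#else

/* Fallback path: emulate the operation with a lock. This single mutex
 * only protects threads within one process. */
#include <pthread.h>
typedef uint32_t port_u32;
static pthread_mutex_t port_lock = PTHREAD_MUTEX_INITIALIZER;

uint32_t port_fetch_add32(port_u32 *p, uint32_t delta)
{
    assert(p != NULL);
    pthread_mutex_lock(&port_lock);
    uint32_t old = *p;
    *p = old + delta;
    pthread_mutex_unlock(&port_lock);
    return old;
}

#endif
```

Both bodies return the value held before the addition, so code written against port_fetch_add32 behaves the same on either path and contains no conditional compilation of its own.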

Documentation, source and binaries are currently available under the OpenSource Eclipse Public License 1.0 at AT&T AST/UWIN OpenSource downloads. The site includes a Git repository and active mail groups for AST and UWIN users and developers.


August 17, 2012