Daytona For Flat File Processing

Daytona's Cymbal query language contains a number of constructs that facilitate processing flat files quickly and easily. A flat file in this context is one that consists of a sequence of newline-terminated records, each structured as a sequence of fields separated by some delimiter character. awk and Perl are designed to process such files easily.
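To make the flat file definition concrete, here is a minimal sketch in Python (not Cymbal) of reading such a file: each newline-terminated record is split into fields on a delimiter. The `|` delimiter, the `read_records` helper, and the sample data are assumptions made for illustration; they do not come from Daytona.

```python
import io

def read_records(stream, delimiter="|"):
    """Yield each newline-terminated record as a list of field strings."""
    for line in stream:
        # Strip only the record terminator so empty trailing fields survive.
        yield line.rstrip("\n").split(delimiter)

# Hypothetical sample data with three fields: source|destination|bytes
sample = io.StringIO("nyc|chi|1200\nchi|sfo|800\nnyc|chi|300\n")
records = list(read_records(sample))
```

In practice the stream would be an open file rather than an in-memory string.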

Note that when processing flat files, Cymbal is being used as a programming language, not as a conventional query language. Consequently, there is no attendant overhead for defining record classes (i.e., tables) and associated indices in an application archive. In other words, the Cymbal flat file query is sufficient in and of itself for processing the data; no other preparatory steps such as building indices are necessary. Of course, not having indices constrains the query to sequential processing, but that may be just what the doctor ordered.
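The idea of index-free, sequential processing can be sketched as a single pass that reads every record once and accumulates results in memory. The example below is a Python analogy, not Cymbal, and the record layout (source, destination, byte count separated by `|`) and the `total_bytes_by_pair` helper are hypothetical.

```python
import io
from collections import defaultdict

def total_bytes_by_pair(stream, delimiter="|"):
    """Sum the byte counts per (source, destination) pair in one pass."""
    totals = defaultdict(int)
    for line in stream:
        # Assumed layout: source|destination|bytes
        src, dst, nbytes = line.rstrip("\n").split(delimiter)
        totals[(src, dst)] += int(nbytes)
    return dict(totals)

# Hypothetical sample data; a real run would read from an open file.
sample = io.StringIO("nyc|chi|1200\nchi|sfo|800\nnyc|chi|300\n")
totals = total_bytes_by_pair(sample)
```

No table definition or index is built beforehand; the script and the file are all that is needed, at the cost of reading the whole file from start to finish.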

Speed is Daytona's primary advantage over awk and Perl for flat file processing. For example, for the Network Traffic Matrix query described below, Daytona is more than 8 times faster than the best of awk, gawk, nawk, awkcc, and Perl. This makes Daytona particularly well suited to working with large datasets, where it could make the difference between getting a job done in an afternoon and coming back for the answers at the same time the next day. The primary reason for Daytona's superior speed is that Daytona compiles queries into high-quality executables, whereas the others, with the exception of awkcc, are interpreters.

In terms of language features, Cymbal is far more powerful than awk. And in the context of this kind of processing, Cymbal is comparable in power to Perl, as will be seen.

Examples
  1. Computing A Network Traffic Matrix
Comparative Anatomy

Cymbal language constructs for specific tasks.

Three Paradigms For Using Daytona To Work With Network Data (PDF slides)
