pop - operate on partioned fixed row and column data


pop [ options ] [ file ]


pop operates on partitioned fixed row and column data files. It can cut high or low frequency partition columns, list format field names for partition columns, and list the partition column frequencies. See pzip(1) for a detailed description of file partitions and column frequencies.


-c, --cut

Copy selected columns from the input rows to the standard output.
-e, --endiff

Copy the row-by-row difference to the standard output.
-f, --format=file

Specifies the data format (schema) file. Two input styles are accepted. The first style lists field names and sizes in consecutive order: `name,size[,comment...]'. The second style lists the field offset range and name: `begin[-end] name'. Column offsets start at 0. Names are used to label partition group listings on the standard output, with partition groups separated by an empty line.
-h, --high

List information on high frequency columns only. This is the default.
-i, --information

List the selected column frequency information on the standard output.
-l, --low

List information on low frequency columns only.
-m, --map

List the partition file with the row size equal to the number of high frequency columns and the high frequency columns renumbered in order from 0. This partition file can then be used on high frequency data produced by the --cut option.
-n, --newline

Append a newline to each cut output row.
-o, --override=name=value

Override the column partition. Currently only fixed value columns may be specified. The syntax is begin[-end]='value' where begin is the beginning column offset (starting at 0), end is the ending column offset for an inclusive range, and value is the fixed column value. Uncompress time is improved when high frequency columns are given fixed values (see the --partition option).
-p, --partition=file

Specifies the data row size and the high frequency column partition groups and permutation. The partition file is a sequence of lines. Comments start with # and continue to the end of the line. The first non-comment line specifies the optional name string in "...". The next non-comment line specifies the row size. The remaining lines operate on column offset ranges of the form: begin[-end] where begin is the beginning column offset (starting at 0), and end is the ending column offset for an inclusive range. The operators are:
range [...]

places all columns in the specified range list in the same high frequency partition group. Each high frequency partition group is processed as a separate block by the underlying compressor (gzip(1) by default).

specifies that each column in range has the fixed character value value . C-style character escapes are valid for value.
-r, --row=row-size

Specifies the input row size (number of byte columns). Exactly one of --row or --partition must be specified.
-u, --undiff

The inverse of the --endiff difference encoding.
-v, --verbose

List header information on the input pzip file or partition-file and continue processing.
-x, --identify

Identify output information columns with labels from the --format file.
-Q, --regress

Generate output for regression testing, such that identical invocations with identical input files will generate the same output.
-T, --test=test-mask

Enable implementation-specific tests and tracing.
-X, --prefix=count[*terminator]

Uncompressed data contains a prefix that is defined by count and an optional terminator. This data is not pzip compressed. terminator may be one of:

count bytes.
count newline terminated records.

count char terminated records.


gzip(1), pin(1), pzip(1), pzip(3)



pop (AT&T Research) 2003-04-05

Glenn Fowler <glenn.s.fowler@gmail.com>

Copyright © 1998-2012 AT&T Intellectual Property