file tests and attempts to classify each
file argument. Non-regular files are classified by their
stat(2) types. Empty and non-readable regular files are classified as such.
Otherwise a data block is read from
file and this is used to match against the
magic file(s) (see
MAGIC FILE
below). Files with fewer than 1024 bytes of data are labelled
small to note that the sample may be too small for an accurate
classification. Failing a content match, the file name extension may be used to classify the file. As a last resort, statistical sampling is
done for a small range of languages and applications. Failed matches usually result in the less informative
ascii text or
binary
data.
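
The steps that precede magic matching can be sketched as follows. This is a minimal illustration, not the tool's implementation: the function name `preclassify`, the returned labels, the use of lstat rather than stat, and the choice to return only "small" for short files are all assumptions. Only the 1024-byte threshold and the order of the checks come from the text above.

```python
import os
import stat

SMALL_LIMIT = 1024  # threshold named in the text


def preclassify(path):
    """Return a coarse label for the checks made before magic matching.

    The label strings are hypothetical; the real tool's wording may differ.
    """
    st = os.lstat(path)  # assumption: symbolic links are not followed
    mode = st.st_mode
    if not stat.S_ISREG(mode):
        # Non-regular files are labelled from their stat(2) type.
        if stat.S_ISDIR(mode):
            return "directory"
        if stat.S_ISLNK(mode):
            return "symbolic link"
        if stat.S_ISFIFO(mode):
            return "fifo"
        if stat.S_ISSOCK(mode):
            return "socket"
        if stat.S_ISCHR(mode):
            return "character special"
        if stat.S_ISBLK(mode):
            return "block special"
        return "unknown special"
    if st.st_size == 0:
        return "empty"
    # os.access only approximates readability (it is always true for root).
    if not os.access(path, os.R_OK):
        return "unreadable"
    if st.st_size < SMALL_LIMIT:
        return "small"
    # Anything else would go on to the magic file content tests.
    return "regular"
```

A file that reaches the final return would then have a data block read and matched against the magic file(s).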
A
magic file specifies file content and name match expressions, descriptions, and
mime(1) classifications. Each line in the file consists of five
tab-separated fields:
- [op]offset
offset determines the data location for the content test. (@expression) specifies an indirect offset, i.e., the offset is the numeric contents of the data location at expression. The
default indirect numeric size is 4 bytes; a B suffix denotes 1 byte, H denotes 2 bytes, and Q denotes 8 bytes.
offset may also be one of { atime blocks ctime fstype gid mode mtime name
nlink size uid } to access stat(2) information for
the current file. The optional op specifies relationships with surrounding magic lines:
- +
- previous fields in block match, current optional
- &
- previous and current fields in block match
- |
- previous fields in block do not match, subsequent skipped
- {
- start nesting block
- }
- end nesting block
- c{
- function declaration and call (1 char names)
- }
- function return
- c()
- function call
- type
- The content data type:
- byte
- 1 byte integer
- short
- 2 byte integer
- long
- 4 byte integer
- quad
- 8 byte integer
- date
- 4 byte time_t
- version
- 4 byte unsigned integer of the form YYYYMMDD for YYYY-MM-DD, 0xYYZZ
for YY.ZZ, or 0xWWXXYYZZ for WW.XX.YY.ZZ
- edit
- substitute operator for string data: %old%new%[glu], where % is any
delimiter
- match
- case-insensitive sh(1) match pattern operator for string data
- [mask]operator
mask is an optional &number that is combined (bitwise
AND) with the content data before comparison. operator is one of { < <= > >= != == }. Numeric
values may be decimal, octal or hex.
- description
The description text. Descriptions are kept consistent with one another
in character case, word order, and punctuation, so that pattern matches against descriptions are feasible. description
may contain one printf(3) format specification for the current data
value at offset.
- mime
- The mime(1) type/subtype. This provides a
standard and consistent matching key space.
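
The indirect offset and [mask]operator fields described above can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: the helper names (`read_uint`, `resolve_indirect`, `compare_at`) are hypothetical, integers are assumed to be unsigned and big-endian, and the comparison value is passed in directly because its placement in a magic line is not described here. The widths (default 4, B = 1, H = 2, Q = 8) and the operator set come from the text.

```python
import operator

# Indirect-offset width suffixes from the text: default 4 bytes, B, H, Q.
WIDTHS = {"": 4, "B": 1, "H": 2, "Q": 8}

# The six comparison operators listed for the [mask]operator field.
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "!=": operator.ne,
    "==": operator.eq,
}


def read_uint(data, offset, width, byteorder="big"):
    """Read an unsigned integer of `width` bytes (byte order is an assumption)."""
    chunk = data[offset:offset + width]
    if len(chunk) != width:
        raise ValueError("offset outside the data block")
    return int.from_bytes(chunk, byteorder)


def resolve_indirect(data, location, suffix=""):
    """Return the offset stored at `location`, as in (@expression)."""
    return read_uint(data, location, WIDTHS[suffix])


def compare_at(data, offset, width, op, expected, mask=None):
    """Apply the optional mask (bitwise AND), then the comparison operator."""
    value = read_uint(data, offset, width)
    if mask is not None:
        value &= mask
    return OPERATORS[op](value, expected)


# Bytes 0-3 hold the indirect offset 8; bytes 8-9 hold 0x1234.
block = bytes([0, 0, 0, 8, 0, 0, 0, 0, 0x12, 0x34])
target = resolve_indirect(block, 0)
matched = compare_at(block, target, 2, "==", 0x1034, mask=0xF0FF)
```

In this sample, `target` is 8 and `matched` is true, because 0x1234 masked with 0xF0FF is 0x1034.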