AT&T Labs - Research

Smart Sampling of Flow Statistics

    Routers in AT&T's IP backbone compile summary statistics on each flow of traffic that passes through them and export them as flow records to a collector. These statistics are invaluable for network engineering, allowing us to see in detail how the network is used.

    Collecting all raw flow records is infeasible because of the resources required for their transmission, collection, and storage. Smart sampling is a technique for obtaining a reliable estimate of detailed usage from only a subset of flow records. It exploits the fact that a large fraction of usage is contained in a small fraction of flows. By preferentially sampling larger flows over smaller ones, we can control the volume of statistics while simultaneously controlling the variance of the estimates derived from them. Smart sampling balances these two objectives in an optimal manner.
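One way to make "balancing the two objectives" precise is sketched below. Assume each flow of size x is sampled independently with probability p and, if sampled, reported as x/p (so the estimate is unbiased). Weighting the estimator variance against the expected number of samples through a threshold parameter z (the cost C_z is introduced here for illustration) gives a per-flow cost that is minimized by sampling with probability proportional to size, capped at 1. This is a derivation under the stated assumptions, not a statement of the exact scheme deployed.

```latex
\begin{aligned}
\hat{x} &= \begin{cases} x/p & \text{with probability } p,\\ 0 & \text{otherwise,} \end{cases}
\qquad \mathbb{E}[\hat{x}] = x, \qquad \operatorname{Var}(\hat{x}) = x^{2}\,\frac{1-p}{p},\\[4pt]
C_z(p) &= \operatorname{Var}(\hat{x}) + z^{2}\,p = \frac{x^{2}}{p} - x^{2} + z^{2}\,p,\\[4pt]
\frac{dC_z}{dp} &= -\frac{x^{2}}{p^{2}} + z^{2} = 0
\;\Longrightarrow\; p_z(x) = \min\!\left(1, \frac{x}{z}\right),
\qquad \hat{x} = \max(x, z) \text{ when sampled.}
\end{aligned}
```

Since C_z is convex in p, the stationary point is the minimum; flows with x ≥ z are always kept and reported exactly, while smaller flows are kept with probability x/z and reported as z.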

    Methods

Smart Sampling comes in two variants:

Threshold Sampling: a stream-based method suited to sampling records as they are exported or collected.

Priority Sampling: a method that selects a fixed number of "best" records from a population; suited to sampling databases for fast query execution.
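The two variants can be sketched in Python as follows. This is a minimal illustration, assuming each record carries a positive size x (for example, bytes); the function names and the parameters z (threshold) and k (sample size) are hypothetical. Threshold sampling keeps a record with probability min(1, x/z) and reports max(x, z). Priority sampling draws u uniformly on (0, 1], ranks records by x/u, keeps the k highest, and reports max(x, τ), where τ is the (k+1)-th highest priority. In both cases the sum of the reported values is an unbiased estimate of the total size.

```python
import random
from typing import List, Tuple


def threshold_sample(sizes: List[float], z: float, rng: random.Random) -> List[Tuple[int, float]]:
    """Threshold sampling: keep record i with probability min(1, x_i / z).

    Each kept record is reported with estimate max(x_i, z), so the sum of the
    estimates is an unbiased estimate of the total size. Decisions are made one
    record at a time, so this works on a stream.
    """
    sample = []
    for i, x in enumerate(sizes):
        if x >= z or rng.random() < x / z:
            sample.append((i, max(x, z)))
    return sample


def priority_sample(sizes: List[float], k: int, rng: random.Random) -> List[Tuple[int, float]]:
    """Priority sampling: keep the k records with the highest priority x_i / u_i.

    u_i is uniform on (0, 1]. With tau the (k+1)-th highest priority, each kept
    record is reported with estimate max(x_i, tau).
    """
    # 1.0 - rng.random() lies in (0, 1], so the division is always defined.
    ranked = sorted(((x / (1.0 - rng.random()), i) for i, x in enumerate(sizes)), reverse=True)
    if len(ranked) <= k:
        # Everything fits in the sample, so every size is reported exactly.
        return [(i, sizes[i]) for _, i in ranked]
    tau = ranked[k][0]  # (k+1)-th highest priority (0-based index k)
    return [(i, max(sizes[i], tau)) for _, i in ranked[:k]]


if __name__ == "__main__":
    rng = random.Random(42)
    flows = [1.0] * 900 + [1000.0] * 10  # synthetic heavy-tailed sizes, for illustration only
    print("true total:", sum(flows))
    ts = threshold_sample(flows, z=100.0, rng=rng)
    print("threshold:", len(ts), "records, estimate", sum(e for _, e in ts))
    ps = priority_sample(flows, k=50, rng=rng)
    print("priority:", len(ps), "records, estimate", sum(e for _, e in ps))
```

Because threshold sampling decides each record independently, it needs no buffering and fits a stream of exported records. Priority sampling needs the whole population (or a size-k heap) to find τ, but in return it yields exactly k records, which is what makes it convenient for database sampling.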

    Deployment in TAS

    Smart Sampling is widely used in AT&T's network measurement infrastructure and underpins the Traffic Analysis Service.

    Smart Charging from Flow Statistics

    Coupled with smart sampling is the idea of smart pricing of network usage. Usage-sensitive charging is becoming increasingly popular as a means of managing service costs and demand. However, there is a perceived tension between sampling and pricing, because usage estimated from samples carries some uncertainty. Coupling smart sampling to charging manages this uncertainty. In smart pricing, users pay a fixed cost plus an amount that is sensitive only to usage exceeding a certain insensitivity level. Only usage above this level needs to be reliably estimated for charging. By matching the insensitivity level to the sampling scheme, any desired target sampling accuracy can be achieved.
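How the insensitivity level can be matched to the sampling scheme is sketched below under explicit assumptions. Suppose, hypothetically, that the charge is linear above the insensitivity level L, namely a + b·max(0, X̂ − L), and that usage is estimated with threshold sampling at threshold z. Each flow of size x < z then contributes variance x(z − x) ≤ zx, so a user with true usage X has an estimated total X̂ with variance at most zX and relative standard error at most √(z/X). For any user whose usage is at least L, the relative error is therefore at most √(z/L), and choosing L = z/ε² meets a target ε. The tariff form and the function names are illustrative, not a published tariff.

```python
import math


def charge(estimated_usage: float, fixed_fee: float, rate: float, level: float) -> float:
    """Hypothetical tariff: fixed fee plus `rate` per unit of usage above `level`."""
    return fixed_fee + rate * max(0.0, estimated_usage - level)


def relative_error_bound(z: float, usage: float) -> float:
    """Upper bound sqrt(z / X) on the relative standard error of a threshold-sampled total X."""
    return math.sqrt(z / usage)


def insensitivity_level(z: float, target_rel_error: float) -> float:
    """Smallest level L such that sqrt(z / X) <= target_rel_error for every X >= L."""
    return z / target_rel_error ** 2


if __name__ == "__main__":
    z = 1e6  # sampling threshold in bytes (assumed)
    level = insensitivity_level(z, 0.01)  # 1% target relative error
    print(level, relative_error_bound(z, level))
    print(charge(2.5e10, fixed_fee=20.0, rate=1e-9, level=level))
```

For example, with z = 1 MB and a 1% target, L = z/ε² = 10^10 bytes (10 GB); any user charged on usage above that level has a relative standard error of at most 1%.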

    Packet Sampling for Flow Statistics

    High-end routers increasingly use packet sampling to reduce the amount, and expense, of the fast memory needed to form flow statistics. For example, routers may sample 1 in N packets, either periodically or independently at random. We have derived a simple model that can be used to determine, given statistics of the original traffic flows, the typical number of concurrent flows that must be accommodated in router memory and the rate at which they are exported. We also turn the problem around, inferring characteristics of the original unsampled packet stream from the statistics of the measured packet-sampled flows.
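The basic quantities behind such a model can be illustrated with a short sketch; it is not the model referred to above. Assuming independent 1-in-N sampling, a flow of n packets yields at least one sampled packet, and hence a measured flow record, with probability 1 − (1 − 1/N)^n, so the expected number of measured flows is the sum of that probability over the original flows. In the reverse direction, multiplying a sampled packet count by N gives an unbiased estimate of the original count. All function names are hypothetical.

```python
import random
from typing import List, Sequence


def periodic_sample(packets: Sequence[int], n: int) -> List[int]:
    """1-in-N periodic sampling: keep every n-th packet, starting with the first."""
    return list(packets[::n])


def independent_sample(packets: Sequence[int], n: int, rng: random.Random) -> List[int]:
    """1-in-N independent sampling: keep each packet with probability 1/n."""
    return [p for p in packets if rng.random() < 1.0 / n]


def prob_flow_detected(flow_packets: int, n: int) -> float:
    """Probability that at least one of a flow's packets is sampled (independent 1-in-N)."""
    return 1.0 - (1.0 - 1.0 / n) ** flow_packets


def expected_measured_flows(flow_lengths: Sequence[int], n: int) -> float:
    """Expected number of original flows that produce a flow record after sampling."""
    return sum(prob_flow_detected(length, n) for length in flow_lengths)


def estimate_original_packets(sampled_packets: int, n: int) -> int:
    """Scale a sampled packet count back up by the sampling period N."""
    return sampled_packets * n
```

The detection probability shows why packet sampling saves memory: most short flows are never sampled at all, so they never occupy a flow-cache entry, while long flows are almost always represented.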

    Related Work

    Contact

    Nick Duffield / duffield@research.att.com
    Carsten Lund / lund@research.att.com
    Mikkel Thorup / mthorup@research.att.com