The Britney Spears Problem
Tracking who's hot and who's not presents an algorithmic challenge
Search-engine queries and Internet packets are fairly complicated data structures, but the principles of stream algorithms can be illustrated with simple streams of numbers. Suppose a pipeline delivers an endless sequence of nonnegative integers at a steady rate of one number every t time units. We want to build a device—call it a stream gauge—that intercepts the stream and displays answers to certain questions about the numbers.
What computational resources does a stream gauge need to accomplish its task? The resources of particular concern are processing time—which can't exceed t units per stream element—and auxiliary storage space. (It's convenient to assume that numbers of any size can be stored in one unit of space, and that any operation of ordinary arithmetic, such as addition or multiplication, can be performed in one unit of time. Comparing two numbers also takes a single time unit.)
For some questions about number streams, it's quite easy to design an effective stream gauge. If you want to know how many stream elements have been received, all you need is a counter. The counter is a storage location, or register, whose initial value is set to 0. Each time an element of the stream arrives, the counter is incremented: In other words, if the current value of the counter is x , it becomes x +1.
Instead of counting the stream elements, another kind of gauge displays the sum of all the integers received. Again we need a register x initialized to 0; then, on the arrival of each integer n , we add n to the running total in x .
Still another easy-to-build gauge shows the maximum value of all the integers seen so far. Yet again x begins at 0; whenever a stream element n is greater than x , that n becomes the new x .
For each of these devices, the storage requirement is just a single register, and the processing of each stream element is completed in a single unit of time. Computing doesn't get much easier than that. Certain other stream functions require only slightly greater effort. For example, calculating the average (or arithmetic mean) of a stream's integers takes three registers and three operations per integer. One register counts the elements, another records their sum, and the third register holds the value of the average, calculated as the quotient of the sum and the count.
What matters most about the frugal memory consumption of these algorithms is not the exact number of registers needed; what matters is that the storage space remains constant no matter how long the stream. In effect, the algorithms can do calculations of any length on a single sheet of scratch paper. Their time performance follows a similar rule: They use the same number of clock cycles for each element of the stream. The algorithms are said to run in constant space and constant time.