Archive for Stream

Randomized Multi-pass Streaming Skyline Algorithms (VLDB’09)

Author (ordered alphabetically): Atish Das Sarma, Ashwin Lall, Danupon Nanongkai, Jun Xu


Download: PDF, PPT

Journal: Soon

Conference: VLDB 2009: 35th International Conference on Very Large Databases [wiki]


We consider external algorithms for skyline computation without pre-processing. Our goal is to develop an algorithm with a good worst case guarantee while performing well on average. Due to the nature of disks, it is desirable that such algorithms access the input as a stream (even if in multiple passes). Using the tools of randomness, proved to be useful in many applications, we present an efficient multi-pass streaming algorithm, RAND, for skyline computation. As far as we are aware, RAND is the first randomized skyline algorithm in the literature.

RAND is near-optimal for the streaming model, which we prove via a simple lower bound. Additionally, our algorithm is distributable and can handle partially ordered domains on each attribute. Finally, we demonstrate the robustness of RAND via extensive experiments on both real and synthetic datasets. RAND is comparable to the existing algorithms in average case and additionally tolerant to simple modifications of the data, while other algorithms degrade considerably with such variation.

Update History

[v1] August 20, 2009 (Conference version)

Leave a Comment

Best-Order Streaming Model (TAMC’09)

best streamAuthors: Atish Das Sarma, Richard J. Lipton, Danupon Nanongkai

Download: PDF, PPT, Journal version (link) (The journal version is *slightly* more updated)

Journal: Invited to Special Issue of Theoretical Computer Science [wiki]

Conference: TAMC 2009: 6th Annual Conference on Theory and Applications of Models of Computation [link]


We study a new model of computation called stream checking on graph problems where a space-limited verifier has to verify a proof sequentially (i.e., it reads the proof as a stream). Moreover, the proof itself is nothing but a reordering of the input data. This model has a close relationship to many models of computation in other areas such as data streams, communication complexity, and proof checking and could be used in applications such as cloud computing.

In this paper we focus on graph problems where the input is a sequence of edges. We show that checking if a graph has a perfect matching is impossible to do deterministically using small space. To contrast this, we show that randomized verifiers are powerful enough to check whether a graph has a perfect matching or is connected.

Update History

[v1] Apr 12, 2009 (Updated conference version in PDF)
[v2] Oct 1, 2009 (Submitted journal version in PDF)
[v3] Nov 22, 2010 (Link to the journal version)

Comments (1)