Archive for VLDB

Regret-Minimizing Representative Databases

Author: Danupon Nanongkai, Atish Das Sarma, Ashwin Lall, Richard J. Lipton, Jun Xu

Download: PDF

Conference: VLDB 2010: 36th International Conference on Very Large Databases [wiki]


We propose the k-representative regret minimization query (k-regret) as an operation to support multi-criteria decision making. Like top-k, the k-regret query assumes that users have some utility or scoring functions; however, it never asks the users to provide such functions. Like skyline, it filters out a set of interesting points from a potentially large database based on the users’ criteria; however, it never overwhelms the users by outputting too many tuples.

In particular, for any number k and any class of utility functions, the k-regret query outputs k tuples from the database and tries to minimize the {\em maximum regret ratio}. This captures how disappointed a user could be had she seen k-representative tuples instead of the whole database. We focus on the class of linear utility functions, which is widely applicable.

The first challenge of this approach is that it is not clear if the maximum regret ratio can be small, or even bounded. We answer this question affirmatively. Theoretically, we prove that the maximum regret ratio can be bounded and this bound is independent of the database size. Moreover, our extensive experiments on real and synthetic datasets suggest that in practice the maximum regret ratio is reasonably small. Additionally, algorithms developed in this paper are practical as they run in linear time in the size of the database and the experiments show that their running time is small when they run on top of the skyline operation which means that these algorithm could be integrated into current database systems.

Update History

[v1] June 28, 2010 (Conference version)

Leave a Comment

Randomized Multi-pass Streaming Skyline Algorithms (VLDB’09)

Author (ordered alphabetically): Atish Das Sarma, Ashwin Lall, Danupon Nanongkai, Jun Xu


Download: PDF, PPT

Journal: Soon

Conference: VLDB 2009: 35th International Conference on Very Large Databases [wiki]


We consider external algorithms for skyline computation without pre-processing. Our goal is to develop an algorithm with a good worst case guarantee while performing well on average. Due to the nature of disks, it is desirable that such algorithms access the input as a stream (even if in multiple passes). Using the tools of randomness, proved to be useful in many applications, we present an efficient multi-pass streaming algorithm, RAND, for skyline computation. As far as we are aware, RAND is the first randomized skyline algorithm in the literature.

RAND is near-optimal for the streaming model, which we prove via a simple lower bound. Additionally, our algorithm is distributable and can handle partially ordered domains on each attribute. Finally, we demonstrate the robustness of RAND via extensive experiments on both real and synthetic datasets. RAND is comparable to the existing algorithms in average case and additionally tolerant to simple modifications of the data, while other algorithms degrade considerably with such variation.

Update History

[v1] August 20, 2009 (Conference version)

Leave a Comment