Archive for November, 2010

Representative Skylines using Threshold-based Preference Distributions

Author: Atish Das Sarma, Ashwin Lall, Danupon Nanongkai, Richard J. Lipton, Jun Xu

Download: PDF

Conference: ICDE 2011: the IEEE International Conference on Data Engineering [link]

Abstract:

The study of skylines and their variants has receivedconsiderable attention in recent years. Skylines are essentiallysets of most interesting (undominated) tuples in a database.However, since the skyline is often very large, much researcheffort has been devoted to identifying a smaller subset of (sayk) “representative skyline” points. Several different definitionsof representative skylines have been considered. Most of theseformulations are intuitive in that they try to achieve some kindof clustering “spread” over the entire skyline, with k points. Inthis work, we take a more principled approach in defining therepresentative skyline objective. One of our main contributionsis to formulate the problem of displaying k representative skylinepoints such that the probability that a random user would clickon one of them is maximized.

Two major research questions arise naturally from this formu-lation. First, how does one mathematically model the likelihoodwith which a user is interested in and will “click” on a certaintuple? Second, how does one negotiate the absence of theknowledge of an explicit set of target users; in particular whatdo we mean by “a random user”? To answer the first question,we model users based on a novel formulation of thresholdpreferences which we will motivate further in the paper. Toanswer the second question, we assume a probability distributionof users instead of a fixed set of users. While this makes theproblem harder, it lends more mathematical structures that canbe exploited as well, as one can now work with probabilities ofthresholds and handle cumulative density functions.

On the theoretical front, our objective is NP-hard. For thecase of a finite set of users with known thresholds, we presenta simple greedy algorithm that attains an approximation ratio of (1-1/e) of the optimal. For the case of user distributions,we show that a careful yet similar greedy algorithm achieves thesame approximation ratio. Unfortunately, it turns out that thisalgorithm is rather involved and computationally expensive. Sowe present a threshold sampling based algorithm that is morecomputationally affordable and, for any fixed \epsilon > 0, has anapproximation ratio of (1-1/e-\epsilon). We perform experimentson both real and synthetic data to show that our algorithmsignificantly outperforms previously proposed approaches.

Update History

[v1] November 14, 2010 (Conference version)

Advertisements

Leave a Comment

Distributed Verification and Hardness of Distributed Approximation

Author: Atish Das Sarma, Stephan Holzer, Liah Kor, Amos Korman, Danupon Nanongkai, Gopal Pandurangan, David Peleg, Roger Wattenhofer

Download: arXiv

Conference: STOC 2011

Journal:

Abstract:

We study the verification problem in distributed networks, stated as follows. Let H be a subgraph of a network G where each vertex of G knows which edges incident on it are in H. We would like to verify whether H has some properties, e.g., if it is a tree or if it is connected. We would like to perform this verification in a decentralized fashion via a distributed algorithm. The time complexity of verification is measured as the number of rounds of distributed communication.
In this paper we initiate a systematic study of distributed verification, and give almost tight lower bounds on the running time of distributed verification algorithms for many fundamental problems such as connectivity, spanning connected subgraph, and s-t cut verification. We then show applications of these results in deriving strong unconditional time lower bounds on the hardness of distributed approximation for many classical optimization problems including minimum spanning tree, shortest paths, and minimum cut. Many of these results are the first non-trivial lower bounds for both exact and approximate distributed computation and they resolve previous open questions. Moreover, our unconditional lower bound of approximating minimum spanning tree (MST) subsumes and improves upon the previous hardness of approximation bound of Elkin [STOC 2004] as well as the lower bound for (exact) MST computation of Peleg and Rubinovich [FOCS 1999]. Our result implies that there can be no distributed approximation algorithm for MST that is significantly faster than the current exact algorithm, for any approximation factor.
Our lower bound proofs show an interesting connection between communication complexity and distributed computing which turns out to be useful in establishing the time complexity of exact and approximate distributed computation of many problems.

Update History

Leave a Comment