Interactive Regret Minimization

Author: Danupon Nanongkai, Atish Das Sarma, Ashwin Lall, Kazuhisa Makino
(Author names are NOT in alphabetical order.)

Conference: SIGMOD 2012

Journal:

Abstract

We study the notion of regret ratio proposed by Nanongkai et al. [VLDB’10] to deal with multi-criteria decision making in database systems. The regret minimization query proposed by Nanongkai et al. was shown to have features of both skyline and top-$k$: it does not need information from the user but still controls the output size. While this approach is suitable for obtaining a reasonably small regret ratio, it is still open whether one can make the regret ratio arbitrarily small. Moreover, it remains open whether reasonable questions can be asked of the users in order to improve the efficiency of the process.

In this paper, we study the problem of minimizing regret ratio when the system is enhanced with interaction. We assume that when presented with a set of tuples, the user can tell which tuple is most preferred. Under this assumption, we develop the problem of interactive regret minimization where we fix the number of questions, and tuples per question, that we can display, and aim at minimizing the regret ratio. We try to answer two questions in this paper: (1) How much does interaction help? That is, how much can we improve the regret ratio when there are interactions? (2) How efficient can interaction be? In particular, we measure how many questions we have to ask the user in order to make her regret ratio small enough.

We answer both questions from both theoretical and practical standpoints. For the first question, we show that interaction can reduce the regret ratio almost exponentially. To do this, we prove a lower bound for the previous approach (thereby resolving an open problem from Nanongkai et al.), and develop an almost-optimal upper bound that makes the regret ratio exponentially smaller. Our experiments also confirm that, in practice, interactions help in improving the regret ratio by many orders of magnitude. For the second question, we prove that when our algorithm shows a reasonable number of points per question, it only needs a few questions to make the regret ratio small. Thus, interactive regret minimization seems to be a necessary and sufficient way to deal with multi-criteria decision making in database systems.

Representative Skylines using Threshold-based Preference Distributions

Author: Atish Das Sarma, Ashwin Lall, Danupon Nanongkai, Richard J. Lipton, Jun Xu

Conference: ICDE 2011: the IEEE International Conference on Data Engineering

Abstract:

The study of skylines and their variants has received considerable attention in recent years. Skylines are essentially sets of most interesting (undominated) tuples in a database. However, since the skyline is often very large, much research effort has been devoted to identifying a smaller subset of (say k) “representative skyline” points. Several different definitions of representative skylines have been considered. Most of these formulations are intuitive in that they try to achieve some kind of clustering “spread” over the entire skyline, with k points. In this work, we take a more principled approach in defining the representative skyline objective. One of our main contributions is to formulate the problem of displaying k representative skyline points such that the probability that a random user would click on one of them is maximized.

Two major research questions arise naturally from this formulation. First, how does one mathematically model the likelihood with which a user is interested in and will “click” on a certain tuple? Second, how does one negotiate the absence of the knowledge of an explicit set of target users; in particular what do we mean by “a random user”? To answer the first question, we model users based on a novel formulation of threshold preferences which we will motivate further in the paper. To answer the second question, we assume a probability distribution of users instead of a fixed set of users. While this makes the problem harder, it lends more mathematical structures that can be exploited as well, as one can now work with probabilities of thresholds and handle cumulative density functions.

On the theoretical front, our objective is NP-hard. For the case of a finite set of users with known thresholds, we present a simple greedy algorithm that attains an approximation ratio of $(1-1/e)$ of the optimal. For the case of user distributions, we show that a careful yet similar greedy algorithm achieves the same approximation ratio. Unfortunately, it turns out that this algorithm is rather involved and computationally expensive. So we present a threshold sampling based algorithm that is more computationally affordable and, for any fixed $\epsilon > 0$, has an approximation ratio of $(1-1/e-\epsilon)$. We perform experiments on both real and synthetic data to show that our algorithm significantly outperforms previously proposed approaches.
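For the finite-user case, the greedy step can be sketched roughly as follows. This is only an illustration, not the paper's implementation: the abstract does not spell out the threshold model, so the sketch assumes that a user "clicks" on a tuple when the tuple meets every one of her per-attribute thresholds. The names `clicks` and `greedy_representatives` are hypothetical.

```python
from typing import List, Sequence, Set


def clicks(threshold: Sequence[float], point: Sequence[float]) -> bool:
    """Assumed click model: the point meets every per-attribute threshold."""
    return all(p >= t for p, t in zip(point, threshold))


def greedy_representatives(
    points: Sequence[Sequence[float]],
    thresholds: Sequence[Sequence[float]],
    k: int,
) -> List[int]:
    """Pick up to k point indices, each time adding the point that
    satisfies the most users who are not yet satisfied."""
    # For every point, the set of users (by index) who would click on it.
    covered_by: List[Set[int]] = [
        {u for u, t in enumerate(thresholds) if clicks(t, p)} for p in points
    ]
    chosen: List[int] = []
    satisfied: Set[int] = set()
    for _ in range(min(k, len(points))):
        candidates = [i for i in range(len(points)) if i not in chosen]
        best = max(candidates, key=lambda i: len(covered_by[i] - satisfied))
        if not covered_by[best] - satisfied:
            break  # no remaining point satisfies any new user
        chosen.append(best)
        satisfied |= covered_by[best]
    return chosen
```

Under this assumed model the objective is a coverage function, which is where the standard $(1-1/e)$ guarantee for greedy selection comes from.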

Update History

[v1] November 14, 2010 (Conference version)

Regret-Minimizing Representative Databases

Author: Danupon Nanongkai, Atish Das Sarma, Ashwin Lall, Richard J. Lipton, Jun Xu

Conference: VLDB 2010: 36th International Conference on Very Large Databases

Abstract:

We propose the k-representative regret minimization query (k-regret) as an operation to support multi-criteria decision making. Like top-k, the k-regret query assumes that users have some utility or scoring functions; however, it never asks the users to provide such functions. Like skyline, it filters out a set of interesting points from a potentially large database based on the users’ criteria; however, it never overwhelms the users by outputting too many tuples.

In particular, for any number k and any class of utility functions, the k-regret query outputs k tuples from the database and tries to minimize the {\em maximum regret ratio}. This captures how disappointed a user could be had she seen k-representative tuples instead of the whole database. We focus on the class of linear utility functions, which is widely applicable.
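As a rough illustration of the quantity being minimized, the sketch below computes the regret ratio of a subset for one linear utility function and then estimates the maximum regret ratio by sampling weight vectors. The formula used, (best utility in the database minus best utility in the subset) divided by the best utility in the database, and the restriction to non-negative weights are assumptions made for this example; sampling only gives a lower estimate of the true maximum. All function names are hypothetical.

```python
import random
from typing import Sequence


def utility(w: Sequence[float], p: Sequence[float]) -> float:
    """Linear utility of tuple p under weight vector w."""
    return sum(wi * pi for wi, pi in zip(w, p))


def regret_ratio(database, subset, w) -> float:
    """Relative utility lost by a user with weights w who sees only `subset`."""
    best_all = max(utility(w, p) for p in database)
    best_sub = max(utility(w, p) for p in subset)
    if best_all <= 0:
        return 0.0
    return (best_all - best_sub) / best_all


def estimate_max_regret_ratio(database, subset, samples=1000, seed=0) -> float:
    """Lower estimate of the maximum regret ratio over sampled
    non-negative linear utility functions (an assumption of this sketch)."""
    rng = random.Random(seed)
    dims = len(database[0])
    worst = 0.0
    for _ in range(samples):
        w = [rng.random() for _ in range(dims)]
        worst = max(worst, regret_ratio(database, subset, w))
    return worst
```

For example, with the database `[(1.0, 0.0), (0.0, 1.0), (0.6, 0.6)]` and the subset `[(0.6, 0.6)]`, a user who cares only about the first attribute loses 40% of her best achievable utility.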

The first challenge of this approach is that it is not clear if the maximum regret ratio can be small, or even bounded. We answer this question affirmatively. Theoretically, we prove that the maximum regret ratio can be bounded and this bound is independent of the database size. Moreover, our extensive experiments on real and synthetic datasets suggest that in practice the maximum regret ratio is reasonably small. Additionally, the algorithms developed in this paper are practical: they run in time linear in the size of the database, and the experiments show that their running time is small when they run on top of the skyline operation. This means that these algorithms could be integrated into current database systems.

Update History

[v1] June 28, 2010 (Conference version)

Efficient Distributed Random Walks with Applications

Author: Atish Das Sarma, Danupon Nanongkai, Gopal Pandurangan, Prasad Tetali

Conference: PODC 2010

Journal: Journal of the ACM 2013

Abstract:

We focus on the problem of performing random walks efficiently in a distributed network. Given bandwidth constraints, the goal is to minimize the number of rounds required to obtain a random walk sample. We first present a fast distributed algorithm for performing random walks whose time complexity is sublinear in the length of the walk. Our algorithm performs a random walk of length $\ell$ in $\tilde{O}(\sqrt{\ell D})$ rounds (with high probability) on an undirected network, where $D$ is the diameter of the network. This improves over the previous best algorithm that ran in $\tilde{O}(\ell^{2/3}D^{1/3})$ rounds (Das Sarma et al., PODC 2009). We further extend our algorithms to efficiently perform $k$ independent random walks in $\tilde{O}(\sqrt{k\ell D} + k)$ rounds. We then show that there is a fundamental difficulty in improving the dependence on $\ell$ any further by proving a lower bound of $\Omega(\sqrt{\frac{\ell}{\log \ell}} + D)$ under a general model of distributed random walk algorithms. Our random walk algorithms are useful in speeding up distributed algorithms for a variety of applications that use random walks as a subroutine. We present two main applications. First, we give a fast distributed algorithm for computing a random spanning tree (RST) in an arbitrary (undirected) network which runs in $\tilde{O}(\sqrt{m}D)$ rounds (with high probability; here $m$ is the number of edges). Our second application is a fast decentralized algorithm for estimating mixing time and related parameters of the underlying network. Our algorithm is fully decentralized and can serve as a building block in the design of topologically-aware networks.
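To see how the new round bound compares with the PODC 2009 one, the two expressions can be divided directly. This is a back-of-the-envelope calculation that ignores the polylogarithmic factors hidden by $\tilde{O}$:

```latex
\frac{\sqrt{\ell D}}{\ell^{2/3} D^{1/3}}
  = \ell^{1/2 - 2/3} \, D^{1/2 - 1/3}
  = \left( \frac{D}{\ell} \right)^{1/6}
```

So, up to those hidden factors, the new bound is the smaller one exactly when $D < \ell$, which is also the regime in which $\sqrt{\ell D}$ falls below the $\ell$ rounds needed to forward the walk one step at a time.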

Update History

Mar 03, 2009 (New version posted on arXiv)
Nov 06, 2009 (Link to arXiv posted)
Feb 18, 2010 (New version posted)
Feb 18, 2013 (Journal version posted)

Stackelberg Pricing is Hard to Approximate within 2−ε

Author: Parinya Chalermsook, Bundit Lekhanukit, Danupon Nanongkai

Conference:

Abstract:

The Stackelberg Pricing Game is a two-level combinatorial pricing problem studied in the Economics, Operations Research, and Computer Science communities. In this paper, we consider the decade-old shortest path version of this problem, which is the first and most studied problem in this family. The game is played on a graph (representing a network) consisting of fixed cost edges and priceable or variable cost edges. The fixed cost edges already have some fixed price (representing the competitor’s prices). Our task is to choose prices for the variable cost edges. After that, a client will buy the cheapest path from a node $s$ to a node $t$, using any combination of fixed cost and variable cost edges. The goal is to maximize the revenue on variable cost edges.
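The game described above can be made concrete with a small sketch that evaluates one candidate pricing: it finds the client's cheapest path with Dijkstra's algorithm and reports the revenue collected on the variable cost edges. Several details are assumptions of this example and are not stated in the abstract: edges are treated as directed, prices are non-negative, and ties between equally cheap paths are broken in favour of the higher-revenue path. The function name `follower_revenue` is hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Sequence, Tuple


def follower_revenue(
    fixed_edges: Sequence[Tuple[str, str, float]],
    priced_edges: Sequence[Tuple[str, str]],
    prices: Sequence[float],
    s: str,
    t: str,
) -> Optional[float]:
    """Revenue on the client's cheapest s-t path, or None if t is unreachable."""
    # Adjacency list entries: (neighbour, cost to client, revenue to us).
    adj: Dict[str, List[Tuple[str, float, float]]] = {}
    for u, v, c in fixed_edges:
        adj.setdefault(u, []).append((v, c, 0.0))
    for (u, v), p in zip(priced_edges, prices):
        adj.setdefault(u, []).append((v, p, p))

    # Labels are (cost, -revenue), compared lexicographically, so that
    # among equally cheap paths the one with more revenue wins (assumed rule).
    best: Dict[str, Tuple[float, float]] = {s: (0.0, 0.0)}
    heap: List[Tuple[float, float, str]] = [(0.0, 0.0, s)]
    while heap:
        cost, neg_rev, u = heapq.heappop(heap)
        if (cost, neg_rev) != best[u]:
            continue  # stale heap entry
        if u == t:
            return -neg_rev
        for v, c, r in adj.get(u, []):
            cand = (cost + c, neg_rev - r)
            if v not in best or cand < best[v]:
                best[v] = cand
                heapq.heappush(heap, (cand[0], cand[1], v))
    return None
```

With a fixed edge from `s` to `t` of cost 10, a fixed edge from `s` to `a` of cost 2, and a single variable cost edge from `a` to `t`, any price above 8 drives the client onto the competitor's edge and earns nothing.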

In this paper, we show that the problem is hard to approximate within $2-\epsilon$, improving the previous APX-hardness result by Joret [to appear in Networks]. Our technique combines the existing ideas with a new insight into the price structure and its relation to the hardness of the instances.

Update History

[v1] Oct 2, 2009 (Manuscript posted on arXiv)

Faster Algorithms for Semi-Matching Problems

Author: Jittat Fakcharoenphol, Bundit Lekhanukit, Danupon Nanongkai

Conference: ICALP 2010

Abstract:

We consider the problem of finding semi-matchings in bipartite graphs, a problem also extensively studied under various names in the scheduling literature. We give faster algorithms for both the weighted and unweighted cases.

For the weighted case, we give an $O(nm\log n)$-time algorithm, where $n$ is the number of vertices and $m$ is the number of edges, by exploiting the geometric structure of the problem. This improves the classical $O(n^3)$ algorithms by Horn [Operations Research 1973] and Bruno, Coffman and Sethi [Communications of the ACM 1974].

For the unweighted case, the bound can be improved even further. We give a simple divide-and-conquer algorithm which runs in time $O(\sqrt{n}m\log n)$, improving two previous $O(nm)$-time algorithms by Abraham [MSc thesis, University of Glasgow 2003] and Harvey, Ladner, Lovasz and Tamir [WADS 2003 and Journal of Algorithms 2006]. We also extend this algorithm to solve the Balanced Edge Cover problem in time $O(\sqrt{n}m\log n)$, improving the previous $O(nm)$-time algorithm by Harada, Ono, Sadakane and Yamashita [ISAAC 2008].

Randomized Multi-pass Streaming Skyline Algorithms (VLDB’09)

Author (ordered alphabetically): Atish Das Sarma, Ashwin Lall, Danupon Nanongkai, Jun Xu

Journal: Soon

Conference: VLDB 2009: 35th International Conference on Very Large Databases

Abstract:

We consider external algorithms for skyline computation without pre-processing. Our goal is to develop an algorithm with a good worst case guarantee while performing well on average. Due to the nature of disks, it is desirable that such algorithms access the input as a stream (even if in multiple passes). Using randomization, which has proved useful in many applications, we present an efficient multi-pass streaming algorithm, RAND, for skyline computation. As far as we are aware, RAND is the first randomized skyline algorithm in the literature.

RAND is near-optimal for the streaming model, which we prove via a simple lower bound. Additionally, our algorithm is distributable and can handle partially ordered domains on each attribute. Finally, we demonstrate the robustness of RAND via extensive experiments on both real and synthetic datasets. RAND is comparable to the existing algorithms in the average case and is additionally tolerant to simple modifications of the data, while other algorithms degrade considerably with such variation.
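For readers unfamiliar with the skyline operation itself, here is a minimal single-pass baseline that keeps a window of undominated points while reading the input as a stream. It is not RAND and uses no randomization; it also assumes numeric attributes where larger values are better, rather than the partially ordered domains the paper supports, and it keeps the whole skyline in memory. The names `dominates` and `skyline` are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: at least as good everywhere, strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(stream: Iterable[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """One pass over the stream, maintaining the undominated points seen so far."""
    window: List[Tuple[float, ...]] = []
    for p in stream:
        if any(dominates(q, p) for q in window):
            continue  # p is dominated, so it cannot be in the skyline
        # p survives: evict every window point that p dominates.
        window = [q for q in window if not dominates(p, q)]
        window.append(p)
    return window
```

The window here can grow as large as the skyline, which is the kind of memory pressure that motivates multi-pass streaming approaches.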

Update History

[v1] August 20, 2009 (Conference version)