Workshop 2: Speakers and Abstracts
All the videos of the lectures are available here!
You can download the PDF booklet that will be distributed at the workshop.
Convex bandit, part I 

Abstract: In this two-part lecture Ronen and I will discuss our solution to the decade-old problem of characterizing the minimax regret in bandit convex optimization. Part I will focus on the information-theoretic aspects of the proof, while part II will be about a new geometric question on convex functions that we had to answer.

Nonparametric learning, scale invariance, and sketching techniques for online algorithms 

Abstract: We describe recent results and work in progress on algorithmic techniques for online learning. First, we discuss a simple and modular approach to nonparametric learning in data streams. This method covers the input space using locally trained base classifiers. A good balance between model complexity and predictive accuracy is achieved by dynamically adapting the cover to the local complexity of the classification problem. Next, we consider the problem of designing online algorithms that are invariant to linear transformations of the data. As invariance relies upon maintaining second-order information, we show how the associated computational cost can be tamed via sketching techniques, without giving up regret guarantees.

Combinatorial bandits revisited 

Abstract: We investigate stochastic and adversarial combinatorial multi-armed bandit problems. In the stochastic setting, we first derive problem-specific regret lower bounds, and analyze how these bounds scale with the dimension of the decision space. We then propose COMB-UCB, a family of algorithms that efficiently exploit the combinatorial structure of the problem, and derive finite-time upper bounds on their regret. These bounds improve over the regret upper bounds of existing algorithms, and we show numerically that COMB-UCB significantly outperforms any other algorithm. In the adversarial setting, we propose two simple algorithms, namely COMB-EXP1 and COMB-EXP2, for semi-bandit and bandit feedback, respectively. Their regret scales similarly to that of state-of-the-art algorithms, in spite of the simplicity of their implementation. More information at http://arxiv.org/abs/1502.03475.
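
For context, the simplest exponential-weights bandit algorithm, Exp3, fits in a few lines; the COMB-EXP algorithms extend this idea to combinatorial action sets. The sketch below is our own minimal version for the plain adversarial bandit (it omits the explicit uniform exploration term of the original Exp3 and is not COMB-EXP itself):

```python
import math
import random

def exp3(n_arms, losses, eta):
    """Minimal exponential-weights bandit: sample an arm from the weight
    distribution, observe only that arm's loss, and update its weight with
    an importance-weighted loss estimate."""
    weights = [1.0] * n_arms
    total_loss = 0.0
    for loss_vector in losses:          # adversarial losses in [0, 1]
        z = sum(weights)
        probs = [w / z for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        total_loss += loss_vector[arm]
        # unbiased estimate: only the played arm's loss is observed
        est = loss_vector[arm] / probs[arm]
        weights[arm] *= math.exp(-eta * est)
    return total_loss
```

With losses bounded in [0, 1] and a suitable eta, the cumulated loss stays close to that of the best fixed arm in hindsight.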

Placing Dynamic Content in Local Caches 

Abstract: In this talk we will propose a means of addressing a fundamental limitation for the adoption of caching in wireless access networks due to small population sizes. This shortcoming is due to two main challenges: (i) making timely estimates of varying content popularity and (ii) inferring popular content from small samples. We propose a framework that alleviates these limitations. To estimate varying popularity in a timely fashion in the context of a single cache, we propose an Age-Based Threshold (ABT) policy which caches all contents requested more times than a certain threshold related to the content's age. We show that ABT is asymptotically hit-rate optimal in the many-contents regime, which allows us to obtain the first characterization of the optimal performance of a caching system in a dynamic context. We then address small sample sizes, focusing on L local caches and one global cache. On the one hand, we show that the global cache learns L times faster by aggregating all requests from the local caches, which improves hit rates. On the other hand, assuming correlations, aggregation washes out local characteristics, which leads to a hit-rate penalty. This motivates coordination mechanisms that combine global learning of popularity scores in clusters and LRU with prefetching.
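
The threshold rule can be sketched in a few lines of Python. This is an illustrative reconstruction, not the paper's exact ABT policy: the class name, the fixed threshold, and the count-based eviction rule are our own simplifications.

```python
class AgeBasedThresholdCache:
    """Sketch of an age-based threshold policy: a content enters the cache
    once it has been requested more than `threshold` times since it first
    appeared.  Capacity overflow evicts the cached content with the fewest
    requests (a simplification for illustration)."""

    def __init__(self, capacity, threshold):
        self.capacity = capacity
        self.threshold = threshold
        self.counts = {}          # content id -> requests seen so far
        self.cache = set()

    def request(self, content):
        hit = content in self.cache
        self.counts[content] = self.counts.get(content, 0) + 1
        if not hit and self.counts[content] > self.threshold:
            if len(self.cache) >= self.capacity:
                victim = min(self.cache, key=lambda c: self.counts[c])
                self.cache.discard(victim)
            self.cache.add(content)
        return hit
```

A content requested repeatedly is a miss until its count crosses the threshold, and a hit afterwards.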

Convex bandit, part II 

Abstract: In this two-part lecture Sébastien and I will discuss our solution to the decade-old problem of characterizing the minimax regret in bandit convex optimization. Part I will focus on the information-theoretic aspects of the proof, while part II will be about a new geometric question on convex functions that we had to answer.

Optimizing ad-auction reserve prices when the outcome is measured on an aggregate basis 

Abstract: Selling an ad space through automated ad auctions with an optimal reserve price can be a very challenging task when the outcome of a particular reserve-price configuration is observed on an aggregate basis at a low frequency.

On Bayesian index policies for sequential resource allocation 

Abstract: In the context of a parametric bandit model, I will investigate the interplay between frequentist and Bayesian regret and between frequentist and Bayesian strategies. The focus will be on index policies inspired by a Bayesian view of the multi-armed bandit problem but that perform well with respect to the frequentist regret. The notion of an index policy dates back to the strategy introduced by Gittins (1979) to solve the Bayesian, discounted, multi-armed bandit problem. We will see that the Gittins indices can also be useful in the non-discounted case. Moreover, some existing approximations of these (hard to compute) indices suggest alternative exploration rates that could be used in UCB-like algorithms. I will then talk about Bayes-UCB, a simple algorithm based on quantiles of the posterior distributions on the means of the arms, which is (asymptotically) optimal for some classes of reward distributions.
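
For Bernoulli rewards with a uniform prior, the Bayes-UCB index of an arm is a rising quantile of its Beta posterior. A minimal sketch, using Monte Carlo quantiles from the standard library (an exact implementation would use a Beta inverse CDF such as scipy.stats.beta.ppf; the level 1 - 1/t is one common choice of quantile schedule):

```python
import random

def bayes_ucb_index(successes, failures, t, n_samples=2000):
    """Monte Carlo approximation of a Bayes-UCB index for Bernoulli
    rewards: the (1 - 1/t) quantile of the Beta(successes+1, failures+1)
    posterior on the arm's mean (uniform prior)."""
    q = 1.0 - 1.0 / max(t, 2)
    draws = sorted(random.betavariate(successes + 1, failures + 1)
                   for _ in range(n_samples))
    return draws[min(int(q * n_samples), n_samples - 1)]
```

At each round the policy plays the arm with the largest index; the shrinking posterior and the rising quantile level together balance exploration and exploitation.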

The moment-LP and moment-SOS approach in polynomial optimization 

Abstract: We review basic properties of the moment-LP and moment-SDP hierarchies for polynomial optimization and compare them. We also illustrate how to use such a methodology in two applications outside optimization. More details in this abstract.

The Hidden World of Bandits 

Abstract: Although the standard multi-armed bandit framework is by now very well understood and the available algorithms are optimal, in practice the learning performance is not always satisfactory. To alleviate this problem, a common approach is to assume the existence of a "structure" characterizing the shape of the bandit problem. Whenever such structure is known, the learning process can be significantly improved. On the other hand, when the structure is "hidden", one of the main challenges is how to learn it while minimizing the regret. While many forms of structure have been proposed (e.g., regularity of the reward function across arms), in this talk we show how modern tensor decomposition methods for latent variable models can be paired with standard UCB-like strategies to unveil the hidden structure while learning the best arm. In particular, we will review an application of tensor decomposition in the standard multi-armed bandit and in the more challenging POMDP problem.

Sampling convex bodies with Projected Monte Carlo 

Abstract: Given a convex body K in high dimension n, we consider the Markov chain whose transition consists of adding a small Gaussian perturbation and projecting back onto K. We shall show that this chain approaches the uniform measure on K in at most n^7 steps. The proof extends to the case where a convex potential is added, allowing us to sample from a smooth log-concave density restricted to K. The talk is based on joint work with Sébastien Bubeck and Ronen Eldan.
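
For the special case where K is the unit Euclidean ball, projection is just rescaling, and one transition of the chain can be sketched directly (the step size and step count below are illustrative, not the n^7 schedule analyzed in the talk):

```python
import math
import random

def projected_walk(dim, n_steps, step_size):
    """Projected random walk on the unit Euclidean ball: add a small
    Gaussian perturbation, then project back onto the body.  For the
    ball, projection is rescaling to unit norm when outside."""
    x = [0.0] * dim
    for _ in range(n_steps):
        y = [xi + step_size * random.gauss(0.0, 1.0) for xi in x]
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y] if norm > 1.0 else y  # project onto K
    return x
```

For a general convex body, the only change is replacing the rescaling step with the Euclidean projection onto K.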

Subsampling, Concentration and Multi-armed Bandits 

Abstract: In the first part of this talk, we will provide novel concentration results for the setting of sampling without replacement from a finite population (subsampling), improving on the fundamental result of Serfling (1974) and extending it to obtain Bernstein and empirical Bernstein concentration bounds. We will then move to the stochastic multi-armed bandit problem, a popular model of the exploration/exploitation trade-off in sequential decision making. We introduce a novel algorithm based on subsampling. Despite its simplicity, we show that the algorithm demonstrates excellent empirical performance against state-of-the-art algorithms, including Thompson sampling and KL-UCB. The algorithm is very flexible, and does not need to know a set of reward distributions in advance nor the range of the rewards. We provide a detailed experimental study comparing the algorithm to the state of the art, discuss the main intuition that explains the striking results, and conclude with a finite-time regret analysis for this algorithm in the simplified two-arm bandit setting.
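
The core comparison in subsampling-based bandit algorithms (in the spirit of BESA-style methods) can be sketched as follows; the function name and the exact duel form are our illustration, not necessarily the talk's algorithm:

```python
import random

def subsample_duel(rewards_a, rewards_b):
    """Compare two arms on an equal footing: subsample each arm's reward
    history without replacement down to the size of the smaller history,
    then compare the empirical means of the subsamples."""
    n = min(len(rewards_a), len(rewards_b))
    sub_a = random.sample(rewards_a, n)
    sub_b = random.sample(rewards_b, n)
    return 'a' if sum(sub_a) / n >= sum(sub_b) / n else 'b'
```

Because only observed rewards are compared, no knowledge of the reward range or distribution family is needed, which matches the flexibility claimed in the abstract.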

The Baire Metric and Ultrametric for Linear Computational Time Hierarchical Clustering: Applications and Implementations 

Abstract: The context of this work is search and discovery through clustering, i.e. unsupervised classification. Case studies of the clustering of large data sets are reviewed in astronomy and, in high dimensions, in chemistry. To achieve linear computational time hierarchical clustering, the Baire metric, i.e. the longest-common-prefix metric, which is simultaneously an ultrametric, is used. For high dimensions, random projection is used. Since hierarchically clustered data can be scaled in one dimension, seriation or unidimensional scaling is an important objective. This talk will include a comparison with other uses of random projection; other related low-dimensional mapping and clustering approaches; and an overview of the (mainly) R software used.
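
The Baire distance itself is simple to state and compute: two strings at distance base^(-k), where k is the length of their longest common prefix. A short sketch (the talk's software is in R; this Python version is just for illustration):

```python
def baire_distance(x, y, base=2):
    """Baire (longest-common-prefix) distance between two strings:
    base**(-k) where k is the longest-common-prefix length, and 0 when
    the strings are equal.  It is an ultrametric:
    d(x, z) <= max(d(x, y), d(y, z))."""
    if x == y:
        return 0.0
    k = 0
    for a, b in zip(x, y):
        if a != b:
            break
        k += 1
    return base ** (-k)
```

Bucketing data by common prefixes under this metric is what yields hierarchical clustering in linear time: one pass per prefix length, with no pairwise distance matrix.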

Two problems in stochastic gradient descent: choosing the right learning rate and dealing with temporal dependencies (non-i.i.d. data) 

Abstract: We will present two ideas for dealing with two problems in online statistical learning: choosing the right step size for stochastic gradient descent and similar algorithms, and dealing with strong temporal dependencies. The performance (cumulated loss) of algorithms based on stochastic gradients is highly dependent on the chosen step size. Fortunately, it is possible to approximate the derivative of the cumulated loss with respect to the step size in an online fashion; the resulting gradient descent on the step size yields reasonable practical performance even if the initial step size is varied over several orders of magnitude, although no theoretical result has been proved yet. Next, we will consider online learning of the parameters of a dynamical system, such as a hidden Markov model or a recurrent neural network. The usual online algorithms, such as the Kalman filter, typically scale like the third power of the dimension. All faster algorithms are non-online and require going back in time at some point (Bellman equations, backpropagation through time, the forward-backward algorithm). We will present an online algorithm that maintains a random but *unbiased* estimate of the gradient of the state of the system with respect to its parameters, at a low algorithmic cost.
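
One concrete instance of the first idea is the classical "hypergradient" trick: the derivative of the current loss with respect to the step size is approximately minus the inner product of successive gradients, so the step size can itself be adapted online. This is a sketch under that interpretation (names and the meta-rate are illustrative; the talk's algorithm may differ in detail):

```python
import math

def sgd_with_adaptive_step(grad, x0, eta0, meta_rate, n_steps):
    """SGD on a scalar x with an online multiplicative update of the step
    size: eta grows when successive gradients point the same way and
    shrinks when they oppose, approximating a gradient descent on the
    cumulated loss as a function of the step size."""
    x, eta = x0, eta0
    prev_g = 0.0
    for _ in range(n_steps):
        g = grad(x)
        eta *= math.exp(meta_rate * g * prev_g)  # hypergradient step on eta
        x -= eta * g
        prev_g = g
    return x, eta
```

The multiplicative form keeps eta positive and makes the adaptation roughly scale-free in the initial step size, which is the point of the abstract's robustness claim.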

Highly Smooth Zeroth-Order Online Optimization 

Abstract: In this joint work with Francis Bach, we consider online convex optimization with noisy zeroth-order information, that is, noisy function evaluations at any desired point. We focus on problems with high degrees of smoothness, such as online logistic regression. We show that, as opposed to gradient-based algorithms, high-order smoothness may be used to improve estimation rates, with a precise dependence on the degree of smoothness and the dimension. In particular, we show that for infinitely differentiable functions we recover the same dependence on the sample size as gradient-based algorithms, with an extra dimension-dependent factor. This is done for convex and strongly convex functions, with finite-horizon and anytime algorithms.
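
The baseline such work improves on is the classical two-point zeroth-order gradient estimator: probe the function along a random direction and difference the two evaluations. A minimal sketch (this is the standard smoothing estimator, not the higher-order constructions of the talk, which exploit extra smoothness):

```python
import random

def two_point_gradient(f, x, delta):
    """Two-point zeroth-order gradient estimate: draw a uniform direction
    u on the sphere and return dim * (f(x + delta*u) - f(x - delta*u))
    / (2*delta) * u, an (approximately) unbiased estimate of grad f(x)."""
    dim = len(x)
    u = [random.gauss(0.0, 1.0) for _ in range(dim)]
    norm = sum(ui * ui for ui in u) ** 0.5
    u = [ui / norm for ui in u]                 # uniform direction
    xp = [xi + delta * ui for xi, ui in zip(x, u)]
    xm = [xi - delta * ui for xi, ui in zip(x, u)]
    scale = dim * (f(xp) - f(xm)) / (2.0 * delta)
    return [scale * ui for ui in u]
```

Plugging such estimates into gradient descent gives zeroth-order optimization; the dimension factor in the estimator's variance is where the dimension-dependent terms in the rates come from.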

Contextual Distributed Models for Sequences 

Abstract: Generative probabilistic models have many practical applications for analyzing sequences, e.g. in natural language processing and web log mining. For such models, taking the context into account (e.g. the person who is writing) is important. While a purely probabilistic approach would introduce further random variables to capture the context, in distributed generative models such as recurrent or convolutional neural networks, an exciting possibility is to use a distributed representation of the context that modifies the generative process. This is the topic of this talk.

Constraint-based Reputation Modelling on Microblogs 

Abstract: In this talk we review the modeling of corporate entities' online e-reputation based on PLS path modeling (PLS-PM) approaches. The dimensions used to analyze e-reputation are set up by the analyst as latent variables. Machine learning (ML) and natural language processing (NLP) approaches are used to label large sets of text passages. For each dimension, several estimations of the relationship with each text passage are computed, as well as opinion and priority scores. An automatic path-modeling algorithm then explains the opinion or priority scores based on the selected dimensions. Even though PLS-PM is based on standard factor analysis, it includes an iterative algorithm that makes it incompatible with objective-function optimization. However, there is an implicit Bayesian latent probabilistic model that explains the robustness of the approach.

Robust sequential learning with applications to the forecasting of electricity consumption and of exchange rates 

Abstract: Sometimes you feel you're spoilt for choice: there are so many good predictors that you could use! Why select and focus on just one? I will review the framework of robust online aggregation (also known as prediction of individual sequences, or online aggregation of expert advice). This setting explains how to combine base forecasts provided by ensemble methods. No stochastic modeling is needed, and the performance achieved is comparable to that of the best (constant convex combination of) base forecast(s). I will illustrate the technology on various data sets, including electricity consumption and exchange rates. More importantly, I will point out open issues, both on the theoretical and on the practical sides.
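
The basic aggregation rule in this framework is the exponentially weighted average forecaster: each expert's weight decays exponentially in its past loss. A minimal sketch under square loss (one standard instance of robust online aggregation; the talk covers richer variants, such as tracking the best convex combination):

```python
import math

def aggregate(expert_forecasts, outcomes, eta):
    """Exponentially weighted average forecaster: at each round, predict
    the weight-averaged expert forecast, then discount each expert's
    weight by exp(-eta * squared error) after observing the outcome."""
    n = len(expert_forecasts[0])
    weights = [1.0] * n
    predictions = []
    for forecasts, y in zip(expert_forecasts, outcomes):
        z = sum(weights)
        predictions.append(sum(w * f for w, f in zip(weights, forecasts)) / z)
        weights = [w * math.exp(-eta * (f - y) ** 2)
                   for w, f in zip(weights, forecasts)]
    return predictions
```

No assumption is made on how the outcome sequence is generated, which is exactly the "no stochastic modeling needed" property of the framework.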
