Paper Digest: COLT 2015 Highlights
The Annual Conference on Learning Theory (COLT) focuses on theoretical aspects of machine learning and related topics.
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.
If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated with new conference digests.
Paper Digest Team
team@paperdigest.org
TABLE 1: COLT 2015 Papers
# | Title | Authors | Highlight |
---|---|---|---|
1 | Conference on Learning Theory 2015: Preface | Peter Grünwald, Elad Hazan | Conference on Learning Theory 2015: Preface |
2 | Open Problem: Restricted Eigenvalue Condition for Heavy Tailed Designs | Arindam Banerjee, Sheng Chen, Vidyashankar Sivakumar | We pose the equivalent question for heavy-tailed distributions: Given a random design matrix drawn from a heavy-tailed distribution satisfying the small-ball property (Mendelson, 2015), does the design matrix satisfy the RE condition with the same order of sample complexity as sub-Gaussian distributions? |
3 | Open Problem: The landscape of the loss surfaces of multilayer networks | Anna Choromanska, Yann LeCun, Gérard Ben Arous | The question we pose is whether it is possible to drop some of these assumptions to establish a stronger connection between both models. |
4 | Open Problem: The Oracle Complexity of Smooth Convex Optimization in Nonstandard Settings | Cristóbal Guzmán | We propose a conjecture on the optimal convergence rates for these settings, for which a positive answer would lead to significant improvements on minimization algorithms for parsimonious regression models. |
5 | Open Problem: Online Sabotaged Shortest Path | Wouter M. Koolen, Manfred K. Warmuth, Dmitri Adamskiy | In this note we revisit this online routing problem in the case where in each trial some of the edges or components are sabotaged / blocked. |
6 | Open Problem: Learning Quantum Circuits with Queries | Jeremy Kun, Lev Reyzin | We pose an open problem on the complexity of learning the behavior of a quantum circuit with value injection queries. |
7 | Open Problem: Recursive Teaching Dimension Versus VC Dimension | Hans U. Simon, Sandra Zilles | We pose the following question: is the RTD upper-bounded by a function that grows only linearly in the VCD? |
8 | On Consistent Surrogate Risk Minimization and Property Elicitation | Arpit Agarwal, Shivani Agarwal | In this paper, we connect these two themes by showing that calibrated surrogate losses in supervised learning can essentially be viewed as eliciting or estimating certain properties of the underlying conditional label distribution that are sufficient to construct an optimal classifier under the target loss of interest. |
9 | Online Learning with Feedback Graphs: Beyond Bandits | Noga Alon, Nicolò Cesa-Bianchi, Ofer Dekel, Tomer Koren | We study a general class of online learning problems where the feedback is specified by a graph. |
10 | Learning Overcomplete Latent Variable Models through Tensor Methods | Animashree Anandkumar, Rong Ge, Majid Janzamin | In the unsupervised setting, a simple initialization algorithm based on SVD of the tensor slices is proposed, and the guarantees are provided under the stricter condition that k ≤ βd (where the constant β can be larger than 1). |
11 | Simple, Efficient, and Neural Algorithms for Sparse Coding | Sanjeev Arora, Rong Ge, Tengyu Ma, Ankur Moitra | Here we give a general framework for understanding alternating minimization which we leverage to analyze existing heuristics and to design new ones also with provable guarantees. |
12 | Label optimal regret bounds for online local learning | Pranjal Awasthi, Moses Charikar, Kevin A Lai, Andrej Risteski | In this work, we provide a complete answer to the question above via two main results. |
13 | Efficient Learning of Linear Separators under Bounded Noise | Pranjal Awasthi, Maria-Florina Balcan, Nika Haghtalab, Ruth Urner | We provide the first polynomial-time algorithm that can learn linear separators to arbitrarily small excess error in this noise model under the uniform distribution over the unit sphere in ℝ^d, for some constant value of η. |
14 | Efficient Representations for Lifelong Learning and Autoencoding | Maria-Florina Balcan, Avrim Blum, Santosh Vempala | In this work we pose and provide efficient algorithms for several natural theoretical formulations of this goal. |
15 | Optimally Combining Classifiers Using Unlabeled Data | Akshay Balsubramani, Yoav Freund | We develop a worst-case analysis of aggregation of classifier ensembles for binary classification. |
16 | Minimax Fixed-Design Linear Regression | Peter L. Bartlett, Wouter M. Koolen, Alan Malek, Eiji Takimoto, Manfred K. Warmuth | We consider a linear regression game in which the covariates are known in advance: at each round, the learner predicts a real value, the adversary reveals a label, and the learner incurs a squared error loss. |
17 | Escaping the Local Minima via Simulated Annealing: Optimization of Approximately Convex Functions | Alexandre Belloni, Tengyuan Liang, Hariharan Narayanan, Alexander Rakhlin | Other applications of the method discussed in this work include private computation of empirical risk minimizers, two-stage stochastic programming, and approximate dynamic programming for online learning. |
18 | Bandit Convex Optimization: √T Regret in One Dimension | Sébastien Bubeck, Ofer Dekel, Tomer Koren, Yuval Peres | Our analysis is non-constructive, as we do not present a concrete algorithm that attains this regret rate. |
19 | The entropic barrier: a simple and optimal universal self-concordant barrier | Sébastien Bubeck, Ronen Eldan | We prove that the Fenchel dual of the log-Laplace transform of the uniform measure on a convex body in ℝ^n is a (1+o(1))n-self-concordant barrier, improving a seminal result of Nesterov and Nemirovski. |
20 | Optimum Statistical Estimation with Strategic Data Sources | Yang Cai, Constantinos Daskalakis, Christos Papadimitriou | We propose an optimum mechanism for providing monetary incentives to the data sources of a statistical estimator such as linear regression, so that high quality data is provided at low cost, in the sense that the weighted sum of payments and estimation error is minimized. |
21 | On the Complexity of Learning with Kernels | Nicolò Cesa-Bianchi, Yishay Mansour, Ohad Shamir | In this paper, we study lower bounds on the error attainable by such methods as a function of the number of entries observed in the kernel matrix or the rank of an approximate kernel matrix. |
22 | Learnability of Solutions to Conjunctive Queries: The Full Dichotomy | Hubie Chen, Matthew Valeriote | In this article, we study a family of such learning problems; this family contains, for each relational structure, the problem of learning the solution space of an unknown conjunctive query evaluated on the structure. |
23 | Sequential Information Maximization: When is Greedy Near-optimal? | Yuxin Chen, S. Hamed Hassani, Amin Karbasi, Andreas Krause | In this paper, we analyze the widely used greedy policy for this task, and identify problem instances where it provides provably near-maximal utility, even in the challenging setting of persistent noise. |
24 | Efficient Sampling for Gaussian Graphical Models via Spectral Sparsification | Dehua Cheng, Yu Cheng, Yan Liu, Richard Peng, Shang-Hua Teng | We give two sparsification methods for this approach that may be of independent interest. |
25 | Stochastic Block Model and Community Detection in Sparse Graphs: A spectral algorithm with optimal rate of recovery | Peter Chin, Anup Rao, Van Vu | In this paper, we present and analyze a simple and robust spectral algorithm for the stochastic block model with k blocks, for any k fixed. |
26 | On-Line Learning Algorithms for Path Experts with Non-Additive Losses | Corinna Cortes, Vitaly Kuznetsov, Mehryar Mohri, Manfred Warmuth | We give new algorithms extending the Follow-the-Perturbed-Leader (FPL) algorithm to both of these families of loss functions and similarly give new algorithms extending the Randomized Weighted Majority (RWM) algorithm to both of these families. |
27 | Truthful Linear Regression | Rachel Cummings, Stratis Ioannidis, Katrina Ligett | We consider the problem of fitting a linear model to data held by individuals who are concerned about their privacy. |
28 | A PTAS for Agnostically Learning Halfspaces | Amit Daniely | We present a PTAS for agnostically learning halfspaces w.r.t. the uniform distribution on the d-dimensional sphere. |
29 | S2: An Efficient Graph Based Active Learning Algorithm with Application to Nonparametric Classification | Gautam Dasarathy, Robert Nowak, Xiaojin Zhu | We introduce a simple and label-efficient algorithm called S^2 for this task. |
30 | Improved Sum-of-Squares Lower Bounds for Hidden Clique and Hidden Submatrix Problems | Yash Deshpande, Andrea Montanari | Recently, Meka and Wigderson (2013) proposed a method to establish lower bounds for the hidden clique problem within the Sum of Squares (SOS) semidefinite hierarchy. |
31 | Contextual Dueling Bandits | Miroslav Dudík, Katja Hofmann, Robert E. Schapire, Aleksandrs Slivkins, Masrour Zoghi | Here, we propose a new and natural solution concept, rooted in game theory, called a von Neumann winner, a randomized policy that beats or ties every other policy. |
32 | Beyond Hartigan Consistency: Merge Distortion Metric for Hierarchical Clustering | Justin Eldridge, Mikhail Belkin, Yusu Wang | In this paper we identify two limit properties, separation and minimality, which address both over-segmentation and improper nesting and together imply (but are not implied by) Hartigan consistency. |
33 | Faster Algorithms for Testing under Conditional Sampling | Moein Falahatgar, Ashkan Jafarpour, Alon Orlitsky, Venkatadheeraj Pichapati, Ananda Theertha Suresh | We study two of the most important tests under the conditional-sampling model where each query specifies a subset S of the domain, and the response is a sample drawn from S according to the underlying distribution. |
34 | Learning and inference in the presence of corrupted inputs | Uriel Feige, Yishay Mansour, Robert Schapire | We model the classification and inference problems as a zero-sum game between a learner, minimizing the expected error, and an adversary, maximizing the expected error. |
35 | From Averaging to Acceleration, There is Only a Step-size | Nicolas Flammarion, Francis Bach | We provide a detailed analysis of the eigenvalues of the corresponding linear dynamical system, showing various oscillatory and non-oscillatory behaviors, together with a sharp stability result with explicit constants. |
36 | Variable Selection is Hard | Dean Foster, Howard Karloff, Justin Thaler | Assuming a standard complexity hypothesis, we show that no polynomial-time algorithm can find a k′-sparse x with ‖Bx − y‖² ≤ h(m,p), where k′ = k·2^{log^{1−δ} p} and h(m,p) = p^{C₁} m^{1−C₂}, for arbitrary constants δ > 0, C₁ > 0, C₂ > 0. |
37 | Vector-Valued Property Elicitation | Rafael Frongillo, Ian A. Kash | We show that linear and ratio-of-linear do admit nonseparable scores, and provide evidence for a conjecture that these are the only such properties (up to link functions). |
38 | Competing with the Empirical Risk Minimizer in a Single Pass | Roy Frostig, Rong Ge, Sham M. Kakade, Aaron Sidford | Our goal in this work is to perform as well as the ERM, on every problem, while minimizing the use of computational resources such as running time and space usage. |
39 | A Chaining Algorithm for Online Nonparametric Regression | Pierre Gaillard, Sébastien Gerchinovitz | We consider the problem of online nonparametric regression with arbitrary deterministic sequences. |
40 | Escaping From Saddle Points – Online Stochastic Gradient for Tensor Decomposition | Rong Ge, Furong Huang, Chi Jin, Yang Yuan | In this paper we identify the strict saddle property for non-convex problems that allows for efficient optimization. |
41 | Learning the dependence structure of rare events: a non-asymptotic study | Nicolas Goix, Anne Sabourin, Stéphan Clémençon | The main purpose of this paper is to fill this gap. |
42 | Thompson Sampling for Learning Parameterized Markov Decision Processes | Aditya Gopalan, Shie Mannor | We present a version of Thompson sampling for parameterized reinforcement learning problems, and derive a frequentist regret bound for priors over general parameter spaces. |
43 | Computational Lower Bounds for Community Detection on Random Graphs | Bruce Hajek, Yihong Wu, Jiaming Xu | This paper studies the problem of detecting the presence of a small dense community planted in a large Erdős-Rényi random graph G(N,q), where the edge probability within the community exceeds q by a constant factor. |
44 | Adaptive Recovery of Signals by Convex Optimization | Zaid Harchaoui, Anatoli Juditsky, Arkadi Nemirovski, Dmitry Ostrovsky | We present a theoretical framework for adaptive estimation and prediction of signals of unknown structure in the presence of noise. |
45 | Tensor principal component analysis via sum-of-square proofs | Samuel B. Hopkins, Jonathan Shi, David Steurer | We study a statistical model for the tensor principal component analysis problem introduced by Montanari and Richard: given an order-3 tensor T of the form T = τ·v₀^⊗3 + A, where τ ≥ 0 is a signal-to-noise ratio, v₀ is a unit vector, and A is a random noise tensor, the goal is to recover the planted vector v₀. |
46 | Fast Exact Matrix Completion with Finite Samples | Prateek Jain, Praneeth Netrapalli | In this paper, we present a fast iterative algorithm that solves the matrix completion problem by observing O(n r^5 log^3 n) entries, which is independent of the condition number and the desired accuracy. |
47 | Exp-Concavity of Proper Composite Losses | Parameswaran Kamalaruban, Robert Williamson, Xinhua Zhang | In this paper we provide a complete characterization of the exp-concavity of any proper composite loss. |
48 | On Learning Distributions from their Samples | Sudeep Kamath, Alon Orlitsky, Dheeraj Pichapati, Ananda Theertha Suresh | We study distribution approximations for general loss measures. |
49 | MCMC Learning | Varun Kanade, Elchanan Mossel | In this paper we initiate the investigation of extending central ideas, methods and algorithms from the theory of learning under the uniform distribution to the setup of learning concepts given examples from MRF distributions. |
50 | Online PCA with Spectral Bounds | Zohar Karnin, Edo Liberty | We describe two simple and deterministic algorithms. |
51 | Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem | Junpei Komiyama, Junya Honda, Hisashi Kashima, Hiroshi Nakagawa | We introduce a tight asymptotic regret lower bound that is based on the information divergence. |
52 | Second-order Quantile Methods for Experts and Combinatorial Games | Wouter M. Koolen, Tim Van Erven | We aim to design strategies for sequential decision making that adjust to the difficulty of the learning problem. |
53 | Hierarchical Label Queries with Data-Dependent Partitions | Samory Kpotufe, Ruth Urner, Shai Ben-David | Given a joint distribution P_{X,Y} over a space X and a label set Y = {0, 1}, we consider the problem of recovering the labels of an unlabeled sample with as few label queries as possible. |
54 | Algorithms for Lipschitz Learning on Graphs | Rasmus Kyng, Anup Rao, Sushant Sachdeva, Daniel A. Spielman | We present an algorithm that computes a minimal Lipschitz extension in expected linear time, and an algorithm that computes an absolutely minimal Lipschitz extension in expected time Õ(mn). |
55 | Low Rank Matrix Completion with Exponential Family Noise | Jean Lafond | When the sampling distribution is known, we propose another estimator and prove an oracle inequality w.r.t. the Kullback-Leibler prediction risk, which translates immediately into an upper bound on the Frobenius prediction risk. |
56 | Bad Universal Priors and Notions of Optimality | Jan Leike, Marcus Hutter | We show that Legg-Hutter intelligence and thus balanced Pareto optimality is entirely subjective, and that every policy is Pareto optimal in the class of all computable environments. |
57 | Learning with Square Loss: Localization through Offset Rademacher Complexity | Tengyuan Liang, Alexander Rakhlin, Karthik Sridharan | We introduce a notion of offset Rademacher complexity that provides a transparent way to study localization both in expectation and in high probability. |
58 | Achieving All with No Parameters: AdaNormalHedge | Haipeng Luo, Robert E. Schapire | We study the classic online learning problem of predicting with expert advice, and propose a truly parameter-free and adaptive algorithm that achieves several objectives simultaneously without using any prior information. |
59 | Lower and Upper Bounds on the Generalization of Stochastic Exponentially Concave Optimization | Mehrdad Mahdavi, Lijun Zhang, Rong Jin | In this paper we derive high-probability lower and upper bounds on the excess risk of stochastic optimization of exponentially concave loss functions. |
60 | Correlation Clustering with Noisy Partial Information | Konstantin Makarychev, Yury Makarychev, Aravindan Vijayaraghavan | In this paper, we propose and study a semi-random model for the Correlation Clustering problem on arbitrary graphs G. |
61 | Online Density Estimation of Bradley-Terry Models | Issei Matsumoto, Kohei Hatano, Eiji Takimoto | We consider an online density estimation problem for the Bradley-Terry model, where each model parameter defines the probability of a match result between any pair in a set of n teams. |
62 | First-order regret bounds for combinatorial semi-bandits | Gergely Neu | In this paper, we propose an algorithm that improves this scaling to Õ(√L_T^*), where L_T^* is the total loss of the best action. |
63 | Norm-Based Capacity Control in Neural Networks | Behnam Neyshabur, Ryota Tomioka, Nathan Srebro | We investigate the capacity, convexity and characterization of a general family of norm-constrained feed-forward networks. |
64 | Cortical Learning via Prediction | Christos H. Papadimitriou, Santosh S. Vempala | Using Valiant’s neuronal model as a foundation, we introduce PJOIN (for “predictive join”), a primitive that combines association and prediction. |
65 | Partitioning Well-Clustered Graphs: Spectral Clustering Works! | Richard Peng, He Sun, Luca Zanetti | In this work we study the widely used spectral clustering algorithms, i.e., partitioning a graph into k clusters by (1) embedding the vertices of the graph into a low-dimensional space using the bottom eigenvectors of the Laplacian matrix, and (2) partitioning the embedded points via k-means algorithms. |
66 | Batched Bandit Problems | Vianney Perchet, Philippe Rigollet, Sylvain Chassang, Erik Snowberg | Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic multi-armed bandits under the constraint that the employed policy must split trials into a small number of batches. |
67 | Hierarchies of Relaxations for Online Prediction Problems with Evolving Constraints | Alexander Rakhlin, Karthik Sridharan | Despite this, we provide polynomial-time prediction algorithms that achieve low regret against combinatorial benchmark sets. |
68 | Fast Mixing for Discrete Point Processes | Patrick Rebeschini, Amin Karbasi | We investigate the systematic mechanism for designing fast mixing Markov chain Monte Carlo algorithms to sample from discrete point processes under the Dobrushin uniqueness condition for Gibbs measures. |
69 | Generalized Mixability via Entropic Duality | Mark D. Reid, Rafael M. Frongillo, Robert C. Williamson, Nishant Mehta | In doing so we introduce a more general notion of Φ-mixability, where Φ is a general entropy (i.e., any convex function on probabilities). |
70 | On the Complexity of Bandit Linear Optimization | Ohad Shamir | This and other results we present highlight some interesting differences between full-information and bandit learning, which were not considered in previous literature. |
71 | An Almost Optimal PAC Algorithm | Hans U. Simon | In contrast to this result, we show that every consistent algorithm L (even a provably suboptimal one) induces a family (L_K)_{K≥1} of PAC algorithms (with 2K−1 calls of L as a subroutine) that come very close to optimality: the number of labeled examples needed by L_K exceeds the general lower bound only by a factor of ℓ_K(1/ε), where ℓ_K denotes (a truncated version of) the K-times iterated logarithm. |
72 | Minimax rates for memory-bounded sparse linear regression | Jacob Steinhardt, John Duchi | We establish a minimax lower bound of Ω(kd/(Bε)) on the sample size needed to estimate parameters in a k-sparse linear regression of dimension d under memory restrictions to B bits, where ε is the ℓ_2 parameter error. |
73 | Interactive Fingerprinting Codes and the Hardness of Preventing False Discovery | Thomas Steinke, Jonathan Ullman | In order to optimize our hardness result, we give a new Fourier-analytic approach to analyzing fingerprinting codes that is simpler, more flexible, and yields better parameters than previous constructions. |
74 | Convex Risk Minimization and Conditional Probability Estimation | Matus Telgarsky, Miroslav Dudík, Robert Schapire | This paper proves, in very general settings, that convex risk minimization is a procedure to select a unique conditional probability model determined by the classification problem. |
75 | Regularized Linear Regression: A Precise Analysis of the Estimation Error | Christos Thrampoulidis, Samet Oymak, Babak Hassibi | We focus on the problem of linear regression and consider a general class of optimization methods that minimize a loss function measuring the misfit of the model to the observations, with an added structure-inducing regularization term. |
76 | Max vs Min: Tensor Decomposition and ICA with nearly Linear Sample Complexity | Santosh S. Vempala, Ying Xiao | We present a simple, general technique for reducing the sample complexity of matrix and tensor decomposition algorithms applied to distributions. |
77 | On Convergence of Emphatic Temporal-Difference Learning | H. Yu | We present in this paper the first convergence proofs for two emphatic algorithms, ETD(λ) and ELSTD(λ). |