Paper Digest: COLT 2017 Highlights
The Annual Conference on Learning Theory (COLT) focuses on theoretical aspects of machine learning and related topics.
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to skim these machine-generated highlights to quickly get the main idea of each paper.
If you do not want to miss any interesting academic papers, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to receive new conference digests.
Paper Digest Team
team@paperdigest.org
TABLE 1: COLT 2017 Papers
No. | Title | Authors | Highlight |
---|---|---|---|
1 | Preface: Conference on Learning Theory (COLT), 2017 | Satyen Kale, Ohad Shamir | Preface: Conference on Learning Theory (COLT), 2017 |
2 | Open Problem: First-Order Regret Bounds for Contextual Bandits | Alekh Agarwal, Akshay Krishnamurthy, John Langford, Haipeng Luo, Robert E. Schapire | We describe two open problems related to first-order regret bounds for contextual bandits. |
3 | Open Problem: Meeting Times for Learning Random Automata | Benjamin Fish, Lev Reyzin | In this note, we propose a method to find faster algorithms for this problem. |
4 | Corralling a Band of Bandit Algorithms | Alekh Agarwal, Haipeng Luo, Behnam Neyshabur, Robert E. Schapire | As examples, we present two main applications. |
5 | Learning with Limited Rounds of Adaptivity: Coin Tossing, Multi-Armed Bandits, and Ranking from Pairwise Comparisons | Arpit Agarwal, Shivani Agarwal, Sepehr Assadi, Sanjeev Khanna | We study the relationship between query complexity and adaptivity in identifying the $k$ most biased coins among a set of $n$ coins with unknown biases. |
6 | Thompson Sampling for the MNL-Bandit | Shipra Agrawal, Vashist Avadhanula, Vineet Goyal, Assaf Zeevi | We present an approach to adapt Thompson Sampling to this problem and show that it achieves near-optimal regret as well as attractive numerical performance. |
7 | Homotopy Analysis for Tensor PCA | Anima Anandkumar, Yuan Deng, Rong Ge, Hossein Mobahi | In this paper, we analyze the class of homotopy or continuation methods for global optimization of nonconvex functions. |
8 | Correspondence retrieval | Alexandr Andoni, Daniel Hsu, Kevin Shi, Xiaorui Sun | In the case of independent standard Gaussian measurement vectors, the main algorithm proposed in this work requires $n = d+1$ measurements to correctly return the $k$ unknown points with high probability. |
9 | Efficient PAC Learning from the Crowd | Pranjal Awasthi, Avrim Blum, Nika Haghtalab, Yishay Mansour | In this paper, we show how by interleaving the process of labeling and learning, we can attain computational efficiency with much less overhead in the labeling cost. |
10 | The Price of Selection in Differential Privacy | Mitali Bafna, Jonathan Ullman | In the differentially private top-$k$ selection problem, we are given a dataset $X \in \{\pm 1\}^{n \times d}$, in which each row belongs to an individual and each column corresponds to some binary attribute, and our goal is to find a set of $k \ll d$ columns whose means are approximately as large as possible (a minimal non-private sketch of this task appears after the table). |
11 | Computationally Efficient Robust Sparse Estimation in High Dimensions | Sivaraman Balakrishnan, Simon S. Du, Jerry Li, Aarti Singh | We consider the problem of robust estimation of sparse functionals, and provide a computationally and statistically efficient algorithm in the high-dimensional setting. |
12 | Learning-Theoretic Foundations of Algorithm Configuration for Combinatorial Partitioning Problems | Maria-Florina Balcan, Vaishnavh Nagarajan, Ellen Vitercik, Colin White | Recently, Gupta and Roughgarden introduced the first learning-theoretic framework to rigorously study this problem, using it to analyze classes of greedy heuristics, parameter tuning in gradient descent, and other problems. |
13 | The Sample Complexity of Optimizing a Convex Function | Eric Balkanski, Yaron Singer | In this paper we study optimization from samples of convex functions. |
14 | Efficient Co-Training of Linear Separators under Weak Dependence | Avrim Blum, Yishay Mansour | We develop the first polynomial-time algorithm for co-training of homogeneous linear separators under *weak dependence*, a relaxation of the condition of independence given the label. |
15 | Sampling from a log-concave distribution with compact support with proximal Langevin Monte Carlo | Nicolas Brosse, Alain Durmus, Éric Moulines, Marcelo Pereyra | This paper presents a detailed theoretical analysis of the Langevin Monte Carlo sampling algorithm recently introduced in Durmus et al. (Efficient Bayesian computation by proximal Markov chain Monte Carlo: when Langevin meets Moreau, 2016) when applied to log-concave probability distributions that are restricted to a convex body $K$. |
16 | Rates of estimation for determinantal point processes | Victor-Emmanuel Brunel, Ankur Moitra, Philippe Rigollet, John Urschel | In this paper, we study the local geometry of the expected log-likelihood function to prove several rates of convergence for the MLE. |
17 | Learning Disjunctions of Predicates | Nader H. Bshouty, Dana Drachsler-Cohen, Martin Vechev, Eran Yahav | We give an algorithm for learning $\mathcal{F}_{\vee} := \{\vee_{f \in S} f \mid S \subseteq \mathcal{F}\}$ from membership queries. |
18 | Testing Bayesian Networks | Clement L. Canonne, Ilias Diakonikolas, Daniel M. Kane, Alistair Stewart | Our main contribution is the first non-trivial efficient testing algorithms for these problems and corresponding information-theoretic lower bounds. |
19 | Multi-Observation Elicitation | Sebastian Casalaina-Martin, Rafael Frongillo, Tom Morgan, Bo Waggoner | We study loss functions that measure the accuracy of a prediction based on multiple data points simultaneously. |
20 | Algorithmic Chaining and the Role of Partial Feedback in Online Nonparametric Learning | Nicolò Cesa-Bianchi, Pierre Gaillard, Claudio Gentile, Sébastien Gerchinovitz | For full information feedback and Lipschitz losses, we design the first explicit algorithm achieving the minimax regret rate (up to log factors). |
21 | Nearly Optimal Sampling Algorithms for Combinatorial Pure Exploration | Lijie Chen, Anupam Gupta, Jian Li, Mingda Qiao, Ruosong Wang | We study the combinatorial pure exploration problem Best-Set in a stochastic multi-armed bandit game. We further introduce an even more general problem, formulated in geometric terms. |
22 | Towards Instance Optimal Bounds for Best Arm Identification | Lijie Chen, Jian Li, Mingda Qiao | In this paper, we make significant progress towards a complete resolution of the gap-entropy conjecture. |
23 | Thresholding Based Outlier Robust PCA | Yeshwanth Cherapanamjeri, Prateek Jain, Praneeth Netrapalli | In this work, we provide a novel thresholding based iterative algorithm with per-iteration complexity at most linear in the data size. |
24 | Tight Bounds for Bandit Combinatorial Optimization | Alon Cohen, Tamir Hazan, Tomer Koren | We revisit the study of optimal regret rates in bandit combinatorial optimization—a fundamental framework for sequential decision making under uncertainty that abstracts numerous combinatorial prediction problems. |
25 | Online Learning Without Prior Information | Ashok Cutkosky, Kwabena Boahen | We describe a frontier of new lower bounds on the performance of such algorithms, reflecting a tradeoff between a term that depends on the optimal parameter value and a term that depends on the gradients’ rate of growth. |
26 | Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent | Arnak Dalalyan | In this paper, we revisit the recently established theoretical guarantees for the convergence of the Langevin Monte Carlo algorithm of sampling from a smooth and (strongly) log-concave density. |
27 | Depth Separation for Neural Networks | Amit Daniely | We give a simple proof that shows that poly-size depth two neural networks with (exponentially) bounded weights cannot approximate $f$ whenever $g$ cannot be approximated by a low degree polynomial. |
28 | Square Hellinger Subadditivity for Bayesian Networks and its Applications to Identity Testing | Constantinos Daskalakis, Qinxuan Pan | We show that the square Hellinger distance between two Bayesian networks on the same directed graph, $G$, is subadditive with respect to the neighborhoods of $G$. |
29 | Ten Steps of EM Suffice for Mixtures of Two Gaussians | Constantinos Daskalakis, Christos Tzamos, Manolis Zampetakis | We provide global convergence guarantees for mixtures of two Gaussians with known covariance matrices. |
30 | Learning Multivariate Log-concave Distributions | Ilias Diakonikolas, Daniel M. Kane, Alistair Stewart | We study the problem of estimating multivariate log-concave probability density functions. |
31 | Generalization for Adaptively-chosen Estimators via Stable Median | Vitaly Feldman, Thomas Steinke | We present an algorithm that estimates the expectations of $k$ arbitrary adaptively-chosen real-valued estimators using a number of samples that scales as $\sqrt{k}$. |
32 | Greed Is Good: Near-Optimal Submodular Maximization via Greedy Optimization | Moran Feldman, Christopher Harshaw, Amin Karbasi | In this paper, we show—arguably, surprisingly—that invoking the classical greedy algorithm $O(\sqrt{k})$-times leads to the (currently) fastest deterministic algorithm, called RepeatedGreedy, for maximizing a general submodular function subject to $k$-independent system constraints. |
33 | A General Characterization of the Statistical Query Complexity | Vitaly Feldman | We give applications of our techniques to two open problems in learning theory and to algorithms that are subject to memory and communication constraints. |
34 | Stochastic Composite Least-Squares Regression with Convergence Rate $O(1/n)$ | Nicolas Flammarion, Francis Bach | We study the stochastic dual averaging algorithm with a constant step-size, showing that it leads to a convergence rate of $O(1/n)$ without strong convexity assumptions. |
35 | ZigZag: A New Approach to Adaptive Online Learning | Dylan J. Foster, Alexander Rakhlin, Karthik Sridharan | To obtain such adaptive methods, we introduce novel machinery, and the resulting algorithms are not based on the standard tools of online convex optimization. |
36 | Memoryless Sequences for Differentiable Losses | Rafael Frongillo, Andrew Nobel | In this paper, we ask how changing the loss function used changes the set of memoryless sequences, and in particular, the stochastic attributes they possess. |
37 | Matrix Completion from $O(n)$ Samples in Linear Time | David Gamarnik, Quan Li, Hongyi Zhang | In this paper, we propose a new matrix completion algorithm using a novel sampling scheme based on a union of independent sparse random regular bipartite graphs. |
38 | High Dimensional Regression with Binary Coefficients. Estimating Squared Error and a Phase Transition | David Gamarnik, Ilias Zadik | We consider a sparse linear regression model $Y=Xβ^*+W$ where $X$ is an $n\times p$ matrix with i.i.d. Gaussian entries, $W$ is an $n\times 1$ noise vector with i.i.d. mean-zero Gaussian entries of standard deviation $σ$, and $β^*$ is a $p\times 1$ binary vector with support size (sparsity) $k$ (a short generative sketch of this model appears after the table). |
39 | Two-Sample Tests for Large Random Graphs Using Network Statistics | Debarghya Ghoshdastidar, Maurilio Gutzeit, Alexandra Carpentier, Ulrike von Luxburg | In this paper, we present a general principle for two-sample hypothesis testing in such scenarios without making any assumption about the network generation process. |
40 | Effective Semisupervised Learning on Manifolds | Amir Globerson, Roi Livni, Shai Shalev-Shwartz | The algorithm we analyse is similar to subspace clustering, and thus our results demonstrate that this method can be used to improve sample complexity. |
41 | Reliably Learning the ReLU in Polynomial Time | Surbhi Goel, Varun Kanade, Adam Klivans, Justin Thaler | We give the first dimension-efficient algorithms for learning Rectified Linear Units (ReLUs), which are functions of the form $\mathbf{x} \mapsto \max(0, \mathbf{w} \cdot \mathbf{x})$ with $\mathbf{w} \in \mathbb{S}^{n-1}$. |
42 | Fast Rates for Empirical Risk Minimization of Strict Saddle Problems | Alon Gonen, Shai Shalev-Shwartz | We derive bounds on the sample complexity of empirical risk minimization (ERM) in the context of minimizing non-convex risks that admit the strict saddle property. |
43 | Nearly-tight VC-dimension bounds for piecewise linear neural networks | Nick Harvey, Christopher Liaw, Abbas Mehrabian | We prove new upper and lower bounds on the VC-dimension of deep neural networks with the ReLU activation function. |
44 | Submodular Optimization under Noise | Avinatan Hassidim, Yaron Singer | In many applications, however, we do not have access to the submodular function we aim to optimize, but rather to some erroneous or noisy version of it. |
45 | Surprising properties of dropout in deep networks | David P. Helmbold, Philip M. Long | We analyze dropout in deep networks with rectified linear units and the quadratic loss. |
46 | Quadratic Upper Bound for Recursive Teaching Dimension of Finite VC Classes | Lunjia Hu, Ruihan Wu, Tianhong Li, Liwei Wang | In this work we study the quantitative relation between the recursive teaching dimension (RTD) and the VC dimension (VCD) of concept classes of finite sizes. |
47 | A Unified Analysis of Stochastic Optimization Methods Using Jump System Theory and Quadratic Constraints | Bin Hu, Peter Seiler, Anders Rantzer | We make use of the symmetry in the stochastic optimization methods and reduce these LMIs to some equivalent small LMIs whose sizes are at most 3 by 3. |
48 | The Hidden Hubs Problem | Ravindran Kannan, Santosh Vempala | We introduce the following *hidden hubs* model $H(n,k,\sigma_0, \sigma_1)$: the input is an $n \times n$ random matrix $A$ with a subset $S$ of $k$ special rows (hubs); entries in rows outside $S$ are generated from the Gaussian distribution $p_0 = N(0,\sigma_0^2)$, while for each row in $S$, an unknown subset of $k$ of its entries are generated from $p_1 = N(0,\sigma_1^2)$, $\sigma_1>\sigma_0$, and the rest of the entries from $p_0$ (a small sampler for this model appears after the table). |
49 | Predicting with Distributions | Michael Kearns, Zhiwei Steven Wu | We consider a new learning model in which a joint distribution over vector pairs $(x,y)$ is determined by an unknown function $c(x)$ that maps input vectors $x$ not to individual outputs, but to entire *distributions* over output vectors $y$. |
50 | Bandits with Movement Costs and Adaptive Pricing | Tomer Koren, Roi Livni, Yishay Mansour | We extend the model of Multi-Armed Bandit with unit switching cost to incorporate a metric between the actions. |
51 | Sparse Stochastic Bandits | Joon Kwon, Vianney Perchet, Claire Vernade | Here we consider the *sparse* case of this classical problem, in the sense that only a small number of arms, namely $s$ … |
52 | On the Ability of Neural Nets to Express Distributions | Holden Lee, Rong Ge, Tengyu Ma, Andrej Risteski, Sanjeev Arora | These models are trained using ideas like variational autoencoders and Generative Adversarial Networks. |
53 | Fundamental limits of symmetric low-rank matrix estimation | Marc Lelarge, Léo Miolane | We consider the high-dimensional inference problem where the signal is a low-rank symmetric matrix which is corrupted by additive Gaussian noise. |
54 | Robust and Proper Learning for Mixtures of Gaussians via Systems of Polynomial Inequalities | Jerry Li, Ludwig Schmidt | In this paper, we significantly improve this dependence by replacing the $1/ε$ term with $\log 1/ε$, while only increasing the exponent moderately. |
55 | Adaptivity to Noise Parameters in Nonparametric Active Learning | Andrea Locatelli, Alexandra Carpentier, Samory Kpotufe | Our contributions are both statistical and algorithmic: we establish new minimax rates for active learning under common noise conditions. |
56 | Noisy Population Recovery from Unknown Noise | Shachar Lovett, Jiapeng Zhang | In this work, we remove this assumption, and show how to recover the underlying parameters, even when the noise is unknown, in quasi-polynomial time. |
57 | Inapproximability of VC Dimension and Littlestone’s Dimension | Pasin Manurangsi, Aviad Rubinstein | We study the complexity of computing the VC Dimension and Littlestone’s Dimension. |
58 | A Second-order Look at Stability and Generalization | Andreas Maurer | A Second-order Look at Stability and Generalization |
59 | Solving SDPs for synchronization and MaxCut problems via the Grothendieck inequality | Song Mei, Theodor Misiakiewicz, Andrea Montanari, Roberto Imbuzeiro Oliveira | In this paper we study the rank-constrained version of SDPs arising in MaxCut and in $\mathbb{Z}_2$ and $\mathrm{SO}(d)$ synchronization problems. |
60 | Mixing Implies Lower Bounds for Space Bounded Learning | Dana Moshkovitz, Michal Moshkovitz | In this paper we give such a condition. |
61 | Fast rates for online learning in Linearly Solvable Markov Decision Processes | Gergely Neu, Vicenç Gómez | In the current paper, we consider an online setting where the state costs may change arbitrarily between consecutive rounds, and the learner only observes the costs at the end of each respective round. |
62 | Sample complexity of population recovery | Yury Polyanskiy, Ananda Theertha Suresh, Yihong Wu | We consider one of the two polling impediments: in lossy population recovery, a pollee may skip each question with probability $ε$; in noisy population recovery, a pollee may lie on each question with probability $ε$. |
63 | Exact tensor completion with sum-of-squares | Aaron Potechin, David Steurer | We obtain the first polynomial-time algorithm for exact tensor completion that improves over the bound implied by reduction to matrix completion. |
64 | Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis | Maxim Raginsky, Alexander Rakhlin, Matus Telgarsky | The present work provides a nonasymptotic analysis in the context of non-convex learning problems, giving finite-time guarantees for SGLD to find approximate minimizers of both empirical and population risks. |
65 | On Equivalence of Martingale Tail Bounds and Deterministic Regret Inequalities | Alexander Rakhlin, Karthik Sridharan | We study an equivalence of (i) deterministic pathwise statements appearing in the online learning literature (termed \emphregret bounds), (ii) high-probability tail bounds for the supremum of a collection of martingales (of a specific form arising from uniform laws of large numbers), and (iii) in-expectation bounds for the supremum. |
66 | Lower Bounds on Regret for Noisy Gaussian Process Bandit Optimization | Jonathan Scarlett, Ilija Bogunovic, Volkan Cevher | In this paper, we consider the problem of sequentially optimizing a black-box function $f$ based on noisy samples and bandit feedback. |
67 | An Improved Parametrization and Analysis of the EXP3++ Algorithm for Stochastic and Adversarial Bandits | Yevgeny Seldin, Gábor Lugosi | We present a new strategy for gap estimation in randomized algorithms for multiarmed bandits and combine it with the EXP3++ algorithm of Seldin and Slivkins (2014). |
68 | Fast and robust tensor decomposition with applications to dictionary learning | Tselil Schramm, David Steurer | In this work, we introduce general techniques to capture the guarantees of SOS for worst-case problems. |
69 | The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime | Max Simchowitz, Kevin Jamieson, Benjamin Recht | We propose a novel technique for analyzing adaptive sampling called the Simulator. |
70 | On Learning vs. Refutation | Salil Vadhan | Building on the work of Daniely et al. (STOC 2014, COLT 2016), we study the connection between computationally efficient PAC learning and refutation of constraint satisfaction problems. |
71 | Ignoring Is a Bliss: Learning with Large Noise Through Reweighting-Minimization | Daniel Vainsencher, Shie Mannor, Huan Xu | We propose an approach that iterates between finding a solution with minimal empirical loss and re-weighting the data, reinforcing data points where the previous solution works well. |
72 | Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch Prox | Jialei Wang, Weiran Wang, Nathan Srebro | We present and analyze statistically optimal, communication and memory efficient distributed stochastic optimization algorithms with near-linear speedups (up to $\log$-factors). |
73 | Learning Non-Discriminatory Predictors | Blake Woodworth, Suriya Gunasekar, Mesrob I. Ohannessian, Nathan Srebro | We study the problem of learning such a non-discriminatory predictor from a finite training set, both statistically and computationally. |
74 | Empirical Risk Minimization for Stochastic Convex Optimization: $O(1/n)$- and $O(1/n^2)$-type of Risk Bounds | Lijun Zhang, Tianbao Yang, Rong Jin | In this work, we strengthen the realm of ERM for SCO by exploiting smoothness and strong convexity conditions to improve the risk bounds. |
75 | A Hitting Time Analysis of Stochastic Gradient Langevin Dynamics | Yuchen Zhang, Percy Liang, Moses Charikar | We study the Stochastic Gradient Langevin Dynamics (SGLD) algorithm for non-convex optimization. |
76 | Optimal learning via local entropies and sample compression | Nikita Zhivotovskiy | In particular, we provide a new tight PAC bound for the hard-margin SVM, an extended analysis of certain empirical risk minimizers under log-concave distributions, a new variant of an online-to-batch conversion, and distribution-dependent localized bounds in the aggregation framework. |
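
The top-$k$ selection problem in paper 10 above is stated concretely enough that its non-private version can be written in a few lines. Below is a minimal sketch of that baseline only (the paper itself concerns how accuracy degrades under differential privacy); the function name, the NumPy usage, and the example sizes are our own illustrative choices.

```python
import numpy as np

def top_k_columns(X, k):
    """Non-private baseline for top-k selection: given a dataset
    X in {-1, +1}^(n x d), return the indices of the k columns whose
    means are largest."""
    means = X.mean(axis=0)                 # one mean per binary attribute (column)
    return np.argsort(means)[::-1][:k]     # indices of the k largest means

# Example: n = 1000 individuals, d = 50 binary attributes, select the top 5.
rng = np.random.default_rng(0)
X = rng.choice([-1, 1], size=(1000, 50))
print(top_k_columns(X, k=5))
```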
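
Similarly, paper 38 above spells out its sparse linear regression model $Y = Xβ^* + W$, so a short generative sketch may help fix notation. This is only an illustration of the stated model, not the paper's estimation procedure; the function name and the example dimensions are hypothetical.

```python
import numpy as np

def sample_binary_sparse_regression(n, p, k, sigma, seed=None):
    """Draw (Y, X, beta_star) from Y = X beta* + W: X is n x p with i.i.d.
    standard Gaussian entries, W is an n-vector of i.i.d. N(0, sigma^2)
    noise, and beta* is a binary p-vector with support size (sparsity) k."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    beta_star = np.zeros(p)
    beta_star[rng.choice(p, size=k, replace=False)] = 1.0   # k ones, rest zeros
    W = sigma * rng.standard_normal(n)
    return X @ beta_star + W, X, beta_star

Y, X, beta_star = sample_binary_sparse_regression(n=200, p=1000, k=10, sigma=0.5)
```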
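
Finally, the hidden hubs model $H(n, k, σ_0, σ_1)$ of paper 48 above is fully specified in its highlight, and the sampler below follows that definition directly. Choosing the hub rows and the special positions uniformly at random is our own assumption, as are the function name and example parameters.

```python
import numpy as np

def sample_hidden_hubs(n, k, sigma0, sigma1, seed=None):
    """Draw an n x n matrix from the hidden hubs model H(n, k, sigma0, sigma1):
    entries in rows outside the hub set S are i.i.d. N(0, sigma0^2); each of
    the k hub rows has k entries (at unknown positions) drawn from
    N(0, sigma1^2), and its remaining entries come from N(0, sigma0^2)."""
    rng = np.random.default_rng(seed)
    A = rng.normal(0.0, sigma0, size=(n, n))        # background entries
    hubs = rng.choice(n, size=k, replace=False)      # the hidden hub rows S
    for i in hubs:
        cols = rng.choice(n, size=k, replace=False)  # unknown special positions
        A[i, cols] = rng.normal(0.0, sigma1, size=k)
    return A, hubs

A, hubs = sample_hidden_hubs(n=300, k=17, sigma0=1.0, sigma1=3.0, seed=0)
```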