Paper Digest: COLT 2020 Highlights
The Annual Conference on Learning Theory (COLT) addresses theoretical aspects of machine learning and related topics.
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.
If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.
Paper Digest Team
team@paperdigest.org
TABLE 1: COLT 2020 Papers
No. | Title | Authors | Highlight |
---|---|---|---|
1 | Domain Compression and its Application to Randomness-Optimal Distributed Goodness-of-Fit | Jayadev Acharya, Clément L Canonne, Yanjun Han, Ziteng Sun, Himanshu Tyagi | In this work, we provide a complete understanding of the interplay between the amount of shared randomness available, the stringency of information constraints, and the sample complexity of the testing problem by characterizing a tight trade-off between these three parameters. |
2 | Distributed Signal Detection under Communication Constraints | Jayadev Acharya, Clément L Canonne, Himanshu Tyagi | We study this distributed testing problem with and without the availability of a common randomness shared by the users. |
3 | Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes | Alekh Agarwal, Sham M Kakade, Jason D Lee, Gaurav Mahajan | This work provides provable characterizations of computational, approximation, and sample size issues with regards to policy gradient methods in the context of discounted Markov Decision Processes (MDPs). |
4 | Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal | Alekh Agarwal, Sham Kakade, Lin F. Yang | This work considers the sample and computational complexity of obtaining an $\varepsilon$-optimal policy in a discounted Markov Decision Process (MDP), given only access to a generative model. |
5 | From Nesterov's Estimate Sequence to Riemannian Acceleration | Kwangjun Ahn, Suvrit Sra | We propose the first global accelerated gradient method for Riemannian manifolds. |
6 | Closure Properties for Private Classification and Online Prediction | Noga Alon, Amos Beimel, Shay Moran, Uri Stemmer | As a corollary, we derive closure properties for online learning and private PAC learning. |
7 | Hierarchical Clustering: A 0.585 Revenue Approximation | Noga Alon, Yossi Azar, Danny Vainstein | Hierarchical Clustering: A 0.585 Revenue Approximation |
8 | Winnowing with Gradient Descent | Ehsan Amid, Manfred K. Warmuth | The performance of multiplicative updates is typically logarithmic in the number of features when the targets are sparse. Strikingly, we show that the same property can also be achieved with gradient descent updates. |
9 | Pan-Private Uniformity Testing | Kareem Amin, Matthew Joseph, Jieming Mao | We study the intermediate model of \emph{pan-privacy}. |
10 | Dimension-Free Bounds for Chasing Convex Functions | C.J. Argue, Anupam Gupta, Guru Guruganesh | We consider the problem of chasing convex functions, where functions arrive over time. |
11 | Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations | Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Ayush Sekhari, Karthik Sridharan | We design an algorithm which finds an $\epsilon$-approximate stationary point (with $\|\nabla F(x)\|\le \epsilon$) using $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed. |
12 | Data-driven confidence bands for distributed nonparametric regression | Valeriy Avanesov | In this paper we suggest a novel computationally efficient fully data-driven algorithm, quantifying uncertainty of this method, yielding frequentist $L_2$-confidence bands. |
13 | Estimating Principal Components under Adversarial Perturbations | Pranjal Awasthi, Xue Chen, Aravindan Vijayaraghavan | We study a natural model of robustness for high-dimensional statistical estimation problems that we call the {\em adversarial perturbation model}. |
14 | Active Local Learning | Arturs Backurs, Avrim Blum, Neha Gupta | In this work we consider active {\em local learning}: given a query point $x$, and active access to an unlabeled training set $S$, output the prediction $h(x)$ of a near-optimal $h \in H$ using significantly fewer labels than would be needed to actually learn $h$ fully. |
15 | Finite Regret and Cycles with Fixed Step-Size via Alternating Gradient Descent-Ascent | James P. Bailey, Gauthier Gidel, Georgios Piliouras | In this paper, we eliminate these negative properties by considering a different implementation to obtain $O\left( \nicefrac{1}{T}\right)$ time-average regret via arbitrary fixed step-size. |
16 | Calibrated Surrogate Losses for Adversarially Robust Classification | Han Bao, Clay Scott, Masashi Sugiyama | In this work, we consider the question of which surrogate losses are \emph{calibrated} with respect to the adversarial 0-1 loss, meaning that minimization of the former implies minimization of the latter. |
17 | Complexity Guarantees for Polyak Steps with Momentum | Mathieu Barré, Adrien Taylor, Alexandre d'Aspremont | In this work, we study a class of methods, based on Polyak steps, where this knowledge is substituted by that of the optimal value, $f_*$. |
18 | Free Energy Wells and Overlap Gap Property in Sparse PCA | Gérard Ben Arous, Alexander S. Wein, Ilias Zadik | We study a variant of the sparse PCA (principal component analysis) problem in the “hard” regime, where the inference task is possible yet no polynomial-time algorithm is known to exist. |
19 | Implicit regularization for deep neural networks driven by an Ornstein-Uhlenbeck like process | Guy Blanc, Neha Gupta, Gregory Valiant, Paul Valiant | We consider networks, trained via stochastic gradient descent to minimize $\ell_2$ loss, with the training labels perturbed by independent noise at each iteration. |
20 | Hardness of Identity Testing for Restricted Boltzmann Machines and Potts models | Antonio Blanca, Zongchen Chen, Daniel Štefankovič, Eric Vigoda | We study identity testing for restricted Boltzmann machines (RBMs), and more generally for undirected graphical models. |
21 | Selfish Robustness and Equilibria in Multi-Player Bandits | Etienne Boursier, Vianney Perchet | We provide the first algorithm robust to selfish players (a.k.a. Nash equilibrium) with a logarithmic regret, when the arm performance is observed. |
22 | Proper Learning, Helly Number, and an Optimal SVM Bound | Olivier Bousquet, Steve Hanneke, Shay Moran, Nikita Zhivotovskiy | In this paper we aim to characterize the classes for which the optimal sample complexity can be achieved by a proper learning algorithm. |
23 | Sharper Bounds for Uniformly Stable Algorithms | Olivier Bousquet, Yegor Klochkov, Nikita Zhivotovskiy | This paper is devoted to these questions: firstly, inspired by the original arguments of Feldman and Vondrak (2019), we provide a short proof of the moment bound that implies the generalization bound stronger than both recent results in Feldman and Vondrak (2018, 2019). |
24 | The Gradient Complexity of Linear Regression | Mark Braverman, Elad Hazan, Max Simchowitz, Blake Woodworth | We investigate the computational complexity of several basic linear algebra primitives, including largest eigenvector computation and linear regression, in the computational model that allows access to the data via a matrix-vector product oracle. |
25 | Reducibility and Statistical-Computational Gaps from Secret Leakage | Matthew Brennan, Guy Bresler | The insight in this work is that a slight generalization of the planted clique conjecture – secret leakage planted clique ($\textsc{pc}_\rho$), wherein a small amount of information about the hidden clique is revealed – gives rise to a variety of new average-case reduction techniques, yielding a web of reductions relating statistical problems with very different structure. |
26 | A Corrective View of Neural Networks: Representation, Memorization and Learning | Guy Bresler, Dheeraj Nagaraj | We develop a \emph{corrective mechanism} for neural network approximation: the total available non-linear units are divided into multiple groups and the first group approximates the function under consideration, the second approximates the error in approximation produced by the first group and corrects it, the third group approximates the error produced by the first and second groups together and so on. |
27 | ID3 Learns Juntas for Smoothed Product Distributions | Alon Brutzkus, Amit Daniely, Eran Malach | In this paper, we analyze the ID3 algorithm, when the target function is a $k$-Junta, a function that depends on $k$ out of $n$ variables of the input. |
28 | Coordination without communication: optimal regret in two players multi-armed bandits | Sébastien Bubeck, Thomas Budzinski | Under the assumption that shared randomness is available, we propose a strategy with no collisions at all between the players (with very high probability), and with near-optimal regret $O(\sqrt{T \log(T)})$. |
29 | How to Trap a Gradient Flow | Sébastien Bubeck, Dan Mikulincer | We consider the problem of finding an $\epsilon$-approximate stationary point of a smooth function on a compact domain of $\mathbb{R}^d$. |
30 | Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without | Sébastien Bubeck, Yuanzhi Li, Yuval Peres, Mark Sellke | We prove the first $\sqrt{T}$-type regret guarantee for this problem, assuming only two players, and under the feedback model where collisions are announced to the colliding players. |
31 | Highly smooth minimization of non-smooth problems | Brian Bullins | We establish improved rates for structured \emph{non-smooth} optimization problems by means of near-optimal higher-order accelerated methods. |
32 | Efficient, Noise-Tolerant, and Private Learning via Boosting | Mark Bun, Marco Leandro Carmosino, Jessica Sorrell | We introduce a simple framework for designing private boosting algorithms. |
33 | The estimation error of general first order methods | Michael Celentano, Andrea Montanari, Yuchen Wu | Here we consider two families of high-dimensional estimation problems: high-dimensional regression and low-rank matrix estimation, and introduce a class of ‘general first order methods’ that aim at efficiently estimating the underlying parameters. |
34 | Bounds in query learning | Hunter Chase, James Freitag | We introduce new combinatorial quantities for concept classes, and prove lower and upper bounds for learning complexity in several models of learning in terms of various combinatorial quantities. |
35 | Learning Polynomials in Few Relevant Dimensions | Sitan Chen, Raghu Meka | In this work we consider the important case where the covariates are Gaussian. |
36 | The Influence of Shape Constraints on the Thresholding Bandit Problem | James Cheshire, Pierre Menard, Alexandra Carpentier | We investigate the stochastic \emph{Thresholding Bandit problem} (\textit{TBP}) under several \emph{shape constraints}. |
37 | Gradient descent algorithms for Bures-Wasserstein barycenters | Sinho Chewi, Tyler Maunu, Philippe Rigollet, Austin J. Stromme | We study first order methods to compute the barycenter of a probability distribution $P$ over the space of probability measures with finite second moment. |
38 | Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss | Lénaïc Chizat, Francis Bach | Towards understanding this phenomenon, we analyze the training and generalization behavior of infinitely wide two-layer neural networks with homogeneous activations. |
39 | ODE-Inspired Analysis for the Biological Version of Oja's Rule in Solving Streaming PCA | Chi-Ning Chou, Mien Brabeeba Wang | In this work, we give the first convergence rate analysis for the biological version of Oja’s rule in solving streaming PCA. |
40 | Pessimism About Unknown Unknowns Inspires Conservatism | Michael K. Cohen, Marcus Hutter | Our first main contribution is: given an assumption about the agent’s model class, a sufficiently pessimistic agent does not cause “unprecedented events” with probability $1-\delta$, whether or not designers know how to precisely specify those precedents they are concerned with. |
41 | Optimal Group Testing | Amin Coja-Oghlan, Oliver Gebhard, Max Hahn-Klimroth, Philipp Loick | In the group testing problem, which goes back to the work of Dorfman (1943), we aim to identify a small set of $k\sim n^\theta$ infected individuals out of a population size $n$, $0<\theta<1$. |
42 | PAC learning with stable and private predictions | Yuval Dagan, Vitaly Feldman | We study binary classification algorithms for which the prediction on any point is not too sensitive to individual examples in the dataset. |
43 | High probability guarantees for stochastic convex optimization | Damek Davis, Dmitriy Drusvyatskiy | In this work, we show that a wide class of stochastic optimization algorithms for strongly convex problems can be augmented with high confidence bounds at an overhead cost that is only logarithmic in the confidence level and polylogarithmic in the condition number. |
44 | Halpern Iteration for Near-Optimal and Parameter-Free Monotone Inclusion and Strong Solutions to Variational Inequalities | Jelena Diakonikolas | We leverage the connections between nonexpansive maps, monotone Lipschitz operators, and proximal mappings to obtain near-optimal (i.e., optimal up to poly-log factors in terms of iteration complexity) and parameter-free methods for solving monotone inclusion problems. |
45 | Approximation Schemes for ReLU Regression | Ilias Diakonikolas, Surbhi Goel, Sushrut Karmalkar, Adam R. Klivans, Mahdi Soltanolkotabi | Our main insight is a new characterization of {\em surrogate losses} for nonconvex activations. |
46 | Learning Halfspaces with Massart Noise Under Structured Distributions | Ilias Diakonikolas, Vasilis Kontonis, Christos Tzamos, Nikos Zarifis | We study the problem of learning halfspaces with Massart noise in the distribution-specific PAC model. |
47 | Algorithms and SQ Lower Bounds for PAC Learning One-Hidden-Layer ReLU Networks | Ilias Diakonikolas, Daniel M. Kane, Vasilis Kontonis, Nikos Zarifis | We study the problem of PAC learning one-hidden-layer ReLU networks with $k$ hidden units on $\mathbb{R}^d$ under Gaussian marginals in the presence of additive label noise. |
48 | Consistent recovery threshold of hidden nearest neighbor graphs | Jian Ding, Yihong Wu, Jiaming Xu, Dana Yang | Motivated by applications such as discovering strong ties in social networks and assembling genome subsequences in biology, we study the problem of recovering a hidden $2k$-nearest neighbor (NN) graph in an $n$-vertex complete graph, whose edge weights are independent and distributed according to $P_n$ for edges in the hidden $2k$-NN graph and $Q_n$ otherwise. |
49 | Root-n-Regret for Learning in Markov Decision Processes with Function Approximation and Low Bellman Rank | Kefan Dong, Jian Peng, Yining Wang, Yuan Zhou | In this paper, we consider the problem of online learning of Markov decision processes (MDPs) with very large state spaces. |
50 | Embedding Dimension of Polyhedral Losses | Jessie Finocchiaro, Rafael Frongillo, Bo Waggoner | In this work, we study the notion of the embedding dimension of a given discrete loss: the minimum dimension d such that an embedding exists. |
51 | Efficient Parameter Estimation of Truncated Boolean Product Distributions | Dimitris Fotakis, Alkis Kalavasis, Christos Tzamos | We study the problem of estimating the parameters of a Boolean product distribution in $d$ dimensions, when the samples are truncated by a set $S \subset \{0, 1\}^d$ accessible through a membership oracle. |
52 | Rigorous Guarantees for Tyler's M-Estimator via Quantum Expansion | William Cole Franks, Ankur Moitra | Here we observe a surprising connection between Tyler’s M-estimator and operator scaling, which has been intensively studied in recent years in part because of its connections to the Brascamp-Lieb inequality in analysis. |
53 | From tree matching to sparse graph alignment | Luca Ganassali, Laurent Massoulié | In this paper we consider alignment of sparse graphs, for which we introduce the Neighborhood Tree Matching Algorithm (NTMA). |
54 | On the Convergence of Stochastic Gradient Descent with Low-Rank Projections for Convex Low-Rank Matrix Problems | Dan Garber | We revisit the use of Stochastic Gradient Descent (SGD) for solving convex optimization problems that serve as highly popular convex relaxations for many important low-rank matrix recovery problems such as matrix completion, phase retrieval, and more. |
55 | Asymptotic Errors for High-Dimensional Convex Penalized Linear Regression beyond Gaussian Matrices | Cédric Gerbelot, Alia Abbara, Florent Krzakala | We consider the problem of learning a coefficient vector $\mathbf{x}_0 \in \mathbb{R}^N$ from noisy linear observations $\mathbf{y} = \mathbf{F}\mathbf{x}_{0}+\mathbf{w} \in \mathbb{R}^M$ in the high-dimensional limit $M,N \to \infty$ with $\alpha \equiv M/N$ fixed. |
56 | No-Regret Prediction in Marginally Stable Systems | Udaya Ghai, Holden Lee, Karan Singh, Cyril Zhang, Yi Zhang | We consider the problem of online prediction in a marginally stable linear dynamical system subject to bounded adversarial or (non-isotropic) stochastic perturbations. |
57 | Last Iterate is Slower than Averaged Iterate in Smooth Convex-Concave Saddle Point Problems | Noah Golowich, Sarath Pattathil, Constantinos Daskalakis, Asuman Ozdaglar | In this paper we study the smooth convex-concave saddle point problem. |
58 | Locally Private Hypothesis Selection | Sivakanth Gopi, Gautam Kamath, Janardhan Kulkarni, Aleksandar Nikolov, Zhiwei Steven Wu, Huanyu Zhang | Given samples from an unknown probability distribution $p$ and a set of $k$ probability distributions $\mathcal{Q}$, we aim to output, under the constraints of $\varepsilon$-differential privacy, a distribution from $\mathcal{Q}$ whose total variation distance to $p$ is comparable to the best such distribution. |
59 | Bessel Smoothing and Multi-Distribution Property Estimation | Yi Hao, Ping Li | We consider a basic problem in statistical learning: estimating properties of multiple discrete distributions. |
60 | Faster Projection-free Online Learning | Elad Hazan, Edgar Minasyan | In this paper we give an efficient projection-free algorithm that guarantees $T^{2/3}$ regret for general online convex optimization with smooth cost functions and one linear optimization computation per iteration. |
61 | Near-Optimal Methods for Minimizing Star-Convex Functions and Beyond | Oliver Hinder, Aaron Sidford, Nimit Sohoni | In this paper, we provide near-optimal accelerated first-order methods for minimizing a broad class of smooth nonconvex functions that are unimodal on all lines through a minimizer. |
62 | A Greedy Anytime Algorithm for Sparse PCA | Guy Holtzman, Adam Soffer, Dan Vilenchik | We propose a new greedy algorithm for the $\ell_0$-sparse PCA problem which supports the calibration principle. |
63 | Noise-tolerant, Reliable Active Classification with Comparison Queries | Max Hopkins, Daniel Kane, Shachar Lovett, Gaurav Mahajan | By introducing comparisons, an additional type of query comparing two points, we provide the first time and query efficient algorithms for learning non-homogeneous linear separators robust to bounded (Massart) noise. |
64 | Smooth Contextual Bandits: Bridging the Parametric and Non-differentiable Regret Regimes | Yichun Hu, Nathan Kallus, Xiaojie Mao | We study a nonparametric contextual bandit problem where the expected reward functions belong to a Hölder class with smoothness parameter $\beta$. |
65 | Extrapolating the profile of a finite population | Soham Jana, Yury Polyanskiy, Yihong Wu | We study a prototypical problem in empirical Bayes. |
66 | Precise Tradeoffs in Adversarial Training for Linear Regression | Adel Javanmard, Mahdi Soltanolkotabi, Hamed Hassani | In this paper we provide a precise and comprehensive understanding of the role of adversarial training in the context of linear regression with Gaussian features. |
67 | Robust causal inference under covariate shift via worst-case subpopulation treatment effects | Sookyo Jeong, Hongseok Namkoong | We propose a notion of worst-case treatment effect (WTE) across all subpopulations of a given size, a conservative notion of topline treatment effect. |
68 | Efficient improper learning for online logistic regression | Rémi Jézéquel, Pierre Gaillard, Alessandro Rudi | In this work, we design an efficient improper algorithm that avoids this exponential constant while preserving a logarithmic regret. |
69 | Gradient descent follows the regularization path for general losses | Ziwei Ji, Miroslav Dudík, Robert E. Schapire, Matus Telgarsky | In this work, we show that for empirical risk minimization over linear predictors with \emph{arbitrary} convex, strictly decreasing losses, if the risk does not attain its infimum, then the gradient-descent path and the \emph{algorithm-independent} regularization path converge to the same direction (whenever either converges to a direction). |
70 | Provably efficient reinforcement learning with linear function approximation | Chi Jin, Zhuoran Yang, Zhaoran Wang, Michael I Jordan | This paper presents the first provable RL algorithm with both polynomial runtime and polynomial sample complexity in this linear setting, without requiring a “simulator” or additional assumptions. |
71 | Finite Time Analysis of Linear Two-timescale Stochastic Approximation with Markovian Noise | Maxim Kaledin, Eric Moulines, Alexey Naumov, Vladislav Tadic, Hoi-To Wai | In this paper, we provide a finite-time analysis for linear two timescale SA. |
72 | Private Mean Estimation of Heavy-Tailed Distributions | Gautam Kamath, Vikrant Singhal, Jonathan Ullman | We give new upper and lower bounds on the minimax sample complexity of differentially private mean estimation of distributions with bounded $k$-th moments. |
73 | Approximate is Good Enough: Probabilistic Variants of Dimensional and Margin Complexity | Pritish Kamath, Omar Montasser, Nathan Srebro | We present and study approximate notions of dimensional and margin complexity, which correspond to the minimal dimension or norm of an embedding required to {\em approximate}, rather than exactly represent, a given hypothesis class. |
74 | Privately Learning Thresholds: Closing the Exponential Gap | Haim Kaplan, Katrina Ligett, Yishay Mansour, Moni Naor, Uri Stemmer | In this work we reduce this gap significantly, almost settling the sample complexity. |
75 | Online Learning with Vector Costs and Bandits with Knapsacks | Thomas Kesselheim, Sahil Singla | We introduce online learning with vector costs ($OLVC_p$) where in each time step $t \in \{1,\ldots, T\}$, we need to play an action $i \in \{1,\ldots,n\}$ that incurs an unknown vector cost in $[0,1]^d$. |
76 | Universal Approximation with Deep Narrow Networks | Patrick Kidger, Terry Lyons | Here we consider the natural ‘dual’ scenario for networks of bounded width and arbitrary depth. |
77 | Information Directed Sampling for Linear Partial Monitoring | Johannes Kirschner, Tor Lattimore, Andreas Krause | We introduce {\em information directed sampling} (IDS) for stochastic partial monitoring with a linear reward and observation structure. |
78 | New Potential-Based Bounds for Prediction with Expert Advice | Vladimir A. Kobzar, Robert V. Kohn, Zhilei Wang | This work addresses the classic machine learning problem of online prediction with expert advice. |
79 | On Suboptimality of Least Squares with Application to Estimation of Convex Bodies | Gil Kur, Alexander Rakhlin, Adityanand Guntuboyina | We develop a technique for establishing lower bounds on the sample complexity of Least Squares (or, Empirical Risk Minimization) for large classes of functions. |
80 | The EM Algorithm gives Sample-Optimality for Learning Mixtures of Well-Separated Gaussians | Jeongyeol Kwon, Constantine Caramanis | We consider the problem of spherical Gaussian Mixture models with $k \geq 3$ components when the components are well separated. |
81 | Exploration by Optimisation in Partial Monitoring | Tor Lattimore, Csaba Szepesvári | We provide a novel algorithm for adversarial k-action d-outcome partial monitoring that is adaptive, intuitive and efficient. |
82 | A Closer Look at Small-loss Bounds for Bandits with Graph Feedback | Chung-Wei Lee, Haipeng Luo, Mengxiao Zhang | Specifically, we develop an algorithm with regret $\mathcal{\tilde{O}}(\sqrt{\kappa L_*})$ where $\kappa$ is the clique partition number and $L_*$ is the loss of the best arm, and for the special case of self-aware graphs where every arm has a self-loop, we improve the regret to $\mathcal{\tilde{O}}(\min\{\sqrt{\alpha T}, \sqrt{\kappa L_*}\})$ where $\alpha \leq \kappa$ is the independence number. |
83 | Logsmooth Gradient Concentration and Tighter Runtimes for Metropolized Hamiltonian Monte Carlo | Yin Tat Lee, Ruoqi Shen, Kevin Tian | We show that the gradient norm $\|\nabla f(x)\|$ for $x \sim \exp(-f(x))$, where $f$ is strongly convex and smooth, concentrates tightly around its mean. |
84 | A Fast Spectral Algorithm for Mean Estimation with Sub-Gaussian Rates | Zhixian Lei, Kyle Luh, Prayaag Venkat, Fred Zhang | In this work, we show that it is possible to go beyond SDP and achieve better computational efficiency. |
85 | Learning Over-Parametrized Two-Layer Neural Networks beyond NTK | Yuanzhi Li, Tengyu Ma, Hongyang R. Zhang | We consider the dynamic of gradient descent for learning a two-layer neural network. |
86 | On the Multiple Descent of Minimum-Norm Interpolants and Restricted Lower Isometry of Kernels | Tengyuan Liang, Alexander Rakhlin, Xiyu Zhai | We study the risk of minimum-norm interpolants of data in Reproducing Kernel Hilbert Spaces. |
87 | Learning Entangled Single-Sample Gaussians in the Subset-of-Signals Model | Yingyu Liang, Hui Yuan | We propose the subset-of-signals model where an unknown subset of $m$ variances are bounded by 1 while there are no assumptions on the other variances. |
88 | Near-Optimal Algorithms for Minimax Optimization | T. Lin, C. Jin, M. I. Jordan | This paper presents the first algorithm with $\tilde{O}(\sqrt{\kappa_x \kappa_y})$ gradient complexity, matching the lower bound up to logarithmic factors. |
89 | Better Algorithms for Estimating Non-Parametric Models in Crowd-Sourcing and Rank Aggregation | Allen Liu, Ankur Moitra | Here we introduce a framework for exploiting global information in shape-constrained estimation problems. |
90 | Tight Lower Bounds for Combinatorial Multi-Armed Bandits | Nadav Merlis, Shie Mannor | In this work, we prove regret lower bounds for combinatorial bandits that hold under mild assumptions for all smooth reward functions. |
91 | Lipschitz and Comparator-Norm Adaptivity in Online Learning | Zakaria Mhammedi, Wouter M. Koolen | We study Online Convex Optimization in the unbounded setting where neither predictions nor gradient are constrained. The goal is to simultaneously adapt to both the sequence of gradients and the comparator. |
92 | Information Theoretic Optimal Learning of Gaussian Graphical Models | Sidhant Misra, Marc Vuffray, Andrey Y. Lokhov | In this paper, we constructively answer this question and propose an algorithm, termed DICE, whose sample complexity matches the information-theoretic lower bound up to a universal constant factor. |
93 | Parallels Between Phase Transitions and Circuit Complexity? | Ankur Moitra, Elchanan Mossel, Colin Sandon | In this work, we study the circuit complexity of inference in the broadcast tree model, which has important applications in phylogenetic reconstruction and close connections to community detection. |
94 | On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and Non-Asymptotic Concentration | Wenlong Mou, Chris Junchi Li, Martin J Wainwright, Peter L Bartlett, Michael I Jordan | When the matrix $\bar{A}$ is Hurwitz, we prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity. |
95 | Extending Learnability to Auxiliary-Input Cryptographic Primitives and Meta-PAC Learning | Mikito Nanashima | In this paper, we formulate a task of determining efficient learnability as a meta-PAC learning problem and show that our meta-PAC learning is exactly as hard as PAC learning. |
96 | Fast Rates for Online Prediction with Abstention | Gergely Neu, Nikita Zhivotovskiy | In the setting of sequential prediction of individual $(0, 1)$-sequences with expert advice, we show that by allowing the learner to abstain from the prediction by paying a cost marginally smaller than $0.5$ (say, $0.49$), it is possible to achieve expected regret bounds that are independent of the time horizon T. |
97 | Efficient and robust algorithms for adversarial linear contextual bandits | Gergely Neu, Julia Olkhovskaya | Under the assumption that the $d$-dimensional contexts are generated i.i.d. at random from a known distribution, we develop computationally efficient algorithms based on the classic Exp3 algorithm. |
98 | An $\widetilde{\mathcal{O}}(m/\varepsilon^{3.5})$-Cost Algorithm for Semidefinite Programs with Diagonal Constraints | Yin Tat Lee, Swati Padmanabhan | Our key technical contribution is to combine an approximate variant of the Arora-Kale framework of mirror descent for SDPs with the idea of trading off exact computations in every iteration for variance-reduced estimations in most iterations, only periodically resetting the accumulated error with exact computations. |
99 | Costly Zero Order Oracles | Renato Paes Leme, Jon Schneider | We study optimization with an approximate zero order oracle where there is a cost $c(\epsilon)$ associated with querying the oracle with $\epsilon$ accuracy. |
100 | Adaptive Submodular Maximization under Stochastic Item Costs | Srinivasan Parthasarathy | In this work, we develop adaptive policies for maximizing such functions when both the utility function and item costs may be stochastic. |
101 | Covariance-adapting algorithm for semi-bandits with application to sparse outcomes | Pierre Perrault, Michal Valko, Vianney Perchet | We investigate \emph{stochastic combinatorial semi-bandits}, where the entire joint distribution of outcomes impacts the complexity of the problem instance (unlike in the standard bandits). |
102 | Finite-Time Analysis of Asynchronous Stochastic Approximation and $Q$-Learning | Guannan Qu, Adam Wierman | We consider a general asynchronous Stochastic Approximation (SA) scheme featuring a weighted infinity-norm contractive operator, and prove a bound on its finite-time convergence rate on a single trajectory. |
103 | List Decodable Subspace Recovery | Prasad Raghavendra, Morris Yau | In this work, we study robust statistics in the presence of overwhelming outliers for the fundamental problem of subspace recovery. |
104 | Tsallis-INF for Decoupled Exploration and Exploitation in Multi-armed Bandits | Chloé Rouyer, Yevgeny Seldin | We derive a new algorithm using regularization by Tsallis entropy to achieve best of both worlds guarantees. |
105 | How Good is SGD with Random Shuffling? | Itay Safran, Ohad Shamir | In this paper, we provide lower bounds on the expected optimization error with these heuristics (using SGD with any constant step size), which elucidate their advantages and disadvantages. |
106 | A Nearly Optimal Variant of the Perceptron Algorithm for the Uniform Distribution on the Unit Sphere | Marco Schmalhofer | We show a simple perceptron-like algorithm to learn origin-centered halfspaces in $\mathbb{R}^n$ with accuracy $1-\epsilon$ and confidence $1-\delta$ in time $\mathcal{O}\left(\frac{n^2}{\epsilon}\left(\log \frac{1}{\epsilon}+\log \frac{1}{\delta}\right)\right)$ using $\mathcal{O}\left(\frac{n}{\epsilon}\left(\log \frac{1}{\epsilon}+\log \frac{1}{\delta}\right)\right)$ labeled examples drawn uniformly from the unit $n$-sphere. |
107 | Logistic Regression Regret: What's the Catch? | Gil I Shamir | We address the problem of the achievable regret rates with online logistic regression. |
108 | Improper Learning for Non-Stochastic Control | Max Simchowitz, Karan Singh, Elad Hazan | We introduce a controller parametrization based on the denoised observations, and prove that applying online gradient descent to this parametrization yields a new controller which attains sublinear regret vs. a large class of closed-loop policies. |
109 | Reasoning About Generalization via Conditional Mutual Information | Thomas Steinke, Lydia Zakynthinou | We provide an information-theoretic framework for studying the generalization properties of machine learning algorithms. |
110 | Estimation and Inference with Trees and Forests in High Dimensions | Vasilis Syrgkanis, Manolis Zampetakis | In this work, we analyze the performance of regression trees and forests with binary features in the high-dimensional regime, where the number of features can grow exponentially with the number of samples. |
111 | Balancing Gaussian vectors in high dimension | Paxton Turner, Raghu Meka, Philippe Rigollet | We present a randomized polynomial-time algorithm that achieves discrepancy $e^{-\Omega(\log^2(n)/m)}$ with high probability, provided that $m = O(\sqrt{\log{n}})$. |
112 | Active Learning for Identification of Linear Dynamical Systems | Andrew Wagenmaker, Kevin Jamieson | We propose an algorithm to actively estimate the parameters of a linear dynamical system. |
113 | Taking a hint: How to leverage loss predictors in contextual bandits? | Chen-Yu Wei, Haipeng Luo, Alekh Agarwal | We initiate the study of learning in contextual bandits with the help of loss predictors. |
114 | Kernel and Rich Regimes in Overparametrized Models | Blake Woodworth, Suriya Gunasekar, Jason D. Lee, Edward Moroshko, Pedro Savarese, Itay Golan, Daniel Soudry, Nathan Srebro | We provide a complete and detailed analysis for a family of simple depth-$D$ linear networks that exhibit an interesting and meaningful transition between the kernel and rich regimes, and highlight an interesting role for the \emph{width} of the models. |
115 | Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium | Qiaomin Xie, Yudong Chen, Zhaoran Wang, Zhuoran Yang | In this work, we develop provably efficient reinforcement learning algorithms for two-player zero-sum Markov games with simultaneous moves. |
116 | Tree-projected gradient descent for estimating gradient-sparse parameters on graphs | Sheng Xu, Zhou Fan, Sahand Negahban | Given observations $Z_1,\ldots,Z_n$ and a smooth, convex loss function $\mathcal{L}$ for which $\boldsymbol{\theta}^*$ minimizes the population risk $\mathbb{E}[\mathcal{L}(\boldsymbol{\theta};Z_1,\ldots,Z_n)]$, we propose to estimate $\boldsymbol{\theta}^*$ by a projected gradient descent algorithm that iteratively and approximately projects gradient steps onto spaces of vectors having small gradient-sparsity over low-degree spanning trees of $G$. |
117 | Non-asymptotic Analysis for Nonparametric Testing | Yun Yang, Zuofeng Shang, Guang Cheng | We develop a non-asymptotic framework for hypothesis testing in nonparametric regression where the true regression function belongs to a Sobolev space. |
118 | Learning a Single Neuron with Gradient Methods | Gilad Yehudai, Ohad Shamir | We consider the fundamental problem of learning a single neuron $\mathbf{x}\mapsto \sigma(\mathbf{w}^\top\mathbf{x})$ in a realizable setting, using standard gradient methods with random initialization, and under general families of input distributions and activations. |
119 | Nearly Non-Expansive Bounds for Mahalanobis Hard Thresholding | Xiao-Tong Yuan, Ping Li | The core contribution of this paper is to prove that for any $\bar k$-sparse vector $\bar w$ with $\bar k < k$, the estimation error $\|\mathcal{H}_{A,k}(w) - \bar w\|_A$ satisfies \[ \|\mathcal{H}_{A,k}(w) - \bar w\|^2_A \le \left(1+ \mathcal{O}\left(\kappa(A,2k) \sqrt{\frac{\bar k}{k - \bar k}}\right)\right) \|w - \bar w\|^2_A, \] where $\kappa(A,2k)$ is the restricted strong condition number of $A$ over the $(2k)$-sparse subspace. |
120 | Wasserstein Control of Mirror Langevin Monte Carlo | Kelvin Shuangjian Zhang, Gabriel Peyré, Jalal Fadili, Marcelo Pereyra | In this paper, we consider Langevin diffusions on a Hessian-type manifold and study a discretization that is closely related to the mirror-descent scheme. |
121 | Open Problem: Model Selection for Contextual Bandits | Dylan J. Foster, Akshay Krishnamurthy, Haipeng Luo | We ask whether similar guarantees are possible for contextual bandit learning. |
122 | Open Problem: Tight Convergence of SGD in Constant Dimension | Tomer Koren, Shahar Segal | We point out a gap that remains between the known upper and lower bounds for the expected suboptimality of the last SGD point whenever the dimension is a constant independent of the number of SGD iterations $T$; in particular, the gap is still unaddressed even in the one-dimensional case. |
123 | Open Problem: Average-Case Hardness of Hypergraphic Planted Clique Detection | Yuetian Luo, Anru R Zhang | In particular, we ask whether it is possible to establish the equivalence of the computational hardness between HPC and PC detection. |
124 | Open Problem: Information Complexity of VC Learning | Thomas Steinke, Lydia Zakynthinou | We ask whether all VC classes admit a learner with low information complexity which achieves the generalization bounds guaranteed by uniform convergence. |
125 | Open Problem: Fast and Optimal Online Portfolio Selection | Tim Van Erven, Dirk Van der Hoeven, Wojciech Kotlowski, Wouter M. Koolen | The open problem we put before the community is to formally prove whether this approach achieves the optimal regret. |