Paper Digest: COLT 2018 Highlights
The Annual Conference on Learning Theory (COLT) focuses on theoretical aspects of machine learning and related topics.
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights/summaries to quickly grasp the main idea of each paper.
If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to receive updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay up to date with new conference digests.
Paper Digest Team
team@paperdigest.org
TABLE 1: COLT 2018 Papers
# | Title | Authors | Highlight |
---|---|---|---|
1 | Conference on Learning Theory 2018: Preface | Sébastien Bubeck, Philippe Rigollet | Conference on Learning Theory 2018: Preface |
2 | Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations | Yuanzhi Li, Tengyu Ma, Hongyang Zhang | We show that the gradient descent algorithm provides an implicit regularization effect in the learning of over-parameterized matrix factorization models and one-hidden-layer neural networks with quadratic activations. |
3 | Reducibility and Computational Lower Bounds for Problems with Planted Sparse Structure | Matthew Brennan, Guy Bresler, Wasim Huleihel | We introduce several new techniques to give a web of average-case reductions showing strong computational lower bounds based on the planted clique conjecture. |
4 | Logistic Regression: The Importance of Being Improper | Dylan J. Foster, Satyen Kale, Haipeng Luo, Mehryar Mohri, Karthik Sridharan | Starting with the simple observation that the logistic loss is $1$-mixable, we design a new efficient improper learning algorithm for online logistic regression that circumvents the aforementioned lower bound with a regret bound exhibiting a doubly-exponential improvement in dependence on the predictor norm. |
5 | Actively Avoiding Nonsense in Generative Models | Steve Hanneke, Adam Tauman Kalai, Gautam Kamath, Christos Tzamos | To address this, we propose a model of active distribution learning using a binary invalidity oracle that identifies some examples as clearly invalid, together with random positive examples sampled from the true distribution. |
6 | A Faster Approximation Algorithm for the Gibbs Partition Function | Vladimir Kolmogorov | We consider the problem of estimating the partition function $Z(\beta)=\sum_x \exp(-\beta H(x))$ of a Gibbs distribution with a Hamiltonian $H(\cdot)$, or more precisely the logarithm of the ratio $q=\ln \frac{Z(0)}{Z(\beta)}$. |
7 | Exponential Convergence of Testing Error for Stochastic Gradient Methods | Loucas Pillaud-Vivien, Alessandro Rudi, Francis Bach | We consider binary classification problems with positive definite kernels and square loss, and study the convergence rates of stochastic gradient methods. |
8 | Size-Independent Sample Complexity of Neural Networks | Noah Golowich, Alexander Rakhlin, Ohad Shamir | We study the sample complexity of learning neural networks, by providing new bounds on their Rademacher complexity assuming norm constraints on the parameter matrix of each layer. |
9 | Underdamped Langevin MCMC: A non-asymptotic analysis | Xiang Cheng, Niladri S. Chatterji, Peter L. Bartlett, Michael I. Jordan | We present an MCMC algorithm based on a discretization of the underdamped Langevin diffusion and show that it achieves $\varepsilon$ error (in 2-Wasserstein distance) in $\mathcal{O}(\sqrt{d}/\varepsilon)$ steps. |
10 | Online Variance Reduction for Stochastic Optimization | Zalan Borsos, Andreas Krause, Kfir Y. Levy | In this work, we investigate a recently proposed setting which poses variance reduction as an online optimization problem with bandit feedback. |
11 | Information Directed Sampling and Bandits with Heteroscedastic Noise | Johannes Kirschner, Andreas Krause | In this work, we consider bandits with heteroscedastic noise, where we explicitly allow the noise distribution to depend on the evaluation point. |
12 | Testing Symmetric Markov Chains From a Single Trajectory | Constantinos Daskalakis, Nishanth Dikkala, Nick Gravin | We propose a measure of difference between two Markov chains, motivated by the early work of Kazakos [78], which captures the scaling behavior of the total variation distance between trajectories sampled from the Markov chains as the length of these trajectories grows. |
13 | Detection limits in the high-dimensional spiked rectangular model | Ahmed El Alaoui, Michael I. Jordan | We present a probabilistic approach capable of treating generic product priors. |
14 | Learning Without Mixing: Towards A Sharp Analysis of Linear System Identification | Max Simchowitz, Horia Mania, Stephen Tu, Michael I. Jordan, Benjamin Recht | We generalize our technique to provide bounds for a more general class of linear response time-series. |
15 | Active Tolerant Testing | Avrim Blum, Lunjia Hu | In this work, we show that for a nontrivial hypothesis class $\mathcal C$, we can estimate the distance of a target function $f$ to $\mathcal C$ (estimate the error rate of the best $h\in \mathcal C$) using substantially fewer labeled examples than would be needed to actually {\em learn} a good $h \in \mathcal C$. |
16 | Polynomial Time and Sample Complexity for Non-Gaussian Component Analysis: Spectral Methods | Yan Shuo Tan, Roman Vershynin | In this paper, we propose a simple spectral algorithm called \textsc{Reweighted PCA}, and prove that it possesses the same guarantee. |
17 | Calibrating Noise to Variance in Adaptive Data Analysis | Vitaly Feldman, Thomas Steinke | Here we propose a relaxed notion of stability based on KL divergence that also composes adaptively. |
18 | Accelerating Stochastic Gradient Descent for Least Squares Regression | Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Aaron Sidford | In particular, this work introduces an accelerated stochastic gradient method that provably achieves the minimax optimal statistical risk faster than stochastic gradient descent. |
19 | Generalization Bounds of SGLD for Non-convex Learning: Two Theoretical Viewpoints | Wenlong Mou, Liwei Wang, Xiyu Zhai, Kai Zheng | Two theories are proposed with non-asymptotic discrete-time analysis, using stability and PAC-Bayesian theory respectively. |
20 | Optimal approximation of continuous functions by very deep ReLU networks | Dmitry Yarotsky | We consider approximations of general continuous functions on finite-dimensional cubes by general deep ReLU neural networks and study the approximation rates with respect to the modulus of continuity of the function and the total number of weights $W$ in the network. |
21 | Averaging Stochastic Gradient Descent on Riemannian Manifolds | Nilesh Tripuraneni, Nicolas Flammarion, Francis Bach, Michael I. Jordan | We develop a geometric framework to transform a sequence of slowly converging iterates generated from stochastic gradient descent (SGD) on a Riemannian manifold $\mathcal{M}$ to an averaged iterate sequence with a robust and fast $O(1/n)$ convergence rate. |
22 | Fitting a Putative Manifold to Noisy Data | Charles Fefferman, Sergei Ivanov, Yaroslav Kurylev, Matti Lassas, Hariharan Narayanan | In the present work, we give a solution to the following question from manifold learning. |
23 | Private Sequential Learning | John Tsitsiklis, Kuang Xu, Zhi Xu | We formulate a private learning model to study an intrinsic tradeoff between privacy and query complexity in sequential learning. |
24 | Optimal Errors and Phase Transitions in High-Dimensional Generalized Linear Models | Jean Barbier, Florent Krzakala, Nicolas Macris, Léo Miolane, Lenka Zdeborová | In this paper we analyze GLMs when the data matrix is random, as relevant in problems such as compressed sensing, error-correcting codes or benchmark models in neural networks. |
25 | Exact and Robust Conformal Inference Methods for Predictive Machine Learning with Dependent Data | Victor Chernozhukov, Kaspar Wüthrich, Zhu Yinchu | We extend conformal inference to general settings that allow for time series data. |
26 | Nonstochastic Bandits with Composite Anonymous Feedback | Nicolò Cesa-Bianchi, Claudio Gentile, Yishay Mansour | Our main contribution is a general reduction transforming a standard bandit algorithm into one that can operate in this harder setting. |
27 | Lower Bounds for Higher-Order Convex Optimization | Naman Agarwal, Elad Hazan | As a special case, we show Nesterov’s accelerated cubic regularization method and higher-order methods to be nearly tight. |
28 | Log-concave sampling: Metropolis-Hastings algorithms are fast! | Raaz Dwivedi, Yuansi Chen, Martin J Wainwright, Bin Yu | We consider the problem of sampling from a strongly log-concave density in $\mathbb{R}^d$, and prove a non-asymptotic upper bound on the mixing time of the Metropolis-adjusted Langevin algorithm (MALA). |
29 | Incentivizing Exploration by Heterogeneous Users | Bangrui Chen, Peter Frazier, David Kempe | We consider the problem of incentivizing exploration with heterogeneous agents. |
30 | Fast and Sample Near-Optimal Algorithms for Learning Multidimensional Histograms | Ilias Diakonikolas, Jerry Li, Ludwig Schmidt | Our goal is to output a hypothesis that is $O(\mathrm{OPT}) + \epsilon$ close to $f$, in $L_1$-distance. |
31 | Time-Space Tradeoffs for Learning Finite Functions from Random Evaluations, with Applications to Polynomials | Paul Beame, Shayan Oveis Gharan, Xin Yang | With our methods we can obtain bounds for learning concept classes of finite functions from random evaluations even when the sample space of random inputs can be significantly smaller than the concept class of functions and the function values can be from an arbitrary finite set. |
32 | Local Optimality and Generalization Guarantees for the Langevin Algorithm via Empirical Metastability | Belinda Tzen, Tengyuan Liang, Maxim Raginsky | We study the detailed path-wise behavior of the discrete-time Langevin algorithm for non-convex Empirical Risk Minimization (ERM) through the lens of metastability, adopting some techniques from Berglund and Gentz (2003). |
33 | Hardness of Learning Noisy Halfspaces using Polynomial Thresholds | Arnab Bhattacharyya, Suprovat Ghoshal, Rishi Saket | We prove the hardness of weakly learning halfspaces in the presence of adversarial noise using polynomial threshold functions (PTFs). |
34 | Best of both worlds: Stochastic & adversarial best-arm identification | Yasin Abbasi-Yadkori, Peter Bartlett, Victor Gabillon, Alan Malek, Michal Valko | We study bandit best-arm identification with arbitrary and potentially adversarial rewards. |
35 | Learning Patterns for Detection with Multiscale Scan Statistics | James Sharpnack | We consider the problem of simultaneously learning and detecting the anomalous pattern from a dictionary of smooth patterns and a database of many tensors. |
36 | Global Guarantees for Enforcing Deep Generative Priors by Empirical Risk | Paul Hand, Vladislav Voroninski | In particular we consider two models, one in which the task is to invert a generative neural network given access to its last layer and another in which the task is to invert a generative neural network given only compressive linear observations of its last layer. |
37 | Small-loss bounds for online learning with partial information | Thodoris Lykouris, Karthik Sridharan, Éva Tardos | We consider the problem of adversarial (non-stochastic) online learning with partial information feedback, where at each round, a decision maker selects an action from a finite set of alternatives. |
38 | Empirical bounds for functions with weak interactions | Andreas Maurer, Massimiliano Pontil | We provide sharp empirical estimates of expectation, variance and normal approximation for a class of statistics whose variation in any argument does not change too much when another argument is modified. |
39 | Restricted Eigenvalue from Stable Rank with Applications to Sparse Linear Regression | Shiva Prasad Kasiviswanathan, Mark Rudelson | We give two applications of this construction to sparse linear regression problems, including one to a compressed sparse regression setting where the regression algorithm only has access to a compressed representation of a fixed design matrix $X$. |
40 | Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent | Chi Jin, Praneeth Netrapalli, Michael I. Jordan | Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent |
41 | Convex Optimization with Unbounded Nonconvex Oracles using Simulated Annealing | Oren Mangoubi, Nisheeth K. Vishnoi | In this paper we study the more general case when the noise has magnitude $\alpha F(x) + \beta$ for some $\alpha, \beta > 0$, and present a polynomial time algorithm that finds an approximate minimizer of $F$ for this noise model. |
42 | Learning Mixtures of Linear Regressions with Nearly Optimal Complexity | Yuanzhi Li, Yingyu Liang | This paper proposes a fixed parameter tractable algorithm for the problem under general conditions, which achieves global convergence and the sample complexity scales nearly linearly in the dimension. |
43 | Detecting Correlations with Little Memory and Communication | Yuval Dagan, Ohad Shamir | We study the problem of identifying correlations in multivariate data, under information constraints: Either on the amount of memory that can be used by the algorithm, or the amount of communication when the data is distributed across several machines. |
44 | Finite Sample Analysis of Two-Timescale Stochastic Approximation with Applications to Reinforcement Learning | Gal Dalal, Gugan Thoppe, Balázs Szörényi, Shie Mannor | In this work, we develop a novel recipe for their finite sample analysis. |
45 | Near-Optimal Sample Complexity Bounds for Maximum Likelihood Estimation of Multivariate Log-concave Densities | Timothy Carpenter, Ilias Diakonikolas, Anastasios Sidiropoulos, Alistair Stewart | We study the problem of learning multivariate log-concave densities with respect to a global loss function. |
46 | More Adaptive Algorithms for Adversarial Bandits | Chen-Yu Wei, Haipeng Luo | We develop a novel and generic algorithm for the adversarial multi-armed bandit problem (or more generally the combinatorial semi-bandit problem). |
47 | Efficient Convex Optimization with Membership Oracles | Yin Tat Lee, Aaron Sidford, Santosh S. Vempala | We consider the problem of minimizing a convex function over a convex set given access only to an evaluation oracle for the function and a membership oracle for the set. |
48 | A General Approach to Multi-Armed Bandits Under Risk Criteria | Asaf Cassel, Shie Mannor, Assaf Zeevi | In this paper we provide a more systematic approach to analyzing such risk criteria within a stochastic multi-armed bandit (MAB) formulation. |
49 | An Optimal Learning Algorithm for Online Unconstrained Submodular Maximization | Tim Roughgarden, Joshua R. Wang | We consider a basic problem at the interface of two fundamental fields: {\em submodular optimization} and {\em online learning}. |
50 | The Mean-Field Approximation: Information Inequalities, Algorithms, and Complexity | Vishesh Jain, Frederic Koehler, Elchanan Mossel | Building on the methods used to prove the bound, along with techniques from combinatorics and optimization, we study the algorithmic problem of estimating the (variational) free energy for Ising models and general Markov random fields. |
51 | Approximation beats concentration? An approximation view on inference with smooth radial kernels | Mikhail Belkin | In this paper we take the approximation theory point of view to explore various aspects of smooth kernels related to their inferential properties. |
52 | Non-Convex Matrix Completion Against a Semi-Random Adversary | Yu Cheng, Rong Ge | In this paper, we investigate a more realistic semi-random model, where the probability of observing each entry is {\em at least} $p$. |
53 | The Vertex Sample Complexity of Free Energy is Polynomial | Vishesh Jain, Frederic Koehler, Elchanan Mossel | For Markov random fields of order $r$, we obtain an algorithm that achieves $\epsilon$ approximation using a number of samples polynomial in $r$ and $1/\epsilon$ and running time that is $2^{O(1/\epsilon^2)}$ up to polynomial factors in $r$ and $\epsilon$. |
54 | Efficient Algorithms for Outlier-Robust Regression | Adam Klivans, Pravesh K. Kothari, Raghu Meka | We give the first polynomial-time algorithm for performing linear or polynomial regression resilient to adversarial corruptions in both examples and labels. |
55 | Action-Constrained Markov Decision Processes With Kullback-Leibler Cost | Ana Bušić, Sean Meyn | This paper introduces a technique to solve a more general class of action-constrained MDPs. |
56 | Fundamental Limits of Weak Recovery with Applications to Phase Retrieval | Marco Mondelli, Andrea Montanari | We consider the case of Gaussian vectors $\boldsymbol a_i$. |
57 | Cutting plane methods can be extended into nonconvex optimization | Oliver Hinder | We show that it is possible to obtain an $O(\epsilon^{-4/3})$ runtime — including computational cost — for finding $\epsilon$-stationary points of nonconvex functions using cutting plane methods. |
58 | An Analysis of the t-SNE Algorithm for Data Visualization | Sanjeev Arora, Wei Hu, Pravesh K. Kothari | This work gives a formal framework for the problem of data visualization – finding a 2-dimensional embedding of clusterable data that correctly separates individual clusters to make them visually identifiable. |
59 | Adaptivity to Smoothness in X-armed bandits | Andrea Locatelli, Alexandra Carpentier | We study the stochastic continuum-armed bandit problem from the angle of adaptivity to \emph{unknown regularity} of the reward function $f$. |
60 | Black-Box Reductions for Parameter-free Online Learning in Banach Spaces | Ashok Cutkosky, Francesco Orabona | We introduce several new black-box reductions that significantly improve the design of adaptive and parameter-free online learning algorithms by simplifying analysis, improving regret guarantees, and sometimes even improving runtime. |
61 | A Data Prism: Semi-verified learning in the small-alpha regime | Michela Meister, Gregory Valiant | We consider a simple model of unreliable or crowdsourced data where there is an underlying set of $n$ binary variables, each “evaluator” contributes a (possibly unreliable or adversarial) estimate of the values of some subset of $r$ of the variables, and the learner is given the true value of a \emph{constant} number of variables. |
62 | A Direct Sum Result for the Information Complexity of Learning | Ido Nachum, Jonathan Shafer, Amir Yehudayoff | We introduce a class of functions of VC dimension $d$ over the domain $\mathcal{X}$ with information complexity at least $\Omega \left(d\log \log \frac{|\mathcal{X}|}{d}\right)$ bits for any consistent and proper algorithm (deterministic or random). |
63 | Online learning over a finite action set with limited switching | Jason Altschuler, Kunal Talwar | Next, to investigate the value of switching actions at a more granular level, we introduce the setting of \textit{switching budgets}, in which the algorithm is limited to $S \leq T$ switches between actions. |
64 | Smoothed Online Convex Optimization in High Dimensions via Online Balanced Descent | Niangjun Chen, Gautam Goel, Adam Wierman | We introduce a novel algorithmic framework for this problem, Online Balanced Descent (OBD), which works by iteratively projecting the previous point onto a carefully chosen level set of the current cost function so as to balance the switching costs and hitting costs. |
65 | Faster Rates for Convex-Concave Games | Jacob Abernethy, Kevin A. Lai, Kfir Y. Levy, Jun-Kun Wang | In this work we go further, showing that for a particular class of games one achieves a $O(1/T^2)$ rate, and we show how this applies to the Frank-Wolfe method and recovers a similar bound \citep{D15}. |
66 | $\ell_1$ Regression using Lewis Weights Preconditioning and Stochastic Gradient Descent | David Durfee, Kevin A. Lai, Saurabh Sawlani | We present preconditioned stochastic gradient descent (SGD) algorithms for the $\ell_1$ minimization problem $\min_{\boldsymbol{\mathit{x}}}\|\boldsymbol{\mathit{A}} \boldsymbol{\mathit{x}} - \boldsymbol{\mathit{b}}\|_1$ in the overdetermined case, where there are far more constraints than variables. |
67 | Optimal Single Sample Tests for Structured versus Unstructured Network Data | Guy Bresler, Dheeraj Nagaraj | Our goal is to test without knowing the parameter values of the underlying models: only the structure of dependencies is known. |
68 | A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation | Jalaj Bhandari, Daniel Russo, Raghav Singal | In this work, we provide a \emph{simple and explicit finite time analysis} of temporal difference learning with linear function approximation. |
69 | Privacy-preserving Prediction | Cynthia Dwork, Vitaly Feldman | Here we formulate the problem of ensuring privacy of individual predictions and investigate the overheads required to achieve it in several standard models of classification and regression. |
70 | An Estimate Sequence for Geodesically Convex Optimization | Hongyi Zhang, Suvrit Sra | We propose a Riemannian version of Nesterov’s Accelerated Gradient algorithm (\textsc{Ragd}), and show that for \emph{geodesically} smooth and strongly convex problems, within a neighborhood of the minimizer whose radius depends on the condition number as well as the sectional curvature of the manifold, \textsc{Ragd} converges to the minimizer with acceleration. |
71 | The Externalities of Exploration and How Data Diversity Helps Exploitation | Manish Raghavan, Aleksandrs Slivkins, Jennifer Wortman Vaughan, Zhiwei Steven Wu | We introduce the notion of a group externality, measuring the extent to which the presence of one population of users (the majority) impacts the rewards of another (the minority). |
72 | Efficient Contextual Bandits in Non-stationary Worlds | Haipeng Luo, Chen-Yu Wei, Alekh Agarwal, John Langford | In this work, we develop several efficient contextual bandit algorithms for non-stationary environments by equipping existing methods for i.i.d. problems with sophisticated statistical tests so as to dynamically adapt to a change in distribution. |
73 | Langevin Monte Carlo and JKO splitting | Espen Bernton | We develop novel connections between such Monte Carlo algorithms, the theory of Wasserstein gradient flow, and the operator splitting approach to solving PDEs. |
74 | Subpolynomial trace reconstruction for random strings and arbitrary deletion probability | Nina Holden, Robin Pemantle, Yuval Peres | We show that if $\bf x$ is chosen uniformly at random, then $\exp(O(\log^{1/3} n))$ traces suffice to reconstruct $\bf x$ with high probability. |
75 | An explicit analysis of the entropic penalty in linear programming | Jonathan Weed | We provide matching lower bounds and show that the entropic approach does not lead to a near-linear time approximation scheme for the linear assignment problem. |
76 | Efficient active learning of sparse halfspaces | Chicheng Zhang | In this paper, we provide a computationally efficient algorithm that achieves this goal. |
77 | Marginal Singularity, and the Benefits of Labels in Covariate-Shift | Samory Kpotufe, Guillaume Martinet | We present new minimax results that concisely capture the relative benefits of source and target labeled data, under {covariate-shift}. |
78 | Learning Single-Index Models in Gaussian Space | Rishabh Dudeja, Daniel Hsu | We consider regression problems where the response is a smooth but non-linear function of a $k$-dimensional projection of $p$ normally-distributed covariates, contaminated with additive Gaussian noise. |
79 | Hidden Integrality of SDP Relaxations for Sub-Gaussian Mixture Models | Yingjie Fei, Yudong Chen | We consider the problem of finding discrete clustering structures under Sub-Gaussian Mixture Models. |
80 | Counting Motifs with Graph Sampling | Jason M. Klusowski, Yihong Wu | In this paper, we study the problem of estimating the number of motifs as induced subgraphs under both models from a statistical perspective. |
81 | Approximate Nearest Neighbors in Limited Space | Piotr Indyk, Tal Wagner | We consider the $(1+\epsilon)$-approximate nearest neighbor search problem: given a set $X$ of $n$ points in a $d$-dimensional space, build a data structure that, given any query point $y$, finds a point $x \in X$ whose distance to $y$ is at most $(1+\epsilon) \min_{x \in X} \|x-y\|$ for an accuracy parameter $\epsilon \in (0,1)$. |
82 | Breaking the $1/\sqrt{n}$ Barrier: Faster Rates for Permutation-based Models in Polynomial Time | Cheng Mao, Ashwin Pananjady, Martin J. Wainwright | We consider the problem of estimating such a matrix based on noisy observations of a subset of its entries, and design and analyze a polynomial-time algorithm that improves upon the state of the art. |
83 | Unleashing Linear Optimizers for Group-Fair Learning and Optimization | Daniel Alabi, Nicole Immorlica, Adam Kalai | Most systems and learning algorithms optimize average performance or average loss – one reason being computational complexity. |
84 | The Many Faces of Exponential Weights in Online Learning | Dirk van der Hoeven, Tim van Erven, Wojciech Kotłowski | Here we explore the alternative approach of putting Exponential Weights (EW) first. |
85 | Sampling as optimization in the space of measures: The Langevin dynamics as a composite optimization problem | Andre Wibisono | We propose the symmetrized Langevin algorithm (SLA), which should have a smaller bias than ULA, at the price of implementing a proximal gradient step in space. |
86 | Online Learning: Sufficient Statistics and the Burkholder Method | Dylan J. Foster, Alexander Rakhlin, Karthik Sridharan | To demonstrate the scope and effectiveness of the Burkholder method, we develop a novel online strategy for matrix prediction that attains a regret bound corresponding to the variance term in matrix concentration inequalities. |
87 | Minimax Bounds on Stochastic Batched Convex Optimization | John Duchi, Feng Ruan, Chulhee Yun | We study the stochastic batched convex optimization problem, in which we use many \emph{parallel} observations to optimize a convex function given limited rounds of interaction. |
88 | Geometric Lower Bounds for Distributed Parameter Estimation under Communication Constraints | Yanjun Han, Ayfer Özgür, Tsachy Weissman | For other models however, we show that the sample size reduction is remediated only linearly with increasing $k$, e.g. when some sub-Gaussian structure is available. |
89 | Local moment matching: A unified methodology for symmetric functional estimation and distribution estimation under Wasserstein distance | Yanjun Han, Jiantao Jiao, Tsachy Weissman | We present \emph{Local Moment Matching (LMM)}, a unified methodology for symmetric functional estimation and distribution estimation under Wasserstein distance. |
90 | Iterate Averaging as Regularization for Stochastic Gradient Descent | Gergely Neu, Lorenzo Rosasco | We propose and analyze a variant of the classic Polyak–Ruppert averaging scheme, broadly used in stochastic gradient methods. |
91 | Smoothed analysis for low-rank solutions to semidefinite programs in quadratic penalty form | Srinadh Bhojanapalli, Nicolas Boumal, Prateek Jain, Praneeth Netrapalli | In pursuit of low-rank solutions and low complexity algorithms, we consider the Burer–Monteiro factorization approach for solving SDPs. |
92 | Certified Computation from Unreliable Datasets | Themis Gouleakis, Christos Tzamos, Manolis Zampetakis | In this work, we provide a generic approach that is based on \textit{verification} of only few records of the data set to guarantee high quality learning outcomes for various optimization objectives. |
93 | Open Problem: The Dependence of Sample Complexity Lower Bounds on Planning Horizon | Nan Jiang, Alekh Agarwal | How can we explain such a difference? |
94 | Open problem: Improper learning of mixtures of Gaussians | Elad Hazan, Roi Livni | Open problem: Improper learning of mixtures of Gaussians |