Paper Digest: ICML 2019 Highlights
The International Conference on Machine Learning (ICML) is one of the top machine learning conferences in the world. In 2019, it was held in Long Beach, California. There were ~3,400 paper submissions, of which 774 were accepted; 519 of the accepted papers also published their code.
To help the AI community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights/summaries to quickly get the main idea of each paper.
We thank all authors for writing these interesting papers, and readers for reading our digests. If you do not want to miss any interesting AI paper, you are welcome to sign up for our free Paper Digest service to receive daily updates on new papers tailored to your own interests.
Paper Digest Team
team@paperdigest.org
TABLE 1: ICML 2019 Papers
No. | Title | Authors | Highlight |
---|---|---|---|
1 | AReS and MaRS - Adversarial and MMD-Minimizing Regression for SDEs | Gabriele Abbati, Philippe Wenk, Michael A. Osborne, Andreas Krause, Bernhard Schölkopf, Stefan Bauer | In this paper, we propose a novel, probabilistic model for estimating the drift and diffusion given noisy observations of the underlying stochastic system. |
2 | Dynamic Weights in Multi-Objective Deep Reinforcement Learning | Axel Abels, Diederik Roijers, Tom Lenaerts, Ann Nowé, Denis Steckelmacher | We generalize across weight changes and high-dimensional inputs by proposing a multi-objective Q-network whose outputs are conditioned on the relative importance of objectives and we introduce Diverse Experience Replay (DER) to counter the inherent non-stationarity of the Dynamic Weights setting. |
3 | MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing | Sami Abu-El-Haija, Bryan Perozzi, Amol Kapoor, Nazanin Alipourfard, Kristina Lerman, Hrayr Harutyunyan, Greg Ver Steeg, Aram Galstyan | To address this weakness, we propose a new model, MixHop, that can learn these relationships, including difference operators, by repeatedly mixing feature representations of neighbors at various distances. |
4 | Communication-Constrained Inference and the Role of Shared Randomness | Jayadev Acharya, Clement Canonne, Himanshu Tyagi | We propose a general purpose simulate-and-infer strategy that uses only private-coin communication protocols and is sample-optimal for distribution learning. |
5 | Distributed Learning with Sublinear Communication | Jayadev Acharya, Chris De Sa, Dylan Foster, Karthik Sridharan | Our main result is that by slightly relaxing the standard boundedness assumptions for linear models, we can obtain distributed algorithms that enjoy optimal error with communication logarithmic in dimension. |
6 | Communication Complexity in Locally Private Distribution Estimation and Heavy Hitters | Jayadev Acharya, Ziteng Sun | We propose a sample-optimal $\varepsilon$-locally differentially private (LDP) scheme for distribution estimation in which each user communicates only one bit and no public randomness is required. |
7 | Learning Models from Data with Measurement Error: Tackling Underreporting | Roy Adams, Yuelong Ji, Xiaobin Wang, Suchi Saria | As studies based on observational data are increasingly used to inform decisions with real-world impact, it is critical that we develop a robust set of techniques for analyzing and adjusting for biases such as underreporting. In this paper we present a method for estimating the distribution of an outcome given a binary exposure that is subject to underreporting. |
8 | TibGM: A Transferable and Information-Based Graphical Model Approach for Reinforcement Learning | Tameem Adel, Adrian Weller | Here we propose a flexible GM-based RL framework which leverages efficient inference procedures to enhance generalisation and transfer power. |
9 | PAC Learnability of Node Functions in Networked Dynamical Systems | Abhijin Adiga, Chris J Kuhlman, Madhav Marathe, S Ravi, Anil Vullikanti | We consider the PAC learnability of the local functions at the vertices of a discrete networked dynamical system, assuming that the underlying network is known. |
10 | Static Automatic Batching In TensorFlow | Ashish Agarwal | To address this we extend TensorFlow with pfor, a parallel-for loop optimized using static loop vectorization. |
11 | Efficient Full-Matrix Adaptive Regularization | Naman Agarwal, Brian Bullins, Xinyi Chen, Elad Hazan, Karan Singh, Cyril Zhang, Yi Zhang | We show how to modify full-matrix adaptive regularization in order to make it practical and effective. |
12 | Online Control with Adversarial Disturbances | Naman Agarwal, Brian Bullins, Elad Hazan, Sham Kakade, Karan Singh | We present an efficient algorithm that achieves nearly-tight regret bounds in this setting. |
13 | Fair Regression: Quantitative Definitions and Reduction-Based Algorithms | Alekh Agarwal, Miroslav Dudik, Zhiwei Steven Wu | In this paper, we study the prediction of a real-valued target, such as a risk score or recidivism rate, while guaranteeing a quantitative notion of fairness with respect to a protected attribute such as gender or race. |
14 | Learning to Generalize from Sparse and Underspecified Rewards | Rishabh Agarwal, Chen Liang, Dale Schuurmans, Mohammad Norouzi | We propose Meta Reward Learning (MeRL) to construct an auxiliary reward function that provides more refined feedback for learning. |
15 | The Kernel Interaction Trick: Fast Bayesian Discovery of Pairwise Interactions in High Dimensions | Raj Agrawal, Brian Trippe, Jonathan Huggins, Tamara Broderick | Our key insight is that many hierarchical models of practical interest admit a Gaussian process representation such that rather than maintaining a posterior over all O(p^2) interactions, we need only maintain a vector of O(p) kernel hyper-parameters. |
16 | Understanding the Impact of Entropy on Policy Optimization | Zafarali Ahmed, Nicolas Le Roux, Mohammad Norouzi, Dale Schuurmans | In this work, we analyze this claim using new visualizations of the optimization landscape based on randomly perturbing the loss function. |
17 | Fairwashing: the risk of rationalization | Ulrich Aivodji, Hiromi Arai, Olivier Fortineau, Sébastien Gambs, Satoshi Hara, Alain Tapp | Our solution, LaundryML, is based on a regularized rule list enumeration algorithm whose objective is to search for fair rule lists approximating an unfair black-box model. |
18 | Adaptive Stochastic Natural Gradient Method for One-Shot Neural Architecture Search | Youhei Akimoto, Shinichi Shirakawa, Nozomu Yoshinari, Kento Uchida, Shota Saito, Kouhei Nishida | We propose a stochastic natural gradient method with an adaptive step-size mechanism built upon our theoretical investigation. |
19 | Projections for Approximate Policy Iteration Algorithms | Riad Akrour, Joni Pajarinen, Jan Peters, Gerhard Neumann | In this paper, we propose to improve over such solutions by introducing a set of projections that transform the constrained problem into an unconstrained one which is then solved by standard gradient descent. |
20 | Validating Causal Inference Models via Influence Functions | Ahmed Alaa, Mihaela Van Der Schaar | In this paper, we use influence functions (the functional derivatives of a loss function) to develop a model validation procedure that estimates the estimation error of causal inference methods. |
21 | Multi-objective training of Generative Adversarial Networks with multiple discriminators | Isabela Albuquerque, Joao Monteiro, Thang Doan, Breandan Considine, Tiago Falk, Ioannis Mitliagkas | In this work, we revisit the multiple-discriminator setting by framing the simultaneous minimization of losses provided by different models as a multi-objective optimization problem. |
22 | Graph Element Networks: adaptive, structured computation and memory | Ferran Alet, Adarsh Keshav Jeewajee, Maria Bauza Villalonga, Alberto Rodriguez, Tomas Lozano-Perez, Leslie Kaelbling | We explore the use of graph neural networks (GNNs) to model spatial processes in which there is no a priori graphical structure. |
23 | Analogies Explained: Towards Understanding Word Embeddings | Carl Allen, Timothy Hospedales | We derive a probabilistically grounded definition of paraphrasing that we re-interpret as word transformation, a mathematical description of “$w_x$ is to $w_y$”. |
24 | Infinite Mixture Prototypes for Few-shot Learning | Kelsey Allen, Evan Shelhamer, Hanul Shin, Joshua Tenenbaum | We propose infinite mixture prototypes to adaptively represent both simple and complex data distributions for few-shot learning. |
25 | A Convergence Theory for Deep Learning via Over-Parameterization | Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song | In this work, we prove that simple algorithms such as stochastic gradient descent (SGD) can find global minima on the training objective of DNNs in polynomial time. |
26 | Asynchronous Batch Bayesian Optimisation with Improved Local Penalisation | Ahsan Alvi, Binxin Ru, Jan-Peter Calliess, Stephen Roberts, Michael A. Osborne | We address this problem by developing an approach, Penalising Locally for Asynchronous Bayesian Optimisation on K Workers (PLAyBOOK), for asynchronous parallel BO. |
27 | Bounding User Contributions: A Bias-Variance Trade-off in Differential Privacy | Kareem Amin, Alex Kulesza, Andres Munoz, Sergei Vassilvitskii | Here, we characterize this trade-off for an empirical risk minimization setting, showing that in general there is a “sweet spot” that depends on measurable properties of the dataset, but that there is also a concrete cost to privacy that cannot be avoided simply by collecting more data. |
28 | Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Value Approximation | Marco Ancona, Cengiz Oztireli, Markus Gross | In this work, by leveraging recent results on uncertainty propagation, we propose a novel, polynomial-time approximation of Shapley values in deep neural networks. |
29 | Scaling Up Ordinal Embedding: A Landmark Approach | Jesse Anderton, Javed Aslam | We propose a novel landmark-based method as a partial solution. |
30 | Sorting Out Lipschitz Function Approximation | Cem Anil, James Lucas, Roger Grosse | Based on this, we propose to combine a gradient norm preserving activation function, GroupSort, with norm-constrained weight matrices. |
31 | Sparse Multi-Channel Variational Autoencoder for the Joint Analysis of Heterogeneous Data | Luigi Antelmi, Nicholas Ayache, Philippe Robert, Marco Lorenzi | To tackle this problem, in this work we extend the variational framework of the VAE to bring parsimony and interpretability when jointly accounting for latent relationships across multiple channels. |
32 | Unsupervised Label Noise Modeling and Loss Correction | Eric Arazo, Diego Ortego, Paul Albert, Noel O'Connor, Kevin Mcguinness | Specifically, we propose a beta mixture to estimate this probability and correct the loss by relying on the network prediction (the so-called bootstrapping loss). |
33 | Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks | Sanjeev Arora, Simon Du, Wei Hu, Zhiyuan Li, Ruosong Wang | This paper analyzes training and generalization for a simple 2-layer ReLU net with random initialization, and provides the following improvements over recent works: (i) Using a tighter characterization of training speed than recent papers, an explanation for why training a neural net with random labels leads to slower training, as originally observed in [Zhang et al. ICLR’17]. (ii) Generalization bound independent of network size, using a data-dependent complexity measure. |
34 | Distributed Weighted Matching via Randomized Composable Coresets | Sepehr Assadi, Mohammadhossein Bateni, Vahab Mirrokni | In this paper, we develop a simple distributed algorithm for the problem on general graphs with approximation guarantee of $2 + \epsilon$ that (nearly) matches that of the sequential greedy algorithm. |
35 | Stochastic Gradient Push for Distributed Deep Learning | Mahmoud Assran, Nicolas Loizou, Nicolas Ballas, Mike Rabbat | This paper studies Stochastic Gradient Push (SGP), which combines PushSum with stochastic gradient updates. |
36 | Bayesian Optimization of Composite Functions | Raul Astudillo, Peter Frazier | We consider optimization of composite objective functions, i.e., of the form $f(x)=g(h(x))$, where $h$ is a black-box derivative-free expensive-to-evaluate function with vector-valued outputs, and $g$ is a cheap-to-evaluate real-valued function. |
37 | Linear-Complexity Data-Parallel Earth Mover's Distance Approximations | Kubilay Atasu, Thomas Mittelholzer | We propose novel approximation algorithms that overcome both of these limitations, yet still achieve linear time complexity. |
38 | Benefits and Pitfalls of the Exponential Mechanism with Applications to Hilbert Spaces and Functional PCA | Jordan Awan, Ana Kenney, Matthew Reimherr, Aleksandra Slavkovic | We study its extension to settings with summaries based on infinite dimensional outputs such as with functional data analysis, shape analysis, and nonparametric statistics. |
39 | Feature Grouping as a Stochastic Regularizer for High-Dimensional Structured Data | Sergul Aydore, Bertrand Thirion, Gael Varoquaux | We propose a new regularizer specifically designed to leverage structure in the data in a way that can be applied efficiently to complex models. |
40 | Beyond the Chinese Restaurant and Pitman-Yor processes: Statistical Models with double power-law behavior | Fadhel Ayed, Juho Lee, Francois Caron | In this paper, we introduce a class of completely random measures which are doubly regularly-varying. |
41 | Scalable Fair Clustering | Arturs Backurs, Piotr Indyk, Krzysztof Onak, Baruch Schieber, Ali Vakilian, Tal Wagner | In this paper, we present a practical approximate fairlet decomposition algorithm that runs in nearly linear time. |
42 | Entropic GANs meet VAEs: A Statistical Approach to Compute Sample Likelihoods in GANs | Yogesh Balaji, Hamed Hassani, Rama Chellappa, Soheil Feizi | In this work, we resolve this issue by constructing an explicit probability model that can be used to compute sample likelihood statistics in GANs. |
43 | Provable Guarantees for Gradient-Based Meta-Learning | Maria-Florina Balcan, Mikhail Khodak, Ameet Talwalkar | We study the problem of meta-learning through the lens of online convex optimization, developing a meta-algorithm bridging the gap between popular gradient-based meta-learning and classical regularization-based multi-task transfer methods. |
44 | Open-ended learning in symmetric zero-sum games | David Balduzzi, Marta Garnelo, Yoram Bachrach, Wojciech Czarnecki, Julien Perolat, Max Jaderberg, Thore Graepel | In this paper, we introduce a geometric framework for formulating agent objectives in zero-sum games, in order to construct adaptive sequences of objectives that yield open-ended learning. |
45 | Concrete Autoencoders: Differentiable Feature Selection and Reconstruction | Muhammed Fatih Balin, Abubakar Abid, James Zou | We introduce the concrete autoencoder, an end-to-end differentiable method for global feature selection, which efficiently identifies a subset of the most informative features and simultaneously learns a neural network to reconstruct the input data from the selected features. |
46 | HOList: An Environment for Machine Learning of Higher Order Logic Theorem Proving | Kshitij Bansal, Sarah Loos, Markus Rabe, Christian Szegedy, Stewart Wilcox | We present an environment, benchmark, and deep learning driven automated theorem prover for higher-order logic. |
47 | Structured agents for physical construction | Victor Bapst, Alvaro Sanchez-Gonzalez, Carl Doersch, Kimberly Stachenfeld, Pushmeet Kohli, Peter Battaglia, Jessica Hamrick | We examine how a range of deep reinforcement learning agents fare on these challenges, and introduce several new approaches which provide superior performance. |
48 | Learning to Route in Similarity Graphs | Dmitry Baranchuk, Dmitry Persiyanov, Anton Sinitsin, Artem Babenko | In this paper we propose to learn a routing function that overcomes local minima by incorporating information about the global structure of the graph. |
49 | A Personalized Affective Memory Model for Improving Emotion Recognition | Pablo Barros, German Parisi, Stefan Wermter | In this paper, we present a neural model based on a conditional adversarial autoencoder to learn how to represent and edit general emotion expressions. |
50 | Scale-free adaptive planning for deterministic dynamics & discounted rewards | Peter Bartlett, Victor Gabillon, Jennifer Healey, Michal Valko | We introduce PlaTypOOS, an adaptive, robust, and efficient alternative to the OLOP (open-loop optimistic planning) algorithm. |
51 | Pareto Optimal Streaming Unsupervised Classification | Soumya Basu, Steven Gutstein, Brent Lance, Sanjay Shakkottai | In this paper, we characterize the Pareto-optimal region of accuracy and arrival rate, and develop an algorithm that can operate at any point within this region. |
52 | Categorical Feature Compression via Submodular Optimization | Mohammadhossein Bateni, Lin Chen, Hossein Esfandiari, Thomas Fu, Vahab Mirrokni, Afshin Rostamizadeh | To address this, we introduce a novel re-parametrization of the mutual information objective, which we prove is submodular, and also design a data structure to query the submodular function in amortized $O(\log n)$ time (where n is the input vocabulary size). |
53 | Noise2Self: Blind Denoising by Self-Supervision | Joshua Batson, Loic Royer | We propose a general framework for denoising high-dimensional measurements which requires no prior on the signal, no estimate of the noise, and no clean training data. |
54 | Efficient optimization of loops and limits with randomized telescoping sums | Alex Beatson, Ryan P Adams | We propose randomized telescope (RT) gradient estimators, which represent the objective as the sum of a telescoping series and sample linear combinations of terms to provide cheap unbiased gradient estimates. |
55 | Recurrent Kalman Networks: Factorized Inference in High-Dimensional Deep Feature Spaces | Philipp Becker, Harit Pandya, Gregor Gebhardt, Cheng Zhao, C. James Taylor, Gerhard Neumann | We propose a new deep approach to Kalman filtering which can be learned directly in an end-to-end manner using backpropagation without additional approximations. |
56 | Switching Linear Dynamics for Variational Bayes Filtering | Philip Becker-Ehmck, Jan Peters, Patrick Van Der Smagt | Leveraging Bayesian inference, Variational Autoencoders and Concrete relaxations, we show how to learn a richer and more meaningful state space, e.g. encoding joint constraints and collisions with walls in a maze, from partial and high-dimensional observations. |
57 | Active Learning for Probabilistic Structured Prediction of Cuts and Matchings | Sima Behpour, Anqi Liu, Brian Ziebart | We propose an adversarial approach for active learning with structured prediction domains that is tractable for cuts and matching. |
58 | Invertible Residual Networks | Jens Behrmann, Will Grathwohl, Ricky T. Q. Chen, David Duvenaud, Joern-Henrik Jacobsen | To compute likelihoods, we introduce a tractable approximation to the Jacobian log-determinant of a residual block. |
59 | Greedy Layerwise Learning Can Scale To ImageNet | Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon | Here we use 1-hidden layer learning problems to sequentially build deep networks layer by layer, which can inherit properties from shallow networks. |
60 | Overcoming Multi-model Forgetting | Yassine Benyahia, Kaicheng Yu, Kamil Bennani Smires, Martin Jaggi, Anthony C. Davison, Mathieu Salzmann, Claudiu Musat | To overcome this, we introduce a statistically-justified weight plasticity loss that regularizes the learning of a model’s shared parameters according to their importance for the previous models, and demonstrate its effectiveness when training two models sequentially and for neural architecture search. |
61 | Optimal Kronecker-Sum Approximation of Real Time Recurrent Learning | Frederik Benzing, Marcelo Matheus Gauy, Asier Mujika, Anders Martinsson, Angelika Steger | We present a new approximation algorithm of RTRL, Optimal Kronecker-Sum Approximation (OK). |
62 | Adversarially Learned Representations for Information Obfuscation and Inference | Martin Bertran, Natalia Martinez, Afroditi Papadaki, Qiang Qiu, Miguel Rodrigues, Galen Reeves, Guillermo Sapiro | In this work, we take an information theoretic approach that is implemented as an unconstrained adversarial game between Deep Neural Networks in a principled, data-driven manner. |
63 | Bandit Multiclass Linear Classification: Efficient Algorithms for the Separable Case | Alina Beygelzimer, David Pal, Balazs Szorenyi, Devanathan Thiruvenkatachari, Chen-Yu Wei, Chicheng Zhang | In this work, we take a first step towards this problem. |
64 | Analyzing Federated Learning through an Adversarial Lens | Arjun Nitin Bhagoji, Supriyo Chakraborty, Prateek Mittal, Seraphin Calo | In this work, we explore how the federated learning setting gives rise to a new threat, namely model poisoning, which differs from traditional data poisoning. |
65 | Optimal Continuous DR-Submodular Maximization and Applications to Provable Mean Field Inference | Yatao Bian, Joachim Buhmann, Andreas Krause | In this work we propose provable mean field methods for probabilistic log-submodular models and their posterior agreement (PA) with strong approximation guarantees. |
66 | More Efficient Off-Policy Evaluation through Regularized Targeted Learning | Aurelien Bibaut, Ivana Malenica, Nikos Vlassis, Mark Van Der Laan | In particular, we introduce a novel doubly-robust estimator for the OPE problem in RL, based on the Targeted Maximum Likelihood Estimation principle from the statistical causal inference literature. |
67 | A Kernel Perspective for Regularizing Deep Neural Networks | Alberto Bietti, Grégoire Mialon, Dexiong Chen, Julien Mairal | We propose a new point of view for regularizing deep neural networks by using the norm of a reproducing kernel Hilbert space (RKHS). |
68 | Rethinking Lossy Compression: The Rate-Distortion-Perception Tradeoff | Yochai Blau, Tomer Michaeli | In this paper, we adopt the mathematical definition of perceptual quality recently proposed by Blau & Michaeli (2018), and use it to study the three-way tradeoff between rate, distortion, and perception. |
69 | Correlated bandits or: How to minimize mean-squared error online | Vinay Praneeth Boda, Prashanth L.A. | Under a best-arm identification framework, we propose a successive rejects type algorithm and provide bounds on the probability of error in identifying the best arm. |
70 | Adversarial Attacks on Node Embeddings via Graph Poisoning | Aleksandar Bojchevski, Stephan Günnemann | We provide the first adversarial vulnerability analysis on the widely used family of methods based on random walks. |
71 | Online Variance Reduction with Mixtures | Zalán Borsos, Sebastian Curi, Kfir Yehuda Levy, Andreas Krause | In this work, we propose a new framework for variance reduction that enables the use of mixtures over predefined sampling distributions, which can naturally encode prior knowledge about the data. |
72 | Compositional Fairness Constraints for Graph Embeddings | Avishek Bose, William Hamilton | Here, we introduce an adversarial framework to enforce fairness constraints on graph embeddings. |
73 | Unreproducible Research is Reproducible | Xavier Bouthillier, César Laurent, Pascal Vincent | This work is an attempt to promote the use of more rigorous and diversified methodologies. |
74 | Blended Conditonal Gradients | Gábor Braun, Sebastian Pokutta, Dan Tu, Stephen Wright | We present a blended conditional gradient approach for minimizing a smooth convex function over a polytope P, combining the Frank-Wolfe algorithm (also called conditional gradient) with gradient-based steps, different from away steps and pairwise steps, but still achieving linear convergence for strongly convex functions, along with good practical performance. |
75 | Coresets for Ordered Weighted Clustering | Vladimir Braverman, Shaofeng H.-C. Jiang, Robert Krauthgamer, Xuan Wu | Our main result is a construction of a simultaneous coreset of size $O_{\epsilon, d}(k^2 \log^2 |X|)$ for Ordered k-Median. |
76 | Target Tracking for Contextual Bandits: Application to Demand Side Management | Margaux Brégère, Pierre Gaillard, Yannig Goude, Gilles Stoltz | We propose a contextual-bandit approach for demand side management by offering price incentives. |
77 | Active Manifolds: A non-linear analogue to Active Subspaces | Robert Bridges, Anthony Gruber, Christopher Felder, Miki Verma, Chelsey Hoff | We present an approach to analyze $C^1(\mathbb{R}^m)$ functions that addresses limitations present in the Active Subspaces (AS) method of Constantine et al. (2014; 2015). |
78 | Conditioning by adaptive sampling for robust design | David Brookes, Hahnbeom Park, Jennifer Listgarten | We present a method for design problems wherein the goal is to maximize or specify the value of one or more properties of interest (e.g. maximizing the fluorescence of a protein). |
79 | Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations | Daniel Brown, Wonjoon Goo, Prabhat Nagarajan, Scott Niekum | In this paper, we introduce a novel reward-learning-from-observation algorithm, Trajectory-ranked Reward EXtrapolation (T-REX), that extrapolates beyond a set of (approximately) ranked demonstrations in order to infer high-quality reward functions from a set of potentially poor demonstrations. |
80 | Deep Counterfactual Regret Minimization | Noam Brown, Adam Lerer, Sam Gross, Tuomas Sandholm | This paper introduces Deep Counterfactual Regret Minimization, a form of CFR that obviates the need for abstraction by instead using deep neural networks to approximate the behavior of CFR in the full game. |
81 | Understanding the Origins of Bias in Word Embeddings | Marc-Etienne Brunet, Colleen Alkalay-Houlihan, Ashton Anderson, Richard Zemel | In this work we develop a technique to address this question. |
82 | Low Latency Privacy Preserving Inference | Alon Brutzkus, Ran Gilad-Bachrach, Oren Elisha | In this study we provide two solutions that address these limitations. |
83 | Why do Larger Models Generalize Better? A Theoretical Perspective via the XOR Problem | Alon Brutzkus, Amir Globerson | In this work, we provide theoretical and empirical evidence that, in certain cases, overparameterized convolutional networks generalize better than small networks because of an interplay between weight clustering and feature exploration at initialization. |
84 | Adversarial examples from computational constraints | Sebastien Bubeck, Yin Tat Lee, Eric Price, Ilya Razenshteyn | Why are classifiers in high dimension vulnerable to “adversarial” perturbations? We show that it is likely not due to information theoretic limitations, but rather it could be due to computational constraints. |
85 | Self-similar Epochs: Value in arrangement | Eliav Buchnik, Edith Cohen, Avinatan Hasidim, Yossi Matias | We hypothesize that the training can be more effective with self-similar arrangements that potentially allow each epoch to provide benefits of multiple ones. |
86 | Learning Generative Models across Incomparable Spaces | Charlotte Bunne, David Alvarez-Melis, Andreas Krause, Stefanie Jegelka | In this work, we propose an approach to learn generative models across such incomparable spaces, and demonstrate how to steer the learned distribution towards target properties. |
87 | Rates of Convergence for Sparse Variational Gaussian Process Regression | David Burt, Carl Edward Rasmussen, Mark Van Der Wilk | We show that with high probability the KL divergence can be made arbitrarily small by growing $M$ more slowly than $N$. |
88 | What is the Effect of Importance Weighting in Deep Learning? | Jonathon Byrd, Zachary Lipton | We present the surprising finding that while importance weighting impacts models early in training, its effect diminishes over successive epochs. |
89 | A Quantitative Analysis of the Effect of Batch Normalization on Gradient Descent | Yongqiang Cai, Qianxiao Li, Zuowei Shen | In this paper, we provide such an analysis on the simple problem of ordinary least squares (OLS), where the precise dynamical properties of gradient descent (GD) are completely known, thus allowing us to isolate and compare the additional effects of BN. |
90 | Accelerated Linear Convergence of Stochastic Momentum Methods in Wasserstein Distances | Bugra Can, Mert Gurbuzbalaban, Lingjiong Zhu | For strongly convex problems, we show that the distribution of the iterates of AG converges with the accelerated $O(\sqrt{\kappa}\log(1/\varepsilon))$ linear rate to a ball of radius $\varepsilon$ centered at a unique invariant distribution in the 1-Wasserstein metric where $\kappa$ is the condition number as long as the noise variance is smaller than an explicit upper bound we can provide. |
91 | Active Embedding Search via Noisy Paired Comparisons | Gregory Canal, Andy Massimino, Mark Davenport, Christopher Rozell | In such tasks, queries can be extremely costly and subject to varying levels of response noise; thus, we aim to actively choose pairs that are most informative given the results of previous comparisons. |
92 | Dynamic Learning with Frequent New Product Launches: A Sequential Multinomial Logit Bandit Problem | Junyu Cao, Wei Sun | For the offline version with known customers’ preferences, we propose a polynomial-time algorithm and characterize the properties of the optimal tiered product recommendation. |
93 | Competing Against Nash Equilibria in Adversarially Changing Zero-Sum Games | Adrian Rivera Cardoso, Jacob Abernethy, He Wang, Huan Xu | When the payoff matrix evolves over time, our goal is to find a sequential algorithm that can compete, in a certain sense, with the NE of the long-term-averaged payoff matrix. |
94 | Automated Model Selection with Bayesian Quadrature | Henry Chai, Jean-Francois Ton, Michael A. Osborne, Roman Garnett | We present a novel technique for tailoring Bayesian quadrature (BQ) to model selection. |
95 | Learning Action Representations for Reinforcement Learning | Yash Chandak, Georgios Theocharous, James Kostas, Scott Jordan, Philip Thomas | We provide an algorithm to both learn and use action representations and provide conditions for its convergence. |
96 | Dynamic Measurement Scheduling for Event Forecasting using Deep RL | Chun-Hao Chang, Mingjie Mai, Anna Goldenberg | We answer this question with deep reinforcement learning (RL), jointly minimizing the measurement cost and maximizing predictive gain by scheduling strategically-timed measurements. |
97 | On Symmetric Losses for Learning from Corrupted Labels | Nontawat Charoenphakdee, Jongyeong Lee, Masashi Sugiyama | This paper aims to provide a better understanding of a symmetric loss. |
98 | Online learning with kernel losses | Niladri Chatterji, Aldo Pacchiano, Peter Bartlett | We present a generalization of the adversarial linear bandits framework, where the underlying losses are kernel functions (with an associated reproducing kernel Hilbert space) rather than linear functions. |
99 | Neural Network Attributions: A Causal Perspective | Aditya Chattopadhyay, Piyushi Manupriya, Anirban Sarkar, Vineeth N Balasubramanian | We propose a new attribution method for neural networks developed using first principles of causality (to the best of our knowledge, the first such). |
100 | PAC Identification of Many Good Arms in Stochastic Multi-Armed Bandits | Arghya Roy Chaudhuri, Shivaram Kalyanakrishnan | We present a lower bound on the worst-case sample complexity for general k, and a fully sequential PAC algorithm, LUCB-k-m, which is more sample-efficient on easy instances. |
101 | Nearest Neighbor and Kernel Survival Analysis: Nonasymptotic Error Bounds and Strong Consistency Rates | George Chen | We establish the first nonasymptotic error bounds for Kaplan-Meier-based nearest neighbor and kernel survival probability estimators where feature vectors reside in metric spaces. |
102 | Stein Point Markov Chain Monte Carlo | Wilson Ye Chen, Alessandro Barp, Francois-Xavier Briol, Jackson Gorham, Mark Girolami, Lester Mackey, Chris Oates | Stein Point Markov Chain Monte Carlo |
103 | Particle Flow Bayes' Rule | Xinshi Chen, Hanjun Dai, Le Song | We present a particle flow realization of Bayes’ rule, where an ODE-based neural operator is used to transport particles from a prior to its posterior after a new observation. |
104 | Proportionally Fair Clustering | Xingyu Chen, Brandon Fain, Liang Lyu, Kamesh Munagala | We present and analyze algorithms to efficiently compute, optimize, and audit proportional solutions. |
105 | Information-Theoretic Considerations in Batch Reinforcement Learning | Jinglin Chen, Nan Jiang | In this paper, we revisit these assumptions and provide theoretical results towards answering the above questions, and make steps towards a deeper understanding of value-function approximation. |
106 | Generative Adversarial User Model for Reinforcement Learning Based Recommendation System | Xinshi Chen, Shuang Li, Hui Li, Shaohua Jiang, Yuan Qi, Le Song | In this paper, we propose a novel model-based reinforcement learning framework for recommendation systems, where we develop a generative adversarial network to imitate user behavior dynamics and learn her reward function. |
107 | Understanding and Utilizing Deep Neural Networks Trained with Noisy Labels | Pengfei Chen, Ben Ben Liao, Guangyong Chen, Shengyu Zhang | In this paper, we find that the test accuracy can be quantitatively characterized in terms of the noise ratio in datasets. |
108 | A Gradual, Semi-Discrete Approach to Generative Network Training via Explicit Wasserstein Minimization | Yucheng Chen, Matus Telgarsky, Chao Zhang, Bolton Bailey, Daniel Hsu, Jian Peng | This paper provides a simple procedure to fit generative networks to target distributions, with the goal of a small Wasserstein distance (or other optimal transport costs). |
109 | Transferability vs. Discriminability: Batch Spectral Penalization for Adversarial Domain Adaptation | Xinyang Chen, Sinan Wang, Mingsheng Long, Jianmin Wang | In this paper, a series of experiments based on spectral analysis of the feature representations have been conducted, revealing an unexpected deterioration of the discriminability while learning transferable features adversarially. |
110 | Fast Incremental von Neumann Graph Entropy Computation: Theory, Algorithm, and Applications | Pin-Yu Chen, Lingfei Wu, Sijia Liu, Indika Rajapakse | In this paper, we propose a new computational framework, Fast Incremental von Neumann Graph EntRopy (FINGER), which approaches VNGE with a performance guarantee. |
111 | Katalyst: Boosting Convex Katayusha for Non-Convex Problems with a Large Condition Number | Zaiyi Chen, Yi Xu, Haoyuan Hu, Tianbao Yang | In this paper, we present a simple but non-trivial boosting of a state-of-the-art SVRG-type method for convex problems (namely Katyusha) to enjoy an improved complexity for solving non-convex problems with a large condition number (that is close to a convex function). |
112 | Multivariate-Information Adversarial Ensemble for Scalable Joint Distribution Matching | Ziliang Chen, Zhanfu Yang, Xiaoxi Wang, Xiaodan Liang, Xiaopeng Yan, Guanbin Li, Liang Lin | In this paper, we propose a domain-scalable DGM, i.e., MMI-ALI for $m$-domain joint distribution matching. |
113 | Robust Decision Trees Against Adversarial Examples | Hongge Chen, Huan Zhang, Duane Boning, Cho-Jui Hsieh | In this paper, we show that tree-based models are also vulnerable to adversarial examples and develop a novel algorithm to learn robust trees. |
114 | RaFM: Rank-Aware Factorization Machines | Xiaoshuang Chen, Yin Zheng, Jiaxing Wang, Wenye Ma, Junzhou Huang | Different from existing FM-based approaches which use a fixed rank for all features, this paper proposes a Rank-Aware FM (RaFM) model which adopts pairwise interactions from embeddings with different ranks. |
115 | Control Regularization for Reduced Variance Reinforcement Learning | Richard Cheng, Abhinav Verma, Gabor Orosz, Swarat Chaudhuri, Yisong Yue, Joel Burdick | Focusing on problems arising in continuous control, we propose a functional regularization approach to augmenting model-free RL. |
116 | Predictor-Corrector Policy Optimization | Ching-An Cheng, Xinyan Yan, Nathan Ratliff, Byron Boots | We present a predictor-corrector framework, called PicCoLO, that can transform a first-order model-free reinforcement or imitation learning algorithm into a new hybrid method that leverages predictive models to accelerate policy learning. |
117 | Variational Inference for sparse network reconstruction from count data | Julien Chiquet, Stephane Robin, Mahendra Mariadassou | In this work, we consider instead a full-fledged probabilistic model with a latent layer where the counts follow Poisson distributions, conditional to latent (hidden) Gaussian correlated variables. |
118 | Random Walks on Hypergraphs with Edge-Dependent Vertex Weights | Uthsav Chitra, Benjamin Raphael | In this paper, we use random walks to develop a spectral theory for hypergraphs with edge-dependent vertex weights: hypergraphs where every vertex v has a weight $\gamma_e(v)$ for each incident hyperedge e that describes the contribution of v to the hyperedge e. |
119 | Neural Joint Source-Channel Coding | Kristy Choi, Kedar Tatwawadi, Aditya Grover, Tsachy Weissman, Stefano Ermon | In this work, we propose to jointly learn the encoding and decoding processes using a new discrete variational autoencoder model. |
120 | Beyond Backprop: Online Alternating Minimization with Auxiliary Variables | Anna Choromanska, Benjamin Cowen, Sadhana Kumaravel, Ronny Luss, Mattia Rigotti, Irina Rish, Paolo Diachille, Viatcheslav Gurev, Brian Kingsbury, Ravi Tejwani, Djallel Bouneffouf | The main contribution of our work is a novel online (stochastic/mini-batch) alternating minimization (AM) approach for training deep neural networks, together with the first theoretical convergence guarantees for AM in stochastic settings and promising empirical results on a variety of architectures and datasets. |
121 | Unifying Orthogonal Monte Carlo Methods | Krzysztof Choromanski, Mark Rowland, Wenyu Chen, Adrian Weller | In this paper, we present a unifying perspective of many approximate methods by considering Givens transformations, propose new approximate methods based on this framework, and demonstrate the first statistical guarantees for families of approximate methods in kernel approximation. |
122 | Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning | Casey Chu, Jose Blanchet, Peter Glynn | The goal of this paper is to provide a unifying view of a wide range of problems of interest in machine learning by framing them as the minimization of functionals defined on the space of probability measures. |
123 | MeanSum: A Neural Model for Unsupervised Multi-Document Abstractive Summarization | Eric Chu, Peter Liu | In our work, we consider the setting where there are only documents (product or business reviews) with no summaries provided, and propose an end-to-end, neural model architecture to perform unsupervised abstractive summarization. Finally, we collect a ground-truth evaluation dataset and show that our model outperforms a strong extractive baseline. |
124 | Weak Detection of Signal in the Spiked Wigner Model | Hye Won Chung, Ji Oon Lee | In case the signal-to-noise ratio is under the threshold below which a reliable detection is impossible, we propose a hypothesis test based on the linear spectral statistics of the data matrix. |
125 | New results on information theoretic clustering | Ferdinando Cicalese, Eduardo Laber, Lucas Murtinho | We study the problem of optimizing the clustering of a set of vectors when the quality of the clustering is measured by the Entropy or the Gini impurity measure. |
126 | Sensitivity Analysis of Linear Structural Causal Models | Carlos Cinelli, Daniel Kumor, Bryant Chen, Judea Pearl, Elias Bareinboim | In this paper, we develop a formal, systematic approach to sensitivity analysis for arbitrary linear Structural Causal Models (SCMs). |
127 | Dimensionality Reduction for Tukey Regression | Kenneth Clarkson, Ruosong Wang, David Woodruff | We give the first dimensionality reduction methods for the overconstrained Tukey regression problem. |
128 | On Medians of (Randomized) Pairwise Means | Stephan Clemencon, Pierre Laforgue, Patrice Bertail | It is the purpose of this paper to extend this approach, in order to address other learning problems in particular, for which the performance criterion takes the form of an expectation over pairs of observations rather than over one single observation, as may be the case in pairwise ranking, clustering or metric learning. |
129 | Quantifying Generalization in Reinforcement Learning | Karl Cobbe, Oleg Klimov, Chris Hesse, Taehoon Kim, John Schulman | In this paper, we investigate the problem of overfitting in deep reinforcement learning. |
130 | Empirical Analysis of Beam Search Performance Degradation in Neural Sequence Models | Eldan Cohen, Christopher Beck | We perform an empirical study of the behavior of beam search across three sequence synthesis tasks. |
131 | Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret | Alon Cohen, Tomer Koren, Yishay Mansour | We present the first computationally-efficient algorithm with $\widetilde{O}(\sqrt{T})$ regret for learning in Linear Quadratic Control systems with unknown dynamics. |
132 | Certified Adversarial Robustness via Randomized Smoothing | Jeremy Cohen, Elan Rosenfeld, Zico Kolter | We show how to turn any classifier that classifies well under Gaussian noise into a new classifier that is certifiably robust to adversarial perturbations under the L2 norm. |
133 | Gauge Equivariant Convolutional Networks and the Icosahedral CNN | Taco Cohen, Maurice Weiler, Berkay Kicanaoglu, Max Welling | Here we show how this principle can be extended beyond global symmetries to local gauge transformations. |
134 | CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning | Cédric Colas, Pierre-Yves Oudeyer, Olivier Sigaud, Pierre Fournier, Mohamed Chetouani | This paper proposes CURIOUS, an algorithm that leverages 1) a modular Universal Value Function Approximator with hindsight learning to achieve a diversity of goals of different kinds within a unique policy and 2) an automated curriculum learning mechanism that biases the attention of the agent towards goals maximizing the absolute learning progress. |
135 | A fully differentiable beam search decoder | Ronan Collobert, Awni Hannun, Gabriel Synnaeve | We introduce a new beam search decoder that is fully differentiable, making it possible to optimize at training time through the inference procedure. |
136 | Scalable Metropolis-Hastings for Exact Bayesian Inference with Large Datasets | Rob Cornish, Paul Vanetti, Alexandre Bouchard-Cote, George Deligiannidis, Arnaud Doucet | We propose the Scalable Metropolis-Hastings (SMH) kernel that only requires processing on average $O(1)$ or even $O(1/\sqrt{n})$ data points per step. |
137 | Adjustment Criteria for Generalizing Experimental Findings | Juan Correa, Jin Tian, Elias Bareinboim | In this paper, we investigate the assumptions and machinery necessary for using covariate adjustment to correct for the biases generated by both of these problems, and generalize experimental data to infer causal effects in a new domain. |
138 | Online Learning with Sleeping Experts and Feedback Graphs | Corinna Cortes, Giulia Desalvo, Claudio Gentile, Mehryar Mohri, Scott Yang | Our main contribution is then to relax this assumption, present a more general notion of sleeping regret, and derive a general algorithm with strong theoretical guarantees. |
139 | Active Learning with Disagreement Graphs | Corinna Cortes, Giulia Desalvo, Mehryar Mohri, Ningshan Zhang, Claudio Gentile | We present two novel enhancements of an online importance-weighted active learning algorithm IWAL, using the properties of disagreements among hypotheses. |
140 | Shape Constraints for Set Functions | Andrew Cotter, Maya Gupta, Heinrich Jiang, Erez Louidor, James Muller, Tamann Narayan, Serena Wang, Tao Zhu | We propose making set functions more understandable and regularized by capturing domain knowledge through shape constraints. |
141 | Training Well-Generalizing Classifiers for Fairness Metrics and Other Data-Dependent Constraints | Andrew Cotter, Maya Gupta, Heinrich Jiang, Nathan Srebro, Karthik Sridharan, Serena Wang, Blake Woodworth, Seungil You | To improve generalization, we frame the problem as a two-player game where one player optimizes the model parameters on a training dataset, and the other player enforces the constraints on an independent validation dataset. |
142 | Monge blunts Bayes: Hardness Results for Adversarial Training | Zac Cranko, Aditya Menon, Richard Nock, Cheng Soon Ong, Zhan Shi, Christian Walder | We suggest a formal answer for losses that satisfy the minimal statistical requirement of being proper. |
143 | Boosted Density Estimation Remastered | Zac Cranko, Richard Nock | We show how to combine this latter approach and the classical boosting theory in supervised learning to get the first density estimation algorithm that provably achieves geometric convergence under very weak assumptions. |
144 | Submodular Cost Submodular Cover with an Approximate Oracle | Victoria Crawford, Alan Kuhnle, My Thai | In this work, we study the Submodular Cost Submodular Cover problem, which is to minimize the submodular cost required to ensure that the submodular benefit function exceeds a given threshold. |
145 | Flexibly Fair Representation Learning by Disentanglement | Elliot Creager, David Madras, Joern-Henrik Jacobsen, Marissa Weis, Kevin Swersky, Toniann Pitassi, Richard Zemel | Taking inspiration from the disentangled representation learning literature, we propose an algorithm for learning compact representations of datasets that are useful for reconstruction and prediction, but are also flexibly fair, meaning they can be easily modified at test time to achieve subgroup demographic parity with respect to multiple sensitive attributes and their conjunctions. |
146 | Anytime Online-to-Batch, Optimism and Acceleration | Ashok Cutkosky | We close this gap by introducing a black-box modification to any online learning algorithm whose iterates converge to the optimum in stochastic scenarios. |
147 | Matrix-Free Preconditioning in Online Learning | Ashok Cutkosky, Tamas Sarlos | We provide an online convex optimization algorithm with regret that interpolates between the regret of an algorithm using an optimal preconditioning matrix and one using a diagonal preconditioning matrix. |
148 | Minimal Achievable Sufficient Statistic Learning | Milan Cvitkovic, Günther Koliander | We introduce Minimal Achievable Sufficient Statistic (MASS) Learning, a machine learning training objective for which the minima are minimal sufficient statistics with respect to a class of functions being optimized over (e.g., deep networks). |
149 | Open Vocabulary Learning on Source Code with a Graph-Structured Cache | Milan Cvitkovic, Badal Singh, Animashree Anandkumar | We introduce a Graph-Structured Cache to address this problem; this cache contains a node for each new word the model encounters with edges connecting each word to its occurrences in the code. |
150 | The Value Function Polytope in Reinforcement Learning | Robert Dadashi, Marc G. Bellemare, Adrien Ali Taiga, Nicolas Le Roux, Dale Schuurmans | Our main contribution is the characterization of the nature of its shape: a general polytope (Aigner et al., 2010). |
151 | Bayesian Optimization Meets Bayesian Optimal Stopping | Zhongxiang Dai, Haibin Yu, Bryan Kian Hsiang Low, Patrick Jaillet | This paper proposes to unify BO (specifically, Gaussian process-upper confidence bound (GP-UCB)) with Bayesian optimal stopping (BO-BOS) to boost the epoch efficiency of BO. |
152 | Policy Certificates: Towards Accountable Reinforcement Learning | Christoph Dann, Lihong Li, Wei Wei, Emma Brunskill | We address this lack of accountability by proposing that algorithms output policy certificates. |
153 | Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations | Tri Dao, Albert Gu, Matthew Eichhorn, Atri Rudra, Christopher Re | Motivated by a characterization of fast matrix-vector multiplication as products of sparse matrices, we introduce a parameterization of divide-and-conquer methods that is capable of representing a large class of transforms. |
154 | A Kernel Theory of Modern Data Augmentation | Tri Dao, Albert Gu, Alexander Ratner, Virginia Smith, Chris De Sa, Christopher Re | In this paper, we seek to establish a theoretical framework for understanding data augmentation. |
155 | TarMAC: Targeted Multi-Agent Communication | Abhishek Das, Théophile Gervet, Joshua Romoff, Dhruv Batra, Devi Parikh, Mike Rabbat, Joelle Pineau | We propose a targeted communication architecture for multi-agent reinforcement learning, where agents learn both what messages to send and whom to address them to while performing cooperative tasks in partially-observable environments. |
156 | Teaching a black-box learner | Sanjoy Dasgupta, Daniel Hsu, Stefanos Poulis, Xiaojin Zhu | We consider the problem of teaching a learner whose representation and hypothesis class are unknown—that is, the learner is a black box. |
157 | Stochastic Deep Networks | Gwendoline De Bie, Gabriel Peyré, Marco Cuturi | We propose in this work a deep framework designed to handle crucial aspects of measures, namely permutation invariances, variations in weights and cardinality. |
158 | Learning-to-Learn Stochastic Gradient Descent with Biased Regularization | Giulia Denevi, Carlo Ciliberto, Riccardo Grazzi, Massimiliano Pontil | We present an average excess risk bound for such a learning algorithm that quantifies the potential benefit of using a bias vector with respect to the unbiased case. |
159 | A Multitask Multiple Kernel Learning Algorithm for Survival Analysis with Application to Cancer Biology | Onur Dereli, Ceyda Oguz, Mehmet Gönen | Rather than performing survival analysis on each data set to predict survival times of cancer patients, we developed a novel multitask approach based on multiple kernel learning (MKL). |
160 | Learning to Convolve: A Generalized Weight-Tying Approach | Nichita Diaconu, Daniel Worrall | In this paper, we learn how to transform filters for use in the group convolution, focussing on roto-translation. |
161 | Sever: A Robust Meta-Algorithm for Stochastic Optimization | Ilias Diakonikolas, Gautam Kamath, Daniel Kane, Jerry Li, Jacob Steinhardt, Alistair Stewart | To address this, we introduce a new meta-algorithm that can take in a base learner such as least squares or stochastic gradient descent, and harden the learner to be resistant to outliers. |
162 | Approximated Oracle Filter Pruning for Destructive CNN Width Optimization | Xiaohan Ding, Guiguang Ding, Yuchen Guo, Jungong Han, Chenggang Yan | To address these problems, we propose Approximated Oracle Filter Pruning (AOFP), which keeps searching for the least important filters in a binary search manner, makes pruning attempts by masking out filters randomly, accumulates the resulting errors, and finetunes the model via a multi-path framework. |
163 | Noisy Dual Principal Component Pursuit | Tianyu Ding, Zhihui Zhu, Tianjiao Ding, Yunchen Yang, Daniel Robinson, Manolis Tsakiris, Rene Vidal | Noisy Dual Principal Component Pursuit |
164 | Finite-Time Analysis of Distributed TD(0) with Linear Function Approximation on Multi-Agent Reinforcement Learning | Thinh Doan, Siva Maguluri, Justin Romberg | Our main contribution is providing a finite-time analysis for the convergence of the distributed TD(0) algorithm. |
165 | Trajectory-Based Off-Policy Deep Reinforcement Learning | Andreas Doerr, Michael Volpp, Marc Toussaint, Sebastian Trimpe, Christian Daniel | This work addresses these weaknesses by combining recent improvements in the reuse of off-policy data and exploration in parameter space with deterministic behavioral policies. |
166 | Generalized No Free Lunch Theorem for Adversarial Robustness | Elvis Dohmatob | This manuscript presents some new impossibility results on adversarial robustness in machine learning, a very important yet largely open problem. |
167 | Width Provably Matters in Optimization for Deep Linear Neural Networks | Simon Du, Wei Hu | We prove that for an $L$-layer fully-connected linear neural network, if the width of every hidden layer is $\widetilde{\Omega}\left(L \cdot r \cdot d_{out} \cdot \kappa^3 \right)$, where $r$ and $\kappa$ are the rank and the condition number of the input data, and $d_{out}$ is the output dimension, then gradient descent with Gaussian random initialization converges to a global minimum at a linear rate. |
168 | Provably efficient RL with Rich Observations via Latent State Decoding | Simon Du, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal, Miroslav Dudik, John Langford | We study the exploration problem in episodic MDPs with rich observations generated from a small number of latent states. |
169 | Gradient Descent Finds Global Minima of Deep Neural Networks | Simon Du, Jason Lee, Haochuan Li, Liwei Wang, Xiyu Zhai | The current paper proves gradient descent achieves zero training loss in polynomial time for a deep over-parameterized neural network with residual connections (ResNet). |
170 | Incorporating Grouping Information into Bayesian Decision Tree Ensembles | Junliang Du, Antonio Linero | We consider the problem of nonparametric regression in the high-dimensional setting in which $P \gg N$. |
171 | Task-Agnostic Dynamics Priors for Deep Reinforcement Learning | Yilun Du, Karthik Narasimhan | In this work, we propose an approach to learn task-agnostic dynamics priors from videos and incorporate them into an RL agent. |
172 | Optimal Auctions through Deep Learning | Paul Duetting, Zhe Feng, Harikrishna Narasimhan, David Parkes, Sai Srivatsa Ravindranath | In this work, we initiate the exploration of the use of tools from deep learning for the automated design of optimal auctions. |
173 | Wasserstein of Wasserstein Loss for Learning Generative Models | Yonatan Dukler, Wuchen Li, Alex Lin, Guido Montufar | We propose to use the Wasserstein distance itself as the ground metric on the sample space of images. |
174 | Learning interpretable continuous-time models of latent stochastic dynamical systems | Lea Duncker, Gergo Bohner, Julien Boussard, Maneesh Sahani | This form yields a flexible nonparametric model of the dynamics, with a representation corresponding directly to the interpretable portraits routinely employed in the study of nonlinear dynamical systems. |
175 | Autoregressive Energy Machines | Conor Durkan, Charlie Nash | We propose the Autoregressive Energy Machine, an energy-based model which simultaneously learns an unnormalized density and computes an importance-sampling estimate of the normalizing constant for each conditional in an autoregressive decomposition. |
176 | Band-limited Training and Inference for Convolutional Neural Networks | Adam Dziedzic, John Paparrizos, Sanjay Krishnan, Aaron Elmore, Michael Franklin | We explore artificially constraining the frequency spectra of these filters and data, called band-limiting, during training. |
177 | Imitating Latent Policies from Observation | Ashley Edwards, Himanshu Sahni, Yannick Schroecker, Charles Isbell | In this paper, we describe a novel approach to imitation learning that infers latent policies directly from state observations. |
178 | Semi-Cyclic Stochastic Gradient Descent | Hubert Eichner, Tomer Koren, Brendan Mcmahan, Nathan Srebro, Kunal Talwar | We show that such block-cyclic structure can significantly deteriorate the performance of SGD, but propose a simple approach that allows prediction with the same guarantees as for i.i.d., non-cyclic sampling. |
179 | GDPP: Learning Diverse Generations using Determinantal Point Processes | Mohamed Elfeki, Camille Couprie, Morgane Riviere, Mohamed Elhoseiny | In this work, we draw inspiration from Determinantal Point Process (DPP) to propose an unsupervised penalty loss that alleviates mode collapse while producing higher quality samples. |
180 | Sequential Facility Location: Approximate Submodularity and Greedy Algorithm | Ehsan Elhamifar | We propose a cardinality-constrained sequential facility location function that finds a fixed number of representatives, where the sequence of representatives is compatible with the dynamic model and well encodes the data. |
181 | Improved Convergence for $\ell_1$ and $\ell_\infty$ Regression via Iteratively Reweighted Least Squares | Alina Ene, Adrian Vladu | In this paper we propose a simple and natural version of IRLS for solving $\ell_\infty$ and $\ell_1$ regression, which provably converges to a $(1+\epsilon)$-approximate solution in $O(m^{1/3}\log(1/\epsilon)/\epsilon^{2/3} + \log m/\epsilon^2)$ iterations, where $m$ is the number of rows of the input matrix. |
182 | Exploring the Landscape of Spatial Robustness | Logan Engstrom, Brandon Tran, Dimitris Tsipras, Ludwig Schmidt, Aleksander Madry | In this work, we thoroughly investigate the vulnerability of neural network–based classifiers to rotations and translations. |
183 | Cross-Domain 3D Equivariant Image Embeddings | Carlos Esteves, Avneesh Sud, Zhengyi Luo, Kostas Daniilidis, Ameesh Makadia | In this paper we learn 2D image embeddings with a similar equivariant structure: embedding the image of a 3D object should commute with rotations of the object. |
184 | On the Connection Between Adversarial Robustness and Saliency Map Interpretability | Christian Etmann, Sebastian Lunz, Peter Maass, Carola Schoenlieb | We aim to quantify this behaviour by considering the alignment between input image and saliency map. |
185 | Non-monotone Submodular Maximization with Nearly Optimal Adaptivity and Query Complexity | Matthew Fahrbach, Vahab Mirrokni, Morteza Zadimoghaddam | In this paper, we give the first constant-factor approximation algorithm for maximizing a non-monotone submodular function subject to a cardinality constraint $k$ that runs in $O(\log(n))$ adaptive rounds and makes $O(n \log(k))$ oracle queries in expectation. |
186 | Multi-Frequency Vector Diffusion Maps | Yifeng Fan, Zhizhen Zhao | We introduce multi-frequency vector diffusion maps (MFVDM), a new framework for organizing and analyzing high dimensional data sets. |
187 | Stable-Predictive Optimistic Counterfactual Regret Minimization | Gabriele Farina, Christian Kroer, Noam Brown, Tuomas Sandholm | In this work we present the first CFR variant that breaks the square-root dependence on iterations. |
188 | Regret Circuits: Composability of Regret Minimizers | Gabriele Farina, Christian Kroer, Tuomas Sandholm | In this paper we study the general composability of regret minimizers. |
189 | Dead-ends and Secure Exploration in Reinforcement Learning | Mehdi Fatemi, Shikhar Sharma, Harm Van Seijen, Samira Ebrahimi Kahou | To deal with the bridge effect, we propose a condition for exploration, called security. |
190 | Invariant-Equivariant Representation Learning for Multi-Class Data | Ilya Feige | We introduce an approach to probabilistic modelling that learns to represent data with two separate deep representations: an invariant representation that encodes the information of the class to which the data belongs, and an equivariant representation that encodes the symmetry transformation defining the particular data point within the class manifold (equivariant in the sense that the representation varies naturally with symmetry transformations). |
191 | The advantages of multiple classes for reducing overfitting from test set reuse | Vitaly Feldman, Roy Frostig, Moritz Hardt | We show a new upper bound of $\tilde O(\max\{\sqrt{k\log(n)/(mn)}, k/n\})$ on the worst-case bias that any attack can achieve in a prediction problem with $m$ classes. |
192 | Decentralized Exploration in Multi-Armed Bandits | Raphael Feraud, Reda Alami, Romain Laroche | We consider the decentralized exploration problem: a set of players collaborate to identify the best arm by asynchronously interacting with the same stochastic environment. |
193 | Almost surely constrained convex optimization | Olivier Fercoq, Ahmet Alacaoglu, Ion Necoara, Volkan Cevher | We propose a stochastic gradient framework for solving stochastic composite convex optimization problems with a (possibly) infinite number of linear inclusion constraints that need to be satisfied almost surely. |
194 | Online Meta-Learning | Chelsea Finn, Aravind Rajeswaran, Sham Kakade, Sergey Levine | This work introduces an online meta-learning setting, which merges ideas from both paradigms to better capture the spirit and practice of continual lifelong learning. |
195 | DL2: Training and Querying Neural Networks with Logic | Marc Fischer, Mislav Balunovic, Dana Drachsler-Cohen, Timon Gehr, Ce Zhang, Martin Vechev | We present DL2, a system for training and querying neural networks with logical constraints. |
196 | Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning | Jakob Foerster, Francis Song, Edward Hughes, Neil Burch, Iain Dunning, Shimon Whiteson, Matthew Botvinick, Michael Bowling | We present the Bayesian action decoder (BAD), a new multi-agent learning method that uses an approximate Bayesian update to obtain a public belief that conditions on the actions taken by all agents in the environment. |
197 | Scalable Nonparametric Sampling from Multimodal Posteriors with the Posterior Bootstrap | Edwin Fong, Simon Lyddon, Chris Holmes | We present a scalable Bayesian nonparametric learning routine that enables posterior sampling through the optimization of suitably randomized objective functions. |
198 | On discriminative learning of prediction uncertainty | Vojtech Franc, Daniel Prusa | We propose a discriminative algorithm learning an uncertainty function which preserves ordering of the input space induced by the conditional risk, and hence can be used to construct optimal rejection strategies. |
199 | Learning Discrete Structures for Graph Neural Networks | Luca Franceschi, Mathias Niepert, Massimiliano Pontil, Xiao He | With this work, we propose to jointly learn the graph structure and the parameters of graph convolutional networks (GCNs) by approximately solving a bilevel program that learns a discrete probability distribution on the edges of the graph. |
200 | Distributional Multivariate Policy Evaluation and Exploration with the Bellman GAN | Dror Freirich, Tzahi Shimkin, Ron Meir, Aviv Tamar | In this work, we show that the distributional Bellman equation, which drives DiRL methods, is equivalent to a generative adversarial network (GAN) model. |
201 | Approximating Orthogonal Matrices with Effective Givens Factorization | Thomas Frerix, Joan Bruna | We analyze effective approximation of unitary matrices. |
202 | Fast and Flexible Inference of Joint Distributions from their Marginals | Charlie Frogner, Tomaso Poggio | In this paper, we treat the inference problem generally and propose a unified class of models that encompasses some of those previously proposed while including many new ones. |
203 | Analyzing and Improving Representations with the Soft Nearest Neighbor Loss | Nicholas Frosst, Nicolas Papernot, Geoffrey Hinton | We explore and expand the Soft Nearest Neighbor Loss to measure the entanglement of class manifolds in representation space: i.e., how close pairs of points from the same class are relative to pairs of points from different classes. |
204 | Diagnosing Bottlenecks in Deep Q-learning Algorithms | Justin Fu, Aviral Kumar, Matthew Soh, Sergey Levine | In this work, we aim to experimentally investigate potential issues in Q-learning, by means of a “unit testing” framework where we can utilize oracles to disentangle sources of error. |
205 | MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement | Szu-Wei Fu, Chien-Feng Liao, Yu Tsao, Shou-De Lin | To overcome this issue, we propose a novel MetricGAN approach with an aim to optimize the generator with respect to one or multiple evaluation metrics. |
206 | Beyond Adaptive Submodularity: Approximation Guarantees of Greedy Policy with Adaptive Submodularity Ratio | Kaito Fujii, Shinsaku Sakaue | We propose a new concept named adaptive submodularity ratio to study the greedy policy for sequential decision making. |
207 | Off-Policy Deep Reinforcement Learning without Exploration | Scott Fujimoto, David Meger, Doina Precup | In this paper, we demonstrate that due to errors introduced by extrapolation, standard off-policy deep reinforcement learning algorithms, such as DQN and DDPG, are incapable of learning with data uncorrelated to the distribution under the current policy, making them ineffective for this fixed batch setting. |
208 | Transfer Learning for Related Reinforcement Learning Tasks via Image-to-Image Translation | Shani Gamrian, Yoav Goldberg | We demonstrate the approach on synthetic visual variants of the Breakout game, as well as on transfer between subsequent levels of Road Fighter, a Nintendo car-driving game. |
209 | Breaking the Softmax Bottleneck via Learnable Monotonic Pointwise Non-linearities | Octavian Ganea, Sylvain Gelly, Gary Becigneul, Aliaksei Severyn | As an efficient and effective solution to alleviate this issue, we propose to learn parametric monotonic functions on top of the logits. |
210 | Graph U-Nets | Hongyang Gao, Shuiwang Ji | To address these challenges, we propose novel graph pooling (gPool) and unpooling (gUnpool) operations in this work. |
211 | Deep Generative Learning via Variational Gradient Flow | Yuan Gao, Yuling Jiao, Yang Wang, Yao Wang, Can Yang, Shunkang Zhang | We propose a framework to learn deep generative models via Variational Gradient Flow (VGrow) on probability spaces. |
212 | Rate Distortion For Model Compression: From Theory To Practice | Weihao Gao, Yu-Han Liu, Chong Wang, Sewoong Oh | In this paper, we propose principled approaches to improve upon the common heuristics used in those building blocks, by studying the fundamental limit for model compression via the rate distortion theory. |
213 | Demystifying Dropout | Hongchang Gao, Jian Pei, Heng Huang | In this paper, unlike existing works, we explore it from a new perspective to provide new insight into this line of research. |
214 | Geometric Scattering for Graph Data Analysis | Feng Gao, Guy Wolf, Matthew Hirn | We explore the generalization of scattering transforms from traditional (e.g., image or audio) signals to graph data, analogous to the generalization of ConvNets in geometric deep learning, and the utility of extracted graph features in graph data analysis. |
215 | Multi-Frequency Phase Synchronization | Tingran Gao, Zhizhen Zhao | We propose a novel formulation for phase synchronization—the statistical problem of jointly estimating alignment angles from noisy pairwise comparisons—as a nonconvex optimization problem that enforces consistency among the pairwise comparisons in multiple frequency channels. |
216 | Optimal Mini-Batch and Step Sizes for SAGA | Nidham Gazagnadou, Robert Gower, Joseph Salmon | Using these bounds, and since the SAGA algorithm is part of this JacSketch family, we suggest a new standard practice for setting the step and mini-batch sizes for SAGA that are competitive with a numerical grid search. |
217 | SelectiveNet: A Deep Neural Network with an Integrated Reject Option | Yonatan Geifman, Ran El-Yaniv | We consider the problem of selective prediction (also known as reject option) in deep neural networks, and introduce SelectiveNet, a deep neural architecture with an integrated reject option. |
218 | A Theory of Regularized Markov Decision Processes | Matthieu Geist, Bruno Scherrer, Olivier Pietquin | We propose a general theory of regularized Markov Decision Processes that generalizes these approaches in two directions: we consider a larger class of regularizers, and we consider the general modified policy iteration approach, encompassing both policy iteration and value iteration. |
219 | DeepMDP: Learning Continuous Latent Space Models for Representation Learning | Carles Gelada, Saurabh Kumar, Jacob Buckman, Ofir Nachum, Marc G. Bellemare | To formalize this process, we introduce the concept of a DeepMDP, a parameterized latent space model that is trained via the minimization of two tractable latent space losses: prediction of rewards and prediction of the distribution over next latent states. |
220 | Partially Linear Additive Gaussian Graphical Models | Sinong Geng, Minhao Yan, Mladen Kolar, Sanmi Koyejo | We propose a partially linear additive Gaussian graphical model (PLA-GGM) for the estimation of associations between random variables distorted by observed confounders. |
221 | Learning and Data Selection in Big Datasets | Hossein Shokri Ghadikolaei, Hadi Ghauch, Carlo Fischione, Mikael Skoglund | More specifically, we propose a framework that jointly learns the input-output mapping as well as the most representative samples of the dataset (sufficient dataset). |
222 | Improved Parallel Algorithms for Density-Based Network Clustering | Mohsen Ghaffari, Silvio Lattanzi, Slobodan Mitrovic | In the case of $k$-core decomposition, our work improves exponentially on the algorithm provided by Esfandiari et al. (ICML’18). |
223 | Recursive Sketches for Modular Deep Learning | Badih Ghazi, Rina Panigrahy, Joshua Wang | We present a mechanism to compute a sketch (succinct summary) of how a complex modular deep network processes its inputs. |
224 | An Instability in Variational Inference for Topic Models | Behrooz Ghorbani, Hamid Javadi, Andrea Montanari | We show that these methods suffer from an instability that can produce misleading conclusions. |
225 | An Investigation into Neural Net Optimization via Hessian Eigenvalue Density | Behrooz Ghorbani, Shankar Krishnan, Ying Xiao | To understand the dynamics of training in deep neural networks, we study the evolution of the Hessian eigenvalue density throughout the optimization process. |
226 | Data Shapley: Equitable Valuation of Data for Machine Learning | Amirata Ghorbani, James Zou | In this work, we develop a principled framework to address data valuation in the context of supervised machine learning. |
227 | Efficient Dictionary Learning with Gradient Descent | Dar Gilboa, Sam Buchanan, John Wright | We study one such problem, complete orthogonal dictionary learning, and provide convergence guarantees for randomly initialized gradient descent to the neighborhood of a global optimum. |
228 | A Tree-Based Method for Fast Repeated Sampling of Determinantal Point Processes | Jennifer Gillenwater, Alex Kulesza, Zelda Mariet, Sergei Vassilvitskii | In this work we address both of these shortcomings. |
229 | Learning to Groove with Inverse Sequence Transformations | Jon Gillick, Adam Roberts, Jesse Engel, Douglas Eck, David Bamman | We explore models for translating abstract musical ideas (scores, rhythms) into expressive performances using seq2seq and recurrent variational information bottleneck (VIB) models. Focusing on the case of drum set players, we create and release a new dataset for this purpose, containing over 13 hours of recordings by professional drummers aligned with fine-grained timing and dynamics information. |
230 | Adversarial Examples Are a Natural Consequence of Test Error in Noise | Justin Gilmer, Nicolas Ford, Nicholas Carlini, Ekin Cubuk | In this paper we provide both empirical and theoretical evidence that these are two manifestations of the same underlying phenomenon, and therefore the adversarial robustness and corruption robustness research programs are closely related. |
231 | Discovering Conditionally Salient Features with Statistical Guarantees | Jaime Roquero Gimenez, James Zou | We study a more fine-grained statistical problem: conditional feature selection, where a feature may be relevant depending on the values of the other features. |
232 | Estimating Information Flow in Deep Neural Networks | Ziv Goldfeld, Ewout Van Den Berg, Kristjan Greenewald, Igor Melnyk, Nam Nguyen, Brian Kingsbury, Yury Polyanskiy | Focusing on feedforward networks with fixed weights and noisy internal representations, we develop a rigorous framework for accurate estimation of $I(X;T_\ell)$. |
233 | Amortized Monte Carlo Integration | Adam Golinski, Frank Wood, Tom Rainforth | In this paper, we address this inefficiency by introducing AMCI, a method for amortizing Monte Carlo integration directly. |
234 | Online Algorithms for Rent-Or-Buy with Expert Advice | Sreenivas Gollapudi, Debmalya Panigrahi | In particular, we consider the classical rent-or-buy problem (also called ski rental), and obtain algorithms that provably improve their performance over the adversarial scenario by using these predictions. |
235 | The information-theoretic value of unlabeled data in semi-supervised learning | Alexander Golovnev, David Pal, Balazs Szorenyi | More specifically, we prove a separation by a $\Theta(\log n)$ multiplicative factor for the class of projections over the Boolean hypercube of dimension $n$. |
236 | Efficient Training of BERT by Progressively Stacking | Linyuan Gong, Di He, Zhuohan Li, Tao Qin, Liwei Wang, Tieyan Liu | In this paper, we explore an efficient training method for the state-of-the-art bidirectional Transformer (BERT) model. |
237 | Quantile Stein Variational Gradient Descent for Batch Bayesian Optimization | Chengyue Gong, Jian Peng, Qiang Liu | In this paper, we introduce a novel variational framework for batch query optimization, based on the argument that the query batch should be selected to have both high diversity and good worst case performance. |
238 | Obtaining Fairness using Optimal Transport Theory | Paula Gordaliza, Eustasio Del Barrio, Fabrice Gamboa, Jean-Michel Loubes | We propose a Random Repair which yields a tradeoff between minimal information loss and a certain amount of fairness. |
239 | Combining parametric and nonparametric models for off-policy evaluation | Omer Gottesman, Yao Liu, Scott Sussex, Emma Brunskill, Finale Doshi-Velez | We consider a model-based approach to perform batch off-policy evaluation in reinforcement learning. |
240 | Counterfactual Visual Explanations | Yash Goyal, Ziyan Wu, Jan Ernst, Dhruv Batra, Devi Parikh, Stefan Lee | In this work, we develop a technique to produce counterfactual visual explanations. |
241 | Adaptive Sensor Placement for Continuous Spaces | James Grant, Alexis Boukouvalas, Ryan-Rhys Griffiths, David Leslie, Sattar Vakili, Enrique Munoz De Cote | We present a new formulation of the problem as a continuum-armed bandit problem with feedback in the form of partial observations of realisations of an inhomogeneous Poisson process. |
242 | A Statistical Investigation of Long Memory in Language and Music | Alexander Greaves-Tunnell, Zaid Harchaoui | We contribute a statistical framework for investigating long-range dependence in current applications of deep sequence modeling, drawing on the well-developed theory of long memory stochastic processes. |
243 | Automatic Posterior Transformation for Likelihood-Free Inference | David Greenberg, Marcel Nonnenmacher, Jakob Macke | Here we present automatic posterior transformation (APT), a new sequential neural posterior estimation method for simulation-based inference. |
244 | Learning to Optimize Multigrid PDE Solvers | Daniel Greenfeld, Meirav Galun, Ronen Basri, Irad Yavneh, Ron Kimmel | In this paper we propose a framework for learning multigrid solvers. |
245 | Multi-Object Representation Learning with Iterative Variational Inference | Klaus Greff, Raphaël Lopez Kaufman, Rishabh Kabra, Nick Watters, Christopher Burgess, Daniel Zoran, Loic Matthey, Matthew Botvinick, Alexander Lerchner | Instead, we argue for the importance of learning to segment and represent objects jointly. |
246 | Graphite: Iterative Generative Modeling of Graphs | Aditya Grover, Aaron Zweig, Stefano Ermon | In this work, we propose Graphite, an algorithmic framework for unsupervised learning of representations over nodes in large graphs using deep latent variable generative models. |
247 | Fast Algorithm for Generalized Multinomial Models with Ranking Data | Jiaqi Gu, Guosheng Yin | Based on this property, we propose an iterative algorithm that is easy to implement and interpret, and is guaranteed to converge. |
248 | Towards a Deep and Unified Understanding of Deep Neural Models in NLP | Chaoyu Guan, Xiting Wang, Quanshi Zhang, Runjin Chen, Di He, Xing Xie | We define a unified information-based measure to provide quantitative explanations on how intermediate layers of deep Natural Language Processing (NLP) models leverage information of input words. |
249 | An Investigation of Model-Free Planning | Arthur Guez, Mehdi Mirza, Karol Gregor, Rishabh Kabra, Sebastien Racaniere, Theophane Weber, David Raposo, Adam Santoro, Laurent Orseau, Tom Eccles, Greg Wayne, David Silver, Timothy Lillicrap | In this paper, we go even further, and demonstrate empirically that an entirely model-free approach, without special structure beyond standard neural network components such as convolutional networks and LSTMs, can learn to exhibit many of the characteristics typically associated with a model-based planner. |
250 | Humor in Word Embeddings: Cockamamie Gobbledegook for Nincompoops | Limor Gultchin, Genevieve Patterson, Nancy Baym, Nathaniel Swinger, Adam Kalai | While humor is often thought to be beyond the reach of Natural Language Processing, we show that several aspects of single-word humor correlate with simple linear directions in Word Embeddings. |
251 | Simple Black-box Adversarial Attacks | Chuan Guo, Jacob Gardner, Yurong You, Andrew Gordon Wilson, Kilian Weinberger | We propose an intriguingly simple method for the construction of adversarial images in the black-box setting. |
252 | Exploring interpretable LSTM neural networks over multi-variable data | Tian Guo, Tao Lin, Nino Antulov-Fantulin | In this paper, we explore the structure of LSTM recurrent neural networks to learn variable-wise hidden states, with the aim to capture different dynamics in multi-variable time series and distinguish the contribution of variables to the prediction. |
253 | Learning to Exploit Long-term Relational Dependencies in Knowledge Graphs | Lingbing Guo, Zequn Sun, Wei Hu | In this paper, we propose recurrent skipping networks (RSNs), which employ a skipping mechanism to bridge the gaps between entities. |
254 | Memory-Optimal Direct Convolutions for Maximizing Classification Accuracy in Embedded Applications | Albert Gural, Boris Murmann | This paper presents memory-optimal direct convolutions as a way to push classification accuracy as high as possible given strict hardware memory constraints at the expense of extra compute. |
255 | IMEXnet: A Forward Stable Deep Neural Network | Eldad Haber, Keegan Lensink, Eran Treister, Lars Ruthotto | We introduce IMEXnet, which addresses these challenges by adapting semi-implicit methods for partial differential equations. We also present a new dataset for semantic segmentation and demonstrate the effectiveness of our architecture using the NYU Depth dataset. |
256 | On The Power of Curriculum Learning in Training Deep Networks | Guy Hacohen, Daphna Weinshall | In this work, we analyze the effect of curriculum learning, which involves the non-uniform sampling of mini-batches, on the training of deep networks, and specifically CNNs trained for image recognition. |
257 | Trading Redundancy for Communication: Speeding up Distributed SGD for Non-convex Optimization | Farzin Haddadpour, Mohammad Mahdi Kamani, Mehrdad Mahdavi, Viveck Cadambe | In this paper, we advocate the use of redundancy towards communication-efficient distributed stochastic algorithms for non-convex optimization. |
258 | Learning Latent Dynamics for Planning from Pixels | Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson | We propose the Deep Planning Network (PlaNet), a purely model-based agent that learns the environment dynamics from images and chooses actions through fast online planning in latent space. |
259 | Neural Separation of Observed and Unobserved Distributions | Tavi Halperin, Ariel Ephrat, Yedid Hoshen | In this work, we introduce a new method—Neural Egg Separation—to tackle the scenario of extracting a signal from an unobserved distribution additively mixed with a signal from an observed distribution. |
260 | Grid-Wise Control for Multi-Agent Reinforcement Learning in Video Game AI | Lei Han, Peng Sun, Yali Du, Jiechao Xiong, Qing Wang, Xinghai Sun, Han Liu, Tong Zhang | To address the issue, we propose a novel architecture that learns a spatial joint representation of all the agents and outputs grid-wise actions. |
261 | Dimension-Wise Importance Sampling Weight Clipping for Sample-Efficient Reinforcement Learning | Seungyul Han, Youngchul Sung | In this paper, we consider PPO, a representative on-policy algorithm, and propose its improvement by dimension-wise IS weight clipping which separately clips the IS weight of each action dimension to avoid large bias and adaptively controls the IS weight to bound policy update from the current policy. |
262 | Complexity of Linear Regions in Deep Networks | Boris Hanin, David Rolnick | In this paper, we provide a mathematical framework to count the number of linear regions of a piecewise linear network and measure the volume of the boundaries between these regions. |
263 | Importance Sampling Policy Evaluation with an Estimated Behavior Policy | Josiah Hanna, Scott Niekum, Peter Stone | In this paper, we study importance sampling with an estimated behavior policy where the behavior policy estimate comes from the same set of data used to compute the importance sampling estimate. |
264 | Doubly-Competitive Distribution Estimation | Yi Hao, Alon Orlitsky | This paper combines and strengthens the two frameworks. |
265 | Random Shuffling Beats SGD after Finite Epochs | Jeff Haochen, Suvrit Sra | Building upon Gürbüzbalaban et al. (2015), we present the first (to our knowledge) non-asymptotic results for this problem by proving that after a reasonable number of epochs, random-shuffling SGD converges faster than SGD. |
266 | Submodular Maximization beyond Non-negativity: Guarantees, Fast Algorithms, and Applications | Chris Harshaw, Moran Feldman, Justin Ward, Amin Karbasi | We present an algorithm for maximizing $g - c$ under a $k$-cardinality constraint which produces a random feasible set $S$ such that $\mathbb{E}[g(S) - c(S)] \geq (1 - e^{-\gamma} - \epsilon)\, g(\mathrm{OPT}) - c(\mathrm{OPT})$, whose running time is $O(\frac{n}{\epsilon} \log^2 \frac{1}{\epsilon})$, independent of $k$. |
267 | Per-Decision Option Discounting | Anna Harutyunyan, Peter Vrancx, Philippe Hamel, Ann Nowe, Doina Precup | We propose a modification to the options framework that naturally scales the agent’s horizon with option length. |
268 | Submodular Observation Selection and Information Gathering for Quadratic Models | Abolfazl Hashemi, Mahsa Ghasemi, Haris Vikalo, Ufuk Topcu | We study the problem of selecting the most informative subset of a large observation set to enable accurate estimation of unknown parameters. |
269 | Understanding and Controlling Memory in Recurrent Neural Networks | Doron Haviv, Alexander Rivkind, Omri Barak | Here, we utilize different training protocols, datasets and architectures to obtain a range of networks solving a delayed classification task with similar performance, alongside substantial differences in their ability to extrapolate for longer delays. |
270 | On the Impact of the Activation function on Deep Neural Networks Training | Soufiane Hayou, Arnaud Doucet, Judith Rousseau | We give a comprehensive theoretical analysis of the Edge of Chaos and show that we can indeed tune the initialization parameters and the activation function in order to accelerate the training and improve the performance. |
271 | Provably Efficient Maximum Entropy Exploration | Elad Hazan, Sham Kakade, Karan Singh, Abby Van Soest | We provide an efficient algorithm to optimize such intrinsically defined objectives, when given access to a black box planning oracle (which is robust to function approximation). |
272 | On the Long-term Impact of Algorithmic Decision Policies: Effort Unfairness and Feature Segregation through Social Learning | Hoda Heidari, Vedant Nanda, Krishna Gummadi | We propose an effort-based measure of fairness and present a data-driven framework for characterizing the long-term impact of algorithmic policies on reshaping the underlying population. |
273 | Graph Resistance and Learning from Pairwise Comparisons | Julien Hendrickx, Alexander Olshevsky, Venkatesh Saligrama | We consider the problem of learning the qualities of a collection of items by performing noisy comparisons among them. |
274 | Using Pre-Training Can Improve Model Robustness and Uncertainty | Dan Hendrycks, Kimin Lee, Mantas Mazeika | We show that although pre-training may not improve performance on traditional classification metrics, it improves model robustness and uncertainty estimates. |
275 | Flow++: Improving Flow-Based Generative Models with Variational Dequantization and Architecture Design | Jonathan Ho, Xi Chen, Aravind Srinivas, Yan Duan, Pieter Abbeel | In this paper, we investigate and improve upon three limiting design choices employed by flow-based models in prior work: the use of uniform noise for dequantization, the use of inexpressive affine flows, and the use of purely convolutional conditioning networks in coupling layers. |
276 | Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules | Daniel Ho, Eric Liang, Xi Chen, Ion Stoica, Pieter Abbeel | In this paper, we introduce a new data augmentation algorithm, Population Based Augmentation (PBA), which generates nonstationary augmentation policy schedules instead of a fixed augmentation policy. |
277 | Collective Model Fusion for Multiple Black-Box Experts | Minh Hoang, Nghia Hoang, Bryan Kian Hsiang Low, Carleton Kingsford | The proposed method will enable this by addressing the key issues of how black-box experts interact to understand the predictive behaviors of one another; how these understandings can be represented and shared efficiently among themselves; and how the shared understandings can be combined to generate high-quality consensus predictions. |
278 | Connectivity-Optimized Representation Learning via Persistent Homology | Christoph Hofer, Roland Kwitt, Marc Niethammer, Mandar Dixit | Under mild conditions, this loss is differentiable and we present a theoretical analysis of the properties induced by the loss. |
279 | Better generalization with less data using robust gradient descent | Matthew Holland, Kazushi Ikeda | In pursuit of stronger performance under weaker assumptions, we propose a technique which uses a cheap and robust iterative estimate of the risk gradient, which can be easily fed into any steepest descent procedure. |
280 | Emerging Convolutions for Generative Normalizing Flows | Emiel Hoogeboom, Rianne Van Den Berg, Max Welling | We propose two methods to produce invertible convolutions that have receptive fields identical to standard convolutions: emerging convolutions are obtained by chaining specific autoregressive convolutions, and periodic convolutions are decoupled in the frequency domain. |
281 | Nonconvex Variance Reduced Optimization with Arbitrary Sampling | Samuel Horváth, Peter Richtárik | We provide the first importance sampling variants of variance reduced algorithms for empirical risk minimization with non-convex loss functions. |
282 | Parameter-Efficient Transfer Learning for NLP | Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, Sylvain Gelly | As an alternative, we propose transfer with adapter modules. |
283 | Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging | Ping-Chun Hsieh, Xi Liu, Anirban Bhattacharya, P R Kumar | To address the above issue, this paper proposes a model of heteroscedastic linear bandits with reneging, which allows each participant to have a distinct “satisfaction level,” with any interaction outcome falling short of that level resulting in that participant reneging. |
284 | Finding Mixed Nash Equilibria of Generative Adversarial Networks | Ya-Ping Hsieh, Chen Liu, Volkan Cevher | In this work, we tackle the training of GANs by rethinking the problem formulation from the mixed Nash Equilibria (NE) perspective. |
285 | Classification from Positive, Unlabeled and Biased Negative Data | Yu-Guan Hsieh, Gang Niu, Masashi Sugiyama | We provide a method based on empirical risk minimization to address this PUbN classification problem. |
286 | Bayesian Deconditional Kernel Mean Embeddings | Kelvin Hsu, Fabio Ramos | Critically, we introduce the notion of task transformed Gaussian processes and establish deconditional kernel mean embeddings as their posterior predictive mean. |
287 | Faster Stochastic Alternating Direction Method of Multipliers for Nonconvex Optimization | Feihu Huang, Songcan Chen, Heng Huang | In this paper, we propose a faster stochastic alternating direction method of multipliers (ADMM) for nonconvex optimization by using a new stochastic path-integrated differential estimator (SPIDER), called as SPIDER-ADMM. |
288 | Unsupervised Deep Learning by Neighbourhood Discovery | Jiabo Huang, Qi Dong, Shaogang Gong, Xiatian Zhu | In this work, we introduce a generic unsupervised deep learning approach to training deep models without the need for any manual label supervision. |
289 | Detecting Overlapping and Correlated Communities without Pure Nodes: Identifiability and Algorithm | Kejun Huang, Xiao Fu | We adopt the mixed-membership stochastic blockmodel as the underlying probabilistic model, and give conditions under which the memberships of a subset of nodes can be uniquely identified. |
290 | Hierarchical Importance Weighted Autoencoders | Chin-Wei Huang, Kris Sankaran, Eeshan Dhekane, Alexandre Lacoste, Aaron Courville | Theoretically, we analyze the condition under which convergence of the estimator variance can be connected to convergence of the lower bound. |
291 | Stable and Fair Classification | Lingxiao Huang, Nisheeth Vishnoi | We propose an extended framework based on fair classification algorithms that are formulated as optimization problems, by introducing a stability-focused regularization term. |
292 | Addressing the Loss-Metric Mismatch with Adaptive Loss Alignment | Chen Huang, Shuangfei Zhai, Walter Talbott, Miguel Bautista Martin, Shih-Yu Sun, Carlos Guestrin, Josh Susskind | In this work we assess this assumption by meta-learning an adaptive loss function to directly optimize the evaluation metric. |
293 | Causal Discovery and Forecasting in Nonstationary Environments with State-Space Models | Biwei Huang, Kun Zhang, Mingming Gong, Clark Glymour | In this paper, we study causal discovery and forecasting for nonstationary time series. |
294 | Composing Entropic Policies using Divergence Correction | Jonathan Hunt, Andre Barreto, Timothy Lillicrap, Nicolas Heess | As part of this analysis, we extend an important generalization of policy improvement to the maximum entropy framework and introduce an algorithm for the practical implementation of successor features in continuous action spaces. |
295 | HexaGAN: Generative Adversarial Nets for Real World Classification | Uiwon Hwang, Dahuin Jung, Sungroh Yoon | In this paper, we propose HexaGAN, a generative adversarial network framework that shows promising classification performance for all three problems. |
296 | Overcoming Mean-Field Approximations in Recurrent Gaussian Process Models | Alessandro Davide Ialongo, Mark Van Der Wilk, James Hensman, Carl Edward Rasmussen | We identify a new variational inference scheme for dynamical systems whose transition function is modelled by a Gaussian process. |
297 | Learning Structured Decision Problems with Unawareness | Craig Innes, Alex Lascarides | In this paper, we learn Bayesian Decision Networks from both domain exploration and expert assertions in a way which guarantees convergence to optimal behaviour, even when the agent starts unaware of actions or belief variables that are critical to success. |
298 | Phase transition in PCA with missing data: Reduced signal-to-noise ratio, not sample size! | Niels Ipsen, Lars Kai Hansen | Here we generalize this analysis to include missing data. |
299 | Actor-Attention-Critic for Multi-Agent Reinforcement Learning | Shariq Iqbal, Fei Sha | We present an actor-critic algorithm that trains decentralized policies in multi-agent settings, using centrally computed critics that share an attention mechanism which selects relevant information for each agent at every timestep. |
300 | Complementary-Label Learning for Arbitrary Losses and Models | Takashi Ishida, Gang Niu, Aditya Menon, Masashi Sugiyama | The goal of this paper is to derive a novel framework of complementary-label learning with an unbiased estimator of the classification risk, for arbitrary losses and models—all existing methods have failed to achieve this goal. |
301 | Causal Identification under Markov Equivalence: Completeness Results | Amin Jaber, Jiji Zhang, Elias Bareinboim | In this paper, we relax this requirement and consider that the knowledge is articulated in the form of an equivalence class of causal diagrams, in particular, a partial ancestral graph (PAG). |
302 | Learning from a Learner | Alexis Jacq, Matthieu Geist, Ana Paiva, Olivier Pietquin | In this paper, we propose a novel setting for Inverse Reinforcement Learning (IRL), namely “Learning from a Learner” (LfL). |
303 | Differentially Private Fair Learning | Matthew Jagielski, Michael Kearns, Jieming Mao, Alina Oprea, Aaron Roth, Saeed Sharifi-Malvajerdi, Jonathan Ullman | Motivated by settings in which predictive models may be required to be non-discriminatory with respect to certain attributes (such as race), but even collecting the sensitive attribute may be forbidden or restricted, we initiate the study of fair learning under the constraint of differential privacy. |
304 | Sum-of-Squares Polynomial Flow | Priyank Jaini, Kira A. Selby, Yaoliang Yu | Based on triangular maps, we propose a general framework for high-dimensional density estimation, by specifying one-dimensional transformations (equivalently conditional densities) and appropriate conditioner networks. |
305 | DBSCAN++: Towards fast and scalable density clustering | Jennifer Jang, Heinrich Jiang | We propose DBSCAN++, a simple modification of DBSCAN which only requires computing the densities for a chosen subset of points. |
306 | Learning What and Where to Transfer | Yunhun Jang, Hankook Lee, Sung Ju Hwang, Jinwoo Shin | To address the issue, we propose a novel transfer learning approach based on meta-learning that can automatically learn what knowledge to transfer from the source network to where in the target network. |
307 | Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning | Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro Ortega, Dj Strouse, Joel Z Leibo, Nando De Freitas | We propose a unified mechanism for achieving coordination and communication in Multi-Agent Reinforcement Learning (MARL), through rewarding agents for having causal influence over other agents’ actions. |
308 | A Deep Reinforcement Learning Perspective on Internet Congestion Control | Nathan Jay, Noga Rotman, Brighten Godfrey, Michael Schapira, Aviv Tamar | We present and investigate a novel and timely application domain for deep reinforcement learning (RL): Internet congestion control. |
309 | Graph Neural Network for Music Score Data and Modeling Expressive Piano Performance | Dasaem Jeong, Taegyun Kwon, Yoojin Kim, Juhan Nam | In this paper, we represent the unique form of musical score using a graph neural network and apply it to rendering expressive piano performance from the music score. |
310 | Ladder Capsule Network | Taewon Jeong, Youngmin Lee, Heeyoung Kim | We propose a new architecture of the capsule network called the ladder capsule network, which has an alternative building block to the dynamic routing algorithm in the capsule network (Sabour et al., 2017). |
311 | Training CNNs with Selective Allocation of Channels | Jongheon Jeong, Jinwoo Shin | In this paper, we propose a simple way to improve the capacity of any CNN model having large-scale features, without adding more parameters. |
312 | Learning Discrete and Continuous Factors of Data via Alternating Disentanglement | Yeonwoo Jeong, Hyun Oh Song | We address the problem of unsupervised disentanglement of discrete and continuous explanatory factors of data. |
313 | Improved Zeroth-Order Variance Reduced Algorithms and Analysis for Nonconvex Optimization | Kaiyi Ji, Zhe Wang, Yi Zhou, Yingbin Liang | In this paper, we propose a new algorithm ZO-SVRG-Coord-Rand and develop a new analysis for an existing ZO-SVRG-Coord algorithm proposed in Liu et al. 2018b, and show that both ZO-SVRG-Coord-Rand and ZO-SVRG-Coord (under our new analysis) outperform other existing SVRG-type zeroth-order methods as well as ZO-GD and ZO-SGD. |
314 | Neural Logic Reinforcement Learning | Zhengyao Jiang, Shan Luo | To address these two challenges, we propose a novel algorithm named Neural Logic Reinforcement Learning (NLRL) to represent the policies in reinforcement learning by first-order logic. |
315 | Finding Options that Minimize Planning Time | Yuu Jinnai, David Abel, David Hershkowitz, Michael Littman, George Konidaris | We formalize the problem of selecting the optimal set of options for planning as that of computing the smallest set of options so that planning converges in less than a given maximum of value-iteration passes. |
316 | Discovering Options for Exploration by Minimizing Cover Time | Yuu Jinnai, Jee Won Park, David Abel, George Konidaris | We introduce a new option discovery algorithm that diminishes the expected cover time by connecting the most distant states in the state-space graph with options. |
317 | Kernel Mean Matching for Content Addressability of GANs | Wittawat Jitkrittum, Patsorn Sangkloy, Muhammad Waleed Gondal, Amit Raj, James Hays, Bernhard Schölkopf | We propose a novel procedure which adds “content-addressability” to any given unconditional implicit model, e.g., a generative adversarial network (GAN). |
318 | GOODE: A Gaussian Off-The-Shelf Ordinary Differential Equation Solver | David John, Vincent Heuveline, Michael Schober | Our method, based on iterated Gaussian process (GP) regression, returns a GP posterior over the solution of nonlinear ODEs, which provides a meaningful error estimate via its predictive posterior standard deviation. |
319 | Bilinear Bandits with Low-rank Structure | Kwang-Sung Jun, Rebecca Willett, Stephen Wright, Robert Nowak | We introduce the bilinear bandit problem with low-rank structure in which an action takes the form of a pair of arms from two different entity types, and the reward is a bilinear function of the known feature vectors of the arms. |
320 | Statistical Foundations of Virtual Democracy | Anson Kahng, Min Kyung Lee, Ritesh Noothigattu, Ariel Procaccia, Christos-Alexandros Psomas | One of the key questions is which aggregation method – or voting rule – to use; we offer a novel statistical viewpoint that provides guidance. |
321 | Molecular Hypergraph Grammar with Its Application to Molecular Optimization | Hiroshi Kajino | This paper presents a molecular hypergraph grammar variational autoencoder (MHG-VAE), which uses a single VAE to achieve 100% validity. |
322 | Robust Influence Maximization for Hyperparametric Models | Dimitris Kalimeris, Gal Kaplun, Yaron Singer | In this paper we study the problem of robust influence maximization in the independent cascade model under a hyperparametric assumption. |
323 | Classifying Treatment Responders Under Causal Effect Monotonicity | Nathan Kallus | In the context of individual-level causal inference, we study the problem of predicting whether someone will respond or not to a treatment based on their features and past examples of features, treatment indicator (e.g., drug/no drug), and a binary outcome (e.g., recovery from disease). |
324 | Trainable Decoding of Sets of Sequences for Neural Sequence Models | Ashwin Kalyan, Peter Anderson, Stefan Lee, Dhruv Batra | To address this, we propose $\nabla$BS, a trainable decoding procedure that outputs a set of sequences, highly valued according to the metric. |
325 | Myopic Posterior Sampling for Adaptive Goal Oriented Design of Experiments | Kirthevasan Kandasamy, Willie Neiswanger, Reed Zhang, Akshay Krishnamurthy, Jeff Schneider, Barnabas Poczos | In this work, we design a new myopic strategy for a wide class of adaptive design of experiment (DOE) problems, where we wish to collect data in order to fulfil a given goal. |
326 | Differentially Private Learning of Geometric Concepts | Haim Kaplan, Yishay Mansour, Yossi Matias, Uri Stemmer | We present differentially private efficient algorithms for learning union of polygons in the plane (which are not necessarily convex). |
327 | Policy Consolidation for Continual Reinforcement Learning | Christos Kaplanis, Murray Shanahan, Claudia Clopath | We propose a method for tackling catastrophic forgetting in deep reinforcement learning that is agnostic to the timescale of changes in the distribution of experiences, does not require knowledge of task boundaries and can adapt in continuously changing environments. |
328 | Error Feedback Fixes SignSGD and other Gradient Compression Schemes | Sai Praneeth Karimireddy, Quentin Rebjock, Sebastian Stich, Martin Jaggi | We show simple convex counter-examples where signSGD does not converge to the optimum. |
329 | Riemannian adaptive stochastic gradient algorithms on matrix manifolds | Hiroyuki Kasai, Pratik Jawanpuria, Bamdev Mishra | We propose novel stochastic gradient algorithms for problems on Riemannian matrix manifolds by adapting the row and column subspaces of gradients. |
330 | Neural Inverse Knitting: From Images to Manufacturing Instructions | Alexandre Kaspar, Tae-Hyun Oh, Liane Makatura, Petr Kellnhofer, Wojciech Matusik | Motivated by the recent potential of mass customization brought by whole-garment knitting machines, we introduce the new problem of automatic machine instruction generation using a single image of the desired physical product, which we apply to machine knitting. We create a curated dataset of real samples with their instruction counterpart and propose to use synthetic images to augment it in a novel way. |
331 | Processing Megapixel Images with Deep Attention-Sampling Models | Angelos Katharopoulos, Francois Fleuret | To tackle this limitation, we propose a fully differentiable end-to-end trainable model that samples and processes only a fraction of the full resolution input image. |
332 | Robust Estimation of Tree Structured Gaussian Graphical Models | Ashish Katiyar, Jessica Hoffmann, Constantine Caramanis | Robust Estimation of Tree Structured Gaussian Graphical Models. |
333 | Shallow-Deep Networks: Understanding and Mitigating Network Overthinking | Yigitcan Kaya, Sanghyun Hong, Tudor Dumitras | For prediction transparency, we propose the Shallow-Deep Network (SDN), a generic modification to off-the-shelf DNNs that introduces internal classifiers. |
334 | Submodular Streaming in All Its Glory: Tight Approximation, Minimum Memory and Low Adaptive Complexity | Ehsan Kazemi, Marko Mitrovic, Morteza Zadimoghaddam, Silvio Lattanzi, Amin Karbasi | In this paper, we study the problem of maximizing a monotone submodular function in the streaming setting with a cardinality constraint $k$. |
335 | Adaptive Scale-Invariant Online Algorithms for Learning Linear Models | Michal Kempka, Wojciech Kotlowski, Manfred K. Warmuth | In this paper, we resolve the tuning problem by proposing online algorithms making predictions which are invariant under arbitrary rescaling of the features. |
336 | CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network | Tom Kenter, Vincent Wan, Chun-An Chan, Rob Clark, Jakub Vit | We present a new, hierarchically structured conditional variational auto-encoder to generate prosodic features (fundamental frequency, energy and duration) suitable for use with a vocoder or a generative model like WaveNet. |
337 | Collaborative Evolutionary Reinforcement Learning | Shauharda Khadka, Somdeb Majumdar, Tarek Nassar, Zach Dwiel, Evren Tumer, Santiago Miret, Yinyin Liu, Kagan Tumer | In this paper, we introduce Collaborative Evolutionary Reinforcement Learning (CERL), a scalable framework that comprises a portfolio of policies that simultaneously explore and exploit diverse regions of the solution space. |
338 | Geometry Aware Convolutional Filters for Omnidirectional Images Representation | Renata Khasanova, Pascal Frossard | In this paper we aim at improving popular deep convolutional neural networks so that they can properly take into account the specific properties of omnidirectional data. |
339 | EMI: Exploration with Mutual Information | Hyoungseok Kim, Jaekyeom Kim, Yeonwoo Jeong, Sergey Levine, Hyun Oh Song | We propose EMI, which is an exploration method that constructs embedding representation of states and actions that does not rely on generative decoding of the full observation but extracts predictive signals that can be used to guide exploration based on forward prediction in the representation space. |
340 | FloWaveNet : A Generative Flow for Raw Audio | Sungwon Kim, Sang-Gil Lee, Jongyoon Song, Jaehyeon Kim, Sungroh Yoon | We propose FloWaveNet, a flow-based generative model for raw audio synthesis. |
341 | Curiosity-Bottleneck: Exploration By Distilling Task-Specific Novelty | Youngjin Kim, Wontae Nam, Hyunwoo Kim, Ji-Hoon Kim, Gunhee Kim | We introduce an information-theoretic exploration strategy named Curiosity-Bottleneck that distills task-relevant information from observation. |
342 | Contextual Multi-armed Bandit Algorithm for Semiparametric Reward Model | Gi-Soo Kim, Myunghee Cho Paik | This paper proposes a new contextual MAB algorithm for a relaxed, semiparametric reward model that supports nonstationarity. |
343 | Uniform Convergence Rate of the Kernel Density Estimator Adaptive to Intrinsic Volume Dimension | Jisu Kim, Jaehyeok Shin, Alessandro Rinaldo, Larry Wasserman | We derive concentration inequalities for the supremum norm of the difference between a kernel density estimator (KDE) and its point-wise expectation that hold uniformly over the selection of the bandwidth and under weaker conditions on the kernel and the data generating distribution than previously used in the literature. |
344 | Bit-Swap: Recursive Bits-Back Coding for Lossless Compression with Hierarchical Latent Variables | Friso Kingma, Pieter Abbeel, Jonathan Ho | In this paper we propose Bit-Swap, a new compression scheme that generalizes BB-ANS and achieves strictly better compression rates for hierarchical latent variable models with Markov chain structure. |
345 | CompILE: Compositional Imitation Learning and Execution | Thomas Kipf, Yujia Li, Hanjun Dai, Vinicius Zambaldi, Alvaro Sanchez-Gonzalez, Edward Grefenstette, Pushmeet Kohli, Peter Battaglia | We introduce Compositional Imitation Learning and Execution (CompILE): a framework for learning reusable, variable-length segments of hierarchically-structured behavior from demonstration data. |
346 | Adaptive and Safe Bayesian Optimization in High Dimensions via One-Dimensional Subspaces | Johannes Kirschner, Mojmir Mutny, Nicole Hiller, Rasmus Ischebeck, Andreas Krause | In order to scale the method and keep its benefits, we propose an algorithm (LineBO) that restricts the problem to a sequence of iteratively chosen one-dimensional sub-problems that can be solved efficiently. |
347 | AUCµ: A Performance Metric for Multi-Class Machine Learning Models | Ross Kleiman, David Page | We provide in this work a multi-class extension of AUC that we call AUCµ that is derived from first principles of the binary class AUC. |
348 | Fair k-Center Clustering for Data Summarization | Matthäus Kleindessner, Pranjal Awasthi, Jamie Morgenstern | In this paper, we resolve this gap by providing a simple approximation algorithm for the $k$-center problem under the fairness constraint with running time linear in the size of the data set and $k$. |
349 | Guarantees for Spectral Clustering with Fairness Constraints | Matthäus Kleindessner, Samira Samadi, Pranjal Awasthi, Jamie Morgenstern | Given the widespread popularity of spectral clustering (SC) for partitioning graph data, we study a version of constrained SC in which we try to incorporate the fairness notion proposed by Chierichetti et al. (2017). |
350 | POPQORN: Quantifying Robustness of Recurrent Neural Networks | Ching-Yun Ko, Zhaoyang Lyu, Lily Weng, Luca Daniel, Ngai Wong, Dahua Lin | In this work, we propose POPQORN (Propagated-output Quantified Robustness for RNNs), a general algorithm to quantify robustness of RNNs, including vanilla RNNs, LSTMs, and GRUs. |
351 | Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication | Anastasia Koloskova, Sebastian Stich, Martin Jaggi | We present a novel gossip algorithm, CHOCO-GOSSIP, for the average consensus problem that converges in time $O(1/(\rho^2\delta) \log(1/\epsilon))$ for accuracy $\epsilon > 0$. |
352 | Robust Learning from Untrusted Sources | Nikola Konstantinov, Christoph Lampert | In this work, we address the question of how to learn robustly in such scenarios. |
353 | Stochastic Beams and Where To Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement | Wouter Kool, Herke Van Hoof, Max Welling | We show how to implicitly apply this “Gumbel-Top-$k$” trick on a factorized distribution over sequences, allowing to draw exact samples without replacement using a Stochastic Beam Search. |
354 | LIT: Learned Intermediate Representation Training for Model Compression | Animesh Koratana, Daniel Kang, Peter Bailis, Matei Zaharia | In this work, we introduce Learned Intermediate representation Training (LIT), a novel model compression technique that outperforms a range of recent model compression techniques by leveraging the highly repetitive structure of modern DNNs (e.g., ResNet). |
355 | Similarity of Neural Network Representations Revisited | Simon Kornblith, Mohammad Norouzi, Honglak Lee, Geoffrey Hinton | We introduce a similarity index that measures the relationship between representational similarity matrices and does not suffer from this limitation. |
356 | On the Complexity of Approximating Wasserstein Barycenters | Alexey Kroshnin, Nazarii Tupitsa, Darina Dvinskikh, Pavel Dvurechensky, Alexander Gasnikov, Cesar Uribe | To overcome this issue, we propose a novel proximal-IBP algorithm, which can be seen as a proximal gradient method that uses IBP on each iteration to make a proximal step. |
357 | Estimate Sequences for Variance-Reduced Stochastic Composite Optimization | Andrei Kulunchakov, Julien Mairal | In this paper, we propose a unified view of gradient-based algorithms for stochastic convex composite optimization by extending the concept of estimate sequence introduced by Nesterov. |
358 | Faster Algorithms for Binary Matrix Factorization | Ravi Kumar, Rina Panigrahy, Ali Rahimi, David Woodruff | We give faster approximation algorithms for well-studied variants of Binary Matrix Factorization (BMF), where we are given a binary $m \times n$ matrix $A$ and would like to find binary rank-$k$ matrices $U, V$ to minimize the Frobenius norm of $U \cdot V - A$. |
359 | Loss Landscapes of Regularized Linear Autoencoders | Daniel Kunin, Jonathan Bloom, Aleksandrina Goeva, Cotton Seed | In this paper, we prove that $L_2$-regularized LAEs are symmetric at all critical points and learn the principal directions as the left singular vectors of the decoder. |
360 | Geometry and Symmetry in Short-and-Sparse Deconvolution | Han-Wen Kuo, Yenson Lau, Yuqian Zhang, John Wright | We propose a method based on nonconvex optimization, which under certain conditions recovers the target short and sparse signals, up to a signed shift symmetry which is intrinsic to this model. |
361 | A Large-Scale Study on Regularization and Normalization in GANs | Karol Kurach, Mario Lucic, Xiaohua Zhai, Marcin Michalski, Sylvain Gelly | In this work we take a sober view of the current state of GANs from a practical perspective. |
362 | Making Decisions that Reduce Discriminatory Impacts | Matt Kusner, Chris Russell, Joshua Loftus, Ricardo Silva | To address this, we describe causal methods that model the relevant parts of the real-world system in which the decisions are made. |
363 | Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits | Branislav Kveton, Csaba Szepesvari, Sharan Vaswani, Zheng Wen, Tor Lattimore, Mohammad Ghavamzadeh | We propose a bandit algorithm that explores by randomizing its history of rewards. |
364 | Characterizing Well-Behaved vs. Pathological Deep Neural Networks | Antoine Labatie | We introduce a novel approach, requiring only mild assumptions, for the characterization of deep neural networks at initialization. |
365 | State-Reification Networks: Improving Generalization by Modeling the Distribution of Hidden Representations | Alex Lamb, Jonathan Binas, Anirudh Goyal, Sandeep Subramanian, Ioannis Mitliagkas, Yoshua Bengio, Michael Mozer | We introduce a method, which we refer to as _state reification_, that involves modeling the distribution of hidden states over the training data and then projecting hidden states observed during testing toward this distribution. |
366 | A Recurrent Neural Cascade-based Model for Continuous-Time Diffusion | Sylvain Lamprier | In this paper we propose a model at the crossroads of these two extremes, which embeds the history of diffusion in infected nodes as hidden continuous states. |
367 | Projection onto Minkowski Sums with Application to Constrained Learning | Kenneth Lange, Joong-Ho Won, Jason Xu | We introduce block descent algorithms for projecting onto Minkowski sums of sets. |
368 | Safe Policy Improvement with Baseline Bootstrapping | Romain Laroche, Paul Trichelair, Remi Tachet Des Combes | This paper considers Safe Policy Improvement (SPI) in Batch Reinforcement Learning (Batch RL): from a fixed dataset and without direct access to the true environment, train a policy that is guaranteed to perform at least as well as the baseline policy used to collect the data. |
369 | A Better k-means++ Algorithm via Local Search | Silvio Lattanzi, Christian Sohler | In this paper, we develop a new variant of k-means++ seeding that in expectation achieves a constant approximation guarantee. |
370 | Lorentzian Distance Learning for Hyperbolic Representations | Marc Law, Renjie Liao, Jake Snell, Richard Zemel | We introduce an approach to learn representations based on the Lorentzian distance in hyperbolic geometry. |
371 | DP-GP-LVM: A Bayesian Non-Parametric Model for Learning Multivariate Dependency Structures | Andrew Lawrence, Carl Henrik Ek, Neill Campbell | We present a non-parametric Bayesian latent variable model capable of learning dependency structures across dimensions in a multivariate setting. |
372 | POLITEX: Regret Bounds for Policy Iteration using Expert Prediction | Nevena Lazic, Yasin Abbasi-Yadkori, Kush Bhatia, Gellert Weisz, Peter Bartlett, Csaba Szepesvari | We present POLITEX (POLicy ITeration with EXpert advice), a variant of policy iteration where each policy is a Boltzmann distribution over the sum of action-value function estimates of the previous policies, and analyze its regret in continuing RL problems. |
373 | Batch Policy Learning under Constraints | Hoang Le, Cameron Voloshin, Yisong Yue | As part of off-policy learning, we propose a simple method for off-policy policy evaluation (OPE) and derive PAC-style bounds. |
374 | Target-Based Temporal-Difference Learning | Donghwan Lee, Niao He | In this work, we introduce a new family of target-based temporal difference (TD) learning algorithms that maintain two separate learning parameters: the target variable and the online variable.
375 | Functional Transparency for Structured Data: a Game-Theoretic Approach | Guang-He Lee, Wengong Jin, David Alvarez-Melis, Tommi Jaakkola | We provide a new approach to training neural models to exhibit transparency in a well-defined, functional manner. |
376 | Self-Attention Graph Pooling | Junhyun Lee, Inyeop Lee, Jaewoo Kang | In this paper, we propose a graph pooling method based on self-attention. |
377 | Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks | Juho Lee, Yoonho Lee, Jungtaek Kim, Adam Kosiorek, Seungjin Choi, Yee Whye Teh | We present an attention-based neural network module, the Set Transformer, specifically designed to model interactions among elements in the input set. |
378 | First-Order Algorithms Converge Faster than $O(1/k)$ on Convex Problems | Ching-Pei Lee, Stephen Wright | In this work, we improve this rate to $o(1/k)$. |
379 | Robust Inference via Generative Classifiers for Handling Noisy Labels | Kimin Lee, Sukmin Yun, Kibok Lee, Honglak Lee, Bo Li, Jinwoo Shin | To mitigate the issue, we propose a novel inference method, termed Robust Generative classifier (RoG), applicable to any discriminative (e.g., softmax) neural classifier pre-trained on noisy datasets. |
380 | Sublinear Time Nearest Neighbor Search over Generalized Weighted Space | Yifan Lei, Qiang Huang, Mohan Kankanhalli, Anthony Tung | Based on the idea of Asymmetric Locality-Sensitive Hashing (ALSH), we introduce a novel spherical asymmetric transformation and propose the first two novel weight-oblivious hashing schemes SL-ALSH and S2-ALSH accordingly. |
381 | MONK – Outlier-Robust Mean Embedding Estimation by Median-of-Means | Matthieu Lerasle, Zoltan Szabo, Timothée Mathieu, Guillaume Lecue | In this paper, we show how the recently emerged principle of median-of-means can be used to design estimators for kernel mean embedding and MMD with excessive resistance properties to outliers, and optimal sub-Gaussian deviation bounds under mild assumptions.
382 | Cheap Orthogonal Constraints in Neural Networks: A Simple Parametrization of the Orthogonal and Unitary Group | Mario Lezcano-Casado, David Martínez-Rubio | We introduce a novel approach to perform first-order optimization with orthogonal and unitary constraints.
383 | Are Generative Classifiers More Robust to Adversarial Attacks? | Yingzhen Li, John Bradshaw, Yash Sharma | In this paper, we propose and investigate the deep Bayes classifier, which improves classical naive Bayes with conditional deep generative models. |
384 | Sublinear quantum algorithms for training linear and kernel-based classifiers | Tongyang Li, Shouvanik Chakrabarti, Xiaodi Wu | We investigate quantum algorithms for classification, a fundamental problem in machine learning, with provable guarantees. |
385 | LGM-Net: Learning to Generate Matching Networks for Few-Shot Learning | Huaiyu Li, Weiming Dong, Xing Mei, Chongyang Ma, Feiyue Huang, Bao-Gang Hu | In this work, we propose a novel meta-learning approach for few-shot classification, which learns transferable prior knowledge across tasks and directly produces network parameters for similar unseen tasks with training samples. |
386 | Graph Matching Networks for Learning the Similarity of Graph Structured Objects | Yujia Li, Chenjie Gu, Thomas Dullien, Oriol Vinyals, Pushmeet Kohli | This paper addresses the challenging problem of retrieval and matching of graph structured objects, and makes two key contributions. |
387 | Area Attention | Yang Li, Lukasz Kaiser, Samy Bengio, Si Si | We propose area attention: a way to attend to areas in the memory, where each area contains a group of items that are structurally adjacent, e.g., spatially for a 2D memory such as images, or temporally for a 1D memory such as natural language sentences. |
388 | Online Learning to Rank with Features | Shuai Li, Tor Lattimore, Csaba Szepesvari | We introduce a new model for online ranking in which the click probability factors into an examination and attractiveness function and the attractiveness function is a linear function of a feature vector and an unknown parameter. |
389 | NATTACK: Learning the Distributions of Adversarial Examples for an Improved Black-Box Attack on Deep Neural Networks | Yandong Li, Lijun Li, Liqiang Wang, Tong Zhang, Boqing Gong | In this paper, we propose a black-box adversarial attack algorithm that can defeat both vanilla DNNs and those generated by various defense techniques developed recently. |
390 | Bayesian Joint Spike-and-Slab Graphical Lasso | Zehang Li, Tyler Mccormick, Samuel Clark | In this article, we propose a new class of priors for Bayesian inference with multiple Gaussian graphical models. |
391 | Exploiting Worker Correlation for Label Aggregation in Crowdsourcing | Yuan Li, Benjamin Rubinstein, Trevor Cohn | In this paper, we argue that existing crowdsourcing approaches do not sufficiently model worker correlations observed in practical settings; we propose in response an enhanced Bayesian classifier combination (EBCC) model, with inference based on a mean-field variational approach. |
392 | Adversarial camera stickers: A physical camera-based attack on deep learning systems | Juncheng Li, Frank Schmidt, Zico Kolter | In this work, we consider an alternative question: is it possible to fool deep classifiers, over all perceived objects of a certain type, by physically manipulating the camera itself? |
393 | Towards a Unified Analysis of Random Fourier Features | Zhu Li, Jean-Francois Ton, Dino Oglic, Dino Sejdinovic | We study both the standard random Fourier features method for which we improve the existing bounds on the number of features required to guarantee the corresponding minimax risk convergence rate of kernel ridge regression, as well as a data-dependent modification which samples features proportional to ridge leverage scores and further reduces the required number of features. |
394 | Feature-Critic Networks for Heterogeneous Domain Generalization | Yiying Li, Yongxin Yang, Wei Zhou, Timothy Hospedales | In this work, we propose a learning to learn approach, where the auxiliary loss that helps generalisation is itself learned. |
395 | Learn to Grow: A Continual Structure Learning Framework for Overcoming Catastrophic Forgetting | Xilai Li, Yingbo Zhou, Tianfu Wu, Richard Socher, Caiming Xiong | This paper presents a conceptually simple yet general and effective framework for handling catastrophic forgetting in continual learning with DNNs. |
396 | Alternating Minimizations Converge to Second-Order Optimal Solutions | Qiuwei Li, Zhihui Zhu, Gongguo Tang | We show that under mild assumptions on the (nonconvex) objective function, both algorithms avoid strict saddles almost surely from random initialization. |
397 | Cautious Regret Minimization: Online Optimization with Long-Term Budget Constraints | Nikolaos Liakopoulos, Apostolos Destounis, Georgios Paschos, Thrasyvoulos Spyropoulos, Panayotis Mertikopoulos | We study a class of online convex optimization problems with long-term budget constraints that arise naturally as reliability guarantees or total consumption constraints. |
398 | Regularization in directable environments with application to Tetris | Jan Lichtenberg, Özgür Şimşek | We present a regularized linear model called STEW that benefits from a generic and prevalent form of prior knowledge: feature directions.
399 | Inference and Sampling of $K_{33}$-free Ising Models | Valerii Likhosherstov, Yury Maximov, Misha Chertkov | Inference and Sampling of $K_{33}$-free Ising Models.
400 | Kernel-Based Reinforcement Learning in Robust Markov Decision Processes | Shiau Hong Lim, Arnaud Autef | We extend these results to the much larger class of kernel-based approximators and show, both analytically and empirically that the robust policies can significantly outperform the non-robust counterpart. |
401 | On Efficient Optimal Transport: An Analysis of Greedy and Accelerated Mirror Descent Algorithms | Tianyi Lin, Nhat Ho, Michael Jordan | We provide theoretical analyses for two algorithms that solve the regularized optimal transport (OT) problem between two discrete probability measures with at most $n$ atoms. |
402 | Fast and Simple Natural-Gradient Variational Inference with Mixture of Exponential-family Approximations | Wu Lin, Mohammad Emtiyaz Khan, Mark Schmidt | In this paper, we extend their application to estimate structured approximations such as mixtures of EF distributions. |
403 | Acceleration of SVRG and Katyusha X by Inexact Preconditioning | Yanli Liu, Fei Feng, Wotao Yin | In this paper, we propose to accelerate these two algorithms by inexact preconditioning. The proposed methods employ fixed preconditioners; although the subproblem in each epoch becomes harder, it suffices to apply a fixed number of simple subroutines to solve it inexactly, without losing the overall convergence.
404 | Transferable Adversarial Training: A General Approach to Adapting Deep Classifiers | Hong Liu, Mingsheng Long, Jianmin Wang, Michael Jordan | To this end, we propose Transferable Adversarial Training (TAT) to enable the adaptation of deep classifiers. |
405 | Rao-Blackwellized Stochastic Gradients for Discrete Distributions | Runjing Liu, Jeffrey Regier, Nilesh Tripuraneni, Michael Jordan, Jon Mcauliffe | In this paper, we describe a technique that can be applied to reduce the variance of any such estimator, without changing its bias; in particular, unbiasedness is retained.
406 | Sparse Extreme Multi-label Learning with Oracle Property | Weiwei Liu, Xiaobo Shen | To fill this gap, we present a unified framework for SLEEC with nonconvex penalty. |
407 | Data Poisoning Attacks on Stochastic Bandits | Fang Liu, Ness Shroff | In this paper, we propose a framework of offline attacks on bandit algorithms and study convex optimization based attacks on several popular bandit algorithms. |
408 | The Implicit Fairness Criterion of Unconstrained Learning | Lydia T. Liu, Max Simchowitz, Moritz Hardt | We clarify what fairness guarantees we can and cannot expect to follow from unconstrained machine learning. |
409 | Taming MAML: Efficient unbiased meta-reinforcement learning | Hao Liu, Richard Socher, Caiming Xiong | We propose a surrogate objective function, named Taming MAML (TMAML), that adds control variates into gradient estimation via automatic differentiation.
410 | On Certifying Non-Uniform Bounds against Adversarial Attacks | Chen Liu, Ryota Tomioka, Volkan Cevher | We formulate our target as an optimization problem with nonlinear constraints. |
411 | Understanding and Accelerating Particle-Based Variational Inference | Chang Liu, Jingwei Zhuo, Pengyu Cheng, Ruiyi Zhang, Jun Zhu | We propose an acceleration framework and a principled bandwidth-selection method for general ParVIs; these are based on the developed theory and leverage the geometry of the Wasserstein space. |
412 | Understanding MCMC Dynamics as Flows on the Wasserstein Space | Chang Liu, Jingwei Zhuo, Jun Zhu | In this work, by developing novel concepts, we propose a theoretical framework that recognizes a general MCMC dynamics as the fiber-gradient Hamiltonian flow on the Wasserstein space of a fiber-Riemannian Poisson manifold. |
413 | Sliced-Wasserstein Flows: Nonparametric Generative Modeling via Optimal Transport and Diffusions | Antoine Liutkus, Umut Simsekli, Szymon Majewski, Alain Durmus, Fabian-Robert Stöter | By building upon the recent theory that established the connection between implicit generative modeling (IGM) and optimal transport, in this study, we propose a novel parameter-free algorithm for learning the underlying distributions of complicated datasets and sampling from them.
414 | Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations | Francesco Locatello, Stefan Bauer, Mario Lucic, Gunnar Raetsch, Sylvain Gelly, Bernhard Schölkopf, Olivier Bachem | In this paper, we provide a sober look at recent progress in the field and challenge some common assumptions.
415 | Bayesian Counterfactual Risk Minimization | Ben London, Ted Sandler | We present a Bayesian view of counterfactual risk minimization (CRM) for offline learning from logged bandit feedback. |
416 | PA-GD: On the Convergence of Perturbed Alternating Gradient Descent to Second-Order Stationary Points for Structured Nonconvex Optimization | Songtao Lu, Mingyi Hong, Zhengdao Wang | In this paper, we consider a smooth unconstrained nonconvex optimization problem, and propose a perturbed A-GD (PA-GD) which is able to converge (with high probability) to the second-order stationary points (SOSPs) with a global sublinear rate. |
417 | Neurally-Guided Structure Inference | Sidi Lu, Jiayuan Mao, Joshua Tenenbaum, Jiajun Wu | In this paper, we propose a hybrid inference algorithm, the Neurally-Guided Structure Inference (NG-SI), keeping the advantages of both search-based and data-driven methods. |
418 | Optimal Algorithms for Lipschitz Bandits with Heavy-tailed Rewards | Shiyin Lu, Guanghui Wang, Yao Hu, Lijun Zhang | To address this limitation, in this paper we relax the assumption on rewards to allow arbitrary distributions that have finite $(1+\epsilon)$-th moments for some $\epsilon \in (0, 1]$, and propose algorithms that enjoy a sublinear regret of $\widetilde{O}(T^{(d_z\epsilon + 1)/(d_z \epsilon + \epsilon + 1)})$ where $T$ is the time horizon and $d_z$ is the zooming dimension. |
419 | CoT: Cooperative Training for Generative Modeling of Discrete Data | Sidi Lu, Lantao Yu, Siyuan Feng, Yaoming Zhu, Weinan Zhang | In this paper, we study the generative models of sequential discrete data. |
420 | Generalized Approximate Survey Propagation for High-Dimensional Estimation | Carlo Lucibello, Luca Saglietti, Yue Lu | In this paper, we propose a new algorithm, named Generalized Approximate Survey Propagation (GASP), for solving GLE in the presence of prior or model misspecifications. Furthermore, we present a set of state evolution equations that can precisely characterize the performance of GASP in the high-dimensional limit. |
421 | High-Fidelity Image Generation With Fewer Labels | Mario Lucic, Michael Tschannen, Marvin Ritter, Xiaohua Zhai, Olivier Bachem, Sylvain Gelly | In this work we demonstrate how one can benefit from recent work on self- and semi-supervised learning to outperform the state of the art on both unsupervised ImageNet synthesis, as well as in the conditional setting. |
422 | Leveraging Low-Rank Relations Between Surrogate Tasks in Structured Prediction | Giulia Luise, Dimitrios Stamos, Massimiliano Pontil, Carlo Ciliberto | We propose an efficient algorithm based on trace norm regularization which, differently from previous methods, does not require explicit knowledge of the coding/decoding functions of the surrogate framework. |
423 | Differentiable Dynamic Normalization for Learning Deep Representation | Ping Luo, Peng Zhanglin, Shao Wenqi, Zhang Ruimao, Ren Jiamin, Wu Lingyun | This work presents Dynamic Normalization (DN), which is able to learn arbitrary normalization operations for different convolutional layers in a deep ConvNet. |
424 | Disentangled Graph Convolutional Networks | Jianxin Ma, Peng Cui, Kun Kuang, Xin Wang, Wenwu Zhu | In this paper, we introduce the disentangled graph convolutional network (DisenGCN) to learn disentangled node representations. |
425 | Variational Implicit Processes | Chao Ma, Yingzhen Li, Jose Miguel Hernandez-Lobato | We introduce implicit processes (IPs), stochastic processes that place implicitly defined multivariate distributions over any finite collection of random variables.
426 | EDDI: Efficient Dynamic Discovery of High-Value Information with Partial VAE | Chao Ma, Sebastian Tschiatschek, Konstantina Palla, Jose Miguel Hernandez-Lobato, Sebastian Nowozin, Cheng Zhang | To this end, we propose a principled framework, named EDDI (Efficient Dynamic Discovery of high-value Information), based on the theory of Bayesian experimental design. |
427 | Bayesian leave-one-out cross-validation for large data | Måns Magnusson, Michael Andersen, Johan Jonasson, Aki Vehtari | We propose a combination of using approximate inference techniques and probability-proportional-to-size sampling (PPS) for fast LOO model evaluation for large datasets.
428 | Composable Core-sets for Determinant Maximization: A Simple Near-Optimal Algorithm | Sepideh Mahabadi, Piotr Indyk, Shayan Oveis Gharan, Alireza Rezaei | In this work, we consider efficient construction of composable core-sets for the determinant maximization problem. |
429 | Guided evolutionary strategies: augmenting random search with surrogate gradients | Niru Maheswaranathan, Luke Metz, George Tucker, Dami Choi, Jascha Sohl-Dickstein | We propose Guided Evolutionary Strategies (GES), a method for optimally using surrogate gradient directions to accelerate random search. |
430 | Data Poisoning Attacks in Multi-Party Learning | Saeed Mahloujifar, Mohammad Mahmoody, Ameer Mohammed | In this work, we demonstrate universal multi-party poisoning attacks that adapt and apply to any multi-party learning process with arbitrary interaction pattern between the parties. |
431 | Traditional and Heavy Tailed Self Regularization in Neural Network Models | Michael Mahoney, Charles Martin | Building on recent results in RMT, most notably its extension to Universality classes of Heavy-Tailed matrices, we develop a theory to identify 5+1 Phases of Training, corresponding to increasing amounts of Implicit Self-Regularization. |
432 | Curvature-Exploiting Acceleration of Elastic Net Computations | Vien Mai, Mikael Johansson | This paper introduces an efficient second-order method for solving the elastic net problem. |
433 | Breaking the gridlock in Mixture-of-Experts: Consistent and Efficient Algorithms | Ashok Makkuva, Pramod Viswanath, Sreeram Kannan, Sewoong Oh | In this paper, we introduce the first algorithm that learns the true parameters of a MoE model for a wide class of non-linearities with global consistency guarantees. |
434 | Calibrated Model-Based Deep Reinforcement Learning | Ali Malik, Volodymyr Kuleshov, Jiaming Song, Danny Nemer, Harlan Seymour, Stefano Ermon | We describe a simple way to augment any model-based reinforcement learning agent with a calibrated model and show that doing so consistently improves planning, sample complexity, and exploration. |
435 | Learning from Delayed Outcomes via Proxies with Applications to Recommender Systems | Timothy Arthur Mann, Sven Gowal, Andras Gyorgy, Huiyi Hu, Ray Jiang, Balaji Lakshminarayanan, Prav Srinivasan | Motivated by our regret analysis, we propose two neural network architectures: Factored Forecaster (FF) which is ideal if the proxy is informative of the outcome in hindsight, and Residual Factored Forecaster (RFF) that is robust to a non-informative proxy. |
436 | Passed & Spurious: Descent Algorithms and Local Minima in Spiked Matrix-Tensor Models | Stefano Sarao Mannelli, Florent Krzakala, Pierfrancesco Urbani, Lenka Zdeborova | In this work we analyse quantitatively the interplay between the loss landscape and performance of descent algorithms in a prototypical inference problem, the spiked matrix-tensor model. |
437 | A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs | Jingkai Mao, Jakob Foerster, Tim Rocktäschel, Maruan Al-Shedivat, Gregory Farquhar, Shimon Whiteson | To improve the sample efficiency of DiCE, we propose a new baseline term for higher order gradient estimation.
438 | Adversarial Generation of Time-Frequency Features with application in audio synthesis | Andrés Marafioti, Nathanaël Perraudin, Nicki Holighaus, Piotr Majdak | In this article, focusing on the short-time Fourier transform, we discuss the challenges that arise in audio synthesis based on generated invertible TF features and how to overcome them.
439 | On the Universality of Invariant Networks | Haggai Maron, Ethan Fetaya, Nimrod Segol, Yaron Lipman | In this paper, we consider a fundamental question that has received very little attention to date: Can these networks approximate any (continuous) invariant function? |
440 | Decomposing feature-level variation with Covariate Gaussian Process Latent Variable Models | Kaspar Martens, Kieran Campbell, Christopher Yau | In this paper, we propose to achieve this through a structured kernel decomposition in a hybrid Gaussian Process model which we call the Covariate Gaussian Process Latent Variable Model (c-GPLVM). |
441 | Fairness-Aware Learning for Continuous Attributes and Treatments | Jeremie Mary, Clément Calauzènes, Noureddine El Karoui | As common fairness metrics can be expressed as measures of (conditional) independence between variables, we propose to use the Rényi maximum correlation coefficient to generalize fairness measurement to continuous variables.
442 | Optimal Minimal Margin Maximization with Boosting | Alexander Mathiasen, Kasper Green Larsen, Allan Grønlund | Our main contribution is a new algorithm refuting this conjecture.
443 | Disentangling Disentanglement in Variational Autoencoders | Emile Mathieu, Tom Rainforth, N Siddharth, Yee Whye Teh | We develop a generalisation of disentanglement in variational autoencoders (VAEs)—decomposition of the latent representation—characterising it as the fulfilment of two factors: a) the latent encodings of the data having an appropriate level of overlap, and b) the aggregate encoding of the data conforming to a desired structure, represented through the prior. |
444 | MIWAE: Deep Generative Modelling and Imputation of Incomplete Data Sets | Pierre-Alexandre Mattei, Jes Frellsen | We consider the problem of handling missing data with deep latent variable models (DLVMs). |
445 | Distributional Reinforcement Learning for Efficient Exploration | Borislav Mavrin, Hengshuai Yao, Linglong Kong, Kaiwen Wu, Yaoliang Yu | We propose a novel and efficient exploration method for deep RL that has two components. |
446 | Graphical-model based estimation and inference for differential privacy | Ryan Mckenna, Daniel Sheldon, Gerome Miklau | In this work, we provide an approach to solve this estimation problem efficiently using graphical models, which is particularly effective when the distribution is high-dimensional but the measurements are over low-dimensional marginals. |
447 | Efficient Amortised Bayesian Inference for Hierarchical and Nonlinear Dynamical Systems | Ted Meeds, Geoffrey Roeder, Paul Grant, Andrew Phillips, Neil Dalchau | We introduce a flexible, scalable Bayesian inference framework for nonlinear dynamical systems characterised by distinct and hierarchical variability at the individual, group, and population levels. |
448 | Toward Controlling Discrimination in Online Ad Auctions | Anay Mehrotra, Elisa Celis, Nisheeth Vishnoi | To prevent this, we propose a constrained ad auction framework that maximizes the platform’s revenue conditioned on ensuring that the audience seeing an advertiser’s ad is distributed appropriately across sensitive types such as gender or race. |
449 | Stochastic Blockmodels meet Graph Neural Networks | Nikhil Mehta, Lawrence Carin, Piyush Rai | In this work, we unify these two directions by developing a sparse variational autoencoder for graphs that retains the interpretability of SBMs, while also enjoying the excellent predictive performance of graph neural nets.
450 | Imputing Missing Events in Continuous-Time Event Streams | Hongyuan Mei, Guanghui Qin, Jason Eisner | Given a probability model of complete sequences, we propose particle smoothing—a form of sequential importance sampling—to impute the missing events in an incomplete sequence. |
451 | Same, Same But Different: Recovering Neural Network Quantization Error Through Weight Factorization | Eldad Meller, Alexander Finkelstein, Uri Almog, Mark Grobman | In this paper, we exploit an oft-overlooked degree of freedom in most networks – for a given layer, individual output channels can be scaled by any factor provided that the corresponding weights of the next layer are inversely scaled. |
452 | The Wasserstein Transform | Facundo Memoli, Zane Smith, Zhengchao Wan | We introduce the Wasserstein transform, a method for enhancing and denoising datasets defined on general metric spaces. |
453 | Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks | Charith Mendis, Alex Renda, Saman Amarasinghe, Michael Carbin | In this paper we present Ithemal, the first tool which learns to predict the throughput of a set of instructions.
454 | Geometric Losses for Distributional Learning | Arthur Mensch, Mathieu Blondel, Gabriel Peyré | Building upon recent advances in entropy-regularized optimal transport, and upon Fenchel duality between measures and continuous functions, we propose a generalization of the logistic loss that incorporates a metric or cost between classes.
455 | Spectral Clustering of Signed Graphs via Matrix Power Means | Pedro Mercado, Francesco Tudisco, Matthias Hein | We provide a thorough analysis of the proposed approach in the setting of a general Stochastic Block Model that includes models such as the Labeled Stochastic Block Model and the Censored Block Model. |
456 | Simple Stochastic Gradient Methods for Non-Smooth Non-Convex Regularized Optimization | Michael Metel, Akiko Takeda | We present two simple stochastic gradient algorithms, for finite-sum and general stochastic optimization problems, which have superior convergence complexities compared to the current state-of-the-art. |
457 | Reinforcement Learning in Configurable Continuous Environments | Alberto Maria Metelli, Emanuele Ghelfi, Marcello Restelli | In this paper, we fill this gap by proposing a trust-region method, Relative Entropy Model Policy Search (REMPS), able to learn both the policy and the MDP configuration in continuous domains without requiring the knowledge of the true model of the environment. |
458 | Understanding and correcting pathologies in the training of learned optimizers | Luke Metz, Niru Maheswaranathan, Jeremy Nixon, Daniel Freeman, Jascha Sohl-Dickstein | In this work we propose a training scheme which overcomes both of these difficulties, by dynamically weighting two unbiased gradient estimators for a variational loss on optimizer performance. |
459 | Optimality Implies Kernel Sum Classifiers are Statistically Efficient | Raphael Meyer, Jean Honorio | We propose a novel combination of optimization tools with learning theory bounds in order to analyze the sample complexity of optimal kernel sum classifiers. |
460 | On Dropout and Nuclear Norm Regularization | Poorya Mianjy, Raman Arora | We give a formal and complete characterization of the explicit regularizer induced by dropout in deep linear networks with squared loss. |
461 | Discriminative Regularization for Latent Variable Models with Applications to Electrocardiography | Andrew Miller, Ziad Obermeyer, John Cunningham, Sendhil Mullainathan | We propose a generative model training objective that uses a black-box discriminative model as a regularizer to learn representations that preserve this predictive variation. |
462 | Formal Privacy for Functional Data with Gaussian Perturbations | Ardalan Mirshani, Matthew Reimherr, Aleksandra Slavkovic | Motivated by the rapid rise in statistical tools in Functional Data Analysis, we consider the Gaussian mechanism for achieving differential privacy (DP) with parameter estimates taking values in a potentially infinite-dimensional, separable Banach space.
463 | Co-manifold learning with missing data | Gal Mishne, Eric Chi, Ronald Coifman | We propose utilizing this coupled structure to perform co-manifold learning: uncovering the underlying geometry of both the rows and the columns of a given matrix, where we focus on a missing data setting. |
464 | Agnostic Federated Learning | Mehryar Mohri, Gary Sivek, Ananda Theertha Suresh | Instead, we propose a new framework of agnostic federated learning, where the centralized model is optimized for any target distribution formed by a mixture of the client distributions. |
465 | Flat Metric Minimization with Applications in Generative Modeling | Thomas Möllenhoff, Daniel Cremers | In our theoretical contribution we prove that the flat metric between a parametrized current and a reference current is Lipschitz continuous in the parameters.
466 | Parsimonious Black-Box Adversarial Attacks via Efficient Combinatorial Optimization | Seungyong Moon, Gaon An, Hyun Oh Song | We propose an efficient discrete surrogate to the optimization problem which does not require estimating the gradient and consequently becomes free of the first order update hyperparameters to tune. |
467 | Parameter efficient training of deep convolutional neural networks by dynamic sparse reparameterization | Hesham Mostafa, Xin Wang | Here we present a novel dynamic sparse reparameterization method that addresses the limitations of previous techniques such as high computational cost and the need for manual configuration of the number of free parameters allocated to each layer. |
468 | A Dynamical Systems Perspective on Nesterov Acceleration | Michael Muehlebach, Michael Jordan | We present a dynamical system framework for understanding Nesterov’s accelerated gradient method. |
469 | Relational Pooling for Graph Representations | Ryan Murphy, Balasubramaniam Srinivasan, Vinayak Rao, Bruno Ribeiro | This work generalizes graph neural networks (GNNs) beyond those based on the Weisfeiler-Lehman (WL) algorithm, graph Laplacians, and diffusions. |
470 | Learning Optimal Fair Policies | Razieh Nabi, Daniel Malinsky, Ilya Shpitser | In this paper, we consider how to make optimal but fair decisions, which “break the cycle of injustice” by correcting for the unfair dependence of both decisions and outcomes on sensitive features (e.g., variables that correspond to gender, race, disability, or other protected attributes). |
471 | Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models | Mor Shpigel Nacson, Suriya Gunasekar, Jason Lee, Nathan Srebro, Daniel Soudry | For non-homogeneous ensemble models, whose output is a sum of homogeneous sub-models, we show that this solution discards the shallowest sub-models if they are unnecessary.
472 | A Wrapped Normal Distribution on Hyperbolic Space for Gradient-Based Learning | Yoshihiro Nagano, Shoichiro Yamaguchi, Yasuhiro Fujita, Masanori Koyama | In this paper, we present a novel hyperbolic distribution called hyperbolic wrapped distribution, a wrapped normal distribution on hyperbolic space whose density can be evaluated analytically and differentiated with respect to the parameters. |
473 | SGD without Replacement: Sharper Rates for General Smooth Convex Functions | Dheeraj Nagaraj, Prateek Jain, Praneeth Netrapalli | We study stochastic gradient descent without replacement (SGDo) for smooth convex functions. |
474 | Dropout as a Structured Shrinkage Prior | Eric Nalisnick, Jose Miguel Hernandez-Lobato, Padhraic Smyth | We propose a novel framework for understanding multiplicative noise in neural networks, considering continuous distributions as well as Bernoulli noise (i.e. dropout). |
475 | Hybrid Models with Deep and Invertible Features | Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Dilan Gorur, Balaji Lakshminarayanan | We propose a neural hybrid model consisting of a linear model defined on a set of features computed by a deep, invertible transformation (i.e. a normalizing flow). |
476 | Learning Context-dependent Label Permutations for Multi-label Classification | Jinseok Nam, Young-Bum Kim, Eneldo Loza Mencia, Sunghyun Park, Ruhi Sarikaya, Johannes Fürnkranz | In this work, we propose a multi-label classification approach which allows choosing a dynamic, context-dependent label ordering.
477 | Zero-Shot Knowledge Distillation in Deep Networks | Gaurav Kumar Nayak, Konda Reddy Mopuri, Vaisakh Shaj, Venkatesh Babu Radhakrishnan, Anirban Chakraborty | Hence, in this paper, we propose a novel data-free method to train the Student from the Teacher. |
478 | A Framework for Bayesian Optimization in Embedded Subspaces | Amin Nayebi, Alexander Munteanu, Matthias Poloczek | We present a theoretically founded approach for high-dimensional Bayesian optimization based on low-dimensional subspace embeddings. |
479 | Phaseless PCA: Low-Rank Matrix Recovery from Column-wise Phaseless Measurements | Seyedehsara Nayer, Praneeth Narayanamurthy, Namrata Vaswani | This work proposes the first set of simple, practically useful, and provable algorithms for two inter-related problems; in particular, we introduce a simple algorithm that is provably correct as long as the subspace changes are piecewise constant.
480 | Safe Grid Search with Optimal Complexity | Eugene Ndiaye, Tam Le, Olivier Fercoq, Joseph Salmon, Ichiro Takeuchi | In this paper, we revisit the techniques of approximating the regularization path up to predefined tolerance $\epsilon$ in a unified framework and show that its complexity is $O(1/\sqrt[d]{\epsilon})$ for uniformly convex loss of order $d \geq 2$ and $O(1/\sqrt{\epsilon})$ for Generalized Self-Concordant functions. |
481 | Learning to bid in revenue-maximizing auctions | Thomas Nedelec, Noureddine El Karoui, Vianney Perchet | Using a variational approach, we study the complexity of the original objective and we introduce a relaxation of the objective functional in order to use gradient descent methods. |
482 | On Connected Sublevel Sets in Deep Learning | Quynh Nguyen | This paper shows that every sublevel set of the loss function of a class of deep over-parameterized neural nets with piecewise linear activation functions is connected and unbounded. |
483 | Anomaly Detection With Multiple-Hypotheses Predictions | Duc Tam Nguyen, Zhongyu Lou, Michael Klar, Thomas Brox | We propose to learn the data distribution of the foreground more efficiently with a multi-hypotheses autoencoder. |
484 | Non-Asymptotic Analysis of Fractional Langevin Monte Carlo for Non-Convex Optimization | Thanh Huy Nguyen, Umut Simsekli, Gael Richard | In this study, we analyze the non-asymptotic behavior of FLMC for non-convex optimization and prove finite-time bounds for its expected suboptimality.
485 | Rotation Invariant Householder Parameterization for Bayesian PCA | Rajbir Nirwan, Nils Bertschinger | Here, we propose a parameterization based on Householder transformations, which remove the rotational symmetry of the posterior. |
486 | Lossless or Quantized Boosting with Integer Arithmetic | Richard Nock, Robert Williamson | We build a learning algorithm which is able, under mild assumptions, to achieve a lossless boosting-compliant training. |
487 | Training Neural Networks with Local Error Signals | Arild Nøkland, Lars Hiller Eidnes | In this paper we demonstrate, for the first time, that layer-wise training can approach the state-of-the-art on a variety of image datasets.
488 | Remember and Forget for Experience Replay | Guido Novati, Petros Koumoutsakos | We introduce Remember and Forget Experience Replay (ReF-ER), a novel method that can enhance RL algorithms with parameterized policies. |
489 | Learning to Infer Program Sketches | Maxwell Nye, Luke Hewitt, Joshua Tenenbaum, Armando Solar-Lezama | The key idea of this work is that a flexible combination of pattern recognition and explicit reasoning can be used to solve these complex programming problems. |
490 | Tensor Variable Elimination for Plated Factor Graphs | Fritz Obermeyer, Eli Bingham, Martin Jankowiak, Neeraj Pradhan, Justin Chiu, Alexander Rush, Noah Goodman | To exploit efficient tensor algebra in graphs with plates of variables, we generalize undirected factor graphs to plated factor graphs and variable elimination to a tensor variable elimination algorithm that operates directly on plated factor graphs. |
491 | Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models | Michael Oberst, David Sontag | In particular, we introduce a class of structural causal models (SCMs) for generating counterfactual trajectories in finite partially observable Markov Decision Processes (POMDPs). |
492 | Model Function Based Conditional Gradient Method with Armijo-like Line Search | Peter Ochs, Yura Malitsky | As special cases, for example, we develop an algorithm for additive composite problems and an algorithm for non-linear composite problems which leads to a Gauss-Newton-type algorithm. |
493 | TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing | Augustus Odena, Catherine Olsson, David Andersen, Ian Goodfellow | We introduce testing techniques for neural networks that can discover errors occurring only for rare inputs. |
494 | Scalable Learning in Reproducing Kernel Krein Spaces | Dino Oglic, Thomas Gärtner | We provide the first mathematically complete derivation of the Nyström method for low-rank approximation of indefinite kernels and propose an efficient method for finding an approximate eigendecomposition of such kernel matrices.
495 | Approximation and non-parametric estimation of ResNet-type convolutional neural networks | Kenta Oono, Taiji Suzuki | We show a ResNet-type CNN can attain the minimax optimal error rates in these classes in more plausible situations – it can be dense, and its width, channel size, and filter size are constant with respect to sample size. |
496 | Orthogonal Random Forest for Causal Inference | Miruna Oprescu, Vasilis Syrgkanis, Zhiwei Steven Wu | We propose the orthogonal random forest, an algorithm that combines Neyman-orthogonality to reduce sensitivity with respect to estimation error of nuisance parameters with generalized random forests (Athey et al., 2017)—a flexible non-parametric method for statistical estimation of conditional moment models using random forests. |
497 | Inferring Heterogeneous Causal Effects in Presence of Spatial Confounding | Muhammad Osama, Dave Zachariah, Thomas Schön | We address the problem of inferring the causal effect of an exposure on an outcome across space, using observational data.
498 | Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path? | Samet Oymak, Mahdi Soltanolkotabi | In this paper we demonstrate that when the loss has certain properties over a minimally small neighborhood of the initial point, first order methods such as (stochastic) gradient descent have a few intriguing properties: (1) the iterates converge at a geometric rate to a global optima even when the loss is nonconvex, (2) among all global optima of the loss the iterates converge to one with a near minimal distance to the initial point, (3) the iterates take a near direct route from the initial point to this global optimum. |
499 | Multiplicative Weights Updates as a distributed constrained optimization algorithm: Convergence to second-order stationary points almost always | Ioannis Panageas, Georgios Piliouras, Xiao Wang | In this paper we focus on constrained non-concave maximization. |
500 | Improving Adversarial Robustness via Promoting Ensemble Diversity | Tianyu Pang, Kun Xu, Chao Du, Ning Chen, Jun Zhu | This paper presents a new method that explores the interaction among individual networks to improve robustness for ensemble models. |
501 | Nonparametric Bayesian Deep Networks with Local Competition | Konstantinos Panousis, Sotirios Chatzis, Sergios Theodoridis | The aim of this work is to enable inference of deep networks that retain high accuracy for the least possible model complexity, with the latter deduced from the data during inference. |
502 | Optimistic Policy Optimization via Multiple Importance Sampling | Matteo Papini, Alberto Maria Metelli, Lorenzo Lupo, Marcello Restelli | In this paper, we address the exploration-exploitation trade-off in PS by proposing an approach based on Optimism in the Face of Uncertainty. |
503 | Deep Residual Output Layers for Neural Language Generation | Nikolaos Pappas, James Henderson | In this paper, we investigate the usefulness of more powerful shared mappings for output labels, and propose a deep residual output mapping with dropout between layers to better capture the structure of the output space and avoid overfitting. |
504 | Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians | Vardan Papyan | We show this term is not a Covariance but a second moment matrix, i.e., it is influenced by means of gradients. |
505 | Generalized Majorization-Minimization | Sobhan Naderi Parizi, Kun He, Reza Aghajani, Stan Sclaroff, Pedro Felzenszwalb | We generalize MM by relaxing this constraint, and propose a new optimization framework, named Generalized Majorization-Minimization (G-MM), that is more flexible. |
506 | Variational Laplace Autoencoders | Yookoon Park, Chris Kim, Gunhee Kim | We present a novel approach that addresses both challenges. |
507 | The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study | Daniel Park, Jascha Sohl-Dickstein, Quoc Le, Samuel Smith | We investigate how the final parameters found by stochastic gradient descent are influenced by over-parameterization. |
508 | Spectral Approximate Inference | Sejun Park, Eunho Yang, Se-Young Yun, Jinwoo Shin | To overcome the limitation, we propose a novel approach utilizing the global spectral feature of GM. |
509 | Self-Supervised Exploration via Disagreement | Deepak Pathak, Dhiraj Gandhi, Abhinav Gupta | In this paper, we propose a formulation for exploration inspired by the work in active learning literature. |
510 | Subspace Robust Wasserstein Distances | François-Pierre Paty, Marco Cuturi | We propose in this work a “max-min” robust variant of the Wasserstein distance by considering the maximal possible distance that can be realized between two measures, assuming they can be projected orthogonally on a lower $k$-dimensional subspace.
511 | Fingerprint Policy Optimisation for Robust Reinforcement Learning | Supratik Paul, Michael A. Osborne, Shimon Whiteson | In this paper, we present fingerprint policy optimisation (FPO), which finds a policy that is optimal in expectation across the distribution of environment variables. |
512 | COMIC: Multi-view Clustering Without Parameter Selection | Xi Peng, Zhenyu Huang, Jiancheng Lv, Hongyuan Zhu, Joey Tianyi Zhou | In this paper, we study two challenges in clustering analysis, namely, how to cluster multi-view data and how to perform clustering without parameter selection on cluster size. |
513 | Domain Agnostic Learning with Disentangled Representations | Xingchao Peng, Zijun Huang, Ximeng Sun, Kate Saenko | In this paper, we propose the task of Domain-Agnostic Learning (DAL): How to transfer knowledge from a labeled source domain to unlabeled data from arbitrary target domains? |
514 | Collaborative Channel Pruning for Deep Networks | Hanyu Peng, Jiaxiang Wu, Shifeng Chen, Junzhou Huang | In this paper, we propose a novel algorithm, namely collaborative channel pruning (CCP), to reduce the computational overhead with negligible performance degradation. |
515 | Exploiting structure of uncertainty for efficient matroid semi-bandits | Pierre Perrault, Vianney Perchet, Michal Valko | We improve the efficiency of algorithms for stochastic combinatorial semi-bandits. |
516 | Cognitive model priors for predicting human decisions | Joshua Peterson, David Bourgin, Daniel Reichman, Thomas Griffiths, Stuart Russell | We argue that this is mainly due to data scarcity, since noisy human behavior requires massive sample sizes to be accurately captured by off-the-shelf machine learning methods. We also present the first large-scale dataset for human decision-making, containing over 240,000 human judgments across over 13,000 decision problems.
517 | Towards Understanding Knowledge Distillation | Mary Phuong, Christoph Lampert | In this work, we provide the first insights into the working mechanisms of distillation by studying the special case of linear and deep linear classifiers. |
518 | Temporal Gaussian Mixture Layer for Videos | Aj Piergiovanni, Michael Ryoo | We present our fully convolutional video models with multiple TGM layers for activity detection. |
519 | Voronoi Boundary Classification: A High-Dimensional Geometric Approach via Weighted Monte Carlo Integration | Vladislav Polianskii, Florian T. Pokorny | We propose a Monte-Carlo integration based approach that instead computes a weighted integral over the boundaries of Voronoi cells, thus incorporating additional information about the Voronoi cell structure. |
520 | On Variational Bounds of Mutual Information | Ben Poole, Sherjil Ozair, Aaron Van Den Oord, Alex Alemi, George Tucker | In this work, we unify these recent developments in a single framework. |
521 | Hiring Under Uncertainty | Manish Purohit, Sreenivas Gollapudi, Manish Raghavan | In this paper we introduce the hiring under uncertainty problem to model the questions faced by hiring committees in large enterprises and universities alike. |
522 | SAGA with Arbitrary Sampling | Xun Qian, Zheng Qu, Peter Richtarik | We remedy this situation and propose a general and flexible variant of SAGA following the arbitrary sampling paradigm. |
523 | SGD with Arbitrary Sampling: General Analysis and Improved Rates | Xun Qian, Peter Richtarik, Robert Gower, Alibek Sailanbayev, Nicolas Loizou, Egor Shulgin | We propose a general yet simple theorem describing the convergence of SGD under the arbitrary sampling paradigm. |
524 | AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss | Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Mark Hasegawa-Johnson | In this paper, we propose a new style transfer scheme that involves only an autoencoder with a carefully designed bottleneck. |
525 | Fault Tolerance in Iterative-Convergent Machine Learning | Aurick Qiao, Bryon Aragam, Bingjing Zhang, Eric Xing | In this paper, we develop a general framework to quantify the effects of calculation errors on iterative-convergent algorithms. |
526 | Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition | Yao Qin, Nicholas Carlini, Garrison Cottrell, Ian Goodfellow, Colin Raffel | This paper makes progress on both of these fronts. |
527 | GMNN: Graph Markov Neural Networks | Meng Qu, Yoshua Bengio, Jian Tang | In this paper, we propose the Graph Markov Neural Network (GMNN) that combines the advantages of both worlds. |
528 | Nonlinear Distributional Gradient Temporal-Difference Learning | Chao Qu, Shie Mannor, Huan Xu | In the control setting, we propose the distributional Greedy-GQ using similar derivation. |
529 | Learning to Collaborate in Markov Decision Processes | Goran Radanovic, Rati Devidze, David Parkes, Adish Singla | We consider a two-agent MDP framework where agents repeatedly solve a task in a collaborative setting. |
530 | Meta-Learning Neural Bloom Filters | Jack Rae, Sergey Bartunov, Timothy Lillicrap | In this paper we explore the learning of approximate set membership over a set of data in one-shot via meta-learning. |
531 | Direct Uncertainty Prediction for Medical Second Opinions | Maithra Raghu, Katy Blumer, Rory Sayres, Ziad Obermeyer, Bobby Kleinberg, Sendhil Mullainathan, Jon Kleinberg | In this work, we show that machine learning models can be successfully trained to give uncertainty scores to data instances that result in high expert disagreements. |
532 | Game Theoretic Optimization via Gradient-based Nikaido-Isoda Function | Arvind Raghunathan, Anoop Cherian, Devesh Jha | To this end, we introduce the Gradient-based Nikaido-Isoda (GNI) function which serves: (i) as a merit function, vanishing only at the first-order stationary points of each player’s optimization problem, and (ii) provides error bounds to a stationary Nash point. |
533 | On the Spectral Bias of Neural Networks | Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred Hamprecht, Yoshua Bengio, Aaron Courville | In this work we present properties of neural networks that complement this aspect of expressivity. |
534 | Look Ma, No Latent Variables: Accurate Cutset Networks via Compilation | Tahrima Rahman, Shasha Jin, Vibhav Gogate | To address this problem, in this paper, we propose a novel approach for inducing cutset networks, a well-known tractable, highly interpretable representation that does not use latent variables and admits linear time MAR as well as MAP inference. |
535 | Does Data Augmentation Lead to Positive Margin? | Shashank Rajput, Zhili Feng, Zachary Charles, Po-Ling Loh, Dimitris Papailiopoulos | In this work, we analyze the robustness that DA begets by quantifying the margin that DA enforces on empirical risk minimizers. |
536 | Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables | Kate Rakelly, Aurick Zhou, Chelsea Finn, Sergey Levine, Deirdre Quillen | In this paper, we address these challenges by developing an off-policy meta-RL algorithm that disentangles task inference and control. |
537 | Screening rules for Lasso with non-convex Sparse Regularizers | Alain Rakotomamonjy, Gilles Gasso, Joseph Salmon | The approach we propose is based on an iterative majorization-minimization (MM) strategy that includes a screening rule in the inner solver and a condition for propagating screened variables between iterations of MM.
538 | Topological Data Analysis of Decision Boundaries with Application to Model Selection | Karthikeyan Natesan Ramamurthy, Kush Varshney, Krishnan Mody | We propose the labeled Čech complex, the plain labeled Vietoris-Rips complex, and the locally scaled labeled Vietoris-Rips complex to perform persistent homology inference of decision boundaries in classification tasks.
539 | HyperGAN: A Generative Model for Diverse, Performant Neural Networks | Neale Ratzlaff, Li Fuxin | We introduce HyperGAN, a generative model that learns to generate all the parameters of a deep neural network. |
540 | Efficient On-Device Models using Neural Projections | Sujith Ravi | We propose a neural projection approach for training compact on-device neural networks. |
541 | A Block Coordinate Descent Proximal Method for Simultaneous Filtering and Parameter Estimation | Ramin Raziperchikolaei, Harish Bhat | We propose and analyze a block coordinate descent proximal algorithm (BCD-prox) for simultaneous filtering and parameter estimation of ODE models. |
542 | Do ImageNet Classifiers Generalize to ImageNet? | Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, Vaishaal Shankar | We build new test sets for the CIFAR-10 and ImageNet datasets. |
543 | Fast Rates for a kNN Classifier Robust to Unknown Asymmetric Label Noise | Henry Reeve, Ata Kaban | We consider classification in the presence of class-dependent asymmetric label noise with unknown noise probabilities. |
544 | Almost Unsupervised Text to Speech and Automatic Speech Recognition | Yi Ren, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu | In this paper, by leveraging the dual nature of the two tasks, we propose an almost unsupervised learning method that leverages only a few hundred paired samples and extra unpaired data for TTS and ASR.
545 | Adaptive Antithetic Sampling for Variance Reduction | Hongyu Ren, Shengjia Zhao, Stefano Ermon | In this paper, we propose a general-purpose adaptive antithetic sampling framework. |
546 | Adversarial Online Learning with noise | Alon Resler, Yishay Mansour | Specifically, we consider binary losses XORed with the noise, which is a Bernoulli random variable.
547 | A Polynomial Time MCMC Method for Sampling from Continuous Determinantal Point Processes | Alireza Rezaei, Shayan Oveis Gharan | We study the Gibbs sampling algorithm for discrete and continuous $k$-determinantal point processes. |
548 | A Persistent Weisfeiler-Lehman Procedure for Graph Classification | Bastian Rieck, Christian Bock, Karsten Borgwardt | Our method, which we formalise as a generalisation of Weisfeiler–Lehman subtree features, exhibits favourable classification accuracy and its improvements in predictive performance are mainly driven by including cycle information. |
549 | Efficient learning of smooth probability functions from Bernoulli tests with guarantees | Paul Rolland, Ali Kavis, Alexander Immer, Adish Singla, Volkan Cevher | We study the fundamental problem of learning an unknown, smooth probability function via point-wise Bernoulli tests. |
550 | Separable value functions across time-scales | Joshua Romoff, Peter Henderson, Ahmed Touati, Yann Ollivier, Joelle Pineau, Emma Brunskill | We present an extension of temporal difference (TD) learning, which we call TD($\Delta$), that breaks down a value function into a series of components based on the differences between value functions with smaller discount factors. |
551 | Online Convex Optimization in Adversarial Markov Decision Processes | Aviv Rosenberg, Yishay Mansour | We consider online learning in episodic loop-free Markov decision processes (MDPs), where the loss function can change arbitrarily between episodes, and the transition function is not known to the learner. |
552 | Good Initializations of Variational Bayes for Deep Models | Simone Rossi, Pietro Michiardi, Maurizio Filippone | We address this by proposing a novel layer-wise initialization strategy based on Bayesian linear models. |
553 | The Odds are Odd: A Statistical Test for Detecting Adversarial Examples | Kevin Roth, Yannic Kilcher, Thomas Hofmann | We investigate conditions under which test statistics exist that can reliably detect examples, which have been adversarially manipulated in a white-box attack. |
554 | Neuron birth-death dynamics accelerates gradient descent and converges asymptotically | Grant Rotskoff, Samy Jelassi, Joan Bruna, Eric Vanden-Eijnden | In this work, we propose a non-local mass transport dynamics that leads to a modified PDE with the same minimizer. |
555 | Iterative Linearized Control: Stable Algorithms and Complexity Guarantees | Vincent Roulet, Dmitriy Drusvyatskiy, Siddhartha Srinivasa, Zaid Harchaoui | We examine popular gradient-based algorithms for nonlinear control in the light of the modern complexity analysis of first-order optimization algorithms. |
556 | Statistics and Samples in Distributional Reinforcement Learning | Mark Rowland, Robert Dadashi, Saurabh Kumar, Remi Munos, Marc G. Bellemare, Will Dabney | We present a unifying framework for designing and analysing distributional reinforcement learning (DRL) algorithms in terms of recursively estimating statistics of the return distribution. |
557 | A Contrastive Divergence for Combining Variational Inference and MCMC | Francisco Ruiz, Michalis Titsias | To make inference tractable, we introduce the variational contrastive divergence (VCD), a new divergence that replaces the standard Kullback-Leibler (KL) divergence used in VI. |
558 | Plug-and-Play Methods Provably Converge with Properly Trained Denoisers | Ernest Ryu, Jialin Liu, Sicheng Wang, Xiaohan Chen, Zhangyang Wang, Wotao Yin | In this paper, we theoretically establish convergence of PnP-FBS and PnP-ADMM, without using diminishing stepsizes, under a certain Lipschitz condition on the denoisers. |
559 | White-box vs Black-box: Bayes Optimal Strategies for Membership Inference | Alexandre Sablayrolles, Matthijs Douze, Cordelia Schmid, Yann Ollivier, Herve Jegou | In this paper, we derive the optimal strategy for membership inference with a few assumptions on the distribution of the parameters. |
560 | Tractable n-Metrics for Multiple Graphs | Sam Safavi, Jose Bento | In this paper, we introduce a new family of multi-distances (a distance between more than two elements) that satisfies a generalization of the properties of metrics to multiple elements. |
561 | An Optimal Private Stochastic-MAB Algorithm based on Optimal Private Stopping Rule | Touqir Sajed, Or Sheffet | We present a provably optimal differentially private algorithm for the stochastic multi-arm bandit problem, as opposed to the private analogue of the UCB-algorithm (Mishra and Thakurta, 2015; Tossou and Dimitrakakis, 2016) which doesn’t meet the recently discovered lower-bound of $\Omega \left(\frac{K\log(T)}{\epsilon} \right)$ (Shariff and Sheffet, 2018). |
562 | Deep Gaussian Processes with Importance-Weighted Variational Inference | Hugh Salimbeni, Vincent Dutordoir, James Hensman, Marc Deisenroth | We instead incorporate noisy variables as latent covariates, and propose a novel importance-weighted objective, which leverages analytic results and provides a mechanism to trade off computation for improved accuracy. |
563 | Multivariate Submodular Optimization | Richard Santiago, F. Bruce Shepherd | In this work we focus on a more general class of multivariate submodular optimization (MVSO) problems: $\min/\max f (S_1,S_2,\ldots,S_k): S_1 \uplus S_2 \uplus \cdots \uplus S_k \in \mathcal{F}$. |
564 | Near optimal finite time identification of arbitrary linear dynamical systems | Tuhin Sarkar, Alexander Rakhlin | We provide the first analysis of the general case when eigenvalues of the LTI system are arbitrarily distributed in three regimes: stable, marginally stable, and explosive. |
565 | Breaking Inter-Layer Co-Adaptation by Classifier Anonymization | Ikuro Sato, Kohta Ishikawa, Guoqing Liu, Masayuki Tanaka | We introduce a method called Feature-extractor Optimization through Classifier Anonymization (FOCA), which is designed to avoid an explicit co-adaptation between a feature extractor and a particular classifier by using many randomly-generated, weak classifiers during optimization. |
566 | A Theoretical Analysis of Contrastive Unsupervised Representation Learning | Nikunj Saunshi, Orestis Plevrakis, Sanjeev Arora, Mikhail Khodak, Hrishikesh Khandeparkar | The current paper uses the term contrastive learning for such algorithms and presents a theoretical framework for analyzing them by introducing latent classes and hypothesizing that semantically similar points are sampled from the same latent class. |
567 | Locally Private Bayesian Inference for Count Models | Aaron Schein, Zhiwei Steven Wu, Alexandra Schofield, Mingyuan Zhou, Hanna Wallach | We present a general and modular method for privacy-preserving Bayesian inference for Poisson factorization, a broad class of models that includes some of the most widely used models in the social sciences. |
568 | Weakly-Supervised Temporal Localization via Occurrence Count Learning | Julien Schroeter, Kirill Sidorov, David Marshall | We propose a novel model for temporal detection and localization which allows the training of deep neural networks using only counts of event occurrences as training labels. |
569 | Discovering Context Effects from Raw Choice Data | Arjun Seshadri, Alex Peysakhovich, Johan Ugander | In this work, our goal is to discover such choice set effects from raw choice data. |
570 | On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference | Rohin Shah, Noah Gundotra, Pieter Abbeel, Anca Dragan | Our goal is for agents to optimize the right reward function, despite how difficult it is for us to specify what that is. |
571 | Exploration Conscious Reinforcement Learning Revisited | Lior Shani, Yonathan Efroni, Shie Mannor | In this work, we take a different approach and study exploration-conscious criteria that result in optimal policies with respect to the exploration mechanism. |
572 | Compressed Factorization: Fast and Accurate Low-Rank Factorization of Compressively-Sensed Data | Vatsal Sharan, Kai Sheng Tai, Peter Bailis, Gregory Valiant | In this work, we consider the question of accurately and efficiently computing low-rank matrix or tensor factorizations given data compressed via random projections. |
573 | Conditional Independence in Testing Bayesian Networks | Yujia Shen, Haiying Huang, Arthur Choi, Adnan Darwiche | In this paper, we study conditional independence in TBNs, showing that it can be inferred from d-separation as in BNs. |
574 | Learning to Clear the Market | Weiran Shen, Sebastien Lahaie, Renato Paes Leme | In this work, we cast the problem of predicting clearing prices into a learning framework and use the resulting models to perform revenue optimization in auctions and markets with contextual information. |
575 | Mixture Models for Diverse Machine Translation: Tricks of the Trade | Tianxiao Shen, Myle Ott, Michael Auli, Marc’Aurelio Ranzato | Mixture Models for Diverse Machine Translation: Tricks of the Trade. |
576 | Hessian Aided Policy Gradient | Zebang Shen, Alejandro Ribeiro, Hamed Hassani, Hui Qian, Chao Mi | This paper presents a Hessian aided policy gradient method with the first improved sample complexity of $O(1/\epsilon^3)$. |
577 | Learning with Bad Training Data via Iterative Trimmed Loss Minimization | Yanyao Shen, Sujay Sanghavi | In this paper, we study a simple and generic framework to tackle the problem of learning model parameters when a fraction of the training samples are corrupted. |
578 | Replica Conditional Sequential Monte Carlo | Alex Shestopaloff, Arnaud Doucet | We propose a Markov chain Monte Carlo (MCMC) scheme to perform state inference in non-linear non-Gaussian state-space models. |
579 | Scalable Training of Inference Networks for Gaussian-Process Models | Jiaxin Shi, Mohammad Emtiyaz Khan, Jun Zhu | We propose an algorithm that enables such training by tracking a stochastic, functional mirror-descent algorithm. |
580 | Fast Direct Search in an Optimally Compressed Continuous Target Space for Efficient Multi-Label Active Learning | Weishi Shi, Qi Yu | We propose a novel CS-BPCA process that integrates compressed sensing and Bayesian principal component analysis to perform a two-level label transformation, resulting in an optimally compressed continuous target space. |
581 | Model-Based Active Exploration | Pranav Shyam, Wojciech Jaskowski, Faustino Gomez | This paper introduces an efficient active exploration algorithm, Model-Based Active eXploration (MAX), which uses an ensemble of forward models to plan to observe novel events. |
582 | Rehashing Kernel Evaluation in High Dimensions | Paris Siminelakis, Kexin Rong, Peter Bailis, Moses Charikar, Philip Levis | In this paper, we close the gap between theory and practice by addressing these challenges via provable and practical procedures for adaptive sample size selection, preprocessing time reduction, and refined variance bounds that quantify the data-dependent performance of random sampling and hashing-based kernel evaluation methods. |
583 | Revisiting precision recall definition for generative modeling | Loic Simon, Ryan Webster, Julien Rabin | In this article we revisit the definition of Precision-Recall (PR) curves for generative models proposed by (Sajjadi et al., 2018). |
584 | First-Order Adversarial Vulnerability of Neural Networks and Input Dimension | Carl-Johann Simon-Gabriel, Yann Ollivier, Leon Bottou, Bernhard Schölkopf, David Lopez-Paz | We show that adversarial vulnerability increases with the gradients of the training objective when viewed as a function of the inputs. |
585 | Refined Complexity of PCA with Outliers | Kirill Simonov, Fedor Fomin, Petr Golovach, Fahad Panolan | We provide a rigorous algorithmic analysis of the problem. |
586 | A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks | Umut Simsekli, Levent Sagun, Mert Gurbuzbalaban | Accordingly, we propose to analyze SGD as an SDE driven by a Lévy motion. |
587 | Non-Parametric Priors For Generative Adversarial Networks | Rajhans Singh, Pavan Turaga, Suren Jayasuriya, Ravi Garg, Martin Braun | We present a straightforward formalization of this problem; using basic results from probability theory and off-the-shelf optimization tools, we develop ways to arrive at appropriate non-parametric priors. |
588 | Understanding Impacts of High-Order Loss Approximations and Features in Deep Learning Interpretation | Sahil Singla, Eric Wallace, Shi Feng, Soheil Feizi | We use an $L_0$-$L_1$ relaxation technique along with proximal gradient descent to efficiently compute group-feature importance values. |
589 | kernelPSI: a Post-Selection Inference Framework for Nonlinear Variable Selection | Lotfi Slim, Clément Chatelain, Chloe-Agathe Azencott, Jean-Philippe Vert | In the present work, we exploit recent advances in post-selection inference to propose a valid statistical test for the association of a joint model of the selected kernels with the outcome. |
590 | GEOMetrics: Exploiting Geometric Structure for Graph-Encoded Objects | Edward Smith, Adriana Romero, Scott Fujimoto, David Meger | In this paper, we argue that the graph representation of geometric objects allows for additional structure, which should be leveraged for enhanced reconstruction. |
591 | The Evolved Transformer | David So, Quoc Le, Chen Liang | Our goal is to apply NAS to search for a better alternative to the Transformer. |
592 | QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning | Kyunghwan Son, Daewoo Kim, Wan Ju Kang, David Earl Hostallero, Yung Yi | In this paper, we propose a new factorization method for MARL, QTRAN, which is free from such structural constraints and takes on a new approach to transforming the original joint action-value function into an easily factorizable one, with the same optimal actions. |
593 | Distribution calibration for regression | Hao Song, Tom Diethe, Meelis Kull, Peter Flach | We introduce the novel concept of distribution calibration, and demonstrate its advantages over the existing definition of quantile calibration. |
594 | SELFIE: Refurbishing Unclean Samples for Robust Deep Learning | Hwanjun Song, Minseok Kim, Jae-Gil Lee | To overcome overfitting on the noisy labels, we propose a novel robust training method called SELFIE. |
595 | Revisiting the Softmax Bellman Operator: New Benefits and New Perspective | Zhao Song, Ron Parr, Lawrence Carin | To better understand how and why this occurs, we revisit theoretical properties of the softmax Bellman operator, and prove that (i) it converges to the standard Bellman operator exponentially fast in the inverse temperature parameter, and (ii) the distance of its Q function from the optimal one can be bounded. |
596 | MASS: Masked Sequence to Sequence Pre-training for Language Generation | Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu | Inspired by the success of BERT, we propose MAsked Sequence to Sequence pre-training (MASS) for the encoder-decoder based language generation tasks. |
597 | Dual Entangled Polynomial Code: Three-Dimensional Coding for Distributed Matrix Multiplication | Pedro Soto, Jun Li, Xiaodi Fan | In this paper, we propose dual entangled polynomial (DEP) codes that require around 25% fewer tasks than EP codes by executing two matrix multiplications on each task. |
598 | Compressing Gradient Optimizers via Count-Sketches | Ryan Spring, Anastasios Kyrillidis, Vijai Mohan, Anshumali Shrivastava | Theoretically, we prove that count-sketch optimization maintains the SGD convergence rate, while gracefully reducing memory usage for large models. |
599 | Escaping Saddle Points with Adaptive Gradient Methods | Matthew Staib, Sashank Reddi, Satyen Kale, Sanjiv Kumar, Suvrit Sra | In this paper, we seek a crisp, clean and precise characterization of their behavior in nonconvex settings. |
600 | Faster Attend-Infer-Repeat with Tractable Probabilistic Models | Karl Stelzner, Robert Peharz, Kristian Kersting | In this paper, we show that the speed and robustness of learning in AIR can be considerably improved by replacing the intractable object representations with tractable probabilistic models. |
601 | Insertion Transformer: Flexible Sequence Generation via Insertion Operations | Mitchell Stern, William Chan, Jamie Kiros, Jakob Uszkoreit | We present the Insertion Transformer, an iterative, partially autoregressive model for sequence generation based on insertion operations. |
602 | BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning | Asa Cooper Stickland, Iain Murray | We explore multi-task approaches that share a single BERT model with a small number of additional task-specific parameters. |
603 | Learning Optimal Linear Regularizers | Matthew Streeter | We present algorithms for efficiently learning regularizers that improve generalization. |
604 | CAB: Continuous Adaptive Blending for Policy Evaluation and Learning | Yi Su, Lequn Wang, Michele Santacatterina, Thorsten Joachims | In this paper, we identify a family of counterfactual estimators which subsumes most such estimators proposed to date. |
605 | Learning Distance for Sequences by Learning a Ground Metric | Bing Su, Ying Wu | We propose to learn the distance for sequences through learning a ground Mahalanobis metric for the vectors in sequences. |
606 | Contextual Memory Trees | Wen Sun, Alina Beygelzimer, Hal Daumé III, John Langford, Paul Mineiro | We design and study a Contextual Memory Tree (CMT), a learning memory controller that inserts new memories into an experience store of unbounded size. |
607 | Provably Efficient Imitation Learning from Observation Alone | Wen Sun, Anirudh Vemula, Byron Boots, Drew Bagnell | We design a new model-free algorithm for ILFO, Forward Adversarial Imitation Learning (FAIL), which learns a sequence of time-dependent policies by minimizing an Integral Probability Metric between the observation distributions of the expert policy and the learner. |
608 | Active Learning for Decision-Making from Imbalanced Observational Data | Iiris Sundin, Peter Schulam, Eero Siivola, Aki Vehtari, Suchi Saria, Samuel Kaski | We propose to assess the decision-making reliability by estimating the ITE model’s Type S error rate, which is the probability of the model inferring the sign of the treatment effect wrong. |
609 | Robustly Disentangled Causal Mechanisms: Validating Deep Representations for Interventional Robustness | Raphael Suter, Djordje Miladinovic, Bernhard Schölkopf, Stefan Bauer | We provide a causal perspective on representation learning which covers disentanglement and domain shift robustness as special cases. |
610 | Hyperbolic Disk Embeddings for Directed Acyclic Graphs | Ryota Suzuki, Ryusuke Takahama, Shun Onoda | Tackling this problem, we develop Disk Embeddings, a framework for embedding DAGs into quasi-metric spaces. |
611 | Accelerated Flow for Probability Distributions | Amirhossein Taghvaei, Prashant Mehta | This paper presents a methodology and numerical algorithms for constructing accelerated gradient flows on the space of probability distributions. |
612 | Equivariant Transformer Networks | Kai Sheng Tai, Peter Bailis, Gregory Valiant | We propose Equivariant Transformers (ETs), a family of differentiable image-to-image mappings that improve the robustness of models towards pre-defined continuous transformation groups. |
613 | Making Deep Q-learning methods robust to time discretization | Corentin Tallec, Léonard Blier, Yann Ollivier | In this paper, we identify sensitivity to time discretization in near continuous-time environments as a critical factor; this covers, e.g., changing the number of frames per second, or the action frequency of the controller. |
614 | EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks | Mingxing Tan, Quoc Le | In this paper, we systematically study model scaling and identify that carefully balancing network depth, width, and resolution can lead to better performance. |
615 | Hierarchical Decompositional Mixtures of Variational Autoencoders | Ping Liang Tan, Robert Peharz | Since these problems become generally more severe in high dimensions, we propose a novel hierarchical mixture model over low-dimensional VAE experts. |
616 | Mallows ranking models: maximum likelihood estimate and regeneration | Wenpin Tang | Motivated by the infinite top-$t$ ranking model, we propose an algorithm to select the model size $t$ automatically. |
617 | Correlated Variational Auto-Encoders | Da Tang, Dawen Liang, Tony Jebara, Nicholas Ruozzi | We propose Correlated Variational Auto-Encoders (CVAEs) that can take the correlation structure into consideration when learning latent representations with VAEs. |
618 | The Variational Predictive Natural Gradient | Da Tang, Rajesh Ranganath | To address this, we construct a new natural gradient called the Variational Predictive Natural Gradient (VPNG). |
619 | $\texttt{DoubleSqueeze}$: Parallel Stochastic Gradient Descent with Double-pass Error-Compensated Compression | Hanlin Tang, Chen Yu, Xiangru Lian, Tong Zhang, Ji Liu | In this work, we provide a detailed analysis on this two-pass communication model, with error-compensated compression both on the worker nodes and on the parameter server. |
620 | Adaptive Neural Trees | Ryutaro Tanno, Kai Arulkumaran, Daniel Alexander, Antonio Criminisi, Aditya Nori | We unite the two via adaptive neural trees (ANTs), a model that incorporates representation learning into edges, routing functions and leaf nodes of a decision tree, along with a backpropagation-based training algorithm that adaptively grows the architecture from primitive modules (e.g., convolutional layers). |
621 | Variational Annealing of GANs: A Langevin Perspective | Chenyang Tao, Shuyang Dai, Liqun Chen, Ke Bai, Junya Chen, Chang Liu, Ruiyi Zhang, Georgiy Bobashev, Lawrence Carin | We highlight new insights from variational theory of diffusion processes to derive a likelihood-based regularizing scheme for GAN training, and present a novel approach to train GANs with an unnormalized distribution instead of empirical samples. |
622 | Predicate Exchange: Inference with Declarative Knowledge | Zenna Tavares, Rajesh Ranganath, Javier Burroni, Armando Solar-Lezama, Edgar Minasyan | To support a broader class of predicates, we develop an inference procedure called predicate exchange, which softens predicates. |
623 | The Natural Language of Actions | Guy Tennenholtz, Shie Mannor | We introduce Act2Vec, a general framework for learning context-based action representation for Reinforcement Learning. |
624 | Kernel Normalized Cut: a Theoretical Revisit | Yoshikazu Terada, Michio Yamamoto | In this paper, we study the theoretical properties of clustering based on the kernel normalized cut. |
625 | Action Robust Reinforcement Learning and Applications in Continuous Control | Chen Tessler, Yonathan Efroni, Shie Mannor | In this work we formalize two new criteria of robustness to action uncertainty. |
626 | Concentration Inequalities for Conditional Value at Risk | Philip Thomas, Erik Learned-Miller | In this paper we derive new concentration inequalities for the conditional value at risk (CVaR) of a random variable, and compare them to the previous state of the art (Brown, 2007). |
627 | Combating Label Noise in Deep Learning using Abstention | Sunil Thulasidasan, Tanmoy Bhattacharya, Jeff Bilmes, Gopinath Chennupati, Jamal Mohd-Yusof | We introduce a novel method to combat label noise when training deep neural networks for classification. |
628 | ELF OpenGo: an analysis and open reimplementation of AlphaZero | Yuandong Tian, Jerry Ma, Qucheng Gong, Shubho Sengupta, Zhuoyuan Chen, James Pinkerton, Larry Zitnick | Toward elucidating unresolved mysteries and facilitating future research, we propose ELF OpenGo, an open-source reimplementation of the AlphaZero algorithm. |
629 | Random Matrix Improved Covariance Estimation for a Large Class of Metrics | Malik Tiomoko, Romain Couillet, Florent Bouchard, Guillaume Ginolhac | Relying on recent advances in statistical estimation of covariance distances based on random matrix theory, this article proposes an improved covariance and precision matrix estimation for a wide family of metrics. |
630 | Transfer of Samples in Policy Search via Multiple Importance Sampling | Andrea Tirinzoni, Mattia Salvini, Marcello Restelli | In this paper, we consider the more complex case of reusing samples in policy search methods, in which the agent is required to transfer entire trajectories between environments with different transition models. |
631 | Optimal Transport for structured data with application on graphs | Vayer Titouan, Nicolas Courty, Romain Tavenard, Chapel Laetitia, Rémi Flamary | This work considers the problem of computing distances between structured objects such as undirected graphs, seen as probability distributions in a specific metric space. |
632 | Discovering Latent Covariance Structures for Multiple Time Series | Anh Tong, Jaesik Choi | We present a pragmatic search algorithm which explores a larger structure space efficiently. |
633 | Bayesian Generative Active Deep Learning | Toan Tran, Thanh-Toan Do, Ian Reid, Gustavo Carneiro | In this paper, we propose a Bayesian generative active deep learning approach that combines active learning with data augmentation – we provide theoretical and empirical evidence (MNIST, CIFAR-$\{10,100\}$, and SVHN) that our approach has more efficient training and better classification results than data augmentation and active learning. |
634 | DeepNose: Using artificial neural networks to represent the space of odorants | Ngoc Tran, Daniel Kepple, Sergey Shuvaev, Alexei Koulakov | We propose that the DeepNose network can extract de novo chemical features predictive of various bioactivities and can help understand the factors influencing the composition of the ORs ensemble. |
635 | LR-GLM: High-Dimensional Bayesian Inference Using Low-Rank Data Approximations | Brian Trippe, Jonathan Huggins, Raj Agrawal, Tamara Broderick | We propose to reduce time and memory costs with a low-rank approximation of the data in an approach we call LR-GLM. |
636 | Learning Hawkes Processes Under Synchronization Noise | William Trouleau, Jalal Etesami, Matthias Grossglauser, Negar Kiyavash, Patrick Thiran | We characterize the robustness of the classic maximum likelihood estimator to synchronization noise, and we introduce a new approach for learning the causal structure in the presence of noise. |
637 | Homomorphic Sensing | Manolis Tsakiris, Liangzu Peng | In this paper we introduce an abstraction of this problem which we call “homomorphic sensing”. |
638 | Metropolis-Hastings Generative Adversarial Networks | Ryan Turner, Jane Hung, Eric Frank, Yunus Saatchi, Jason Yosinski | We introduce the Metropolis-Hastings generative adversarial network (MH-GAN), which combines aspects of Markov chain Monte Carlo and GANs. |
639 | Distributed, Egocentric Representations of Graphs for Detecting Critical Structures | Ruo-Chun Tzeng, Shan-Hung Wu | In this paper, we propose a novel graph embedding model, called the Ego-CNN, that employs ego-convolutions at each layer and stacks layers in an egocentric way to detect precise critical structures efficiently. |
640 | Sublinear Space Private Algorithms Under the Sliding Window Model | Jalaj Upadhyay | In this paper, we study heavy hitters in the sliding window model with window size $w$. |
641 | Fairness without Harm: Decoupled Classifiers with Preference Guarantees | Berk Ustun, Yang Liu, David Parkes | In this work, we argue that when there is this kind of treatment disparity, it should be in the best interest of each group. |
642 | Large-Scale Sparse Kernel Canonical Correlation Analysis | Viivi Uurtio, Sahely Bhadra, Juho Rousu | This paper presents gradKCCA, a large-scale sparse non-linear canonical correlation method. |
643 | Characterization of Convex Objective Functions and Optimal Expected Convergence Rates for SGD | Marten Van Dijk, Lam Nguyen, Phuong Ha Nguyen, Dzung Phan | We introduce a definitional framework and theory that defines and characterizes a core property, called curvature, of convex objective functions. |
644 | Composing Value Functions in Reinforcement Learning | Benjamin Van Niekerk, Steven James, Adam Earle, Benjamin Rosman | Under the assumption of deterministic dynamics, we prove that optimal value function composition can be achieved in entropy-regularised reinforcement learning (RL), and extend this result to the standard RL setting. |
645 | Model Comparison for Semantic Grouping | Francisco Vargas, Kamen Brestnichki, Nils Hammerla | We introduce a probabilistic framework for quantifying the semantic similarity between two groups of embeddings. |
646 | Learning Dependency Structures for Weak Supervision Models | Paroma Varma, Frederic Sala, Ann He, Alexander Ratner, Christopher Re | We focus on a robust PCA-based algorithm for learning these dependency structures, establish improved theoretical recovery rates, and outperform existing methods on various real-world tasks. |
647 | Probabilistic Neural Symbolic Models for Interpretable Visual Question Answering | Ramakrishna Vedantam, Karan Desai, Stefan Lee, Marcus Rohrbach, Dhruv Batra, Devi Parikh | We propose a new class of probabilistic neural-symbolic models, that have symbolic functional programs as a latent, stochastic variable. |
648 | Manifold Mixup: Better Representations by Interpolating Hidden States | Vikas Verma, Alex Lamb, Christopher Beckham, Amir Najafi, Ioannis Mitliagkas, David Lopez-Paz, Yoshua Bengio | To address these issues, we propose Manifold Mixup, a simple regularizer that encourages neural networks to predict less confidently on interpolations of hidden representations. |
649 | Maximum Likelihood Estimation for Learning Populations of Parameters | Ramya Korlakai Vinayak, Weihao Kong, Gregory Valiant, Sham Kakade | After observing the outcomes of $t$ independent Bernoulli trials, i.e., $X_i \sim \text{Binomial}(t, p_i)$ per individual, our objective is to accurately estimate $P^\star$ in the sparse regime, namely when $t \ll N$. |
650 | Understanding Priors in Bayesian Neural Networks at the Unit Level | Mariia Vladimirova, Jakob Verbeek, Pablo Mesejo, Julyan Arbel | We investigate deep Bayesian neural networks with Gaussian priors on the weights and a class of ReLU-like nonlinearities. |
651 | On the Design of Estimators for Bandit Off-Policy Evaluation | Nikos Vlassis, Aurelien Bibaut, Maria Dimakopoulou, Tony Jebara | We present our main results in the context of multi-armed bandits, and we propose a simple design for contextual bandits that gives rise to an estimator that is shown to perform well in multi-class cost-sensitive classification datasets. |
652 | Learning to select for a predefined ranking | Aleksandr Vorobev, Aleksei Ustimenko, Gleb Gusev, Pavel Serdyukov | In this paper, we formulate a novel problem of learning to select a set of items maximizing the quality of their ordered list, where the order is predefined by some explicit rule. |
653 | On the Limitations of Representing Functions on Sets | Edward Wagstaff, Fabian Fuchs, Martin Engelcke, Ingmar Posner, Michael A. Osborne | Motivated by this observation, we prove that an implementation of this model via continuous mappings (as provided by e.g. neural networks or Gaussian processes) actually imposes a constraint on the dimensionality of the latent space. |
654 | Graph Convolutional Gaussian Processes | Ian Walker, Ben Glocker | We propose a novel Bayesian nonparametric method to learn translation-invariant relationships on non-Euclidean domains. |
655 | Gaining Free or Low-Cost Interpretability with Interpretable Partial Substitute | Tong Wang | Under this framework, we develop a Hybrid Rule Sets (HyRS) model that uses decision rules to capture the subspace of data where the rules are as accurate, or almost as accurate, as the provided black-box model. |
656 | Convolutional Poisson Gamma Belief Network | Chaojie Wang, Bo Chen, Sucheng Xiao, Mingyuan Zhou | In this paper, we propose convolutional Poisson factor analysis (CPFA) that directly operates on a lossless representation that processes the words in each document as a sequence of high-dimensional one-hot vectors. |
657 | Differentially Private Empirical Risk Minimization with Non-convex Loss Functions | Di Wang, Changyou Chen, Jinhui Xu | We study the problem of Empirical Risk Minimization (ERM) with (smooth) non-convex loss functions under the differential-privacy (DP) model. |
658 | Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation | Ruohan Wang, Carlo Ciliberto, Pierluigi Vito Amadori, Yiannis Demiris | We propose a new framework for imitation learning by estimating the support of the expert policy to compute a fixed reward function, which allows us to re-frame imitation learning within the standard reinforcement learning setting. |
659 | SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver | Po-Wei Wang, Priya Donti, Bryan Wilder, Zico Kolter | In this paper, we propose a new direction toward this goal by introducing a differentiable (smoothed) maximum satisfiability (MAXSAT) solver that can be integrated into the loop of larger deep learning systems. |
660 | Improving Neural Language Modeling via Adversarial Training | Dilin Wang, Chengyue Gong, Qiang Liu | In this paper, we present a simple yet highly effective adversarial training mechanism for regularizing neural language models. |
661 | EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis | Chaoqi Wang, Roger Grosse, Sanja Fidler, Guodong Zhang | In particular, we highlight that the improvements are especially significant for more challenging datasets and networks. |
662 | Nonlinear Stein Variational Gradient Descent for Learning Diversified Mixture Models | Dilin Wang, Qiang Liu | In this work, we present a variational approach for diversity-promoting learning, which leverages the entropy functional as a natural mechanism for enforcing diversity. |
663 | On the Convergence and Robustness of Adversarial Training | Yisen Wang, Xingjun Ma, James Bailey, Jinfeng Yi, Bowen Zhou, Quanquan Gu | In this paper, we propose such a criterion, namely First-Order Stationary Condition for constrained optimization (FOSC), to quantitatively evaluate the convergence quality of adversarial examples found in the inner maximization. |
664 | State-Regularized Recurrent Neural Networks | Cheng Wang, Mathias Niepert | We aim to address both shortcomings with a class of recurrent networks that use a stochastic state transition mechanism between cell applications. |
665 | Deep Factors for Forecasting | Yuyang Wang, Alex Smola, Danielle Maddix, Jan Gasthaus, Dean Foster, Tim Januschowski | In this paper, we propose a hybrid model that incorporates the benefits of both approaches. |
666 | Repairing without Retraining: Avoiding Disparate Impact with Counterfactual Distributions | Hao Wang, Berk Ustun, Flavio Calmon | In this paper, we exploit this fact to reduce the disparate impact of a fixed classification model over a population of interest. |
667 | On Sparse Linear Regression in the Local Differential Privacy Model | Di Wang, Jinhui Xu | In this paper, we study the sparse linear regression problem under the Local Differential Privacy (LDP) model. |
668 | Doubly Robust Joint Learning for Recommendation on Data Missing Not at Random | Xiaojie Wang, Rui Zhang, Yu Sun, Jianzhong Qi | To achieve good performance guarantees, based on this estimator, we propose joint learning of rating prediction and error imputation, which outperforms the state-of-the-art approaches on four real-world datasets. |
669 | On the Generalization Gap in Reparameterizable Reinforcement Learning | Huan Wang, Stephan Zheng, Caiming Xiong, Richard Socher | We focus on the special class of reparameterizable RL problems, where the trajectory distribution can be decomposed using the reparametrization trick. |
670 | Bias Also Matters: Bias Attribution for Deep Neural Network Explanation | Shengjie Wang, Tianyi Zhou, Jeff Bilmes | In this paper, we observe that since the bias in a DNN also has a non-negligible contribution to the correctness of predictions, it can also play a significant role in understanding DNN behavior. |
671 | Jumpout: Improved Dropout for Deep Neural Networks with ReLUs | Shengjie Wang, Tianyi Zhou, Jeff Bilmes | We discuss three novel insights about dropout for DNNs with ReLUs: 1) dropout encourages each local linear piece of a DNN to be trained on data points from nearby regions; 2) the same dropout rate results in different (effective) deactivation rates for layers with different portions of ReLU-deactivated neurons; and 3) the rescaling factor of dropout causes a normalization inconsistency between training and test when used together with batch normalization. |
672 | AdaGrad stepsizes: sharp convergence over nonconvex landscapes | Rachel Ward, Xiaoxia Wu, Leon Bottou | We bridge this gap by providing strong theoretical guarantees for the convergence of AdaGrad over smooth, nonconvex landscapes. |
673 | Generalized Linear Rule Models | Dennis Wei, Sanjeeb Dash, Tian Gao, Oktay Gunluk | This paper considers generalized linear models using rule-based features, also referred to as rule ensembles, for regression and probabilistic classification. |
674 | On the statistical rate of nonlinear recovery in generative models with heavy-tailed data | Xiaohan Wei, Zhuoran Yang, Zhaoran Wang | In this paper, we make a step towards such a direction by considering the scenario where the measurements are non-Gaussian, subject to possibly unknown nonlinear transformations and the responses are heavy-tailed. |
675 | CapsAndRuns: An Improved Method for Approximately Optimal Algorithm Configuration | Gellert Weisz, Andras Gyorgy, Csaba Szepesvari | In this paper we present a new algorithm, CapsAndRuns, which finds a near-optimal configuration while using time that scales (in a problem dependent way) with the optimal expected capped runtime, significantly strengthening previous results which could only guarantee a bound that scaled with the potentially much larger optimal expected uncapped runtime. |
676 | Non-Monotonic Sequential Text Generation | Sean Welleck, Kianté Brantley, Hal Daumé III, Kyunghyun Cho | In this work, we propose a framework for training models of text generation that operate in non-monotonic orders; the model directly learns good orders, without any additional annotation. |
677 | PROVEN: Verifying Robustness of Neural Networks with a Probabilistic Approach | Lily Weng, Pin-Yu Chen, Lam Nguyen, Mark Squillante, Akhilan Boopathy, Ivan Oseledets, Luca Daniel | We propose a novel framework, PROVEN, to PRObabilistically VErify Neural network’s robustness with statistical guarantees. |
678 | Learning deep kernels for exponential family densities | Li Wenliang, Dougal Sutherland, Heiko Strathmann, Arthur Gretton | We provide a scheme for learning a kernel parameterized by a deep network, which can find complex location-dependent local features of the data geometry. |
679 | Improving Model Selection by Employing the Test Data | Max Westphal, Werner Brannath | We investigate the properties of novel evaluation strategies, namely when the final model is selected based on empirical performances on the test data. |
680 | Automatic Classifiers as Scientific Instruments: One Step Further Away from Ground-Truth | Jacob Whitehill, Anand Ramakrishnan | We examine how the accuracy of d, as quantified by the correlation q of d’s outputs with the ground-truth construct U, impacts the estimated correlation between U (e.g., stress) and some other phenomenon V (e.g., academic performance). |
681 | Moment-Based Variational Inference for Markov Jump Processes | Christian Wildner, Heinz Koeppl | We propose moment-based variational inference as a flexible framework for approximate smoothing of latent Markov jump processes. |
682 | End-to-End Probabilistic Inference for Nonstationary Audio Analysis | William Wilkinson, Michael Andersen, Joshua D. Reiss, Dan Stowell, Arno Solin | We show how time-frequency analysis and nonnegative matrix factorisation can be jointly formulated as a spectral mixture Gaussian process model with nonstationary priors over the amplitude variance parameters. |
683 | Fairness risk measures | Robert Williamson, Aditya Menon | In this paper, we propose a new definition of fairness that generalises some existing proposals, while allowing for generic sensitive features and resulting in a convex objective. |
684 | Partially Exchangeable Networks and Architectures for Learning Summary Statistics in Approximate Bayesian Computation | Samuel Wiqvist, Pierre-Alexandre Mattei, Umberto Picchini, Jes Frellsen | We present a novel family of deep neural architectures, named partially exchangeable networks (PENs) that leverage probabilistic symmetries. |
685 | Wasserstein Adversarial Examples via Projected Sinkhorn Iterations | Eric Wong, Frank Schmidt, Zico Kolter | In this paper, we propose a new threat model for adversarial attacks based on the Wasserstein distance. |
686 | Imitation Learning from Imperfect Demonstration | Yueh-Hua Wu, Nontawat Charoenphakdee, Han Bao, Voot Tangkaratt, Masashi Sugiyama | To effectively learn from imperfect demonstrations, we propose a novel approach that utilizes confidence scores, which describe the quality of demonstrations. |
687 | Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling | Shanshan Wu, Alex Dimakis, Sujay Sanghavi, Felix Yu, Daniel Holtmann-Rice, Dmitry Storcheus, Afshin Rostamizadeh, Sanjiv Kumar | In this paper we present a new method to learn linear encoders that adapt to data, while still performing well with the widely used $\ell_1$ decoder. |
688 | Heterogeneous Model Reuse via Optimizing Multiparty Multiclass Margin | Xi-Zhu Wu, Song Liu, Zhi-Hua Zhou | In this paper, we define a multiparty multiclass margin to measure the global behavior of a set of heterogeneous local models, and propose a general learning method called HMR (Heterogeneous Model Reuse) to optimize the margin. |
689 | Deep Compressed Sensing | Yan Wu, Mihaela Rosca, Timothy Lillicrap | Here we propose a novel framework that significantly improves both the performance and speed of signal recovery by jointly training a generator and the optimisation process for reconstruction via meta-learning. |
690 | Simplifying Graph Convolutional Networks | Felix Wu, Amauri Souza, Tianyi Zhang, Christopher Fifty, Tao Yu, Kilian Weinberger | In this paper, we reduce this excess complexity through successively removing nonlinearities and collapsing weight matrices between consecutive layers. |
691 | Domain Adaptation with Asymmetrically-Relaxed Distribution Alignment | Yifan Wu, Ezra Winston, Divyansh Kaushik, Zachary Lipton | We propose asymmetrically-relaxed distribution alignment, a new approach that overcomes some limitations of standard domain-adversarial algorithms. |
692 | On Scalable and Efficient Computation of Large Scale Optimal Transport | Yujia Xie, Minshuo Chen, Haoming Jiang, Tuo Zhao, Hongyuan Zha | To address the scalability issue, we propose an implicit generative learning-based framework called SPOT (Scalable Push-forward of Optimal Transport). |
693 | Zeno: Distributed Stochastic Gradient Descent with Suspicion-based Fault-tolerance | Cong Xie, Sanmi Koyejo, Indranil Gupta | We present Zeno, a technique to make distributed machine learning, particularly Stochastic Gradient Descent (SGD), tolerant to an arbitrary number of faulty workers. |
694 | Differentiable Linearized ADMM | Xingyu Xie, Jianlong Wu, Guangcan Liu, Zhisheng Zhong, Zhouchen Lin | In this paper, we propose Differentiable Linearized ADMM (D-LADMM) for solving the problems with linear constraints. |
695 | Calibrated Approximate Bayesian Inference | Hanwen Xing, Geoff Nicholls, Jeong Lee | We give a general purpose computational framework for estimating the bias in coverage resulting from making approximations in Bayesian inference. |
696 | Power k-Means Clustering | Jason Xu, Kenneth Lange | This paper explores an alternative to Lloyd’s algorithm that retains its simplicity and mitigates its tendency to get trapped by local minima. |
697 | Gromov-Wasserstein Learning for Graph Matching and Node Embedding | Hongteng Xu, Dixin Luo, Hongyuan Zha, Lawrence Carin | We apply the proposed method to matching problems in real-world networks, and demonstrate its superior performance compared to alternative approaches. |
698 | Stochastic Optimization for DC Functions and Non-smooth Non-convex Regularizers with Non-asymptotic Convergence | Yi Xu, Qi Qi, Qihang Lin, Rong Jin, Tianbao Yang | In this paper, we propose new stochastic optimization algorithms and study their first-order convergence theories for solving a broad family of DC functions. |
699 | Learning a Prior over Intent via Meta-Inverse Reinforcement Learning | Kelvin Xu, Ellis Ratner, Anca Dragan, Sergey Levine, Chelsea Finn | In this work, we exploit the insight that demonstrations from other tasks can be used to constrain the set of possible reward functions by learning a “prior” that is specifically optimized for the ability to infer expressive reward functions from limited numbers of demonstrations. |
700 | Variational Russian Roulette for Deep Bayesian Nonparametrics | Kai Xu, Akash Srivastava, Charles Sutton | Instead, we propose a new variational approximation, based on a method from statistical physics called Russian roulette sampling. |
701 | Supervised Hierarchical Clustering with Exponential Linkage | Nishant Yadav, Ari Kobren, Nicholas Monath, Andrew McCallum | In this paper, we introduce a method for training the dissimilarity function in a way that is tightly coupled with hierarchical clustering, in particular single linkage. |
702 | Learning to Prove Theorems via Interacting with Proof Assistants | Kaiyu Yang, Jia Deng | In this paper, we study the problem of using machine learning to automate the interaction with proof assistants. |
703 | Sample-Optimal Parametric Q-Learning Using Linearly Additive Features | Lin Yang, Mengdi Wang | We propose a parametric Q-learning algorithm that finds an approximate-optimal policy using a sample size proportional to the feature dimension $K$ and invariant with respect to the size of the state space. |
704 | LegoNet: Efficient Convolutional Neural Networks with Lego Filters | Zhaohui Yang, Yunhe Wang, Chuanjian Liu, Hanting Chen, Chunjing Xu, Boxin Shi, Chao Xu, Chang Xu | This paper aims to build efficient convolutional neural networks using a set of Lego filters. |
705 | SWALP: Stochastic Weight Averaging in Low Precision Training | Guandao Yang, Tianyi Zhang, Polina Kirichenko, Junwen Bai, Andrew Gordon Wilson, Chris De Sa | This paper proposes SWALP, an approach to low precision training that averages low-precision SGD iterates with a modified learning rate schedule. |
706 | ME-Net: Towards Effective Adversarial Robustness with Matrix Estimation | Yuzhe Yang, Guo Zhang, Zhi Xu, Dina Katabi | This paper proposes ME-Net, a defense method that leverages matrix estimation (ME). |
707 | Efficient Nonconvex Regularized Tensor Completion with Structure-aware Proximal Iterations | Quanming Yao, James Tin-Yau Kwok, Bo Han | In this paper, we extend this to the more challenging problem of low-rank tensor completion. |
708 | Hierarchically Structured Meta-learning | Huaxiu Yao, Ying Wei, Junzhou Huang, Zhenhui Li | In this paper, based on gradient-based meta-learning, we propose a hierarchically structured meta-learning (HSML) algorithm that explicitly tailors the transferable knowledge to different clusters of tasks. |
709 | Tight Kernel Query Complexity of Kernel Ridge Regression and Kernel $k$-means Clustering | Taisuke Yasuda, David Woodruff, Manuel Fernandez | In this work, we present nearly tight lower bounds on the number of kernel evaluations required to approximately solve kernel ridge regression (KRR) and kernel $k$-means clustering (KKMC) on $n$ input points. |
710 | Understanding Geometry of Encoder-Decoder CNNs | Jong Chul Ye, Woon Kyoung Sung | Inspired by recent theoretical understanding on generalizability, expressivity and optimization landscape of neural networks, as well as the theory of convolutional framelets, here we provide a unified theoretical framework that leads to a better understanding of geometry of encoder-decoder CNNs. |
711 | Defending Against Saddle Point Attack in Byzantine-Robust Distributed Learning | Dong Yin, Yudong Chen, Ramchandran Kannan, Peter Bartlett | As a by-product, we give a simpler algorithm and analysis for escaping saddle points in the usual non-Byzantine setting. |
712 | Rademacher Complexity for Adversarially Robust Generalization | Dong Yin, Ramchandran Kannan, Peter Bartlett | In this paper, we focus on $\ell_\infty$ attacks, and study the adversarially robust generalization problem through the lens of Rademacher complexity. |
713 | ARSM: Augment-REINFORCE-Swap-Merge Estimator for Gradient Backpropagation Through Categorical Variables | Mingzhang Yin, Yuguang Yue, Mingyuan Zhou | To address the challenge of backpropagating the gradient through categorical variables, we propose the augment-REINFORCE-swap-merge (ARSM) gradient estimator that is unbiased and has low variance. |
714 | NAS-Bench-101: Towards Reproducible Neural Architecture Search | Chris Ying, Aaron Klein, Eric Christiansen, Esteban Real, Kevin Murphy, Frank Hutter | We aim to ameliorate these problems by introducing NAS-Bench-101, the first public architecture dataset for NAS research. |
715 | TapNet: Neural Network Augmented with Task-Adaptive Projection for Few-Shot Learning | Sung Whan Yoon, Jun Seo, Jaekyun Moon | We propose TapNets, neural networks augmented with task-adaptive projection for improved few-shot learning. |
716 | Towards Accurate Model Selection in Deep Unsupervised Domain Adaptation | Kaichao You, Ximei Wang, Mingsheng Long, Michael Jordan | To this end, we propose Deep Embedded Validation (DEV), which embeds adapted feature representation into the validation procedure to obtain unbiased estimation of the target risk with bounded variance. |
717 | Position-aware Graph Neural Networks | Jiaxuan You, Rex Ying, Jure Leskovec | Here we propose Position-aware Graph Neural Networks (P-GNNs), a new class of GNNs for computing position-aware node embeddings. |
718 | Learning Neurosymbolic Generative Models via Program Synthesis | Halley Young, Osbert Bastani, Mayur Naik | We propose to address this problem by incorporating programs representing global structure into generative models (e.g., a 2D for-loop may represent a repeating pattern of windows), along with a framework for learning these models by leveraging program synthesis to obtain training data. |
719 | DAG-GNN: DAG Structure Learning with Graph Neural Networks | Yue Yu, Jie Chen, Tian Gao, Mo Yu | Motivated by the widespread success of deep learning that is capable of capturing complex nonlinear mappings, in this work we propose a deep generative model and apply a variant of the structural constraint to learn the DAG. |
720 | How does Disagreement Help Generalization against Label Corruption? | Xingrui Yu, Bo Han, Jiangchao Yao, Gang Niu, Ivor Tsang, Masashi Sugiyama | To tackle this issue, we propose a robust learning paradigm called Co-teaching+, which bridges the “Update by Disagreement” strategy with the original Co-teaching. |
721 | On the Computation and Communication Complexity of Parallel SGD with Dynamic Batch Sizes for Stochastic Non-Convex Optimization | Hao Yu, Rong Jin | For general stochastic non-convex optimization, we propose a Catalyst-like algorithm to achieve the fastest known $O(1/\sqrt{NT})$ convergence with only $O(\sqrt{NT}\log(\frac{T}{N}))$ communication rounds. |
722 | On the Linear Speedup Analysis of Communication Efficient Momentum SGD for Distributed Non-Convex Optimization | Hao Yu, Rong Jin, Sen Yang | This paper fills the gap by considering a distributed communication efficient momentum SGD method and proving its linear speedup property. |
723 | Multi-Agent Adversarial Inverse Reinforcement Learning | Lantao Yu, Jiaming Song, Stefano Ermon | In this paper, we propose MA-AIRL, a new framework for multi-agent inverse reinforcement learning, which is effective and scalable for Markov games with high-dimensional state-action space and unknown dynamics. |
724 | Distributed Learning over Unreliable Networks | Chen Yu, Hanlin Tang, Cedric Renggli, Simon Kassing, Ankit Singla, Dan Alistarh, Ce Zhang, Ji Liu | In this paper, we connect these two trends, and consider the following question: Can we design machine learning systems that are tolerant to network unreliability during training? |
725 | Online Adaptive Principal Component Analysis and Its extensions | Jianjun Yuan, Andrew Lamperski | We propose algorithms for online principal component analysis (PCA) and variance minimization for adaptive settings. |
726 | Generative Modeling of Infinite Occluded Objects for Compositional Scene Representation | Jinyang Yuan, Bin Li, Xiangyang Xue | We present a deep generative model which explicitly models object occlusions for compositional scene representation. |
727 | Differential Inclusions for Modeling Nonsmooth ADMM Variants: A Continuous Limit Theory | Huizhuo Yuan, Yuren Zhou, Chris Junchi Li, Qingyun Sun | In this paper, we analyze some well-known and widely used ADMM variants for nonsmooth optimization problems using tools of differential inclusions. |
728 | Trimming the $\ell_1$ Regularizer: Statistical Analysis, Optimization, and Applications to Deep Learning | Jihun Yun, Peng Zheng, Eunho Yang, Aurelie Lozano, Aleksandr Aravkin | We present the first statistical analyses for M-estimation, and characterize support recovery, $\ell_\infty$ and $\ell_2$ error of the trimmed $\ell_1$ estimates as a function of the trimming parameter $h$. |
729 | Bayesian Nonparametric Federated Learning of Neural Networks | Mikhail Yurochkin, Mayank Agarwal, Soumya Ghosh, Kristjan Greenewald, Nghia Hoang, Yasaman Khazaeni | We develop a Bayesian nonparametric framework for federated learning with neural networks. |
730 | Dirichlet Simplex Nest and Geometric Inference | Mikhail Yurochkin, Aritra Guha, Yuekai Sun, Xuanlong Nguyen | We propose Dirichlet Simplex Nest, a class of probabilistic models suitable for a variety of data types, and develop fast and provably accurate inference algorithms by accounting for the model’s convex geometry and low dimensional simplicial structure. |
731 | A Conditional-Gradient-Based Augmented Lagrangian Framework | Alp Yurtsever, Olivier Fercoq, Volkan Cevher | To this end, we propose a new conditional gradient method, based on a unified treatment of smoothing and augmented Lagrangian frameworks. |
732 | Conditional Gradient Methods via Stochastic Path-Integrated Differential Estimator | Alp Yurtsever, Suvrit Sra, Volkan Cevher | We propose a class of variance-reduced stochastic conditional gradient methods. |
733 | Context-Aware Zero-Shot Learning for Object Recognition | Eloi Zablocki, Patrick Bordes, Laure Soulier, Benjamin Piwowarski, Patrick Gallinari | Following the intuitive principle that objects tend to be found in certain contexts but not others, we propose a new and challenging approach, context-aware ZSL, that leverages semantic representations in a new way to model the conditional likelihood of an object to appear in a given context. |
734 | Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds | Andrea Zanette, Emma Brunskill | As a step towards this we derive an algorithm and analysis for finite horizon discrete MDPs with state-of-the-art worst-case regret bounds and substantially tighter bounds if the RL environment has special features, but without a priori knowledge of the environment from the algorithm. |
735 | Global Convergence of Block Coordinate Descent in Deep Learning | Jinshan Zeng, Tim Tsz-Kit Lau, Shaobo Lin, Yuan Yao | In this paper, we aim at providing a general methodology for provable convergence guarantees for this type of methods. |
736 | Making Convolutional Networks Shift-Invariant Again | Richard Zhang | We show that when integrated correctly, classical anti-aliasing (low-pass filtering before downsampling) is compatible with existing architectural components, such as max-pooling. |
737 | Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback | Chicheng Zhang, Alekh Agarwal, Hal Daumé III, John Langford, Sahand Negahban | We investigate the feasibility of learning from both fully-labeled supervised data and contextual bandit data. |
738 | When Samples Are Strategically Selected | Hanrui Zhang, Yu Cheng, Vincent Conitzer | In this paper, we introduce a theoretical framework for this problem and provide key structural and computational results. |
739 | Self-Attention Generative Adversarial Networks | Han Zhang, Ian Goodfellow, Dimitris Metaxas, Augustus Odena | In this paper, we propose the Self-Attention Generative Adversarial Network (SAGAN) which allows attention-driven, long-range dependency modeling for image generation tasks. |
740 | Circuit-GNN: Graph Neural Networks for Distributed Circuit Design | Guo Zhang, Hao He, Dina Katabi | We present Circuit-GNN, a graph neural network (GNN) model for designing distributed circuits. |
741 | LatentGNN: Learning Efficient Non-local Relations for Visual Recognition | Songyang Zhang, Xuming He, Shipeng Yan | In this work, we propose an efficient and yet flexible non-local relation representation based on a novel class of graph neural networks. |
742 | Neural Collaborative Subspace Clustering | Tong Zhang, Pan Ji, Mehrtash Harandi, Wenbing Huang, Hongdong Li | We introduce the Neural Collaborative Subspace Clustering, a neural model that discovers clusters of data points drawn from a union of low-dimensional subspaces. |
743 | Incremental Randomized Sketching for Online Kernel Learning | Xiao Zhang, Shizhong Liao | To address these issues, we propose a novel incremental randomized sketching approach for online kernel learning, which has efficient incremental maintenances with theoretical guarantees. |
744 | Bridging Theory and Algorithm for Domain Adaptation | Yuchen Zhang, Tianle Liu, Mingsheng Long, Michael Jordan | We introduce Margin Disparity Discrepancy, a novel measurement with rigorous generalization bounds, tailored to the distribution comparison with the asymmetric margin loss, and to the minimax optimization for easier training. |
745 | Adaptive Regret of Convex and Smooth Functions | Lijun Zhang, Tie-Yan Liu, Zhi-Hua Zhou | To this end, we develop novel adaptive algorithms for convex and smooth functions, and establish problem-dependent regret bounds over any interval. |
746 | Random Function Priors for Correlation Modeling | Aonan Zhang, John Paisley | In this paper, we introduce random function priors for $Z_n$ for modeling correlations among its $K$ dimensions $Z_{n1}$ through $Z_{nK}$, which we call population random measure embedding (PRME). |
747 | Co-Representation Network for Generalized Zero-Shot Learning | Fei Zhang, Guangming Shi | Hence we propose an embedding model called the co-representation network to learn a more uniform visual embedding space that effectively alleviates the bias problem and helps with classification. |
748 | SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning | Marvin Zhang, Sharad Vikram, Laura Smith, Pieter Abbeel, Matthew Johnson, Sergey Levine | In this paper, we present a method for learning representations that are suitable for iterative model-based policy improvement, even when the underlying dynamical system has complex dynamics and image observations, in that these representations are optimized for inferring simple dynamics and cost models given data from the current policy. |
749 | A Composite Randomized Incremental Gradient Method | Junyu Zhang, Lin Xiao | We propose a composite randomized incremental gradient method by extending the SAGA framework. |
750 | Fast and Stable Maximum Likelihood Estimation for Incomplete Multinomial Models | Chenyang Zhang, Guosheng Yin | We propose a fixed-point iteration approach to the maximum likelihood estimation for the incomplete multinomial model, which provides a unified framework for ranking data analysis. |
751 | Theoretically Principled Trade-off between Robustness and Accuracy | Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing, Laurent El Ghaoui, Michael Jordan | In this work, we decompose the prediction error for adversarial examples (robust error) as the sum of the natural (classification) error and boundary error, and provide a differentiable upper bound using the theory of classification-calibrated loss, which is shown to be the tightest possible upper bound uniform over all probability distributions and measurable predictors. |
752 | Learning Novel Policies For Tasks | Yunbo Zhang, Wenhao Yu, Greg Turk | In this work, we present a reinforcement learning algorithm that can find a variety of policies (novel policies) for a task that is given by a task reward function. |
753 | Greedy Orthogonal Pivoting Algorithm for Non-Negative Matrix Factorization | Kai Zhang, Sheng Zhang, Jun Liu, Jun Wang, Jie Zhang | To address this challenge, we propose an innovative procedure called Greedy Orthogonal Pivoting Algorithm (GOPA). |
754 | Interpreting Adversarially Trained Convolutional Neural Networks | Tianyuan Zhang, Zhanxing Zhu | We design systematic approaches to interpret AT-CNNs in both qualitative and quantitative ways and compare them with normally trained models. To achieve quantitative verification, we construct additional test datasets that destroy either textures or shapes, such as style-transferred versions of clean data, saturated images, and patch-shuffled ones, and then evaluate the classification accuracy of AT-CNNs and normal CNNs on these datasets. |
755 | Adaptive Monte Carlo Multiple Testing via Multi-Armed Bandits | Martin Zhang, James Zou, David Tse | In this paper, we propose Adaptive MC multiple Testing (AMT) to estimate MC p-values and control the false discovery rate in multiple testing. |
756 | On Learning Invariant Representations for Domain Adaptation | Han Zhao, Remi Tachet Des Combes, Kun Zhang, Geoffrey Gordon | To give a sufficient condition for domain adaptation, we propose a natural and interpretable generalization upper bound that explicitly takes into account the aforementioned shift. |
757 | Metric-Optimized Example Weights | Sen Zhao, Mahdi Milani Fard, Harikrishna Narasimhan, Maya Gupta | Motivated by known connections between complex test metrics and cost-weighted learning, we propose addressing these issues by using a weighted loss function with a standard loss, where the weights on the training examples are learned to optimize the test metric on a validation set. |
758 | Improving Neural Network Quantization without Retraining using Outlier Channel Splitting | Ritchie Zhao, Yuwei Hu, Jordan Dotzel, Chris De Sa, Zhiru Zhang | In this work, we propose outlier channel splitting (OCS), which duplicates channels containing outliers, then halves the channel values; a minimal sketch of this operation appears after the table. |
759 | Maximum Entropy-Regularized Multi-Goal Reinforcement Learning | Rui Zhao, Xudong Sun, Volker Tresp | On a set of multi-goal robotic tasks from OpenAI Gym, we compare our method with other baselines and show promising improvements in both performance and sample efficiency. |
760 | Stochastic Iterative Hard Thresholding for Graph-structured Sparsity Optimization | Baojian Zhou, Feng Chen, Yiming Ying | In this paper, we propose a stochastic gradient-based method for solving graph-structured sparsity constraint problems, not restricted to the least square loss. |
761 | Lower Bounds for Smooth Nonconvex Finite-Sum Optimization | Dongruo Zhou, Quanquan Gu | In this paper, we study the lower bounds for smooth nonconvex finite-sum optimization, where the objective function is the average of $n$ nonconvex component functions. |
762 | Lipschitz Generative Adversarial Nets | Zhiming Zhou, Jiadong Liang, Yuxuan Song, Lantao Yu, Hongwei Wang, Weinan Zhang, Yong Yu, Zhihua Zhang | In this paper we show that generative adversarial networks (GANs) without restriction on the discriminative function space commonly suffer from the problem that the gradient produced by the discriminator is uninformative to guide the generator. |
763 | Toward Understanding the Importance of Noise in Training Neural Networks | Mo Zhou, Tianyi Liu, Yan Li, Dachao Lin, Enlu Zhou, Tuo Zhao | Our analysis implies that noise enables the algorithm to efficiently escape from spurious local optima. |
764 | BayesNAS: A Bayesian Approach for Neural Architecture Search | Hongpeng Zhou, Minghao Yang, Jun Wang, Wei Pan | In this paper, we employ the classic Bayesian learning approach to alleviate these two issues by modeling architecture parameters using hierarchical automatic relevance determination (HARD) priors. |
765 | Transferable Clean-Label Poisoning Attacks on Deep Neural Nets | Chen Zhu, W. Ronny Huang, Hengduo Li, Gavin Taylor, Christoph Studer, Tom Goldstein | In this paper, we explore clean-label poisoning attacks on deep convolutional networks with access to neither the network’s output nor its architecture or parameters. |
766 | Improved Dynamic Graph Learning through Fault-Tolerant Sparsification | Chunjiang Zhu, Sabine Storandt, Kam-Yiu Lam, Song Han, Jinbo Bi | We propose a new type of graph sparsification, namely fault-tolerant (FT) sparsification, to significantly reduce the cost to only a constant. |
767 | Poisson Subsampled Rényi Differential Privacy | Yuqing Zhu, Yu-Xiang Wang | We consider the problem of privacy amplification by subsampling under the Rényi Differential Privacy framework. |
768 | Learning Classifiers for Target Domain with Limited or No Labels | Pengkai Zhu, Hanxiao Wang, Venkatesh Saligrama | We propose a novel visual attribute encoding method that encodes each image as a low-dimensional probability vector composed of prototypical part-type probabilities. |
769 | The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects | Zhanxing Zhu, Jingfeng Wu, Bing Yu, Lei Wu, Jinwen Ma | Along this line, we study a general form of gradient based optimization dynamics with unbiased noise, which unifies SGD and standard Langevin dynamics. |
770 | Surrogate Losses for Online Learning of Stepsizes in Stochastic Non-Convex Optimization | Zhenxun Zhuang, Ashok Cutkosky, Francesco Orabona | In this paper, we propose new surrogate losses to cast the problem of learning the optimal stepsizes for the stochastic optimization of a non-convex smooth objective function onto an online convex optimization problem. |
771 | Latent Normalizing Flows for Discrete Sequences | Zachary Ziegler, Alexander Rush | We propose a VAE-based generative model which jointly learns a normalizing flow-based distribution in the latent space and a stochastic mapping to an observed discrete space. |
772 | Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously | Julian Zimmert, Haipeng Luo, Chen-Yu Wei | We develop the first general semi-bandit algorithm that simultaneously achieves $\mathcal{O}(\log T)$ regret for stochastic environments and $\mathcal{O}(\sqrt{T})$ regret for adversarial environments without knowledge of the regime or the number of rounds $T$. |
773 | Fast Context Adaptation via Meta-Learning | Luisa Zintgraf, Kyriacos Shiarlis, Vitaly Kurin, Katja Hofmann, Shimon Whiteson | We propose CAVIA for meta-learning, a simple extension to MAML that is less prone to meta-overfitting, easier to parallelise, and more interpretable. |
774 | Natural Analysts in Adaptive Data Analysis | Tijana Zrnic, Moritz Hardt | In this work, we propose notions of natural analysts that smoothly interpolate between the optimal non-adaptive bounds and the best-known adaptive generalization bounds. |
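
Paper 758's highlight describes a concrete, mechanical operation, so a short sketch may help readers picture it. The following is a minimal illustration of the stated idea only (duplicate the channels containing outliers, then halve the values), assuming a plain 2-D weight matrix and a largest-magnitude rule for choosing outliers; it is not the authors' implementation.

```python
# Minimal sketch of the outlier channel splitting (OCS) idea: split the
# input channels carrying the largest weight magnitudes so that the layer
# computes the same function while its weight range (and hence the
# quantization grid it needs) shrinks. Layout and outlier rule are assumptions.
import numpy as np

def outlier_channel_split(W, x, num_splits):
    """W: (in_features, out_features) weights; x: (batch, in_features) inputs."""
    W, x = W.copy(), x.copy()
    for _ in range(num_splits):
        c = int(np.abs(W).max(axis=1).argmax())  # channel with the largest weight
        W[c] /= 2.0                              # halve the outlier channel...
        W = np.vstack([W, W[c:c+1]])             # ...append an identical copy,
        x = np.hstack([x, x[:, c:c+1]])          # ...and duplicate its activation
    return W, x

rng = np.random.default_rng(0)
W, x = rng.normal(size=(8, 4)), rng.normal(size=(2, 8))
W2, x2 = outlier_channel_split(W, x, num_splits=2)
assert np.allclose(x @ W, x2 @ W2)  # the layer's output is unchanged
```

Because x_c * W_c = x_c * (W_c / 2) + x_c * (W_c / 2), each split leaves the layer's output exactly unchanged while halving its largest weight, which is what allows quantization to improve without any retraining.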