Paper Digest: ICLR 2020 Highlights
The International Conference on Learning Representations (ICLR) is one of the top machine learning conferences in the world. ICLR 2020 will be held in Addis Ababa, Ethiopia. The conference received 2,594 paper submissions, of which 48 were accepted as 10-minute oral presentations, 107 as 4-minute spotlight presentations, and 532 as poster presentations. Around 200 papers also published their code.
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights to quickly get the main idea of each paper.
If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated with new conference digests.
Paper Digest Team
team@paperdigest.org
TABLE 1: ICLR 2020 Oral Papers
# | Title | Authors | Highlight | Code |
---|---|---|---|---|
1 | CATER: A diagnostic dataset for Compositional Actions & TEmporal Reasoning | Rohit Girdhar, Deva Ramanan | We propose a new video understanding benchmark, with tasks that by-design require temporal reasoning to be solved, unlike most existing video datasets. | |
2 | BackPACK: Packing more into Backprop | Felix Dangel, Frederik Kunstner, Philipp Hennig | To address this problem, we introduce BackPACK, an efficient framework built on top of PyTorch, that extends the backpropagation algorithm to extract additional information from first- and second-order derivatives. | code |
3 | GenDICE: Generalized Offline Estimation of Stationary Values | Ruiyi Zhang*, Bo Dai*, Lihong Li, Dale Schuurmans | In this paper, we proposed a novel algorithm, GenDICE, for general stationary distribution correction estimation, which can handle both discounted and average off-policy evaluation on multiple behavior-agnostic samples. | |
4 | Principled Weight Initialization for Hypernetworks | Oscar Chang, Lampros Flokas, Hod Lipson | The first principled weight initialization method for hypernetworks | |
5 | On the Convergence of FedAvg on Non-IID Data | Xiang Li, Kaixuan Huang, Wenhao Yang, Shusen Wang, Zhihua Zhang | In this paper, we analyze the convergence of \texttt{FedAvg} on non-iid data and establish a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the number of SGDs. | |
6 | Data-dependent Gaussian Prior Objective for Language Generation | Zuchao Li, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Zhuosheng Zhang, Hai Zhao | We introduce an extra data-dependent Gaussian prior objective to augment the current MLE training, which is designed to capture the prior knowledge in the ground-truth data. | code |
7 | Contrastive Learning of Structured World Models | Thomas Kipf, Elise van der Pol, Max Welling | Contrastively-trained Structured World Models (C-SWMs) learn object-oriented state representations and a relational model of an environment from raw pixel input. | code |
8 | Neural Network Branching for Neural Network Verification | Jingyue Lu, M. Pawan Kumar | We propose a novel learning to branch framework using graph neural networks to improve branch and bound based neural network verification methods. | |
9 | Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity | Jingzhao Zhang, Tianxing He, Suvrit Sra, Ali Jadbabaie | Gradient clipping provably accelerates gradient descent for non-smooth non-convex functions (a minimal sketch of the clipped update follows this table). | |
10 | Posterior sampling for multi-agent reinforcement learning: solving extensive games with imperfect information | Yichi Zhou, Jialian Li, Jun Zhu | In this work, we extend PSRL to two-player zero-sum extensive games with imperfect information (TZIEG), which is a class of multi-agent systems. | |
11 | Mogrifier LSTM | Gábor Melis, Tomáš Kočiský, Phil Blunsom | An LSTM extension with state-of-the-art language modelling results. | |
12 | Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech | David Harwath, Wei-Ning Hsu, James Glass | Vector quantization layers incorporated into a self-supervised neural model of speech audio learn hierarchical and discrete linguistic units (phone-like, word-like) when trained with a visual-grounding objective. | |
13 | Mirror-Generative Neural Machine Translation | Zaixiang Zheng, Hao Zhou, Shujian Huang, Lei Li, Xin-Yu Dai, Jiajun Chen | In this paper, we propose the mirror-generative NMT (MGNMT), a single unified architecture that simultaneously integrates the source to target translation model, the target to source translation model, and two language models. | |
14 | Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning | Ruqi Zhang, Chunyuan Li, Jianyi Zhang, Changyou Chen, Andrew Gordon Wilson | In particular, we propose a cyclical stepsize schedule, where larger steps discover new modes, and smaller steps characterize each mode. | |
15 | Your classifier is secretly an energy based model and you should treat it like one | Will Grathwohl, Kuan-Chieh Wang, Joern-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, Kevin Swersky | We show that there is a hidden generative model inside of every classifier. We demonstrate how to train this model and show the many benefits of doing so. | |
16 | Dynamics-Aware Unsupervised Skill Discovery | Archit Sharma, Shixiang Gu, Sergey Levine, Vikash Kumar, Karol Hausman | We propose an unsupervised skill discovery method which enables model-based planning for hierarchical reinforcement learning. | |
17 | Optimal Strategies Against Generative Attacks | Roy Mor, Erez Peterfreund, Matan Gavish, Amir Globerson | We cast the problem as a maximin game, characterize the optimal strategy for both attacker and authenticator in the general case, and provide the optimal strategies in closed form for the case of Gaussian source distributions. | code |
18 | GraphZoom: A Multi-level Spectral Approach for Accurate and Scalable Graph Embedding | Chenhui Deng, Zhiqiang Zhao, Yongyu Wang, Zhiru Zhang, Zhuo Feng | A multi-level spectral approach to improving the quality and scalability of unsupervised graph embedding. | code |
19 | Harnessing Structures for Value-Based Planning and Reinforcement Learning | Yuzhe Yang, Guo Zhang, Zhi Xu, Dina Katabi | We propose a generic framework that allows for exploiting the low-rank structure in both planning and deep reinforcement learning. | |
20 | Comparing Rewinding and Fine-tuning in Neural Network Pruning | Alex Renda, Jonathan Frankle, Michael Carbin | Instead of fine-tuning after pruning, rewind weights to their values earlier in training and re-train the networks to achieve higher accuracy when pruning neural networks. | code |
21 | Meta-Q-Learning | Rasool Fakoor, Pratik Chaudhari, Stefano Soatto, Alexander J. Smola | MQL is a simple off-policy meta-RL algorithm that recycles data from the meta-training replay buffer to adapt to new tasks. | |
22 | Mathematical Reasoning in Latent Space | Dennis Lee, Christian Szegedy, Markus Rabe, Sarah Loos, Kshitij Bansal | Learning to reason about higher order logic formulas in the latent space. | |
23 | A Theory of Usable Information under Computational Constraints | Yilun Xu, Shengjia Zhao, Jiaming Song, Russell Stewart, Stefano Ermon | We propose a new framework for reasoning about information in complex systems. | |
24 | Geometric Analysis of Nonconvex Optimization Landscapes for Overcomplete Learning | Qing Qu, Yuexiang Zhai, Xiao Li, Yuqian Zhang, Zhihui Zhu | In this work, we provide new theoretical insights for several important representation learning problems: learning \emph{(i)} sparsely used overcomplete dictionaries and \emph{(ii)} convolutional dictionaries. | |
25 | Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds | Jordan T. Ash, Chicheng Zhang, Akshay Krishnamurthy, John Langford, Alekh Agarwal | We introduce a new batch active learning algorithm that’s robust to model architecture, batch size, and dataset. | |
26 | Understanding and Robustifying Differentiable Architecture Search | Arber Zela, Thomas Elsken, Tonmoy Saikia, Yassine Marrakchi, Thomas Brox, Frank Hutter | We study the failure modes of DARTS (Differentiable Architecture Search) by looking at the eigenvalues of the Hessian of validation loss w.r.t. the architecture and propose robustifications based on our analysis. | code |
27 | A Closer Look at Deep Policy Gradients | Andrew Ilyas, Logan Engstrom, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, Aleksander Madry | To this end, we propose a fine-grained analysis of state-of-the-art methods based on key elements of this framework: gradient estimation, value prediction, and optimization landscapes. | |
28 | Implementation Matters in Deep RL: A Case Study on PPO and TRPO | Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, Aleksander Madry | We study the roots of algorithmic progress in deep policy gradient algorithms through a case study on two popular algorithms, Proximal Policy Optimization and Trust Region Policy Optimization. | code |
29 | Fast Task Inference with Variational Intrinsic Successor Features | Steven Hansen, Will Dabney, Andre Barreto, David Warde-Farley, Tom Van de Wiele, Volodymyr Mnih | We introduce Variational Intrinsic Successor FeatuRes (VISR), a novel algorithm which learns controllable features that can be leveraged to provide fast task inference through the successor features framework. | |
30 | Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distribution Tasks | Donghyun Na, Hae Beom Lee, Hayeon Lee, Saehoon Kim, Minseop Park, Eunho Yang, Sung Ju Hwang | A novel meta-learning model that adaptively balances the effect of the meta-learning and task-specific learning, and also class-specific learning within each task. | code |
31 | RNA Secondary Structure Prediction By Learning Unrolled Algorithms | Xinshi Chen, Yu Li, Ramzan Umarov, Xin Gao, Le Song | A DL model for RNA secondary structure prediction, which uses an unrolled algorithm in the architecture to enforce constraints. | |
32 | Watch the Unobserved: A Simple Approach to Parallelizing Monte Carlo Tree Search | Anji Liu, Jianshu Chen, Mingze Yu, Yu Zhai, Xuewen Zhou, Ji Liu | We developed an effective parallel UCT algorithm that achieves linear speedup and suffers negligible performance loss. | |
33 | Target-Embedding Autoencoders for Supervised Representation Learning | Daniel Jarrett, Mihaela van der Schaar | This paper analyzes a framework for improving generalization in a purely supervised setting, where the target space is high-dimensional. | |
34 | Reformer: The Efficient Transformer | Nikita Kitaev, Lukasz Kaiser, Anselm Levskaya | Efficient Transformer with locality-sensitive hashing and reversible layers | code |
35 | Rotation-invariant clustering of functional cell types in primary visual cortex | Ivan Ustyuzhaninov, Santiago A. Cadena, Emmanouil Froudarakis, Paul G. Fahey, Edgar Y. Walker, Erick Cobos, Jacob Reimer, Fabian H. Sinz, Andreas S. Tolias, Matthias Bethge, Alexander S. Ecker | We classify mouse V1 neurons into putative functional cell types based on their representations in a CNN predicting neural responses | |
36 | Causal Discovery with Reinforcement Learning | Shengyu Zhu, Ignavier Ng, Zhitang Chen | We apply reinforcement learning to score-based causal discovery and achieve promising results on both synthetic and real datasets | |
37 | Intrinsically Motivated Discovery of Diverse Patterns in Self-Organizing Systems | Chris Reinke, Mayalen Etcheverry, Pierre-Yves Oudeyer | We study how an unsupervised exploration and feature learning approach addresses efficiently a new problem: automatic discovery of diverse self-organized patterns in high-dim complex systems such as the game of life. | code |
38 | Restricting the Flow: Information Bottlenecks for Attribution | Karl Schulz, Leon Sixt, Federico Tombari, Tim Landgraf | We apply the informational bottleneck concept to attribution. | code |
39 | Building Deep Equivariant Capsule Networks | Sairaam Venkatraman, S. Balasubramanian, R. Raghunatha Sarma | A new scalable, group-equivariant model for capsule networks that preserves compositionality under transformations, and is empirically more transformation-robust than older capsule network models. | code |
40 | A Generalized Training Approach for Multiagent Learning | Paul Muller, Shayegan Omidshafiei, Mark Rowland, Karl Tuyls, Julien Perolat, Siqi Liu, Daniel Hennes, Luke Marris, Marc Lanctot, Edward Hughes, Zhe Wang, Guy Lever, Nicolas Heess, Thore Graepel, Remi Munos | This paper investigates a population-based training regime based on game-theoretic principles called Policy-Space Response Oracles (PSRO). | |
41 | High Fidelity Speech Synthesis with Adversarial Networks | Mikolaj Binkowski, Jeff Donahue, Sander Dieleman, Aidan Clark, Erich Elsen, Norman Casagrande, Luis C. Cobo, Karen Simonyan | We introduce GAN-TTS, a Generative Adversarial Network for Text-to-Speech, which achieves Mean Opinion Score (MOS) 4.2. | |
42 | SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference | Lasse Espeholt, Raphaël Marinier, Piotr Stanczyk, Ke Wang, Marcin Michalski | SEED RL is a scalable and efficient deep reinforcement learning agent with accelerated central inference. It achieves state-of-the-art results, reduces cost, and can process millions of frames per second. | code |
43 | Meta-Learning with Warped Gradient Descent | Sebastian Flennerhag, Andrei A. Rusu, Razvan Pascanu, Francesco Visin, Hujun Yin, Raia Hadsell | We propose a novel framework for meta-learning a gradient-based update rule that scales to beyond few-shot learning and is applicable to any form of learning, including continual learning. | |
44 | Convolutional Conditional Neural Processes | Jonathan Gordon, Wessel P. Bruinsma, Andrew Y. K. Foong, James Requeima, Yann Dubois, Richard E. Turner | We extend deep sets to functional embeddings and Neural Processes to include translation equivariant members | |
45 | Gradient Descent Maximizes the Margin of Homogeneous Neural Networks | Kaifeng Lyu, Jian Li | We study the implicit bias of gradient descent and prove under a minimal set of assumptions that the parameter direction of homogeneous models converges to KKT points of a natural margin maximization problem. | |
46 | Adversarial Training and Provable Defenses: Bridging the Gap | Mislav Balunovic, Martin Vechev | We propose a novel combination of adversarial training and provable defenses which produces a model with state-of-the-art accuracy and certified robustness on CIFAR-10. | |
47 | Differentiable Reasoning over a Virtual Knowledge Base | Bhuwan Dhingra, Manzil Zaheer, Vidhisha Balachandran, Graham Neubig, Ruslan Salakhutdinov, William W. Cohen | Differentiable multi-hop access to a textual knowledge base of indexed contextual representations | |
48 | Federated Learning with Matched Averaging | Hongyi Wang, Mikhail Yurochkin, Yuekai Sun, Dimitris Papailiopoulos, Yasaman Khazaeni | Communication efficient federated learning with layer-wise matching |
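Row 9 above (Zhang et al.) credits gradient clipping with provably faster training on non-smooth non-convex objectives. As a quick illustration, here is a minimal sketch of a clipped gradient-descent step in PyTorch; the toy model, data, and the threshold `max_norm=1.0` are placeholders of ours, not values from the paper.

```python
import torch
from torch import nn

# Toy model and data; only the clipping step reflects the paper's subject.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
x, y = torch.randn(32, 10), torch.randn(32, 1)

for _ in range(100):
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    # Rescale the gradient so its global L2 norm is at most max_norm.
    # This adaptive step size is what the analysis credits for faster
    # convergence on non-smooth non-convex objectives.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
```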
TABLE 2: ICLR 2020 Spotlights
# | Title | Authors | Highlight | Code |
---|---|---|---|---|
1 | Program Guided Agent | Shao-Hua Sun, Te-Lin Wu, Joseph J. Lim | We propose a modular framework that can accomplish tasks specified by programs and achieve zero-shot generalization to more complex tasks. | |
2 | Sparse Coding with Gated Learned ISTA | Kailun Wu, Yiwen Guo, Ziang Li, Changshui Zhang | We propose gated mechanisms to enhance learned ISTA for sparse coding, with theoretical guarantees on the superiority of the method. | |
3 | Graph Neural Networks Exponentially Lose Expressive Power for Node Classification | Kenta Oono, Taiji Suzuki | We relate the asymptotic behavior of graph neural networks to the graph spectra of underlying graphs and give principled guidelines for normalizing weights. | code |
4 | Multi-Scale Representation Learning for Spatial Feature Distributions using Grid Cells | Gengchen Mai, Krzysztof Janowicz, Bo Yan, Rui Zhu, Ling Cai, Ni Lao | We propose a representation learning model called Space2vec to encode the absolute positions and spatial relationships of places. | |
5 | InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization | Fan-Yun Sun, Jordan Hoffman, Vikas Verma, Jian Tang | Inspired by recent progress in unsupervised representation learning, we propose a novel method called InfoGraph for learning graph-level representations. | |
6 | On Robustness of Neural Ordinary Differential Equations | Hanshu Yan, Jiawei Du, Vincent Tan, Jiashi Feng | In this work, we fill this important gap by exploring robustness properties of neural ODEs both empirically and theoretically. | |
7 | Defending Against Physically Realizable Attacks on Image Classification | Tong Wu, Liang Tong, Yevgeniy Vorobeychik | Defending Against Physically Realizable Attacks on Image Classification | |
8 | Estimating Gradients for Discrete Random Variables by Sampling without Replacement | Wouter Kool, Herke van Hoof, Max Welling | We derive a low-variance, unbiased gradient estimator for expectations over discrete random variables based on sampling without replacement | code |
9 | Learning to Control PDEs with Differentiable Physics | Philipp Holl, Nils Thuerey, Vladlen Koltun | We train a combination of neural networks to predict optimal trajectories for complex physical systems. | |
10 | Intensity-Free Learning of Temporal Point Processes | Oleksandr Shchur, Marin Biloš, Stephan Günnemann | Learn in temporal point processes by modeling the conditional density, not the conditional intensity. | code |
11 | A Signal Propagation Perspective for Pruning Neural Networks at Initialization | Namhoon Lee, Thalaiyasingam Ajanthan, Stephen Gould, Philip H. S. Torr | We formally characterize the initialization conditions for effective pruning at initialization and analyze the signal propagation properties of the resulting pruned networks which leads to a method to enhance their trainability and pruning results. | |
12 | Rethinking the Security of Skip Connections in ResNet-like Neural Networks | Dongxian Wu, Yisen Wang, Shu-Tao Xia, James Bailey, Xingjun Ma | We identify the security weakness of skip connections in ResNet-like neural networks | |
13 | WHITE NOISE ANALYSIS OF NEURAL NETWORKS | Ali Borji, Sikun Lin | A white noise analysis of modern deep neural networks is presented to characterize their biases at the whole-network level or the single-neuron level. | |
14 | Neural Machine Translation with Universal Visual Representation | Zhuosheng Zhang, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Zuchao Li, Hai Zhao | This work proposed a universal visual representation for neural machine translation (NMT) using retrieved images with similar topics to source sentence, extending image applicability in NMT. | |
15 | Tranquil Clouds: Neural Networks for Learning Temporally Coherent Features in Point Clouds | Lukas Prantl, Nuttapong Chentanez, Stefan Jeschke, Nils Thuerey | We propose a generative neural network approach for temporally coherent point clouds. | |
16 | PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search | Yuhui Xu, Lingxi Xie, Xiaopeng Zhang, Xin Chen, Guo-Jun Qi, Qi Tian, Hongkai Xiong | Allowing partial channel connection in super-networks to regularize and accelerate differentiable architecture search | code |
17 | Online and stochastic optimization beyond Lipschitz continuity: A Riemannian approach | Kimon Antonakopoulos, E. Veronica Belmega, Panayotis Mertikopoulos | We introduce a novel version of Lipschitz objective continuity that allows stochastic mirror descent methodologies to achieve optimal convergence rates in problems with singularities. | |
18 | Enhancing Adversarial Defense by k-Winners-Take-All | Chang Xiao, Peilin Zhong, Changxi Zheng | We propose a simple change to existing neural network structures for better defending against gradient-based adversarial attacks, using the k-winners-take-all activation function (a minimal sketch follows this table). | code |
19 | Encoding word order in complex embeddings | Benyou Wang, Donghao Zhao, Christina Lioma, Qiuchi Li, Peng Zhang, Jakob Grue Simonsen | We present a novel and principled solution for modeling both the global absolute positions of words and their order relationships. | code |
20 | DDSP: Differentiable Digital Signal Processing | Jesse Engel, Lamtharn (Hanoi) Hantrakul, Chenjie Gu, Adam Roberts | Better audio synthesis by combining interpretable DSP with end-to-end learning. | |
21 | Cross-Domain Few-Shot Classification via Learned Feature-Wise Transformation | Hung-Yu Tseng, Hsin-Ying Lee, Jia-Bin Huang, Ming-Hsuan Yang | In this work, we address the problem of few-shot classification under domain shifts for metric-based methods. | |
22 | Ridge Regression: Structure, Cross-Validation, and Sketching | Sifan Liu, Edgar Dobriban | We study the structure of ridge regression in a high-dimensional asymptotic framework, and get insights about cross-validation and sketching. | code |
23 | Finite Depth and Width Corrections to the Neural Tangent Kernel | Boris Hanin, Mihai Nica | The neural tangent kernel of a randomly initialized ReLU net exhibits non-trivial fluctuations as long as the depth and width are comparable. | |
24 | Meta-Learning without Memorization | Mingzhang Yin, George Tucker, Mingyuan Zhou, Sergey Levine, Chelsea Finn | We identify and formalize the memorization problem in meta-learning and solve it with a novel meta-regularization method, which greatly expands the domains to which meta-learning can be applied effectively. | |
25 | Influence-Based Multi-Agent Exploration | Tonghan Wang*, Jianhao Wang*, Yi Wu, Chongjie Zhang | We present two exploration methods: exploration via information-theoretic influence (EITI) and exploration via decision-theoretic influence (EDTI), by exploiting the role of interaction in coordinated behaviors of agents. | code |
26 | HOPPITY: LEARNING GRAPH TRANSFORMATIONS TO DETECT AND FIX BUGS IN PROGRAMS | Elizabeth Dinella, Hanjun Dai, Ziyang Li, Mayur Naik, Le Song, Ke Wang | A learning-based approach for detecting and fixing bugs in JavaScript. | |
27 | Sliced Cramer Synaptic Consolidation for Preserving Deeply Learned Representations | Soheil Kolouri, Nicholas A. Ketz, Andrea Soltoggio, Praveen K. Pilly | A novel framework for overcoming catastrophic forgetting by preserving the distribution of the network's output at an arbitrary layer. | |
28 | How much Position Information Do Convolutional Neural Networks Encode? | Md Amirul Islam*, Sen Jia*, Neil D. B. Bruce | Our work shows positional information has been implicitly encoded in a network. This information is important for detecting position-dependent features, e.g. semantic and saliency. | |
29 | Hamiltonian Generative Networks | Aleksandar Botev, Irina Higgins, Andrew Jaegle, Sebastian Racaniere, Danilo J. Rezende, Peter Toth | We introduce a class of generative models that reliably learn Hamiltonian dynamics from high-dimensional observations. The learnt Hamiltonian can be applied to sequence modeling or as a normalising flow. | |
30 | COPHY: Counterfactual Learning of Physical Dynamics | Fabien Baradel, Natalia Neverova, Julien Mille, Greg Mori, Christian Wolf | We develop the COPHY benchmark to assess the capacity of the state-of-the-art models for causal physical reasoning in a synthetic 3D environment and propose a model for learning the physical dynamics in a counterfactual setting. | |
31 | Estimating counterfactual treatment outcomes over time through adversarially balanced representations | Ioana Bica, Ahmed M Alaa, James Jordon, Mihaela van der Schaar | In this paper, we introduce the Counterfactual Recurrent Network (CRN), a novel sequence-to-sequence model that leverages the increasingly available patient observational data to estimate treatment effects over time and answer such medical questions. | |
32 | Gradientless Descent: High-Dimensional Zeroth-Order Optimization | Daniel Golovin, John Karro, Greg Kochanski, Chansoo Lee, Xingyou Song, Qiuyi Zhang | Gradientless Descent is a provably efficient gradient-free algorithm that is monotone-invariant and fast for high-dimensional zero-th order optimization. | |
33 | Conditional Learning of Fair Representations | Han Zhao, Amanda Coston, Tameem Adel, Geoffrey J. Gordon | We propose a novel algorithm for learning fair representations that can simultaneously mitigate two notions of disparity among different demographic subgroups. | |
34 | Inductive Matrix Completion Based on Graph Neural Networks | Muhan Zhang, Yixin Chen | We propose an inductive matrix completion model without using side information. | |
35 | Duration-of-Stay Storage Assignment under Uncertainty | Michael Lingzhi Li, Elliott Wolf, Daniel Wintz | We develop a new storage assignment framework with a novel neural network that enables large efficiency gains in the warehouse. | code |
36 | Emergence of functional and structural properties of the head direction system by optimization of recurrent neural networks | Christopher J. Cueva, Peter Y. Wang, Matthew Chin, Xue-Xin Wei | Artificial neural networks trained with gradient descent are capable of recapitulating both realistic neural activity and the anatomical organization of a biological circuit. | |
37 | Deep neuroethology of a virtual rodent | Josh Merel, Diego Aldarondo, Jesse Marshall, Yuval Tassa, Greg Wayne, Bence Olveczky | We built a physical simulation of a rodent, trained it to solve a set of tasks, and analyzed the resulting networks. | |
38 | Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation | Ziyang Tang*, Yihao Feng*, Lihong Li, Dengyong Zhou, Qiang Liu | We develop a new doubly robust estimator based on the infinite horizon density ratio and off policy value estimation. | |
39 | Learning Compositional Koopman Operators for Model-Based Control | Yunzhu Li, Hao He, Jiajun Wu, Dina Katabi, Antonio Torralba | Learning compositional Koopman operators for efficient system identification and model-based control. | |
40 | CLEVRER: Collision Events for Video Representation and Reasoning | Kexin Yi, Chuang Gan, Yunzhu Li, Pushmeet Kohli, Jiajun Wu, Antonio Torralba, Joshua B. Tenenbaum | We present a diagnostic dataset for the systematic study of temporal and causal reasoning in videos. | |
41 | The Logical Expressiveness of Graph Neural Networks | Pablo Barceló, Egor V. Kostylev, Mikael Monet, Jorge Pérez, Juan Reutter, Juan Pablo Silva | We characterize the expressive power of GNNs in terms of classical logical languages, separating different GNNs and showing connections with standard notions in Knowledge Representation. | code |
42 | The Break-Even Point on the Optimization Trajectories of Deep Neural Networks | Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho*, Krzysztof Geras* | In the early phase of training of deep neural networks there exists a “break-even point” which determines properties of the entire optimization trajectory. | |
43 | ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut | A new pretraining method that establishes new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters compared to BERT-large. | |
44 | Disentangling neural mechanisms for perceptual grouping | Junkyung Kim, Drew Linsley, Kalpit Thakkar, Thomas Serre | Horizontal and top-down feedback connections are responsible for complementary perceptual grouping strategies in biological and recurrent vision systems. | code |
45 | Learning to Plan in High Dimensions via Neural Exploration-Exploitation Trees | Binghong Chen, Bo Dai, Qinjie Lin, Guo Ye, Han Liu, Le Song | We propose a meta path planning algorithm which exploits a novel attention-based neural module that can learn generalizable structures from prior experiences to drastically reduce the sample requirement for solving new path planning problems. | code |
46 | Symplectic Recurrent Neural Networks | Zhengdao Chen, Jianyu Zhang, Martin Arjovsky, Léon Bottou | We propose Symplectic Recurrent Neural Networks (SRNNs) as learning algorithms that capture the dynamics of physical systems from observed trajectories. | |
47 | Asymptotics of Wide Networks from Feynman Diagrams | Ethan Dyer, Guy Gur-Ari | A general method for computing the asymptotic behavior of wide networks using Feynman diagrams | |
48 | Learning The Difference That Makes A Difference With Counterfactually-Augmented Data | Divyansh Kaushik, Eduard Hovy, Zachary Lipton | Humans in the loop revise documents to accord with counterfactual labels, resulting resource helps to reduce reliance on spurious associations. | code |
49 | Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning? | Simon S. Du, Sham M. Kakade, Ruosong Wang, Lin F. Yang | Exponential lower bounds for value-based and policy-based reinforcement learning with function approximation. | |
50 | Simplified Action Decoder for Deep Multi-Agent Reinforcement Learning | Hengyuan Hu, Jakob N Foerster | We develop Simplified Action Decoder, a simple MARL algorithm that beats previous SOTA on Hanabi by a big margin across 2- to 5-player games. | code |
51 | Network Deconvolution | Chengxi Ye, Matthew Evanusa, Hua He, Anton Mitrokhin, Thomas Goldstein, James A. Yorke, Cornelia Fermuller, Yiannis Aloimonos | We propose a method called network deconvolution that resembles animal vision system to train convolution networks better. | code |
52 | Neural Symbolic Reader: Scalable Integration of Distributed and Symbolic Representations for Reading Comprehension | Xinyun Chen, Chen Liang, Adams Wei Yu, Denny Zhou, Dawn Song, Quoc Le | In this work, we propose the Neural Symbolic Reader (NeRd), which includes a reader, e.g., BERT, to encode the passage and question, and a programmer, e.g., LSTM, to generate a program that is executed to produce the answer. | |
53 | Real or Not Real, that is the Question | Yuanbo Xiangli*, Yubin Deng*, Bo Dai*, Chen Change Loy, Dahua Lin | While generative adversarial networks (GANs) have been widely adopted in various topics, in this paper we generalize the standard GAN to a new perspective by treating realness as a random variable that can be estimated from multiple angles. | |
54 | Dream to Control: Learning Behaviors by Latent Imagination | Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi | We present Dreamer, an agent that learns long-horizon behaviors purely by latent imagination using analytic value gradients. | |
55 | A Probabilistic Formulation of Unsupervised Text Style Transfer | Junxian He, Xinyi Wang, Graham Neubig, Taylor Berg-Kirkpatrick | We formulate a probabilistic latent sequence model to tackle unsupervised text style transfer, and show its effectiveness across a suite of unsupervised text style transfer tasks. | |
56 | Emergent Tool Use From Multi-Agent Autocurricula | Bowen Baker, Ingmar Kanitscheider, Todor Markov, Yi Wu, Glenn Powell, Bob McGrew, Igor Mordatch | Through multi-agent competition, the simple objective of hide-and-seek, and standard reinforcement learning algorithms at scale, we find that agents create a self-supervised autocurriculum inducing multiple distinct rounds of emergent strategy, many of which require sophisticated tool use and coordination. | |
57 | NAS-Bench-102: Extending the Scope of Reproducible Neural Architecture Search | Xuanyi Dong, Yi Yang | A NAS benchmark applicable to almost any NAS algorithm. | code |
58 | Strategies for Pre-training Graph Neural Networks | Weihua Hu*, Bowen Liu*, Joseph Gomes, Marinka Zitnik, Percy Liang, Vijay Pande, Jure Leskovec | We develop a strategy for pre-training Graph Neural Networks (GNNs) and systematically study its effectiveness on multiple datasets, GNN architectures, and diverse downstream tasks. | |
59 | Behaviour Suite for Reinforcement Learning | Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepezvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, Hado Van Hasselt | Bsuite is a collection of carefully-designed experiments that investigate the core capabilities of RL agents. | code |
60 | FreeLB: Enhanced Adversarial Training for Language Understanding | Chen Zhu, Yu Cheng, Zhe Gan, Siqi Sun, Thomas Goldstein | In this work, we propose a novel adversarial training algorithm – FreeLB, that promotes higher robustness and invariance in the embedding space, by adding adversarial perturbations to word embeddings and minimizing the resultant adversarial risk inside different regions around input samples. | |
61 | Kernelized Wasserstein Natural Gradient | M Arbel, A Gretton, W Li, G Montufar | Estimator for the Wasserstein natural gradient | |
62 | And the Bit Goes Down: Revisiting the Quantization of Neural Networks | Pierre Stock, Armand Joulin, Rémi Gribonval, Benjamin Graham, Hervé Jégou | Using a structured quantization technique aiming at better in-domain reconstruction to compress convolutional neural networks | code |
63 | A Latent Morphology Model for Open-Vocabulary Neural Machine Translation | Duygu Ataman, Wilker Aziz, Alexandra Birch | In this paper, we propose to translate words by modeling word formation through a hierarchical latent variable model which mimics the process of morphological inflection. | |
64 | Understanding Why Neural Networks Generalize Well Through GSNR of Parameters | Jinlong Liu, Yunzhi Bai, Guoqing Jiang, Ting Chen, Huayan Wang | In this paper, we provide a novel perspective on these issues using the gradient signal to noise ratio (GSNR) of parameters during training process of DNNs. | |
65 | Model Based Reinforcement Learning for Atari | Lukasz Kaiser, Mohammad Babaeizadeh, Piotr Milos, Blazej Osinski, Roy H Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George Tucker, Henryk Michalewski | We use video prediction models, a model-based reinforcement learning algorithm and 2h of gameplay per game to train agents for 26 Atari games. | code |
66 | Disagreement-Regularized Imitation Learning | Kiante Brantley, Wen Sun, Mikael Henaff | Method for addressing covariate shift in imitation learning using ensemble uncertainty | |
67 | Stable Rank Normalization for Improved Generalization in Neural Networks and GANs | Amartya Sanyal, Philip H. Torr, Puneet K. Dokania | We propose Stable Rank Normalisation, a new regulariser based on recent generalization bounds, and show how to optimize it, with extensive experiments. | |
68 | Measuring the Reliability of Reinforcement Learning Algorithms | Stephanie C.Y. Chan, Anoop Korattikara, Sam Fishman, John Canny, Sergio Guadarrama | A novel set of metrics for measuring reliability of reinforcement learning algorithms (+ accompanying statistical tests) | |
69 | Sequential Latent Knowledge Selection for Knowledge-Grounded Dialogue | Byeongchang Kim, Jaewoo Ahn, Gunhee Kim | Our approach is the first attempt to leverage a sequential latent variable model for knowledge selection in multi-turn knowledge-grounded dialogue. It achieves new state-of-the-art performance on the Wizard of Wikipedia benchmark. | |
70 | Neural Tangents: Fast and Easy Infinite Neural Networks in Python | Roman Novak, Lechao Xiao, Jiri Hron, Jaehoon Lee, Jascha Sohl-Dickstein, Samuel S. Schoenholz | Keras for infinite neural networks. | code |
71 | Self-labelling via simultaneous clustering and representation learning | Asano YM., Rupprecht C., Vedaldi A. | We propose a self-supervised learning formulation that simultaneously learns feature representations and useful dataset labels by optimizing the common cross-entropy loss for features _and_ labels, while maximizing information. | |
72 | The intriguing role of module criticality in the generalization of deep networks | Niladri Chatterji, Behnam Neyshabur, Hanie Sedghi | We study the phenomenon that some modules of DNNs are more \emph{critical} than others. Our analysis leads us to propose a complexity measure, that is able to explain the superior generalization performance of some architectures over others. | |
73 | Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks | Sanjeev Arora, Simon S. Du, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang, Dingli Yu | We verify neural tangent kernel is powerful on small data via experiments on UCI datasets, small CIFAR 10 and low-shot learning on VOC07. | |
74 | Differentiation of Blackbox Combinatorial Solvers | Marin Vlastelica Pogancic, Anselm Paulus, Vit Musil, Georg Martius, Michal Rolinek | In this work, we present a method that implements an efficient backward pass through blackbox implementations of combinatorial solvers with linear objective functions. | code |
75 | Scaling Autoregressive Video Models | Dirk Weissenborn, Oscar Täckström, Jakob Uszkoreit | We present a novel autoregressive video generation model that achieves strong results on popular datasets and produces encouraging continuations of real-world videos. | |
76 | The Ingredients of Real World Robotic Reinforcement Learning | Henry Zhu, Justin Yu, Abhishek Gupta, Dhruv Shah, Kristian Hartikainen, Avi Singh, Vikash Kumar, Sergey Levine | System to learn robotic tasks in the real world with reinforcement learning without instrumentation | |
77 | Meta-Learning Acquisition Functions for Transfer Learning in Bayesian Optimization | Michael Volpp, Lukas Froehlich, Kirsten Fischer, Andreas Doerr, Stefan Falkner, Frank Hutter, Christian Daniel | We perform efficient and flexible transfer learning in the framework of Bayesian optimization through meta-learned neural acquisition functions. | code |
78 | Maximum Likelihood Constraint Inference for Inverse Reinforcement Learning | Dexter R.R. Scobee, S. Shankar Sastry | Our method infers constraints on task execution by leveraging the principle of maximum entropy to quantify how demonstrations differ from expected, un-constrained behavior. | code |
79 | Spectral Embedding of Regularized Block Models | Nathan De Lara, Thomas Bonald | Graph regularization forces spectral embedding to focus on the largest clusters, making the representation less sensitive to noise. | code |
80 | Towards Hierarchical Importance Attribution: Explaining Compositional Semantics for Neural Sequence Models | Xisen Jin, Junyi Du, Zhongyu Wei, Xiangyang Xue, Xiang Ren | We propose a measure of phrase importance and algorithms for hierarchical explanation of neural sequence model predictions. | |
81 | word2ket: Space-efficient Word Embeddings inspired by Quantum Entanglement | Aliakbar Panahi, Seyran Saeedi, Tom Arodz | We use ideas from quantum computing to propose word embeddings that use far fewer trainable parameters. | |
82 | What Can Neural Networks Reason About? | Keyulu Xu, Jingling Li, Mozhi Zhang, Simon S. Du, Ken-ichi Kawarabayashi, Stefanie Jegelka | We develop a theoretical framework to characterize which reasoning tasks a neural network can learn well. | code |
83 | Training individually fair ML models with sensitive subspace robustness | Mikhail Yurochkin, Amanda Bower, Yuekai Sun | Algorithm for training individually fair classifier using adversarial robustness | code |
84 | Learning from Rules Generalizing Labeled Exemplars | Abhijeet Awasthi, Sabyasachi Ghosh, Rasna Goyal, Sunita Sarawagi | Coupled rule-exemplar supervision and an implication loss help to jointly learn to denoise rules and imply labels. | code |
85 | Directional Message Passing for Molecular Graphs | Johannes Klicpera, Janek Groß, Stephan Günnemann | Directional message passing incorporates spatial directional information to improve graph neural networks. | code |
86 | Explanation by Progressive Exaggeration | Sumedha Singla, Brian Pollack, Junxiang Chen, Kayhan Batmanghelich | A method to explain a classifier, by generating visual perturbation of an image by exaggerating or diminishing the semantic features that the classifier associates with a target label. | |
87 | Compression based bound for non-compressed network: unified generalization error analysis of large compressible deep neural network | Taiji Suzuki | In this paper, we give a unified framework that can convert compression-based bounds to those for non-compressed original networks. | |
88 | At Stability’s Edge: How to Adjust Hyperparameters to Preserve Minima Selection in Asynchronous Training of Neural Networks? | Niv Giladi, Mor Shpigel Nacson, Elad Hoffer, Daniel Soudry | How to prevent stale gradients (in asynchronous SGD) from changing minima stability and degrade steady state generalization? | code |
89 | Disentanglement through Nonlinear ICA with General Incompressible-flow Networks (GIN) | Peter Sorrenson, Ullrich Köthe | Recent breakthrough work by Khemakhem et al. (2019) on nonlinear ICA has answered this question for a broad class of conditional generative processes. We extend this important result in a direction relevant for application to real-world data. | |
90 | Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps | Tri Dao, Nimit Sohoni, Albert Gu, Matthew Eichhorn, Amit Blonder, Megan Leszczynski, Atri Rudra | We propose a differentiable family of “kaleidoscope matrices,” prove that all structured matrices can be represented in this form, and use them to replace hand-crafted linear maps in deep learning models. | code |
91 | Improving Generalization in Meta Reinforcement Learning using Neural Objectives | Louis Kirsch, Sjoerd van Steenkiste, Juergen Schmidhuber | We introduce MetaGenRL, a novel meta reinforcement learning algorithm. Unlike prior work, MetaGenRL can generalize to new environments that are entirely different from those used for meta-training. | |
92 | Drawing Early-Bird Tickets: Toward More Efficient Training of Deep Networks | Haoran You, Chaojian Li, Pengfei Xu, Yonggan Fu, Yue Wang, Xiaohan Chen, Yingyan Lin, Zhangyang Wang, Richard G. Baraniuk | In this paper, we discover for the first time that the winning tickets can be identified at the very early training stage, which we term as early-bird (EB) tickets, via low-cost training schemes (e.g., early stopping and low-precision training) at large learning rates. | |
93 | Truth or backpropaganda? An empirical investigation of deep learning theory | Micah Goldblum, Jonas Geiping, Avi Schwarzschild, Michael Moeller, Tom Goldstein | We study the prevalence of local minima in loss landscapes, whether small-norm parameter vectors generalize better (and whether this explains the advantages of weight decay), whether wide-network theories (like the neural tangent kernel) describe the behaviors of classifiers, and whether the rank of weight matrices can be linked to generalization and robustness in real-world networks. | |
94 | Neural Arithmetic Units | Andreas Madsen, Alexander Rosenberg Johansen | We present two new neural network components: the Neural Addition Unit (NAU), which can learn to add and subtract; and Neural Multiplication Unit (NMU) that can multiply subsets of a vector. | code |
95 | DeepSphere: a graph-based spherical CNN | Michaël Defferrard, Martino Milani, Frédérick Gusset, Nathanaël Perraudin | A graph-based spherical CNN that strikes an interesting balance of trade-offs for a wide variety of applications. | |
96 | SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models | Yucen Luo, Alex Beatson, Mohammad Norouzi, Jun Zhu, David Duvenaud, Ryan P. Adams, Ricky T. Q. Chen | We create an unbiased estimator for the log probability of latent variable models, extending such models to a larger scope of applications. | |
97 | Deep Learning For Symbolic Mathematics | Guillaume Lample, François Charton | We train a neural network to compute function integrals, and to solve complex differential equations. | |
98 | Making Sense of Reinforcement Learning and Probabilistic Inference | Brendan O'Donoghue, Ian Osband, Catalin Ionescu | Popular algorithms that cast "RL as Inference" ignore the role of uncertainty and exploration. We highlight the importance of these issues and present a coherent framework for RL and inference that handles them gracefully. | |
99 | Unbiased Contrastive Divergence Algorithm for Training Energy-Based Latent Variable Models | Yixuan Qiu, Lingsong Zhang, Xiao Wang | We have developed a new training algorithm for energy-based latent variable models that completely removes the bias of contrastive divergence. | |
100 | A Mutual Information Maximization Perspective of Language Representation Learning | Lingpeng Kong, Cyprien de Masson d’Autume, Lei Yu, Wang Ling, Zihang Dai, Dani Yogatama | We provide an example by drawing inspirations from related methods based on mutual information maximization that have been successful in computer vision, and introduce a simple self-supervised objective that maximizes the mutual information between a global sentence representation and n-grams in the sentence. | |
101 | Energy-based models for atomic-resolution protein conformations | Yilun Du, Joshua Meier, Jerry Ma, Rob Fergus, Alexander Rives | Energy-based models trained on crystallized protein structures predict native side chain configuration and automatically discover molecular energy features. | |
102 | Depth-Width Trade-offs for ReLU Networks via Sharkovsky's Theorem | Vaggos Chatziafratis, Sai Ganesh Nagarajan, Ioannis Panageas, Xiao Wang | In this work, we point to a new connection between DNN expressivity and Sharkovsky's Theorem from dynamical systems, which enables us to characterize the depth-width trade-offs of ReLU networks. | code |
103 | Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint | Jimmy Ba, Murat Erdogdu, Taiji Suzuki, Denny Wu, Tianzong Zhang | Derived population risk of two-layer neural networks in high dimensions and examined presence / absence of “double descent”. | |
104 | Reconstructing continuous distributions of 3D protein structure from cryo-EM images | Ellen D. Zhong, Tristan Bepler, Joseph H. Davis, Bonnie Berger | We propose a deep generative model of volumes for 3D cryo-EM reconstruction from unlabelled 2D images and show that it can learn continuous deformations in protein structure. | |
105 | PROGRESSIVE LEARNING AND DISENTANGLEMENT OF HIERARCHICAL REPRESENTATIONS | Zhiyuan Li, Jaideep Vitthal Murkute, Prashnna Kumar Gyawali, Linwei Wang | We propose a progressive learning method to improve the learning and disentanglement of latent representations at different levels of abstraction. | |
106 | AN EXPONENTIAL LEARNING RATE SCHEDULE FOR BATCH NORMALIZED NETWORKS | Zhiyuan Li, Sanjeev Arora | We propose an exponential learning rate schedule for networks with BatchNorm, which surprisingly performs well in practice and is provably equivalent to popular LR schedules like Step Decay. | |
107 | Geom-GCN: Geometric Graph Convolutional Networks | Hongbin Pei, Bingzhe Wei, Kevin Chen-Chuan Chang, Yu Lei, Bo Yang | From the observations on classical neural network and network geometry, we propose a novel geometric aggregation scheme for graph neural networks to overcome the two weaknesses. | code |
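For row 18 above (Xiao et al.), the defense hinges on a k-winners-take-all activation: keep the k largest activations in each vector and zero out the rest. Below is a minimal sketch; the function name `kwta` and the tie-handling rule are our choices for illustration, and the paper's actual implementation (e.g., parameterizing sparsity as a ratio of retained units) may differ.

```python
import torch

def kwta(x: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k largest entries along the last dimension, zero the rest.

    Note: if several entries tie with the k-th largest value, all of them
    survive, so slightly more than k entries may remain nonzero.
    """
    threshold = x.topk(k, dim=-1).values[..., -1:]  # k-th largest per row
    return torch.where(x >= threshold, x, torch.zeros_like(x))

activations = torch.randn(4, 16)
sparse = kwta(activations, k=4)  # roughly 12 of 16 entries per row are zeroed
```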
TABLE 3: ICLR 2020 Posters
# | Title | Authors | Highlight | Code |
---|---|---|---|---|
1 | Large Batch Optimization for Deep Learning: Training BERT in 76 minutes | Yang You, Jing Li, Sashank Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, James Demmel, Kurt Keutzer, Cho-Jui Hsieh | A fast optimizer for general applications and large-batch training. | |
2 | SELF: Learning to Filter Noisy Labels with Self-Ensembling | Duc Tam Nguyen, Chaithanya Kumar Mummadi, Thi Phuong Nhung Ngo, Thi Hoai Phuong Nguyen, Laura Beggel, Thomas Brox | We propose a self-ensemble framework to train more robust deep learning models under noisy labeled datasets. | |
3 | Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation | Yu Chen, Lingfei Wu, Mohammed J. Zaki | To address these limitations, in this paper, we propose a reinforcement learning (RL) based graph-to sequence (Graph2Seq) model for QG. | |
4 | Sharing Knowledge in Multi-Task Deep Reinforcement Learning | Carlo D’Eramo, Davide Tateo, Andrea Bonarini, Marcello Restelli, Jan Peters | A study on the benefit of sharing representation in Multi-Task Reinforcement Learning. | code |
5 | On the Weaknesses of Reinforcement Learning for Neural Machine Translation | Leshem Choshen, Lior Fox, Zohar Aizenbud, Omri Abend | Performance gains from reinforcement learning practices in machine translation might not come from better predictions. | |
6 | StructPool: Structured Graph Pooling via Conditional Random Fields | Hao Yuan, Shuiwang Ji | A novel graph pooling method considering relationships between different nodes via conditional random fields. | |
7 | Learning deep graph matching with channel-independent embedding and Hungarian attention | Tianshu Yu, Runzhong Wang, Junchi Yan, Baoxin Li | We proposed a deep graph matching method with novel channel-independent embedding and Hungarian loss, which achieved state-of-the-art performance. | |
8 | Graph inference learning for semi-supervised classification | Chunyan Xu, Zhen Cui, Xiaobin Hong, Tong Zhang, Jian Yang, Wei Liu | We propose a novel graph inference learning framework by building structure relations to infer unknown node labels from those labeled nodes in an end-to-end way. | |
9 | SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards | Siddharth Reddy, Anca D. Dragan, Sergey Levine | A simple and effective alternative to adversarial imitation learning: initialize the experience replay buffer with demonstrations, set their reward to +1, set the reward for all other data to 0, and run Q-learning or soft actor-critic to train (a minimal sketch of this relabeling follows this table). | |
10 | Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data | Sergei Popov, Stanislav Morozov, Artem Babenko | We propose a new DNN architecture for deep learning on tabular data | code |
11 | Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification | Yixiao Ge, Dapeng Chen, Hongsheng Li | A framework that conducts online refinement of pseudo labels with a novel soft softmax-triplet loss for unsupervised domain adaptation on person re-identification. | code |
12 | Automatically Discovering and Learning New Visual Categories with Ranking Statistics | Kai Han, Sylvestre-Alvise Rebuffi, Sebastien Ehrhardt, Andrea Vedaldi, Andrew Zisserman | A method to automatically discover new categories in unlabelled data, by effectively transferring knowledge from labelled data of other different categories using feature rank statistics. | |
13 | Maxmin Q-learning: Controlling the Estimation Bias of Q-learning | Qingfeng Lan, Yangchen Pan, Alona Fyshe, Martha White | We propose a new variant of Q-learning algorithm called Maxmin Q-learning which provides a parameter-tuning mechanism to flexibly control bias. | |
14 | Federated Adversarial Domain Adaptation | Xingchao Peng, Zijun Huang, Yizhe Zhu, Kate Saenko | We present a principled approach to the problem of federated domain adaptation, which aims to align the representations learned among the different nodes with the data distribution of the target node. | code |
15 | Depth-Adaptive Transformer | Maha Elbayad, Jiatao Gu, Edouard Grave, Michael Auli | Sequence model that dynamically adjusts the amount of computation for each input. | |
16 | DeepHoyer: Learning Sparser Neural Network with Differentiable Scale-Invariant Sparsity Measures | Huanrui Yang, Wei Wen, Hai Li | We propose almost everywhere differentiable and scale invariant regularizers for DNN pruning, which can lead to supremum sparsity through standard SGD training. | |
17 | Evaluating The Search Phase of Neural Architecture Search | Kaicheng Yu, Christian Sciuto, Martin Jaggi, Claudiu Musat, Mathieu Salzmann | We empirically disprove a fundamental hypothesis of the widely-adopted weight-sharing strategy in neural architecture search and explain why state-of-the-art NAS algorithms perform similarly to random search. | |
18 | Diverse Trajectory Forecasting with Determinantal Point Processes | Ye Yuan, Kris M. Kitani | We learn a diversity sampling function with DPPs to obtain a diverse set of samples from a generative model. | |
19 | Prox-SGD: Training Structured Neural Networks under Regularization and Constraints | Yang Yang, Yaxiong Yuan, Avraam Chatzimichailidis, Ruud JG van Sloun, Lei Lei, Symeon Chatzinotas | We propose a convergent proximal-type stochastic gradient descent algorithm for constrained nonsmooth nonconvex optimization problems | |
20 | LAMAL: LAnguage Modeling Is All You Need for Lifelong Language Learning | Fan-Keng Sun, Cheng-Hao Ho, Hung-Yi Lee | Language modeling is all you need for lifelong language learning. | code |
21 | Learning Expensive Coordination: An Event-Based Deep RL Approach | Zhenyu Shi, Runsheng Yu, Xinrun Wang, Rundong Wang, Youzhi Zhang, Hanjiang Lai, Bo An | We propose an event-based policy gradient to train the leader and an action abstraction policy gradient to train the followers in leader-follower Markov game. | |
22 | Curvature Graph Network | Ze Ye, Kin Sum Liu, Tengfei Ma, Jie Gao, Chao Chen | We propose a novel network architecture that incorporates advanced graph structural features. | |
23 | Distance-Based Learning from Errors for Confidence Calibration | Chen Xing, Sercan Arik, Zizhao Zhang, Tomas Pfister | To improve confidence calibration of DNNs, we propose a novel training method, distance-based learning from errors (DBLE). | code |
24 | Deep Learning of Determinantal Point Processes via Proper Spectral Sub-gradient | Tianshu Yu, Yikang Li, Baoxin Li | We proposed a specific back-propagation method via proper spectral sub-gradient to integrate determinantal point process to deep learning framework. | |
25 | N-BEATS: Neural basis expansion analysis for interpretable time series forecasting | Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, Yoshua Bengio | A novel deep interpretable architecture that achieves state of the art on three large scale univariate time series forecasting datasets | |
26 | Automated Relational Meta-learning | Huaxiu Yao, Xian Wu, Zhiqiang Tao, Yaliang Li, Bolin Ding, Zhenhui Li | Addressing task heterogeneity problem in meta-learning by introducing meta-knowledge graph | |
27 | To Relieve Your Headache of Training an MRF, Take AdVIL | Chongxuan Li, Chao Du, Kun Xu, Max Welling, Jun Zhu, Bo Zhang | We propose a black-box algorithm called AdVIL to perform inference and learning on a general Markov random field. | code |
28 | Linear Symmetric Quantization of Neural Networks for Low-precision Integer Hardware | Xiandong Zhao, Ying Wang, Xuyi Cai, Cheng Liu, Lei Zhang | We introduce an efficient quantization process that allows for performance acceleration on specialized integer-only neural network accelerator. | code |
29 | Weakly Supervised Clustering by Exploiting Unique Class Count | Mustafa Umit Oner, Hwee Kuan Lee, Wing-Kin Sung | A weakly supervised learning-based clustering framework that performs comparably to fully supervised learning models by exploiting unique class count. | code |
30 | Scalable and Order-robust Continual Learning with Additive Parameter Decomposition | Jaehong Yoon, Saehoon Kim, Eunho Yang, Sung Ju Hwang | To tackle these practical challenges, we propose a novel continual learning method that is scalable as well as order-robust, which instead of learning a completely shared set of weights, represents the parameters for each task as a sum of task-shared and sparse task-adaptive parameters. | code |
31 | Continual Learning with Adaptive Weights (CLAW) | Tameem Adel, Han Zhao, Richard E. Turner | A continual learning framework which learns to automatically adapt its architecture based on a proposed variational inference algorithm. | |
32 | Transferable Perturbations of Deep Feature Distributions | Nathan Inkawhich, Kevin Liang, Lawrence Carin, Yiran Chen | We show that perturbations based on intermediate feature distributions yield more transferable adversarial examples and allow for analysis of the effects of adversarial perturbations on intermediate representations. | |
33 | A Learning-based Iterative Method for Solving Vehicle Routing Problems | Hao Lu, Xingwen Zhang, Shuang Yang | In this paper, we present the first learning based approach for CVRP that is efficient in solving speed and at the same time outperforms OR methods. | |
34 | Poly-encoders: Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring | Samuel Humeau, Kurt Shuster, Marie-Anne Lachaux, Jason Weston | In this work, we develop a new transformer architecture, the Poly-encoder, that learns global rather than token level self-attention features. | |
35 | AutoQ: Automated Kernel-Wise Neural Network Quantization | Qian Lou, Feng Guo, Minje Kim, Lantao Liu, Lei Jiang | Accurate, Fast and Automated Kernel-Wise Neural Network Quantization with Mixed Precision using Hierarchical Deep Reinforcement Learning | |
36 | Understanding Architectures Learnt by Cell-based Neural Architecture Search | Yao Shu, Wei Wang, Shaofeng Cai | We empirically and theoretically show that the common connection pattern contributes to a smooth loss landscape and more accurate gradient information, and therefore fast convergence. | |
37 | SVQN: Sequential Variational Soft Q-Learning Networks | Shiyu Huang, Hang Su, Jun Zhu, Ting Chen | SVQNs formalize the inference of hidden states and maximum-entropy reinforcement learning under a unified graphical model and optimize the two modules jointly. | |
38 | Ranking Policy Gradient | Kaixiang Lin, Jiayu Zhou | We propose ranking policy gradient, which learns the optimal rank of actions to maximize return, and a general off-policy learning framework with the properties of optimality preservation, variance reduction, and sample efficiency. | |
39 | On Mutual Information Maximization for Representation Learning | Michael Tschannen, Josip Djolonga, Paul K. Rubenstein, Sylvain Gelly, Mario Lucic | The success of recent mutual information (MI)-based representation learning approaches strongly depends on the inductive bias in both the choice of network architectures and the parametrization of the employed MI estimators. | code |
40 | Observational Overfitting in Reinforcement Learning | Xingyou Song, Yiding Jiang, Yilun Du, Behnam Neyshabur | We isolate one factor of RL generalization by analyzing the case when the agent only overfits to the observations. We show that architectural implicit regularizations occur in this regime. | |
41 | Enhancing Transformation-Based Defenses Against Adversarial Attacks with a Distribution Classifier | Connie Kou, Hwee Kuan Lee, Teck Khim Ng, Ee-Chien Chang | We enhance existing transformation-based defenses by using a distribution classifier on the distribution of softmax outputs obtained from transformed images. | |
42 | Additive Powers-of-Two Quantization: A Non-uniform Discretization for Neural Networks | Yuhang Li, Xin Dong, Wei Wang | We propose Additive Powers-of-Two (APoT) quantization, an efficient non-uniform quantization scheme that attends to the bell-shaped and long-tailed distribution of weights in neural networks. | |
43 | Lazy-CFR: fast and near-optimal regret minimization for extensive games with imperfect information | Yichi Zhou, Tongzheng Ren, Jialian Li, Dong Yan, Jun Zhu | In this paper, we present Lazy-CFR, a CFR algorithm that adopts a lazy update strategy to avoid traversing the whole game tree in each round. | |
44 | Knowledge Consistency between Neural Networks and Beyond | Ruofan Liang, Tianlin Li, Longfei Li, Quanshi Zhang | This paper aims to analyze knowledge consistency between pre-trained deep neural networks. | |
45 | Image-guided Neural Object Rendering | Justus Thies, Michael Zollhöfer, Christian Theobalt, Marc Stamminger, Matthias Nießner | We propose a learned image-guided rendering technique that combines the benefits of image-based rendering and GAN-based image synthesis while considering view-dependent effects. | |
46 | Implicit Bias of Gradient Descent based Adversarial Training on Separable Data | Yan Li, Ethan X. Fang, Huan Xu, Tuo Zhao | The solution of gradient descent based adversarial training converges in direction to a robust max-margin solution adapted to the adversary's geometry; using L2 perturbations also shows a significant speed-up in convergence compared to clean training. | |
47 | TabFact: A Large-scale Dataset for Table-based Fact Verification | Wenhu Chen, Hongmin Wang, Jianshu Chen, Yunkai Zhang, Hong Wang, Shiyang Li, Xiyou Zhou, William Yang Wang | We propose a new dataset to investigate the entailment problem with a semi-structured table as the premise | |
48 | ES-MAML: Simple Hessian-Free Meta Learning | Xingyou Song, Wenbo Gao, Yuxiang Yang, Krzysztof Choromanski, Aldo Pacchiano, Yunhao Tang | We provide a new framework for MAML in the ES/blackbox setting, and show that it allows deterministic and linear policies, better exploration, and non-differentiable adaptation operators. | |
49 | Neural Stored-program Memory | Hung Le, Truyen Tran, Svetha Venkatesh | A neural simulation of a Universal Turing Machine | |
50 | Hierarchical Foresight: Self-Supervised Learning of Long-Horizon Tasks via Visual Subgoal Generation | Suraj Nair, Chelsea Finn | Hierarchical visual foresight learns to generate visual subgoals that break down long-horizon tasks into subtasks, using only self-supervision. | code |
51 | Multi-agent Reinforcement Learning for Networked System Control | Tianshu Chu, Sandeep Chinchali, Sachin Katti | This paper proposes a new formulation and a new communication protocol for networked multi-agent control problems | code |
52 | FSPool: Learning Set Representations with Featurewise Sort Pooling | Yan Zhang, Jonathon Hare, Adam Prügel-Bennett | Sort in the encoder and undo the sorting in the decoder to avoid the responsibility problem in set auto-encoders | code |
53 | Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction | Taeuk Kim, Jihun Choi, Daniel Edmiston, Sang-goo Lee | In line with such interest, we propose a novel method that assists us in investigating the extent to which pre-trained LMs capture the syntactic notion of constituency. | |
54 | Dynamically Pruned Message Passing Networks for Large-scale Knowledge Graph Reasoning | Xiaoran Xu, Wei Feng, Yunsheng Jiang, Xiaohui Xie, Zhiqing Sun, Zhi-Hong Deng | We propose to learn an input-dependent subgraph, dynamically and selectively expanded, to explicitly model a sequential reasoning process. | code |
55 | Mixup Inference: Better Exploiting Mixup to Defend Adversarial Attacks | Tianyu Pang*, Kun Xu*, Jun Zhu | We exploit the global linearity of the mixup-trained models in inference to break the locality of the adversarial perturbations. | code |
56 | Theory and Evaluation Metrics for Learning Disentangled Representations | Kien Do, Truyen Tran | We make two theoretical contributions to disentanglement learning by (a) defining precise semantics of disentangled representations, and (b) establishing robust metrics for evaluation. | |
57 | Measuring Compositional Generalization: A Comprehensive Method on Realistic Data | Daniel Keysers, Nathanael Schärli, Nathan Scales, Hylke Buisman, Daniel Furrer, Sergii Kashubin, Nikola Momchev, Danila Sinopalnikov, Lukasz Stafiniak, Tibor Tihon, Dmitry Tsarkov, Xiao Wang, Marc van Zee, Olivier Bousquet | Benchmark and method to measure compositional generalization by maximizing divergence of compound frequency at small divergence of atom frequency. | code |
58 | Rethinking Softmax Cross-Entropy Loss for Adversarial Robustness | Tianyu Pang, Kun Xu, Yinpeng Dong, Chao Du, Ning Chen, Jun Zhu | Applying the softmax function in training leads to indirect and unexpected supervision on features. We propose a new training objective to explicitly induce dense feature regions for locally sufficient samples to benefit adversarial robustness. | code |
59 | The Implicit Bias of Depth: How Incremental Learning Drives Generalization | Daniel Gissin, Shai Shalev-Shwartz, Amit Daniely | We study the sparsity-inducing bias of deep models, caused by their learning dynamics. | |
60 | The Variational Bandwidth Bottleneck: Stochastic Evaluation on an Information Budget | Anirudh Goyal, Yoshua Bengio, Matthew Botvinick, Sergey Levine | Training agents with adaptive computation based on information bottleneck can promote generalization. | |
61 | Learning the Arrow of Time for Problems in Reinforcement Learning | Nasim Rahaman, Steffen Wolf, Anirudh Goyal, Roman Remme, Yoshua Bengio | We learn the arrow of time for MDPs and use it to measure reachability, detect side-effects and obtain a curiosity reward signal. | code |
62 | Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives | Anirudh Goyal, Shagun Sodhani, Jonathan Binas, Xue Bin Peng, Sergey Levine, Yoshua Bengio | Learning an implicit master policy, as an explicit master policy in HRL can fail to generalize. | |
63 | Robust Local Features for Improving the Generalization of Adversarial Training | Chuanbiao Song, Kun He, Jiadong Lin, Liwei Wang, John E. Hopcroft | We propose a new adversarial training approach, Robust Local Features for Adversarial Training (RLFAT), that significantly improves both adversarially robust generalization and standard generalization. | code |
64 | Analysis of Video Feature Learning in Two-Stream CNNs on the Example of Zebrafish Swim Bout Classification | Bennet Breier, Arno Onken | We demonstrate the utility of a recent AI explainability technique by visualizing the learned features of a CNN trained on binary classification of zebrafish movements. | code |
65 | Learning Disentangled Representations for CounterFactual Regression | Negar Hassanpour, Russell Greiner | This paper conceptualizes this line of thought and provides a path to explore it further. We propose an algorithm to (1) identify disentangled representations of the underlying factors from any given observational dataset D and (2) leverage this knowledge to reduce, as well as account for, the negative impact of selection bias on estimating treatment effects from D. | code |
66 | Exploration in Reinforcement Learning with Deep Covering Options | Yuu Jinnai, Jee Won Park, Marlos C. Machado, George Konidaris | We introduce a method to automatically discover task-agnostic options that encourage exploration for reinforcement learning. | |
67 | AE-OT: A New Generative Model Based on Extended Semi-discrete Optimal Transport | Dongsheng An, Yang Guo, Na Lei, Zhongxuan Luo, Shing-Tung Yau, Xianfeng Gu | In this work, we give a theoretical explanation of both problems via Figalli's regularity theory of optimal transportation maps. | |
68 | Logic and the 2-Simplicial Transformer | James Clift, Dmitry Doryn, Daniel Murfet, James Wallbridge | We introduce the 2-simplicial Transformer and show that this architecture is a useful inductive bias for logical reasoning in the context of deep reinforcement learning. | code |
69 | Watch, Try, Learn: Meta-Learning from Demonstrations and Rewards | Allan Zhou, Eric Jang, Daniel Kappler, Alex Herzog, Mohi Khansari, Paul Wohlhart, Yunfei Bai, Mrinal Kalakrishnan, Sergey Levine, Chelsea Finn | In this work, we propose a method that can learn to learn from both demonstrations and trial-and-error experience with sparse reward feedback. | code |
70 | Fooling Detection Alone is Not Enough: Adversarial Attack against Multiple Object Tracking | Yunhan Jia, Yantao Lu, Junjie Shen, Qi Alfred Chen, Hao Chen, Zhenyu Zhong, Tao Wei | We study the adversarial machine learning attacks against the Multiple Object Tracking mechanisms for the first time. | code |
71 | DivideMix: Learning with Noisy Labels as Semi-supervised Learning | Junnan Li, Steven C.H. Hoi, Richard Socher | We propose a novel framework for learning with noisy labels by leveraging semi-supervised learning. | |
72 | Improving Adversarial Robustness Requires Revisiting Misclassified Examples | Yisen Wang, Difan Zou, Jinfeng Yi, James Bailey, Xingjun Ma, Quanquan Gu | By differentiating misclassified and correctly classified data, we propose a new misclassification aware defense that improves the state-of-the-art adversarial robustness. | |
73 | V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control | H. Francis Song, Abbas Abdolmaleki, Jost Tobias Springenberg, Aidan Clark, Hubert Soyer, Jack W. Rae, Seb Noury, Arun Ahuja, Siqi Liu, Dhruva Tirumala, Nicolas Heess, Dan Belov, Martin Riedmiller, Matthew M. Botvinick | A state-value function-based version of MPO that achieves good results in a wide range of tasks in discrete and continuous control. | |
74 | Attributes Obfuscation with Complex-Valued Features | Liyao Xiang, Hao Zhang, Haotian Ma, Yifan Zhang, Jie Ren, Quanshi Zhang | We propose a generic method to revise a conventional neural network so that adversarially inferring the input becomes harder while the network still yields useful outputs. | |
75 | Accelerating SGD with momentum for over-parameterized learning | Chaoyue Liu, Mikhail Belkin | This work proves the non-acceleration of Nesterov SGD with any hyper-parameters, and proposes a new algorithm that provably accelerates SGD in the over-parameterized setting. | code |
76 | A critical analysis of self-supervision, or what we can learn from a single image | Asano YM., Rupprecht C., Vedaldi A. | We evaluate self-supervised feature learning methods and find that with sufficient data augmentation early layers can be learned using just one image. This is informative about self-supervision and the role of augmentations. | |
77 | Disentangling Factors of Variations Using Few Labels | Francesco Locatello, Michael Tschannen, Stefan Bauer, Gunnar Rätsch, Bernhard Schölkopf, Olivier Bachem | In this paper, we investigate the impact of such supervision on state-of-the-art disentanglement methods and perform a large scale study, training over 52000 models under well-defined and reproducible experimental conditions. | |
78 | Functional vs. parametric equivalence of ReLU networks | Mary Phuong, Christoph H. Lampert | We prove that there exist ReLU networks whose parameters are almost uniquely determined by the function they implement. | |
79 | Input Complexity and Out-of-distribution Detection with Likelihood-based Generative Models | Joan Serrà, David Álvarez, Vicenç Gómez, Olga Slizovskaia, José F. Núñez, Jordi Luque | We posit that generative models' likelihoods are excessively influenced by the input's complexity, and propose a way to compensate for it when detecting out-of-distribution inputs | |
80 | RTFM: Generalising to New Environment Dynamics via Reading | Victor Zhong, Tim Rocktäschel, Edward Grefenstette | We show that language understanding via reading is a promising way to learn policies that generalise to new environments. | |
81 | What graph neural networks cannot learn: depth vs width | Andreas Loukas | Several graph problems are impossible unless the product of a graph neural network’s depth and width exceeds (a function of) the graph size. | |
82 | Progressive Memory Banks for Incremental Domain Adaptation | Nabiha Asghar, Lili Mou, Kira A. Selby, Kevin D. Pantasdo, Pascal Poupart, Xin Jiang | We present a neural memory-based architecture for incremental domain adaptation, and provide theoretical and empirical results. | code |
83 | Automated curriculum generation through setter-solver interactions | Andrew Lampinen, Sebastien Racaniere, Adam Santoro, David Reichert, Vlad Firoiu, Timothy Lillicrap | We investigate automatic curriculum generation and identify a number of losses useful to learn to generate a curriculum of tasks. | |
84 | On Identifiability in Transformers | Gino Brunner, Yang Liu, Damian Pascual, Oliver Richter, Massimiliano Ciaramita, Roger Wattenhofer | We investigate the identifiability and interpretability of attention distributions and tokens within contextual embeddings in the self-attention based BERT model. | |
85 | Exploring Model-based Planning with Policy Networks | Tingwu Wang, Jimmy Ba | How to achieve state-of-the-art performance by combining policy networks with model-based planning | |
86 | Learning Self-Correctable Policies and Value Functions from Demonstrations with Negative Sampling | Yuping Luo, Huazhe Xu, Tengyu Ma | We introduce a notion of conservatively-extrapolated value functions, which provably lead to policies that can self-correct to stay close to the demonstration states, and learn them with a novel negative sampling technique. | |
87 | Geometric Insights into the Convergence of Nonlinear TD Learning | David Brandfonbrener, Joan Bruna | Here we take a first step towards extending theoretical convergence guarantees to TD learning with nonlinear function approximation. | |
88 | Few-shot Text Classification with Distributional Signatures | Yujia Bao, Menghua Wu, Shiyu Chang, Regina Barzilay | Meta-learning methods used for vision, directly applied to NLP, perform worse than nearest neighbors on new classes; we can do better with distributional signatures. | code |
89 | Escaping Saddle Points Faster with Stochastic Momentum | Jun-Kun Wang, Chi-Heng Lin, Jacob Abernethy | Higher momentum parameter $\beta$ helps for escaping saddle points faster | |
90 | Adversarial Policies: Attacking Deep Reinforcement Learning | Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, Stuart Russell | Deep RL policies can be attacked by other agents taking actions so as to create natural observations that are adversarial. | code |
91 | VideoFlow: A Conditional Flow-Based Model for Stochastic Video Generation | Manoj Kumar, Mohammad Babaeizadeh, Dumitru Erhan, Chelsea Finn, Sergey Levine, Laurent Dinh, Durk Kingma | We demonstrate that flow-based generative models offer a viable and competitive approach to generative modeling of video. | code |
92 | GLAD: Learning Sparse Graph Recovery | Harsh Shrivastava, Xinshi Chen, Binghong Chen, Guanghui Lan, Srinivas Aluru, Han Liu, Le Song | A data-driven learning algorithm based on unrolling the Alternating Minimization optimization for sparse graph recovery. | code |
93 | Pruned Graph Scattering Transforms | Vassilis N. Ioannidis, Siheng Chen, Georgios B. Giannakis | The present work addresses some limitations of GSTs by introducing a novel so-termed pruned (p)GST approach. | |
94 | Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model | Wenhan Xiong, Jingfei Du, William Yang Wang, Veselin Stoyanov | In this work, we further investigate the extent to which pretrained models such as BERT capture knowledge using a zero-shot fact completion task. | |
95 | Can gradient clipping mitigate label noise? | Aditya Krishna Menon, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar | Gradient clipping doesn’t endow robustness to label noise, but a simple loss-based variant does. | |
96 | Editable Neural Networks | Anton Sinitsin, Vsevolod Plokhotnyuk, Dmitry Pyrkin, Sergei Popov, Artem Babenko | Training neural networks so you can efficiently patch them later. | code |
97 | Learning Execution through Neural Code Fusion | Zhan Shi, Kevin Swersky, Daniel Tarlow, Parthasarathy Ranganathan, Milad Hashemi | In this work, we propose a new approach using GNNs to learn fused representations of general source code and its execution. | code |
98 | FasterSeg: Searching for Faster Real-time Semantic Segmentation | Wuyang Chen, Xinyu Gong, Xianming Liu, Qian Zhang, Yuan Li, Zhangyang Wang | We present a real-time segmentation model automatically discovered by a multi-scale NAS framework, running 30% faster than state-of-the-art models. | code |
99 | Difference-Seeking Generative Adversarial Network–Unseen Sample Generation | Yi Lin Sung, Sung-Hsien Hsieh, Soo-Chang Pei, Chun-Shien Lu | We propose a novel GAN framework to generate unseen data. | code |
100 | Stochastic AUC Maximization with Deep Neural Networks | Mingrui Liu, Zhuoning Yuan, Yiming Ying, Tianbao Yang | The paper designs two algorithms for the stochastic AUC maximization problem with state-of-the-art complexities when using a deep neural network as the predictive model, which are also verified by empirical studies. | code |
101 | Semantically-Guided Representation Learning for Self-Supervised Monocular Depth | Vitor Guizilini, Rui Hou, Jie Li, Rares Ambrus, Adrien Gaidon | We propose a novel semantically-guided architecture for self-supervised monocular depth estimation | |
102 | MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius | Runtian Zhai, Chen Dan, Di He, Huan Zhang, Boqing Gong, Pradeep Ravikumar, Cho-Jui Hsieh, Liwei Wang | We propose MACER: a provable defense algorithm that trains robust models by maximizing the certified radius. It does not use adversarial training but performs better than all existing provable l2-defenses. | code |
103 | Detecting and Diagnosing Adversarial Images with Class-Conditional Capsule Reconstructions | Yao Qin, Nicholas Frosst, Sara Sabour, Colin Raffel, Garrison Cottrell, Geoffrey Hinton | These results suggest that CapsNets use features that are more aligned with human perception and address the central issue raised by adversarial examples. | |
104 | Adversarial Example Detection and Classification with Asymmetrical Adversarial Training | Xuwang Yin, Soheil Kolouri, Gustavo K Rohde | A new generative modeling technique based on asymmetrical adversarial training, and its applications to adversarial example detection and robust classification | code |
105 | Variational Recurrent Models for Solving Partially Observable Control Tasks | Dongqi Han, Kenji Doya, Jun Tani | A deep RL algorithm for solving POMDPs by auto-encoding the underlying states using a variational recurrent model | code |
106 | Population-Guided Parallel Policy Search for Reinforcement Learning | Whiyoung Jung, Giseung Park, Youngchul Sung | In this paper, a new population-guided parallel learning scheme is proposed to enhance the performance of off-policy reinforcement learning (RL). | |
107 | Compositional languages emerge in a neural iterated learning model | Yi Ren, Shangmin Guo, Matthieu Labeau, Shay B. Cohen, Simon Kirby | We use the iterated learning framework to facilitate the dominance of highly compositional language in multi-agent games. | |
108 | Black-Box Adversarial Attack with Transferable Model-based Embedding | Zhichao Huang, Tong Zhang | We present a new method that combines transfer-based and score-based black-box adversarial attacks, improving the success rate and query efficiency of black-box attacks across different network architectures. | code |
109 | I Am Going MAD: Maximum Discrepancy Competition for Comparing Classifiers Adaptively | Haotao Wang, Tianlong Chen, Zhangyang Wang, Kede Ma | We present an efficient and adaptive framework for comparing image classifiers to maximize the discrepancies between the classifiers, in place of comparing on fixed test sets. | |
110 | Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models | Cheolhyoung Lee, Kyunghyun Cho, Wanmo Kang | In this paper, we introduce a new regularization technique, to which we refer as "mixout", motivated by dropout. | |
111 | Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP | Yuanhao Wang, Kefan Dong, Xiaoyu Chen, Liwei Wang | We adapt Q-learning with a UCB exploration bonus to infinite-horizon MDPs with discounted rewards without accessing a generative model, improving on the previously best known result. | |
112 | Deep Network classification by Scattering and Homotopy dictionary learning | John Zarka, Louis Thiry, Tomas Angles, Stephane Mallat | A scattering transform followed by supervised dictionary learning reaches a higher accuracy than AlexNet on ImageNet. | |
113 | Data-Independent Neural Pruning via Coresets | Ben Mussay, Margarita Osadchy, Vladimir Braverman, Samson Zhou, Dan Feldman | We propose an efficient, provable and data independent method for network compression via neural pruning using coresets of neurons — a novel construction proposed in this paper. | |
114 | Bounds on Over-Parameterization for Guaranteed Existence of Descent Paths in Shallow ReLU Networks | Arsalan Sharifnassab, Saber Salehkaleybar, S. Jamaloddin Golestani | In this perspective, our results provide a somewhat sharp characterization of the over-parameterization required for “existence of descent paths” in the loss landscape. | |
115 | Novelty Detection Via Blurring | Sungik Choi, Sae-Young Chung | We propose a novel OOD detector that employs blurred images as adversarial examples. Our model achieves strong OOD detection performance in various domains. | |
116 | Nonlinearities in activations substantially shape the loss surfaces of neural networks | Fengxiang He, Bohan Wang, Dacheng Tao | This paper presents how the loss surfaces of nonlinear neural networks are substantially shaped by the nonlinearities in activations. | |
117 | Relational State-Space Model for Stochastic Multi-Object Systems | Fan Yang, Ling Chen, Fan Zhou, Yusong Gao, Wei Cao | A deep hierarchical state-space model in which the state transitions of correlated objects are coordinated by graph neural networks. | |
118 | Learning Efficient Parameter Server Synchronization Policies for Distributed SGD | Rong Zhu, Sheng Yang, Andreas Pfadler, Zhengping Qian, Jingren Zhou | We apply a reinforcement learning based approach to learning optimal synchronization policies used for Parameter Server-based distributed training of SGD. | |
119 | Action Semantics Network: Considering the Effects of Actions in Multiagent Systems | Weixun Wang, Tianpei Yang, Yong Liu, Jianye Hao, Xiaotian Hao, Yujing Hu, Yingfeng Chen, Changjie Fan, Yang Gao | Our proposed ASN characterizes different actions’ influence on other agents using neural networks based on the action semantics between them. | code |
120 | Vid2Game: Controllable Characters Extracted from Real-World Videos | Oran Gafni, Lior Wolf, Yaniv Taigman | We extract a controllable model from a video of a person performing a certain activity. | |
121 | Self-Adversarial Learning with Comparative Discrimination for Text Generation | Wangchunshu Zhou, Tao Ge, Ke Xu, Furu Wei, Ming Zhou | We propose a self-adversarial learning (SAL) paradigm which improves the generator in a self-play fashion for improving GANs’ performance in text generation. | |
122 | Robust training with ensemble consensus | Jisoo Lee, Sae-Young Chung | This work presents a method of generating and using ensembles effectively to identify noisy examples in the presence of annotation noise. | |
123 | Identifying through Flows for Recovering Latent Representations | Shen Li, Bryan Hooi, Gim Hee Lee | In contrast, we propose an identifiable framework for estimating latent representations using a flow-based model (iFlow). | |
124 | Certified Robustness for Top-k Predictions against Adversarial Perturbations via Randomized Smoothing | Jinyuan Jia, Xiaoyu Cao, Binghui Wang, Neil Zhenqiang Gong | We study the certified robustness for top-k predictions via randomized smoothing under Gaussian noise and derive a tight robustness bound in L_2 norm. | |
125 | Optimistic Exploration even with a Pessimistic Initialisation | Tabish Rashid, Bei Peng, Wendelin Boehmer, Shimon Whiteson | We augment the Q-value estimates with a count-based bonus that ensures optimism during action selection and bootstrapping, even if the Q-value estimates are pessimistic. | |
126 | VL-BERT: Pre-training of Generic Visual-Linguistic Representations | Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, Jifeng Dai | VL-BERT is a simple yet powerful pre-trainable generic representation for visual-linguistic tasks. It is pre-trained on a massive-scale caption dataset and text-only corpus, and can be fine-tuned for various downstream visual-linguistic tasks. | |
127 | Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation | Hang Gao, Xizhou Zhu, Stephen Lin, Jifeng Dai | Don’t deform your convolutions — deform your kernels. | |
128 | Ensemble Distribution Distillation | Andrey Malinin, Bruno Mlodozeniec, Mark Gales | We distill an ensemble of models into a single model, capturing both the improved classification performance and information about the diversity of the ensemble, which is useful for uncertainty estimation. | |
129 | Gap-Aware Mitigation of Gradient Staleness | Saar Barkai, Ido Hakimi, Assaf Schuster | A new distributed, asynchronous, SGD-based algorithm, which achieves state-of-the-art accuracy on existing architectures using staleness penalization without having to re-tune the hyperparameters. | code |
130 | Counterfactuals uncover the modular structure of deep generative models | Michel Besserve, Arash Mehrjou, Remy Sun, Bernhard Schoelkopf | We develop a framework to find modular internal representations in generative models and manipulate them to generate counterfactual examples. | code |
131 | Physics-as-Inverse-Graphics: Unsupervised Physical Parameter Estimation from Video | Miguel Jaques, Michael Burke, Timothy Hospedales | We propose a model that is able to perform physical parameter estimation of systems from video, where the differential equations governing the scene dynamics are known, but labeled states or objects are not available. | |
132 | An Inductive Bias for Distances: Neural Nets that Respect the Triangle Inequality | Silviu Pitis, Harris Chan, Kiarash Jamali, Jimmy Ba | We propose novel neural network architectures, guaranteed to satisfy the triangle inequality, for purposes of (asymmetric) metric learning and modeling graph distances. | |
133 | A Constructive Prediction of the Generalization Error Across Scales | Jonathan S. Rosenfeld, Amir Rosenfeld, Yonatan Belinkov, Nir Shavit | We predict the generalization error and specify the model which attains it across model/data scales. | |
134 | Scalable Neural Methods for Reasoning With a Symbolic Knowledge Base | William W. Cohen, Haitian Sun, R. Alex Hofer, Matthew Siegler | A scalable differentiable neural module that implements reasoning on symbolic KBs. | |
135 | CLN2INV: Learning Loop Invariants with Continuous Logic Networks | Gabriel Ryan, Justin Wong, Jianan Yao, Ronghui Gu, Suman Jana | We introduce the Continuous Logic Network (CLN), a novel neural architecture for automatically learning loop invariants and general SMT formulas. | |
136 | NAS evaluation is frustratingly hard | Antoine Yang, Pedro M. Esperança, Fabio M. Carlucci | A study of how different components in the NAS pipeline contribute to the final accuracy. Also, a benchmark of 8 methods on 5 datasets. | code |
137 | Efficient and Information-Preserving Future Frame Prediction and Beyond | Wei Yu, Yichao Lu, Steve Easterbrook, Sanja Fidler | We propose CrevNet, a Conditionally Reversible Network that uses reversible architectures to build a bijective two-way autoencoder and its complementary recurrent predictor. | code |
138 | Order Learning and Its Application to Age Estimation | Kyungsun Lim, Nyeong-Ho Shin, Young-Yoon Lee, Chang-Su Kim | The notion of order learning is proposed and it is applied to regression problems in computer vision | |
139 | ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning | Weihao Yu, Zihang Jiang, Yanfei Dong, Jiashi Feng | We introduce ReClor, a reading comprehension dataset requiring logical reasoning, and find that current state-of-the-art models struggle with real logical reasoning, with performance near that of random guessing. | |
140 | AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures | Michael S. Ryoo, AJ Piergiovanni, Mingxing Tan, Anelia Angelova | We search for multi-stream neural architectures with better connectivity and spatio-temporal interactions for video understanding. | |
141 | Adversarially Robust Representations with Smooth Encoders | Taylan Cemgil, Sumedh Ghaisas, Krishnamurthy (Dj) Dvijotham, Pushmeet Kohli | We propose a method for computing adversarially robust representations in an entirely unsupervised way. | |
142 | From Variational to Deterministic Autoencoders | Partha Ghosh, Mehdi S. M. Sajjadi, Antonio Vergari, Michael Black, Bernhard Scholkopf | Deterministic regularized autoencoders can learn a smooth, meaningful latent space as VAEs do, without having to force an arbitrarily chosen prior (i.e., Gaussian). | |
143 | Computation Reallocation for Object Detection | Feng Liang, Ronghao Guo, Chen Lin, Ming Sun, Wei Wu, Junjie Yan, Wanli Ouyang | We propose CR-NAS to reallocate engaged computation resources across different resolutions and spatial positions. | |
144 | Finding and Visualizing Weaknesses of Deep Reinforcement Learning Agents | Christian Rupprecht, Cyril Ibrahim, Christopher J. Pal | We generate critical states of trained RL agents to visualize potential weaknesses. | |
145 | A Fair Comparison of Graph Neural Networks for Graph Classification | Federico Errica, Marco Podda, Davide Bacciu, Alessio Micheli | We provide a rigorous comparison of different Graph Neural Networks for graph classification. | code |
146 | Size-free generalization bounds for convolutional neural networks | Philip M. Long, Hanie Sedghi | We prove generalization bounds for convolutional neural networks that take weight-tying into account | |
147 | SAdam: A Variant of Adam for Strongly Convex Functions | Guanghui Wang, Shiyin Lu, Quan Cheng, Weiwei Tu, Lijun Zhang | A variant of Adam for strongly convex functions | code |
148 | Continual Learning with Bayesian Neural Networks for Non-Stationary Data | Richard Kurle, Botond Cseke, Alexej Klushyn, Patrick van der Smagt, Stephan Günnemann | This work addresses continual learning for non-stationary data, using Bayesian neural networks and memory-based online variational Bayes. | |
149 | Multiplicative Interactions and Where to Find Them | Siddhant M. Jayakumar, Jacob Menick, Wojciech M. Czarnecki, Jonathan Schwarz, Jack Rae, Simon Osindero, Yee Whye Teh, Tim Harley, Razvan Pascanu | We explore the role of multiplicative interaction as a unifying framework to describe a range of classical and modern neural network architectural motifs, such as gating, attention layers, hypernetworks, and dynamic convolutions amongst others. | |
150 | Few-Shot Learning on Graphs via Super-Classes Based on Graph Spectral Measures | Jatin Chauhan, Deepak Nathani, Manohar Kaul | We propose to study the problem of few-shot graph classification in graph neural networks (GNNs) to recognize unseen classes, given limited labeled graph examples. | code |
151 | On Computation and Generalization of Generative Adversarial Imitation Learning | Minshuo Chen, Yizhou Wang, Tianyi Liu, Zhuoran Yang, Xingguo Li, Zhaoran Wang, Tuo Zhao | To bridge such a gap between theory and practice, this paper investigates the theoretical properties of GAIL. | |
152 | A Target-Agnostic Attack on Deep Models: Exploiting Security Vulnerabilities of Transfer Learning | Shahbaz Rezaei, Xin Liu | In this paper, we show that without any additional knowledge other than the pre-trained model, an attacker can launch an effective and efficient brute force attack that can craft instances of input to trigger each target class with high confidence. | code |
153 | Low-Resource Knowledge-Grounded Dialogue Generation | Xueliang Zhao, Wei Wu, Chongyang Tao, Can Xu, Dongyan Zhao, Rui Yan | Motivated by the challenge in practice, we consider knowledge-grounded dialogue generation under a natural assumption that only limited training examples are available. | |
154 | Deep 3D Pan via Local adaptive “t-shaped” convolutions with global and local adaptive dilations | Juan Luis Gonzalez Bello, Munchurl Kim | Novel architecture for stereoscopic view synthesis at arbitrary camera shifts utilizing adaptive t-shaped kernels with adaptive dilations. | |
155 | Tree-Structured Attention with Hierarchical Accumulation | Xuan-Phi Nguyen, Shafiq Joty | In this paper, we attempt to bridge this gap with Hierarchical Accumulation to encode parse tree structures into self-attention at constant time complexity. | |
156 | The asymptotic spectrum of the Hessian of DNN throughout training | Arthur Jacot, Franck Gabriel, Clement Hongler | Description of the limiting spectrum of the Hessian of the loss surface of DNNs in the infinite-width limit. | |
157 | Actor-Critic Provably Finds Nash Equilibria of Linear-Quadratic Mean-Field Games | Zuyue Fu, Zhuoran Yang, Yongxin Chen, Zhaoran Wang | Actor-Critic method with function approximation finds the Nash equilibrium pairs in mean-field games with theoretical guarantee. | |
158 | In Search for a SAT-friendly Binarized Neural Network Architecture | Nina Narodytska, Hongce Zhang, Aarti Gupta, Toby Walsh | Formal analysis of Binarized Neural Networks | |
159 | Generative Ratio Matching Networks | Akash Srivastava, Kai Xu, Michael U. Gutmann, Charles Sutton | In this work, we take their insight of using kernels as fixed adversaries further and present a novel method for training deep generative models that does not involve saddlepoint optimization. | code |
160 | Learning to Represent Programs with Property Signatures | Augustus Odena, Charles Sutton | We represent a computer program using a set of simpler programs and use this representation to improve program synthesis techniques. | |
161 | V4D: 4D Convolutional Neural Networks for Video-level Representation Learning | Shiwen Zhang, Sheng Guo, Weilin Huang, Matthew R. Scott, Limin Wang | A novel 4D CNN structure for video-level representation learning, surpassing recent 3D CNNs. | |
162 | Option Discovery using Deep Skill Chaining | Akhil Bagaria, George Konidaris | We present a new hierarchical reinforcement learning algorithm which can solve high-dimensional goal-oriented tasks far more reliably than non-hierarchical agents and other state-of-the-art skill discovery techniques. | code |
163 | Quantifying the Cost of Reliable Photo Authentication via High-Performance Learned Lossy Representations | Pawel Korus, Nasir Memon | We learn an efficient lossy image compression codec which can be optimized to facilitate reliable photo manipulation detection at fractional cost in payload/quality and even at low bitrates. | |
164 | On the Variance of the Adaptive Learning Rate and Beyond | Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Jiawei Han | If warmup is the answer, what is the question? | code |
165 | Dynamical Distance Learning for Semi-Supervised and Unsupervised Skill Discovery | Kristian Hartikainen, Xinyang Geng, Tuomas Haarnoja, Sergey Levine | We show how to automatically learn dynamical distances in the reinforcement learning setting and use them to provide well-shaped reward functions for reaching new goals. | |
166 | A Theoretical Analysis of the Number of Shots in Few-Shot Learning | Tianshi Cao, Marc T Law, Sanja Fidler | The paper analyzes the effect of shot number on prototypical networks and proposes a robust method when the shot number differs from meta-training to meta-testing time. | |
167 | Unsupervised Model Selection for Variational Disentangled Representation Learning | Sunny Duan, Loic Matthey, Andre Saraiva, Nick Watters, Chris Burgess, Alexander Lerchner, Irina Higgins | We introduce a method for unsupervised disentangled model selection for VAE-based disentangled representation learning approaches. | |
168 | Extracting and Leveraging Feature Interaction Interpretations | Michael Tsang, Dehua Cheng, Hanpeng Liu, Xue Feng, Eric Zhou, Yan Liu | Proposed a method to extract and leverage interpretations of feature interactions | |
169 | Understanding the Limitations of Variational Mutual Information Estimators | Jiaming Song, Stefano Ermon | We theoretically show that, under some conditions, estimators such as MINE exhibit variance that could grow exponentially with the true amount of underlying MI. | |
170 | GENESIS: Generative Scene Inference and Sampling with Object-Centric Latent Representations | Martin Engelcke, Adam R. Kosiorek, Oiwi Parker Jones, Ingmar Posner | We present the first object-centric generative model of 3D visual scenes capable of both decomposing and generating scenes. | |
171 | Language GANs Falling Short | Massimo Caccia, Lucas Caccia, William Fedus, Hugo Larochelle, Joelle Pineau, Laurent Charlin | GANs have been applied to text generation and are believed to be SOTA. However, we propose a new evaluation protocol demonstrating that maximum-likelihood trained models are still better. | code |
172 | Stochastic Conditional Generative Networks with Basis Decomposition | Ze Wang, Xiuyuan Cheng, Guillermo Sapiro, Qiang Qiu | To address this, we introduce BasisGAN, a stochastic conditional multi-mode image generator. | |
173 | Learned Step Size Quantization | Steven K. Esser, Jeffrey L. McKinstry, Deepika Bablani, Rathinakumar Appuswamy, Dharmendra S. Modha | A method for learning the quantization configuration of low-precision networks that achieves state-of-the-art performance for quantized networks. | |
174 | On the “steerability” of generative adversarial networks | Ali Jahanian*, Lucy Chai*, Phillip Isola | We show that although current GANs can fit standard datasets very well, they still fall short of being comprehensive models of the visual manifold. | |
175 | Reinforced active learning for image segmentation | Arantxa Casanova, Pedro O. Pinheiro, Negar Rostamzadeh, Christopher J. Pal | Learning a labeling policy with reinforcement learning to reduce labeling effort for the task of semantic segmentation | |
176 | Sign Bits Are All You Need for Black-Box Attacks | Abdullah Al-Dujaili, Una-May O’Reilly | We present a sign-based, rather than magnitude-based, gradient estimation approach that shifts gradient estimation from continuous to binary black-box optimization. | code |
177 | Deep Semi-Supervised Anomaly Detection | Lukas Ruff, Robert A. Vandermeulen, Nico Görnitz, Alexander Binder, Emmanuel Müller, Klaus-Robert Müller, Marius Kloft | We introduce Deep SAD, a deep method for general semi-supervised anomaly detection that especially takes advantage of labeled anomalies. | code |
178 | Budgeted Training: Rethinking Deep Neural Network Training Under Resource Constraints | Mengtian Li, Ersin Yumer, Deva Ramanan | Introduce a formal setting for budgeted training and propose a budget-aware linear learning rate schedule | |
179 | Minimizing FLOPs to Learn Efficient Sparse Representations | Biswajit Paria, Chih-Kuan Yeh, Ning Xu, Barnabas Poczos, Pradeep Ravikumar, Ian E.H. Yen | We propose an approach to learn sparse high dimensional representations that are fast to search, by incorporating a surrogate of the number of operations directly into the loss function. | |
180 | Reanalysis of Variance Reduced Temporal Difference Learning | Tengyu Xu, Zhe Wang, Yi Zhou, Yingbin Liang | This paper provides a rigorous study of variance-reduced TD learning and characterizes its advantage over vanilla TD learning | |
181 | Imitation Learning via Off-Policy Distribution Matching | Ilya Kostrikov, Ofir Nachum, Jonathan Tompson | In this work, we show how the original distribution ratio estimation objective may be transformed in a principled manner to yield a completely off-policy objective. | |
182 | Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML | Aniruddh Raghu, Maithra Raghu, Samy Bengio, Oriol Vinyals | The success of MAML relies on feature reuse from the meta-initialization, which also yields a natural simplification of the algorithm, with the inner loop removed for the network body, as well as other insights on the head and body. | |
183 | Augmenting Genetic Algorithms with Deep Neural Networks for Exploring the Chemical Space | AkshatKumar Nigam, Pascal Friederich, Mario Krenn, Alan Aspuru-Guzik | Tackling inverse design via genetic algorithms augmented with deep neural networks. | |
184 | Improved Sample Complexities for Deep Neural Networks and Robust Classification via an All-Layer Margin | Colin Wei, Tengyu Ma | We propose a new notion of margin that has a direct relationship with neural net generalization, and obtain improved generalization bounds for neural nets and robust classification by analyzing this margin. | |
185 | Identity Crisis: Memorization and Generalization Under Extreme Overparameterization | Chiyuan Zhang, Samy Bengio, Moritz Hardt, Michael C. Mozer, Yoram Singer | We study the interplay between memorization and generalization of overparameterized networks in the extreme case of a single training example and an identity-mapping task. | |
186 | ReMixMatch: Semi-Supervised Learning with Distribution Matching and Augmentation Anchoring | David Berthelot, Nicholas Carlini, Ekin D. Cubuk, Alex Kurakin, Kihyuk Sohn, Han Zhang, Colin Raffel | We introduce Distribution Matching and Augmentation Anchoring, two improvements to MixMatch which produce state-of-the-art results and enable surprisingly strong performance with only 40 labels on CIFAR-10 and SVHN. | |
187 | Adaptive Structural Fingerprints for Graph Attention Networks | Kai Zhang, Yaokang Zhu, Jun Wang, Jie Zhang | Exploiting rich structural details in graph-structured data via adaptive "structural fingerprints" | code |
188 | CAQL: Continuous Action Q-Learning | Moonkyung Ryu, Yinlam Chow, Ross Anderson, Christian Tjandraatmadja, Craig Boutilier | A general framework of value-based reinforcement learning for continuous control | |
189 | Learning Heuristics for Quantified Boolean Formulas through Reinforcement Learning | Gil Lederman, Markus Rabe, Sanjit Seshia, Edward A. Lee | We use RL to automatically learn a branching heuristic within a state-of-the-art QBF solver, on industrial problems. | |
190 | Pure and Spurious Critical Points: a Geometric Study of Linear Networks | Matthew Trager, Kathlén Kohn, Joan Bruna | We introduce a natural distinction between pure critical points, which only depend on the functional space, and spurious critical points, which arise from the parameterization. | code |
191 | Neural Text Generation With Unlikelihood Training | Sean Welleck, Ilia Kulikov, Stephen Roller, Emily Dinan, Kyunghyun Cho, Jason Weston | We propose a new objective, unlikelihood training, which forces unlikely generations to be assigned lower probability by the model (a minimal sketch of this objective appears after the table). | code |
192 | Semi-Supervised Generative Modeling for Controllable Speech Synthesis | Raza Habib, Soroosh Mariooryad, Matt Shannon, Eric Battenberg, RJ Ryan, Daisy Stanton, David Kao, Tom Bagby | We present a novel generative model that combines state-of-the-art neural text-to-speech (TTS) with semi-supervised probabilistic latent variable models. | |
193 | Dynamic Time Lag Regression: Predicting What & When | Mandar Chandorkar, Cyril Furtlehner, Bala Poduval, Enrico Camporeale, Michele Sebag | We propose a new regression framework for temporal phenomena having non-stationary time-lag dependencies. | code |
194 | Scalable Model Compression by Entropy Penalized Reparameterization | Deniz Oktay, Johannes Ballé, Abhinav Shrivastava, Saurabh Singh | An end-to-end trainable model compression method optimizing accuracy jointly with the expected model size. | |
195 | AMRL: Aggregated Memory For Reinforcement Learning | Jacob Beck, Kamil Ciosek, Sam Devlin, Sebastian Tschiatschek, Cheng Zhang, Katja Hofmann | In deep RL, order-invariant functions can be used in conjunction with standard memory modules to mitigate gradient decay and improve resilience to noise. | |
196 | Efficient Riemannian Optimization on the Stiefel Manifold via the Cayley Transform | Jun Li, Fuxin Li, Sinisa Todorovic | To address this challenge, we present two main contributions: (1) A new efficient retraction map based on an iterative Cayley transform for optimization updates, and (2) An implicit vector transport mechanism based on the combination of a projection of the momentum and the Cayley transform on the Stiefel manifold. | |
197 | Unpaired Point Cloud Completion on Real Scans Using Adversarial Training | Xuelin Chen, Baoquan Chen, Niloy J. Mitra | We develop a first approach that works directly on input point clouds, does not require paired training data, and hence can directly be applied to real scans for scan completion. | |
198 | Adjustable Real-time Style Transfer | Mohammad Babaeizadeh, Golnaz Ghiasi | Stochastic style transfer with adjustable features. | |
199 | Stochastic Weight Averaging in Parallel: Large-Batch Training That Generalizes Well | Vipul Gupta, Santiago Akle Serrano, Dennis DeCoste | We propose SWAP, a distributed algorithm for large-batch training of neural networks. | |
200 | Short and Sparse Deconvolution — A Geometric Approach | Yenson Lau, Qing Qu, Han-Wen Kuo, Pengcheng Zhou, Yuqian Zhang, John Wright | We leverage the key ideas from this theory (sphere constraints, data-driven initialization) to develop a {\em practical} algorithm, which performs well on data arising from a range of application areas. | |
201 | Selection via Proxy: Efficient Data Selection for Deep Learning | Cody Coleman, Christopher Yeh, Stephen Mussmann, Baharan Mirzasoleiman, Peter Bailis, Percy Liang, Jure Leskovec, Matei Zaharia | We can significantly improve the computational efficiency of data selection in deep learning by using a much smaller proxy model to perform the selection. | |
202 | Global Relational Models of Source Code | Vincent J. Hellendoorn, Charles Sutton, Rishabh Singh, Petros Maniatis | Models of source code that combine global and structural features learn more powerful representations of programs. | |
203 | Detecting Extrapolation with Local Ensembles | David Madras, James Atwood, Alexander D'Amour | We present local ensembles, a method for detecting extrapolation in trained models, which approximates the variance of an ensemble using local second-order information. | |
204 | Learning to Link | Maria-Florina Balcan, Travis Dick, Manuel Lang | We show how to use data to automatically learn low-loss linkage procedures and metrics for specific clustering applications. | |
205 | Adversarially robust transfer learning | Ali Shafahi, Parsa Saadatpanah, Chen Zhu, Amin Ghiasi, Christoph Studer, David Jacobs, Tom Goldstein | Robust models have robust feature extractors which can be useful for transferring robustness to other domains | |
206 | Overlearning Reveals Sensitive Attributes | Congzheng Song, Vitaly Shmatikov | Overlearning means that a model trained for a seemingly simple objective implicitly learns to recognize attributes and concepts that are (1) not part of the learning objective, and (2) sensitive from a privacy or bias perspective. | code |
207 | Bridging Mode Connectivity in Loss Landscapes and Adversarial Robustness | Pu Zhao, Pin-Yu Chen, Payel Das, Karthikeyan Natesan Ramamurthy, Xue Lin | A novel approach using mode connectivity in loss landscapes to mitigate adversarial effects, repair tampered models and evaluate adversarial robustness | |
208 | Differentially Private Meta-Learning | Jeffrey Li, Mikhail Khodak, Sebastian Caldas, Ameet Talwalkar | We conduct the first formal study of privacy in this setting and formalize the notion of task-global differential privacy as a practical relaxation of more commonly studied threat models. | |
209 | One-Shot Pruning of Recurrent Neural Networks by Jacobian Spectrum Evaluation | Shunshi Zhang, Bradly C. Stadie | New Objective for One-Shot Pruning Recurrent Neural Networks | |
210 | Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples | Eleni Triantafillou, Tyler Zhu, Vincent Dumoulin, Pascal Lamblin, Utku Evci, Kelvin Xu, Ross Goroshin, Carles Gelada, Kevin Swersky, Pierre-Antoine Manzagol, Hugo Larochelle | We propose a new large-scale diverse environment for few-shot learning, and evaluate popular models’ performance on it, revealing important research challenges. | code |
211 | Are Transformers universal approximators of sequence-to-sequence functions? | Chulhee Yun, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank Reddi, Sanjiv Kumar | We prove that Transformer networks are universal approximators of sequence-to-sequence functions. | |
212 | Going beyond Token-level Pre-training for Embedding-based Large-scale Retrieval | Wei-Cheng Chang, Felix X. Yu, Yin-Wen Chang, Yiming Yang, Sanjiv Kumar | We consider large-scale retrieval problems such as question-answering retrieval and present a comprehensive study of how different sentence-level pre-training tasks improve on BERT-style token-level pre-training for two-tower Transformer models. | |
213 | Deep Imitative Models for Flexible Inference, Planning, and Control | Nicholas Rhinehart, Rowan McAllister, Sergey Levine | In this paper, we propose Imitative Models to combine the benefits of IL and goal-directed planning: probabilistic predictive models of desirable behavior able to plan interpretable expert-like trajectories to achieve specified goals. | |
214 | CM3: Cooperative Multi-goal Multi-stage Multi-agent Reinforcement Learning | Jiachen Yang, Alireza Nakhaei, David Isele, Kikuo Fujimura, Hongyuan Zha | A modular method for fully cooperative multi-goal multi-agent reinforcement learning, based on curriculum learning for efficient exploration and credit assignment for action-goal interactions. | |
215 | Robust And Interpretable Blind Image Denoising Via Bias-Free Convolutional Neural Networks | Sreyas Mohan, Zahra Kadkhodaie, Eero P. Simoncelli, Carlos Fernandez-Granda | We study the generalization properties of deep convolutional neural networks for image denoising in the presence of varying noise levels. | |
216 | Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets | Mingrui Liu, Youssef Mroueh, Jerret Ross, Wei Zhang, Xiaodong Cui, Payel Das, Tianbao Yang | This paper provides a novel analysis of adaptive gradient algorithms for solving non-convex non-concave min-max problems such as GANs, and explains via empirical studies why adaptive gradient methods outperform their non-adaptive counterparts. | |
217 | DeepV2D: Video to Depth with Differentiable Structure from Motion | Zachary Teed, Jia Deng | DeepV2D predicts depth from a video clip by composing elements of classical SfM into a fully differentiable network. | |
218 | Learning Space Partitions for Nearest Neighbor Search | Yihe Dong, Piotr Indyk, Ilya Razenshteyn, Tal Wagner | We use supervised learning (and in particular deep learning) to produce better space partitions for fast nearest neighbor search. | code |
219 | Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP | Haonan Yu, Sergey Edunov, Yuandong Tian, Ari S. Morcos | We find that the lottery ticket phenomenon is present in both NLP and RL, and that it can be used to train compressed Transformers to high performance | |
220 | Sign-OPT: A Query-Efficient Hard-label Adversarial Attack | Minhao Cheng, Simranjit Singh, Patrick H. Chen, Pin-Yu Chen, Sijia Liu, Cho-Jui Hsieh | In this paper, we adopt the same optimization formulation but propose to directly estimate the sign of the gradient at any direction instead of the gradient itself, which enjoys the benefit of a single query. Using this single-query oracle for retrieving the sign of a directional derivative, we develop a novel query-efficient Sign-OPT approach for hard-label black-box attacks. | |
221 | Toward Amortized Ranking-Critical Training For Collaborative Filtering | Sam Lobel, Chunyuan Li, Jianfeng Gao, Lawrence Carin | We apply the actor-critic methodology from reinforcement learning to collaborative filtering, resulting in improved performance across a variety of latent-variable models | code |
222 | Intrinsic Motivation for Encouraging Synergistic Behavior | Rohan Chitnis, Shubham Tulsiani, Saurabh Gupta, Abhinav Gupta | We propose a formulation of intrinsic motivation that is suitable as an exploration bias in multi-agent sparse-reward synergistic tasks, by encouraging agents to affect the world in ways that would not be achieved if they were acting individually. | |
223 | Chameleon: Adaptive Code Optimization For Expedited Deep Neural Network Compilation | Byung Hoon Ahn, Prannoy Pilligundla, Hadi Esmaeilzadeh | Reinforcement learning and Adaptive Sampling for Optimized Compilation of Deep Neural Networks. | |
224 | The function of contextual illusions | Drew Linsley, Junkyung Kim, Alekh Ashok, Thomas Serre | Contextual illusions are a feature, not a bug, of neural routines optimized for contour detection. | code |
225 | Locality and Compositionality in Zero-Shot Learning | Tristan Sylvain, Linda Petrini, Devon Hjelm | An analysis of the effects of compositionality and locality on representation learning for zero-shot learning. | |
226 | Understanding Knowledge Distillation in Non-autoregressive Machine Translation | Chunting Zhou, Jiatao Gu, Graham Neubig | We systematically examine why knowledge distillation is crucial to the training of non-autoregressive translation (NAT) models, and propose methods to further improve the distilled data to best match the capacity of an NAT model. | |
227 | Thieves on Sesame Street! Model Extraction of BERT-based APIs | Kalpesh Krishna, Gaurav Singh Tomar, Ankur P. Parikh, Nicolas Papernot, Mohit Iyyer | Outputs of modern NLP APIs on nonsensical text provide strong signals about model internals, allowing adversaries to steal the APIs. | |
228 | Fast is better than free: Revisiting adversarial training | Eric Wong, Leslie Rice, J. Zico Kolter | FGSM-based adversarial training, with randomization, works just as well as PGD-based adversarial training: we can use this to train a robust classifier in 6 minutes on CIFAR10, and 12 hours on ImageNet, on a single machine. | code |
229 | DBA: Distributed Backdoor Attacks against Federated Learning | Chulin Xie, Keli Huang, Pin-Yu Chen, Bo Li | We propose a novel distributed backdoor attack on federated learning and show that it is not only more effective than standard centralized attacks, but also harder to defend against with existing robust FL methods. | |
230 | DeFINE: Deep Factorized Input Word Embeddings for Neural Sequence Modeling | Sachin Mehta, Rik Koncel-Kedziorski, Mohammad Rastegari, Hannaneh Hajishirzi | DeFINE uses a deep, hierarchical, sparse network with new skip connections to learn better word embeddings efficiently. | |
231 | Sampling-Free Learning of Bayesian Quantized Neural Networks | Jiahao Su, Milan Cvitkovic, Furong Huang | We propose Bayesian quantized networks, for which we learn a posterior distribution over their quantized parameters. | |
232 | Learning to solve the credit assignment problem | Benjamin James Lansdell, Prashanth Ravi Prakash, Konrad Paul Kording | Perturbations can be used to train feedback weights to learn in fully connected and convolutional neural networks | |
233 | Four Things Everyone Should Know to Improve Batch Normalization | Cecilia Summers, Michael J. Dinneen | Four things that improve batch normalization across all batch sizes | code |
234 | Pseudo-LiDAR++: Accurate Depth for 3D Object Detection in Autonomous Driving | Yurong You*, Yan Wang*, Wei-Lun Chao*, Divyansh Garg, Geoff Pleiss, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger | In this paper, we provide substantial advances to the pseudo-LiDAR framework through improvements in stereo depth estimation. | |
235 | SlowMo: Improving Communication-Efficient Distributed SGD with Slow Momentum | Jianyu Wang, Vinayak Tantia, Nicolas Ballas, Michael Rabbat | SlowMo improves the optimization and generalization performance of communication-efficient decentralized algorithms without sacrificing speed. | |
236 | MetaPix: Few-Shot Video Retargeting | Jessica Lee, Deva Ramanan, Rohit Girdhar | Video retargeting typically requires a large amount of target data to be effective, which may not always be available; we propose a meta-learning approach that improves over popular baselines while producing temporally coherent frames. | |
237 | Learning to Learn by Zeroth-Order Oracle | Yangjun Ruan, Yuanhao Xiong, Sashank Reddi, Sanjiv Kumar, Cho-Jui Hsieh | A novel variant of the learning-to-learn framework for zeroth-order optimization that learns both the update rule and the Gaussian sampling rule. | code |
238 | Decentralized Distributed PPO: Mastering PointGoal Navigation | Erik Wijmans, Abhishek Kadian, Ari Morcos, Stefan Lee, Irfan Essa, Devi Parikh, Manolis Savva, Dhruv Batra | We present Decentralized Distributed Proximal Policy Optimization (DD-PPO), a method for distributed reinforcement learning in resource-intensive simulated environments. | code |
239 | PAC Confidence Sets for Deep Neural Networks via Calibrated Prediction | Sangdon Park, Osbert Bastani, Nikolai Matni, Insup Lee | We propose an algorithm combining calibrated prediction and generalization bounds from learning theory to construct confidence sets for deep neural networks with PAC guarantees—i.e., the confidence set for a given input contains the true label with high probability. | |
240 | Precision Gating: Improving Neural Network Efficiency with Dynamic Dual-Precision Activations | Yichi Zhang, Ritchie Zhao, Weizhe Hua, Nayun Xu, Edward Suh, Zhiru Zhang | We propose precision gating, an end-to-end trainable dual-precision activation quantization technique for deep neural networks. | code |
241 | Locally Constant Networks | Guang-He Lee, Tommi S. Jaakkola | A novel neural architecture which implicitly learns an (oblique) decision tree. | |
242 | Span Recovery for Deep Neural Networks with Applications to Input Obfuscation | Rajesh Jayaram, David P. Woodruff, Qiuyi Zhang | We provably recover the span of a deep multi-layered neural network with latent structure and empirically apply efficient span recovery algorithms to attack networks by obfuscating inputs. | code |
243 | Improving Neural Language Generation with Spectrum Control | Lingxiao Wang, Jing Huang, Kevin Huang, Ziniu Hu, Guangtao Wang, Quanquan Gu | In this paper, we propose a novel spectrum control approach to address this degeneration problem. | |
244 | Learn to Explain Efficiently via Neural Logic Inductive Learning | Yuan Yang, Le Song | An efficient differentiable ILP model that learns first-order logic rules that can explain the data. | code |
245 | Improved memory in recurrent neural networks with sequential non-normal dynamics | Emin Orhan, Xaq Pitkow | A feedforward, chain-like motif (1->2->3->…) is proposed as a useful inductive bias for better memory in RNNs; amazingly, it works. | |
246 | Neural Module Networks for Reasoning over Text | Nitish Gupta, Kevin Lin, Dan Roth, Sameer Singh, Matt Gardner | This paper extends neural module networks to answer compositional questions against text by introducing differentiable modules that perform reasoning over text and symbols in a probabilistic manner. | |
247 | Higher-Order Function Networks for Learning Composable 3D Object Representations | Eric Mitchell, Selim Engin, Volkan Isler, Daniel D Lee | Neural nets can encode complex 3D objects into the parameters of other (surprisingly small) neural nets | |
248 | Variational Hetero-Encoder Randomized GANs for Joint Image-Text Modeling | Hao Zhang, Bo Chen, Long Tian, Zhengjue Wang, Mingyuan Zhou | A novel Bayesian deep learning framework that captures and relates hierarchical semantic and visual concepts, performing well on a variety of image and text modeling and generation tasks. | code |
249 | Towards Fast Adaptation of Neural Architectures with Meta Learning | Dongze Lian, Yin Zheng, Yintao Xu, Yanxiong Lu, Leyu Lin, Peilin Zhao, Junzhou Huang, Shenghua Gao | In order to tackle the transferability of NAS and conduct fast adaptation of neural architectures, we propose a novel Transferable Neural Architecture Search method based on meta-learning in this paper, which is termed as T-NAS. | |
250 | Graph Constrained Reinforcement Learning for Natural Language Action Spaces | Prithviraj Ammanabrolu, Matthew Hausknecht | We present KG-A2C, a reinforcement learning agent that builds a dynamic knowledge graph while exploring and generates natural language using a template-based action space – outperforming all current agents on a wide set of text-based games. | |
251 | Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control | Nir Levine, Yinlam Chow, Rui Shu, Ang Li, Mohammad Ghavamzadeh, Hung Bui | Learning embeddings for control with high-dimensional observations. | |
252 | Augmenting Non-Collaborative Dialog Systems with Explicit Semantic and Strategic Dialog History | Yiheng Zhou, Yulia Tsvetkov, Alan W Black, Zhou Yu | We propose to model both semantic and tactic history using finite state transducers (FSTs). | |
253 | BERTScore: Evaluating Text Generation with BERT | Tianyi Zhang*, Varsha Kishore*, Felix Wu*, Kilian Q. Weinberger, Yoav Artzi | We propose BERTScore, an automatic evaluation metric for text generation, which correlates better with human judgments and provides stronger model selection performance than existing metrics. | |
254 | Neural Execution of Graph Algorithms | Petar Velickovic, Rex Ying, Matilde Padovano, Raia Hadsell, Charles Blundell | We supervise graph neural networks to imitate intermediate and step-wise outputs of classical graph algorithms, recovering highly favourable insights. | |
255 | On the Need for Topology-Aware Generative Models for Manifold-Based Defenses | Uyeong Jang, Susmit Jha, Somesh Jha | This paper asks the following question: do the generative models used in manifold-based defenses need to be topology-aware? Our paper suggests the answer is yes. | |
256 | FSNet: Compression of Deep Convolutional Neural Networks by Filter Summary | Yingzhen Yang, Jiahui Yu, Nebojsa Jojic, Jun Huan, Thomas S. Huang | We present a novel method for compressing deep Convolutional Neural Networks (CNNs) by weight sharing through a new representation of convolutional filters. | |
257 | Capsules with Inverted Dot-Product Attention Routing | Yao-Hung Hubert Tsai, Nitish Srivastava, Hanlin Goh, Ruslan Salakhutdinov | We present a new routing method for Capsule networks that performs on par with ResNet-18 on CIFAR-10/CIFAR-100. | |
258 | Composition-based Multi-Relational Graph Convolutional Networks | Shikhar Vashishth, Soumya Sanyal, Vikram Nitin, Partha Talukdar | A Composition-based Graph Convolutional framework for multi-relational graphs. | code |
259 | Gradient-Based Neural DAG Learning | Sébastien Lachapelle, Philippe Brouillard, Tristan Deleu, Simon Lacoste-Julien | We propose a new score-based approach to structure/causal learning, leveraging neural networks and a recent continuous constrained formulation of this problem. | code |
260 | The Local Elasticity of Neural Networks | Hangfeng He, Weijie Su | This paper presents a phenomenon in neural networks that we refer to as local elasticity. | |
261 | Composing Task-Agnostic Policies with Deep Reinforcement Learning | Ahmed H. Qureshi, Jacob J. Johnson, Yuzhe Qin, Taylor Henderson, Byron Boots, Michael C. Yip | We propose a novel reinforcement learning-based skill transfer and composition method that composes the agent’s primitive policies to solve unseen tasks. | code |
262 | Convergence Behaviour of Some Gradient-Based Methods on Bilinear Zero-Sum Games | Guojun Zhang, Yaoliang Yu | We systematically analyze the convergence behaviour of popular gradient algorithms for solving bilinear games, with both simultaneous and alternating updates. | code |
263 | Discovering Motor Programs by Recomposing Demonstrations | Tanmay Shankar, Shubham Tulsiani, Lerrel Pinto, Abhinav Gupta | We learn a space of motor primitives from unannotated robot demonstrations, and show these primitives are semantically meaningful and can be composed for new robot tasks. | |
264 | Learning from Explanations with Neural Module Execution Tree | Yujia Qin, Ziqi Wang, Wenxuan Zhou, Jun Yan, Qinyuan Ye, Xiang Ren, Leonardo Neves, Zhiyuan Liu | In this paper, we propose a novel Neural Module Execution Tree (NMET) framework for augmenting sequence classification with natural language explanations. | code |
265 | Jelly Bean World: A Testbed for Never-Ending Learning | Emmanouil Antonios Platanios, Abulhair Saparov, Tom Mitchell | To this end, we propose the Jelly Bean World testbed. | |
266 | Coherent Gradients: An Approach to Understanding Generalization in Gradient Descent-based Optimization | Sat Chatterjee | We propose a hypothesis for why gradient descent generalizes based on how per-example gradients interact with each other. | |
267 | Probabilistic Connection Importance Inference and Lossless Compression of Deep Neural Networks | Xin Xing, Long Sha, Pengyu Hong, Zuofeng Shang, Jun S. Liu | We here propose a probabilistic importance inference approach for pruning DNNs. | |
268 | MEMO: A Deep Network for Flexible Combination of Episodic Memories | Andrea Banino, Adrià Puigdomènech Badia, Raphael Köster, Martin J. Chadwick, Vinicius Zambaldi, Demis Hassabis, Caswell Barry, Matthew Botvinick, Dharshan Kumaran, Charles Blundell | A memory architecture that supports inferential reasoning. | |
269 | Economy Statistical Recurrent Units For Inferring Nonlinear Granger Causality | Saurabh Khanna, Vincent Y. F. Tan | A new recurrent neural network architecture for detecting pairwise Granger causality between nonlinearly interacting time series. | code |
270 | Bayesian Meta Sampling for Fast Uncertainty Adaptation | Zhenyi Wang, Yang Zhao, Ping Yu, Ruiyi Zhang, Changyou Chen | We propose a Bayesian meta-sampling method for adapting model uncertainty in meta-learning. | |
271 | Non-Autoregressive Dialog State Tracking | Hung Le, Steven C.H. Hoi, Richard Socher | We propose the first non-autoregressive neural model for Dialogue State Tracking (DST), achieving state-of-the-art accuracy (49.04%) on the MultiWOZ 2.1 benchmark and reducing inference latency by an order of magnitude. | |
272 | Extreme Tensoring for Low-Memory Preconditioning | Xinyi Chen, Naman Agarwal, Elad Hazan, Cyril Zhang, Yi Zhang | We propose \emph{extreme tensoring} for high-dimensional stochastic optimization, showing that an optimizer needs very little memory to benefit from adaptive preconditioning. | |
273 | Incremental RNN: A Dynamical View | Anil Kag, Ziming Zhang, Venkatesh Saligrama | Incremental RNNs resolve the exploding/vanishing gradient problem by updating state vectors based on the difference between the previous state and the state predicted by an ODE. | |
274 | The Early Phase of Neural Network Training | Jonathan Frankle, David J. Schwab, Ari S. Morcos | We thoroughly investigate neural network learning dynamics over the early phase of training, finding that these changes are crucial and difficult to approximate, though extended pretraining can recover them. | |
275 | NeurQuRI: Neural Question Requirement Inspector for Answerability Prediction in Machine Reading Comprehension | Seohyun Back, Sai Chetan Chinthakindi, Akhil Kedia, Haejun Lee, Jaegul Choo | We propose a neural question requirement inspection model called NeurQuRI that extracts a list of conditions from the question, each of which should be satisfied by the candidate answer generated by an MRC model. | |
276 | Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization | Junjie Yan, Ruosi Wan, Xiangyu Zhang, Wei Zhang, Yichen Wei, Jian Sun | We propose a novel normalization method to handle small batch size cases. | code |
277 | Single episode transfer for differing environmental dynamics in reinforcement learning | Jiachen Yang, Brenden Petersen, Hongyuan Zha, Daniel Faissol | Single episode policy transfer in a family of environments with related dynamics, via optimized probing for rapid inference of latent variables and immediate execution of a universal policy. | |
278 | Generalization through Memorization: Nearest Neighbor Language Models | Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, Mike Lewis | We extend a pre-trained neural language model by linearly interpolating it with a k-nearest neighbors model, achieving new state-of-the-art results on Wikitext-103 with no additional training. | |
279 | Transformer-XH: Multi-hop question answering with eXtra Hop attention | Chen Zhao, Chenyan Xiong, Corby Rosset, Xia Song, Paul Bennett, Saurabh Tiwary | We present Transformer-XH, which upgrades Transformer with eXtra Hop attentions to intrinsically model structured texts in a data driven way. It leads to a simpler yet state-of-the-art multi-hop QA system. | code |
280 | Synthesizing Programmatic Policies that Inductively Generalize | Jeevana Priya Inala, Osbert Bastani, Zenna Tavares, Armando Solar-Lezama | An approach to learn program policies that inductively generalize. | |
281 | Decoding As Dynamic Programming For Recurrent Autoregressive Models | Najam Zaidi, Trevor Cohn, Gholamreza Haffari | Approximate inference using dynamic programming for autoregressive models. | |
282 | Deep Double Descent: Where Bigger Models and More Data Hurt | Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, Ilya Sutskever | We demonstrate, and characterize, realistic settings where bigger models are worse, and more data hurts. | |
283 | Intriguing Properties of Adversarial Training at Scale | Cihang Xie, Alan Yuille | The first rigorous diagnosis of large-scale adversarial training on ImageNet. | |
284 | Shifted and Squeezed 8-bit Floating Point format for Low-Precision Training of Deep Neural Networks | Leopold Cambier, Anahita Bhiwandiwalla, Ting Gong, Oguz H. Elibol, Mehran Nekuii, Hanlin Tang | We propose a novel 8-bit format that eliminates the need for loss scaling, stochastic rounding, and other low precision techniques | |
285 | Distributed Bandit Learning: Near-Optimal Regret with Efficient Communication | Yuanhao Wang, Jiachen Hu, Xiaoyu Chen, Liwei Wang | Our goal is to design communication protocols with near-optimal regret and little communication cost, which is measured by the total amount of transmitted data. | |
286 | Biologically inspired sleep algorithm for increased generalization and adversarial robustness in deep neural networks | Timothy Tadros, Giri Krishnan, Ramyaa Ramyaa, Maxim Bazhenov | We describe a biologically inspired sleep algorithm that increases an artificial neural network’s ability to extract the gist of a training set and improves its robustness to adversarial attacks and general distortions. | |
287 | A Closer Look at the Optimization Landscapes of Generative Adversarial Networks | Hugo Berard, Gauthier Gidel, Amjad Almahairi, Pascal Vincent, Simon Lacoste-Julien | By proposing new visualization techniques, we give better insight into GAN optimization in practical settings; we show that GANs on challenging datasets exhibit rotational behavior and do not converge to Nash equilibria. | code |
288 | On the Global Convergence of Training Deep Linear ResNets | Difan Zou, Philip M. Long, Quanquan Gu | Under certain conditions on the input and output linear transformations, both GD and SGD can achieve global convergence for training deep linear ResNets. | |
289 | Towards a Deep Network Architecture for Structured Smoothness | Haroun Habeeb, Oluwasanmi Koyejo | A feedforward layer to incorporate structured smoothness into a deep learning model | |
290 | Revisiting Self-Training for Neural Sequence Generation | Junxian He, Jiatao Gu, Jiajun Shen, Marc’Aurelio Ranzato | We revisit self-training as a semi-supervised learning method for neural sequence generation problems, and show that self-training can be quite successful with injected noise. | |
291 | Denoising and Regularization via Exploiting the Structural Bias of Convolutional Generators | Reinhard Heckel, Mahdi Soltanolkotabi | In this paper we take a step towards demystifying this experimental phenomenon by attributing the effect to particular architectural choices of convolutional networks, namely fixed convolutional operations. | code |
292 | Variational Autoencoders for Highly Multivariate Spatial Point Processes Intensities | Baichuan Yuan, Xiaowei Wang, Andrea Bertozzi, Hongxia Yang | To bridge this gap, we introduce a declustering based hidden variable model that leads to an efficient inference procedure via a variational autoencoder (VAE). | |
293 | Model-Augmented Actor-Critic: Backpropagating through Paths | Ignasi Clavera, Yao Fu, Pieter Abbeel | Policy gradient through backpropagation through time using learned models and Q-functions. SOTA results in reinforcement learning benchmark environments. | |
294 | LambdaNet: Probabilistic Type Inference using Graph Neural Networks | Jiayi Wei, Maruth Goyal, Greg Durrett, Isil Dillig | This paper proposes a probabilistic type inference scheme for TypeScript based on a graph neural network. | |
295 | From Inference to Generation: End-to-end Fully Self-supervised Generation of Human Face from Speech | Hyeong-Seok Choi, Changdae Park, Kyogu Lee | This paper proposes a method of end-to-end multi-modal generation of human face from speech based on a self-supervised learning framework. | |
296 | Visual Representation Learning with 3D View-Contrastive Inverse Graphics Networks | Adam W. Harley, Fangyu Li, Shrinidhi K. Lakshmikanth, Xian Zhou, Hsiao-Yu Fish Tung, Katerina Fragkiadaki | We show that with the right loss and architecture, view-predictive learning improves 3D object detection. | |
297 | Decoupling Representation and Classifier for Long-Tailed Recognition | Bingyi Kang, Saining Xie, Marcus Rohrbach, Zhicheng Yan, Albert Gordo, Jiashi Feng, Yannis Kalantidis | In this work, we decouple the learning procedure into representation learning and classification, and systematically explore how different balancing strategies affect them for long-tailed recognition. | |
298 | Robust Reinforcement Learning for Continuous Control with Model Misspecification | Daniel J. Mankowitz, Nir Levine, Rae Jeong, Abbas Abdolmaleki, Jost Tobias Springenberg, Yuanyuan Shi, Jackie Kay, Todd Hester, Timothy Mann, Martin Riedmiller | A framework for incorporating robustness to model misspecification into continuous control Reinforcement Learning algorithms. | |
299 | Cross-lingual Alignment vs Joint Training: A Comparative Study and A Simple Unified Framework | Zirui Wang*, Jiateng Xie*, Ruochen Xu, Yiming Yang, Graham Neubig, Jaime G. Carbonell | We conduct a comparative study of cross-lingual alignment vs joint training methods and unify these two previously exclusive paradigms in a new framework. | |
300 | Training Recurrent Neural Networks Online by Learning Explicit State Variables | Somjit Nath, Vincent Liu, Alan Chan, Adam White, Martha White | In this work, we reformulate the RNN training objective to explicitly learn state vectors; this breaks the dependence across time and so avoids the need to estimate gradients far back in time. | |
301 | Uncertainty-guided Continual Learning with Bayesian Neural Networks | Sayna Ebrahimi, Mohamed Elhoseiny, Trevor Darrell, Marcus Rohrbach | A regularization-based approach for continual learning using Bayesian neural networks to predict parameters’ importance | |
302 | Curriculum Loss: Robust Learning and Generalization against Label Corruption | Yueming Lyu, Ivor W. Tsang | A novel loss that bridges curriculum learning and robust learning. | |
303 | Picking Winning Tickets Before Training by Preserving Gradient Flow | Chaoqi Wang, Guodong Zhang, Roger Grosse | We introduce a criterion for pruning networks before training by preserving gradient flow. | |
304 | Generative Models for Effective ML on Private, Decentralized Datasets | Sean Augenstein, H. Brendan McMahan, Daniel Ramage, Swaroop Ramaswamy, Peter Kairouz, Mingqing Chen, Rajiv Mathews, Blaise Aguera y Arcas | Generative Models + Federated Learning + Differential Privacy gives data scientists a way to analyze private, decentralized data (e.g., on mobile devices) where direct inspection is prohibited. | code |
305 | Inductive representation learning on temporal graphs | Da Xu, Chuanwei Ruan, Evren Korpeoglu, Sushant Kumar, Kannan Achan | We propose the temporal graph attention (TGAT) layer to effectively aggregate temporal-topological neighborhood features and to learn time-feature interactions. | code |
306 | BatchEnsemble: an Alternative Approach to Efficient Ensemble and Lifelong Learning | Yeming Wen, Dustin Tran, Jimmy Ba | We introduce BatchEnsemble, an efficient method for ensembling and lifelong learning that can be used to improve the accuracy and uncertainty estimates of any neural network, much like typical ensemble methods. | |
307 | Towards neural networks that provably know when they don’t know | Alexander Meinke, Matthias Hein | In this paper we propose a new approach to out-of-distribution (OOD) detection that overcomes both problems. | |
308 | Iterative energy-based projection on a normal data manifold for anomaly localization | David Dehaene, Oriel Frigo, Sébastien Combrexelle, Pierre Eline | We use gradient descent on a regularized autoencoder loss to correct anomalous images. | |
309 | Towards Stable and Efficient Training of Verifiably Robust Neural Networks | Huan Zhang, Hongge Chen, Chaowei Xiao, Sven Gowal, Robert Stanforth, Bo Li, Duane Boning, Cho-Jui Hsieh | We propose a new certified adversarial training method, CROWN-IBP, that achieves state-of-the-art robustness for L_inf norm adversarial perturbations. | |
310 | Frequency-based Search-control in Dyna | Yangchen Pan, Jincheng Mei, Amir-massoud Farahmand, Martha White | Acquire states from high-frequency regions for search-control in Dyna. | |
311 | Learning representations for binary-classification without backpropagation | Mathias Lechner | The first feedback alignment algorithm with provable learning guarantees for networks with a single output neuron. | code |
312 | Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks | Ziwei Ji, Matus Telgarsky | This work shows that $\mathcal{O}(1/\epsilon)$ iterations of gradient descent on two-layer networks of any width exceeding $\mathrm{polylog}(n, 1/\epsilon, 1/\delta)$, given $\Omega(1/\epsilon^2)$ training examples, suffice to achieve a test error of $\epsilon$. | |
313 | Physics-aware Difference Graph Networks for Sparsely-Observed Dynamics | Sungyong Seo*, Chuizheng Meng*, Yan Liu | We propose physics-aware difference graph networks designed to effectively learn spatial differences to model sparsely observed dynamics. | |
314 | HiLLoC: lossless image compression with hierarchical latent variable models | James Townsend, Thomas Bird, Julius Kunze, David Barber | We scale up lossless compression with latent variables, beating existing approaches on full-size ImageNet images. | code |
315 | IMPACT: Importance Weighted Asynchronous Architectures with Clipped Target Networks | Michael Luo, Jiahao Yao, Richard Liaw, Eric Liang, Ion Stoica | IMPACT helps RL agents train faster by decreasing training wall-clock time and increasing sample efficiency simultaneously. | |
316 | On Bonus Based Exploration Methods In The Arcade Learning Environment | Adrien Ali Taiga, William Fedus, Marlos C. Machado, Aaron Courville, Marc G. Bellemare | We find that existing bonus-based exploration methods have not been able to address the exploration-exploitation trade-off in the Arcade Learning Environment. | |
317 | Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation | Xinjie Fan, Yizhe Zhang, Zhendong Wang, Mingyuan Zhou | To stabilize this method for contextual generation of categorical sequences, we estimate the gradient by evaluating a set of correlated Monte Carlo rollouts. | code |
318 | Smoothness and Stability in GANs | Casey Chu, Kentaro Minami, Kenji Fukumizu | We develop a principled theoretical framework for understanding and enforcing the stability of various types of GANs | |
319 | SNOW: Subscribing to Knowledge via Channel Pooling for Transfer & Lifelong Learning | Chungkuk Yoo, Bumsoo Kang, Minsik Cho | We propose SNOW, an efficient way of transfer and lifelong learning by subscribing knowledge of a source model for new tasks through a novel channel pooling block. | |
320 | Empirical Studies on the Properties of Linear Regions in Deep Neural Networks | Xiao Zhang, Dongrui Wu | This paper provides a novel and meticulous perspective on DNNs: instead of just counting the number of linear regions, we study their local properties, such as the inspheres, the directions of the corresponding hyperplanes, the decision boundaries, and the relevance of the surrounding regions. | |
321 | Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning | Ali Mousavi, Lihong Li, Qiang Liu, Denny Zhou | We present a novel approach for the off-policy estimation problem in infinite-horizon RL. | |
322 | PairNorm: Tackling Oversmoothing in GNNs | Lingxiao Zhao, Leman Akoglu | We propose a normalization layer for GNN models to solve the oversmoothing problem. | code |
323 | Unsupervised Clustering using Pseudo-semi-supervised Learning | Divam Gupta, Ramachandran Ramjee, Nipun Kwatra, Muthian Sivathanu | Using ensembles and pseudo labels for unsupervised clustering | code |
324 | Simple and Effective Regularization Methods for Training on Noisily Labeled Data with Generalization Guarantee | Wei Hu, Zhiyuan Li, Dingli Yu | This paper proposes and analyzes two simple and intuitive regularization methods: (i) regularization by the distance between the network parameters to initialization, and (ii) adding a trainable auxiliary variable to the network output for each training example. | |
325 | Controlling generative models with continuous factors of variations | Antoine Plumerault, Hervé Le Borgne, Céline Hudelot | A model to control the generation of images with GANs and beta-VAEs with regard to the scale and position of objects. | |
326 | Symplectic ODE-Net: Learning Hamiltonian Dynamics with Control | Yaofeng Desmond Zhong, Biswadip Dey, Amit Chakraborty | This work enforces Hamiltonian dynamics with control to learn system models from embedded position and velocity data, and exploits this physically-consistent dynamics to synthesize model-based control via energy shaping. | |
327 | Understanding $\ell_4$-based Dictionary Learning: Interpretation, Stability, and Robustness | Yuexiang Zhai, Hermish Mehta, Zhengyuan Zhou, Yi Ma | We compare $\ell_4$-norm based dictionary learning with PCA and ICA, and show its stability as well as robustness. | |
328 | Quantum Algorithms for Deep Convolutional Neural Networks | Iordanis Kerenidis, Jonas Landman, Anupam Prakash | We provide the first algorithm for quantum computers implementing a universal convolutional neural network with a speedup. | code |
329 | Self-Supervised Learning of Appliance Usage | Chen-Yu Hsu, Abbas Zeitoun, Guang-He Lee, Dina Katabi, Tommi Jaakkola | We learn appliance usage patterns in homes without labels, using self-supervised learning with energy and location data | |
330 | Deep Graph Matching Consensus | Matthias Fey, Jan E. Lenssen, Christopher Morris, Jonathan Masci, Nils M. Kriege | We develop a deep graph matching architecture which refines initial correspondences based on a neighborhood consensus error. | |
331 | Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks | Yu Bai, Jason D. Lee | Wide neural networks can escape the NTK regime and couple with quadratic models, with provably nice optimization landscape and better generalization. | |
332 | Dynamic Sparse Training: Find Efficient Sparse Network From Scratch With Trainable Masked Layers | Junjie Liu, Zhe Xu, Runbin Shi, Ray C. C. Cheung, Hayden K. H. So | We present a novel network pruning method that can find the optimal sparse structure during the training process with a trainable pruning threshold. | code |
333 | Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by Enabling Input-Adaptive Inference | Ting-Kuei Hu, Tianlong Chen, Haotao Wang, Zhangyang Wang | Is it possible to co-design model accuracy, robustness and efficiency to achieve their triple wins? Yes! | |
334 | Neural Policy Gradient Methods: Global Optimality and Rates of Convergence | Lingxiao Wang, Qi Cai, Zhuoran Yang, Zhaoran Wang | In detail, we prove that neural natural policy gradient converges to a globally optimal policy at a sublinear rate. Also, we show that neural vanilla policy gradient converges sublinearly to a stationary point. | |
335 | Double Neural Counterfactual Regret Minimization | Hui Li, Kailiang Hu, Shaohua Zhang, Yuan Qi, Le Song | We propose a double neural framework to solve large-scale imperfect-information games. | |
336 | GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation | Chence Shi*, Minkai Xu*, Zhaocheng Zhu, Weinan Zhang, Ming Zhang, Jian Tang | A flow-based autoregressive model for molecular graph generation. Reaching state-of-the-art results on molecule generation and property optimization. | code |
337 | The Gambler’s Problem and Beyond | Baoxiang Wang, Shuai Li, Jiajin Li, Siu On Chan | The optimal value function is fractal and is like a Cantor function. | |
338 | Multilingual Alignment of Contextual Word Representations | Steven Cao, Nikita Kitaev, Dan Klein | We propose procedures for evaluating and strengthening contextual embedding alignment and show that they both improve multilingual BERT’s zero-shot XNLI transfer and provide useful insights into the model. | |
339 | The Curious Case of Neural Text Degeneration | Ari Holtzman, Jan Buys, Leo Du, Maxwell Forbes, Yejin Choi | Current language generation systems either aim for high likelihood and devolve into generic repetition, or miscalibrate their stochasticity; we provide evidence of both and propose a solution: Nucleus Sampling. | |
340 | Graph Convolutional Reinforcement Learning | Jiechuan Jiang, Chen Dun, Tiejun Huang, Zongqing Lu | To tackle these difficulties, we propose graph convolutional reinforcement learning, where graph convolution adapts to the dynamics of the underlying graph of the multi-agent environment, and relation kernels capture the interplay between agents by their relation representations. | code |
341 | Meta-Learning Deep Energy-Based Memory Models | Sergey Bartunov, Jack Rae, Simon Osindero, Timothy Lillicrap | Deep associative memory models using arbitrary neural networks as a storage. | |
342 | Exploratory Not Explanatory: Counterfactual Analysis of Saliency Maps for Deep RL | Akanksha Atrey, Kaleigh Clary, David Jensen | Proposing a new counterfactual-based methodology to evaluate the hypotheses generated from saliency maps about deep RL agent behavior. | |
343 | Fast Neural Network Adaptation via Parameters Remapping | Jiemin Fang*, Yuzhu Sun*, Kangjian Peng*, Qian Zhang, Yuan Li, Wenyu Liu, Xinggang Wang | In this paper, we propose FNA, a fast neural network adaptation method that can efficiently adapt a manually designed ImageNet network to new segmentation/detection tasks. | |
344 | Guiding Program Synthesis by Learning to Generate Examples | Larissa Laich, Pavol Bielik, Martin Vechev | In this paper we address this challenge via an iterative approach that finds ambiguities in the provided specification and learns to resolve these by generating additional input-output examples. | |
345 | SNODE: Spectral Discretization of Neural ODEs for System Identification | Alessio Quaglino, Marco Gallieri, Jonathan Masci, Jan Koutník | This paper proposes the use of spectral element methods for fast and accurate training of Neural Ordinary Differential Equations for system identification. | |
346 | Generalized Convolutional Forest Networks for Domain Generalization and Visual Recognition | Jongbin Ryu, GiTaek Kwon, Ming-Hsuan Yang, Jongwoo Lim | In this work, we propose generalized convolutional forest networks to learn a feature space that maximizes the strength of individual tree classifiers while minimizing their correlation. | |
347 | Once for All: Train One Network and Specialize it for Efficient Deployment | Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, Song Han | We introduce techniques to train a single once-for-all network that fits many hardware platforms. | |
348 | Multi-Agent Interactions Modeling with Correlated Policies | Minghuan Liu, Ming Zhou, Weinan Zhang, Yuzheng Zhuang, Jun Wang, Wulong Liu, Yong Yu | Modeling complex multi-agent interactions under a multi-agent imitation learning framework, with explicit modeling of correlated policies by approximating opponents’ policies. | code |
349 | PCMC-Net: Feature-based Pairwise Choice Markov Chains | Alix Lhéritier | We propose a generic neural network architecture equipping Pairwise Choice Markov Chain choice models with amortized, automatic-differentiation-based inference using alternatives’ and individuals’ features. | |
350 | Implementing Inductive Bias for Different Navigation Tasks through Diverse RNN Attractors | Tie Xu, Omri Barak | Task-agnostic pre-training can shape an RNN’s attractor landscape and form diverse inductive biases for different navigation tasks. | code |
351 | Query2box: Reasoning over Knowledge Graphs in Vector Space Using Box Embeddings | Hongyu Ren*, Weihua Hu*, Jure Leskovec | Answering a wide class of logical queries over knowledge graphs with box embeddings in vector space | |
352 | Rethinking the Hyperparameters for Fine-tuning | Hao Li, Pratik Chaudhari, Hao Yang, Michael Lam, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto | This paper re-examines several common practices of setting hyper-parameters for fine-tuning. | |
353 | Plug and Play Language Model: A simple baseline for controlled language generation | Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane Hung, Eric Frank, Piero Molino, Jason Yosinski, Rosanne Liu | We control the topic and sentiment of text generation (almost) without any training. | code |
354 | Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks | Wei Hu, Lechao Xiao, Jeffrey Pennington | We provide for the first time a rigorous proof that orthogonal initialization speeds up convergence relative to Gaussian initialization, for deep linear networks. | |
355 | RGBD-GAN: Unsupervised 3D Representation Learning From Natural Image Datasets via RGBD Image Synthesis | Atsuhiro Noguchi, Tatsuya Harada | RGBD image generation for unsupervised camera parameter conditioning | |
356 | Towards Verified Robustness under Text Deletion Interventions | Johannes Welbl, Po-Sen Huang, Robert Stanforth, Sven Gowal, Krishnamurthy (Dj) Dvijotham, Martin Szummer, Pushmeet Kohli | Formal verification of a specification on a model’s prediction undersensitivity using Interval Bound Propagation | |
357 | Jacobian Adversarially Regularized Networks for Robustness | Alvin Chan, Yi Tay, Yew Soon Ong, Jie Fu | We show that training classifiers to produce salient input Jacobian matrices with a GAN-like regularization can boost adversarial robustness. | |
358 | Thinking While Moving: Deep Reinforcement Learning with Concurrent Control | Ted Xiao, Eric Jang, Dmitry Kalashnikov, Sergey Levine, Julian Ibarz, Karol Hausman, Alexander Herzog | Reinforcement learning formulation that allows agents to think and act at the same time, demonstrated on real-world robotic grasping. | |
359 | Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning | Qian Long*, Zihan Zhou*, Abhinav Gupta, Fei Fang, Yi Wu†, Xiaolong Wang† | In this paper, we introduce Evolutionary Population Curriculum (EPC), a curriculum learning paradigm that scales up Multi-Agent Reinforcement Learning (MARL) by progressively increasing the population of training agents in a stage-wise manner. | |
360 | ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning | A text encoder trained to distinguish real input tokens from plausible fakes efficiently learns effective language representations. | |
361 | Emergent Systematic Generalization In a Situated Agent | Felix Hill, Andrew Lampinen, Rosalia Schneider, Stephen Clark, Matthew Botvinick, James L. McClelland, Adam Santoro | We isolate the environmental and training factors that contribute to strong emergent systematic generalization in a situated language-learning agent | |
362 | Abstract Diagrammatic Reasoning with Multiplex Graph Networks | Duo Wang, Mateja Jamnik, Pietro Lio | MXGNet is a multilayer, multiplex graph based architecture which achieves good performance on various diagrammatic reasoning tasks. | |
363 | A Baseline for Few-Shot Image Classification | Guneet Singh Dhillon, Pratik Chaudhari, Avinash Ravichandran, Stefano Soatto | Transductive fine-tuning of a deep network is a strong baseline for few-shot image classification and outperforms the state-of-the-art on all standard benchmarks. | |
364 | Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering | Akari Asai, Kazuma Hashimoto, Hannaneh Hajishirzi, Richard Socher, Caiming Xiong | Graph-based recurrent retriever that learns to retrieve reasoning paths over Wikipedia Graph outperforms the most recent state of the art on HotpotQA by more than 10 points. | |
365 | Padé Activation Units: End-to-end Learning of Flexible Activation Functions in Deep Networks | Alejandro Molina, Patrick Schramowski, Kristian Kersting | We introduce PAUs, new learnable activation functions for neural networks. They free network designers from the activation selection process and increase test prediction accuracy. | |
366 | A Framework for Robustness Certification of Smoothed Classifiers Using f-Divergences | Krishnamurthy (Dj) Dvijotham, Jamie Hayes, Borja Balle, Zico Kolter, Chongli Qin, Andras Gyorgy, Kai Xiao, Sven Gowal, Pushmeet Kohli | Develop a general framework to establish certified robustness of ML models against various classes of adversarial perturbations | |
367 | Contrastive Representation Distillation | Yonglong Tian, Dilip Krishnan, Phillip Isola | Representation/knowledge distillation by maximizing mutual information between teacher and student | |
368 | Certified Defenses for Adversarial Patches | Ping-yeh Chiang*, Renkun Ni*, Ahmed Abdelkader, Chen Zhu, Chris Studor, Tom Goldstein | Motivated by this finding, we present an extension of certified defense algorithms and propose significantly faster variants for robust training against patch attacks. | |
369 | Sample Efficient Policy Gradient Methods with Recursive Variance Reduction | Pan Xu, Felicia Gao, Quanquan Gu | In this work, we aim to reduce the sample complexity of existing policy gradient methods. | |
370 | Deep Symbolic Superoptimization Without Human Knowledge | Hui Shi, Yang Zhang, Xinyun Chen, Yuandong Tian, Jishen Zhao | We thus propose HISS, a reinforcement learning framework for symbolic superoptimization that keeps humans out of the loop. | |
371 | Explain Your Move: Understanding Agent Actions Using Focused Feature Saliency | Piyush Gupta, Nikaash Puri, Sukriti Verma, Dhruv Kayastha, Shripad Deshmukh, Balaji Krishnamurthy, Sameer Singh | We propose a model-agnostic approach to explain the behaviour of black-box deep RL agents, trained to play Atari and board games, by highlighting relevant features of an input state. | code |
372 | Universal Approximation with Certified Networks | Maximilian Baader, Matthew Mirman, Martin Vechev | We prove that for a large class of functions f there exists an interval certified robust network approximating f up to arbitrary precision. | |
373 | Measuring and Improving the Use of Graph Information in Graph Neural Networks | Yifan Hou, Jian Zhang, James Cheng, Kaili Ma, Richard T. B. Ma, Hongzhi Chen, Ming-Chang Yang | This paper introduces a context-surrounding GNN framework and proposes two smoothness metrics to measure the quantity and quality of information obtained from graph data. | |
374 | State-only Imitation with Transition Dynamics Mismatch | Tanmay Gangwani, Jian Peng | An algorithm for imitation with state-only expert demonstrations; builds on adversarial IRL, with experiments on transition dynamics mismatch between expert and imitator. | |
375 | Adversarial AutoAugment | Xinyu Zhang, Qiang Wang, Jian Zhang, Zhao Zhong | We introduce the idea of adversarial learning into automatic data augmentation to improve the generalization of a target network. | |
376 | Meta Dropout: Learning to Perturb Latent Features for Generalization | Hae Beom Lee, Taewook Nam, Eunho Yang, Sung Ju Hwang | To tackle this challenge, we propose a novel regularization method, meta-dropout, which learns to perturb the latent features of training examples for generalization in a meta-learning framework. | code |
377 | Rényi Fair Inference | Sina Baharlouei, Maher Nouiehed, Meisam Razaviyayn | In this paper, we use Rényi correlation as a measure of fairness of machine learning models and develop a general training framework to impose fairness. | |
378 | Learning transport cost from subset correspondence | Ruishan Liu, Akshay Balsubramani, James Zou | In this work, we investigate how to learn the cost function using a small amount of side information which is often available. | code |
379 | BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget | Jack Turner, Elliot J. Crowley, Michael O’Boyle, Amos Storkey, Gavin Gray | A simple and effective method for reducing large neural networks to flexible parameter targets based on block substitution. | |
380 | Variance Reduction With Sparse Gradients | Melih Elibol, Lihua Lei, Michael I. Jordan | We use sparsity to improve the computational complexity of variance reduction methods. | code |
381 | Abductive Commonsense Reasoning | Chandra Bhagavatula, Ronan Le Bras, Chaitanya Malaviya, Keisuke Sakaguchi, Ari Holtzman, Hannah Rashkin, Doug Downey, Wen-tau Yih, Yejin Choi | We present the first study that investigates the viability of language-based abductive reasoning. We introduce a challenge dataset, ART, that consists of over 20k commonsense narrative contexts and 200k explanations. | |
382 | Discrepancy Ratio: Evaluating Model Performance When Even Experts Disagree on the Truth | Igor Lovchinsky, Alon Daks, Israel Malkin, Pouya Samangouei, Ardavan Saeedi, Yang Liu, Swami Sankaranarayanan, Tomer Gafner, Ben Sternlieb, Patrick Maher, Nathan Silberman | A framework for evaluating model performance when even experts disagree on what the ground truth is. | |
383 | Weakly Supervised Disentanglement with Guarantees | Rui Shu, Yining Chen, Abhishek Kumar, Stefano Ermon, Ben Poole | We construct a theoretical framework for weakly supervised disentanglement and conduct extensive experiments to back up the theory. | code |
384 | Nesterov Accelerated Gradient and Scale Invariance for Adversarial Attacks | Jiadong Lin, Chuanbiao Song, Kun He, Liwei Wang, John E. Hopcroft | We propose a Nesterov Iterative Fast Gradient Sign Method (NI-FGSM) and a Scale-Invariant attack Method (SIM) that can boost the transferability of adversarial examples for image classification. | code |
385 | Fantastic Generalization Measures and Where to Find Them | Yiding Jiang*, Behnam Neyshabur*, Dilip Krishnan, Hossein Mobahi, Samy Bengio | We empirically study generalization measures over more than 2000 models, identify common pitfalls in the existing practice of studying generalization measures, and provide some new bounds based on measures in our study. | code |
386 | Robustness Verification for Transformers | Zhouxing Shi, Huan Zhang, Kai-Wei Chang, Minlie Huang, Cho-Jui Hsieh | We propose the first algorithm for verifying the robustness of Transformers. | |
387 | A Simple Randomization Technique for Generalization in Deep Reinforcement Learning | Kimin Lee, Kibok Lee, Jinwoo Shin, Honglak Lee | We propose a simple randomization technique for improving generalization in deep reinforcement learning across tasks with various unseen visual patterns. | |
388 | Tensor Decompositions for Temporal Knowledge Base Completion | Timothée Lacroix, Guillaume Obozinski, Nicolas Usunier | We propose new tensor decompositions and associated regularizers to obtain state-of-the-art performance on temporal knowledge base completion. | code |
389 | On Universal Equivariant Set Networks | Nimrod Segol, Yaron Lipman | Settling permutation equivariance universality for popular deep models. | |
390 | Provable robustness against all adversarial $l_p$-perturbations for $p\geq 1$ | Francesco Croce, Matthias Hein | We introduce a method to train models with provable robustness wrt all the $l_p$-norms for $p\geq 1$ simultaneously. | |
391 | Don’t Use Large Mini-batches, Use Local SGD | Tao Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi | As a remedy, we propose a \emph{post-local} SGD and show that it significantly improves the generalization performance compared to large-batch training on standard benchmarks while enjoying the same efficiency (time-to-accuracy) and scalability. | |
392 | Kernel of CycleGAN as a principal homogeneous space | Nikita Moriakov, Jonas Adler, Jonas Teuwen | The space of approximate solutions of CycleGAN admits a lot of symmetry, and an identity loss does not fix this. | |
393 | Distributionally Robust Neural Networks | Shiori Sagawa*, Pang Wei Koh*, Tatsunori B. Hashimoto, Percy Liang | Overparameterized neural networks can be distributionally robust, but only when you account for generalization. | |
394 | On Solving Minimax Optimization Locally: A Follow-the-Ridge Approach | Yuanhao Wang, Guodong Zhang, Jimmy Ba | In this paper, we propose Follow-the-Ridge (FR), a novel algorithm that provably converges to and only converges to local minimax. | |
395 | A Neural Dirichlet Process Mixture Model for Task-Free Continual Learning | Soochan Lee, Junsoo Ha, Dongsu Zhang, Gunhee Kim | We propose an expansion-based approach for task-free continual learning for the first time. Our model consists of a set of neural network experts and expands the number of experts under the Bayesian nonparametric principle. | |
396 | Hyper-SAGNN: a self-attention based graph neural network for hypergraphs | Ruochi Zhang, Yuesong Zou, Jian Ma | We develop a new self-attention based graph neural network called Hyper-SAGNN applicable to homogeneous and heterogeneous hypergraphs with variable hyperedge sizes that can fulfill tasks like node classification and hyperedge prediction. | code |
397 | Neural Epitome Search for Architecture-Agnostic Network Compression | Daquan Zhou, Xiaojie Jin, Qibin Hou, Kaixin Wang, Jianchao Yang, Jiashi Feng | We present a novel neural network compression method which can reuse the parameters efficiently to reduce the model size. | |
398 | On the Equivalence between Node Embeddings and Structural Graph Representations | Balasubramaniam Srinivasan, Bruno Ribeiro | We develop the foundations of a unifying theoretical framework connecting node embeddings and structural graph representations through invariant theory | |
399 | Probability Calibration for Knowledge Graph Embedding Models | Pedro Tabacof, Luca Costabello | We propose a novel method to calibrate knowledge graph embedding models without the need for negative examples. | |
400 | Why Not to Use Zero Imputation? Correcting Sparsity Bias in Training Neural Networks | Joonyoung Yi, Juhyuk Lee, Sung Ju Hwang, Eunho Yang | In this paper, we introduce the variable sparsity problem (VSP), which describes a phenomenon where the output of a predictive model varies significantly with respect to the rate of missingness in the given input, and show that it adversely affects model performance. | |
401 | DropEdge: Towards Deep Graph Convolutional Networks on Node Classification | Yu Rong, Wenbing Huang, Tingyang Xu, Junzhou Huang | This paper proposes DropEdge, a novel and flexible technique to alleviate the over-smoothing and overfitting issues in deep Graph Convolutional Networks. | code |
402 | Mask Based Unsupervised Content Transfer | Ron Mokady, Sagie Benaim, Lior Wolf, Amit Bermano | We consider the problem of translating, in an unsupervised manner, between two domains where one contains some additional information compared to the other. | code |
403 | U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation | Junho Kim, Minjae Kim, Hyeonwoo Kang, Kwang Hee Lee | We propose a novel method for unsupervised image-to-image translation, which incorporates a new attention module and a new learnable normalization function in an end-to-end manner. | code |
404 | Inductive and Unsupervised Representation Learning on Graph Structured Objects | Lichen Wang, Bo Zong, Qianqian Ma, Wei Cheng, Jingchao Ni, Wenchao Yu, Yanchi Liu, Dongjin Song, Haifeng Chen, Yun Fu | This paper proposes a novel framework for graph similarity learning in the inductive and unsupervised setting. | |
405 | Batch-shaping for learning conditional channel gated networks | Babak Ehteshami Bejnordi, Tijmen Blankevoort, Max Welling | A method that trains large capacity neural networks with significantly improved accuracy and lower dynamic computational cost | |
406 | Learning Robust Representations via Multi-View Information Bottleneck | Marco Federici, Anjan Dutta, Patrick Forré, Nate Kushman, Zeynep Akata | We extend the information bottleneck method to the unsupervised multi-view setting and show state-of-the-art results on standard datasets. | code |
407 | Deep probabilistic subsampling for task-adaptive compressed sensing | Iris A.M. Huijben, Bastiaan S. Veeling, Ruud J.G. van Sloun | In this work, we demonstrate that the deep learning paradigm can be extended to incorporate a subsampling scheme that is jointly optimized under a desired minimum sample rate. | |
408 | Robust anomaly detection and backdoor attack detection via differential privacy | Min Du, Ruoxi Jia, Dawn Song | This paper shows that differential privacy could improve the utility of outlier detection, novelty detection and backdoor attack detection, through both a theoretical analysis and extensive experimental results (constructed and real-world). | code |
409 | Learning to Guide Random Search | Ozan Sener, Vladlen Koltun | We improve the sample-efficiency of the random search for functions defined on low-dimensional manifolds. Our method jointly learns the underlying manifold and optimizes the function. | |
410 | Lagrangian Fluid Simulation with Continuous Convolutions | Benjamin Ummenhofer, Lukas Prantl, Nils Thürey, Vladlen Koltun | We learn particle-based fluid simulation with convolutional networks. | |
411 | Reinforced Genetic Algorithm Learning for Optimizing Computation Graphs | Aditya Paliwal, Felix Gimeno, Vinod Nair, Yujia Li, Miles Lubin, Pushmeet Kohli, Oriol Vinyals | We use deep RL to learn a policy that directs the search of a genetic algorithm to better optimize the execution cost of computation graphs, and show improved results on real-world TensorFlow graphs. | |
412 | Compressive Transformers for Long-Range Sequence Modelling | Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Chloe Hillier, Timothy P. Lillicrap | A long-range Transformer using a compressive memory; achieves SOTA on the WikiText-103 and enwik8 LM benchmarks, and releases a new book-level LM benchmark, PG-19. | |
413 | A Stochastic Derivative Free Optimization Method with Momentum | Eduard Gorbunov, Adel Bibi, Ozan Sener, El Houcine Bergou, Peter Richtarik | We develop and analyze a new derivative free optimization algorithm with momentum and importance sampling with applications to continuous control. | |
414 | Understanding and Improving Information Transfer in Multi-Task Learning | Sen Wu, Hongyang Zhang, Christopher Ré | A theoretical study of multi-task learning with practical implications for improving multi-task training and transfer learning. | |
415 | Learning To Explore Using Active Neural Mapping | Devendra Singh Chaplot, Saurabh Gupta, Dhiraj Gandhi, Abhinav Gupta, Ruslan Salakhutdinov | A modular and hierarchical approach to learn policies for exploring 3D environments. | |
416 | EMPIR: Ensembles of Mixed Precision Deep Networks for Increased Robustness Against Adversarial Attacks | Sanchari Sen, Balaraman Ravindran, Anand Raghunathan | We propose ensembles of mixed-precision DNNs as a new form of defense against adversarial attacks | |
417 | Quantifying Point-Prediction Uncertainty in Neural Networks via Residual Estimation with an I/O Kernel | Xin Qiu, Elliot Meyerson, Risto Miikkulainen | Learning to Estimate Point-Prediction Uncertainty and Correct Output in Neural Networks | code |
418 | B-Spline CNNs on Lie groups | Erik J Bekkers | The paper describes a flexible framework for building CNNs that are equivariant to a large class of transformation groups. | code |
419 | Neural Outlier Rejection for Self-Supervised Keypoint Learning | Jiexiong Tang, Rares Ambrus, Vitor Guizilini, Hanme Kim | Learning to extract distinguishable keypoints via a proxy task: outlier rejection. | |
420 | Reducing Transformer Depth on Demand with Structured Dropout | Angela Fan, Edouard Grave, Armand Joulin | LayerDrop, a form of structured dropout that allows you to train one model at training time and prune it to any desired depth at test time. | |
421 | Cross-Lingual Ability of Multilingual BERT: An Empirical Study | Karthikeyan K, Zihan Wang, Stephen Mayhew, Dan Roth | Cross-Lingual Ability of Multilingual BERT: An Empirical Study | |
422 | Spatially Parallel Attention and Component Extraction for Scene Decomposition | Sungjin Ahn, Zhixuan Lin, Weihao Sun, Skand Vishwanath Peri, Gautam Singh, Yi-Fu Wu, Fei Deng, Jindong Jiang | We propose a generative latent variable model for unsupervised scene decomposition that provides factorized object representation per foreground object while also decomposing background segments of complex morphology. | |
423 | RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments | Roberta Raileanu, Tim Rocktäschel | Instead of rewarding agents for predicting the next state, reward them for taking actions that lead to changes in the state. | |
424 | On the geometry and learning low-dimensional embeddings for directed graphs | Thorben Funke, Tian Guo, Alen Lancic, Nino Antulov-Fantulin | We propose a novel node embedding of directed graphs to statistical manifolds, and analyze connections to divergence, geometry, and efficient learning procedures. | |
425 | Efficient Probabilistic Logic Reasoning with Graph Neural Networks | Yuyu Zhang, Xinshi Chen, Yuan Yang, Arun Ramamurthy, Bo Li, Yuan Qi, Le Song | We employ graph neural networks in the variational EM framework for efficient inference and learning of Markov Logic Networks. | code |
426 | GraphSAINT: Graph Sampling Based Inductive Learning Method | Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, Viktor Prasanna | We propose a graph sampling based minibatch construction method for training Graph Convolutional Networks. | code |
427 | You Only Train Once: Loss-Conditional Training of Deep Networks | Alexey Dosovitskiy, Josip Djolonga | A method to train a single model simultaneously minimizing a family of loss functions instead of training a set of per-loss models. | |
428 | Projection Based Constrained Policy Optimization | Tsung-Yen Yang, Justinian Rosca, Karthik Narasimhan, Peter J. Ramadge | We propose a new algorithm that learns constraint-satisfying policies, and provide theoretical analysis and empirical demonstration in the context of reinforcement learning with constraints. | code |
429 | Infinite-Horizon Differentiable Model Predictive Control | Sebastian East, Marco Gallieri, Jonathan Masci, Jan Koutnik, Mark Cannon | This paper proposes a differentiable linear quadratic Model Predictive Control (MPC) framework for safe imitation learning. | |
430 | Combining Q-Learning and Search with Amortized Value Estimates | Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Tobias Pfaff, Theophane Weber, Lars Buesing, Peter W. Battaglia | We propose a model-based method called “Search with Amortized Value Estimates” (SAVE) which leverages both real and planned experience by combining Q-learning with Monte-Carlo Tree Search, achieving strong performance with very small search budgets. | |
431 | Training Generative Adversarial Networks from Incomplete Observations using Factorised Discriminators | Daniel Stoller, Sebastian Ewert, Simon Dixon | We decompose the discriminator in a GAN in a principled way so that each component can be independently trained on different parts of the input. The resulting “FactorGAN” can be used for semi-supervised learning and in missing data scenarios. | code |
432 | Decentralized Deep Learning with Arbitrary Communication Compression | Anastasia Koloskova*, Tao Lin*, Sebastian U Stich, Martin Jaggi | We propose Choco-SGD—decentralized SGD with compressed communication—for non-convex objectives and show its strong performance in various deep learning applications (on-device learning, datacenter case). | code |
433 | Toward Evaluating Robustness of Deep Reinforcement Learning with Continuous Control | Tsui-Wei Weng, Krishnamurthy (Dj) Dvijotham*, Jonathan Uesato*, Kai Xiao*, Sven Gowal*, Robert Stanforth*, Pushmeet Kohli | We study adversarial attacks on continuous-control agents in deep RL and propose a two-step algorithm based on learned model dynamics. | |
434 | Gradient $\ell_1$ Regularization for Quantization Robustness | Milad Alizadeh, Arash Behboodi, Mart van Baalen, Christos Louizos, Tijmen Blankevoort, Max Welling | We show that regularizing the $\ell_1$-norm of gradients improves robustness to post-training quantization in neural networks (a minimal sketch appears after the table). | |
435 | SpikeGrad: An ANN-equivalent Computation Model for Implementing Backpropagation with Spikes | Johannes C. Thiele, Olivier Bichler, Antoine Dupret | An implementation of the backpropagation algorithm using spiking neurons for forward and backward propagation. | |
436 | On the Relationship between Self-Attention and Convolutional Layers | Jean-Baptiste Cordonnier, Andreas Loukas, Martin Jaggi | A self-attention layer can perform convolution and often learns to do so in practice. | code |
437 | Learning-Augmented Data Stream Algorithms | Tanqiu Jiang, Yi Li, Honghao Lin, Yisong Ruan, David P. Woodruff | In this paper we explore the full power of such an oracle, showing that it can be applied to a wide array of problems in data streams, sometimes resulting in the first optimal bounds for such problems. | code |
438 | Structured Object-Aware Physics Prediction for Video Modeling and Planning | Jannik Kossen, Karl Stelzner, Marcel Hussing, Claas Voelcker, Kristian Kersting | We propose a structured object-aware video prediction model, which explicitly reasons about objects and demonstrate that it provides high-quality long term video predictions for planning. | code |
439 | Incorporating BERT into Neural Machine Translation | Jinhua Zhu, Yingce Xia, Lijun Wu, Di He, Tao Qin, Wengang Zhou, Houqiang Li, Tieyan Liu | We propose a new algorithm named BERT-fused NMT, in which we first use BERT to extract representations for an input sequence, and then the representations are fused with each layer of the encoder and decoder of the NMT model through attention mechanisms. | code |
440 | MMA Training: Direct Input Space Margin Maximization through Adversarial Training | Gavin Weiguang Ding, Yash Sharma, Kry Yik Chau Lui, Ruitong Huang | We propose MMA training to directly maximize input space margin in order to improve adversarial robustness primarily by removing the requirement of specifying a fixed distortion bound. | |
441 | Infinite-horizon Off-Policy Policy Evaluation with Multiple Behavior Policies | Xinyun Chen, Lu Wang, Yizhe Hang, Heng Ge, Hongyuan Zha | A new partially policy-agnostic method for infinite-horizon off-policy policy evaluation with multiple known or unknown behavior policies. | |
442 | vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations | Alexei Baevski, Steffen Schneider, Michael Auli | Learn to quantize the speech signal so that algorithms requiring discrete inputs, such as BERT, can be applied to audio data. | |
443 | Meta-learning curiosity algorithms | Ferran Alet*, Martin F. Schneider*, Tomas Lozano-Perez, Leslie Pack Kaelbling | Meta-learning curiosity algorithms by searching through a rich space of programs yields novel mechanisms that generalize across very different reinforcement-learning domains. | code |
444 | Making Efficient Use of Demonstrations to Solve Hard Exploration Problems | Caglar Gulcehre, Tom Le Paine, Bobak Shahriari, Misha Denil, Matt Hoffman, Hubert Soyer, Richard Tanburn, Steven Kapturowski, Neil Rabinowitz, Duncan Williams, Gabriel Barth-Maron, Ziyu Wang, Nando de Freitas, Worlds Team | We introduce R2D3, an agent that makes efficient use of demonstrations to solve hard exploration problems in partially observable environments with highly variable initial conditions. | |
445 | VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning | Luisa Zintgraf, Kyriacos Shiarlis, Maximilian Igl, Sebastian Schulze, Yarin Gal, Katja Hofmann, Shimon Whiteson | VariBAD opens a path to tractable approximate Bayes-optimal exploration for deep RL using ideas from meta-learning, Bayesian RL, and approximate variational inference. | |
446 | Lookahead: A Far-sighted Alternative of Magnitude-based Pruning | Sejun Park*, Jaeho Lee*, Sangwoo Mo, Jinwoo Shin | We study a multi-layer generalization of magnitude-based pruning. | code |
447 | Spike-based causal inference for weight alignment | Jordan Guerguiev, Konrad Kording, Blake Richards | We present a learning rule for feedback weights in a spiking neural network that addresses the weight transport problem. | code |
448 | Empirical Bayes Transductive Meta-Learning with Synthetic Gradients | Xu Hu, Pablo Moreno, Yang Xiao, Xi Shen, Guillaume Obozinski, Neil Lawrence | We propose a meta-learning approach that learns from multiple tasks in a transductive setting, by leveraging unlabeled information in the query set to learn a more powerful meta-model. | code |
449 | Keep Doing What Worked: Behavior Modelling Priors for Offline Reinforcement Learning | Noah Siegel, Jost Tobias Springenberg, Felix Berkenkamp, Abbas Abdolmaleki, Michael Neunert, Thomas Lampe, Roland Hafner, Nicolas Heess, Martin Riedmiller | We develop a method for stable offline reinforcement learning from logged data. The key is to regularize the RL policy towards a learned “advantage weighted” model of the data. | |
450 | Understanding the Limitations of Conditional Generative Models | Ethan Fetaya, Joern-Henrik Jacobsen, Will Grathwohl, Richard Zemel | In this work, we investigate robust classification with likelihood-based generative models from a theoretical and practical perspective to determine whether they can deliver on their promises. | |
451 | Demystifying Inter-Class Disentanglement | Aviv Gabbay, Yedid Hoshen | Latent Optimization for Representation Disentanglement | code |
452 | Mixed-curvature Variational Autoencoders | Ondrej Skopek, Gary Bécigneul, Octavian-Eugen Ganea | Variational Autoencoders with latent spaces modeled as products of constant curvature Riemannian manifolds improve on image reconstruction over single-manifold variants. | code |
453 | BinaryDuo: Reducing Gradient Mismatch in Binary Activation Network by Coupling Binary Activations | Hyungjun Kim, Kyungsu Kim, Jinseok Kim, Jae-Joon Kim | In this work, we introduce coordinate discrete gradient (CDG) to better estimate the gradient mismatch. | code |
454 | Model-based reinforcement learning for biological sequence design | Christof Angermueller, David Dohan, David Belanger, Ramya Deshpande, Kevin Murphy, Lucy Colwell | We augment model-free policy learning with a sequence-level surrogate reward function and a count-based visitation bonus, and demonstrate effectiveness in the large-batch, low-round regime seen in designing DNA and protein sequences. | |
455 | BayesOpt Adversarial Attack | Binxin Ru, Adam Cobb, Arno Blaas, Yarin Gal | We propose a query-efficient black-box attack which uses Bayesian optimisation in combination with Bayesian model selection to optimise over the adversarial perturbation and the optimal degree of search space dimension reduction. | code |
456 | Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies | Sungryull Sohn, Hyunjae Woo, Jongwook Choi, Honglak Lee | A novel meta-RL method that infers latent subtask structure | |
457 | Hypermodels for Exploration | Vikranth Dwaracherla, Xiuyuan Lu, Morteza Ibrahimi, Ian Osband, Zheng Wen, Benjamin Van Roy | Hypermodels can encode posterior distributions similar to large ensembles at much smaller computational cost. This can facilitate significant improvements in exploration. | |
458 | RaPP: Novelty Detection with Reconstruction along Projection Pathway | Ki Hyun Kim, Sangwoo Shim, Yongsub Lim, Jongseob Jeon, Jeongwoo Choi, Byungchan Kim, Andre S. Yoon | A new methodology for novelty detection by utilizing hidden space activation values obtained from a deep autoencoder. | code |
459 | Dynamics-Aware Embeddings | William Whitney, Rajat Agarwal, Kyunghyun Cho, Abhinav Gupta | State and action embeddings which incorporate the dynamics improve exploration and RL from pixels. | code |
460 | Functional Regularisation for Continual Learning with Gaussian Processes | Michalis K. Titsias, Jonathan Schwarz, Alexander G. de G. Matthews, Razvan Pascanu, Yee Whye Teh | Using inducing point sparse Gaussian process methods to overcome catastrophic forgetting in neural networks. | |
461 | You CAN Teach an Old Dog New Tricks! On Training Knowledge Graph Embeddings | Daniel Ruffinelli, Samuel Broscheit, Rainer Gemulla | We study the impact of training strategies on the performance of knowledge graph embeddings. | |
462 | AdvectiveNet: An Eulerian-Lagrangian Fluidic Reservoir for Point Cloud Processing | Xingzhe He, Helen Lu Cao, Bo Zhu | We present a new grid-particle learning method to process point clouds motivated by computational fluid dynamics. | code |
463 | Never Give Up: Learning Directed Exploration Strategies | Adrià Puigdomènech Badia, Pablo Sprechmann, Alex Vitvitskyi, Daniel Guo, Bilal Piot, Steven Kapturowski, Olivier Tieleman, Martin Arjovsky, Alexander Pritzel, Andrew Bolt, Charles Blundell | We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies. | |
464 | Fair Resource Allocation in Federated Learning | Tian Li, Maziar Sanjabi, Ahmad Beirami, Virginia Smith | We propose a novel optimization objective that encourages fairness in heterogeneous federated networks, and develop a scalable method to solve it. | |
465 | Smooth markets: A basic mechanism for organizing gradient-based learners | David Balduzzi, Wojciech M. Czarnecki, Edward Hughes, Joel Leibo, Ian Gemp, Tom Anthony, Georgios Piliouras, Thore Graepel | We introduce a class of n-player games suited to gradient-based methods. | |
466 | StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding | Wei Wang, Bin Bi, Ming Yan, Chen Wu, Jiangnan Xia, Zuyi Bao, Liwei Peng, Luo Si | Inspired by the linearization exploration work of Elman, we extend BERT to a new model, StructBERT, by incorporating language structures into pre-training. | |
467 | Training binary neural networks with real-to-binary convolutions | Brais Martinez, Jing Yang, Adrian Bulat, Georgios Tzimiropoulos | This paper shows how to train binary networks to within a few percentage points (~3-5%) of their full-precision counterpart with a negligible increase in computational cost. | |
468 | Permutation Equivariant Models for Compositional Generalization in Language | Jonathan Gordon, David Lopez-Paz, Marco Baroni, Diane Bouchacourt | We propose a link between permutation equivariance and compositional generalization, and provide equivariant language models | |
469 | Continual learning with hypernetworks | Johannes von Oswald, Christian Henning, João Sacramento, Benjamin F. Grewe | To overcome this problem, we present a novel approach based on task-conditioned hypernetworks, i.e., networks that generate the weights of a target model based on task identity. | |
470 | Phase Transitions for the Information Bottleneck in Representation Learning | Tailin Wu, Ian Fischer | We give a theoretical analysis of the Information Bottleneck objective to understand and predict observed phase transitions. | |
471 | Variational Template Machine for Data-to-Text Generation | Rong Ye, Wenxian Shi, Hao Zhou, Zhongyu Wei, Lei Li | We propose the variational template machine (VTM), a novel method to generate text descriptions from data tables. | |
472 | Memory-Based Graph Networks | Amir hosein Khasahmadi, Kaveh Hassani, Parsa Moradi, Leo Lee, Quaid Morris | We introduce efficient memory layers for graph neural networks | |
473 | AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty | Dan Hendrycks*, Norman Mu*, Ekin Dogus Cubuk, Barret Zoph, Justin Gilmer, Balaji Lakshminarayanan | We obtain state-of-the-art robustness to data shifts, and we maintain calibration under data shift even when accuracy drops. | |
474 | AtomNAS: Fine-Grained End-to-End Neural Architecture Search | Jieru Mei, Yingwei Li, Xiaochen Lian, Xiaojie Jin, Linjie Yang, Alan Yuille, Jianchao Yang | A new state of the art on ImageNet in the mobile setting | code |
475 | Residual Energy-Based Models for Text Generation | Yuntian Deng, Anton Bakhtin, Myle Ott, Arthur Szlam | We show that energy-based models, when trained on the residual of an auto-regressive language model, can be used effectively and efficiently to generate text. | |
476 | A closer look at the approximation capabilities of neural networks | Kai Fong Ernest Chong | A quantitative refinement of the universal approximation theorem via an algebraic approach. | |
477 | Deep Audio Priors Emerge From Harmonic Convolutional Networks | Zhoutong Zhang, Yunyun Wang, Chuang Gan, Jiajun Wu, Joshua B. Tenenbaum, Antonio Torralba, William T. Freeman | A new operation called Harmonic Convolution makes deep network model audio priors without training. | code |
478 | Expected Information Maximization: Using the I-Projection for Mixture Density Estimation | Philipp Becker, Oleg Arenz, Gerhard Neumann | A novel, non-adversarial approach to learning latent variable models in general, and mixture models in particular, by computing the I-Projection solely based on samples. | code |
479 | A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms | Yoshua Bengio, Tristan Deleu, Nasim Rahaman, Nan Rosemary Ke, Sebastien Lachapelle, Olexa Bilaniuk, Anirudh Goyal, Christopher Pal | This paper proposes a meta-learning objective based on speed of adaptation to transfer distributions to discover a modular decomposition and causal variables. | code |
480 | On the interaction between supervision and self-play in emergent communication | Ryan Lowe*, Abhinav Gupta*, Jakob Foerster, Douwe Kiela, Joelle Pineau | In this paper, we investigate the relationship between two categories of learning signals, with the ultimate goal of improving sample efficiency: imitating human language data via supervised learning, and maximizing reward in a simulated multi-agent environment via self-play (as done in emergent communication). We introduce the term \textit{supervised self-play (S2P)} for algorithms using both of these signals. | |
481 | Dynamic Model Pruning with Feedback | Tao Lin, Sebastian U. Stich, Luis Barba, Daniil Dmitriev, Martin Jaggi | We propose a novel model compression method that generates a sparse trained model without additional overhead: by allowing (i) dynamic allocation of the sparsity pattern and (ii) incorporating feedback signal to reactivate prematurely pruned weights we obtain a performant sparse model in one single training pass (retraining is not needed, but can further improve the performance). | |
482 | Latent Normalizing Flows for Many-to-Many Cross Domain Mappings | Shweta Mahajan, Iryna Gurevych, Stefan Roth | Therefore, we propose a novel semi-supervised framework, which models shared information between domains and domain-specific information separately. | code |
483 | Transferring Optimality Across Data Distributions via Homotopy Methods | Matilde Gargiani, Andrea Zanelli, Quoc Tran Dinh, Moritz Diehl, Frank Hutter | We propose a new homotopy-based method to transfer “optimality knowledge” across different data distributions in order to speed up training of deep models. | |
484 | Regularizing activations in neural networks via distribution matching with the Wasserstein metric | Taejong Joo, Donggu Kang, Byunghoon Kim | We propose the projected error function regularization loss (PER) that encourages activations to follow the standard normal distribution. | |
485 | Mutual Information Gradient Estimation for Representation Learning | Liangjian Wen, Yiji Zhou, Lirong He, Mingyuan Zhou, Zenglin Xu | Therefore, we propose the Mutual Information Gradient Estimator (MIGE) for representation learning based on score estimation of implicit distributions. | |
486 | Efficient Transformer for Mobile Applications | Zhanghao Wu*, Zhijian Liu*, Ji Lin, Yujun Lin, Song Han | In this paper, we investigate the mobile setting (under 500M Mult-Adds) for NLP tasks to facilitate deployment on edge devices. | |
487 | A Function Space View of Bounded Norm Infinite Width ReLU Nets: The Multivariate Case | Greg Ongie, Rebecca Willett, Daniel Soudry, Nathan Srebro | We characterize the space of functions realizable as a ReLU network with an unbounded number of units (infinite width), but where the Euclidean norm of the weights is bounded. | |
488 | Adversarial Lipschitz Regularization | Dávid Terjék | An alternative to the gradient penalty. | code |
489 | Compositional Continual Language Learning | Yuanpeng Li, Liang Zhao, Kenneth Church, Mohamed Elhoseiny | Inspired by that, in this paper, we propose a method for compositional continual learning of sequence-to-sequence models. | code |
490 | End to End Trainable Active Contours via Differentiable Rendering | Shir Gur, Tal Shaharabany, Lior Wolf | We present an image segmentation method that iteratively evolves a polygon. | |
491 | Provable Filter Pruning for Efficient Neural Networks | Lucas Liebenwein, Cenk Baykal, Harry Lang, Dan Feldman, Daniela Rus | A sampling-based filter pruning approach for convolutional neural networks exhibiting provable guarantees on the size and performance of the pruned network. | |
492 | How the Choice of Activation Affects Training of Overparametrized Neural Nets | Abhishek Panigrahi, Abhishek Shetty, Navin Goyal | We provide theoretical results about the effect of the activation function on the training of highly overparametrized 2-layer neural networks | code |
493 | Lipschitz constant estimation for Neural Networks via sparse polynomial optimization | Fabian Latorre, Paul Rolland, Volkan Cevher | We introduce LiPopt, a polynomial optimization framework for computing increasingly tighter upper bounds on the Lipschitz constant of neural networks. | code |
494 | State Alignment-based Imitation Learning | Fangchen Liu, Zhan Ling, Tongzhou Mu, Hao Su | We propose a novel state alignment-based imitation learning method to train the imitator by following the state sequences in the expert demonstrations as much as possible. | |
495 | Learning to Group: A Bottom-Up Framework for 3D Part Discovery in Unseen Categories | Tiange Luo, Kaichun Mo, Zhiao Huang, Siyu Hu, Jiarui Xu, Liwei Wang, Hao Su | We propose a learning-based iterative grouping framework which learns a grouping policy to progressively merge small part proposals into bigger ones in a bottom-up fashion, achieving state-of-the-art performance in the open-context setting. | |
496 | Discriminative Particle Filter Reinforcement Learning for Complex Partial Observations | Xiao Ma, Peter Karkus, Nan Ye, David Hsu, Wee Sun Lee | We introduce DPFRL, a framework for reinforcement learning under partial and complex observations with a fully differentiable discriminative particle filter. | |
497 | Unrestricted Adversarial Examples via Semantic Manipulation | Anand Bhattad, Min Jin Chong, Kaizhao Liang, Bo Li, David Forsyth | We introduce unrestricted perturbations that manipulate semantically meaningful image-based visual descriptors — color and texture — in order to generate effective and photorealistic adversarial examples. | code |
498 | Classification-Based Anomaly Detection for General Data | Liron Bergman, Yedid Hoshen | An anomaly detection method that uses random-transformation classification to generalize to non-image data. | |
499 | Scale-Equivariant Steerable Networks | Ivan Sosnovik, Michal Szmaja, Arnold Smeulders | In this work, we pay attention to scale changes, which regularly appear in various tasks due to the changing distances between the objects and the camera. First, we introduce the general theory for building scale-equivariant convolutional networks with steerable filters. | |
500 | On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning | Jian Li, Xuanyuan Luo, Mingda Qiao | We give some generalization error bounds of noisy gradient methods such as SGLD, Langevin dynamics, noisy momentum and so forth. | |
501 | Consistency Regularization for Generative Adversarial Networks | Han Zhang, Zizhao Zhang, Augustus Odena, Honglak Lee | In this work, we propose a simple and effective training stabilizer based on the notion of Consistency Regularization – a popular technique in the Semi-Supervised Learning literature (a minimal sketch appears after the table). | |
502 | Differentiable learning of numerical rules in knowledge graphs | Po-Wei Wang, Daria Stepanova, Csaba Domokos, J. Zico Kolter | We present an efficient approach to integrating numerical comparisons into differentiable rule learning in knowledge graphs | |
503 | Learning to Move with Affordance Maps | William Qi, Ravi Teja Mullapudi, Saurabh Gupta, Deva Ramanan | We address the task of autonomous exploration and navigation using spatial affordance maps that can be learned in a self-supervised manner; these outperform classic geometric baselines while being more sample-efficient than contemporary RL algorithms. | |
504 | Neural tangent kernels, transportation mappings, and universal approximation | Ziwei Ji, Matus Telgarsky, Ruicheng Xian | The NTK linearization is a universal approximator, even when looking arbitrarily close to initialization | |
505 | Scalable Object-Oriented Sequential Generative Models | Jindong Jiang, Sepehr Janghorbani, Gerard De Melo, Sungjin Ahn | In this paper, we propose SCALOR, a generative model for Scalable Sequential Object-Oriented Representation. | |
506 | Prediction Poisoning: Towards Defenses Against DNN Model Stealing Attacks | Tribhuvanesh Orekondy, Bernt Schiele, Mario Fritz | We propose the first approach that can resist DNN model stealing/extraction attacks | |
507 | Domain Adaptive Multiflow Networks | Róger Bermúdez-Chacón, Mathieu Salzmann, Pascal Fua | A Multiflow Network is a dynamic architecture for domain adaptation that learns potentially different computational graphs per domain, so as to map them to a common representation where inference can be performed in a domain-agnostic fashion. | |
508 | Differentiable Programming for Physical Simulation | Yuanming Hu, Luke Anderson, Tzu-Mao Li, Qi Sun, Nathan Carr, Jonathan Ragan-Kelley, Fredo Durand | We study the problem of learning and optimizing through physical simulations via differentiable programming, using our proposed DiffSim programming language and compiler. | code |
509 | Pitfalls of In-Domain Uncertainty Estimation and Ensembling in Deep Learning | Arsenii Ashukha, Alexander Lyzhov, Dmitry Molchanov, Dmitry Vetrov | We highlight the problems with common metrics of in-domain uncertainty and perform a broad study of modern ensembling techniques. | |
510 | Episodic Reinforcement Learning with Associative Memory | Guangxiang Zhu*, Zichuan Lin*, Guangwen Yang, Chongjie Zhang | To improve the sample efficiency of reinforcement learning, we propose a novel framework, called Episodic Reinforcement Learning with Associative Memory (ERLAM), which associates related experience trajectories to enable reasoning about effective strategies. | |
511 | Sub-policy Adaptation for Hierarchical Reinforcement Learning | Alexander Li, Carlos Florensa, Ignasi Clavera, Pieter Abbeel | We propose HiPPO, a stable Hierarchical Reinforcement Learning algorithm that can train several levels of the hierarchy simultaneously, giving good performance both in skill discovery and adaptation. | code |
512 | Critical initialisation in continuous approximations of binary neural networks | George Stamatescu, Federica Gerace, Carlo Lucibello, Ian Fuss, Langford White | Signal propagation theory applied to continuous surrogates of binary nets; counter-intuitive initialisation; reparameterisation trick not helpful. | |
513 | Deep Orientation Uncertainty Learning based on a Bingham Loss | Igor Gilitschenski, Wilko Schwarting, Roshni Sahoo, Alexander Amini, Sertac Karaman | A method for learning uncertainties over orientations using the Bingham Distribution | |
514 | Co-Attentive Equivariant Neural Networks: Focusing Equivariance On Transformations Co-Occurring in Data | David W. Romero Guzmán, Mark Hoogendoorn | We utilize attention to restrict equivariant neural networks to the set of co-occurring transformations in data. | code |
515 | Mixed Precision DNNs: All you need is a good parametrization | Stefan Uhlich, Lukas Mauch, Fabien Cardinaux, Kazuki Yoshiyama, Javier Alonso Garcia, Stephen Tiedemann, Thomas Kemp, Akira Nakamura | We show that a suitable parametrization of the quantizer is the key to achieving stable training and good final performance. | |
516 | Information Geometry of Orthogonal Initializations and Training | Piotr Aleksander Sokół, Il Memming Park | Nearly isometric DNN initializations imply low parameter-space curvature and a lower condition number, but that’s not always great. | code |
517 | Extreme Classification via Adversarial Softmax Approximation | Robert Bamler, Stephan Mandt | An efficient, unbiased approximation of the softmax loss function for extreme classification | code |
518 | Learning Nearly Decomposable Value Functions Via Communication Minimization | Tonghan Wang*, Jianhao Wang*, Chongyi Zheng, Chongjie Zhang | To address this limitation, this paper presents a novel framework for learning nearly decomposable value functions with communication, with which agents act on their own most of the time but occasionally send messages to other agents in order for effective coordination. | |
519 | Robust Subspace Recovery Layer for Unsupervised Anomaly Detection | Chieh-Hsin Lai, Dongmian Zou, Gilad Lerman | This work proposes an autoencoder with a novel robust subspace recovery layer for unsupervised anomaly detection and demonstrates state-of-the-art results on various datasets. | |
520 | Learning to Coordinate Manipulation Skills via Skill Behavior Diversification | Youngwoon Lee, Jingyun Yang, Joseph J. Lim | We propose to tackle complex tasks of multiple agents by learning composable primitive skills and coordination of the skills. | |
521 | NAS-Bench-1Shot1: Benchmarking and Dissecting One-Shot Neural Architecture Search | Arber Zela, Julien Siems, Frank Hutter | In order to allow a scientific study of these components, we introduce a general framework for one-shot NAS that can be instantiated to many recently-introduced variants and introduce a general benchmarking framework that draws on the recent large-scale tabular benchmark NAS-Bench-101 for cheap anytime evaluations of one-shot NAS methods. | code |
522 | Conservative Uncertainty Estimation By Fitting Prior Networks | Kamil Ciosek, Vincent Fortuin, Ryota Tomioka, Katja Hofmann, Richard Turner | We provide theoretical support for deep-learning uncertainty estimates obtained by fitting random priors. | |
523 | Understanding Generalization in Recurrent Neural Networks | Zhuozhuo Tu, Fengxiang He, Dacheng Tao | In this work, we develop the theory for analyzing the generalization performance of recurrent neural networks. | |
524 | The Shape of Data: Intrinsic Distance for Data Distributions | Anton Tsitsulin, Marina Munkhoeva, Davide Mottin, Panagiotis Karras, Alex Bronstein, Ivan Oseledets, Emmanuel Mueller | We propose a metric for comparing data distributions based on their geometry while not relying on any positional information. | code |
525 | How to 0wn the NAS in Your Spare Time | Sanghyun Hong, Michael Davinroy, Yigitcan Kaya, Dana Dachman-Soled, Tudor Dumitras | We design an algorithm that reconstructs the key components of a novel deep learning system by exploiting a small amount of information leakage from a cache side-channel attack, Flush+Reload. | |
526 | Enabling Deep Spiking Neural Networks with Hybrid Conversion and Spike Timing Dependent Backpropagation | Nitin Rathi, Gopalakrishnan Srinivasan, Priyadarshini Panda, Kaushik Roy | To address these challenges, we present a computationally-efficient training technique for deep SNNs. | code |
527 | Breaking Certified Defenses: Semantic Adversarial Examples with Spoofed Robustness Certificates | Amin Ghiasi, Ali Shafahi, Tom Goldstein | We present an attack that maintains the imperceptibility property of adversarial examples while being outside of the certified radius. | |
528 | Query-efficient Meta Attack to Deep Neural Networks | Jiawei Du, Hu Zhang, Joey Tianyi Zhou, Yi Yang, Jiashi Feng | In this work, we propose a meta attack approach that is capable of attacking a targeted model with much fewer queries. | |
529 | Massively Multilingual Sparse Word Representations | Gábor Berend | We propose an efficient algorithm for determining multilingually comparable sparse word representations that we release for 27 typologically diverse languages. | code |
530 | Monotonic Multihead Attention | Xutai Ma, Juan Miguel Pino, James Cross, Liezl Puzon, Jiatao Gu | Make the transformer streamable with monotonic attention. | |
531 | Gradients as Features for Deep Representation Learning | Fangzhou Mu, Yingyu Liang, Yin Li | Given a pre-trained model, we explore the per-sample gradients of a task-specific loss with respect to the model parameters, and construct a linear model that combines these gradients with the activations of the model. | |
532 | Pay Attention to Features, Transfer Learn Faster CNNs | Kafeng Wang, Xitong Gao, Yiren Zhao, Xingjian Li, Dejing Dou, Cheng-Zhong Xu | We introduce attentive feature distillation and selection to fine-tune a large model and produce a faster one. |
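To give a flavor of what some of these methods look like in practice, we close with three minimal code sketches. They are written from the one-sentence highlights above, not from the authors' code, so all class names, hyperparameters, and design details are our own illustrative assumptions.

The first sketch illustrates the structured dropout of paper 420 (LayerDrop): each layer is randomly skipped during training, so the trained stack can later be pruned to a shallower depth without retraining.

```python
import torch
import torch.nn as nn

class LayerDropStack(nn.Module):
    """A stack of layers in which each layer is randomly skipped during training."""

    def __init__(self, layers, drop_prob=0.2):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.drop_prob = drop_prob

    def forward(self, x):
        for layer in self.layers:
            # Skip each layer independently with probability drop_prob while
            # training, so the network learns to tolerate missing layers.
            if self.training and torch.rand(1).item() < self.drop_prob:
                continue
            x = layer(x)
        return x

    def pruned(self, keep_every=2):
        # At test time, keep every k-th layer to obtain a shallower model;
        # training already simulated such sub-networks, so no retraining is needed.
        return LayerDropStack(list(self.layers)[::keep_every], drop_prob=0.0)
```

Here `LayerDropStack`, `drop_prob`, and the every-k-th-layer pruning rule are all placeholders; the paper should be consulted for the actual layer-selection strategy.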
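The second sketch illustrates the gradient $\ell_1$ regularization of paper 434. The highlight only says that the $\ell_1$-norm of gradients is regularized; we assume here that the gradient is taken with respect to the weights (a natural reading, since post-training quantization perturbs the weights) and that the penalty weight `lam` is a tunable hyperparameter.

```python
import torch

def quantization_robust_loss(model, criterion, x, y, lam=0.05):
    """Task loss plus the l1 norm of its gradient w.r.t. the model weights."""
    loss = criterion(model(x), y)
    params = [p for p in model.parameters() if p.requires_grad]
    # create_graph=True so the penalty term itself can be backpropagated.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    # A small gradient norm means the loss surface is locally flat in weight
    # space, so weight perturbations such as quantization change the loss little.
    return loss + lam * sum(g.abs().sum() for g in grads)
```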
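The last sketch illustrates the consistency regularization of paper 501: the discriminator of a GAN is penalized for changing its output under semantics-preserving augmentations of real images. The augmentation function and penalty weight below are, again, illustrative assumptions rather than the authors' settings.

```python
import torch

def consistency_penalty(discriminator, real_images, augment, weight=10.0):
    """Penalize discriminator sensitivity to augmentations of real images."""
    d_real = discriminator(real_images)
    d_aug = discriminator(augment(real_images))
    # Squared difference between outputs on an image and its augmented copy;
    # added to the usual discriminator loss as a training stabilizer.
    return weight * ((d_real - d_aug) ** 2).mean()
```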