Paper Digest: ICLR 2019 Highlights
Download ICLR-2019-Poster-Digests.pdf – highlights of all 478 poster papers.
The International Conference on Learning Representations (ICLR) is one of the top machine learning conferences in the world. In 2019, the conference received 1,591 paper submissions, of which 478 were accepted for poster presentation and 24 for oral presentation.
To help the AI community quickly catch up on the work presented at this conference, the Paper Digest team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to skim these machine-generated highlights to quickly get the main idea of each paper.
We thank all authors for writing these interesting papers, and our readers for reading our digests. If you do not want to miss any interesting AI paper, you are welcome to sign up for our free Paper Digest service to receive daily updates on new papers customized to your interests.
Paper Digest Team
team@paperdigest.org
TABLE 1: ICLR 2019 Oral Papers
# | Title | Authors | Highlight |
---|---|---|---|
1 | BA-Net: Dense Bundle Adjustment Networks | Chengzhou Tang, Ping Tan | This paper introduces a network architecture to solve the structure-from-motion (SfM) problem via feature-metric bundle adjustment (BA), which explicitly enforces multi-view geometry constraints in the form of feature-metric error. |
2 | Deterministic Variational Inference for Robust Bayesian Neural Networks | Anqi Wu, Sebastian Nowozin, Edward Meeds, Richard E. Turner, José Miguel Hernández-Lobato, Alexander L. Gaunt | On the application of heteroscedastic regression we demonstrate good predictive performance over alternative approaches. |
3 | Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks | Yikang Shen, Shawn Tan, Alessandro Sordoni, Aaron Courville | This paper proposes to add such inductive bias by ordering the neurons; a vector of master input and forget gates ensures that when a given neuron is updated, all the neurons that follow it in the ordering are also updated. |
4 | Large Scale GAN Training for High Fidelity Natural Image Synthesis | Andrew Brock, Jeff Donahue, Karen Simonyan | To this end, we train Generative Adversarial Networks at the largest scale yet attempted, and study the instabilities specific to such scale. |
5 | Learning deep representations by mutual information estimation and maximization | R Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, Karan Grewal, Phil Bachman, Adam Trischler, Yoshua Bengio | This work investigates unsupervised learning of representations by maximizing mutual information between an input and the output of a deep neural network encoder. |
6 | KnockoffGAN: Generating Knockoffs for Feature Selection using Generative Adversarial Networks | James Jordon, Jinsung Yoon, Mihaela van der Schaar | In this work, we build on the promising Knockoff framework by developing a flexible knockoff generation model. |
7 | Learning Protein Structure with a Differentiable Simulator | John Ingraham, Adam Riesselman, Chris Sander, Debora Marks | In this work we aim to bridge the gap between the expressive capacity of energy functions and the practical capabilities of their simulators by using an unrolled Monte Carlo simulation as a model for data. |
8 | ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness | Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A. Wichmann, Wieland Brendel | We show that ImageNet-trained CNNs are strongly biased towards recognising textures rather than shapes, which is in stark contrast to human behavioural evidence and reveals fundamentally different classification strategies. |
9 | Smoothing the Geometry of Probabilistic Box Embeddings | Xiang Li, Luke Vilnis, Dongxu Zhang, Michael Boratko, Andrew McCallum | In this work, we present a novel hierarchical embedding model, inspired by a relaxation of box embeddings into parameterized density functions using Gaussian convolutions over the boxes. |
10 | On Random Deep Weight-Tied Autoencoders: Exact Asymptotic Analysis, Phase Transitions, and Implications to Training | Ping Li, Phan-Minh Nguyen | We study the behavior of weight-tied multilayer vanilla autoencoders under the assumption of random weights. |
11 | Meta-Learning Update Rules for Unsupervised Representation Learning | Luke Metz, Niru Maheswaranathan, Brian Cheung, Jascha Sohl-Dickstein | In this work, we propose instead to directly target later desired tasks by meta-learning an unsupervised learning rule which leads to representations useful for those tasks. |
12 | Transferring Knowledge across Learning Processes | Sebastian Flennerhag, Pablo G. Moreno, Neil D. Lawrence, Andreas Damianou | We propose Leap, a framework that achieves this by transferring knowledge across learning processes. |
13 | GENERATING HIGH FIDELITY IMAGES WITH SUBSCALE PIXEL NETWORKS AND MULTIDIMENSIONAL UPSCALING | Jacob Menick, Nal Kalchbrenner | To address the former challenge, we propose the Subscale Pixel Network (SPN), a conditional decoder architecture that generates an image as a sequence of image slices of equal size. |
14 | Temporal Difference Variational Auto-Encoder | Karol Gregor, George Papamakarios, Frederic Besse, Lars Buesing, Theophane Weber | Motivated by the absence of a model satisfying all these requirements, we propose TD-VAE, a generative sequence model that learns representations containing explicit beliefs about states several steps into the future, and that can be rolled out directly without single-step transitions. |
15 | A Unified Theory of Early Visual Representations from Retina to Cortex through Anatomically Constrained Deep CNNs | Jack Lindsey, Samuel A. Ocko, Surya Ganguli, Stephane Deny | Here, using a deep convolutional neural network trained on image recognition as a model of the visual system, we show that such differences in representation can emerge as a direct consequence of different neural resource constraints on the retinal and cortical networks, and for the first time we find a single model from which both geometries spontaneously emerge at the appropriate stages of visual processing. |
16 | Pay Less Attention with Lightweight and Dynamic Convolutions | Felix Wu, Angela Fan, Alexei Baevski, Yann Dauphin, Michael Auli | In this paper, we show that a very lightweight convolution can perform competitively to the best reported self-attention results. |
17 | Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset | Curtis Hawthorne, Andriy Stasyuk, Adam Roberts, Ian Simon, Cheng-Zhi Anna Huang, Sander Dieleman, Erich Elsen, Jesse Engel, Douglas Eck | The networks and the dataset together present a promising approach toward creating new expressive and interpretable neural models of music. |
18 | Learning to Remember More with Less Memorization | Hung Le, Truyen Tran, Svetha Venkatesh | This method aims to balance between maximizing memorization and forgetting via overwriting mechanisms. |
19 | Learning Robust Representations by Projecting Superficial Statistics Out | Haohan Wang, Zexue He, Zachary C. Lipton, Eric P. Xing | We test our method on the battery of standard domain generalization data sets and, interestingly, achieve comparable or better performance as compared to other domain generalization methods that explicitly require samples from the target distribution for training. |
20 | Slalom: Fast, Verifiable and Private Execution of Neural Networks in Trusted Hardware | Florian Tramer, Dan Boneh | Building upon an efficient outsourcing scheme for matrix multiplication, we propose Slalom, a framework that securely delegates execution of all linear layers in a DNN from a TEE (e.g., Intel SGX or Sanctum) to a faster, yet untrusted, co-located processor. |
21 | The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision | Jiayuan Mao, Chuang Gan, Pushmeet Kohli, Joshua B. Tenenbaum, Jiajun Wu | We propose the Neuro-Symbolic Concept Learner (NS-CL), a model that learns visual concepts, words, and semantic parsing of sentences without explicit supervision on any of them; instead, our model learns by simply looking at images and reading paired questions and answers. |
22 | The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks | Jonathan Frankle, Michael Carbin | Neural network pruning techniques can reduce the parameter counts of trained networks by over 90%, decreasing storage requirements and improving computational performance of inference without compromising accuracy. |
23 | FFJORD: Free-Form Continuous Dynamics for Scalable Reversible Generative Models | Will Grathwohl, Ricky T. Q. Chen, Jesse Bettencourt, Ilya Sutskever, David Duvenaud | In this paper, we use Hutchinson's trace estimator to give a scalable unbiased estimate of the log-density (a minimal sketch of this estimator appears after this table). |
24 | How Powerful are Graph Neural Networks? | Keyulu Xu*, Weihua Hu*, Jure Leskovec, Stefanie Jegelka | Here, we present a theoretical framework for analyzing the expressive power of GNNs to capture different graph structures. |
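Note on entry 23 (FFJORD): the highlight mentions Hutchinson's trace estimator, which estimates the trace of a matrix (in FFJORD, the Jacobian of the ODE dynamics) using only matrix-vector products. Below is a minimal NumPy sketch of the estimator in isolation, not of the FFJORD model itself; the function name and arguments are our own illustrative choices.

```python
import numpy as np

def hutchinson_trace(matvec, dim, n_samples=1000, rng=None):
    """Unbiased estimate of tr(A) from matrix-vector products v -> A v.

    E[v^T A v] = tr(A) when the entries of v are i.i.d. Rademacher (+1/-1),
    so averaging v^T A v over random probe vectors approximates the trace.
    """
    rng = np.random.default_rng() if rng is None else rng
    estimates = []
    for _ in range(n_samples):
        v = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe vector
        estimates.append(v @ matvec(v))        # one sample of v^T A v
    return float(np.mean(estimates))

# Toy usage: compare against the exact trace of a random matrix.
A = np.random.default_rng(0).normal(size=(50, 50))
print(hutchinson_trace(lambda v: A @ v, dim=50), np.trace(A))
```

In FFJORD this estimator is applied to the Jacobian of the ODE dynamics, where the product v^T J can be obtained with a single vector-Jacobian product from reverse-mode autodiff, which is what makes the log-density estimate scalable.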
TABLE 2: ICLR 2019 Posters
# | Title | Authors | Highlight |
---|---|---|---|
1 | Convolutional Neural Networks on Non-uniform Geometrical Signals Using Euclidean Spectral Transformation | Chiyu Max Jiang, Dequan Wang, Jingwei Huang, Philip Marcus, Matthias Niessner | To this end, we develop mathematical formulations for Non-Uniform Fourier Transforms (NUFT) to directly, and optimally, sample nonuniform data signals of different topologies defined on a simplex mesh into the spectral domain with no spatial sampling error. |
2 | Augmented Cyclic Adversarial Learning for Low Resource Domain Adaptation | Ehsan Hosseini-Asl, Yingbo Zhou, Caiming Xiong, Richard Socher | In this paper, we propose an augmented cyclic adversarial learning model that enforces the cycle-consistency constraint via an external task specific model, which encourages the preservation of task-relevant content as opposed to exact reconstruction. |
3 | Variance Networks: When Expectation Does Not Meet Your Expectations | Kirill Neklyudov, Dmitry Molchanov, Arsenii Ashukha, Dmitry Vetrov | In this paper, we introduce variance layers, a different kind of stochastic layers. |
4 | Initialized Equilibrium Propagation for Backprop-Free Training | Peter O’Connor, Efstratios Gavves, Max Welling | In response to this problem, we propose Initialized Equilibrium Propagation, which trains a feedforward network to initialize the iterative inference procedure for Equilibrium propagation. |
5 | Explaining Image Classifiers by Counterfactual Generation | Chun-Hao Chang, Elliot Creager, Anna Goldenberg, David Duvenaud | Explaining Image Classifiers by Counterfactual Generation. |
6 | SNIP: SINGLE-SHOT NETWORK PRUNING BASED ON CONNECTION SENSITIVITY | Namhoon Lee, Thalaiyasingam Ajanthan, Philip Torr | In this work, we present a new approach that prunes a given network once at initialization prior to training. |
7 | Diagnosing and Enhancing VAE Models | Bin Dai, David Wipf | In this regard, we rigorously analyze the VAE objective, differentiating situations where this belief is and is not actually true. |
8 | Disjoint Mapping Network for Cross-modal Matching of Voices and Faces | Yandong Wen, Mahmoud Al Ismail, Weiyang Liu, Bhiksha Raj, Rita Singh | We propose a novel framework, called Disjoint Mapping Network (DIMNet), for cross-modal biometric matching, in particular of voices and faces. |
9 | Automatically Composing Representation Transformations as a Means for Generalization | Michael Chang, Abhishek Gupta, Sergey Levine, Thomas L. Griffiths | As a first step for tackling compositional generalization, we introduce the compositional recursive learner, a domain-general framework for learning algorithmic procedures for composing representation transformations, producing a learner that reasons about what computation to execute by making analogies to previously seen problems. We propose the compositional generalization problem for measuring how readily old knowledge can be reused and hence built upon. |
10 | Visual Reasoning by Progressive Module Networks | Seung Wook Kim, Makarand Tapaswi, Sanja Fidler | We propose to represent a solver for each task as a neural module that calls existing modules (solvers for simpler tasks) in a functional program-like manner. |
11 | Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes | Roman Novak, Lechao Xiao, Yasaman Bahri, Jaehoon Lee, Greg Yang, Jiri Hron, Daniel A. Abolafia, Jeffrey Pennington, Jascha Sohl-dickstein | In this work, we derive an analogous equivalence for multi-layer convolutional neural networks (CNNs) both with and without pooling layers, and achieve state of the art results on CIFAR10 for GPs without trainable kernels. |
12 | Learning to Learn without Forgetting by Maximizing Transfer and Minimizing Interference | Matthew Riemer, Ignacio Cases, Robert Ajemian, Miao Liu, Irina Rish, Yuhai Tu, and Gerald Tesauro | In this work we propose a new conceptualization of the continual learning problem in terms of a temporally symmetric trade-off between transfer and interference that can be optimized by enforcing gradient alignment across examples. |
13 | Sparse Dictionary Learning by Dynamical Neural Networks | Tsung-Han Lin, Ping Tak Peter Tang | Using spiking neurons to construct our dynamical network, we present a learning process, its rigorous mathematical analysis, and numerical results on several dictionary learning problems. |
14 | Eidetic 3D LSTM: A Model for Video Prediction and Beyond | Yunbo Wang, Lu Jiang, Ming-Hsuan Yang, Li-Jia Li, Mingsheng Long, Li Fei-Fei | We present a new model, Eidetic 3D LSTM (E3D-LSTM), that integrates 3D convolutions into RNNs. |
15 | ALISTA: Analytic Weights Are As Good As Learned Weights in LISTA | Jialin Liu, Xiaohan Chen, Zhangyang Wang, Wotao Yin | In this work, we propose Analytic LISTA (ALISTA), where the weight matrix in LISTA is computed as the solution to a data-free optimization problem, leaving only the stepsize and threshold parameters to data-driven learning. |
16 | Three Mechanisms of Weight Decay Regularization | Guodong Zhang, Chaoqi Wang, Bowen Xu, Roger Grosse | We identify three distinct mechanisms by which weight decay exerts a regularization effect, depending on the particular optimization algorithm and architecture: (1) increasing the effective learning rate, (2) approximately regularizing the input-output Jacobian norm, and (3) reducing the effective damping coefficient for second-order optimization. Our results provide insight into how to improve the regularization of neural networks. |
17 | Learning Multimodal Graph-to-Graph Translation for Molecule Optimization | Wengong Jin, Kevin Yang, Regina Barzilay, Tommi Jaakkola | Our primary contributions include a junction tree encoder-decoder for learning diverse graph translations along with a novel adversarial training method for aligning distributions of molecules. |
18 | A Data-Driven and Distributed Approach to Sparse Signal Representation and Recovery | Ali Mousavi, Gautam Dasarathy, Richard G. Baraniuk | In this paper, we focus on two challenges which offset the promise of sparse signal representation, sensing, and recovery. |
19 | On the Minimal Supervision for Training Any Binary Classifier from Only Unlabeled Data | Nan Lu, Gang Niu, Aditya Krishna Menon, Masashi Sugiyama | In this paper, we study training an arbitrary (from linear to deep) binary classifier from only unlabeled (U) data by ERM. |
20 | Neural Logic Machines | Honghua Dong, Jiayuan Mao, Tian Lin, Chong Wang, Lihong Li, Denny Zhou | We propose the Neural Logic Machine (NLM), a neural-symbolic architecture for both inductive learning and logic reasoning. |
21 | Neural Speed Reading with Structural-Jump-LSTM | Christian Hansen, Casper Hansen, Stephen Alstrup, Jakob Grue Simonsen, Christina Lioma | We present Structural-Jump-LSTM: the first neural speed reading model to both skip and jump text during inference. |
22 | Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures | Jonathan Uesato*, Ananya Kumar*, Csaba Szepesvari*, Tom Erez, Avraham Ruderman, Keith Anderson, Krishnamurthy (Dj) Dvijotham, Nicolas Heess, Pushmeet Kohli | To solve this we propose a continuation approach that learns failure modes in related but less robust agents. |
23 | Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search | Lars Buesing, Theophane Weber, Yori Zwols, Nicolas Heess, Sebastien Racaniere, Arthur Guez, Jean-Baptiste Lespiau | Based on this, we propose the Counterfactually-Guided Policy Search (CF-GPS) algorithm for learning policies in POMDPs from off-policy experience. |
24 | signSGD via Zeroth-Order Oracle | Sijia Liu, Pin-Yu Chen, Xiangyi Chen, Mingyi Hong | In this paper, we design and analyze a new zeroth-order (ZO) stochastic optimization algorithm, ZO-signSGD, which enjoys dual advantages of gradient-free operations and signSGD. |
25 | Preventing Posterior Collapse with delta-VAEs | Ali Razavi, Aaron van den Oord, Ben Poole, Oriol Vinyals | Due to the phenomenon of “posterior collapse,” current latent variable generative models pose a challenging design choice that either weakens the capacity of the decoder or requires altering the training objective. |
26 | Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees | Yuping Luo, Huazhe Xu, Yuanzhi Li, Yuandong Tian, Trevor Darrell, Tengyu Ma | This paper introduces a novel algorithmic framework for designing and analyzing model-based RL algorithms with theoretical guarantees. |
27 | Knowledge Flow: Improve Upon Your Teachers | Iou-Jen Liu, Jian Peng, Alexander Schwing | To address this issue, in this paper, we develop knowledge flow which moves “knowledge” from multiple deep nets, referred to as teachers, to a new deep net model, called the student. |
28 | Directed-Info GAIL: Learning Hierarchical Policies from Unsegmented Demonstrations using Directed Information | Mohit Sharma, Arjun Sharma, Nicholas Rhinehart, Kris M. Kitani | We propose a new algorithm based on the generative adversarial imitation learning framework which automatically learns sub-task policies from unsegmented demonstrations. |
29 | A Max-Affine Spline Perspective of Recurrent Neural Networks | Zichao Wang, Randall Balestriero, Richard Baraniuk | The resulting representation provides several new perspectives for analyzing RNNs, three of which we study in this paper. |
30 | Learning to Navigate the Web | Izzeddin Gur, Ulrich Rueckert, Aleksandra Faust, Dilek Hakkani-Tur | We approach the aforementioned problems from a different perspective and propose guided RL approaches that can generate unbounded amount of experience for an agent to learn from. |
31 | Training for Faster Adversarial Robustness Verification via Inducing ReLU Stability | Kai Y. Xiao, Vincent Tjeng, Nur Muhammad (Mahi) Shafiullah, Aleksander Madry | Specifically, we aim to train deep neural networks that not only are robust to adversarial perturbations but also whose robustness can be verified more easily. |
32 | Learning to Learn with Conditional Class Dependencies | Xiang Jiang, Mohammad Havaei, Farshid Varno, Gabriel Chartrand, Nicolas Chapados, Stan Matwin | We propose a meta-learning framework, Conditional class-Aware Meta-Learning (CAML), that conditionally transforms feature representations based on a metric space that is trained to capture inter-class dependencies. |
33 | Hierarchical Visuomotor Control of Humanoids | Josh Merel, Arun Ahuja, Vu Pham, Saran Tunyasuvunakool, Siqi Liu, Dhruva Tirumala, Nicolas Heess, Greg Wayne | In this work, we partly factor this problem into low-level motor control from proprioception and high-level coordination of the low-level skills informed by vision. |
34 | Unsupervised Adversarial Image Reconstruction | Arthur Pajot, Emmanuel de Bezenac, Patrick Gallinari | We cast the problem as finding the \textit{maximum a posteriori} estimate of the signal given each measurement, and propose a general framework for the reconstruction problem. |
35 | Max-MIG: an Information Theoretic Approach for Joint Learning from Crowds | Peng Cao, Yilun Xu, Yuqing Kong, Yizhou Wang | We propose an information theoretic approach, Max-MIG, for joint learning from crowds, with a common assumption: the crowdsourced labels and the data are independent conditioning on the ground truth. |
36 | AutoLoss: Learning Discrete Schedule for Alternate Optimization | Haowen Xu, Hao Zhang, Zhiting Hu, Xiaodan Liang, Ruslan Salakhutdinov, Eric Xing | In this paper, we present AutoLoss, a meta-learning framework that automatically learns and determines the optimization schedule. |
37 | Learning what and where to attend | Drew Linsley, Dan Shiebler, Sven Eberhardt, Thomas Serre | Here, we demonstrate the benefit of using stronger supervisory signals by teaching DCNs to attend to image regions that humans deem important for object recognition. |
38 | ROBUST ESTIMATION VIA GENERATIVE ADVERSARIAL NETWORKS | Chao Gao, Jiyi Liu, Yuan Yao, Weizhi Zhu | In this paper, we establish an intriguing connection between f-GANs and various depth functions through the lens of f-Learning. |
39 | INVASE: Instance-wise Variable Selection using Neural Networks | Jinsung Yoon, James Jordon, Mihaela van der Schaar | In this paper, we propose a new instance-wise feature selection method, which we term INVASE. |
40 | Meta-Learning with Latent Embedding Optimization | Andrei A. Rusu, Dushyant Rao, Jakub Sygnowski, Oriol Vinyals, Razvan Pascanu, Simon Osindero, Raia Hadsell | We show that it is possible to bypass these limitations by learning a data-dependent latent generative representation of model parameters, and performing gradient-based meta-learning in this low-dimensional latent space. |
41 | Non-vacuous Generalization Bounds at the ImageNet Scale: a PAC-Bayesian Compression Approach | Wenda Zhou, Victor Veitch, Morgane Austern, Ryan P. Adams, Peter Orbanz | The purpose of this paper is to connect these two empirical observations. |
42 | Learning to Represent Edits | Pengcheng Yin, Graham Neubig, Miltiadis Allamanis, Marc Brockschmidt, Alexander L. Gaunt | We introduce the problem of learning distributed representations of edits. |
43 | Neural Probabilistic Motor Primitives for Humanoid Control | Josh Merel, Leonard Hasenclever, Alexandre Galashov, Arun Ahuja, Vu Pham, Greg Wayne, Yee Whye Teh, Nicolas Heess | To do this, we propose a motor architecture that has the general structure of an inverse model with a latent-variable bottleneck. |
44 | Differentiable Perturb-and-Parse: Semi-Supervised Parsing with a Structured Variational Autoencoder | Caio Corro, Ivan Titov | To this end, we propose a novel latent-variable generative model for semi-supervised syntactic dependency parsing. |
45 | Janossy Pooling: Learning Deep Permutation-Invariant Functions for Variable-Size Inputs | Ryan L. Murphy, Balasubramaniam Srinivasan, Vinayak Rao, Bruno Ribeiro | We consider a simple and overarching representation for permutation-invariant functions of sequences (or set functions). |
46 | An Empirical Study of Example Forgetting during Deep Neural Network Learning | Mariya Toneva*, Alessandro Sordoni*, Remi Tachet des Combes*, Adam Trischler, Yoshua Bengio, Geoffrey J. Gordon | Our goal is to understand whether a related phenomenon occurs when data does not undergo a clear distributional shift. |
47 | RNNs implicitly implement tensor-product representations | R. Thomas McCoy, Tal Linzen, Ewan Dunbar, Paul Smolensky | To test this hypothesis, we introduce Tensor Product Decomposition Networks (TPDNs), which use TPRs to approximate existing vector representations. |
48 | Learning To Solve Circuit-SAT: An Unsupervised Differentiable Approach | Saeed Amizadeh, Sergiy Matusevych, Markus Weimer | In this paper, we propose a neural framework that can learn to solve the Circuit Satisfiability problem. |
49 | Dynamic Channel Pruning: Feature Boosting and Suppression | Xitong Gao, Yiren Zhao, Lukasz Dudziak, Robert Mullins, Cheng-zhong Xu | In this paper, we reduce this cost by exploiting the fact that the importance of features computed by convolutional layers is highly input-dependent, and propose feature boosting and suppression (FBS), a new method to predictively amplify salient convolutional channels and skip unimportant ones at run-time. |
50 | signSGD with Majority Vote is Communication Efficient and Fault Tolerant | Jeremy Bernstein, Jiawei Zhao, Kamyar Azizzadenesheli, Anima Anandkumar | We explore a particularly simple algorithm for robust, communication-efficient learning—signSGD (a toy sketch of the majority-vote update appears after this table). |
51 | Bounce and Learn: Modeling Scene Dynamics with Real-World Bounces | Senthil Purushwalkam, Abhinav Gupta, Danny Kaufman, Bryan Russell | We introduce an approach to model surface properties governing bounces in everyday scenes. VIM learns to infer physical parameters for locations in a scene given a single still image, while PIM learns to model physical interactions for the prediction task given physical parameters and observed pre-collision 3D trajectories. To achieve our results, we introduce the Bounce Dataset comprising 5K RGB-D videos of bouncing trajectories of a foam ball to probe surfaces of varying shapes and materials in everyday scenes including homes and offices. Our proposed model learns from our collected dataset of real-world bounces and is bootstrapped with additional information from simple physics simulations. |
52 | K for the Price of 1: Parameter-efficient Multi-task and Transfer Learning | Pramod Kaushik Mudrakarta, Mark Sandler, Andrey Zhmoginov, Andrew Howard | We introduce a novel method that enables parameter-efficient transfer and multi-task learning with deep neural networks. |
53 | Towards Metamerism via Foveated Style Transfer | Arturo Deza, Aditya Jonnalagadda, Miguel P. Eckstein | In this paper, we propose our NeuroFovea metamer model, a foveated generative model that is based on a mixture of peripheral representations and style transfer forward-pass algorithms. |
54 | Post Selection Inference with Incomplete Maximum Mean Discrepancy Estimator | Makoto Yamada, Denny Wu, Yao-Hung Hubert Tsai, Hirofumi Ohta, Ruslan Salakhutdinov, Ichiro Takeuchi, Kenji Fukumizu | In this paper, we propose a post selection inference (PSI) framework for divergence measure, which can select a set of statistically significant features that discriminate two distributions. |
55 | Emergent Coordination Through Competition | Siqi Liu, Guy Lever, Josh Merel, Saran Tunyasuvunakool, Nicolas Heess, Thore Graepel | We study the emergence of cooperative behaviors in reinforcement learning agents by introducing a challenging competitive multi-agent soccer environment with continuous simulated physics. |
56 | Prior Convictions: Black-box Adversarial Attacks with Bandits and Priors | Andrew Ilyas, Logan Engstrom, Aleksander Madry | We introduce a framework that conceptually unifies much of the existing work on black-box attacks, and demonstrate that the current state-of-the-art methods are optimal in a natural sense. |
57 | Sample Efficient Imitation Learning for Continuous Control | Fumihiro Sasaki | We believe that IL algorithms could be more applicable to real-world problems if the number of interactions could be reduced. In this paper, we propose a model-free IL algorithm for continuous control. |
58 | Generative Code Modeling with Graphs | Marc Brockschmidt, Miltiadis Allamanis, Alexander L. Gaunt, Oleksandr Polozov | We present a novel model for this problem that uses a graph to represent the intermediate state of the generated output. |
59 | Critical Learning Periods in Deep Networks | Alessandro Achille, Matteo Rovere, Stefano Soatto | To better understand this phenomenon, we use the Fisher Information of the weights to measure the effective connectivity between layers of a network during training. |
60 | CEM-RL: Combining evolutionary and gradient-based methods for policy search | Aloïs Pourchot, Olivier Sigaud | In this paper, we propose a different combination scheme using the simple cross-entropy method (CEM) and Twin Delayed Deep Deterministic policy gradient (TD3), another off-policy deep RL algorithm which improves over DDPG. |
61 | LanczosNet: Multi-Scale Deep Graph Convolutional Networks | Renjie Liao, Zhizhen Zhao, Raquel Urtasun, Richard Zemel | We propose Lanczos network (LanczosNet) which uses the Lanczos algorithm to construct low rank approximations of the graph Laplacian for graph convolution. Relying on the tridiagonal decomposition of the Lanczos algorithm, we not only efficiently exploit multi-scale information via fast approximated computation of matrix power but also design learnable spectral filters. Being fully differentiable, LanczosNet facilitates both graph kernel learning as well as learning node embeddings. We show the connection between our LanczosNet and graph based manifold learning, especially diffusion maps. We benchmark our model against 8 recent deep graph networks on citation datasets and the QM8 quantum chemistry dataset. Experimental results show that our model achieves the state-of-the-art performance in most tasks. |
62 | Excessive Invariance Causes Adversarial Vulnerability | Joern-Henrik Jacobsen, Jens Behrmann, Richard Zemel, Matthias Bethge | One core idea of adversarial example research is to reveal neural network errors under such distribution shifts. |
63 | Hindsight policy gradients | Paulo Rauber, Avinash Ummadisingu, Filipe Mutz, Jürgen Schmidhuber | In this paper, we demonstrate how hindsight can be introduced to policy gradient methods, generalizing this idea to a broad class of successful algorithms. |
64 | Adaptive Gradient Methods with Dynamic Bound of Learning Rate | Liangchen Luo, Yuanhao Xiong, Yan Liu, Xu Sun | Adaptive optimization methods such as AdaGrad, RMSprop and Adam have been proposed to achieve a rapid training process with an element-wise scaling term on learning rates. |
65 | Decoupled Weight Decay Regularization | Ilya Loshchilov, Frank Hutter | L$_2$ regularization and weight decay regularization are equivalent for standard stochastic gradient descent (when rescaled by the learning rate), but as we demonstrate this is \emph{not} the case for adaptive gradient algorithms, such as Adam. |
66 | Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile | Panayotis Mertikopoulos, Bruno Lecouat, Houssam Zenati, Chuan-Sheng Foo, Vijay Chandrasekhar, Georgios Piliouras | To make piecemeal progress along these lines, we analyze the behavior of mirror descent (MD) in a class of non-monotone problems whose solutions coincide with those of a naturally associated variational inequality – a property which we call coherence. |
67 | DialogWAE: Multimodal Response Generation with Conditional Wasserstein Auto-Encoder | Xiaodong Gu, Kyunghyun Cho, Jung-Woo Ha, Sunghun Kim | In this paper, we propose DialogWAE, a conditional Wasserstein autoencoder (WAE) specially designed for dialogue modeling. |
68 | No Training Required: Exploring Random Encoders for Sentence Classification | John Wieting, Douwe Kiela | Our aim is to put sentence embeddings on more solid footing by 1) looking at how much modern sentence embeddings gain over random methods—as it turns out, surprisingly little; and by 2) providing the field with more appropriate baselines going forward—which are, as it turns out, quite strong. |
69 | Neural Graph Evolution: Towards Efficient Automatic Robot Design | Tingwu Wang, Yuhao Zhou, Sanja Fidler, Jimmy Ba | We propose Neural Graph Evolution (NGE), which performs selection on current candidates and evolves new ones iteratively. |
70 | Function Space Particle Optimization for Bayesian Neural Networks | Ziyu Wang, Tongzheng Ren, Jun Zhu, Bo Zhang | In this paper, we propose to solve this issue by performing particle optimization directly in the space of regression functions. |
71 | Structured Adversarial Attack: Towards General Implementation and Better Interpretability | Kaidi Xu, Sijia Liu, Pu Zhao, Pin-Yu Chen, Huan Zhang, Quanfu Fan, Deniz Erdogmus, Yanzhi Wang, Xue Lin | This work develops a more general attack model, i.e., the structured attack (StrAttack), which explores group sparsity in adversarial perturbation by sliding a mask through images aiming for extracting key spatial structures. |
72 | Spherical CNNs on Unstructured Grids | Chiyu Max Jiang, Jingwei Huang, Karthik Kashinath, Prabhat, Philip Marcus, Matthias Niessner | We present an efficient convolution kernel for Convolutional Neural Networks (CNNs) on unstructured grids using parameterized differential operators while focusing on spherical signals such as panorama images or planetary signals. To this end, we replace conventional convolution kernels with linear combinations of differential operators that are weighted by learnable parameters. |
73 | Optimal Transport Maps For Distribution Preserving Operations on Latent Spaces of Generative Models | Eirikur Agustsson, Alexander Sage, Radu Timofte, Luc Van Gool | In this paper, we propose a framework for modifying the latent space operations such that the distribution mismatch is fully eliminated. |
74 | Deep Lagrangian Networks: Using Physics as Model Prior for Deep Learning | Michael Lutter, Christian Ritter, Jan Peters | As a first example, we propose Deep Lagrangian Networks (DeLaN) as a deep network structure upon which Lagrangian Mechanics have been imposed. |
75 | Accumulation Bit-Width Scaling For Ultra-Low Precision Training Of Deep Networks | Charbel Sakr, Naigang Wang, Chia-Yu Chen, Jungwook Choi, Ankur Agrawal, Naresh Shanbhag, Kailash Gopalakrishnan | We present a statistical approach to analyze the impact of reduced accumulation precision on deep learning training. |
76 | Deep Convolutional Networks as shallow Gaussian Processes | Adrià Garriga-Alonso, Carl Edward Rasmussen, Laurence Aitchison | We show that the output of a (residual) CNN with an appropriate prior over the weights and biases is a GP in the limit of infinitely many convolutional filters, extending similar results for dense networks. |
77 | Unsupervised Domain Adaptation for Distance Metric Learning | Kihyuk Sohn, Wenling Shang, Xiang Yu, Manmohan Chandraker | To handle both within and cross domain verifications, we propose a Feature Transfer Network (FTN) to separate the target feature space from the original source space while aligned with a transformed source space. |
78 | A comprehensive, application-oriented study of catastrophic forgetting in DNNs | B. Pfülb, A. Gepperth | We present a large-scale empirical study of catastrophic forgetting (CF) in modern Deep Neural Network (DNN) models that perform sequential (or: incremental) learning. A new experimental protocol is proposed that takes into account typical constraints encountered in application scenarios. As the investigation is empirical, we evaluate CF behavior on the hitherto largest number of visual classification datasets, from each of which we construct a representative number of Sequential Learning Tasks (SLTs) in close alignment to previous works on CF. Our results clearly indicate that there is no model that avoids CF for all investigated datasets and SLTs under application conditions. |
79 | Posterior Attention Models for Sequence to Sequence Learning | Shiv Shankar, Sunita Sarawagi | In this paper we show that prevalent attention architectures do not adequately model the dependence among the attention and output tokens across a predicted sequence. We present an alternative architecture called Posterior Attention Models that after a principled factorization of the full joint distribution of the attention and output variables, proposes two major changes. |
80 | Generative Question Answering: Learning to Answer the Whole Question | Mike Lewis, Angela Fan | We introduce generative models of the joint distribution of questions and answers, which are trained to explain the whole question, not just to answer it. Our question answering (QA) model is implemented by learning a prior over answers, and a conditional language model to generate the question given the answer, allowing scalable and interpretable many-hop reasoning as the question is generated word-by-word. |
81 | Diversity and Depth in Per-Example Routing Models | Prajit Ramachandran, Quoc V. Le | In this work, we address both of these deficiencies. |
82 | Selfless Sequential Learning | Rahaf Aljundi, Marcus Rohrbach, Tinne Tuytelaars | In this paper we look at a scenario with fixed model capacity, and postulate that the learning process should not be selfish, i.e. it should account for future tasks to be added and thus leave enough capacity for them. |
83 | M^3RL: Mind-aware Multi-agent Management Reinforcement Learning | Tianmin Shu, Yuandong Tian | In this paper, we aim to address this from a different angle. |
84 | The Deep Weight Prior | Andrei Atanov, Arsenii Ashukha, Kirill Struminsky, Dmitriy Vetrov, Max Welling | In this work, we propose a new type of prior distributions for convolutional neural networks, deep weight prior (DWP), that exploit generative models to encourage a specific structure of trained convolutional filters e.g., spatial correlations of weights. |
85 | Efficient Multi-Objective Neural Architecture Search via Lamarckian Evolution | Thomas Elsken, Jan Hendrik Metzen, Frank Hutter | We address the first shortcoming by proposing LEMONADE, an evolutionary algorithm for multi-objective architecture search that allows approximating the Pareto-front of architectures under multiple objectives, such as predictive performance and number of parameters, in a single run of the method. |
86 | Quaternion Recurrent Neural Networks | Titouan Parcollet, Mirco Ravanelli, Mohamed Morchid, Georges Linarès, Chiheb Trabelsi, Renato De Mori, Yoshua Bengio | We propose a novel quaternion recurrent neural network (QRNN), alongside with a quaternion long-short term memory neural network (QLSTM), that take into account both the external relations and these internal structural dependencies with the quaternion algebra. |
87 | Adversarial Audio Synthesis | Chris Donahue, Julian McAuley, Miller Puckette | In this paper we introduce WaveGAN, a first attempt at applying GANs to unsupervised synthesis of raw-waveform audio. |
88 | Preconditioner on Matrix Lie Group for SGD | Xi-Lin Li | We study two types of preconditioners and preconditioned stochastic gradient descent (SGD) methods in a unified framework. |
89 | Learning to Screen for Fast Softmax Inference on Large Vocabulary Neural Networks | Patrick Chen, Si Si, Sanjiv Kumar, Yang Li, Cho-Jui Hsieh | In this paper, we introduce a novel softmax layer approximation algorithm by exploiting the clustering structure of context vectors. |
90 | Adaptive Posterior Learning: few-shot learning with a surprise-based memory module | Tiago Ramalho, Marta Garnelo | In this paper we introduce APL, an algorithm that approximates probability distributions by remembering the most surprising observations it has encountered. |
91 | Probabilistic Planning with Sequential Monte Carlo methods | Alexandre Piche, Valentin Thomas, Cyril Ibrahim, Yoshua Bengio, Chris Pal | In this work, we propose a novel formulation of planning which views it as a probabilistic inference problem over future optimal trajectories. |
92 | Plan Online, Learn Offline: Efficient Learning and Exploration via Model-Based Control | Kendall Lowrey, Aravind Rajeswaran, Sham Kakade, Emanuel Todorov, Igor Mordatch | We propose a “plan online and learn offline” framework for the setting where an agent, with an internal model, needs to continually act and learn in the world. |
93 | DHER: Hindsight Experience Replay for Dynamic Goals | Meng Fang, Cheng Zhou, Bei Shi, Boqing Gong, Jia Xu, Tong Zhang | DHER automatically assembles successful experiences from two relevant failures and can be used to enhance an arbitrary off-policy RL algorithm when the tasks’ goals are dynamic. |
94 | FlowQA: Grasping Flow in History for Conversational Machine Comprehension | Hsin-Yuan Huang, Eunsol Choi, Wen-tau Yih | To enable traditional, single-turn models to encode the history comprehensively, we introduce Flow, a mechanism that can incorporate intermediate representations generated during the process of answering previous questions, through an alternating parallel processing structure. |
95 | Learning to Design RNA | Frederic Runge, Danny Stoll, Stefan Falkner, Frank Hutter | Here, we propose a new algorithm for the RNA Design problem, dubbed LEARNA. |
96 | Robust Conditional Generative Adversarial Networks | Grigorios G. Chrysos, Jean Kossaifi, Stefanos Zafeiriou | In this work, we introduce a novel conditional GAN model, called RoCGAN, which leverages structure in the target space of the model to address the issue. |
97 | Top-Down Neural Model For Formulae | Karel Chvalovský | We present a simple neural model that given a formula and a property tries to answer the question whether the formula has the given property, for example whether a propositional formula is always true. |
98 | Cost-Sensitive Robustness against Adversarial Examples | Xiao Zhang, David Evans | We encode the potential harm of each adversarial transformation in a cost matrix, and propose a general objective function to adapt the robust training method of Wong & Kolter (2018) to optimize for cost-sensitive robustness. |
99 | The role of over-parametrization in generalization of neural networks | Behnam Neyshabur, Zhiyuan Li, Srinadh Bhojanapalli, Yann LeCun, Nathan Srebro | In this work we suggest a novel complexity measure based on unit-wise capacities resulting in a tighter generalization bound for two layer ReLU networks. |
100 | Diffusion Scattering Transforms on Graphs | Fernando Gama, Alejandro Ribeiro, Joan Bruna | This stability to deformations can be interpreted as stability with respect to changes in the metric structure of the domain. In this work, we show that scattering transforms can be generalized to non-Euclidean domains using diffusion wavelets, while preserving a notion of stability with respect to metric changes in the domain, measured with diffusion maps. |
101 | Capsule Graph Neural Network | Zhang Xinyi, Lihui Chen | Capsule Graph Neural Network. |
102 | Energy-Constrained Compression for Deep Neural Networks via Weighted Sparse Projection and Layer Input Masking | Haichuan Yang, Yuhao Zhu, Ji Liu | This paper proposes the first end-to-end DNN training framework that provides quantitative energy consumption guarantees via weighted sparse projection and input masking. |
103 | Emerging Disentanglement in Auto-Encoder Based Unsupervised Image Content Transfer | Ori Press, Tomer Galanti, Sagie Benaim, Lior Wolf | We study the problem of learning to map, in an unsupervised way, between domains $A$ and $B$, such that the samples $b \in B$ contain all the information that exists in samples $a \in A$ and some additional information. |
104 | SGD Converges to Global Minimum in Deep Learning via Star-convex Path | Yi Zhou, Junjie Yang, Huishuai Zhang, Yingbin Liang, Vahid Tarokh | In this study, we establish the convergence of SGD to a global minimum for nonconvex optimization problems that are commonly encountered in neural network training. |
105 | Toward Understanding the Impact of Staleness in Distributed Machine Learning | Wei Dai, Yi Zhou, Nanqing Dong, Hao Zhang, Eric Xing | In this work, we study the convergence behaviors of a wide array of ML models and algorithms under delayed updates. |
106 | Transfer Learning for Sequences via Learning to Collocate | Wanyun Cui, Guangyu Zheng, Zhiqiang Shen, Sihang Jiang, Wei Wang | We conducted extensive experiments on both sequence labeling tasks (POS tagging, NER) and sentence classification (sentiment analysis). |
107 | Learning Procedural Abstractions and Evaluating Discrete Latent Temporal Structure | Karan Goel, Emma Brunskill | In this work, we consider the problem of learning procedural abstractions from possibly high-dimensional observational sequences, such as video demonstrations. |
108 | Unsupervised Speech Recognition via Segmental Empirical Output Distribution Matching | Chih-Kuan Yeh, Jianshu Chen, Chengzhu Yu, Dong Yu | We propose a fully unsupervised learning algorithm that alternates between solving two sub-problems: (i) learn a phoneme classifier for a given set of phoneme segmentation boundaries, and (ii) refining the phoneme boundaries based on a given classifier. |
109 | Adversarial Attacks on Graph Neural Networks via Meta Learning | Daniel Zügner, Stephan Günnemann | Deep learning models for graphs have advanced the state of the art on many tasks. |
110 | Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection | Tue Le, Tuan Nguyen, Trung Le, Dinh Phung, Paul Montague, Olivier De Vel, Lizhen Qu | In this paper, we attempt to alleviate this severe binary vulnerability detection bottleneck by leveraging recent advances in deep learning representations and propose the Maximal Divergence Sequential Auto-Encoder. |
111 | Neural Program Repair by Jointly Learning to Localize and Repair | Marko Vasic, Aditya Kanade, Petros Maniatis, David Bieber, Rishabh Singh | In this work, we consider a recently identified class of bugs called variable-misuse bugs. |
112 | Information-Directed Exploration for Deep Reinforcement Learning | Nikolay Nikolov, Johannes Kirschner, Felix Berkenkamp, Andreas Krause | Motivated by recent findings that address this issue in bandits, we propose to use Information-Directed Sampling (IDS) for exploration in reinforcement learning. |
113 | Attention, Learn to Solve Routing Problems! | Wouter Kool, Herke van Hoof, Max Welling | We contribute in both directions: we propose a model based on attention layers with benefits over the Pointer Network and we show how to train this model using REINFORCE with a simple baseline based on a deterministic greedy rollout, which we find is more efficient than using a value function. |
114 | L2-Nonexpansive Neural Networks | Haifeng Qian, Mark N. Wegman | This paper proposes a class of well-conditioned neural networks in which a unit amount of change in the inputs causes at most a unit amount of change in the outputs or any of the internal layers. |
115 | Improving Generalization and Stability of Generative Adversarial Networks | Hoang Thanh-Tung, Truyen Tran, Svetha Venkatesh | In this paper, we analyze the generalization of GANs in practical settings. |
116 | Adaptive Input Representations for Neural Language Modeling | Alexei Baevski, Michael Auli | We introduce adaptive input representations for neural language modeling which extend the adaptive softmax of Grave et al. (2017) to input representations of variable capacity. |
117 | Neural Persistence: A Complexity Measure for Deep Neural Networks Using Algebraic Topology | Bastian Rieck, Matteo Togninalli, Christian Bock, Michael Moor, Max Horn, Thomas Gumbsch, Karsten Borgwardt | In this work, we propose neural persistence, a complexity measure for neural network architectures based on topological data analysis on weighted stratified graphs. |
118 | Efficient Augmentation via Data Subsampling | Michael Kuchnik, Virginia Smith | In this work, we demonstrate that it is possible to significantly reduce the number of data points included in data augmentation while realizing the same accuracy and invariance benefits of augmenting the entire dataset. We propose a novel set of subsampling policies, based on model influence and loss, that can achieve a 90% reduction in augmentation set size while maintaining the accuracy gains of standard data augmentation. |
119 | Neural TTS Stylization with Adversarial and Collaborative Games | Shuang Ma, Daniel Mcduff, Yale Song | In this work, we introduce an end-to-end TTS model that offers enhanced content-style disentanglement ability and controllability. |
120 | Optimal Control Via Neural Networks: A Convex Approach | Yize Chen, Yuanyuan Shi, Baosen Zhang | Therefore many systems are still identified and controlled based on simple linear models despite their poor representation capability. In this paper we bridge the gap between model accuracy and control tractability faced by neural networks, by explicitly constructing networks that are convex with respect to their inputs. |
121 | CBOW Is Not All You Need: Combining CBOW with the Compositional Matrix Space Model | Florian Mai, Lukas Galke, Ansgar Scherp | Motivated by these findings, we propose a hybrid model that combines the strengths of CBOW and CMOW. |
122 | Stochastic Optimization of Sorting Networks via Continuous Relaxations | Aditya Grover, Eric Wang, Aaron Zweig, Stefano Ermon | In this work, we propose NeuralSort, a general-purpose continuous relaxation of the output of the sorting operator from permutation matrices to the set of unimodal row-stochastic matrices, where every row sums to one and has a distinct argmax. |
123 | Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality | Taiji Suzuki | Deep learning has shown high performances in various types of tasks from visual recognition to natural language processing, which indicates superior flexibility and adaptivity of deep learning. To understand this phenomenon theoretically, we develop a new approximation and estimation error analysis of deep learning with the ReLU activation for functions in a Besov space and its variant with mixed smoothness. The Besov space is a considerably general function space including the Hölder space and Sobolev space, and especially can capture spatial inhomogeneity of smoothness. |
124 | Generating Multiple Objects at Spatially Distinct Locations | Tobias Hinz, Stefan Heinrich, Stefan Wermter | We introduce a new approach which allows us to control the location of arbitrarily many objects within an image by adding an object pathway to both the generator and the discriminator. |
125 | Near-Optimal Representation Learning for Hierarchical Reinforcement Learning | Ofir Nachum, Shixiang Gu, Honglak Lee, Sergey Levine | We study the problem of representation learning in goal-conditioned hierarchical reinforcement learning. |
126 | Understanding Composition of Word Embeddings via Tensor Decomposition | Abraham Frandsen, Rong Ge | In this paper we consider the problem of word embedding composition: given vector representations of two words, compute a vector for the entire phrase. |
127 | Structured Neural Summarization | Patrick Fernandes, Miltiadis Allamanis, Marc Brockschmidt | Based on the promising results of graph neural networks on highly structured data, we develop a framework to extend existing sequence encoders with a graph component that can reason about long-distance relationships in weakly structured data such as text. |
128 | Graph Wavelet Neural Network | Bingbing Xu, Huawei Shen, Qi Cao, Yunqi Qiu, Xueqi Cheng | We present graph wavelet neural network (GWNN), a novel graph convolutional neural network (CNN), leveraging graph wavelet transform to address the shortcomings of previous spectral graph CNN methods that depend on graph Fourier transform. |
129 | A rotation-equivariant convolutional neural network model of primary visual cortex | Alexander S. Ecker, Fabian H. Sinz, Emmanouil Froudarakis, Paul G. Fahey, Santiago A. Cadena, Edgar Y. Walker, Erick Cobos, Jacob Reimer, Andreas S. Tolias, Matthias Bethge | We present a framework for identifying common features independent of individual neurons’ orientation selectivity by using a rotation-equivariant convolutional neural network, which automatically extracts every feature at multiple different orientations. |
130 | Supervised Community Detection with Line Graph Neural Networks | Zhengdao Chen, Lisha Li, Joan Bruna | We present a novel family of Graph Neural Networks (GNNs) for solving community detection problems in a supervised learning setting. |
131 | Multiple-Attribute Text Rewriting | Guillaume Lample, Sandeep Subramanian, Eric Smith, Ludovic Denoyer, Marc’Aurelio Ranzato, Y-Lan Boureau | We thus propose a new model that controls several factors of variation in textual data where this condition on disentanglement is replaced with a simpler mechanism based on back-translation. |
132 | Wasserstein Barycenter Model Ensembling | Pierre Dognin*, Igor Melnyk*, Youssef Mroueh*, Jarret Ross*, Cicero Dos Santos*, Tom Sercu* | In this paper we propose to perform model ensembling in a multiclass or a multilabel learning setting using Wasserstein (W.) barycenters. |
133 | Policy Transfer with Strategy Optimization | Wenhao Yu, C. Karen Liu, Greg Turk | In this paper, we present a different approach that leverages domain randomization for transferring control policies to unknown environments. |
134 | code2seq: Generating Sequences from Structured Representations of Code | Uri Alon, Shaked Brody, Omer Levy, Eran Yahav | We present code2seq: an alternative approach that leverages the syntactic structure of programming languages to better encode source code. |
135 | Predict then Propagate: Graph Neural Networks meet Personalized PageRank | Johannes Klicpera, Aleksandar Bojchevski, Stephan Günnemann | In this paper, we use the relationship between graph convolutional networks (GCN) and PageRank to derive an improved propagation scheme based on personalized PageRank. |
136 | Slimmable Neural Networks | Jiahui Yu, Linjie Yang, Ning Xu, Jianchao Yang, Thomas Huang | We present a simple and general method to train a single neural network executable at different widths (number of channels in a layer), permitting instant and adaptive accuracy-efficiency trade-offs at runtime. |
137 | Analysing Mathematical Reasoning Abilities of Neural Models | David Saxton, Edward Grefenstette, Felix Hill, Pushmeet Kohli | In this paper, we present a new challenge for the evaluation (and eventually the design) of neural architectures and similar systems, developing a task suite of mathematics problems involving sequential questions and answers in a free-form textual input/output format. |
138 | RotDCF: Decomposition of Convolutional Filters for Rotation-Equivariant Deep Networks | Xiuyuan Cheng, Qiang Qiu, Robert Calderbank, Guillermo Sapiro | This paper proposes to decompose the convolutional filters over joint steerable bases across the space and the group geometry simultaneously, namely a rotation-equivariant CNN with decomposed convolutional filters (RotDCF). |
139 | Execution-Guided Neural Program Synthesis | Xinyun Chen, Chang Liu, Dawn Song | In this work, we propose two simple yet principled techniques to better leverage the semantic information, which are execution-guided synthesis and synthesizer ensemble. |
140 | Dynamic Sparse Graph for Efficient Deep Learning | Liu Liu, Lei Deng, Xing Hu, Maohua Zhu, Guoqi Li, Yufei Ding, Yuan Xie | We propose to execute deep neural networks (DNNs) with dynamic and sparse graph (DSG) structure for compressive memory and accelerative execution during both training and inference. |
141 | Fixup Initialization: Residual Learning Without Normalization | Hongyi Zhang, Yann N. Dauphin, Tengyu Ma | In this work, we challenge the commonly-held beliefs by showing that none of the perceived benefits is unique to normalization. |
142 | ProbGAN: Towards Probabilistic GAN with Theoretical Guarantees | Hao He, Hao Wang, Guang-He Lee, Yonglong Tian | In this paper, we propose a novel probabilistic framework for GANs, ProbGAN, which iteratively learns a distribution over generators with a carefully crafted prior. |
143 | Exploration by random network distillation | Yuri Burda, Harrison Edwards, Amos Storkey, Oleg Klimov | We introduce an exploration bonus for deep reinforcement learning methods that is easy to implement and adds minimal overhead to the computation performed. |
144 | Unsupervised Learning of the Set of Local Maxima | Lior Wolf, Sagie Benaim, Tomer Galanti | We present an algorithm, show an example where it is more efficient to use local maxima as an indicator function than to employ conventional classification, and derive a suitable generalization bound. |
145 | On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization | Xiangyi Chen, Sijia Liu, Ruoyu Sun, Mingyi Hong | In this paper, we develop an analysis framework and a set of mild sufficient conditions that guarantee the convergence of the Adam-type methods, with a convergence rate of order $O(\log{T}/\sqrt{T})$ for non-convex stochastic optimization. |
146 | Minimum Divergence vs. Maximum Margin: an Empirical Comparison on Seq2Seq Models | Huan Zhang, Hai Zhao | We introduce a new training criterion based on the analysis of existing work, and empirically compare models in the two categories. |
147 | GANSynth: Adversarial Neural Audio Synthesis | Jesse Engel, Kumar Krishna Agrawal, Shuo Chen, Ishaan Gulrajani, Chris Donahue, Adam Roberts | Herein, we demonstrate that GANs can in fact generate high-fidelity and locally-coherent audio by modeling log magnitudes and instantaneous frequencies with sufficient frequency resolution in the spectral domain. |
148 | Sliced Wasserstein Auto-Encoders | Soheil Kolouri, Phillip E. Pope, Charles E. Martin, Gustavo K. Rohde | In this paper we use the geometric properties of the optimal transport (OT) problem and the Wasserstein distances to define a prior distribution for the latent space of an auto-encoder. |
149 | Learning Two-layer Neural Networks with Symmetric Inputs | Rong Ge, Rohith Kuditipudi, Zhize Li, Xiang Wang | We give a new algorithm for learning a two-layer neural network under a very general class of input distributions. |
150 | Learning to Understand Goal Specifications by Modelling Reward | Dzmitry Bahdanau, Felix Hill, Jan Leike, Edward Hughes, Arian Hosseini, Pushmeet Kohli, Edward Grefenstette | To overcome this limitation, we present a framework within which instruction-conditional RL agents are trained using rewards obtained not from the environment, but from reward models which are jointly trained from expert examples. |
151 | Do Deep Generative Models Know What They Don’t Know? | Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Dilan Gorur, Balaji Lakshminarayanan | In this paper we challenge this assumption. |
152 | Identifying and Controlling Important Neurons in Neural Machine Translation | Anthony Bau, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, James Glass | We develop unsupervised methods for discovering important neurons in NMT models. |
153 | Representing Formal Languages: A Comparison Between Finite Automata and Recurrent Neural Networks | Joshua J. Michalenko, Ameesh Shah, Abhinav Verma, Richard G. Baraniuk, Swarat Chaudhuri, Ankit B. Patel | We investigate the internal representations that a recurrent neural network (RNN) uses while learning to recognize a regular formal language. |
154 | Visual Explanation by Interpretation: Improving Visual Feedback Capabilities of Deep Neural Networks | Jose Oramas, Kaili Wang, Tinne Tuytelaars | In this paper, we propose a novel scheme for both interpretation as well as explanation in which, given a pretrained model, we automatically identify internal features relevant for the set of classes considered by the model, without relying on additional annotations. |
155 | Don’t let your Discriminator be fooled | Brady Zhou, Philipp Krähenbühl | In this paper, we show that the Wasserstein distance is just one out of a large family of objective functions that yield these properties. |
156 | Latent Convolutional Models | ShahRukh Athar, Evgeny Burnaev, Victor Lempitsky | We present a new latent model of natural images that can be learned on large-scale datasets. |
157 | A Universal Music Translation Network | Noam Mor, Lior Wolf, Adam Polyak, Yaniv Taigman | We present a method for translating music across musical instruments and styles. |
158 | How to train your MAML | Antreas Antoniou, Harrison Edwards, Amos Storkey | In this paper, we propose various modifications to MAML that not only stabilize the system, but also substantially improve the generalization performance, convergence speed and computational overhead of MAML, which we call MAML++. |
159 | Learning a SAT Solver from Single-Bit Supervision | Daniel Selsam, Matthew Lamm, Benedikt Bünz, Percy Liang, Leonardo de Moura, David L. Dill | We present NeuroSAT, a message passing neural network that learns to solve SAT problems after only being trained as a classifier to predict satisfiability. |
160 | Learning Representations of Sets through Optimized Permutations | Yan Zhang, Jonathon Hare, Adam Prügel-Bennett | To this end, we propose a Permutation-Optimisation module that learns how to permute a set end-to-end. |
161 | Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition | Chun-Fu (Richard) Chen, Quanfu Fan, Neil Mallinar, Tom Sercu, Rogerio Feris | In this paper, we propose a novel Convolutional Neural Network (CNN) architecture for learning multi-scale feature representations with good tradeoffs between speed and accuracy. |
162 | Unsupervised Hyper-alignment for Multilingual Word Embeddings | Jean Alaux, Edouard Grave, Marco Cuturi, Armand Joulin | We thus propose a novel formulation that ensures composable mappings, leading to better alignments. |
163 | Visual Semantic Navigation using Scene Priors | Wei Yang, Xiaolong Wang, Ali Farhadi, Abhinav Gupta, Roozbeh Mottaghi | In this work, we focus on incorporating semantic priors in the task of semantic navigation. |
164 | NOODL: Provable Online Dictionary Learning and Sparse Coding | Sirisha Rambhatla, Xingguo Li, Jarvis Haupt | This was a major challenge until recently, when provable algorithms for dictionary learning were proposed. |
165 | Stochastic Gradient/Mirror Descent: Minimax Optimality and Implicit Regularization | Navid Azizan, Babak Hassibi | Stochastic Gradient/Mirror Descent: Minimax Optimality and Implicit Regularization. |
166 | Active Learning with Partial Feedback | Peiyun Hu, Zachary C. Lipton, Anima Anandkumar, Deva Ramanan | To address this more realistic setting, we propose active learning with partial feedback (ALPF), where the learner must actively choose both which example to label and which binary question to ask. |
167 | Gradient descent aligns the layers of deep linear networks | Ziwei Ji, Matus Telgarsky | This paper establishes risk convergence and asymptotic weight matrix alignment — a form of implicit regularization — of gradient flow and gradient descent when applied to deep linear networks on linearly separable data. |
168 | Data-Dependent Coresets for Compressing Neural Networks with Applications to Generalization Bounds | Cenk Baykal, Lucas Liebenwein, Igor Gilitschenski, Dan Feldman, Daniela Rus | We present an efficient coresets-based neural network compression algorithm that sparsifies the parameters of a trained fully-connected neural network in a manner that provably approximates the network’s output. |
169 | On the loss landscape of a class of deep neural networks with no bad local valleys | Quynh Nguyen, Mahesh Chandra Mukkamala, Matthias Hein | We identify a class of over-parameterized deep neural networks with standard activation functions and cross-entropy loss which provably have no bad local valley, in the sense that from any point in parameter space there exists a continuous path on which the cross-entropy loss is non-increasing and gets arbitrarily close to zero. |
170 | DOM-Q-NET: Grounded RL on Structured Language | Sheng Jia, Jamie Ryan Kiros, Jimmy Ba | In this work, we introduce DOM-Q-NET, a novel architecture for RL-based web navigation to address both of these problems. |
171 | Boosting Robustness Certification of Neural Networks | Gagandeep Singh, Timon Gehr, Markus Püschel, Martin Vechev | We present a novel approach for the certification of neural networks against adversarial perturbations which combines scalable overapproximation methods with precise (mixed integer) linear programming. |
172 | Learning To Simulate | Nataniel Ruiz, Samuel Schulter, Manmohan Chandraker | In this work, we propose a reinforcement learning-based method for automatically adjusting the parameters of any (non-differentiable) simulator, thereby controlling the distribution of synthesized data in order to maximize the accuracy of a model trained on that data. |
173 | Towards Understanding Regularization in Batch Normalization | Ping Luo, Xinjiang Wang, Wenqi Shao, Zhanglin Peng | We analyze BN by using a basic block of neural networks, consisting of a kernel layer, a BN layer, and a nonlinear activation function. |
174 | The Laplacian in RL: Learning Representations with Efficient Approximations | Yifan Wu, George Tucker, Ofir Nachum | In this paper, we present a fully general and scalable method for approximating the eigenvectors of the Laplacian in a model-free RL context. |
175 | Predicting the Generalization Gap in Deep Networks with Margin Distributions | Yiding Jiang, Dilip Krishnan, Hossein Mobahi, Samy Bengio | In this paper, we propose such a measure, and conduct extensive empirical studies on how well it can predict the generalization gap. |
176 | Adversarial Imitation via Variational Inverse Reinforcement Learning | Ahmed H. Qureshi, Byron Boots, Michael C. Yip | Our proposed method builds on the framework of generative adversarial networks and introduces the empowerment-regularized maximum-entropy inverse reinforcement learning to learn near-optimal rewards and policies. |
177 | Reasoning About Physical Interactions with Object-Oriented Prediction and Planning | Michael Janner, Sergey Levine, William T. Freeman, Joshua B. Tenenbaum, Chelsea Finn, Jiajun Wu | We present a paradigm for learning object-centric representations for physical scene understanding without direct supervision of object properties. |
178 | LayoutGAN: Generating Graphic Layouts with Wireframe Discriminators | Jianan Li, Jimei Yang, Aaron Hertzmann, Jianming Zhang, Tingfa Xu | We propose a novel Generative Adversarial Network, called LayoutGAN, that synthesizes layouts by modeling geometric relations of different types of 2D elements. |
179 | Learning Mixed-Curvature Representations in Product Spaces | Albert Gu, Frederic Sala, Beliz Gunel, Christopher Ré | The quality of the representations achieved by embeddings is determined by how well the geometry of the embedding space matches the structure of the data. Euclidean space has been the workhorse for embeddings; recently hyperbolic and spherical spaces have gained popularity due to their ability to better embed new types of structured data—such as hierarchical data—but most data is not structured so uniformly. We address this problem by proposing learning embeddings in a product manifold combining multiple copies of these model spaces (spherical, hyperbolic, Euclidean), providing a space of heterogeneous curvature suitable for a wide variety of structures. We introduce a heuristic to estimate the sectional curvature of graph data and directly determine an appropriate signature—the number of component spaces and their dimensions—of the product manifold. Empirically, we jointly learn the curvature and the embedding in the product space via Riemannian optimization. We discuss how to define and compute intrinsic quantities such as means—a challenging notion for product manifolds—and provably learnable optimization functions. On a range of datasets and reconstruction tasks, our product space embeddings outperform single Euclidean or hyperbolic spaces used in previous works, reducing distortion by 32.55% on a Facebook social network dataset. |
180 | StrokeNet: A Neural Painting Environment | Ningyuan Zheng, Yifan Jiang, Dingjiang Huang | In this paper we try to address the discrete nature of software environment with an intermediate, differentiable simulation. |
181 | Harmonizing Maximum Likelihood with GANs for Multimodal Conditional Generation | Soochan Lee, Junsoo Ha, Gunhee Kim | In order to accomplish both training stability and multimodal output generation, we propose novel training schemes with a new set of losses named moment reconstruction losses that simply replace the reconstruction loss. |
182 | Measuring Compositionality in Representation Learning | Jacob Andreas | We describe a procedure for evaluating compositionality by measuring how well the true representation-producing model can be approximated by a model that explicitly composes a collection of inferred representational primitives. |
183 | Benchmarking Neural Network Robustness to Common Corruptions and Perturbations | Dan Hendrycks, Thomas Dietterich | In this paper we establish rigorous benchmarks for image classifier robustness, and propose a new dataset called ImageNet-P which enables researchers to benchmark a classifier’s robustness to common perturbations. |
184 | ADef: an Iterative Algorithm to Construct Adversarial Deformations | Rima Alaifari, Giovanni S. Alberti, Tandri Gauksson | In this paper, we propose the ADef algorithm to construct a different kind of adversarial attack created by iteratively applying small deformations to the image, found through a gradient descent step. |
185 | Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning | Ilya Kostrikov, Kumar Krishna Agrawal, Debidatta Dwibedi, Sergey Levine, Jonathan Tompson | In order to address these issues, we propose a new algorithm called Discriminator-Actor-Critic that uses off-policy Reinforcement Learning to reduce policy-environment interaction sample complexity by an average factor of 10. |
186 | Doubly Reparameterized Gradient Estimators for Monte Carlo Objectives | George Tucker, Dieterich Lawson, Shixiang Gu, Chris J. Maddison | In particular, we show that this estimator reduces the variance of the IWAE gradient, the reweighted wake-sleep update (RWS) (Bornschein & Bengio 2014), and the jackknife variational inference (JVI) gradient (Nowozin 2018). |
187 | Learning Recurrent Binary/Ternary Weights | Arash Ardakani, Zhengyun Ji, Sean C. Smithson, Brett H. Meyer, Warren J. Gross | To address the above issues, we introduce a method that can learn binary and ternary weights during the training phase to facilitate hardware implementations of RNNs. |
188 | Learning concise representations for regression by evolving networks of trees | William La Cava, Tilak Raj Singh, James Taggart, Srinivas Suri | We propose and study a method for learning interpretable representations for the task of regression. |
189 | Efficient Training on Very Large Corpora via Gramian Estimation | Walid Krichene, Nicolas Mayoraz, Steffen Rendle, Li Zhang, Xinyang Yi, Lichan Hong, Ed Chi, John Anderson | These models are typically trained using SGD with random sampling of unobserved pairs, with a sample size that grows quadratically with the corpus size, making it expensive to scale. We propose new efficient methods to train these models without having to sample unobserved pairs. |
190 | MAE: Mutual Posterior-Divergence Regularization for Variational AutoEncoders | Xuezhe Ma, Chunting Zhou, Eduard Hovy | In this work, we introduce mutual posterior-divergence regularization, a novel regularization that is able to control the geometry of the latent space to accomplish meaningful representation learning, while achieving comparable or superior capability of density estimation. Experiments on three image benchmark datasets demonstrate that, when equipped with powerful decoders, our model performs well both on density estimation and representation learning. |
191 | Residual Non-local Attention Networks for Image Restoration | Yulun Zhang, Kunpeng Li, Kai Li, Bineng Zhong, Yun Fu | In this paper, we propose a residual non-local attention network for high-quality image restoration. |
192 | Meta-Learning For Stochastic Gradient MCMC | Wenbo Gong, Yingzhen Li, José Miguel Hernández-Lobato | This paper presents the first meta-learning algorithm that allows automated design for the underlying continuous dynamics of an SG-MCMC sampler. |
193 | Systematic Generalization: What Is Required and Can It Be Learned? | Dzmitry Bahdanau*, Shikhar Murty*, Michael Noukhovitch, Thien Huu Nguyen, Harm de Vries, Aaron Courville | Numerous models for grounded language understanding have been recently proposed, including (i) generic models that can be easily adapted to any given task and (ii) intuitively appealing modular models that require background knowledge to be instantiated. |
194 | Efficient Lifelong Learning with A-GEM | Arslan Chaudhry, Marc'Aurelio Ranzato, Marcus Rohrbach, Mohamed Elhoseiny | In this work, we investigate the efficiency of current lifelong approaches, in terms of sample complexity, computational and memory cost. |
195 | Multi-step Retriever-Reader Interaction for Scalable Open-domain Question Answering | Rajarshi Das, Shehzaad Dhuliawala, Manzil Zaheer, Andrew McCallum | This paper introduces a new framework for open-domain question answering in which the retriever and the reader iteratively interact with each other. |
196 | Double Viterbi: Weight Encoding for High Compression Ratio and Fast On-Chip Reconstruction for Deep Neural Network | Daehyun Ahn, Dongsoo Lee, Taesu Kim, Jae-Joon Kim | In this paper, we propose a new sparse matrix format in order to enable a highly parallel decoding process of the entire sparse matrix. |
197 | Overcoming the Disentanglement vs Reconstruction Trade-off via Jacobian Supervision | José Lezama | In this work, we propose to overcome this trade-off by progressively growing the dimension of the latent code, while constraining the Jacobian of the output image with respect to the disentangled variables to remain the same. |
198 | RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space | Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, Jian Tang | In this paper, we present a new approach for knowledge graph embedding called RotatE, which is able to model and infer various relation patterns including: symmetry/antisymmetry, inversion, and composition. |
199 | Guiding Policies with Language via Meta-Learning | John D. Co-Reyes, Abhishek Gupta, Suvansh Sanjeev, Nick Altieri, Jacob Andreas, John DeNero, Pieter Abbeel, Sergey Levine | In this work, we propose an interactive formulation of the task specification problem, where iterative language corrections are provided to an autonomous agent, guiding it in acquiring the desired skill. |
200 | AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods | Zhiming Zhou*, Qingru Zhang*, Guansong Lu, Hongwei Wang, Weinan Zhang, Yong Yu | In this paper, we provide a new insight into the non-convergence issue of Adam as well as other adaptive learning rate methods. |
201 | AD-VAT: An Asymmetric Dueling mechanism for learning Visual Active Tracking | Fangwei Zhong, Peng Sun, Wenhan Luo, Tingyun Yan, Yizhou Wang | To learn a robust tracker for VAT, in this paper, we propose a novel adversarial RL method which adopts an Asymmetric Dueling mechanism, referred to as AD-VAT. |
202 | Marginal Policy Gradients: A Unified Family of Estimators for Bounded Action Spaces with Applications | Carson Eisenach, Haichuan Yang, Ji Liu, Han Liu | To this end we introduce the angular policy gradient (APG), a stochastic policy gradient method for directional control. |
203 | On Self Modulation for Generative Adversarial Networks | Ting Chen, Mario Lucic, Neil Houlsby, Sylvain Gelly | We propose and study an architectural modification, self-modulation, which improves GAN performance across different data sets, architectures, losses, regularizers, and hyperparameter settings. |
204 | Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy | Yuan Xie, Boyi Liu, Qiang Liu, Zhaoran Wang, Yuan Zhou, Jian Peng | In this work, we introduce a new approach named Maximum Likelihood Inverse Propensity Scoring (MLIPS) for batch learning from logged bandit feedback. |
205 | Subgradient Descent Learns Orthogonal Dictionaries | Yu Bai, Qijia Jiang, Ju Sun | We show that a subgradient descent algorithm, with random initialization, can recover orthogonal dictionaries on a natural nonsmooth, nonconvex L1 minimization formulation of the problem, under mild statistical assumption on the data. |
206 | ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech | Wei Ping, Kainan Peng, Jitong Chen | In this work, we propose a new solution for parallel wave generation by WaveNet. |
207 | MARGINALIZED AVERAGE ATTENTIONAL NETWORK FOR WEAKLY-SUPERVISED LEARNING | Yuan Yuan, Yueming Lyu, Xi Shen, Ivor W. Tsang, Dit-Yan Yeung | To alleviate this issue, we propose a marginalized average attentional network (MAAN) to suppress the dominant response of the most salient regions in a principled manner. |
208 | Towards GAN Benchmarks Which Require Generalization | Ishaan Gulrajani, Colin Raffel, Luke Metz | For many evaluation metrics commonly used as benchmarks for unconditional image generation, trivially memorizing the training set attains a better score than models which are considered state-of-the-art; we consider this problematic. We clarify a necessary condition for an evaluation metric not to behave this way: estimating the function must require a large sample from the model. |
209 | A Closer Look at Few-shot Classification | Wei-Yu Chen, Yen-Cheng Liu, Zsolt Kira, Yu-Chiang Frank Wang, Jia-Bin Huang | In this paper, we present 1) a consistent comparative analysis of several representative few-shot classification algorithms, with results showing that deeper backbones significantly reduce the gap across methods including the baseline, 2) a slightly modified baseline method that surprisingly achieves competitive performance when compared with the state-of-the-art on both the mini-ImageNet and the CUB datasets, and 3) a new experimental setting for evaluating the cross-domain generalization ability for few-shot classification algorithms. |
210 | Meta-Learning Probabilistic Inference for Prediction | Jonathan Gordon, John Bronskill, Matthias Bauer, Sebastian Nowozin, Richard Turner | This paper introduces a new framework for data efficient and versatile learning. |
211 | Deep reinforcement learning with relational inductive biases | Vinicius Zambaldi, David Raposo, Adam Santoro, Victor Bapst, Yujia Li, Igor Babuschkin, Karl Tuyls, David Reichert, Timothy Lillicrap, Edward Lockhart, Murray Shanahan, Victoria Langston, Razvan Pascanu, Matthew Botvinick, Oriol Vinyals, Peter Battaglia | We introduce an approach for augmenting model-free deep reinforcement learning agents with a mechanism for relational reasoning over structured representations, which improves performance, learning efficiency, generalization, and interpretability. |
212 | Relaxed Quantization for Discretized Neural Networks | Christos Louizos, Matthias Reisser, Tijmen Blankevoort, Efstratios Gavves, Max Welling | In order to train networks that can be effectively discretized without loss of performance, we introduce a differentiable quantization procedure. |
213 | Tree-Structured Recurrent Switching Linear Dynamical Systems for Multi-Scale Modeling | Josue Nassar, Scott Linderman, Monica Bugallo, Il Memming Park | To fit this model, we present a fully-Bayesian sampling procedure using Polya-Gamma data augmentation to allow for fast and conjugate Gibbs sampling. |
214 | STCN: Stochastic Temporal Convolutional Networks | Emre Aksan, Otmar Hilliges | In this work, we propose stochastic temporal convolutional networks (STCNs), a novel architecture that combines the computational advantages of temporal convolutional networks (TCN) with the representational power and robustness of stochastic latent spaces. |
215 | Soft Q-Learning with Mutual-Information Regularization | Jordi Grau-Moya, Felix Leibfried, Peter Vrancx | We propose a reinforcement learning (RL) algorithm that uses mutual-information regularization to optimize a prior action distribution for better performance and exploration. |
216 | On the Turing Completeness of Modern Neural Network Architectures | Jorge Pérez, Javier Marinkovic, Pablo Barceló | We show both models to be Turing complete exclusively based on their capacity to compute and access internal dense representations of the data. |
217 | Improving Differentiable Neural Computers Through Memory Masking, De-allocation, and Link Distribution Sharpness Control | Robert Csordas, Juergen Schmidhuber | An analysis of its internal activation patterns reveals three problems: Most importantly, the lack of key-value separation makes the address distribution resulting from content-based look-up noisy and flat, since the value influences the score calculation, although only the key should. |
218 | Evaluating Robustness of Neural Networks with Mixed Integer Programming | Vincent Tjeng, Kai Y. Xiao, Russ Tedrake | We achieve this computational speedup via tight formulations for non-linearities, as well as a novel presolve algorithm that makes full use of all information available. |
219 | Random mesh projectors for inverse problems | Konik Kothari*, Sidharth Gupta*, Maarten v. de Hoop, Ivan Dokmanic | We propose a new learning-based approach to solve ill-posed inverse problems in imaging. |
220 | Multi-Agent Dual Learning | Yiren Wang, Yingce Xia, Tianyu He, Fei Tian, Tao Qin, ChengXiang Zhai, Tie-Yan Liu | In this paper, we extend this framework by introducing multiple primal and dual models, and propose the multi-agent dual learning framework. |
221 | Complement Objective Training | Hao-Yun Chen, Pei-Hsin Wang, Chun-Hao Liu, Shih-Chieh Chang, Jia-Yu Pan, Yu-Ting Chen, Wei Wei, Da-Cheng Juan | We conduct extensive experiments on multiple tasks ranging from computer vision to natural language understanding. |
222 | Mode Normalization | Lucas Deecke, Iain Murray, Hakan Bilen | As a remedy, we propose a more flexible approach: by extending the normalization to more than a single mean and variance, we detect modes of data on-the-fly, jointly normalizing samples that share common features. |
223 | Detecting Egregious Responses in Neural Sequence-to-sequence Models | Tianxing He, James Glass | In this work, we attempt to answer a critical question: whether there exists some input sequence that will cause a well-trained discrete-space neural network sequence-to-sequence (seq2seq) model to generate egregious outputs (aggressive, malicious, attacking, etc.). |
224 | Learning Actionable Representations with Goal Conditioned Policies | Dibya Ghosh, Abhishek Gupta, Sergey Levine | In this paper, we instead aim to learn functionally salient representations: representations that are not necessarily complete in terms of capturing all factors of variation in the observation space, but rather aim to capture those factors of variation that are important for decision making — that are “actionable”. |
225 | Verification of Non-Linear Specifications for Neural Networks | Chongli Qin, Krishnamurthy (Dj) Dvijotham, Brendan O’Donoghue, Rudy Bunel, Robert Stanforth, Sven Gowal, Jonathan Uesato, Grzegorz Swirszcz, Pushmeet Kohli | In this paper, we extend verification algorithms to be able to certify richer properties of neural networks. |
226 | Generating Liquid Simulations with Deformation-aware Neural Networks | Lukas Prantl, Boris Bonev, Nils Thuerey | We propose a novel approach for deformation-aware neural networks that learn the weighting and synthesis of dense volumetric deformation fields. |
227 | DyRep: Learning Representations over Dynamic Graphs | Rakshit Trivedi, Mehrdad Farajtabar, Prasenjeet Biswal, Hongyuan Zha | We present DyRep – a novel modeling framework for dynamic graphs that posits representation learning as a latent mediation process bridging two observed processes namely — dynamics of the network (realized as topological evolution) and dynamics on the network (realized as activities between nodes). |
228 | Trellis Networks for Sequence Modeling | Shaojie Bai, J. Zico Kolter, Vladlen Koltun | We present trellis networks, a new architecture for sequence modeling. |
229 | Scalable Unbalanced Optimal Transport using Generative Adversarial Networks | Karren D. Yang, Caroline Uhler | In this paper, we present a scalable method for unbalanced optimal transport (OT) based on the generative-adversarial framework. |
230 | Solving the Rubik’s Cube with Approximate Policy Iteration | Stephen McAleer, Forest Agostinelli, Alexander Shmakov, Pierre Baldi | We introduce Autodidactic Iteration: an API algorithm that overcomes the problem of sparse rewards by training on a distribution of states that allows the reward to propagate from the goal state to states farther away. |
231 | Variance Reduction for Reinforcement Learning in Input-Driven Environments | Hongzi Mao, Shaileshh Bojja Venkatakrishnan, Malte Schwarzkopf, Mohammad Alizadeh | We consider reinforcement learning in input-driven environments, where an exogenous, stochastic input process affects the dynamics of the system. |
232 | Model-Predictive Policy Learning with Uncertainty Regularization for Driving in Dense Traffic | Mikael Henaff, Alfredo Canziani, Yann LeCun | In this work, we propose to train a policy while explicitly penalizing the mismatch between these two distributions over a fixed time horizon. |
233 | GAN Dissection: Visualizing and Understanding Generative Adversarial Networks | David Bau, Jun-Yan Zhu, Hendrik Strobelt, Bolei Zhou, Joshua B. Tenenbaum, William T. Freeman, Antonio Torralba | Answering such questions could enable us to develop new insights and better models. In this work, we present an analytic framework to visualize and understand GANs at the unit-, object-, and scene-level. |
234 | Improving MMD-GAN Training with Repulsive Loss Function | Wei Wang, Yuan Sun, Saman Halgamuge | To address this issue, we propose a repulsive loss function to actively learn the difference among the real data by simply rearranging the terms in MMD. |
235 | Deterministic PAC-Bayesian generalization bounds for deep networks via generalizing noise-resilience | Vaishnavh Nagarajan, Zico Kolter | In this work, we present a general PAC-Bayesian framework that leverages this observation to provide a bound on the original network learned — a network that is deterministic and uncompressed. |
236 | Recall Traces: Backtracking Models for Efficient Reinforcement Learning | Anirudh Goyal, Philemon Brakel, William Fedus, Soumye Singhal, Timothy Lillicrap, Sergey Levine, Hugo Larochelle, Yoshua Bengio | Hence, we may want to preferentially train on those high-reward states and the probable trajectories leading to them. To this end, we advocate for the use of a backtracking model that predicts the preceding states that terminate at a given high-reward state. |
237 | Stable Recurrent Models | John Miller, Moritz Hardt | In this work, we conduct a thorough investigation of stable recurrent models. |
238 | The Limitations of Adversarial Training and the Blind-Spot Attack | Huan Zhang*, Hongge Chen*, Zhao Song, Duane Boning, Inderjit S. Dhillon, Cho-Jui Hsieh | In our paper, we shed some light on the practicality and the hardness of adversarial training by showing that the effectiveness (robustness on the test set) of adversarial training has a strong correlation with the distance between a test point and the manifold of training data embedded by the network. |
239 | Efficiently testing local optimality and escaping saddles for ReLU networks | Chulhee Yun, Suvrit Sra, Ali Jadbabaie | We provide a theoretical algorithm for checking local optimality and escaping saddles at nondifferentiable points of empirical risks of two-layer ReLU networks. |
240 | ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware | Han Cai, Ligeng Zhu, Song Han | In this paper, we present ProxylessNAS that can directly learn the architectures for large-scale target tasks and target hardware platforms. |
241 | Hierarchical Reinforcement Learning via Advantage-Weighted Information Maximization | Takayuki Osa, Voot Tangkaratt, Masashi Sugiyama | In this paper, we propose an HRL method that learns a latent variable of a hierarchical policy using mutual information maximization. |
242 | Generalizable Adversarial Training via Spectral Normalization | Farzan Farnia, Jesse Zhang, David Tse | In this work, we extend the notion of margin loss to adversarial settings and bound the generalization error for DNNs trained under several well-known gradient-based attack schemes, motivating an effective regularization scheme based on spectral normalization of the DNN’s weight matrices. |
243 | Adversarial Domain Adaptation for Stable Brain-Machine Interfaces | Ali Farshchian, Juan A. Gallego, Joseph P. Cohen, Yoshua Bengio, Lee E. Miller, Sara A. Solla | Here, we introduce a new computational approach that decodes movement intent from a low-dimensional latent representation of the neural data. |
244 | Deep Online Learning Via Meta-Learning: Continual Adaptation for Model-Based RL | Anusha Nagabandi, Chelsea Finn, Sergey Levine | The goal in this paper is to develop a method for continual online learning from an incoming stream of data, using deep neural network models. |
245 | Deep Anomaly Detection with Outlier Exposure | Dan Hendrycks, Mantas Mazeika, Thomas Dietterich | We propose leveraging these data to improve deep anomaly detection by training anomaly detectors against an auxiliary dataset of outliers, an approach we call Outlier Exposure (OE). |
246 | Contingency-Aware Exploration in Reinforcement Learning | Jongwook Choi, Yijie Guo, Marcin Moczulski, Junhyuk Oh, Neal Wu, Mohammad Norouzi, Honglak Lee | In this study, we develop an attentive dynamics model (ADM) that discovers controllable elements of the observations, which are often associated with the location of the character in Atari games. |
247 | Context-adaptive Entropy Model for End-to-end Optimized Image Compression | Jooyoung Lee, Seunghyun Cho, Seung-Kwon Beack | We propose a context-adaptive entropy model for use in end-to-end optimized image compression. |
248 | Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow | Xue Bin Peng, Angjoo Kanazawa, Sam Toyer, Pieter Abbeel, Sergey Levine | In this work, we propose a simple and general technique to constrain information flow in the discriminator by means of an information bottleneck. |
249 | Meta-learning with differentiable closed-form solvers | Luca Bertinetto, Joao F. Henriques, Philip Torr, Andrea Vedaldi | Adapting deep networks to new concepts from a few examples is challenging, due to the high computational requirements of standard fine-tuning procedures. Most work on few-shot learning has thus focused on simple learning techniques for adaptation, such as nearest neighbours or gradient descent. Nonetheless, the machine learning literature contains a wealth of methods that learn non-deep models very efficiently. In this paper, we propose to use these fast convergent methods as the main adaptation mechanism for few-shot learning. The main idea is to teach a deep network to use standard machine learning tools, such as ridge regression, as part of its own internal model, enabling it to quickly adapt to novel data. This requires back-propagating errors through the solver steps. While normally the cost of the matrix operations involved in such a process would be significant, by using the Woodbury identity we can make the small number of examples work to our advantage. We propose both closed-form and iterative solvers, based on ridge regression and logistic regression components. Our methods constitute a simple and novel approach to the problem of few-shot learning and achieve performance competitive with or superior to the state of the art on three benchmarks. |
250 | Learning Self-Imitating Diverse Policies | Tanmay Gangwani, Qiang Liu, Jian Peng | In this work, we introduce a self-imitation learning algorithm that exploits and explores well in the sparse and episodic reward settings. |
251 | ProxQuant: Quantized Neural Networks via Proximal Operators | Yu Bai, Yu-Xiang Wang, Edo Liberty | Despite its empirical success, little is understood about why the straight-through gradient method works. Building upon a novel observation that the straight-through gradient method is in fact identical to the well-known Nesterov's dual-averaging algorithm on a quantization constrained optimization problem, we propose a more principled alternative approach, called ProxQuant, that formulates quantized network training as a regularized learning problem instead and optimizes it via the prox-gradient method. |
252 | Universal Transformers | Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Lukasz Kaiser | We propose the Universal Transformer (UT), a parallel-in-time self-attentive recurrent sequence model which can be cast as a generalization of the Transformer model and which addresses these issues. |
253 | Learning to Adapt in Dynamic, Real-World Environments through Meta-Reinforcement Learning | Anusha Nagabandi, Ignasi Clavera, Simin Liu, Ronald S. Fearing, Pieter Abbeel, Sergey Levine, Chelsea Finn | Given that it is impractical to train separate policies to accommodate all situations the agent may see in the real world, this work proposes to learn how to quickly and effectively adapt online to new tasks. |
254 | L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data | Jianbo Chen, Le Song, Martin J. Wainwright, Michael I. Jordan | Methods based on the Shapley score have been proposed as a fair way of computing feature attributions, but incur an exponential complexity in the number of features. |
255 | Discovery of Natural Language Concepts in Individual Units of CNNs | Seil Na, Yo Joong Choe, Dong-Hyun Lee, Gunhee Kim | In order to quantitatively analyze such intriguing phenomenon, we propose a concept alignment method based on how units respond to replicated text. |
256 | Towards the first adversarially robust neural network model on MNIST | Lukas Schott, Jonas Rauber, Matthias Bethge, Wieland Brendel | We present a novel robust classification model that performs analysis by synthesis using learned class-conditional data distributions. |
257 | Discriminator Rejection Sampling | Samaneh Azadi, Catherine Olsson, Trevor Darrell, Ian Goodfellow, Augustus Odena | We propose a rejection sampling scheme using the discriminator of a GAN toapproximately correct errors in the GAN generator distribution. |
258 | Harmonic Unpaired Image-to-image Translation | Rui Zhang, Tomas Pfister, Jia Li | In this paper, we take a manifold view of the problem by introducing a smoothness term over the sample graph to attain harmonic functions to enforce consistent mappings during the translation. |
259 | Universal Successor Features Approximators | Diana Borsa, Andre Barreto, John Quan, Daniel J. Mankowitz, Hado van Hasselt, Remi Munos, David Silver, Tom Schaul | We discuss the challenges involved in training a USFA, its generalisation properties and demonstrate its practical benefits and transfer abilities on a large-scale domain in which the agent has to navigate in a first-person perspective three-dimensional environment. |
260 | Gradient Descent Provably Optimizes Over-parameterized Neural Networks | Simon S. Du, Xiyu Zhai, Barnabas Poczos, Aarti Singh | One of the mysteries in the success of neural networks is that randomly initialized first-order methods like gradient descent can achieve zero training loss even though the objective function is non-convex and non-smooth. |
261 | Opportunistic Learning: Budgeted Cost-Sensitive Learning from Data Streams | Mohammad Kachuee, Orpaz Goldstein, Kimmo Kärkkäinen, Sajad Darabi, Majid Sarrafzadeh | In this paper, we propose a novel approach for cost-sensitive feature acquisition at the prediction-time. |
262 | DARTS: Differentiable Architecture Search | Hanxiao Liu, Karen Simonyan, Yiming Yang | This paper addresses the scalability challenge of architecture search by formulating the task in a differentiable manner. |
263 | Feature-Wise Bias Amplification | Klas Leino, Matt Fredrikson, Emily Black, Shayak Sen, Anupam Datta | We present two new feature selection algorithms for mitigating bias amplification in linear models, and show how they can be adapted to convolutional neural networks efficiently. |
264 | The relativistic discriminator: a key element missing from standard GAN | Alexia Jolicoeur-Martineau | We generalize both approaches to non-standard GAN loss functions and we refer to them respectively as Relativistic GANs (RGANs) and Relativistic average GANs (RaGANs). |
265 | Understanding and Improving Interpolation in Autoencoders via an Adversarial Regularizer | David Berthelot*, Colin Raffel*, Aurko Roy, Ian Goodfellow | In this paper, we propose a regularization procedure which encourages interpolated outputs to appear more realistic by fooling a critic network which has been trained to recover the mixing coefficient from interpolated data. |
266 | Quasi-hyperbolic momentum and Adam for deep learning | Jerry Ma, Denis Yarats | We propose the quasi-hyperbolic momentum algorithm (QHM) as an extremely simple alteration of momentum SGD, averaging a plain SGD step with a momentum step. |
267 | Local SGD Converges Fast and Communicates Little | Sebastian U. Stich | To overcome this communication bottleneck recent works propose to reduce the communication frequency. |
268 | Learning Finite State Representations of Recurrent Policy Networks | Anurag Koul, Alan Fern, Sam Greydanus | In this paper, we introduce a new technique, Quantized Bottleneck Insertion, to learn finite representations of these vectors and features. |
269 | Multilingual Neural Machine Translation with Knowledge Distillation | Xu Tan, Yi Ren, Di He, Tao Qin, Zhou Zhao, Tie-Yan Liu | In this paper, we propose a distillation-based approach to boost the accuracy of multilingual machine translation. |
270 | MisGAN: Learning from Incomplete Data with Generative Adversarial Networks | Steven Cheng-Xian Li, Bo Jiang, Benjamin Marlin | In this paper, we present a GAN-based framework for learning from complex, high-dimensional incomplete data. |
271 | A Direct Approach to Robust Deep Learning Using Adversarial Networks | Huaxia Wang, Chun-Nam Yu | In this paper we propose a new defensive mechanism under the generative adversarial network~(GAN) framework. |
272 | Combinatorial Attacks on Binarized Neural Networks | Elias B Khalil, Amrita Gupta, Bistra Dilkina | In this work, we study the problem of attacking a BNN through the lens of combinatorial and integer optimization. |
273 | Exemplar Guided Unsupervised Image-to-Image Translation with Semantic Consistency | Liqian Ma, Xu Jia, Stamatios Georgoulis, Tinne Tuytelaars, Luc Van Gool | To alleviate these issues, we propose the Exemplar Guided & Semantically Consistent Image-to-image Translation (EGSC-IT) network which conditions the translation process on an exemplar image in the target domain. |
274 | ARM: Augment-REINFORCE-Merge Gradient for Stochastic Binary Networks | Mingzhang Yin, Mingyuan Zhou | To backpropagate the gradients through stochastic binary layers, we propose the augment-REINFORCE-merge (ARM) estimator that is unbiased, exhibits low variance, and has low computational complexity. |
275 | Building Dynamic Knowledge Graphs from Text using Machine Reading Comprehension | Rajarshi Das, Tsendsuren Munkhdalai, Xingdi Yuan, Adam Trischler, Andrew McCallum | We propose a neural machine-reading model that constructs dynamic knowledge graphs from procedural text. |
276 | Information asymmetry in KL-regularized RL | Alexandre Galashov, Siddhant M. Jayakumar, Leonard Hasenclever, Dhruva Tirumala, Jonathan Schwarz, Guillaume Desjardins, Wojciech M. Czarnecki, Yee Whye Teh, Razvan Pascanu, Nicolas Heess | In this work we study the possibility of leveraging such repeated structure to speed up and regularize learning. |
277 | TimbreTron: A WaveNet(CycleGAN(CQT(Audio))) Pipeline for Musical Timbre Transfer | Sicong Huang, Qiyang Li, Cem Anil, Xuchan Bao, Sageev Oore, Roger B. Grosse | In this work, we address the problem of musical timbre transfer, where the goal is to manipulate the timbre of a sound sample from one instrument to match another instrument while preserving other musical content, such as pitch, rhythm, and loudness. |
278 | Whitening and Coloring Batch Transform for GANs | Aliaksandr Siarohin, Enver Sangineto, Nicu Sebe | In this paper we propose to generalize both BN and cBN using a Whitening and Coloring based batch normalization. |
279 | Learnable Embedding Space for Efficient Neural Architecture Compression | Shengcao Cao, Xiaofang Wang, Kris M. Kitani | We propose a method to incrementally learn an embedding space over the domain of network architectures, to enable the careful selection of architectures for evaluation during compressed architecture search. |
280 | On the Sensitivity of Adversarial Robustness to Input Data Distributions | Gavin Weiguang Ding, Kry Yik Chau Lui, Xiaomeng Jin, Luyu Wang, Ruitong Huang | In this paper, we demonstrate an intriguing phenomenon about the most popular robust training method in the literature, adversarial training: Adversarial robustness, unlike clean accuracy, is sensitive to the input data distribution. |
281 | Minimal Images in Deep Neural Networks: Fragile Object Recognition in Natural Images | Sanjana Srivastava, Guy Ben-Yosef, Xavier Boix | In this paper, we demonstrate that such drops in accuracy due to changes of the visible region are a common phenomenon between humans and existing state-of-the-art deep neural networks (DNNs), and are much more prominent in DNNs. |
282 | A Statistical Approach to Assessing Neural Network Robustness | Stefan Webb, Tom Rainforth, Yee Whye Teh, M. Pawan Kumar | We present a new approach to assessing the robustness of neural networks based on estimating the proportion of inputs for which a property is violated. |
283 | Improving Sequence-to-Sequence Learning via Optimal Transport | Liqun Chen, Yizhe Zhang, Ruiyi Zhang, Chenyang Tao, Zhe Gan, Haichao Zhang, Bai Li, Dinghan Shen, Changyou Chen, Lawrence Carin | We present a novel solution to alleviate these issues. |
284 | PATE-GAN: Generating Synthetic Data with Differential Privacy Guarantees | James Jordon, Jinsung Yoon, Mihaela van der Schaar | In this paper, we investigate a method for ensuring (differential) privacy of the generator of the Generative Adversarial Nets (GAN) framework. |
285 | Integer Networks for Data Compression with Latent-Variable Models | Johannes Ballé, Nick Johnston, David Minnen | We propose using integer networks as a universal solution to this problem, and demonstrate that they enable reliable cross-platform encoding and decoding of images using variational models. |
286 | Value Propagation Networks | Nantas Nardelli, Gabriel Synnaeve, Zeming Lin, Pushmeet Kohli, Philip H. S. Torr, Nicolas Usunier | We present Value Propagation (VProp), a set of parameter-efficient differentiable planning modules built on Value Iteration which can successfully be trained using reinforcement learning to solve unseen tasks, has the capability to generalize to larger map sizes, and can learn to navigate in dynamic environments. |
287 | Bayesian Policy Optimization for Model Uncertainty | Gilwoo Lee, Brian Hou, Aditya Mandalika, Jeongseok Lee, Sanjiban Choudhury, Siddhartha S. Srinivasa | To address challenges from discretizing the continuous latent parameter space, we propose a new policy network architecture that encodes the belief distribution independently from the observable state. |
288 | Variational Bayesian Phylogenetic Inference | Cheng Zhang, Frederick A. Matsen IV | In this paper we present an alternative approach: a variational framework for Bayesian phylogenetic analysis. |
289 | LEARNING FACTORIZED REPRESENTATIONS FOR OPEN-SET DOMAIN ADAPTATION | Mahsa Baktashmotlagh, Masoud Faraki, Tom Drummond, Mathieu Salzmann | In this paper, we tackle the more challenging, yet more realistic case of open-set domain adaptation, where new, unknown classes can be present in the target data. |
290 | On the Universal Approximability and Complexity Bounds of Quantized ReLU Neural Networks | Yukun Ding, Jinglan Liu, Jinjun Xiong, Yiyu Shi | In this paper, we study the representation power of quantized neural networks. |
291 | Learning Localized Generative Models for 3D Point Clouds via Graph Convolution | Diego Valsesia, Giulia Fracastoro, Enrico Magli | We focus on the generator of a GAN and define methods for graph convolution when the graph is not known in advance as it is the very output of the generator. |
292 | ACCELERATING NONCONVEX LEARNING VIA REPLICA EXCHANGE LANGEVIN DIFFUSION | Yi Chen, Jinglin Chen, Jing Dong, Jian Peng, Zhaoran Wang | To attain the advantages of both regimes, we propose to use replica exchange, which swaps between two Langevin diffusions with different temperatures. |
293 | Dynamically Unfolding Recurrent Restorer: A Moving Endpoint Control Method for Image Restoration | Xiaoshuai Zhang, Yiping Lu, Jiaying Liu, Bin Dong | In this paper, we propose a new control framework called the moving endpoint control to restore images corrupted by different degradation levels in one model. |
294 | Bias-Reduced Uncertainty Estimation for Deep Neural Classifiers | Yonatan Geifman, Guy Uziel, Ran El-Yaniv | We consider the problem of uncertainty estimation in the context of (non-Bayesian) deep neural classification. |
295 | CAMOU: Learning Physical Vehicle Camouflages to Adversarially Attack Detectors in the Wild | Yang Zhang, Hassan Foroosh, Philip David, Boqing Gong | In this paper, we conduct an intriguing experimental study about the physical adversarial attack on object detectors in the wild. |
296 | Learning Latent Superstructures in Variational Autoencoders for Deep Multidimensional Clustering | Xiaopeng Li, Zhourong Chen, Leonard K. M. Poon, Nevin L. Zhang | We investigate a variant of variational autoencoders where there is a superstructure of discrete latent variables on top of the latent features. |
297 | Learning Programmatically Structured Representations with Perceptor Gradients | Svetlin Penkov, Subramanian Ramamoorthy | We present the perceptor gradients algorithm — a novel approach to learning symbolic representations based on the idea of decomposing an agent’s policy into i) a perceptor network extracting symbols from raw observation data and ii) a task encoding program which maps the input symbols to output actions. |
298 | Variational Autoencoders with Jointly Optimized Latent Dependency Structure | Jiawei He, Yu Gong, Joseph Marino, Greg Mori, Andreas Lehrmann | We propose a method for learning the dependency structure between latent variables in deep latent variable models. |
299 | The Unusual Effectiveness of Averaging in GAN Training | Yasin Yazıcı, Chuan-Sheng Foo, Stefan Winkler, Kim-Hui Yap, Georgios Piliouras, Vijay Chandrasekhar | We examine two different techniques for parameter averaging in GAN training. |
300 | Beyond Pixel Norm-Balls: Parametric Adversaries using an Analytically Differentiable Renderer | Hsueh-Ti Derek Liu, Michael Tao, Chun-Liang Li, Derek Nowrouzezahrai, Alec Jacobson | As such, we propose a novel evaluation measure, parametric norm-balls, by directly perturbing physical parameters that underly image formation. |
301 | Diversity is All You Need: Learning Skills without a Reward Function | Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, Sergey Levine | Intelligent creatures can explore their environments and learn useful skills without supervision. In this paper, we propose “Diversity is All You Need” (DIAYN), a method for learning useful skills without a reward function. |
302 | Supervised Policy Update for Deep Reinforcement Learning | Quan Vuong, Yiming Zhang, Keith W. Ross | We propose a new sample-efficient methodology, called Supervised Policy Update (SPU), for deep reinforcement learning. |
303 | Learning sparse relational transition models | Victoria Xia, Zi Wang, Kelsey Allen, Tom Silver, Leslie Pack Kaelbling | We present a representation for describing transition models in complex uncertain domains using relational rules. |
304 | Learning to Schedule Communication in Multi-agent Reinforcement Learning | Daewoo Kim, Sangwoo Moon, David Hostallero, Wan Ju Kang, Taeyoung Lee, Kyunghwan Son, Yung Yi | In this paper, we study a practical scenario when (i) the communication bandwidth is limited and (ii) the agents share the communication medium so that only a restricted number of agents are able to simultaneously use the medium, as in the state-of-the-art wireless networking standards. |
305 | Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies | Kenneth Marino, Abhinav Gupta, Rob Fergus, Arthur Szlam | In this paper we introduce a simple, robust approach to hierarchically training an agent in the setting of sparse reward tasks. The agent is split into a low-level and a high-level policy. |
306 | Multi-class classification without multi-class labels | Yen-Chang Hsu, Zhaoyang Lv, Joel Schlosser, Phillip Odom, Zsolt Kira | We formulate this approach, present a probabilistic graphical model for it, and derive a surprisingly simple loss function that can be used to learn neural network-based models. |
307 | What do you learn from context? Probing for sentence structure in contextualized word representations | Ian Tenney, Patrick Xia, Berlin Chen, Alex Wang, Adam Poliak, R Thomas McCoy, Najoung Kim, Benjamin Van Durme, Samuel R. Bowman, Dipanjan Das, Ellie Pavlick | Building on recent token-level probing work, we introduce a novel edge probing task design and construct a broad suite of sub-sentence tasks derived from the traditional structured NLP pipeline. |
308 | Spectral Inference Networks: Unifying Deep and Spectral Learning | David Pfau, Stig Petersen, Ashish Agarwal, David G. T. Barrett, Kimberly L. Stachenfeld | We present Spectral Inference Networks, a framework for learning eigenfunctions of linear operators by stochastic optimization. |
309 | PeerNets: Exploiting Peer Wisdom Against Adversarial Attacks | Jan Svoboda, Jonathan Masci, Federico Monti, Michael Bronstein, Leonidas Guibas | Unfortunately, it has been shown that such systems are vulnerable to adversarial attacks, making them prone to potential unlawful uses. Designing deep neural networks that are robust to adversarial attacks is a fundamental step in making such systems safer and deployable in a broader variety of applications (e.g. autonomous driving), but more importantly is a necessary step to design novel and more advanced architectures built on new computational paradigms rather than marginally building on the existing ones. In this paper we introduce PeerNets, a novel family of convolutional networks alternating classical Euclidean convolutions with graph convolutions to harness information from a graph of peer samples. |
310 | Attentive Neural Processes | Hyunjik Kim, Andriy Mnih, Jonathan Schwarz, Marta Garnelo, Ali Eslami, Dan Rosenbaum, Oriol Vinyals, Yee Whye Teh | We address this issue by incorporating attention into NPs, allowing each input location to attend to the relevant context points for the prediction. |
311 | Representation Degeneration Problem in Training Natural Language Generation Models | Jun Gao, Di He, Xu Tan, Tao Qin, Liwei Wang, Tieyan Liu | We analyze the conditions and causes of this problem and propose a novel regularization method to address it. |
312 | Hierarchical interpretations for neural network predictions | Chandan Singh, W. James Murdoch, Bin Yu | To ameliorate this problem, we introduce the use of hierarchical interpretations to explain DNN predictions through our proposed method: agglomerative contextual decomposition (ACD). |
313 | Spreading vectors for similarity search | Alexandre Sablayrolles, Matthijs Douze, Cordelia Schmid, Hervé Jégou | In this work, we propose to reverse this paradigm and adapt the data to the quantizer: we train a neural net whose last layers form a fixed parameter-free quantizer, such as pre-defined points of a sphere. |
314 | A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks | Sanjeev Arora, Nadav Cohen, Noah Golowich, Wei Hu | We analyze speed of convergence to global optimum for gradient descent training a deep linear neural network by minimizing the L2 loss over whitened data. |
315 | Feed-forward Propagation in Probabilistic Neural Networks with Categorical and Max Layers | Alexander Shekhovtsov, Boris Flach | Probabilistic Neural Networks deal with various sources of stochasticity: input noise, dropout, stochastic neurons, parameter uncertainties modeled as random variables, etc. In this paper we revisit a feed-forward propagation approach that allows one to estimate for each neuron its mean and variance w.r.t. all mentioned sources of stochasticity. |
316 | Measuring and regularizing networks in function space | Ari Benjamin, David Rolnick, Konrad Kording | Here, we show that it is simple and computationally feasible to calculate distances between functions in a $L^2$ Hilbert space. |
317 | Fluctuation-dissipation relations for stochastic gradient descent | Sho Yaida | Here, we derive stationary fluctuation-dissipation relations that link measurable quantities and hyperparameters in the stochastic gradient descent algorithm. |
318 | Poincare Glove: Hyperbolic Word Embeddings | Alexandru Tifrea*, Gary Becigneul*, Octavian-Eugen Ganea* | In this paper, justified by the notion of delta-hyperbolicity or tree-likeliness of a space, we propose to embed words in a Cartesian product of hyperbolic spaces which we theoretically connect to the Gaussian word embeddings and their Fisher geometry. |
319 | Episodic Curiosity through Reachability | Nikolay Savinov, Anton Raichuk, Damien Vincent, Raphael Marinier, Marc Pollefeys, Timothy Lillicrap, Sylvain Gelly | We propose a new curiosity method which uses episodic memory to form the novelty bonus. |
320 | Phase-Aware Speech Enhancement with Deep Complex U-Net | Hyeong-Seok Choi, Jang-Hyun Kim, Jaesung Huh, Adrian Kim, Jung-Woo Ha, Kyogu Lee | To improve speech enhancement performance, we tackle the phase estimation problem in three ways. |
321 | Generative predecessor models for sample-efficient imitation learning | Yannick Schroecker, Mel Vecerik, Jon Scholz | We propose Generative Predecessor Models for Imitation Learning (GPRIL), a novel imitation learning algorithm that matches the state-action distribution to the distribution observed in expert demonstrations, using generative models to reason probabilistically about alternative histories of demonstrated states. |
322 | Adaptive Estimators Show Information Compression in Deep Neural Networks | Ivan Chelombiev, Conor Houghton, Cian O’Donnell | In this paper we developed more robust mutual information estimation techniques, that adapt to hidden activity of neural networks and produce more sensitive measurements of activations from all functions, especially unbounded functions. |
323 | Multilingual Neural Machine Translation With Soft Decoupled Encoding | Xinyi Wang, Hieu Pham, Philip Arthur, Graham Neubig | In this paper, we propose Soft Decoupled Encoding (SDE), a multilingual lexicon encoding framework specifically designed to share lexical-level information intelligently without requiring heuristic preprocessing such as pre-segmenting the data. |
324 | Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet | Wieland Brendel, Matthias Bethge | We here introduce a high-performance DNN architecture on ImageNet whose decisions are considerably easier to explain. |
325 | Reward Constrained Policy Optimization | Chen Tessler, Daniel J. Mankowitz, Shie Mannor | In this work we present a novel multi-timescale approach for constrained policy optimization, called ‘Reward Constrained Policy Optimization’ (RCPO), which uses an alternative penalty signal to guide the policy towards a constraint satisfying one. |
326 | On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length | Stanislaw Jastrzebski, Zachary Kenton, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey | In this paper we extend previous work by investigating the curvature of the loss surface along the whole training trajectory, rather than only at the endpoint. |
327 | Modeling the Long Term Future in Model-Based Reinforcement Learning | Nan Rosemary Ke, Amanpreet Singh, Ahmed Touati, Anirudh Goyal, Yoshua Bengio, Devi Parikh, Dhruv Batra | To this end, we build a latent-variable autoregressive model by leveraging recent ideas in variational inference. |
328 | Understanding Straight-Through Estimator in Training Activation Quantized Neural Nets | Penghang Yin, Jiancheng Lyu, Shuai Zhang, Stanley Osher, Yingyong Qi, Jack Xin | In this paper, we provide the theoretical justification of the concept of STE by answering this question. |
329 | DISTRIBUTIONAL CONCAVITY REGULARIZATION FOR GANS | Shoichiro Yamaguchi, Masanori Koyama | We propose Distributional Concavity (DC) regularization for Generative Adversarial Networks (GANs), a functional gradient-based method that promotes the entropy of the generator distribution and works against mode collapse. Our DC regularization is an easy-to-implement method that can be used in combination with the current state of the art methods like Spectral Normalization and Wasserstein GAN with gradient penalty to further improve the performance. We will not only show that our DC regularization can achieve highly competitive results on ILSVRC2012 and CIFAR datasets in terms of Inception score and Fréchet inception distance, but also provide a mathematical guarantee that our method can always increase the entropy of the generator distribution. |
330 | LeMoNADe: Learned Motif and Neuronal Assembly Detection in calcium imaging videos | Elke Kirschbaum, Manuel Haußmann, Steffen Wolf, Hannah Sonntag, Justus Schneider, Shehabeldin Elzoheiry, Oliver Kann, Daniel Durstewitz, Fred A Hamprecht | We here propose LeMoNADe, a new exploratory data analysis method that facilitates hunting for motifs in calcium imaging videos, the dominant microscopic functional imaging modality in neurophysiology. |
331 | Competitive experience replay | Hao Liu, Alexander Trott, Richard Socher, Caiming Xiong | We propose a novel method called competitive experience replay, which efficiently supplements a sparse reward by placing learning in the context of an exploration competition between a pair of agents. |
332 | Multi-Domain Adversarial Learning | Alice Schoenauer-Sebag, Louise Heinrich, Marc Schoenauer, Michele Sebag, Lani F. Wu, Steve J. Altschuler | This paper presents a multi-domain adversarial learning approach, MuLANN, to leverage multiple datasets with overlapping but distinct class sets, in a semi-supervised setting. |
333 | ProMP: Proximal Meta-Policy Search | Jonas Rothfuss, Dennis Lee, Ignasi Clavera, Tamim Asfour, Pieter Abbeel | Building on the gained insights we develop a novel meta-learning algorithm that overcomes both the issue of poor credit assignment and previous difficulties in estimating meta-policy gradients. |
334 | Don’t Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors | Vitalii Zhelezniak, Aleksandar Savkov, April Shen, Francesco Moramarco, Jack Flann, Nils Y. Hammerla | We propose a novel fuzzy bag-of-words (FBoW) representation for text that contains all the words in the vocabulary simultaneously but with different degrees of membership, which are derived from similarities between word vectors. |
335 | Stable Opponent Shaping in Differentiable Games | Alistair Letcher, Jakob Foerster, David Balduzzi, Tim Rocktäschel, Shimon Whiteson | In this paper we present Stable Opponent Shaping (SOS), a new method that interpolates between LOLA and a stable variant named LookAhead. |
336 | A Mean Field Theory of Batch Normalization | Greg Yang, Jeffrey Pennington, Vinay Rao, Jascha Sohl-Dickstein, Samuel S. Schoenholz | We develop a mean field theory for batch normalization in fully-connected feedforward neural networks. |
337 | Learning Exploration Policies for Navigation | Tao Chen, Saurabh Gupta, Abhinav Gupta | In this work, we study how agents can autonomously explore realistic and complex 3D environments without the context of task-rewards. |
338 | Distribution-Interpolation Trade off in Generative Models | Damian Lesniak, Igor Sieradzki, Igor Podolak | We investigate the properties of multidimensional probability distributions in the context of latent space prior distributions of implicit generative models. |
339 | Learning to Describe Scenes with Programs | Yunchao Liu, Zheng Wu, Daniel Ritchie, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu | In this paper, we present scene programs, representing a scene via a symbolic program for its objects, attributes, and their relations. |
340 | Visceral Machines: Risk-Aversion in Reinforcement Learning with Intrinsic Physiological Rewards | Daniel McDuff, Ashish Kapoor | We present a novel approach to reinforcement learning that leverages a task-independent intrinsic reward function trained on peripheral pulse measurements that are correlated with human autonomic nervous system responses. |
341 | Deep Frank-Wolfe For Neural Network Optimization | Leonard Berrada, Andrew Zisserman, M. Pawan Kumar | We present an optimization method that offers empirically the best of both worlds: our algorithm yields good generalization performance while requiring only one hyper-parameter. |
342 | LEARNING TO PROPAGATE LABELS: TRANSDUCTIVE PROPAGATION NETWORK FOR FEW-SHOT LEARNING | Yanbin Liu, Juho Lee, Minseop Park, Saehoon Kim, Eunho Yang, Sung Ju Hwang, Yi Yang | In this paper, we propose Transductive Propagation Network (TPN), a novel meta-learning framework for transductive inference that classifies the entire test set at once to alleviate the low-data problem. |
343 | Improving the Generalization of Adversarial Training with Domain Adaptation | Chuanbiao Song, Kun He, Liwei Wang, John E. Hopcroft | To alleviate this problem, we propose a novel Adversarial Training with Domain Adaptation (ATDA) method. |
344 | Dimensionality Reduction for Representing the Knowledge of Probabilistic Models | Marc T Law, Jake Snell, Amir-massoud Farahmand, Raquel Urtasun, Richard S Zemel | We propose a simple, intuitive and scalable dimension reduction framework that takes into account the soft probabilistic interpretation of standard deep models for classification. |
345 | Learning protein sequence embeddings using information from structure | Tristan Bepler, Bonnie Berger | We introduce a framework that maps any protein sequence to a sequence of vector embeddings — one per amino acid position — that encode structural information. |
346 | Variational Smoothing in Recurrent Neural Network Language Models | Lingpeng Kong, Gabor Melis, Wang Ling, Lei Yu, Dani Yogatama | We present a new theoretical perspective of data noising in recurrent neural network language models (Xie et al., 2017). |
347 | Biologically-Plausible Learning Algorithms Can Scale to Large Datasets | Will Xiao, Honglin Chen, Qianli Liao, Tomaso Poggio | To address this “weight transport problem” (Grossberg, 1987), two biologically-plausible algorithms, proposed by Liao et al. (2016) and Lillicrap et al. (2016), relax BP’s weight symmetry requirements and demonstrate comparable learning capabilities to that of BP on small datasets. |
348 | Coarse-grain Fine-grain Coattention Network for Multi-evidence Question Answering | Victor Zhong, Caiming Xiong, Nitish Shirish Keskar, Richard Socher | In this work, we propose the Coarse-grain Fine-grain Coattention Network (CFC), a new question answering model that combines information from evidence across multiple documents. |
349 | Learning a Meta-Solver for Syntax-Guided Program Synthesis | Xujie Si, Yuan Yang, Hanjun Dai, Mayur Naik, Le Song | To address these challenges, we propose a meta-learning framework that learns a transferable policy from only weak supervision. |
350 | Towards Robust, Locally Linear Deep Networks | Guang-He Lee, David Alvarez-Melis, Tommi S. Jaakkola | In this paper, we propose a new learning problem to encourage deep networks to have stable derivatives over larger regions. |
351 | How Important is a Neuron | Kedar Dhamdhere, Mukund Sundararajan, Qiqi Yan | We introduce the notion of conductance to extend the notion of attribution to understanding the importance of hidden units. Informally, the conductance of a hidden unit of a deep network is the flow of attribution via this hidden unit. |
352 | Learning to Make Analogies by Contrasting Abstract Relational Structure | Felix Hill, Adam Santoro, David Barrett, Ari Morcos, Timothy Lillicrap | Here, we study how analogical reasoning can be induced in neural networks that learn to perceive and reason about raw visual data. |
353 | Learning what you can do before doing anything | Oleh Rybkin, Karl Pertsch, Konstantinos G. Derpanis, Kostas Daniilidis, Andrew Jaegle | In this work, we address the problem of learning an agent’s action space purely from visual observation. |
354 | Learning Grid Cells as Vector Representation of Self-Position Coupled with Matrix Representation of Self-Motion | Ruiqi Gao, Jianwen Xie, Song-Chun Zhu, Ying Nian Wu | This paper proposes a representational model for grid cells. |
355 | Universal Stagewise Learning for Non-Convex Problems with Convergence on Averaged Solutions | Zaiyi Chen, Zhuoning Yuan, Jinfeng Yi, Bowen Zhou, Enhong Chen, Tianbao Yang | We propose a universal stagewise optimization framework for a broad family of non-smooth non-convex problems with the following key features: (i) at each stage any suitable stochastic convex optimization algorithms (e.g., SGD or AdaGrad) that return an averaged solution can be employed for minimizing a regularized convex problem; (ii) the step size is decreased in a stagewise manner; (iii) an averaged solution is returned as the final solution. |
356 | Invariant and Equivariant Graph Networks | Haggai Maron, Heli Ben-Hamu, Nadav Shamir, Yaron Lipman | A basic challenge in developing such networks is finding the maximal collection of invariant and equivariant linear layers. |
357 | Robustness May Be at Odds with Accuracy | Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, Aleksander Madry | We show that there exists an inherent tension between the goal of adversarial robustness and that of standard generalization. Specifically, training robust models may not only be more resource-consuming, but also lead to a reduction of standard accuracy. |
358 | Feature Intertwiner for Object Detection | Hongyang Li, Bo Dai, Shaoshuai Shi, Wanli Ouyang, Xiaogang Wang | In this paper, we address this problem via a new perspective. |
359 | Adversarial Reprogramming of Neural Networks | Gamaleldin F. Elsayed, Ian Goodfellow, Jascha Sohl-Dickstein | We introduce attacks that instead reprogram the target model to perform a task chosen by the attacker without the attacker needing to specify or compute the desired output for each test-time input. |
360 | G-SGD: Optimizing ReLU Neural Networks in its Positively Scale-Invariant Space | Qi Meng, Shuxin Zheng, Huishuai Zhang, Wei Chen, Qiwei Ye, Zhi-Ming Ma, Nenghai Yu, Tie-Yan Liu | In this paper, we provide our positive answer to this question. |
361 | From Hard to Soft: Understanding Deep Network Nonlinearities via Vector Quantization and Statistical Inference | Randall Balestriero, Richard Baraniuk | This paper extends the MASO framework to these and an infinitely large class of new nonlinearities by linking deterministic MASOs with probabilistic Gaussian Mixture Models (GMMs). |
362 | Aggregated Momentum: Stability Through Passive Damping | James Lucas, Shengyang Sun, Richard Zemel, Roger Grosse | We propose Aggregated Momentum (AggMo), a variant of momentum which combines multiple velocity vectors with different damping coefficients. |
363 | Variational Autoencoder with Arbitrary Conditioning | Oleg Ivanov, Michael Figurnov, Dmitry Vetrov | We propose a single neural probabilistic model based on variational autoencoder that can be conditioned on an arbitrary subset of observed features and then sample the remaining features in “one shot”. |
364 | Time-Agnostic Prediction: Predicting Predictable Video Frames | Dinesh Jayaraman, Frederik Ebert, Alexei Efros, Sergey Levine | We evaluate our approach for future and intermediate frame prediction across three robotic manipulation tasks. |
365 | A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation | Akhilesh Gotmare, Nitish Shirish Keskar, Caiming Xiong, Richard Socher | Instead, we revisit the empirical analysis of heuristics through the lens of recently proposed methods for loss surface and representation analysis, viz. mode connectivity and canonical correlation analysis (CCA), and hypothesize reasons why the heuristics succeed. |
366 | Self-Monitoring Navigation Agent via Auxiliary Progress Estimation | Chih-Yao Ma, Jiasen Lu, Zuxuan Wu, Ghassan AlRegib, Zsolt Kira, Richard Socher, Caiming Xiong | In this paper, we introduce a self-monitoring agent with two complementary components: (1) visual-textual co-grounding module to locate the instruction completed in the past, the instruction required for the next action, and the next moving direction from surrounding images and (2) progress monitor to ensure the grounded instruction correctly reflects the navigation progress. |
367 | Kernel Change-point Detection with Auxiliary Deep Generative Models | Wei-Cheng Chang, Chun-Liang Li, Yiming Yang, Barnabás Póczos | In this paper, we propose KL-CPD, a novel kernel learning framework for time series CPD that optimizes a lower bound of test power via an auxiliary generative model. |
368 | Unsupervised Learning via Meta-Learning | Kyle Hsu, Sergey Levine, Chelsea Finn | Many prior unsupervised learning works aim to do so by developing proxy objectives based on reconstruction, disentanglement, prediction, and other metrics. |
369 | Auxiliary Variational MCMC | Raza Habib, David Barber | We introduce Auxiliary Variational MCMC, a novel framework for learning MCMC kernels that combines recent advances in variational inference with insights drawn from traditional auxiliary variable MCMC methods such as Hamiltonian Monte Carlo. |
370 | Neural network gradient-based learning of black-box function interfaces | Alon Jacovi, Guy Hadash, Einat Kermany, Boaz Carmeli, Ofer Lavi, George Kour, Jonathan Berant | We propose a method for end-to-end training of a base neural network that integrates calls to existing black-box functions. |
371 | Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions | Matthew Mackay, Paul Vicol, Jonathan Lorraine, David Duvenaud, Roger Grosse | We aim to adapt regularization hyperparameters for neural networks by fitting compact approximations to the best-response function, which maps hyperparameters to optimal weights and biases. |
372 | Unsupervised Control Through Non-Parametric Discriminative Rewards | David Warde-Farley, Tom Van de Wiele, Tejas Kulkarni, Catalin Ionescu, Steven Hansen, Volodymyr Mnih | We present an unsupervised learning algorithm to train agents to achieve perceptually-specified goals using only a stream of observations and actions. |
373 | Interpolation-Prediction Networks for Irregularly Sampled Time Series | Satya Narayan Shukla, Benjamin Marlin | In this paper, we present a new deep learning architecture for addressing the problem of supervised learning with sparse and irregularly sampled multivariate time series. |
374 | Riemannian Adaptive Optimization Methods | Gary Becigneul, Octavian-Eugen Ganea | Several first order stochastic optimization methods commonly used in the Euclidean domain such as stochastic gradient descent (SGD), accelerated gradient descent or variance reduced methods have already been adapted to certain Riemannian settings. |
375 | Minimal Random Code Learning: Getting Bits Back from Compressed Model Parameters | Marton Havasi, Robert Peharz, José Miguel Hernández-Lobato | A typical approach is to train a set of deterministic weights, while applying certain techniques such as pruning and quantization, in order that the empirical weight distribution becomes amenable to Shannon-style coding schemes. |
376 | Characterizing Audio Adversarial Examples Using Temporal Dependency | Zhuolin Yang, Bo Li, Pin-Yu Chen, Dawn Song | Nonetheless, as unique data properties have inspired distinct and powerful learning principles, this paper aims to explore their potentials towards mitigating adversarial inputs. |
377 | Equi-normalization of Neural Networks | Pierre Stock, Benjamin Graham, Rémi Gribonval, Hervé Jégou | Inspired by the Sinkhorn-Knopp algorithm, we introduce a fast iterative method for minimizing the l2 norm of the weights, equivalently the weight decay regularizer. |
378 | Generalized Tensor Models for Recurrent Neural Networks | Valentin Khrulkov, Oleksii Hrinchuk, Ivan Oseledets | In this work, we attempt to reduce the gap between theory and practice by extending the theoretical analysis to RNNs which employ various nonlinearities, such as Rectified Linear Unit (ReLU), and show that they also benefit from properties of universality and depth efficiency. |
379 | Wizard of Wikipedia: Knowledge-Powered Conversational Agents | Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, Jason Weston | To that end we collect and release a large dataset with conversations directly grounded with knowledge retrieved from Wikipedia. |
380 | Are adversarial examples inevitable? | Ali Shafahi, W. Ronny Huang, Christoph Studer, Soheil Feizi, Tom Goldstein | Given the lack of success at generating robust defenses, we are led to ask a fundamental question: Are adversarial attacks inevitable? This paper analyzes adversarial examples from a theoretical perspective, and identifies fundamental bounds on the susceptibility of a classifier to adversarial attacks. |
381 | A Variational Inequality Perspective on Generative Adversarial Networks | Gauthier Gidel, Hugo Berard, Gaëtan Vignoud, Pascal Vincent, Simon Lacoste-Julien | In this work, we cast GAN optimization problems in the general variational inequality framework. |
382 | Learning-Based Frequency Estimation Algorithms | Chen-Yu Hsu, Piotr Indyk, Dina Katabi, Ali Vakilian | We propose a new class of algorithms that automatically learn relevant patterns in the input data and use them to improve its frequency estimates. |
383 | From Language to Goals: Inverse Reinforcement Learning for Vision-Based Instruction Following | Justin Fu, Anoop Korattikara, Sergey Levine, Sergio Guadarrama | In this work, we investigate the problem of grounding language commands as reward functions using inverse reinforcement learning, and argue that language-conditioned rewards are more transferable than language-conditioned policies to new environments. |
384 | Backpropamine: training self-modifying neural networks with differentiable neuromodulated plasticity | Thomas Miconi, Aditya Rawal, Jeff Clune, Kenneth O. Stanley | Extending previous work on differentiable Hebbian plasticity, we propose a differentiable formulation for the neuromodulation of plasticity. |
385 | Recurrent Experience Replay in Distributed Reinforcement Learning | Steven Kapturowski, Georg Ostrovski, John Quan, Remi Munos, Will Dabney | Building on the recent successes of distributed training of RL agents, in this paper we investigate the training of RNN-based RL agents from distributed prioritized experience replay. |
386 | A Generative Model For Electron Paths | John Bradshaw, Matt J. Kusner, Brooks Paige, Marwin H. S. Segler, José Miguel Hernández-Lobato | We propose an electron path prediction model (ELECTRO) to learn these sequences directly from raw reaction data. |
387 | Modeling Uncertainty with Hedged Instance Embeddings | Seong Joon Oh, Kevin P. Murphy, Jiyan Pan, Joseph Roth, Florian Schroff, Andrew C. Gallagher | We introduce the hedged instance embedding (HIB) in which embeddings are modeled as random variables and the model is trained under the variational information bottleneck principle (Alemi et al., 2016; Achille & Soatto, 2018). |
388 | Beyond Greedy Ranking: Slate Optimization via List-CVAE | Ray Jiang, Sven Gowal, Yuqiu Qian, Timothy Mann, Danilo J. Rezende | In this paper, we introduce List Conditional Variational Auto-Encoders (ListCVAE), which learn the joint distribution of documents on the slate conditioned on user responses, and directly generate full slates. |
389 | Stochastic Prediction of Multi-Agent Interactions from Partial Observations | Chen Sun, Per Karlsson, Jiajun Wu, Joshua B Tenenbaum, Kevin Murphy | We present a method which learns to integrate temporal information, from a learned dynamics model, with ambiguous visual information, from a learned vision model, in the context of interacting agents. |
390 | GamePad: A Learning Environment for Theorem Proving | Daniel Huang, Prafulla Dhariwal, Dawn Song, Ilya Sutskever | In this paper, we introduce a system called GamePad that can be used to explore the application of machine learning methods to theorem proving in the Coq proof assistant. |
391 | GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding | Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman | In pursuit of this objective, we introduce the General Language Understanding Evaluation (GLUE) benchmark, a collection of tools for evaluating the performance of models across a diverse set of existing NLU tasks. |
392 | On Computation and Generalization of Generative Adversarial Networks under Spectrum Control | Haoming Jiang, Zhehui Chen, Minshuo Chen, Feng Liu, Dingding Wang, Tuo Zhao | Motivated by their discovery, we propose a new framework for training GANs, which allows more flexible spectrum control (e.g., making the weight matrices of the discriminator have slow singular value decays). |
393 | Large-Scale Study of Curiosity-Driven Learning | Yuri Burda, Harri Edwards, Deepak Pathak, Amos Storkey, Trevor Darrell, Alexei A. Efros | In this paper: (a) We perform the first large-scale study of purely curiosity-driven learning, i.e. without any extrinsic rewards, across $54$ standard benchmark environments, including the Atari game suite. |
394 | Unsupervised Discovery of Parts, Structure, and Dynamics | Zhenjia Xu, Zhijian Liu, Chen Sun, Kevin Murphy, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu | In this paper, we propose a novel formulation that simultaneously learns a hierarchical, disentangled object representation and a dynamics model for object parts from unlabeled videos. |
395 | Music Transformer: Generating Music with Long-Term Structure | Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Ian Simon, Curtis Hawthorne, Noam Shazeer, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, Douglas Eck | We propose an algorithm that reduces the intermediate memory requirements to linear in the sequence length. |
396 | BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning | Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Salem Lahlou, Lucas Willems, Chitwan Saharia, Thien Huu Nguyen, Yoshua Bengio | We introduce the BabyAI research platform, with the goal of supporting investigations towards including humans in the loop for grounded language learning. |
397 | Analyzing Inverse Problems with Invertible Neural Networks | Lynton Ardizzone, Jakob Kruse, Carsten Rother, Ullrich Köthe | We prove theoretically and verify experimentally, on artificial data and real-world problems from medicine and astrophysics, that INNs are a powerful analysis tool to find multi-modalities in parameter space, uncover parameter correlations, and identify unrecoverable parameters. |
398 | RelGAN: Relational Generative Adversarial Networks for Text Generation | Weili Nie, Nina Narodytska, Ankit Patel | In this work, we propose RelGAN, a new GAN architecture for text generation, consisting of three main components: a relational memory based generator for the long-distance dependency modeling, the Gumbel-Softmax relaxation for training GANs on discrete data, and multiple embedded representations in the discriminator to provide a more informative signal for the generator updates. |
399 | The Singular Values of Convolutional Layers | Hanie Sedghi, Vineet Gupta, Philip M. Long | We characterize the singular values of the linear transformation associated with a standard 2D multi-channel convolutional layer, enabling their efficient computation. |
400 | An Empirical study of Binary Neural Networks’ Optimisation | Milad Alizadeh, Javier Fernández-Marqués, Nicholas D. Lane, Yarin Gal | In this work, we empirically identify and study the effectiveness of the various ad-hoc techniques commonly used in the literature, providing best-practices for efficient training of binary models. |
401 | Approximability of Discriminators Implies Diversity in GANs | Yu Bai, Tengyu Ma, Andrej Risteski | The theoretical work of Arora et al. (2017a) suggests a dilemma about GANs’ statistical properties: powerful discriminators cause overfitting, whereas weak discriminators cannot detect mode collapse. By contrast, we show in this paper that GANs can in principle learn distributions in Wasserstein distance (or KL-divergence in many cases) with polynomial sample complexity, if the discriminator class has strong distinguishing power against the particular generator class (instead of against all possible generators). |
402 | Learning Embeddings into Entropic Wasserstein Spaces | Charlie Frogner, Farzaneh Mirzazadeh, Justin Solomon | We propose to exploit this flexibility by learning an embedding that captures the semantic information in the Wasserstein distance between embedded distributions. |
403 | DeepOBS: A Deep Learning Optimizer Benchmark Suite | Frank Schneider, Lukas Balles, Philipp Hennig | As the primary contribution, we present DeepOBS, a Python package of deep learning optimization benchmarks. |
404 | InfoBot: Transfer and Exploration via the Information Bottleneck | Anirudh Goyal, Riashat Islam, DJ Strouse, Zafarali Ahmed, Hugo Larochelle, Matthew Botvinick, Yoshua Bengio, Sergey Levine | We propose to learn about decision states from prior experience. |
405 | The Comparative Power of ReLU Networks and Polynomial Kernels in the Presence of Sparse Latent Structure | Frederic Koehler, Andrej Risteski | We give an almost-tight theoretical analysis of the performance of both neural networks and polynomials for this problem, as well as verify our theory with simulations. |
406 | Learning Implicitly Recurrent CNNs Through Parameter Sharing | Pedro Savarese, Michael Maire | We introduce a parameter sharing scheme, in which different layers of a convolutional neural network (CNN) are defined by a learned linear combination of parameter tensors from a global bank of templates. |
407 | Learning Particle Dynamics for Manipulating Rigid Bodies, Deformable Objects, and Fluids | Yunzhu Li, Jiajun Wu, Russ Tedrake, Joshua B. Tenenbaum, Antonio Torralba | In this paper, we propose to learn a particle-based simulator for complex control tasks. |
408 | Regularized Learning for Domain Adaptation under Label Shifts | Kamyar Azizzadenesheli, Anqi Liu, Fanny Yang, Animashree Anandkumar | We propose Regularized Learning under Label shifts (RLLS), a principled and a practical domain-adaptation algorithm to correct for shifts in the label distribution between a source and a target domain. |
409 | Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs | Sachin Kumar, Yulia Tsvetkov | We propose a general technique for replacing the softmax layer with a continuous embedding layer. |
410 | Relational Forward Models for Multi-Agent Learning | Andrea Tacchetti, H. Francis Song, Pedro A. M. Mediano, Vinicius Zambaldi, János Kramár, Neil C. Rabinowitz, Thore Graepel, Matthew Botvinick, Peter W. Battaglia | Here we introduce Relational Forward Models (RFM) for multi-agent learning, networks that can learn to make accurate predictions of agents’ future behavior in multi-agent environments. |
411 | Imposing Category Trees Onto Word-Embeddings Using A Geometric Construction | Tiansi Dong, Christian Bauckhage, Hailong Jin, Juanzi Li, Olaf Cremers, Daniel Speicher, Armin B. Cremers, Joerg Zimmermann | We present a novel method to precisely impose tree-structured category information onto word-embeddings, resulting in ball embeddings in higher dimensional spaces (N-balls for short). |
412 | Two-Timescale Networks for Nonlinear Value Function Approximation | Wesley Chung, Somjit Nath, Ajin Joseph, Martha White | In this work, we provide a two-timescale network (TTN) architecture that enables linear methods to be used to learn values, with a nonlinear representation learned at a slower timescale. |
413 | Diversity-Sensitive Conditional Generative Adversarial Networks | Dingdong Yang, Seunghoon Hong, Yunseok Jang, Tianchen Zhao, Honglak Lee | We propose a simple yet highly effective method that addresses the mode-collapse problem in the Conditional Generative Adversarial Network (cGAN). |
414 | Query-Efficient Hard-label Black-box Attack: An Optimization-based Approach | Minhao Cheng, Thong Le, Pin-Yu Chen, Huan Zhang, JinFeng Yi, Cho-Jui Hsieh | We study the problem of attacking machine learning models in the hard-label black-box setting, where no model information is revealed except that the attacker can make queries to probe the corresponding hard-label decisions. |
415 | Rethinking the Value of Network Pruning | Zhuang Liu, Mingjie Sun, Tinghui Zhou, Gao Huang, Trevor Darrell | In this work, we make several surprising observations which contradict common beliefs. |
416 | Hyperbolic Attention Networks | Caglar Gulcehre, Misha Denil, Mateusz Malinowski, Ali Razavi, Razvan Pascanu, Karl Moritz Hermann, Peter Battaglia, Victor Bapst, David Raposo, Adam Santoro | Recent approaches have successfully demonstrated the benefits of learning the parameters of shallow networks in hyperbolic space. |
417 | Learning from Positive and Unlabeled Data with a Selection Bias | Masahiro Kato, Takeshi Teshima, Junya Honda | In this paper, we propose a method to partially identify the classifier. |
418 | Adv-BNN: Improved Adversarial Defense through Robust Bayesian Neural Network | Xuanqing Liu, Yao Li, Chongruo Wu, Cho-Jui Hsieh | We present a new algorithm to train a robust neural network against adversarial attacks. Our algorithm is motivated by the following two ideas. |
419 | Optimal Completion Distillation for Sequence Learning | Sara Sabour, William Chan, Mohammad Norouzi | We present Optimal Completion Distillation (OCD), a training procedure for optimizing sequence to sequence models based on edit distance. |
420 | Caveats for information bottleneck in deterministic scenarios | Artemy Kolchinsky, Brendan D. Tracey, Steven Van Kuyk | To address problem (1), we propose a functional that, unlike the IB Lagrangian, can recover the IB curve in all cases. |
421 | Deep Learning 3D Shapes Using Alt-az Anisotropic 2-Sphere Convolution | Min Liu, Fupin Yao, Chiho Choi, Ayan Sinha, Karthik Ramani | In this paper, we present a method for applying deep learning to 3D surfaces using their spherical descriptors and alt-az anisotropic convolution on 2-sphere. |
422 | Small nonlinearities in activation functions create bad local minima in neural networks | Chulhee Yun, Suvrit Sra, Ali Jadbabaie | We investigate the loss surface of neural networks. |
423 | Information Theoretic lower bounds on negative log likelihood | Luis A. Lastras-Montaño | In this article we use rate-distortion theory, a branch of information theory devoted to the problem of lossy compression, to shed light on an important problem in latent variable modeling of data: is there room to improve the model? |
424 | Preferences Implicit in the State of the World | Rohin Shah, Dmitrii Krasheninnikov, Jordan Alexander, Pieter Abbeel, Anca Dragan | This motivates our key insight: when a robot is deployed in an environment that humans act in, the state of the environment is already optimized for what humans want. |
425 | A Kernel Random Matrix-Based Approach for Sparse PCA | Mohamed El Amine Seddik, Mohamed Tamaazousti, Romain Couillet | In this paper, we present a random matrix approach to recover sparse principal components from n p-dimensional vectors. |
426 | Bayesian Prediction of Future Street Scenes using Synthetic Likelihoods | Apratim Bhattacharyya, Mario Fritz, Bernt Schiele | In this work, we propose a novel Bayesian formulation for anticipating future scene states which leverages synthetic likelihoods that encourage the learning of diverse models to accurately capture the multi-modal nature of future scene states. |
427 | There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average | Ben Athiwaratkun, Marc Finzi, Pavel Izmailov, Andrew Gordon Wilson | Motivated by these observations, we propose to train consistency-based methods with Stochastic Weight Averaging (SWA), a recent approach which averages weights along the trajectory of SGD with a modified learning rate schedule. |
428 | Large-Scale Answerer in Questioner’s Mind for Visual Dialog Question Generation | Sang-Woo Lee, Tong Gao, Sohee Yang, Jaejun Yoo, Jung-Woo Ha | To address this, we propose AQM+ that can deal with a large-scale problem and ask a question that is more coherent to the current context of the dialog. |
429 | Graph HyperNetworks for Neural Architecture Search | Chris Zhang, Mengye Ren, Raquel Urtasun | In this work, we propose the Graph HyperNetwork (GHN) to amortize the search cost: given an architecture, it directly generates the weights by running inference on a graph neural network. |
430 | DELTA: DEEP LEARNING TRANSFER USING FEATURE MAP WITH ATTENTION FOR CONVOLUTIONAL NETWORKS | Xingjian Li, Haoyi Xiong, Hanchao Wang, Yuxuan Rao, Liping Liu, Jun Huan | In this paper, we propose a novel regularized transfer learning framework DELTA, namely DEep Learning Transfer using Feature Map with Attention. |
431 | textTOvec: DEEP CONTEXTUALIZED NEURAL AUTOREGRESSIVE TOPIC MODELS OF LANGUAGE WITH DISTRIBUTED COMPOSITIONAL PRIOR | Pankaj Gupta, Yatin Chaudhary, Florian Buettner, Hinrich Schuetze | In this work, we incorporate language structure by combining a neural autoregressive topic model (TM) with an LSTM based language model (LSTM-LM) in a single probabilistic framework. |
432 | Amortized Bayesian Meta-Learning | Sachin Ravi, Alex Beatson | We propose a meta-learning method which efficiently amortizes hierarchical variational inference across tasks, learning a prior distribution over neural network weights so that a few steps of Bayes by Backprop will produce a good task-specific approximate posterior. |
433 | Probabilistic Recursive Reasoning for Multi-Agent Reinforcement Learning | Ying Wen, Yaodong Yang, Rui Luo, Jun Wang, Wei Pan | In this paper, we start from level-$1$ recursion and introduce a probabilistic recursive reasoning (PR2) framework for multi-agent reinforcement learning. |
434 | Learning Neural PDE Solvers with Convergence Guarantees | Jun-Ting Hsieh, Shengjia Zhao, Stephan Eismann, Lucia Mirabella, Stefano Ermon | In contrast to existing hand-crafted solutions, we propose an approach to learn a fast iterative solver tailored to a specific domain. |
435 | A new dog learns old tricks: RL finds classic optimization algorithms | Weiwei Kong, Christopher Liaw, Aranyak Mehta, D. Sivakumar | This paper introduces a novel framework for learning algorithms to solve online combinatorial optimization problems. |
436 | Deep Graph Infomax | Petar Velickovic, William Fedus, William L. Hamilton, Pietro Liò, Yoshua Bengio, R Devon Hjelm | We present Deep Graph Infomax (DGI), a general approach for learning node representations within graph-structured data in an unsupervised manner. |
437 | Theoretical Analysis of Auto Rate-Tuning by Batch Normalization | Sanjeev Arora, Zhiyuan Li, Kaifeng Lyu | It is shown that even if we fix the learning rate of scale-invariant parameters (e.g., weights of each layer with BN) to a constant (say, 0.3), gradient descent still approaches a stationary point (i.e., a solution where gradient is zero) in the rate of T^{-1/2} in T iterations, asymptotically matching the best bound for gradient descent with well-tuned learning rates. |
438 | Per-Tensor Fixed-Point Quantization of the Back-Propagation Algorithm | Charbel Sakr, Naresh Shanbhag | We describe a precision assignment methodology for neural network training in which all network parameters, i.e., activations and weights in the feedforward path, gradients and weight accumulators in the feedback path, are assigned close to minimal precision. |
439 | FUNCTIONAL VARIATIONAL BAYESIAN NEURAL NETWORKS | Shengyang Sun, Guodong Zhang, Jiaxin Shi, Roger Grosse | Based on this, we introduce a practical training objective which approximates the functional ELBO using finite measurement sets and the spectral Stein gradient estimator. |
440 | NADPEx: An on-policy temporally consistent exploration method for deep reinforcement learning | Sirui Xie, Junning Huang, Lanxin Lei, Chunxiao Liu, Zheng Ma, Wei Zhang, Liang Lin | In this work, we introduce a novel on-policy temporally consistent exploration strategy – Neural Adaptive Dropout Policy Exploration (NADPEx) – for deep reinforcement learning agents. |
441 | SPIGAN: Privileged Adversarial Learning from Simulation | Kuan-Hui Lee, German Ros, Jie Li, Adrien Gaidon | We propose a new unsupervised domain adaptation algorithm, called SPIGAN, relying on Simulator Privileged Information (PI) and Generative Adversarial Networks (GAN). |
442 | Generating Multi-Agent Trajectories using Programmatic Weak Supervision | Eric Zhan, Stephan Zheng, Yisong Yue, Long Sha, Patrick Lucey | We present a hierarchical framework that can effectively learn such sequential generative models. |
443 | Label super-resolution networks | Kolya Malkin, Caleb Robinson, Le Hou, Rachel Soobitsky, Jacob Czawlytko, Dimitris Samaras, Joel Saltz, Lucas Joppa, Nebojsa Jojic | We present a deep learning-based method for super-resolving coarse (low-resolution) labels assigned to groups of image pixels into pixel-level (high-resolution) labels, given the joint distribution between those low- and high-resolution labels. |
444 | ANYTIME MINIBATCH: EXPLOITING STRAGGLERS IN ONLINE DISTRIBUTED OPTIMIZATION | Nuwan Ferdinand, Haider Al-Lawati, Stark Draper, Matthew Nokleby | To mitigate the impact of stragglers, we propose an online distributed optimization method called Anytime Minibatch. |
445 | Sample Efficient Adaptive Text-to-Speech | Yutian Chen, Yannis Assael, Brendan Shillingford, David Budden, Scott Reed, Heiga Zen, Quan Wang, Luis C. Cobo, Andrew Trask, Ben Laurie, Caglar Gulcehre, Aäron van den Oord, Oriol Vinyals, Nando de Freitas | We present a meta-learning approach for adaptive text-to-speech (TTS) with few data. |
446 | Practical lossless compression with latent variables using bits back coding | James Townsend, Thomas Bird, David Barber | We present ‘Bits Back with ANS’ (BB-ANS), a scheme to perform lossless compression with latent variable models at a near optimal rate. |
447 | Kernel RNN Learning (KeRNL) | Christopher Roth, Ingmar Kanitscheider, Ila Fiete | We describe Kernel RNN Learning (KeRNL), a reduced-rank, temporal eligibility trace-based approximation to backpropagation through time (BPTT) for training recurrent neural networks (RNNs) that gives competitive performance to BPTT on long time-dependence tasks. |
448 | Deep, Skinny Neural Networks are not Universal Approximators | Jesse Johnson | In this paper, we examine the topological constraints that the architecture of a neural network imposes on the level sets of all the functions that it is able to approximate. |
449 | Large Scale Graph Learning From Smooth Signals | Vassilis Kalofolias, Nathanaël Perraudin | Our algorithm uses known approximate nearest neighbor techniques to reduce the number of variables, and automatically selects the correct parameters of the model, requiring a single intuitive input: the desired edge density. |
450 | Overcoming Catastrophic Forgetting for Continual Learning via Model Adaptation | Wenpeng Hu, Zhou Lin, Bing Liu, Chongyang Tao, Zhengwei Tao, Jinwen Ma, Dongyan Zhao, Rui Yan | In this paper, we propose a very different approach, called Parameter Generation and Model Adaptation (PGMA), to dealing with the problem. |
451 | Analysis of Quantized Models | Lu Hou, Ruiliang Zhang, James T. Kwok | In recent years, many weight-quantized models have been proposed. |
452 | Deep learning generalizes because the parameter-function map is biased towards simple functions | Guillermo Valle-Perez, Chico Q. Camargo, Ard A. Louis | In this paper, we provide a new explanation. |
453 | Learning when to Communicate at Scale in Multiagent Cooperative and Competitive Tasks | Amanpreet Singh, Tushar Jain, Sainbayar Sukhbaatar | In this paper, we present Individualized Controlled Continuous Communication Model (IC3Net) which has better training efficiency than simple continuous communication model, and can be applied to semi-cooperative and competitive settings along with the cooperative settings. |
454 | Synthetic Datasets for Neural Program Synthesis | Richard Shin, Neel Kant, Kavi Gupta, Chris Bender, Brandon Trabucco, Rishabh Singh, Dawn Song | The goal of program synthesis is to automatically generate programs in a particular language from corresponding specifications, e.g. input-output behavior. Many current approaches achieve impressive results after training on randomly generated I/O examples in limited domain-specific languages (DSLs), as with string transformations in RobustFill. However, we empirically discover that applying test input generation techniques for languages with control flow and rich input space causes deep networks to generalize poorly to certain data distributions; to correct this, we propose a new methodology for controlling and evaluating the bias of synthetic data distributions over both programs and specifications. We demonstrate, using the Karel DSL and a small Calculator DSL, that training deep networks on these distributions leads to improved cross-distribution generalization performance. |
455 | DPSNet: End-to-end Deep Plane Sweep Stereo | Sunghoon Im, Hae-Gon Jeon, Stephen Lin, In So Kweon | In this paper, we present a convolutional neural network called DPSNet (Deep Plane Sweep Network) whose design is inspired by best practices of traditional geometry-based approaches. |
456 | Conditional Network Embeddings | Bo Kang, Jefrey Lijffijt, Tijl De Bie | In recent years various methods for NE have been introduced, all following a similar strategy: defining a notion of similarity between nodes (typically some distance measure within the network), a distance measure in the embedding space, and a loss function that penalizes large distances for similar nodes and small distances for dissimilar nodes. A difficulty faced by existing methods is that certain networks are fundamentally hard to embed due to their structural properties: (approximate) multipartiteness, certain degree distributions, assortativity, etc. |
457 | Defensive Quantization: When Efficiency Meets Robustness | Ji Lin, Chuang Gan, Song Han | This paper aims to raise people’s awareness about the security of the quantized models, and we designed a novel quantization methodology to jointly optimize the efficiency and robustness of deep learning models. |
458 | GO Gradient for Expectation-Based Objectives | Yulai Cong, Miaoyun Zhao, Ke Bai, Lawrence Carin | To address these limitations, we propose a General and One-sample (GO) gradient that ($i$) applies to many distributions associated with non-reparameterizable continuous or discrete random variables, and ($ii$) has the same low-variance as the reparameterization trick. |
459 | h-detach: Modifying the LSTM Gradient Towards Better Optimization | Bhargav Kanuparthi, Devansh Arpit, Giancarlo Kerg, Nan Rosemary Ke, Ioannis Mitliagkas, Yoshua Bengio | We introduce a simple stochastic algorithm (\textit{h}-detach) that is specific to LSTM optimization and targeted towards addressing this problem. |
460 | An analytic theory of generalization dynamics and transfer learning in deep linear networks | Andrew K. Lampinen, Surya Ganguli | We develop an analytic theory of the nonlinear dynamics of generalization in deep linear networks, both within and across tasks. |
461 | Differentiable Learning-to-Normalize via Switchable Normalization | Ping Luo, Jiamin Ren, Zhanglin Peng, Ruimao Zhang, Jingyu Li | We address a learning-to-normalize problem by proposing Switchable Normalization (SN), which learns to select different normalizers for different normalization layers of a deep neural network. |
462 | SOM-VAE: Interpretable Discrete Representation Learning on Time Series | Vincent Fortuin, Matthias Hüser, Francesco Locatello, Heiko Strathmann, Gunnar Rätsch | This is due to non-intuitive mappings from data features to salient properties of the representation and non-smoothness over time. To address this problem, we propose a new representation learning framework building on ideas from interpretable discrete dimensionality reduction and deep generative modeling. |
463 | Hierarchical Generative Modeling for Controllable Speech Synthesis | Wei-Ning Hsu, Yu Zhang, Ron J. Weiss, Heiga Zen, Yonghui Wu, Yuxuan Wang, Yuan Cao, Ye Jia, Zhifeng Chen, Jonathan Shen, Patrick Nguyen, Ruoming Pang | This paper proposes a neural end-to-end text-to-speech (TTS) model which can control latent attributes in the generated speech that are rarely annotated in the training data, such as speaking style, accent, background noise, and recording conditions. |
464 | Learning Factorized Multimodal Representations | Yao-Hung Hubert Tsai, Paul Pu Liang, Amir Zadeh, Louis-Philippe Morency, Ruslan Salakhutdinov | In this paper, we propose to optimize for a joint generative-discriminative objective across multimodal data and labels. |
465 | Composing Complex Skills by Learning Transition Policies | Youngwoon Lee*, Shao-Hua Sun*, Sriram Somasundaram, Edward S. Hu, Joseph J. Lim | To empower machines with this ability, we propose a method that can learn transition policies which effectively connect primitive skills to perform sequential tasks without handcrafted rewards. |
466 | Human-level Protein Localization with Convolutional Neural Networks | Elisabeth Rumetshofer, Markus Hofmarcher, Clemens Röhrl, Sepp Hochreiter, Günter Klambauer | We present the largest comparison of CNN architectures including GapNet-PL for protein localization in HTI images of human cells. |
467 | Environment Probing Interaction Policies | Wenxuan Zhou, Lerrel Pinto, Abhinav Gupta | In this work, we propose the “Environment-Probing” Interaction (EPI) policy, a policy that probes a new environment to extract an implicit understanding of that environment’s behavior. |
468 | Lagging Inference Networks and Posterior Collapse in Variational Autoencoders | Junxian He, Daniel Spokoyny, Graham Neubig, Taylor Berg-Kirkpatrick | In this paper, we investigate posterior collapse from the perspective of training dynamics. |
469 | A2BCD: Asynchronous Acceleration with Optimal Complexity | Robert Hannah, Fei Feng, Wotao Yin | A2BCD: Asynchronous Acceleration with Optimal Complexity. |
470 | Learning to Infer and Execute 3D Shape Programs | Yonglong Tian, Andrew Luo, Xingyuan Sun, Kevin Ellis, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu | In this paper, we propose 3D shape programs, integrating bottom-up recognition systems with top-down, symbolic program structure to capture both low-level geometry and high-level structural priors for 3D shapes. |
471 | Deep Decoder: Concise Image Representations from Untrained Non-convolutional Networks | Reinhard Heckel, Paul Hand | Contrary to classical tools such as wavelets, image-generating deep neural networks have a large number of parameters—typically a multiple of their output dimension—and need to be trained on large datasets. In this paper, we propose an untrained simple image model, called the deep decoder, which is a deep neural network that can generate natural images from very few weight parameters. The deep decoder has a simple architecture with no convolutions and fewer weight parameters than the output dimensionality. |
472 | SNAS: stochastic neural architecture search | Sirui Xie, Hehui Zheng, Chunxiao Liu, Liang Lin | In this work, NAS is reformulated as an optimization problem on parameters of a joint distribution for the search space in a cell. |
473 | Revealing interpretable object representations from human behavior | Charles Y. Zheng, Francisco Pereira, Chris I. Baker, Martin N. Hebart | Revealing interpretable object representations from human behavior |
474 | AntisymmetricRNN: A Dynamical System View on Recurrent Neural Networks | Bo Chang, Minmin Chen, Eldad Haber, Ed H. Chi | In this paper, we draw connections between recurrent networks and ordinary differential equations. |
475 | Global-to-local Memory Pointer Networks for Task-Oriented Dialogue | Chien-Sheng Wu, Richard Socher, Caiming Xiong | We propose the global-to-local memory pointer (GLMP) networks to address this issue. |
476 | InstaGAN: Instance-aware Image-to-Image Translation | Sangwoo Mo, Minsu Cho, Jinwoo Shin | To tackle the issues, we propose a novel method, coined instance-aware GAN (InstaGAN), that incorporates the instance information (e.g., object segmentation masks) and improves multi-instance transfiguration. |
477 | Deep Layers as Stochastic Solvers | Adel Bibi, Bernard Ghanem, Vladlen Koltun, Rene Ranftl | We provide a novel perspective on the forward pass through a block of layers in a deep network. |
478 | Learning Multi-Level Hierarchies with Hindsight | Andrew Levy, George Konidaris, Robert Platt, Kate Saenko | To address this problem, we introduce a framework that can learn multiple levels of policies in parallel. |