Paper Digest: ICLR 2019 Highlights
Download ICLR-2019-Poster-Digests.pdf – highlights of all 478 poster papers.
The International Conference on Learning Representations (ICLR) is one of the top machine learning conferences in the world. In 2019, the conference received 1,591 paper submissions, of which 478 were accepted for poster presentation and 24 for oral presentation.
To help the AI community quickly catch up on the work presented at this conference, the Paper Digest team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to skim these machine-generated highlights to quickly get the main idea of each paper.
We thank all authors for writing these interesting papers, and our readers for reading our digests. If you do not want to miss any interesting AI paper, you are welcome to sign up for our free Paper Digest service to receive daily updates on new papers customized to your interests.
Paper Digest Team
team@paperdigest.org
TABLE 1: ICLR 2019 Oral Papers
# | Title | Authors | Highlight |
---|---|---|---|
1 | BA-Net: Dense Bundle Adjustment Networks | Chengzhou Tang, Ping Tan | This paper introduces a network architecture to solve the structure-from-motion (SfM) problem via feature-metric bundle adjustment (BA), which explicitly enforces multi-view geometry constraints in the form of feature-metric error. |
2 | Deterministic Variational Inference for Robust Bayesian Neural Networks | Anqi Wu, Sebastian Nowozin, Edward Meeds, Richard E. Turner, José Miguel Hernández-Lobato, Alexander L. Gaunt | On the application of heteroscedastic regression we demonstrate good predictive performance over alternative approaches. |
3 | Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks | Yikang Shen, Shawn Tan, Alessandro Sordoni, Aaron Courville | This paper proposes to add such inductive bias by ordering the neurons; a vector of master input and forget gates ensures that when a given neuron is updated, all the neurons that follow it in the ordering are also updated. |
4 | Large Scale GAN Training for High Fidelity Natural Image Synthesis | Andrew Brock, Jeff Donahue, Karen Simonyan | To this end, we train Generative Adversarial Networks at the largest scale yet attempted, and study the instabilities specific to such scale. |
5 | Learning deep representations by mutual information estimation and maximization | R Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, Karan Grewal, Phil Bachman, Adam Trischler, Yoshua Bengio | This work investigates unsupervised learning of representations by maximizing mutual information between an input and the output of a deep neural network encoder. |
6 | KnockoffGAN: Generating Knockoffs for Feature Selection using Generative Adversarial Networks | James Jordon, Jinsung Yoon, Mihaela van der Schaar | In this work, we build on the promising Knockoff framework by developing a flexible knockoff generation model. |
7 | Learning Protein Structure with a Differentiable Simulator | John Ingraham, Adam Riesselman, Chris Sander, Debora Marks | In this work we aim to bridge the gap between the expressive capacity of energy functions and the practical capabilities of their simulators by using an unrolled Monte Carlo simulation as a model for data. |
8 | ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness | Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A. Wichmann, Wieland Brendel | We show that ImageNet-trained CNNs are strongly biased towards recognising textures rather than shapes, which is in stark contrast to human behavioural evidence and reveals fundamentally different classification strategies. |
9 | Smoothing the Geometry of Probabilistic Box Embeddings | Xiang Li, Luke Vilnis, Dongxu Zhang, Michael Boratko, Andrew McCallum | In this work, we present a novel hierarchical embedding model, inspired by a relaxation of box embeddings into parameterized density functions using Gaussian convolutions over the boxes. |
10 | On Random Deep Weight-Tied Autoencoders: Exact Asymptotic Analysis, Phase Transitions, and Implications to Training | Ping Li, Phan-Minh Nguyen | We study the behavior of weight-tied multilayer vanilla autoencoders under the assumption of random weights. |
11 | Meta-Learning Update Rules for Unsupervised Representation Learning | Luke Metz, Niru Maheswaranathan, Brian Cheung, Jascha Sohl-Dickstein | In this work, we propose instead to directly target later desired tasks by meta-learning an unsupervised learning rule which leads to representations useful for those tasks. |
12 | Transferring Knowledge across Learning Processes | Sebastian Flennerhag, Pablo G. Moreno, Neil D. Lawrence, Andreas Damianou | We propose Leap, a framework that achieves this by transferring knowledge across learning processes. |
13 | GENERATING HIGH FIDELITY IMAGES WITH SUBSCALE PIXEL NETWORKS AND MULTIDIMENSIONAL UPSCALING | Jacob Menick, Nal Kalchbrenner | To address the former challenge, we propose the Subscale Pixel Network (SPN), a conditional decoder architecture that generates an image as a sequence of image slices of equal size. |
14 | Temporal Difference Variational Auto-Encoder | Karol Gregor, George Papamakarios, Frederic Besse, Lars Buesing, Theophane Weber | Motivated by the absence of a model satisfying all these requirements, we propose TD-VAE, a generative sequence model that learns representations containing explicit beliefs about states several steps into the future, and that can be rolled out directly without single-step transitions. |
15 | A Unified Theory of Early Visual Representations from Retina to Cortex through Anatomically Constrained Deep CNNs | Jack Lindsey, Samuel A. Ocko, Surya Ganguli, Stephane Deny | Here, using a deep convolutional neural network trained on image recognition as a model of the visual system, we show that such differences in representation can emerge as a direct consequence of different neural resource constraints on the retinal and cortical networks, and for the first time we find a single model from which both geometries spontaneously emerge at the appropriate stages of visual processing. |
16 | Pay Less Attention with Lightweight and Dynamic Convolutions | Felix Wu, Angela Fan, Alexei Baevski, Yann Dauphin, Michael Auli | In this paper, we show that a very lightweight convolution can perform competitively to the best reported self-attention results. |
17 | Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset | Curtis Hawthorne, Andriy Stasyuk, Adam Roberts, Ian Simon, Cheng-Zhi Anna Huang, Sander Dieleman, Erich Elsen, Jesse Engel, Douglas Eck | The networks and the dataset together present a promising approach toward creating new expressive and interpretable neural models of music. |
18 | Learning to Remember More with Less Memorization | Hung Le, Truyen Tran, Svetha Venkatesh | This method aims to balance between maximizing memorization and forgetting via overwriting mechanisms. |
19 | Learning Robust Representations by Projecting Superficial Statistics Out | Haohan Wang, Zexue He, Zachary C. Lipton, Eric P. Xing | We test our method on the battery of standard domain generalization data sets and, interestingly, achieve comparable or better performance as compared to other domain generalization methods that explicitly require samples from the target distribution for training. |
20 | Slalom: Fast, Verifiable and Private Execution of Neural Networks in Trusted Hardware | Florian Tramer, Dan Boneh | Building upon an efficient outsourcing scheme for matrix multiplication, we propose Slalom, a framework that securely delegates execution of all linear layers in a DNN from a TEE (e.g., Intel SGX or Sanctum) to a faster, yet untrusted, co-located processor. |
21 | The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision | Jiayuan Mao, Chuang Gan, Pushmeet Kohli, Joshua B. Tenenbaum, Jiajun Wu | We propose the Neuro-Symbolic Concept Learner (NS-CL), a model that learns visual concepts, words, and semantic parsing of sentences without explicit supervision on any of them; instead, our model learns by simply looking at images and reading paired questions and answers. |
22 | The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks | Jonathan Frankle, Michael Carbin | Neural network pruning techniques can reduce the parameter counts of trained networks by over 90%, decreasing storage requirements and improving computational performance of inference without compromising accuracy. |
23 | FFJORD: Free-Form Continuous Dynamics for Scalable Reversible Generative Models | Will Grathwohl, Ricky T. Q. Chen, Jesse Bettencourt, Ilya Sutskever, David Duvenaud | In this paper, we use Hutchinson's trace estimator to give a scalable unbiased estimate of the log-density (a minimal sketch of this estimator appears after this table). |
24 | How Powerful are Graph Neural Networks? | Keyulu Xu*, Weihua Hu*, Jure Leskovec, Stefanie Jegelka | Here, we present a theoretical framework for analyzing the expressive power of GNNs to capture different graph structures. |
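Note on entry 23 (FFJORD): the highlight mentions Hutchinson's trace estimator, which estimates the trace of a matrix (in FFJORD, the Jacobian of the ODE dynamics) using only matrix-vector products. Below is a minimal NumPy sketch of the estimator in isolation, not of the FFJORD model itself; the function name and arguments are our own illustrative choices.

```python
import numpy as np

def hutchinson_trace(matvec, dim, n_samples=1000, rng=None):
    """Unbiased estimate of tr(A) from matrix-vector products v -> A v.

    E[v^T A v] = tr(A) when the entries of v are i.i.d. Rademacher (+1/-1),
    so averaging v^T A v over random probe vectors approximates the trace.
    """
    rng = np.random.default_rng() if rng is None else rng
    estimates = []
    for _ in range(n_samples):
        v = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe vector
        estimates.append(v @ matvec(v))        # one sample of v^T A v
    return float(np.mean(estimates))

# Toy usage: compare against the exact trace of a random matrix.
A = np.random.default_rng(0).normal(size=(50, 50))
print(hutchinson_trace(lambda v: A @ v, dim=50), np.trace(A))
```

In FFJORD this estimator is applied to the Jacobian of the ODE dynamics, where the product v^T J can be obtained with a single vector-Jacobian product from reverse-mode autodiff, which is what makes the log-density estimate scalable.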
TABLE 2: ICLR 2019 Posters
# | Title | Authors | Highlight |
---|---|---|---|
1 | Convolutional Neural Networks on Non-uniform Geometrical Signals Using Euclidean Spectral Transformation | Chiyu Max Jiang, Dequan Wang, Jingwei Huang, Philip Marcus, Matthias Niessner | To this end, we develop mathematical formulations for Non-Uniform Fourier Transforms (NUFT) to directly, and optimally, sample nonuniform data signals of different topologies defined on a simplex mesh into the spectral domain with no spatial sampling error. |
2 | Augmented Cyclic Adversarial Learning for Low Resource Domain Adaptation | Ehsan Hosseini-Asl, Yingbo Zhou, Caiming Xiong, Richard Socher | In this paper, we propose an augmented cyclic adversarial learning model that enforces the cycle-consistency constraint via an external task specific model, which encourages the preservation of task-relevant content as opposed to exact reconstruction. |
3 | Variance Networks: When Expectation Does Not Meet Your Expectations | Kirill Neklyudov, Dmitry Molchanov, Arsenii Ashukha, Dmitry Vetrov | In this paper, we introduce variance layers, a different kind of stochastic layers. |
4 | Initialized Equilibrium Propagation for Backprop-Free Training | Peter O’Connor, Efstratios Gavves, Max Welling | In response to this problem, we propose Initialized Equilibrium Propagation, which trains a feedforward network to initialize the iterative inference procedure for Equilibrium propagation. |
5 | Explaining Image Classifiers by Counterfactual Generation | Chun-Hao Chang, Elliot Creager, Anna Goldenberg, David Duvenaud | Explaining Image Classifiers by Counterfactual Generation. |
6 | SNIP: SINGLE-SHOT NETWORK PRUNING BASED ON CONNECTION SENSITIVITY | Namhoon Lee, Thalaiyasingam Ajanthan, Philip Torr | In this work, we present a new approach that prunes a given network once at initialization prior to training. |
7 | Diagnosing and Enhancing VAE Models | Bin Dai, David Wipf | In this regard, we rigorously analyze the VAE objective, differentiating situations where this belief is and is not actually true. |
8 | Disjoint Mapping Network for Cross-modal Matching of Voices and Faces | Yandong Wen, Mahmoud Al Ismail, Weiyang Liu, Bhiksha Raj, Rita Singh | We propose a novel framework, called Disjoint Mapping Network (DIMNet), for cross-modal biometric matching, in particular of voices and faces. |
9 | Automatically Composing Representation Transformations as a Means for Generalization | Michael Chang, Abhishek Gupta, Sergey Levine, Thomas L. Griffiths | As a first step for tackling compositional generalization, we introduce the compositional recursive learner, a domain-general framework for learning algorithmic procedures for composing representation transformations, producing a learner that reasons about what computation to execute by making analogies to previously seen problems. We propose the compositional generalization problem for measuring how readily old knowledge can be reused and hence built upon. |
10 | Visual Reasoning by Progressive Module Networks | Seung Wook Kim, Makarand Tapaswi, Sanja Fidler | We propose to represent a solver for each task as a neural module that calls existing modules (solvers for simpler tasks) in a functional program-like manner. |
11 | Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes | Roman Novak, Lechao Xiao, Yasaman Bahri, Jaehoon Lee, Greg Yang, Jiri Hron, Daniel A. Abolafia, Jeffrey Pennington, Jascha Sohl-dickstein | In this work, we derive an analogous equivalence for multi-layer convolutional neural networks (CNNs) both with and without pooling layers, and achieve state of the art results on CIFAR10 for GPs without trainable kernels. |
12 | Learning to Learn without Forgetting by Maximizing Transfer and Minimizing Interference | Matthew Riemer, Ignacio Cases, Robert Ajemian, Miao Liu, Irina Rish, Yuhai Tu, and Gerald Tesauro | In this work we propose a new conceptualization of the continual learning problem in terms of a temporally symmetric trade-off between transfer and interference that can be optimized by enforcing gradient alignment across examples. |
13 | Sparse Dictionary Learning by Dynamical Neural Networks | Tsung-Han Lin, Ping Tak Peter Tang | Using spiking neurons to construct our dynamical network, we present a learning process, its rigorous mathematical analysis, and numerical results on several dictionary learning problems. |
14 | Eidetic 3D LSTM: A Model for Video Prediction and Beyond | Yunbo Wang, Lu Jiang, Ming-Hsuan Yang, Li-Jia Li, Mingsheng Long, Li Fei-Fei | We present a new model, Eidetic 3D LSTM (E3D-LSTM), that integrates 3D convolutions into RNNs. |
15 | ALISTA: Analytic Weights Are As Good As Learned Weights in LISTA | Jialin Liu, Xiaohan Chen, Zhangyang Wang, Wotao Yin | In this work, we propose Analytic LISTA (ALISTA), where the weight matrix in LISTA is computed as the solution to a data-free optimization problem, leaving only the stepsize and threshold parameters to data-driven learning. |
16 | Three Mechanisms of Weight Decay Regularization | Guodong Zhang, Chaoqi Wang, Bowen Xu, Roger Grosse | We identify three distinct mechanisms by which weight decay exerts a regularization effect, depending on the particular optimization algorithm and architecture: (1) increasing the effective learning rate, (2) approximately regularizing the input-output Jacobian norm, and (3) reducing the effective damping coefficient for second-order optimization. Our results provide insight into how to improve the regularization of neural networks. |
17 | Learning Multimodal Graph-to-Graph Translation for Molecule Optimization | Wengong Jin, Kevin Yang, Regina Barzilay, Tommi Jaakkola | Our primary contributions include a junction tree encoder-decoder for learning diverse graph translations along with a novel adversarial training method for aligning distributions of molecules. |
18 | A Data-Driven and Distributed Approach to Sparse Signal Representation and Recovery | Ali Mousavi, Gautam Dasarathy, Richard G. Baraniuk | In this paper, we focus on two challenges which offset the promise of sparse signal representation, sensing, and recovery. |
19 | On the Minimal Supervision for Training Any Binary Classifier from Only Unlabeled Data | Nan Lu, Gang Niu, Aditya Krishna Menon, Masashi Sugiyama | In this paper, we study training an arbitrary (from linear to deep) binary classifier from only unlabeled (U) data by ERM. |
20 | Neural Logic Machines | Honghua Dong, Jiayuan Mao, Tian Lin, Chong Wang, Lihong Li, Denny Zhou | We propose the Neural Logic Machine (NLM), a neural-symbolic architecture for both inductive learning and logic reasoning. |
21 | Neural Speed Reading with Structural-Jump-LSTM | Christian Hansen, Casper Hansen, Stephen Alstrup, Jakob Grue Simonsen, Christina Lioma | We present Structural-Jump-LSTM: the first neural speed reading model to both skip and jump text during inference. |
22 | Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures | Jonathan Uesato*, Ananya Kumar*, Csaba Szepesvari*, Tom Erez, Avraham Ruderman, Keith Anderson, Krishnamurthy (Dj) Dvijotham, Nicolas Heess, Pushmeet Kohli | To solve this we propose a continuation approach that learns failure modes in related but less robust agents. |
23 | Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search | Lars Buesing, Theophane Weber, Yori Zwols, Nicolas Heess, Sebastien Racaniere, Arthur Guez, Jean-Baptiste Lespiau | Based on this, we propose the Counterfactually-Guided Policy Search (CF-GPS) algorithm for learning policies in POMDPs from off-policy experience. |
24 | signSGD via Zeroth-Order Oracle | Sijia Liu, Pin-Yu Chen, Xiangyi Chen, Mingyi Hong | In this paper, we design and analyze a new zeroth-order (ZO) stochastic optimization algorithm, ZO-signSGD, which enjoys dual advantages of gradient-free operations and signSGD. |
25 | Preventing Posterior Collapse with delta-VAEs | Ali Razavi, Aaron van den Oord, Ben Poole, Oriol Vinyals | Due to the phenomenon of “posterior collapse,” current latent variable generative models pose a challenging design choice that either weakens the capacity of the decoder or requires altering the training objective. |
26 | Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees | Yuping Luo, Huazhe Xu, Yuanzhi Li, Yuandong Tian, Trevor Darrell, Tengyu Ma | This paper introduces a novel algorithmic framework for designing and analyzing model-based RL algorithms with theoretical guarantees. |
27 | Knowledge Flow: Improve Upon Your Teachers | Iou-Jen Liu, Jian Peng, Alexander Schwing | To address this issue, in this paper, we develop knowledge flow which moves “knowledge” from multiple deep nets, referred to as teachers, to a new deep net model, called the student. |
28 | Directed-Info GAIL: Learning Hierarchical Policies from Unsegmented Demonstrations using Directed Information | Mohit Sharma, Arjun Sharma, Nicholas Rhinehart, Kris M. Kitani | We propose a new algorithm based on the generative adversarial imitation learning framework which automatically learns sub-task policies from unsegmented demonstrations. |
29 | A Max-Affine Spline Perspective of Recurrent Neural Networks | Zichao Wang, Randall Balestriero, Richard Baraniuk | The resulting representation provides several new perspectives for analyzing RNNs, three of which we study in this paper. |
30 | Learning to Navigate the Web | Izzeddin Gur, Ulrich Rueckert, Aleksandra Faust, Dilek Hakkani-Tur | We approach the aforementioned problems from a different perspective and propose guided RL approaches that can generate unbounded amount of experience for an agent to learn from. |
31 | Training for Faster Adversarial Robustness Verification via Inducing ReLU Stability | Kai Y. Xiao, Vincent Tjeng, Nur Muhammad (Mahi) Shafiullah, Aleksander Madry | Specifically, we aim to train deep neural networks that not only are robust to adversarial perturbations but also whose robustness can be verified more easily. |
32 | Learning to Learn with Conditional Class Dependencies | Xiang Jiang, Mohammad Havaei, Farshid Varno, Gabriel Chartrand, Nicolas Chapados, Stan Matwin | We propose a meta-learning framework, Conditional class-Aware Meta-Learning (CAML), that conditionally transforms feature representations based on a metric space that is trained to capture inter-class dependencies. |
33 | Hierarchical Visuomotor Control of Humanoids | Josh Merel, Arun Ahuja, Vu Pham, Saran Tunyasuvunakool, Siqi Liu, Dhruva Tirumala, Nicolas Heess, Greg Wayne | In this work, we partly factor this problem into low-level motor control from proprioception and high-level coordination of the low-level skills informed by vision. |
34 | Unsupervised Adversarial Image Reconstruction | Arthur Pajot, Emmanuel de Bezenac, Patrick Gallinari | We cast the problem as finding the \textit{maximum a posteriori} estimate of the signal given each measurement, and propose a general framework for the reconstruction problem. |
35 | Max-MIG: an Information Theoretic Approach for Joint Learning from Crowds | Peng Cao, Yilun Xu, Yuqing Kong, Yizhou Wang | We propose an information theoretic approach, Max-MIG, for joint learning from crowds, with a common assumption: the crowdsourced labels and the data are independent conditioning on the ground truth. |
36 | AutoLoss: Learning Discrete Schedule for Alternate Optimization | Haowen Xu, Hao Zhang, Zhiting Hu, Xiaodan Liang, Ruslan Salakhutdinov, Eric Xing | In this paper, we present AutoLoss, a meta-learning framework that automatically learns and determines the optimization schedule. |
37 | Learning what and where to attend | Drew Linsley, Dan Shiebler, Sven Eberhardt, Thomas Serre | Here, we demonstrate the benefit of using stronger supervisory signals by teaching DCNs to attend to image regions that humans deem important for object recognition. |
38 | ROBUST ESTIMATION VIA GENERATIVE ADVERSARIAL NETWORKS | Chao Gao, Jiyi Liu, Yuan Yao, Weizhi Zhu | In this paper, we establish an intriguing connection between f-GANs and various depth functions through the lens of f-Learning. |
39 | INVASE: Instance-wise Variable Selection using Neural Networks | Jinsung Yoon, James Jordon, Mihaela van der Schaar | In this paper, we propose a new instance-wise feature selection method, which we term INVASE. |
40 | Meta-Learning with Latent Embedding Optimization | Andrei A. Rusu, Dushyant Rao, Jakub Sygnowski, Oriol Vinyals, Razvan Pascanu, Simon Osindero, Raia Hadsell | We show that it is possible to bypass these limitations by learning a data-dependent latent generative representation of model parameters, and performing gradient-based meta-learning in this low-dimensional latent space. |
41 | Non-vacuous Generalization Bounds at the ImageNet Scale: a PAC-Bayesian Compression Approach | Wenda Zhou, Victor Veitch, Morgane Austern, Ryan P. Adams, Peter Orbanz | The purpose of this paper is to connect these two empirical observations. |
42 | Learning to Represent Edits | Pengcheng Yin, Graham Neubig, Miltiadis Allamanis, Marc Brockschmidt, Alexander L. Gaunt | We introduce the problem of learning distributed representations of edits. |
43 | Neural Probabilistic Motor Primitives for Humanoid Control | Josh Merel, Leonard Hasenclever, Alexandre Galashov, Arun Ahuja, Vu Pham, Greg Wayne, Yee Whye Teh, Nicolas Heess | To do this, we propose a motor architecture that has the general structure of an inverse model with a latent-variable bottleneck. |
44 | Differentiable Perturb-and-Parse: Semi-Supervised Parsing with a Structured Variational Autoencoder | Caio Corro, Ivan Titov | To this end, we propose a novel latent-variable generative model for semi-supervised syntactic dependency parsing. |
45 | Janossy Pooling: Learning Deep Permutation-Invariant Functions for Variable-Size Inputs | Ryan L. Murphy, Balasubramaniam Srinivasan, Vinayak Rao, Bruno Ribeiro | We consider a simple and overarching representation for permutation-invariant functions of sequences (or set functions). |
46 | An Empirical Study of Example Forgetting during Deep Neural Network Learning | Mariya Toneva*, Alessandro Sordoni*, Remi Tachet des Combes*, Adam Trischler, Yoshua Bengio, Geoffrey J. Gordon | Our goal is to understand whether a related phenomenon occurs when data does not undergo a clear distributional shift. |
47 | RNNs implicitly implement tensor-product representations | R. Thomas McCoy, Tal Linzen, Ewan Dunbar, Paul Smolensky | To test this hypothesis, we introduce Tensor Product Decomposition Networks (TPDNs), which use TPRs to approximate existing vector representations. |
48 | Learning To Solve Circuit-SAT: An Unsupervised Differentiable Approach | Saeed Amizadeh, Sergiy Matusevych, Markus Weimer | In this paper, we propose a neural framework that can learn to solve the Circuit Satisfiability problem. |
49 | Dynamic Channel Pruning: Feature Boosting and Suppression | Xitong Gao, Yiren Zhao, Lukasz Dudziak, Robert Mullins, Cheng-zhong Xu | In this paper, we reduce this cost by exploiting the fact that the importance of features computed by convolutional layers is highly input-dependent, and propose feature boosting and suppression (FBS), a new method to predictively amplify salient convolutional channels and skip unimportant ones at run-time. |
50 | signSGD with Majority Vote is Communication Efficient and Fault Tolerant | Jeremy Bernstein, Jiawei Zhao, Kamyar Azizzadenesheli, Anima Anandkumar | We explore a particularly simple algorithm for robust, communication-efficient learning—signSGD (a toy sketch of the majority-vote update appears after this table). |
51 | Bounce and Learn: Modeling Scene Dynamics with Real-World Bounces | Senthil Purushwalkam, Abhinav Gupta, Danny Kaufman, Bryan Russell | We introduce an approach to model surface properties governing bounces in everyday scenes. VIM learns to infer physical parameters for locations in a scene given a single still image, while PIM learns to model physical interactions for the prediction task given physical parameters and observed pre-collision 3D trajectories. To achieve our results, we introduce the Bounce Dataset comprising 5K RGB-D videos of bouncing trajectories of a foam ball to probe surfaces of varying shapes and materials in everyday scenes including homes and offices. Our proposed model learns from our collected dataset of real-world bounces and is bootstrapped with additional information from simple physics simulations. |
52 | K for the Price of 1: Parameter-efficient Multi-task and Transfer Learning | Pramod Kaushik Mudrakarta, Mark Sandler, Andrey Zhmoginov, Andrew Howard | We introduce a novel method that enables parameter-efficient transfer and multi-task learning with deep neural networks. |
53 | Towards Metamerism via Foveated Style Transfer | Arturo Deza, Aditya Jonnalagadda, Miguel P. Eckstein | In this paper, we propose our NeuroFovea metamer model, a foveated generative model that is based on a mixture of peripheral representations and style transfer forward-pass algorithms. |
54 | Post Selection Inference with Incomplete Maximum Mean Discrepancy Estimator | Makoto Yamada, Denny Wu, Yao-Hung Hubert Tsai, Hirofumi Ohta, Ruslan Salakhutdinov, Ichiro Takeuchi, Kenji Fukumizu | In this paper, we propose a post selection inference (PSI) framework for divergence measure, which can select a set of statistically significant features that discriminate two distributions. |
55 | Emergent Coordination Through Competition | Siqi Liu, Guy Lever, Josh Merel, Saran Tunyasuvunakool, Nicolas Heess, Thore Graepel | We study the emergence of cooperative behaviors in reinforcement learning agents by introducing a challenging competitive multi-agent soccer environment with continuous simulated physics. |
56 | Prior Convictions: Black-box Adversarial Attacks with Bandits and Priors | Andrew Ilyas, Logan Engstrom, Aleksander Madry | We introduce a framework that conceptually unifies much of the existing work on black-box attacks, and demonstrate that the current state-of-the-art methods are optimal in a natural sense. |
57 | Sample Efficient Imitation Learning for Continuous Control | Fumihiro Sasaki | We believe that IL algorithms could be more applicable to real-world problems if the number of interactions could be reduced. In this paper, we propose a model-free IL algorithm for continuous control. |
58 | Generative Code Modeling with Graphs | Marc Brockschmidt, Miltiadis Allamanis, Alexander L. Gaunt, Oleksandr Polozov | We present a novel model for this problem that uses a graph to represent the intermediate state of the generated output. |
59 | Critical Learning Periods in Deep Networks | Alessandro Achille, Matteo Rovere, Stefano Soatto | To better understand this phenomenon, we use the Fisher Information of the weights to measure the effective connectivity between layers of a network during training. |
60 | CEM-RL: Combining evolutionary and gradient-based methods for policy search | Aloïs Pourchot, Olivier Sigaud | In this paper, we propose a different combination scheme using the simple cross-entropy method (CEM) and Twin Delayed Deep Deterministic policy gradient (TD3), another off-policy deep RL algorithm which improves over DDPG. |
61 | LanczosNet: Multi-Scale Deep Graph Convolutional Networks | Renjie Liao, Zhizhen Zhao, Raquel Urtasun, Richard Zemel | We propose Lanczos network (LanczosNet) which uses the Lanczos algorithm to construct low rank approximations of the graph Laplacian for graph convolution. Relying on the tridiagonal decomposition of the Lanczos algorithm, we not only efficiently exploit multi-scale information via fast approximated computation of matrix power but also design learnable spectral filters. Being fully differentiable, LanczosNet facilitates both graph kernel learning as well as learning node embeddings. We show the connection between our LanczosNet and graph based manifold learning, especially diffusion maps. We benchmark our model against 8 recent deep graph networks on citation datasets and the QM8 quantum chemistry dataset. Experimental results show that our model achieves the state-of-the-art performance in most tasks. |
62 | Excessive Invariance Causes Adversarial Vulnerability | Joern-Henrik Jacobsen, Jens Behrmann, Richard Zemel, Matthias Bethge | One core idea of adversarial example research is to reveal neural network errors under such distribution shifts. |
63 | Hindsight policy gradients | Paulo Rauber, Avinash Ummadisingu, Filipe Mutz, Jürgen Schmidhuber | In this paper, we demonstrate how hindsight can be introduced to policy gradient methods, generalizing this idea to a broad class of successful algorithms. |
64 | Adaptive Gradient Methods with Dynamic Bound of Learning Rate | Liangchen Luo, Yuanhao Xiong, Yan Liu, Xu Sun | Adaptive optimization methods such as AdaGrad, RMSprop and Adam have been proposed to achieve a rapid training process with an element-wise scaling term on learning rates. |
65 | Decoupled Weight Decay Regularization | Ilya Loshchilov, Frank Hutter | L$_2$ regularization and weight decay regularization are equivalent for standard stochastic gradient descent (when rescaled by the learning rate), but as we demonstrate this is \emph{not} the case for adaptive gradient algorithms, such as Adam. |
66 | Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile | Panayotis Mertikopoulos, Bruno Lecouat, Houssam Zenati, Chuan-Sheng Foo, Vijay Chandrasekhar, Georgios Piliouras | To make piecemeal progress along these lines, we analyze the behavior of mirror descent (MD) in a class of non-monotone problems whose solutions coincide with those of a naturally associated variational inequality – a property which we call coherence. |
67 | DialogWAE: Multimodal Response Generation with Conditional Wasserstein Auto-Encoder | Xiaodong Gu, Kyunghyun Cho, Jung-Woo Ha, Sunghun Kim | In this paper, we propose DialogWAE, a conditional Wasserstein autoencoder (WAE) specially designed for dialogue modeling. |
68 | No Training Required: Exploring Random Encoders for Sentence Classification | John Wieting, Douwe Kiela | Our aim is to put sentence embeddings on more solid footing by 1) looking at how much modern sentence embeddings gain over random methods—as it turns out, surprisingly little; and by 2) providing the field with more appropriate baselines going forward—which are, as it turns out, quite strong. |
69 | Neural Graph Evolution: Towards Efficient Automatic Robot Design | Tingwu Wang, Yuhao Zhou, Sanja Fidler, Jimmy Ba | We propose Neural Graph Evolution (NGE), which performs selection on current candidates and evolves new ones iteratively. |
70 | Function Space Particle Optimization for Bayesian Neural Networks | Ziyu Wang, Tongzheng Ren, Jun Zhu, Bo Zhang | In this paper, we propose to solve this issue by performing particle optimization directly in the space of regression functions. |
71 | Structured Adversarial Attack: Towards General Implementation and Better Interpretability | Kaidi Xu, Sijia Liu, Pu Zhao, Pin-Yu Chen, Huan Zhang, Quanfu Fan, Deniz Erdogmus, Yanzhi Wang, Xue Lin | This work develops a more general attack model, i.e., the structured attack (StrAttack), which explores group sparsity in adversarial perturbation by sliding a mask through images aiming for extracting key spatial structures. |
72 | Spherical CNNs on Unstructured Grids | Chiyu Max Jiang, Jingwei Huang, Karthik Kashinath, Prabhat, Philip Marcus, Matthias Niessner | We present an efficient convolution kernel for Convolutional Neural Networks (CNNs) on unstructured grids using parameterized differential operators while focusing on spherical signals such as panorama images or planetary signals. To this end, we replace conventional convolution kernels with linear combinations of differential operators that are weighted by learnable parameters. |
73 | Optimal Transport Maps For Distribution Preserving Operations on Latent Spaces of Generative Models | Eirikur Agustsson, Alexander Sage, Radu Timofte, Luc Van Gool | In this paper, we propose a framework for modifying the latent space operations such that the distribution mismatch is fully eliminated. |
74 | Deep Lagrangian Networks: Using Physics as Model Prior for Deep Learning | Michael Lutter, Christian Ritter, Jan Peters | As a first example, we propose Deep Lagrangian Networks (DeLaN) as a deep network structure upon which Lagrangian Mechanics have been imposed. |
75 | Accumulation Bit-Width Scaling For Ultra-Low Precision Training Of Deep Networks | Charbel Sakr, Naigang Wang, Chia-Yu Chen, Jungwook Choi, Ankur Agrawal, Naresh Shanbhag, Kailash Gopalakrishnan | We present a statistical approach to analyze the impact of reduced accumulation precision on deep learning training. |
76 | Deep Convolutional Networks as shallow Gaussian Processes | Adrià Garriga-Alonso, Carl Edward Rasmussen, Laurence Aitchison | We show that the output of a (residual) CNN with an appropriate prior over the weights and biases is a GP in the limit of infinitely many convolutional filters, extending similar results for dense networks. |
77 | Unsupervised Domain Adaptation for Distance Metric Learning | Kihyuk Sohn, Wenling Shang, Xiang Yu, Manmohan Chandraker | To handle both within and cross domain verifications, we propose a Feature Transfer Network (FTN) to separate the target feature space from the original source space while aligned with a transformed source space. |
78 | A comprehensive, application-oriented study of catastrophic forgetting in DNNs | B. Pfülb, A. Gepperth | We present a large-scale empirical study of catastrophic forgetting (CF) in modern Deep Neural Network (DNN) models that perform sequential (or: incremental) learning. A new experimental protocol is proposed that takes into account typical constraints encountered in application scenarios. As the investigation is empirical, we evaluate CF behavior on the hitherto largest number of visual classification datasets, from each of which we construct a representative number of Sequential Learning Tasks (SLTs) in close alignment to previous works on CF. Our results clearly indicate that there is no model that avoids CF for all investigated datasets and SLTs under application conditions. |
79 | Posterior Attention Models for Sequence to Sequence Learning | Shiv Shankar, Sunita Sarawagi | In this paper we show that prevalent attention architectures do not adequately model the dependence among the attention and output tokens across a predicted sequence. We present an alternative architecture called Posterior Attention Models that after a principled factorization of the full joint distribution of the attention and output variables, proposes two major changes. |
80 | Generative Question Answering: Learning to Answer the Whole Question | Mike Lewis, Angela Fan | We introduce generative models of the joint distribution of questions and answers, which are trained to explain the whole question, not just to answer it. Our question answering (QA) model is implemented by learning a prior over answers, and a conditional language model to generate the question given the answer, allowing scalable and interpretable many-hop reasoning as the question is generated word-by-word. |
81 | Diversity and Depth in Per-Example Routing Models | Prajit Ramachandran, Quoc V. Le | In this work, we address both of these deficiencies. |
82 | Selfless Sequential Learning | Rahaf Aljundi, Marcus Rohrbach, Tinne Tuytelaars | In this paper we look at a scenario with fixed model capacity, and postulate that the learning process should not be selfish, i.e. it should account for future tasks to be added and thus leave enough capacity for them. |
83 | M^3RL: Mind-aware Multi-agent Management Reinforcement Learning | Tianmin Shu, Yuandong Tian | In this paper, we aim to address this from a different angle. |
84 | The Deep Weight Prior | Andrei Atanov, Arsenii Ashukha, Kirill Struminsky, Dmitriy Vetrov, Max Welling | In this work, we propose a new type of prior distributions for convolutional neural networks, deep weight prior (DWP), that exploit generative models to encourage a specific structure of trained convolutional filters e.g., spatial correlations of weights. |
85 | Efficient Multi-Objective Neural Architecture Search via Lamarckian Evolution | Thomas Elsken, Jan Hendrik Metzen, Frank Hutter | We address the first shortcoming by proposing LEMONADE, an evolutionary algorithm for multi-objective architecture search that allows approximating the Pareto-front of architectures under multiple objectives, such as predictive performance and number of parameters, in a single run of the method. |
86 | Quaternion Recurrent Neural Networks | Titouan Parcollet, Mirco Ravanelli, Mohamed Morchid, Georges Linarès, Chiheb Trabelsi, Renato De Mori, Yoshua Bengio | We propose a novel quaternion recurrent neural network (QRNN), alongside with a quaternion long-short term memory neural network (QLSTM), that take into account both the external relations and these internal structural dependencies with the quaternion algebra. |
87 | Adversarial Audio Synthesis | Chris Donahue, Julian McAuley, Miller Puckette | In this paper we introduce WaveGAN, a first attempt at applying GANs to unsupervised synthesis of raw-waveform audio. |
88 | Preconditioner on Matrix Lie Group for SGD | Xi-Lin Li | We study two types of preconditioners and preconditioned stochastic gradient descent (SGD) methods in a unified framework. |
89 | Learning to Screen for Fast Softmax Inference on Large Vocabulary Neural Networks | Patrick Chen, Si Si, Sanjiv Kumar, Yang Li, Cho-Jui Hsieh | In this paper, we introduce a novel softmax layer approximation algorithm by exploiting the clustering structure of context vectors. |
90 | Adaptive Posterior Learning: few-shot learning with a surprise-based memory module | Tiago Ramalho, Marta Garnelo | In this paper we introduce APL, an algorithm that approximates probability distributions by remembering the most surprising observations it has encountered. |
91 | Probabilistic Planning with Sequential Monte Carlo methods | Alexandre Piche, Valentin Thomas, Cyril Ibrahim, Yoshua Bengio, Chris Pal | In this work, we propose a novel formulation of planning which views it as a probabilistic inference problem over future optimal trajectories. |
92 | Plan Online, Learn Offline: Efficient Learning and Exploration via Model-Based Control | Kendall Lowrey, Aravind Rajeswaran, Sham Kakade, Emanuel Todorov, Igor Mordatch | We propose a “plan online and learn offline” framework for the setting where an agent, with an internal model, needs to continually act and learn in the world. |
93 | DHER: Hindsight Experience Replay for Dynamic Goals | Meng Fang, Cheng Zhou, Bei Shi, Boqing Gong, Jia Xu, Tong Zhang | DHER automatically assembles successful experiences from two relevant failures and can be used to enhance an arbitrary off-policy RL algorithm when the tasks’ goals are dynamic. |
94 | FlowQA: Grasping Flow in History for Conversational Machine Comprehension | Hsin-Yuan Huang, Eunsol Choi, Wen-tau Yih | To enable traditional, single-turn models to encode the history comprehensively, we introduce Flow, a mechanism that can incorporate intermediate representations generated during the process of answering previous questions, through an alternating parallel processing structure. |
95 | Learning to Design RNA | Frederic Runge, Danny Stoll, Stefan Falkner, Frank Hutter | Here, we propose a new algorithm for the RNA Design problem, dubbed LEARNA. |
96 | Robust Conditional Generative Adversarial Networks | Grigorios G. Chrysos, Jean Kossaifi, Stefanos Zafeiriou | In this work, we introduce a novel conditional GAN model, called RoCGAN, which leverages structure in the target space of the model to address the issue. |
97 | Top-Down Neural Model For Formulae | Karel Chvalovský | We present a simple neural model that given a formula and a property tries to answer the question whether the formula has the given property, for example whether a propositional formula is always true. |
98 | Cost-Sensitive Robustness against Adversarial Examples | Xiao Zhang, David Evans | We encode the potential harm of each adversarial transformation in a cost matrix, and propose a general objective function to adapt the robust training method of Wong & Kolter (2018) to optimize for cost-sensitive robustness. |
99 | The role of over-parametrization in generalization of neural networks | Behnam Neyshabur, Zhiyuan Li, Srinadh Bhojanapalli, Yann LeCun, Nathan Srebro | In this work we suggest a novel complexity measure based on unit-wise capacities resulting in a tighter generalization bound for two layer ReLU networks. |
100 | Diffusion Scattering Transforms on Graphs | Fernando Gama, Alejandro Ribeiro, Joan Bruna | This stability to deformations can be interpreted as stability with respect to changes in the metric structure of the domain. In this work, we show that scattering transforms can be generalized to non-Euclidean domains using diffusion wavelets, while preserving a notion of stability with respect to metric changes in the domain, measured with diffusion maps. |
101 | Capsule Graph Neural Network | Zhang Xinyi, Lihui Chen | Capsule Graph Neural Network. |
102 | Energy-Constrained Compression for Deep Neural Networks via Weighted Sparse Projection and Layer Input Masking | Haichuan Yang, Yuhao Zhu, Ji Liu | This paper proposes the first end-to-end DNN training framework that provides quantitative energy consumption guarantees via weighted sparse projection and input masking. |
103 | Emerging Disentanglement in Auto-Encoder Based Unsupervised Image Content Transfer | Ori Press, Tomer Galanti, Sagie Benaim, Lior Wolf | We study the problem of learning to map, in an unsupervised way, between domains $A$ and $B$, such that the samples $b \in B$ contain all the information that exists in samples $a \in A$ and some additional information. |
104 | SGD Converges to Global Minimum in Deep Learning via Star-convex Path | Yi Zhou, Junjie Yang, Huishuai Zhang, Yingbin Liang, Vahid Tarokh | In this study, we establish the convergence of SGD to a global minimum for nonconvex optimization problems that are commonly encountered in neural network training. |
105 | Toward Understanding the Impact of Staleness in Distributed Machine Learning | Wei Dai, Yi Zhou, Nanqing Dong, Hao Zhang, Eric Xing | In this work, we study the convergence behaviors of a wide array of ML models and algorithms under delayed updates. |
106 | Transfer Learning for Sequences via Learning to Collocate | Wanyun Cui, Guangyu Zheng, Zhiqiang Shen, Sihang Jiang, Wei Wang | We conducted extensive experiments on both sequence labeling tasks (POS tagging, NER) and sentence classification (sentiment analysis). |
107 | Learning Procedural Abstractions and Evaluating Discrete Latent Temporal Structure | Karan Goel, Emma Brunskill | In this work, we consider the problem of learning procedural abstractions from possibly high-dimensional observational sequences, such as video demonstrations. |
108 | Unsupervised Speech Recognition via Segmental Empirical Output Distribution Matching | Chih-Kuan Yeh, Jianshu Chen, Chengzhu Yu, Dong Yu | We propose a fully unsupervised learning algorithm that alternates between solving two sub-problems: (i) learn a phoneme classifier for a given set of phoneme segmentation boundaries, and (ii) refining the phoneme boundaries based on a given classifier. |
109 | Adversarial Attacks on Graph Neural Networks via Meta Learning | Daniel Zügner, Stephan Günnemann | Deep learning models for graphs have advanced the state of the art on many tasks. |
110 | Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection | Tue Le, Tuan Nguyen, Trung Le, Dinh Phung, Paul Montague, Olivier De Vel, Lizhen Qu | In this paper, we attempt to alleviate this severe binary vulnerability detection bottleneck by leveraging recent advances in deep learning representations and propose the Maximal Divergence Sequential Auto-Encoder. |
111 | Neural Program Repair by Jointly Learning to Localize and Repair | Marko Vasic, Aditya Kanade, Petros Maniatis, David Bieber, Rishabh Singh | In this work, we consider a recently identified class of bugs called variable-misuse bugs. |
112 | Information-Directed Exploration for Deep Reinforcement Learning | Nikolay Nikolov, Johannes Kirschner, Felix Berkenkamp, Andreas Krause | Motivated by recent findings that address this issue in bandits, we propose to use Information-Directed Sampling (IDS) for exploration in reinforcement learning. |
113 | Attention, Learn to Solve Routing Problems! | Wouter Kool, Herke van Hoof, Max Welling | We contribute in both directions: we propose a model based on attention layers with benefits over the Pointer Network and we show how to train this model using REINFORCE with a simple baseline based on a deterministic greedy rollout, which we find is more efficient than using a value function. |
114 | L2-Nonexpansive Neural Networks | Haifeng Qian, Mark N. Wegman | This paper proposes a class of well-conditioned neural networks in which a unit amount of change in the inputs causes at most a unit amount of change in the outputs or any of the internal layers. |
115 | Improving Generalization and Stability of Generative Adversarial Networks | Hoang Thanh-Tung, Truyen Tran, Svetha Venkatesh | In this paper, we analyze the generalization of GANs in practical settings. |
116 | Adaptive Input Representations for Neural Language Modeling | Alexei Baevski, Michael Auli | We introduce adaptive input representations for neural language modeling which extend the adaptive softmax of Grave et al. (2017) to input representations of variable capacity. |
117 | Neural Persistence: A Complexity Measure for Deep Neural Networks Using Algebraic Topology | Bastian Rieck, Matteo Togninalli, Christian Bock, Michael Moor, Max Horn, Thomas Gumbsch, Karsten Borgwardt | In this work, we propose neural persistence, a complexity measure for neural network architectures based on topological data analysis on weighted stratified graphs. |
118 | Efficient Augmentation via Data Subsampling | Michael Kuchnik, Virginia Smith | In this work, we demonstrate that it is possible to significantly reduce the number of data points included in data augmentation while realizing the same accuracy and invariance benefits of augmenting the entire dataset. We propose a novel set of subsampling policies, based on model influence and loss, that can achieve a 90% reduction in augmentation set size while maintaining the accuracy gains of standard data augmentation. |
119 | Neural TTS Stylization with Adversarial and Collaborative Games | Shuang Ma, Daniel Mcduff, Yale Song | In this work, we introduce an end-to-end TTS model that offers enhanced content-style disentanglement ability and controllability. |
120 | Optimal Control Via Neural Networks: A Convex Approach | Yize Chen, Yuanyuan Shi, Baosen Zhang | Therefore many systems are still identified and controlled based on simple linear models despite their poor representation capability. In this paper we bridge the gap between model accuracy and control tractability faced by neural networks, by explicitly constructing networks that are convex with respect to their inputs. |
121 | CBOW Is Not All You Need: Combining CBOW with the Compositional Matrix Space Model | Florian Mai, Lukas Galke, Ansgar Scherp | Motivated by these findings, we propose a hybrid model that combines the strengths of CBOW and CMOW. |
122 | Stochastic Optimization of Sorting Networks via Continuous Relaxations | Aditya Grover, Eric Wang, Aaron Zweig, Stefano Ermon | In this work, we propose NeuralSort, a general-purpose continuous relaxation of the output of the sorting operator from permutation matrices to the set of unimodal row-stochastic matrices, where every row sums to one and has a distinct argmax. |
123 | Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality | Taiji Suzuki | Deep learning has shown high performances in various types of tasks from visual recognition to natural language processing, which indicates superior flexibility and adaptivity of deep learning. To understand this phenomenon theoretically, we develop a new approximation and estimation error analysis of deep learning with the ReLU activation for functions in a Besov space and its variant with mixed smoothness. The Besov space is a considerably general function space including the Hölder space and Sobolev space, and especially can capture spatial inhomogeneity of smoothness. |
124 | Generating Multiple Objects at Spatially Distinct Locations | Tobias Hinz, Stefan Heinrich, Stefan Wermter | We introduce a new approach which allows us to control the location of arbitrarily many objects within an image by adding an object pathway to both the generator and the discriminator. |
125 | Near-Optimal Representation Learning for Hierarchical Reinforcement Learning | Ofir Nachum, Shixiang Gu, Honglak Lee, Sergey Levine | We study the problem of representation learning in goal-conditioned hierarchical reinforcement learning. |
126 | Understanding Composition of Word Embeddings via Tensor Decomposition | Abraham Frandsen, Rong Ge | In this paper we consider the problem of word embedding composition: given vector representations of two words, compute a vector for the entire phrase. |
127 | Structured Neural Summarization | Patrick Fernandes, Miltiadis Allamanis, Marc Brockschmidt | Based on the promising results of graph neural networks on highly structured data, we develop a framework to extend existing sequence encoders with a graph component that can reason about long-distance relationships in weakly structured data such as text. |
128 | Graph Wavelet Neural Network | Bingbing Xu, Huawei Shen, Qi Cao, Yunqi Qiu, Xueqi Cheng | We present graph wavelet neural network (GWNN), a novel graph convolutional neural network (CNN), leveraging graph wavelet transform to address the shortcomings of previous spectral graph CNN methods that depend on graph Fourier transform. |
129 | A rotation-equivariant convolutional neural network model of primary visual cortex | Alexander S. Ecker, Fabian H. Sinz, Emmanouil Froudarakis, Paul G. Fahey, Santiago A. Cadena, Edgar Y. Walker, Erick Cobos, Jacob Reimer, Andreas S. Tolias, Matthias Bethge | We present a framework for identifying common features independent of individual neurons’ orientation selectivity by using a rotation-equivariant convolutional neural network, which automatically extracts every feature at multiple different orientations. |
130 | Supervised Community Detection with Line Graph Neural Networks | Zhengdao Chen, Lisha Li, Joan Bruna | We present a novel family of Graph Neural Networks (GNNs) for solving community detection problems in a supervised learning setting. |
131 | Multiple-Attribute Text Rewriting | Guillaume Lample, Sandeep Subramanian, Eric Smith, Ludovic Denoyer, Marc’Aurelio Ranzato, Y-Lan Boureau | We thus propose a new model that controls several factors of variation in textual data where this condition on disentanglement is replaced with a simpler mechanism based on back-translation. |
132 | Wasserstein Barycenter Model Ensembling | Pierre Dognin*, Igor Melnyk*, Youssef Mroueh*, Jarret Ross*, Cicero Dos Santos*, Tom Sercu* | In this paper we propose to perform model ensembling in a multiclass or a multilabel learning setting using Wasserstein (W.) barycenters. |
133 | Policy Transfer with Strategy Optimization | Wenhao Yu, C. Karen Liu, Greg Turk | In this paper, we present a different approach that leverages domain randomization for transferring control policies to unknown environments. |
134 | code2seq: Generating Sequences from Structured Representations of Code | Uri Alon, Shaked Brody, Omer Levy, Eran Yahav | We present code2seq: an alternative approach that leverages the syntactic structure of programming languages to better encode source code. |
135 | Predict then Propagate: Graph Neural Networks meet Personalized PageRank | Johannes Klicpera, Aleksandar Bojchevski, Stephan Günnemann | In this paper, we use the relationship between graph convolutional networks (GCN) and PageRank to derive an improved propagation scheme based on personalized PageRank. |
136 | Slimmable Neural Networks | Jiahui Yu, Linjie Yang, Ning Xu, Jianchao Yang, Thomas Huang | We present a simple and general method to train a single neural network executable at different widths (number of channels in a layer), permitting instant and adaptive accuracy-efficiency trade-offs at runtime. |
137 | Analysing Mathematical Reasoning Abilities of Neural Models | David Saxton, Edward Grefenstette, Felix Hill, Pushmeet Kohli | In this paper, we present a new challenge for the evaluation (and eventually the design) of neural architectures and similar systems, developing a task suite of mathematics problems involving sequential questions and answers in a free-form textual input/output format. |
138 | RotDCF: Decomposition of Convolutional Filters for Rotation-Equivariant Deep Networks | Xiuyuan Cheng, Qiang Qiu, Robert Calderbank, Guillermo Sapiro | This paper proposes to decompose the convolutional filters over joint steerable bases across the space and the group geometry simultaneously, namely a rotation-equivariant CNN with decomposed convolutional filters (RotDCF). |
139 | Execution-Guided Neural Program Synthesis | Xinyun Chen, Chang Liu, Dawn Song | In this work, we propose two simple yet principled techniques to better leverage the semantic information, which are execution-guided synthesis and synthesizer ensemble. |
140 | Dynamic Sparse Graph for Efficient Deep Learning | Liu Liu, Lei Deng, Xing Hu, Maohua Zhu, Guoqi Li, Yufei Ding, Yuan Xie | We propose to execute deep neural networks (DNNs) with dynamic and sparse graph (DSG) structure for compressive memory and accelerative execution during both training and inference. |
141 | Fixup Initialization: Residual Learning Without Normalization | Hongyi Zhang, Yann N. Dauphin, Tengyu Ma | In this work, we challenge the commonly-held beliefs by showing that none of the perceived benefits is unique to normalization. |
142 | ProbGAN: Towards Probabilistic GAN with Theoretical Guarantees | Hao He, Hao Wang, Guang-He Lee, Yonglong Tian | In this paper, we propose a novel probabilistic framework for GANs, ProbGAN, which iteratively learns a distribution over generators with a carefully crafted prior. |
143 | Exploration by random network distillation | Yuri Burda, Harrison Edwards, Amos Storkey, Oleg Klimov | We introduce an exploration bonus for deep reinforcement learning methods that is easy to implement and adds minimal overhead to the computation performed. |
144 | Unsupervised Learning of the Set of Local Maxima | Lior Wolf, Sagie Benaim, Tomer Galanti | We present an algorithm, show an example where it is more efficient to use local maxima as an indicator function than to employ conventional classification, and derive a suitable generalization bound. |
145 | On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization | Xiangyi Chen, Sijia Liu, Ruoyu Sun, Mingyi Hong | In this paper, we develop an analysis framework and a set of mild sufficient conditions that guarantee the convergence of the Adam-type methods, with a convergence rate of order $O(\log{T}/\sqrt{T})$ for non-convex stochastic optimization. |
146 | Minimum Divergence vs. Maximum Margin: an Empirical Comparison on Seq2Seq Models | Huan Zhang, Hai Zhao | We introduce a new training criterion based on the analysis of existing work, and empirically compare models in the two categories. |
147 | GANSynth: Adversarial Neural Audio Synthesis | Jesse Engel, Kumar Krishna Agrawal, Shuo Chen, Ishaan Gulrajani, Chris Donahue, Adam Roberts | Herein, we demonstrate that GANs can in fact generate high-fidelity and locally-coherent audio by modeling log magnitudes and instantaneous frequencies with sufficient frequency resolution in the spectral domain. |
148 | Sliced Wasserstein Auto-Encoders | Soheil Kolouri, Phillip E. Pope, Charles E. Martin, Gustavo K. Rohde | In this paper we use the geometric properties of the optimal transport (OT) problem and the Wasserstein distances to define a prior distribution for the latent space of an auto-encoder. |
149 | Learning Two-layer Neural Networks with Symmetric Inputs | Rong Ge, Rohith Kuditipudi, Zhize Li, Xiang Wang | We give a new algorithm for learning a two-layer neural network under a very general class of input distributions. |
150 | Learning to Understand Goal Specifications by Modelling Reward | Dzmitry Bahdanau, Felix Hill, Jan Leike, Edward Hughes, Arian Hosseini, Pushmeet Kohli, Edward Grefenstette | To overcome this limitation, we present a framework within which instruction-conditional RL agents are trained using rewards obtained not from the environment, but from reward models which are jointly trained from expert examples. |
151 | Do Deep Generative Models Know What They Don’t Know? | Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Dilan Gorur, Balaji Lakshminarayanan | In this paper we challenge this assumption. |
152 | Identifying and Controlling Important Neurons in Neural Machine Translation | Anthony Bau, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, James Glass | We develop unsupervised methods for discovering important neurons in NMT models. |
153 | Representing Formal Languages: A Comparison Between Finite Automata and Recurrent Neural Networks | Joshua J. Michalenko, Ameesh Shah, Abhinav Verma, Richard G. Baraniuk, Swarat Chaudhuri, Ankit B. Patel | We investigate the internal representations that a recurrent neural network (RNN) uses while learning to recognize a regular formal language. |
154 | Visual Explanation by Interpretation: Improving Visual Feedback Capabilities of Deep Neural Networks | Jose Oramas, Kaili Wang, Tinne Tuytelaars | In this paper, we propose a novel scheme for both interpretation as well as explanation in which, given a pretrained model, we automatically identify internal features relevant for the set of classes considered by the model, without relying on additional annotations. |
155 | Don’t let your Discriminator be fooled | Brady Zhou, Philipp Krähenbühl | In this paper, we show that the Wasserstein distance is just one out of a large family of objective functions that yield these properties. |
156 | Latent Convolutional Models | ShahRukh Athar, Evgeny Burnaev, Victor Lempitsky | We present a new latent model of natural images that can be learned on large-scale datasets. |
157 | A Universal Music Translation Network | Noam Mor, Lior Wolf, Adam Polyak, Yaniv Taigman | We present a method for translating music across musical instruments and styles. |
158 | How to train your MAML | Antreas Antoniou, Harrison Edwards, Amos Storkey | In this paper, we propose various modifications to MAML that not only stabilize the system, but also substantially improve the generalization performance, convergence speed and computational overhead of MAML, which we call MAML++. |
159 | Learning a SAT Solver from Single-Bit Supervision | Daniel Selsam, Matthew Lamm, Benedikt Bünz, Percy Liang, Leonardo de Moura, David L. Dill | We present NeuroSAT, a message passing neural network that learns to solve SAT problems after only being trained as a classifier to predict satisfiability. |
160 | Learning Representations of Sets through Optimized Permutations | Yan Zhang, Jonathon Hare, Adam Prügel-Bennett | To this end, we propose a Permutation-Optimisation module that learns how to permute a set end-to-end. |
161 | Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition | Chun-Fu (Richard) Chen, Quanfu Fan, Neil Mallinar, Tom Sercu, Rogerio Feris | In this paper, we propose a novel Convolutional Neural Network (CNN) architecture for learning multi-scale feature representations with good tradeoffs between speed and accuracy. |
162 | Unsupervised Hyper-alignment for Multilingual Word Embeddings | Jean Alaux, Edouard Grave, Marco Cuturi, Armand Joulin | We thus propose a novel formulation that ensures composable mappings, leading to better alignments. |
163 | Visual Semantic Navigation using Scene Priors | Wei Yang, Xiaolong Wang, Ali Farhadi, Abhinav Gupta, Roozbeh Mottaghi | In this work, we focus on incorporating semantic priors in the task of semantic navigation. |
164 | NOODL: Provable Online Dictionary Learning and Sparse Coding | Sirisha Rambhatla, Xingguo Li, Jarvis Haupt | This was a major challenge until recently, when provable algorithms for dictionary learning were proposed. |
165 | Stochastic Gradient/Mirror Descent: Minimax Optimality and Implicit Regularization | Navid Azizan, Babak Hassibi | Stochastic Gradient/Mirror Descent: Minimax Optimality and Implicit Regularization. |
166 | Active Learning with Partial Feedback | Peiyun Hu, Zachary C. Lipton, Anima Anandkumar, Deva Ramanan | To address this more realistic setting, we propose active learning with partial feedback (ALPF), where the learner must actively choose both which example to label and which binary question to ask. |
167 | Gradient descent aligns the layers of deep linear networks | Ziwei Ji, Matus Telgarsky | This paper establishes risk convergence and asymptotic weight matrix alignment — a form of implicit regularization — of gradient flow and gradient descent when applied to deep linear networks on linearly separable data. |
168 | Data-Dependent Coresets for Compressing Neural Networks with Applications to Generalization Bounds | Cenk Baykal, Lucas Liebenwein, Igor Gilitschenski, Dan Feldman, Daniela Rus | We present an efficient coresets-based neural network compression algorithm that sparsifies the parameters of a trained fully-connected neural network in a manner that provably approximates the network’s output. |
169 | On the loss landscape of a class of deep neural networks with no bad local valleys | Quynh Nguyen, Mahesh Chandra Mukkamala, Matthias Hein | We identify a class of over-parameterized deep neural networks with standard activation functions and cross-entropy loss which provably have no bad local valley, in the sense that from any point in parameter space there exists a continuous path on which the cross-entropy loss is non-increasing and gets arbitrarily close to zero. |
170 | DOM-Q-NET: Grounded RL on Structured Language | Sheng Jia, Jamie Ryan Kiros, Jimmy Ba | In this work, we introduce DOM-Q-NET, a novel architecture for RL-based web navigation to address both of these problems. |
171 | Boosting Robustness Certification of Neural Networks | Gagandeep Singh, Timon Gehr, Markus Püschel, Martin Vechev | We present a novel approach for the certification of neural networks against adversarial perturbations which combines scalable overapproximation methods with precise (mixed integer) linear programming. |
172 | Learning To Simulate | Nataniel Ruiz, Samuel Schulter, Manmohan Chandraker | In this work, we propose a reinforcement learning-based method for automatically adjusting the parameters of any (non-differentiable) simulator, thereby controlling the distribution of synthesized data in order to maximize the accuracy of a model trained on that data. |
173 | Towards Understanding Regularization in Batch Normalization | Ping Luo, Xinjiang Wang, Wenqi Shao, Zhanglin Peng | We analyze BN by using a basic block of neural networks, consisting of a kernel layer, a BN layer, and a nonlinear activation function. |
174 | The Laplacian in RL: Learning Representations with Efficient Approximations | Yifan Wu, George Tucker, Ofir Nachum | In this paper, we present a fully general and scalable method for approximating the eigenvectors of the Laplacian in a model-free RL context. |
175 | Predicting the Generalization Gap in Deep Networks with Margin Distributions | Yiding Jiang, Dilip Krishnan, Hossein Mobahi, Samy Bengio | In this paper, we propose such a measure, and conduct extensive empirical studies on how well it can predict the generalization gap. |
176 | Adversarial Imitation via Variational Inverse Reinforcement Learning | Ahmed H. Qureshi, Byron Boots, Michael C. Yip | Our proposed method builds on the framework of generative adversarial networks and introduces the empowerment-regularized maximum-entropy inverse reinforcement learning to learn near-optimal rewards and policies. |
177 | Reasoning About Physical Interactions with Object-Oriented Prediction and Planning | Michael Janner, Sergey Levine, William T. Freeman, Joshua B. Tenenbaum, Chelsea Finn, Jiajun Wu | We present a paradigm for learning object-centric representations for physical scene understanding without direct supervision of object properties. |
178 | LayoutGAN: Generating Graphic Layouts with Wireframe Discriminators | Jianan Li, Jimei Yang, Aaron Hertzmann, Jianming Zhang, Tingfa Xu | We propose a novel Generative Adversarial Network, called LayoutGAN, that synthesizes layouts by modeling geometric relations of different types of 2D elements. |
179 | Learning Mixed-Curvature Representations in Product Spaces | Albert Gu, Frederic Sala, Beliz Gunel, Christopher Ré | The quality of the representations achieved by embeddings is determined by how well the geometry of the embedding space matches the structure of the data. Euclidean space has been the workhorse for embeddings; recently hyperbolic and spherical spaces have gained popularity due to their ability to better embed new types of structured data—such as hierarchical data—but most data is not structured so uniformly. We address this problem by proposing learning embeddings in a product manifold combining multiple copies of these model spaces (spherical, hyperbolic, Euclidean), providing a space of heterogeneous curvature suitable for a wide variety of structures. We introduce a heuristic to estimate the sectional curvature of graph data and directly determine an appropriate signature—the number of component spaces and their dimensions—of the product manifold. Empirically, we jointly learn the curvature and the embedding in the product space via Riemannian optimization. We discuss how to define and compute intrinsic quantities such as means—a challenging notion for product manifolds—and provably learnable optimization functions. On a range of datasets and reconstruction tasks, our product space embeddings outperform single Euclidean or hyperbolic spaces used in previous works, reducing distortion by 32.55% on a Facebook social network dataset. |
180 | StrokeNet: A Neural Painting Environment | Ningyuan Zheng, Yifan Jiang, Dingjiang Huang | In this paper we try to address the discrete nature of software environment with an intermediate, differentiable simulation. |
181 | Harmonizing Maximum Likelihood with GANs for Multimodal Conditional Generation | Soochan Lee, Junsoo Ha, Gunhee Kim | In order to accomplish both training stability and multimodal output generation, we propose novel training schemes with a new set of losses named moment reconstruction losses that simply replace the reconstruction loss. |
182 | Measuring Compositionality in Representation Learning | Jacob Andreas | We describe a procedure for evaluating compositionality by measuring how well the true representation-producing model can be approximated by a model that explicitly composes a collection of inferred representational primitives. |
183 | Benchmarking Neural Network Robustness to Common Corruptions and Perturbations | Dan Hendrycks, Thomas Dietterich | In this paper we establish rigorous benchmarks for image classifier robustness, and propose a new dataset called ImageNet-P which enables researchers to benchmark a classifier’s robustness to common perturbations. |
184 | ADef: an Iterative Algorithm to Construct Adversarial Deformations | Rima Alaifari, Giovanni S. Alberti, Tandri Gauksson | In this paper, we propose the ADef algorithm to construct a different kind of adversarial attack created by iteratively applying small deformations to the image, found through a gradient descent step. |
185 | Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning | Ilya Kostrikov, Kumar Krishna Agrawal, Debidatta Dwibedi, Sergey Levine, Jonathan Tompson | In order to address these issues, we propose a new algorithm called Discriminator-Actor-Critic that uses off-policy Reinforcement Learning to reduce policy-environment interaction sample complexity by an average factor of 10. |
186 | Doubly Reparameterized Gradient Estimators for Monte Carlo Objectives | George Tucker, Dieterich Lawson, Shixiang Gu, Chris J. Maddison | In particular, we show that this estimator reduces the variance of the IWAE gradient, the reweighted wake-sleep update (RWS) (Bornschein & Bengio 2014), and the jackknife variational inference (JVI) gradient (Nowozin 2018). |
187 | Learning Recurrent Binary/Ternary Weights | Arash Ardakani, Zhengyun Ji, Sean C. Smithson, Brett H. Meyer, Warren J. Gross | To address the above issues, we introduce a method that can learn binary and ternary weights during the training phase to facilitate hardware implementations of RNNs. |
188 | Learning concise representations for regression by evolving networks of trees | William La Cava, Tilak Raj Singh, James Taggart, Srinivas Suri | We propose and study a method for learning interpretable representations for the task of regression. |
189 | Efficient Training on Very Large Corpora via Gramian Estimation | Walid Krichene, Nicolas Mayoraz, Steffen Rendle, Li Zhang, Xinyang Yi, Lichan Hong, Ed Chi, John Anderson | These models are typically trained using SGD with random sampling of unobserved pairs, with a sample size that grows quadratically with the corpus size, making it expensive to scale. We propose new efficient methods to train these models without having to sample unobserved pairs. |
190 | MAE: Mutual Posterior-Divergence Regularization for Variational AutoEncoders | Xuezhe Ma, Chunting Zhou, Eduard Hovy | In this work, we introduce mutual posterior-divergence regularization, a novel regularization that is able to control the geometry of the latent space to accomplish meaningful representation learning, while achieving comparable or superior capability of density estimation. Experiments on three image benchmark datasets demonstrate that, when equipped with powerful decoders, our model performs well both on density estimation and representation learning. |
191 | Residual Non-local Attention Networks for Image Restoration | Yulun Zhang, Kunpeng Li, Kai Li, Bineng Zhong, Yun Fu | In this paper, we propose a residual non-local attention network for high-quality image restoration. |
192 | Meta-Learning For Stochastic Gradient MCMC | Wenbo Gong, Yingzhen Li, José Miguel Hernández-Lobato | This paper presents the first meta-learning algorithm that allows automated design for the underlying continuous dynamics of an SG-MCMC sampler. |
193 | Systematic Generalization: What Is Required and Can It Be Learned? | Dzmitry Bahdanau*, Shikhar Murty*, Michael Noukhovitch, Thien Huu Nguyen, Harm de Vries, Aaron Courville | Numerous models for grounded language understanding have been recently proposed, including (i) generic models that can be easily adapted to any given task and (ii) intuitively appealing modular models that require background knowledge to be instantiated. |
194 | Efficient Lifelong Learning with A-GEM | Arslan Chaudhry, Marc'Aurelio Ranzato, Marcus Rohrbach, Mohamed Elhoseiny | In this work, we investigate the efficiency of current lifelong approaches, in terms of sample complexity, computational and memory cost. |
195 | Multi-step Retriever-Reader Interaction for Scalable Open-domain Question Answering | Rajarshi Das, Shehzaad Dhuliawala, Manzil Zaheer, Andrew McCallum | This paper introduces a new framework for open-domain question answering in which the retriever and the reader iteratively interact with each other. |
196 | Double Viterbi: Weight Encoding for High Compression Ratio and Fast On-Chip Reconstruction for Deep Neural Network | Daehyun Ahn, Dongsoo Lee, Taesu Kim, Jae-Joon Kim | In this paper, we propose a new sparse matrix format in order to enable a highly parallel decoding process of the entire sparse matrix. |
197 | Overcoming the Disentanglement vs Reconstruction Trade-off via Jacobian Supervision | José Lezama | In this work, we propose to overcome this trade-off by progressively growing the dimension of the latent code, while constraining the Jacobian of the output image with respect to the disentangled variables to remain the same. |
198 | RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space | Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, Jian Tang | In this paper, we present a new approach for knowledge graph embedding called RotatE, which is able to model and infer various relation patterns including: symmetry/antisymmetry, inversion, and composition. |
199 | Guiding Policies with Language via Meta-Learning | John D. Co-Reyes, Abhishek Gupta, Suvansh Sanjeev, Nick Altieri, Jacob Andreas, John DeNero, Pieter Abbeel, Sergey Levine | In this work, we propose an interactive formulation of the task specification problem, where iterative language corrections are provided to an autonomous agent, guiding it in acquiring the desired skill. |
200 | AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods | Zhiming Zhou*, Qingru Zhang*, Guansong Lu, Hongwei Wang, Weinan Zhang, Yong Yu | In this paper, we provide a new insight into the non-convergence issue of Adam as well as other adaptive learning rate methods. |
201 | AD-VAT: An Asymmetric Dueling mechanism for learning Visual Active Tracking | Fangwei Zhong, Peng Sun, Wenhan Luo, Tingyun Yan, Yizhou Wang | To learn a robust tracker for VAT, in this paper, we propose a novel adversarial RL method which adopts an Asymmetric Dueling mechanism, referred to as AD-VAT. |
202 | Marginal Policy Gradients: A Unified Family of Estimators for Bounded Action Spaces with Applications | Carson Eisenach, Haichuan Yang, Ji Liu, Han Liu | To this end we introduce the angular policy gradient (APG), a stochastic policy gradient method for directional control. |
203 | On Self Modulation for Generative Adversarial Networks | Ting Chen, Mario Lucic, Neil Houlsby, Sylvain Gelly | We propose and study an architectural modification, self-modulation, which improves GAN performance across different data sets, architectures, losses, regularizers, and hyperparameter settings. |
204 | Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy | Yuan Xie, Boyi Liu, Qiang Liu, Zhaoran Wang, Yuan Zhou, Jian Peng | In this work, we introduce a new approach named Maximum Likelihood Inverse Propensity Scoring (MLIPS) for batch learning from logged bandit feedback. |
205 | Subgradient Descent Learns Orthogonal Dictionaries | Yu Bai, Qijia Jiang, Ju Sun | We show that a subgradient descent algorithm, with random initialization, can recover orthogonal dictionaries on a natural nonsmooth, nonconvex L1 minimization formulation of the problem, under mild statistical assumption on the data. |
206 | ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech | Wei Ping, Kainan Peng, Jitong Chen | In this work, we propose a new solution for parallel wave generation by WaveNet. |
207 | MARGINALIZED AVERAGE ATTENTIONAL NETWORK FOR WEAKLY-SUPERVISED LEARNING | Yuan Yuan, Yueming Lyu, Xi Shen, Ivor W. Tsang, Dit-Yan Yeung | To alleviate this issue, we propose a marginalized average attentional network (MAAN) to suppress the dominant response of the most salient regions in a principled manner. |
208 | Towards GAN Benchmarks Which Require Generalization | Ishaan Gulrajani, Colin Raffel, Luke Metz | For many evaluation metrics commonly used as benchmarks for unconditional image generation, trivially memorizing the training set attains a better score than models which are considered state-of-the-art; we consider this problematic. We clarify a necessary condition for an evaluation metric not to behave this way: estimating the function must require a large sample from the model. |
209 | A Closer Look at Few-shot Classification | Wei-Yu Chen, Yen-Cheng Liu, Zsolt Kira, Yu-Chiang Frank Wang, Jia-Bin Huang | In this paper, we present 1) a consistent comparative analysis of several representative few-shot classification algorithms, with results showing that deeper backbones significantly reduce the gap across methods including the baseline, 2) a slightly modified baseline method that surprisingly achieves competitive performance when compared with the state-of-the-art on both the mini-ImageNet and the CUB datasets, and 3) a new experimental setting for evaluating the cross-domain generalization ability for few-shot classification algorithms. |
210 | Meta-Learning Probabilistic Inference for Prediction | Jonathan Gordon, John Bronskill, Matthias Bauer, Sebastian Nowozin, Richard Turner | This paper introduces a new framework for data efficient and versatile learning. |
211 | Deep reinforcement learning with relational inductive biases | Vinicius Zambaldi, David Raposo, Adam Santoro, Victor Bapst, Yujia Li, Igor Babuschkin, Karl Tuyls, David Reichert, Timothy Lillicrap, Edward Lockhart, Murray Shanahan, Victoria Langston, Razvan Pascanu, Matthew Botvinick, Oriol Vinyals, Peter Battaglia | We introduce an approach for augmenting model-free deep reinforcement learning agents with a mechanism for relational reasoning over structured representations, which improves performance, learning efficiency, generalization, and interpretability. |
212 | Relaxed Quantization for Discretized Neural Networks | Christos Louizos, Matthias Reisser, Tijmen Blankevoort, Efstratios Gavves, Max Welling | In order to train networks that can be effectively discretized without loss of performance, we introduce a differentiable quantization procedure. |
213 | Tree-Structured Recurrent Switching Linear Dynamical Systems for Multi-Scale Modeling | Josue Nassar, Scott Linderman, Monica Bugallo, Il Memming Park | To fit this model, we present a fully-Bayesian sampling procedure using Polya-Gamma data augmentation to allow for fast and conjugate Gibbs sampling. |
214 | STCN: Stochastic Temporal Convolutional Networks | Emre Aksan, Otmar Hilliges | In this work, we propose stochastic temporal convolutional networks (STCNs), a novel architecture that combines the computational advantages of temporal convolutional networks (TCN) with the representational power and robustness of stochastic latent spaces. |
215 | Soft Q-Learning with Mutual-Information Regularization | Jordi Grau-Moya, Felix Leibfried, Peter Vrancx | We propose a reinforcement learning (RL) algorithm that uses mutual-information regularization to optimize a prior action distribution for better performance and exploration. |
216 | On the Turing Completeness of Modern Neural Network Architectures | Jorge Pérez, Javier Marinkovic, Pablo Barceló | We show both models to be Turing complete exclusively based on their capacity to compute and access internal dense representations of the data. |
217 | Improving Differentiable Neural Computers Through Memory Masking, De-allocation, and Link Distribution Sharpness Control | Robert Csordas, Juergen Schmidhuber | An analysis of its internal activation patterns reveals three problems: Most importantly, the lack of key-value separation makes the address distribution resulting from content-based look-up noisy and flat, since the value influences the score calculation, although only the key should. |
218 | Evaluating Robustness of Neural Networks with Mixed Integer Programming | Vincent Tjeng, Kai Y. Xiao, Russ Tedrake | We achieve this computational speedup via tight formulations for non-linearities, as well as a novel presolve algorithm that makes full use of all information available. |
219 | Random mesh projectors for inverse problems | Konik Kothari*, Sidharth Gupta*, Maarten v. de Hoop, Ivan Dokmanic | We propose a new learning-based approach to solve ill-posed inverse problems in imaging. |
220 | Multi-Agent Dual Learning | Yiren Wang, Yingce Xia, Tianyu He, Fei Tian, Tao Qin, ChengXiang Zhai, Tie-Yan Liu | In this paper, we extend this framework by introducing multiple primal and dual models, and propose the multi-agent dual learning framework. |
221 | Complement Objective Training | Hao-Yun Chen, Pei-Hsin Wang, Chun-Hao Liu, Shih-Chieh Chang, Jia-Yu Pan, Yu-Ting Chen, Wei Wei, Da-Cheng Juan | We conduct extensive experiments on multiple tasks ranging from computer vision to natural language understanding. |
222 | Mode Normalization | Lucas Deecke, Iain Murray, Hakan Bilen | As a remedy, we propose a more flexible approach: by extending the normalization to more than a single mean and variance, we detect modes of data on-the-fly, jointly normalizing samples that share common features. |
223 | Detecting Egregious Responses in Neural Sequence-to-sequence Models | Tianxing He, James Glass | In this work, we attempt to answer a critical question: whether there exists some input sequence that will cause a well-trained discrete-space neural network sequence-to-sequence (seq2seq) model to generate egregious outputs (aggressive, malicious, attacking, etc.). |
224 | Learning Actionable Representations with Goal Conditioned Policies | Dibya Ghosh, Abhishek Gupta, Sergey Levine | In this paper, we instead aim to learn functionally salient representations: representations that are not necessarily complete in terms of capturing all factors of variation in the observation space, but rather aim to capture those factors of variation that are important for decision making — that are “actionable”. |
225 | Verification of Non-Linear Specifications for Neural Networks | Chongli Qin, Krishnamurthy (Dj) Dvijotham, Brendan O’Donoghue, Rudy Bunel, Robert Stanforth, Sven Gowal, Jonathan Uesato, Grzegorz Swirszcz, Pushmeet Kohli | In this paper, we extend verification algorithms to be able to certify richer properties of neural networks. |
226 | Generating Liquid Simulations with Deformation-aware Neural Networks | Lukas Prantl, Boris Bonev, Nils Thuerey | We propose a novel approach for deformation-aware neural networks that learn the weighting and synthesis of dense volumetric deformation fields. |
227 | DyRep: Learning Representations over Dynamic Graphs | Rakshit Trivedi, Mehrdad Farajtabar, Prasenjeet Biswal, Hongyuan Zha | We present DyRep – a novel modeling framework for dynamic graphs that posits representation learning as a latent mediation process bridging two observed processes namely — dynamics of the network (realized as topological evolution) and dynamics on the network (realized as activities between nodes). |
228 | Trellis Networks for Sequence Modeling | Shaojie Bai, J. Zico Kolter, Vladlen Koltun | We present trellis networks, a new architecture for sequence modeling. |
229 | Scalable Unbalanced Optimal Transport using Generative Adversarial Networks | Karren D. Yang, Caroline Uhler | In this paper, we present a scalable method for unbalanced optimal transport (OT) based on the generative-adversarial framework. |
230 | Solving the Rubik’s Cube with Approximate Policy Iteration | Stephen McAleer, Forest Agostinelli, Alexander Shmakov, Pierre Baldi | We introduce Autodidactic Iteration: an API algorithm that overcomes the problem of sparse rewards by training on a distribution of states that allows the reward to propagate from the goal state to states farther away. |
231 | Variance Reduction for Reinforcement Learning in Input-Driven Environments | Hongzi Mao, Shaileshh Bojja Venkatakrishnan, Malte Schwarzkopf, Mohammad Alizadeh | We consider reinforcement learning in input-driven environments, where an exogenous, stochastic input process affects the dynamics of the system. |
232 | Model-Predictive Policy Learning with Uncertainty Regularization for Driving in Dense Traffic | Mikael Henaff, Alfredo Canziani, Yann LeCun | In this work, we propose to train a policy while explicitly penalizing the mismatch between these two distributions over a fixed time horizon. |
233 | GAN Dissection: Visualizing and Understanding Generative Adversarial Networks | David Bau, Jun-Yan Zhu, Hendrik Strobelt, Bolei Zhou, Joshua B. Tenenbaum, William T. Freeman, Antonio Torralba | Answering such questions could enable us to develop new insights and better models. In this work, we present an analytic framework to visualize and understand GANs at the unit-, object-, and scene-level. |
234 | Improving MMD-GAN Training with Repulsive Loss Function | Wei Wang, Yuan Sun, Saman Halgamuge | To address this issue, we propose a repulsive loss function to actively learn the difference among the real data by simply rearranging the terms in MMD. |
235 | Deterministic PAC-Bayesian generalization bounds for deep networks via generalizing noise-resilience | Vaishnavh Nagarajan, Zico Kolter | In this work, we present a general PAC-Bayesian framework that leverages this observation to provide a bound on the original network learned — a network that is deterministic and uncompressed. |
236 | Recall Traces: Backtracking Models for Efficient Reinforcement Learning | Anirudh Goyal, Philemon Brakel, William Fedus, Soumye Singhal, Timothy Lillicrap, Sergey Levine, Hugo Larochelle, Yoshua Bengio | Hence, we may want to preferentially train on those high-reward states and the probable trajectories leading to them. To this end, we advocate for the use of a backtracking model that predicts the preceding states that terminate at a given high-reward state. |
237 | Stable Recurrent Models | John Miller, Moritz Hardt | In this work, we conduct a thorough investigation of stable recurrent models. |
238 | The Limitations of Adversarial Training and the Blind-Spot Attack | Huan Zhang*, Hongge Chen*, Zhao Song, Duane Boning, Inderjit S. Dhillon, Cho-Jui Hsieh | In our paper, we shed some light on the practicality and the hardness of adversarial training by showing that the effectiveness (robustness on the test set) of adversarial training has a strong correlation with the distance between a test point and the manifold of training data embedded by the network. |
239 | Efficiently testing local optimality and escaping saddles for ReLU networks | Chulhee Yun, Suvrit Sra, Ali Jadbabaie | We provide a theoretical algorithm for checking local optimality and escaping saddles at nondifferentiable points of empirical risks of two-layer ReLU networks. |
240 | ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware | Han Cai, Ligeng Zhu, Song Han | In this paper, we present ProxylessNAS that can directly learn the architectures for large-scale target tasks and target hardware platforms. |
241 | Hierarchical Reinforcement Learning via Advantage-Weighted Information Maximization | Takayuki Osa, Voot Tangkaratt, Masashi Sugiyama | In this paper, we propose an HRL method that learns a latent variable of a hierarchical policy using mutual information maximization. |
242 | Generalizable Adversarial Training via Spectral Normalization | Farzan Farnia, Jesse Zhang, David Tse | In this work, we extend the notion of margin loss to adversarial settings and bound the generalization error for DNNs trained under several well-known gradient-based attack schemes, motivating an effective regularization scheme based on spectral normalization of the DNN’s weight matrices. |
243 | Adversarial Domain Adaptation for Stable Brain-Machine Interfaces | Ali Farshchian, Juan A. Gallego, Joseph P. Cohen, Yoshua Bengio, Lee E. Miller, Sara A. Solla | Here, we introduce a new computational approach that decodes movement intent from a low-dimensional latent representation of the neural data. |
244 | Deep Online Learning Via Meta-Learning: Continual Adaptation for Model-Based RL | Anusha Nagabandi, Chelsea Finn, Sergey Levine | The goal in this paper is to develop a method for continual online learning from an incoming stream of data, using deep neural network models. |
245 | Deep Anomaly Detection with Outlier Exposure | Dan Hendrycks, Mantas Mazeika, Thomas Dietterich | We propose leveraging these data to improve deep anomaly detection by training anomaly detectors against an auxiliary dataset of outliers, an approach we call Outlier Exposure (OE). |
246 | Contingency-Aware Exploration in Reinforcement Learning | Jongwook Choi, Yijie Guo, Marcin Moczulski, Junhyuk Oh, Neal Wu, Mohammad Norouzi, Honglak Lee | In this study, we develop an attentive dynamics model (ADM) that discovers controllable elements of the observations, which are often associated with the location of the character in Atari games. |
247 | Context-adaptive Entropy Model for End-to-end Optimized Image Compression | Jooyoung Lee, Seunghyun Cho, Seung-Kwon Beack | We propose a context-adaptive entropy model for use in end-to-end optimized image compression. |
248 | Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow | Xue Bin Peng, Angjoo Kanazawa, Sam Toyer, Pieter Abbeel, Sergey Levine | In this work, we propose a simple and general technique to constrain information flow in the discriminator by means of an information bottleneck. |
249 | Meta-learning with differentiable closed-form solvers | Luca Bertinetto, Joao F. Henriques, Philip Torr, Andrea Vedaldi | Adapting deep networks to new concepts from a few examples is challenging, due to the high computational requirements of standard fine-tuning procedures. Most work on few-shot learning has thus focused on simple learning techniques for adaptation, such as nearest neighbours or gradient descent. Nonetheless, the machine learning literature contains a wealth of methods that learn non-deep models very efficiently. In this paper, we propose to use these fast convergent methods as the main adaptation mechanism for few-shot learning. The main idea is to teach a deep network to use standard machine learning tools, such as ridge regression, as part of its own internal model, enabling it to quickly adapt to novel data. This requires back-propagating errors through the solver steps. While normally the cost of the matrix operations involved in such a process would be significant, by using the Woodbury identity we can make the small number of examples work to our advantage. We propose both closed-form and iterative solvers, based on ridge regression and logistic regression components. Our methods constitute a simple and novel approach to the problem of few-shot learning and achieve performance competitive with or superior to the state of the art on three benchmarks. |
250 | Learning Self-Imitating Diverse Policies | Tanmay Gangwani, Qiang Liu, Jian Peng | In this work, we introduce a self-imitation learning algorithm that exploits and explores well in the sparse and episodic reward settings. |
251 | ProxQuant: Quantized Neural Networks via Proximal Operators | Yu Bai, Yu-Xiang Wang, Edo Liberty | Despite its empirical success, little is understood about why the straight-through gradient method works. Building upon a novel observation that the straight-through gradient method is in fact identical to the well-known Nesterov's dual-averaging algorithm on a quantization constrained optimization problem, we propose a more principled alternative approach, called ProxQuant, that formulates quantized network training as a regularized learning problem instead and optimizes it via the prox-gradient method. |
252 | Universal Transformers | Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Lukasz Kaiser | We propose the Universal Transformer (UT), a parallel-in-time self-attentive recurrent sequence model which can be cast as a generalization of the Transformer model and which addresses these issues. |
253 | Learning to Adapt in Dynamic, Real-World Environments through Meta-Reinforcement Learning | Anusha Nagabandi, Ignasi Clavera, Simin Liu, Ronald S. Fearing, Pieter Abbeel, Sergey Levine, Chelsea Finn | Given that it is impractical to train separate policies to accommodate all situations the agent may see in the real world, this work proposes to learn how to quickly and effectively adapt online to new tasks. |
254 | L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data | Jianbo Chen, Le Song, Martin J. Wainwright, Michael I. Jordan | Methods based on the Shapley score have been proposed as a fair way of computing feature attributions, but incur an exponential complexity in the number of features. |
255 | Discovery of Natural Language Concepts in Individual Units of CNNs | Seil Na, Yo Joong Choe, Dong-Hyun Lee, Gunhee Kim | In order to quantitatively analyze such intriguing phenomenon, we propose a concept alignment method based on how units respond to replicated text. |
256 | Towards the first adversarially robust neural network model on MNIST | Lukas Schott, Jonas Rauber, Matthias Bethge, Wieland Brendel | We present a novel robust classification model that performs analysis by synthesis using learned class-conditional data distributions. |
257 | Discriminator Rejection Sampling | Samaneh Azadi, Catherine Olsson, Trevor Darrell, Ian Goodfellow, Augustus Odena | We propose a rejection sampling scheme using the discriminator of a GAN toapproximately correct errors in the GAN generator distribution. |
258 | Harmonic Unpaired Image-to-image Translation | Rui Zhang, Tomas Pfister, Jia Li | In this paper, we take a manifold view of the problem by introducing a smoothness term over the sample graph to attain harmonic functions to enforce consistent mappings during the translation. |
259 | Universal Successor Features Approximators | Diana Borsa, Andre Barreto, John Quan, Daniel J. Mankowitz, Hado van Hasselt, Remi Munos, David Silver, Tom Schaul | We discuss the challenges involved in training a USFA, its generalisation properties and demonstrate its practical benefits and transfer abilities on a large-scale domain in which the agent has to navigate in a first-person perspective three-dimensional environment. |
260 | Gradient Descent Provably Optimizes Over-parameterized Neural Networks | Simon S. Du, Xiyu Zhai, Barnabas Poczos, Aarti Singh | One of the mysteries in the success of neural networks is that randomly initialized first-order methods like gradient descent can achieve zero training loss even though the objective function is non-convex and non-smooth. |
261 | Opportunistic Learning: Budgeted Cost-Sensitive Learning from Data Streams | Mohammad Kachuee, Orpaz Goldstein, Kimmo Kärkkäinen, Sajad Darabi, Majid Sarrafzadeh | In this paper, we propose a novel approach for cost-sensitive feature acquisition at the prediction-time. |
262 | DARTS: Differentiable Architecture Search | Hanxiao Liu, Karen Simonyan, Yiming Yang | This paper addresses the scalability challenge of architecture search by formulating the task in a differentiable manner. |
263 | Feature-Wise Bias Amplification | Klas Leino, Matt Fredrikson, Emily Black, Shayak Sen, Anupam Datta | We present two new feature selection algorithms for mitigating bias amplification in linear models, and show how they can be adapted to convolutional neural networks efficiently. |
264 | The relativistic discriminator: a key element missing from standard GAN | Alexia Jolicoeur-Martineau | We generalize both approaches to non-standard GAN loss functions and we refer to them respectively as Relativistic GANs (RGANs) and Relativistic average GANs (RaGANs). |
265 | Understanding and Improving Interpolation in Autoencoders via an Adversarial Regularizer | David Berthelot*, Colin Raffel*, Aurko Roy, Ian Goodfellow | In this paper, we propose a regularization procedure which encourages interpolated outputs to appear more realistic by fooling a critic network which has been trained to recover the mixing coefficient from interpolated data. |
266 | Quasi-hyperbolic momentum and Adam for deep learning | Jerry Ma, Denis Yarats | We propose the quasi-hyperbolic momentum algorithm (QHM) as an extremely simple alteration of momentum SGD, averaging a plain SGD step with a momentum step. |
267 | Local SGD Converges Fast and Communicates Little | Sebastian U. Stich | To overcome this communication bottleneck recent works propose to reduce the communication frequency. |
268 | Learning Finite State Representations of Recurrent Policy Networks | Anurag Koul, Alan Fern, Sam Greydanus | In this paper, we introduce a new technique, Quantized Bottleneck Insertion, to learn finite representations of these vectors and features. |
269 | Multilingual Neural Machine Translation with Knowledge Distillation | Xu Tan, Yi Ren, Di He, Tao Qin, Zhou Zhao, Tie-Yan Liu | In this paper, we propose a distillation-based approach to boost the accuracy of multilingual machine translation. |
270 | MisGAN: Learning from Incomplete Data with Generative Adversarial Networks | Steven Cheng-Xian Li, Bo Jiang, Benjamin Marlin | In this paper, we present a GAN-based framework for learning from complex, high-dimensional incomplete data. |
271 | A Direct Approach to Robust Deep Learning Using Adversarial Networks | Huaxia Wang, Chun-Nam Yu | In this paper we propose a new defensive mechanism under the generative adversarial network~(GAN) framework. |
272 | Combinatorial Attacks on Binarized Neural Networks | Elias B Khalil, Amrita Gupta, Bistra Dilkina | In this work, we study the problem of attacking a BNN through the lens of combinatorial and integer optimization. |
273 | Exemplar Guided Unsupervised Image-to-Image Translation with Semantic Consistency | Liqian Ma, Xu Jia, Stamatios Georgoulis, Tinne Tuytelaars, Luc Van Gool | To alleviate these issues, we propose the Exemplar Guided & Semantically Consistent Image-to-image Translation (EGSC-IT) network which conditions the translation process on an exemplar image in the target domain. |
274 | ARM: Augment-REINFORCE-Merge Gradient for Stochastic Binary Networks | Mingzhang Yin, Mingyuan Zhou | To backpropagate the gradients through stochastic binary layers, we propose the augment-REINFORCE-merge (ARM) estimator that is unbiased, exhibits low variance, and has low computational complexity. |
275 | Building Dynamic Knowledge Graphs from Text using Machine Reading Comprehension | Rajarshi Das, Tsendsuren Munkhdalai, Xingdi Yuan, Adam Trischler, Andrew McCallum | We propose a neural machine-reading model that constructs dynamic knowledge graphs from procedural text. |
276 | Information asymmetry in KL-regularized RL | Alexandre Galashov, Siddhant M. Jayakumar, Leonard Hasenclever, Dhruva Tirumala, Jonathan Schwarz, Guillaume Desjardins, Wojciech M. Czarnecki, Yee Whye Teh, Razvan Pascanu, Nicolas Heess | In this work we study the possibility of leveraging such repeated structure to speed up and regularize learning. |
277 | TimbreTron: A WaveNet(CycleGAN(CQT(Audio))) Pipeline for Musical Timbre Transfer | Sicong Huang, Qiyang Li, Cem Anil, Xuchan Bao, Sageev Oore, Roger B. Grosse | In this work, we address the problem of musical timbre transfer, where the goal is to manipulate the timbre of a sound sample from one instrument to match another instrument while preserving other musical content, such as pitch, rhythm, and loudness. |
278 | Whitening and Coloring Batch Transform for GANs | Aliaksandr Siarohin, Enver Sangineto, Nicu Sebe | In this paper we propose to generalize both BN and cBN using a Whitening and Coloring based batch normalization. |
279 | Learnable Embedding Space for Efficient Neural Architecture Compression | Shengcao Cao, Xiaofang Wang, Kris M. Kitani | We propose a method to incrementally learn an embedding space over the domain of network architectures, to enable the careful selection of architectures for evaluation during compressed architecture search. |
280 | On the Sensitivity of Adversarial Robustness to Input Data Distributions | Gavin Weiguang Ding, Kry Yik Chau Lui, Xiaomeng Jin, Luyu Wang, Ruitong Huang | In this paper, we demonstrate an intriguing phenomenon about the most popular robust training method in the literature, adversarial training: Adversarial robustness, unlike clean accuracy, is sensitive to the input data distribution. |
281 | Minimal Images in Deep Neural Networks: Fragile Object Recognition in Natural Images | Sanjana Srivastava, Guy Ben-Yosef, Xavier Boix | In this paper, we demonstrate that such drops in accuracy due to changes of the visible region are a common phenomenon between humans and existing state-of-the-art deep neural networks (DNNs), and are much more prominent in DNNs. |
282 | A Statistical Approach to Assessing Neural Network Robustness | Stefan Webb, Tom Rainforth, Yee Whye Teh, M. Pawan Kumar | We present a new approach to assessing the robustness of neural networks based on estimating the proportion of inputs for which a property is violated. |
283 | Improving Sequence-to-Sequence Learning via Optimal Transport | Liqun Chen, Yizhe Zhang, Ruiyi Zhang, Chenyang Tao, Zhe Gan, Haichao Zhang, Bai Li, Dinghan Shen, Changyou Chen, Lawrence Carin | We present a novel solution to alleviate these issues. |
284 | PATE-GAN: Generating Synthetic Data with Differential Privacy Guarantees | James Jordon, Jinsung Yoon, Mihaela van der Schaar | In this paper, we investigate a method for ensuring (differential) privacy of the generator of the Generative Adversarial Nets (GAN) framework. |
285 | Integer Networks for Data Compression with Latent-Variable Models | Johannes Ballé, Nick Johnston, David Minnen | We propose using integer networks as a universal solution to this problem, and demonstrate that they enable reliable cross-platform encoding and decoding of images using variational models. |
286 | Value Propagation Networks | Nantas Nardelli, Gabriel Synnaeve, Zeming Lin, Pushmeet Kohli, Philip H. S. Torr, Nicolas Usunier | We present Value Propagation (VProp), a set of parameter-efficient differentiable planning modules built on Value Iteration which can successfully be trained using reinforcement learning to solve unseen tasks, has the capability to generalize to larger map sizes, and can learn to navigate in dynamic environments. |
287 | Bayesian Policy Optimization for Model Uncertainty | Gilwoo Lee, Brian Hou, Aditya Mandalika, Jeongseok Lee, Sanjiban Choudhury, Siddhartha S. Srinivasa | To address challenges from discretizing the continuous latent parameter space, we propose a new policy network architecture that encodes the belief distribution independently from the observable state. |
288 | Variational Bayesian Phylogenetic Inference | Cheng Zhang, Frederick A. Matsen IV | In this paper we present an alternative approach: a variational framework for Bayesian phylogenetic analysis. |
289 | LEARNING FACTORIZED REPRESENTATIONS FOR OPEN-SET DOMAIN ADAPTATION | Mahsa Baktashmotlagh, Masoud Faraki, Tom Drummond, Mathieu Salzmann | In this paper, we tackle the more challenging, yet more realistic case of open-set domain adaptation, where new, unknown classes can be present in the target data. |
290 | On the Universal Approximability and Complexity Bounds of Quantized ReLU Neural Networks | Yukun Ding, Jinglan Liu, Jinjun Xiong, Yiyu Shi | In this paper, we study the representation power of quantized neural networks. |
291 | Learning Localized Generative Models for 3D Point Clouds via Graph Convolution | Diego Valsesia, Giulia Fracastoro, Enrico Magli | We focus on the generator of a GAN and define methods for graph convolution when the graph is not known in advance as it is the very output of the generator. |
292 | ACCELERATING NONCONVEX LEARNING VIA REPLICA EXCHANGE LANGEVIN DIFFUSION | Yi Chen, Jinglin Chen, Jing Dong, Jian Peng, Zhaoran Wang | To attain the advantages of both regimes, we propose to use replica exchange, which swaps between two Langevin diffusions with different temperatures. |
293 | Dynamically Unfolding Recurrent Restorer: A Moving Endpoint Control Method for Image Restoration | Xiaoshuai Zhang, Yiping Lu, Jiaying Liu, Bin Dong | In this paper, we propose a new control framework called the moving endpoint control to restore images corrupted by different degradation levels in one model. |
294 | Bias-Reduced Uncertainty Estimation for Deep Neural Classifiers | Yonatan Geifman, Guy Uziel, Ran El-Yaniv | We consider the problem of uncertainty estimation in the context of (non-Bayesian) deep neural classification. |
295 | CAMOU: Learning Physical Vehicle Camouflages to Adversarially Attack Detectors in the Wild | Yang Zhang, Hassan Foroosh, Philip David, Boqing Gong | In this paper, we conduct an intriguing experimental study about the physical adversarial attack on object detectors in the wild. |
296 | Learning Latent Superstructures in Variational Autoencoders for Deep Multidimensional Clustering | Xiaopeng Li, Zhourong Chen, Leonard K. M. Poon, Nevin L. Zhang | We investigate a variant of variational autoencoders where there is a superstructure of discrete latent variables on top of the latent features. |
297 | Learning Programmatically Structured Representations with Perceptor Gradients | Svetlin Penkov, Subramanian Ramamoorthy | We present the perceptor gradients algorithm — a novel approach to learning symbolic representations based on the idea of decomposing an agent’s policy into i) a perceptor network extracting symbols from raw observation data and ii) a task encoding program which maps the input symbols to output actions. |
298 | Variational Autoencoders with Jointly Optimized Latent Dependency Structure | Jiawei He, Yu Gong, Joseph Marino, Greg Mori, Andreas Lehrmann | We propose a method for learning the dependency structure between latent variables in deep latent variable models. |
299 | The Unusual Effectiveness of Averaging in GAN Training | Yasin Yazıcı, Chuan-Sheng Foo, Stefan Winkler, Kim-Hui Yap, Georgios Piliouras, Vijay Chandrasekhar | We examine two different techniques for parameter averaging in GAN training. |
300 | Beyond Pixel Norm-Balls: Parametric Adversaries using an Analytically Differentiable Renderer | Hsueh-Ti Derek Liu, Michael Tao, Chun-Liang Li, Derek Nowrouzezahrai, Alec Jacobson | As such, we propose a novel evaluation measure, parametric norm-balls, by directly perturbing physical parameters that underly image formation. |
301 | Diversity is All You Need: Learning Skills without a Reward Function | Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, Sergey Levine | Intelligent creatures can explore their environments and learn useful skills without supervision. In this paper, we propose “Diversity is All You Need” (DIAYN), a method for learning useful skills without a reward function. |
302 | Supervised Policy Update for Deep Reinforcement Learning | Quan Vuong, Yiming Zhang, Keith W. Ross | We propose a new sample-efficient methodology, called Supervised Policy Update (SPU), for deep reinforcement learning. |
303 | Learning sparse relational transition models | Victoria Xia, Zi Wang, Kelsey Allen, Tom Silver, Leslie Pack Kaelbling | We present a representation for describing transition models in complex uncertain domains using relational rules. |
304 | Learning to Schedule Communication in Multi-agent Reinforcement Learning | Daewoo Kim, Sangwoo Moon, David Hostallero, Wan Ju Kang, Taeyoung Lee, Kyunghwan Son, Yung Yi | In this paper, we study a practical scenario when (i) the communication bandwidth is limited and (ii) the agents share the communication medium so that only a restricted number of agents are able to simultaneously use the medium, as in the state-of-the-art wireless networking standards. |
305 | Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies | Kenneth Marino, Abhinav Gupta, Rob Fergus, Arthur Szlam | In this paper we introduce a simple, robust approach to hierarchically training an agent in the setting of sparse reward tasks. The agent is split into a low-level and a high-level policy. |
306 | Multi-class classification without multi-class labels | Yen-Chang Hsu, Zhaoyang Lv, Joel Schlosser, Phillip Odom, Zsolt Kira | We formulate this approach, present a probabilistic graphical model for it, and derive a surprisingly simple loss function that can be used to learn neural network-based models. |
307 | What do you learn from context? Probing for sentence structure in contextualized word representations | Ian Tenney, Patrick Xia, Berlin Chen, Alex Wang, Adam Poliak, R Thomas McCoy, Najoung Kim, Benjamin Van Durme, Samuel R. Bowman, Dipanjan Das, Ellie Pavlick | Building on recent token-level probing work, we introduce a novel edge probing task design and construct a broad suite of sub-sentence tasks derived from the traditional structured NLP pipeline. |
308 | Spectral Inference Networks: Unifying Deep and Spectral Learning | David Pfau, Stig Petersen, Ashish Agarwal, David G. T. Barrett, Kimberly L. Stachenfeld | We present Spectral Inference Networks, a framework for learning eigenfunctions of linear operators by stochastic optimization. |
309 | PeerNets: Exploiting Peer Wisdom Against Adversarial Attacks | Jan Svoboda, Jonathan Masci, Federico Monti, Michael Bronstein, Leonidas Guibas | Unfortunately, it has been shown that such systems are vulnerable to adversarial attacks, making them prone to potential unlawful uses. Designing deep neural networks that are robust to adversarial attacks is a fundamental step in making such systems safer and deployable in a broader variety of applications (e.g. autonomous driving), but more importantly is a necessary step to design novel and more advanced architectures built on new computational paradigms rather than marginally building on the existing ones. In this paper we introduce PeerNets, a novel family of convolutional networks alternating classical Euclidean convolutions with graph convolutions to harness information from a graph of peer samples. |
310 | Attentive Neural Processes | Hyunjik Kim, Andriy Mnih, Jonathan Schwarz, Marta Garnelo, Ali Eslami, Dan Rosenbaum, Oriol Vinyals, Yee Whye Teh | We address this issue by incorporating attention into NPs, allowing each input location to attend to the relevant context points for the prediction. |
311 | Representation Degeneration Problem in Training Natural Language Generation Models | Jun Gao, Di He, Xu Tan, Tao Qin, Liwei Wang, Tieyan Liu | We analyze the conditions and causes of this problem and propose a novel regularization method to address it. |
312 | Hierarchical interpretations for neural network predictions | Chandan Singh, W. James Murdoch, Bin Yu | To ameliorate this problem, we introduce the use of hierarchical interpretations to explain DNN predictions through our proposed method: agglomerative contextual decomposition (ACD). |
313 | Spreading vectors for similarity search | Alexandre Sablayrolles, Matthijs Douze, Cordelia Schmid, Hervé Jégou | In this work, we propose to reverse this paradigm and adapt the data to the quantizer: we train a neural net whose last layers form a fixed parameter-free quantizer, such as pre-defined points of a sphere. |
314 | A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks | Sanjeev Arora, Nadav Cohen, Noah Golowich, Wei Hu | We analyze speed of convergence to global optimum for gradient descent training a deep linear neural network by minimizing the L2 loss over whitened data. |
315 | Feed-forward Propagation in Probabilistic Neural Networks with Categorical and Max Layers | Alexander Shekhovtsov, Boris Flach | Probabilistic Neural Networks deal with various sources of stochasticity: input noise, dropout, stochastic neurons, parameter uncertainties modeled as random variables, etc. In this paper we revisit a feed-forward propagation approach that allows one to estimate for each neuron its mean and variance w.r.t. all mentioned sources of stochasticity. |
316 | Measuring and regularizing networks in function space | Ari Benjamin, David Rolnick, Konrad Kording | Here, we show that it is simple and computationally feasible to calculate distances between functions in a $L^2$ Hilbert space. |
317 | Fluctuation-dissipation relations for stochastic gradient descent | Sho Yaida | Here, we derive stationary fluctuation-dissipation relations that link measurable quantities and hyperparameters in the stochastic gradient descent algorithm. |
318 | Poincare Glove: Hyperbolic Word Embeddings | Alexandru Tifrea*, Gary Becigneul*, Octavian-Eugen Ganea* | In this paper, justified by the notion of delta-hyperbolicity or tree-likeliness of a space, we propose to embed words in a Cartesian product of hyperbolic spaces which we theoretically connect to the Gaussian word embeddings and their Fisher geometry. |
319 | Episodic Curiosity through Reachability | Nikolay Savinov, Anton Raichuk, Damien Vincent, Raphael Marinier, Marc Pollefeys, Timothy Lillicrap, Sylvain Gelly | We propose a new curiosity method which uses episodic memory to form the novelty bonus. |
320 | Phase-Aware Speech Enhancement with Deep Complex U-Net | Hyeong-Seok Choi, Jang-Hyun Kim, Jaesung Huh, Adrian Kim, Jung-Woo Ha, Kyogu Lee | To improve speech enhancement performance, we tackle the phase estimation problem in three ways. |
321 | Generative predecessor models for sample-efficient imitation learning | Yannick Schroecker, Mel Vecerik, Jon Scholz | We propose Generative Predecessor Models for Imitation Learning (GPRIL), a novel imitation learning algorithm that matches the state-action distribution to the distribution observed in expert demonstrations, using generative models to reason probabilistically about alternative histories of demonstrated states. |
322 | Adaptive Estimators Show Information Compression in Deep Neural Networks | Ivan Chelombiev, Conor Houghton, Cian O’Donnell | In this paper we developed more robust mutual information estimation techniques, that adapt to hidden activity of neural networks and produce more sensitive measurements of activations from all functions, especially unbounded functions. |
323 | Multilingual Neural Machine Translation With Soft Decoupled Encoding | Xinyi Wang, Hieu Pham, Philip Arthur, Graham Neubig | In this paper, we propose Soft Decoupled Encoding (SDE), a multilingual lexicon encoding framework specifically designed to share lexical-level information intelligently without requiring heuristic preprocessing such as pre-segmenting the data. |
324 | Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet | Wieland Brendel, Matthias Bethge | We here introduce a high-performance DNN architecture on ImageNet whose decisions are considerably easier to explain. |
325 | Reward Constrained Policy Optimization | Chen Tessler, Daniel J. Mankowitz, Shie Mannor | In this work we present a novel multi-timescale approach for constrained policy optimization, called ‘Reward Constrained Policy Optimization’ (RCPO), which uses an alternative penalty signal to guide the policy towards a constraint satisfying one. |
326 | On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length | Stanislaw Jastrzebski, Zachary Kenton, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey | In this paper we extend previous work by investigating the curvature of the loss surface along the whole training trajectory, rather than only at the endpoint. |
327 | Modeling the Long Term Future in Model-Based Reinforcement Learning | Nan Rosemary Ke, Amanpreet Singh, Ahmed Touati, Anirudh Goyal, Yoshua Bengio, Devi Parikh, Dhruv Batra | To this end, we build a latent-variable autoregressive model by leveraging recent ideas in variational inference. |
328 | Understanding Straight-Through Estimator in Training Activation Quantized Neural Nets | Penghang Yin, Jiancheng Lyu, Shuai Zhang, Stanley Osher, Yingyong Qi, Jack Xin | In this paper, we provide the theoretical justification of the concept of STE by answering this question. |
329 | DISTRIBUTIONAL CONCAVITY REGULARIZATION FOR GANS | Shoichiro Yamaguchi, Masanori Koyama | We propose Distributional Concavity (DC) regularization for Generative Adversarial Networks (GANs), a functional gradient-based method that promotes the entropy of the generator distribution and works against mode collapse. Our DC regularization is an easy-to-implement method that can be used in combination with the current state of the art methods like Spectral Normalization and Wasserstein GAN with gradient penalty to further improve the performance. We will not only show that our DC regularization can achieve highly competitive results on ILSVRC2012 and CIFAR datasets in terms of Inception score and Fréchet inception distance, but also provide a mathematical guarantee that our method can always increase the entropy of the generator distribution. |
330 | LeMoNADe: Learned Motif and Neuronal Assembly Detection in calcium imaging videos | Elke Kirschbaum, Manuel Haußmann, Steffen Wolf, Hannah Sonntag, Justus Schneider, Shehabeldin Elzoheiry, Oliver Kann, Daniel Durstewitz, Fred A Hamprecht | We here propose LeMoNADe, a new exploratory data analysis method that facilitates hunting for motifs in calcium imaging videos, the dominant microscopic functional imaging modality in neurophysiology. |
331 | Competitive experience replay | Hao Liu, Alexander Trott, Richard Socher, Caiming Xiong | We propose a novel method called competitive experience replay, which efficiently supplements a sparse reward by placing learning in the context of an exploration competition between a pair of agents. |
332 | Multi-Domain Adversarial Learning | Alice Schoenauer-Sebag, Louise Heinrich, Marc Schoenauer, Michele Sebag, Lani F. Wu, Steve J. Altschuler | This paper presents a multi-domain adversarial learning approach, MuLANN, to leverage multiple datasets with overlapping but distinct class sets, in a semi-supervised setting. |
333 | ProMP: Proximal Meta-Policy Search | Jonas Rothfuss, Dennis Lee, Ignasi Clavera, Tamim Asfour, Pieter Abbeel | Building on the gained insights we develop a novel meta-learning algorithm that overcomes both the issue of poor credit assignment and previous difficulties in estimating meta-policy gradients. |
334 | Don’t Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors | Vitalii Zhelezniak, Aleksandar Savkov, April Shen, Francesco Moramarco, Jack Flann, Nils Y. Hammerla | We propose a novel fuzzy bag-of-words (FBoW) representation for text that contains all the words in the vocabulary simultaneously but with different degrees of membership, which are derived from similarities between word vectors. |
335 | Stable Opponent Shaping in Differentiable Games | Alistair Letcher, Jakob Foerster, David Balduzzi, Tim Rocktäschel, Shimon Whiteson | In this paper we present Stable Opponent Shaping (SOS), a new method that interpolates between LOLA and a stable variant named LookAhead. |
336 | A Mean Field Theory of Batch Normalization | Greg Yang, Jeffrey Pennington, Vinay Rao, Jascha Sohl-Dickstein, Samuel S. Schoenholz | We develop a mean field theory for batch normalization in fully-connected feedforward neural networks. |
337 | Learning Exploration Policies for Navigation | Tao Chen, Saurabh Gupta, Abhinav Gupta | In this work, we study how agents can autonomously explore realistic and complex 3D environments without the context of task-rewards. |
338 | Distribution-Interpolation Trade off in Generative Models | Damian Lesniak, Igor Sieradzki, Igor Podolak | We investigate the properties of multidimensional probability distributions in the context of latent space prior distributions of implicit generative models. |
339 | Learning to Describe Scenes with Programs | Yunchao Liu, Zheng Wu, Daniel Ritchie, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu | In this paper, we present scene programs, representing a scene via a symbolic program for its objects, attributes, and their relations. |
340 | Visceral Machines: Risk-Aversion in Reinforcement Learning with Intrinsic Physiological Rewards | Daniel McDuff, Ashish Kapoor | We present a novel approach to reinforcement learning that leverages a task-independent intrinsic reward function trained on peripheral pulse measurements that are correlated with human autonomic nervous system responses. |
341 | Deep Frank-Wolfe For Neural Network Optimization | Leonard Berrada, Andrew Zisserman, M. Pawan Kumar | We present an optimization method that offers empirically the best of both worlds: our algorithm yields good generalization performance while requiring only one hyper-parameter. |
342 | LEARNING TO PROPAGATE LABELS: TRANSDUCTIVE PROPAGATION NETWORK FOR FEW-SHOT LEARNING | Yanbin Liu, Juho Lee, Minseop Park, Saehoon Kim, Eunho Yang, Sung Ju Hwang, Yi Yang | In this paper, we propose Transductive Propagation Network (TPN), a novel meta-learning framework for transductive inference that classifies the entire test set at once to alleviate the low-data problem. |
343 | Improving the Generalization of Adversarial Training with Domain Adaptation | Chuanbiao Song, Kun He, Liwei Wang, John E. Hopcroft | To alleviate this problem, we propose a novel Adversarial Training with Domain Adaptation (ATDA) method. |
344 | Dimensionality Reduction for Representing the Knowledge of Probabilistic Models | Marc T Law, Jake Snell, Amir-massoud Farahmand, Raquel Urtasun, Richard S Zemel | We propose a simple, intuitive and scalable dimension reduction framework that takes into account the soft probabilistic interpretation of standard deep models for classification. |
345 | Learning protein sequence embeddings using information from structure | Tristan Bepler, Bonnie Berger | We introduce a framework that maps any protein sequence to a sequence of vector embeddings — one per amino acid position — that encode structural information. |
346 | Variational Smoothing in Recurrent Neural Network Language Models | Lingpeng Kong, Gabor Melis, Wang Ling, Lei Yu, Dani Yogatama | We present a new theoretical perspective of data noising in recurrent neural network language models (Xie et al., 2017). |
347 | Biologically-Plausible Learning Algorithms Can Scale to Large Datasets | Will Xiao, Honglin Chen, Qianli Liao, Tomaso Poggio | To address this “weight transport problem” (Grossberg, 1987), two biologically-plausible algorithms, proposed by Liao et al. (2016) and Lillicrap et al. (2016), relax BP’s weight symmetry requirements and demonstrate comparable learning capabilities to that of BP on small datasets. |
348 | Coarse-grain Fine-grain Coattention Network for Multi-evidence Question Answering | Victor Zhong, Caiming Xiong, Nitish Shirish Keskar, Richard Socher | In this work, we propose the Coarse-grain Fine-grain Coattention Network (CFC), a new question answering model that combines information from evidence across multiple documents. |
349 | Learning a Meta-Solver for Syntax-Guided Program Synthesis | Xujie Si, Yuan Yang, Hanjun Dai, Mayur Naik, Le Song | To address these challenges, we propose a meta-learning framework that learns a transferable policy from only weak supervision. |
350 | Towards Robust, Locally Linear Deep Networks | Guang-He Lee, David Alvarez-Melis, Tommi S. Jaakkola | In this paper, we propose a new learning problem to encourage deep networks to have stable derivatives over larger regions. |
351 | How Important is a Neuron | Kedar Dhamdhere, Mukund Sundararajan, Qiqi Yan | We introduce the notion of conductance to extend the notion of attribution to understanding the importance of hidden units. Informally, the conductance of a hidden unit of a deep network is the flow of attribution via this hidden unit. |
352 | Learning to Make Analogies by Contrasting Abstract Relational Structure | Felix Hill, Adam Santoro, David Barrett, Ari Morcos, Timothy Lillicrap | Here, we study how analogical reasoning can be induced in neural networks that learn to perceive and reason about raw visual data. |
353 | Learning what you can do before doing anything | Oleh Rybkin, Karl Pertsch, Konstantinos G. Derpanis, Kostas Daniilidis, Andrew Jaegle | In this work, we address the problem of learning an agent’s action space purely from visual observation. |
354 | Learning Grid Cells as Vector Representation of Self-Position Coupled with Matrix Representation of Self-Motion | Ruiqi Gao, Jianwen Xie, Song-Chun Zhu, Ying Nian Wu | This paper proposes a representational model for grid cells. |
355 | Universal Stagewise Learning for Non-Convex Problems with Convergence on Averaged Solutions | Zaiyi Chen, Zhuoning Yuan, Jinfeng Yi, Bowen Zhou, Enhong Chen, Tianbao Yang | We propose a universal stagewise optimization framework for a broad family of non-smooth non-convex problems with the following key features: (i) at each stage any suitable stochastic convex optimization algorithms (e.g., SGD or AdaGrad) that return an averaged solution can be employed for minimizing a regularized convex problem; (ii) the step size is decreased in a stagewise manner; (iii) an averaged solution is returned as the final solution. |
356 | Invariant and Equivariant Graph Networks | Haggai Maron, Heli Ben-Hamu, Nadav Shamir, Yaron Lipman | A basic challenge in developing such networks is finding the maximal collection of invariant and equivariant linear layers. |
357 | Robustness May Be at Odds with Accuracy | Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, Aleksander Madry | We show that there exists an inherent tension between the goal of adversarial robustness and that of standard generalization. Specifically, training robust models may not only be more resource-consuming, but also lead to a reduction of standard accuracy. |
358 | Feature Intertwiner for Object Detection | Hongyang Li, Bo Dai, Shaoshuai Shi, Wanli Ouyang, Xiaogang Wang | In this paper, we address this problem via a new perspective. |
359 | Adversarial Reprogramming of Neural Networks | Gamaleldin F. Elsayed, Ian Goodfellow, Jascha Sohl-Dickstein | We introduce attacks that instead reprogram the target model to perform a task chosen by the attacker without the attacker needing to specify or compute the desired output for each test-time input. |
360 | G-SGD: Optimizing ReLU Neural Networks in its Positively Scale-Invariant Space | Qi Meng, Shuxin Zheng, Huishuai Zhang, Wei Chen, Qiwei Ye, Zhi-Ming Ma, Nenghai Yu, Tie-Yan Liu | In this paper, we provide our positive answer to this question. |
361 | From Hard to Soft: Understanding Deep Network Nonlinearities via Vector Quantization and Statistical Inference | Randall Balestriero, Richard Baraniuk | This paper extends the MASO framework to these and an infinitely large class of new nonlinearities by linking deterministic MASOs with probabilistic Gaussian Mixture Models (GMMs). |
362 | Aggregated Momentum: Stability Through Passive Damping | James Lucas, Shengyang Sun, Richard Zemel, Roger Grosse | We propose Aggregated Momentum (AggMo), a variant of momentum which combines multiple velocity vectors with different damping coefficients. |
363 | Variational Autoencoder with Arbitrary Conditioning | Oleg Ivanov, Michael Figurnov, Dmitry Vetrov | We propose a single neural probabilistic model based on variational autoencoder that can be conditioned on an arbitrary subset of observed features and then sample the remaining features in “one shot”. |
364 | Time-Agnostic Prediction: Predicting Predictable Video Frames | Dinesh Jayaraman, Frederik Ebert, Alexei Efros, Sergey Levine | We evaluate our approach for future and intermediate frame prediction across three robotic manipulation tasks. |
365 | A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation | Akhilesh Gotmare, Nitish Shirish Keskar, Caiming Xiong, Richard Socher | Instead, we revisit the empirical analysis of heuristics through the lens of recently proposed methods for loss surface and representation analysis, viz. mode connectivity and canonical correlation analysis (CCA), and hypothesize reasons why the heuristics succeed. |
366 | Self-Monitoring Navigation Agent via Auxiliary Progress Estimation | Chih-Yao Ma, Jiasen Lu, Zuxuan Wu, Ghassan AlRegib, Zsolt Kira, Richard Socher, Caiming Xiong | In this paper, we introduce a self-monitoring agent with two complementary components: (1) visual-textual co-grounding module to locate the instruction completed in the past, the instruction required for the next action, and the next moving direction from surrounding images and (2) progress monitor to ensure the grounded instruction correctly reflects the navigation progress. |
367 | Kernel Change-point Detection with Auxiliary Deep Generative Models | Wei-Cheng Chang, Chun-Liang Li, Yiming Yang, Barnabás Póczos | In this paper, we propose KL-CPD, a novel kernel learning framework for time series CPD that optimizes a lower bound of test power via an auxiliary generative model. |
368 | Unsupervised Learning via Meta-Learning | Kyle Hsu, Sergey Levine, Chelsea Finn | Many prior unsupervised learning works aim to do so by developing proxy objectives based on reconstruction, disentanglement, prediction, and other metrics. |
369 | Auxiliary Variational MCMC | Raza Habib, David Barber | We introduce Auxiliary Variational MCMC, a novel framework for learning MCMC kernels that combines recent advances in variational inference with insights drawn from traditional auxiliary variable MCMC methods such as Hamiltonian Monte Carlo. |
370 | Neural network gradient-based learning of black-box function interfaces | Alon Jacovi, Guy Hadash, Einat Kermany, Boaz Carmeli, Ofer Lavi, George Kour, Jonathan Berant | We propose a method for end-to-end training of a base neural network that integrates calls to existing black-box functions. |
371 | Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions | Matthew Mackay, Paul Vicol, Jonathan Lorraine, David Duvenaud, Roger Grosse | We aim to adapt regularization hyperparameters for neural networks by fitting compact approximations to the best-response function, which maps hyperparameters to optimal weights and biases. |
372 | Unsupervised Control Through Non-Parametric Discriminative Rewards | David Warde-Farley, Tom Van de Wiele, Tejas Kulkarni, Catalin Ionescu, Steven Hansen, Volodymyr Mnih | We present an unsupervised learning algorithm to train agents to achieve perceptually-specified goals using only a stream of observations and actions. |
373 | Interpolation-Prediction Networks for Irregularly Sampled Time Series | Satya Narayan Shukla, Benjamin Marlin | In this paper, we present a new deep learning architecture for addressing the problem of supervised learning with sparse and irregularly sampled multivariate time series. |
374 | Riemannian Adaptive Optimization Methods | Gary Becigneul, Octavian-Eugen Ganea | Several first order stochastic optimization methods commonly used in the Euclidean domain such as stochastic gradient descent (SGD), accelerated gradient descent or variance reduced methods have already been adapted to certain Riemannian settings. |
375 | Minimal Random Code Learning: Getting Bits Back from Compressed Model Parameters | Marton Havasi, Robert Peharz, José Miguel Hernández-Lobato | A typical approach is to train a set of deterministic weights, while applying certain techniques such as pruning and quantization, in order that the empirical weight distribution becomes amenable to Shannon-style coding schemes. |
376 | Characterizing Audio Adversarial Examples Using Temporal Dependency | Zhuolin Yang, Bo Li, Pin-Yu Chen, Dawn Song | Nonetheless, as unique data properties have inspired distinct and powerful learning principles, this paper aims to explore their potentials towards mitigating adversarial inputs. |
377 | Equi-normalization of Neural Networks | Pierre Stock, Benjamin Graham, Rémi Gribonval, Hervé Jégou | Inspired by the Sinkhorn-Knopp algorithm, we introduce a fast iterative method for minimizing the l2 norm of the weights, equivalently the weight decay regularizer. |
378 | Generalized Tensor Models for Recurrent Neural Networks | Valentin Khrulkov, Oleksii Hrinchuk, Ivan Oseledets | In this work, we attempt to reduce the gap between theory and practice by extending the theoretical analysis to RNNs which employ various nonlinearities, such as Rectified Linear Unit (ReLU), and show that they also benefit from properties of universality and depth efficiency. |
379 | Wizard of Wikipedia: Knowledge-Powered Conversational Agents | Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, Jason Weston | To that end we collect and release a large dataset with conversations directly grounded with knowledge retrieved from Wikipedia. |
380 | Are adversarial examples inevitable? | Ali Shafahi, W. Ronny Huang, Christoph Studer, Soheil Feizi, Tom Goldstein | Given the lack of success at generating robust defenses, we are led to ask a fundamental question: Are adversarial attacks inevitable? This paper analyzes adversarial examples from a theoretical perspective, and identifies fundamental bounds on the susceptibility of a classifier to adversarial attacks. |
381 | A Variational Inequality Perspective on Generative Adversarial Networks | Gauthier Gidel, Hugo Berard, Gaëtan Vignoud, Pascal Vincent, Simon Lacoste-Julien | In this work, we cast GAN optimization problems in the general variational inequality framework. |
382 | Learning-Based Frequency Estimation Algorithms | Chen-Yu Hsu, Piotr Indyk, Dina Katabi, Ali Vakilian | We propose a new class of algorithms that automatically learn relevant patterns in the input data and use them to improve its frequency estimates. |
383 | From Language to Goals: Inverse Reinforcement Learning for Vision-Based Instruction Following | Justin Fu, Anoop Korattikara, Sergey Levine, Sergio Guadarrama | In this work, we investigate the problem of grounding language commands as reward functions using inverse reinforcement learning, and argue that language-conditioned rewards are more transferable than language-conditioned policies to new environments. |
384 | Backpropamine: training self-modifying neural networks with differentiable neuromodulated plasticity | Thomas Miconi, Aditya Rawal, Jeff Clune, Kenneth O. Stanley | Extending previous work on differentiable Hebbian plasticity, we propose a differentiable formulation for the neuromodulation of plasticity. |
385 | Recurrent Experience Replay in Distributed Reinforcement Learning | Steven Kapturowski, Georg Ostrovski, John Quan, Remi Munos, Will Dabney | Building on the recent successes of distributed training of RL agents, in this paper we investigate the training of RNN-based RL agents from distributed prioritized experience replay. |
386 | A Generative Model For Electron Paths | John Bradshaw, Matt J. Kusner, Brooks Paige, Marwin H. S. Segler, José Miguel Hernández-Lobato | We propose an electron path prediction model (ELECTRO) to learn these sequences directly from raw reaction data. |
387 | Modeling Uncertainty with Hedged Instance Embeddings | Seong Joon Oh, Kevin P. Murphy, Jiyan Pan, Joseph Roth, Florian Schroff, Andrew C. Gallagher | We introduce the hedged instance embedding (HIB) in which embeddings are modeled as random variables and the model is trained under the variational information bottleneck principle (Alemi et al., 2016; Achille & Soatto, 2018). |
388 | Beyond Greedy Ranking: Slate Optimization via List-CVAE | Ray Jiang, Sven Gowal, Yuqiu Qian, Timothy Mann, Danilo J. Rezende | In this paper, we introduce List Conditional Variational Auto-Encoders (ListCVAE), which learn the joint distribution of documents on the slate conditioned on user responses, and directly generate full slates. |
389 | Stochastic Prediction of Multi-Agent Interactions from Partial Observations | Chen Sun, Per Karlsson, Jiajun Wu, Joshua B Tenenbaum, Kevin Murphy | We present a method which learns to integrate temporal information, from a learned dynamics model, with ambiguous visual information, from a learned vision model, in the context of interacting agents. |
390 | GamePad: A Learning Environment for Theorem Proving | Daniel Huang, Prafulla Dhariwal, Dawn Song, Ilya Sutskever | In this paper, we introduce a system called GamePad that can be used to explore the application of machine learning methods to theorem proving in the Coq proof assistant. |
391 | GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding | Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman | In pursuit of this objective, we introduce the General Language Understanding Evaluation (GLUE) benchmark, a collection of tools for evaluating the performance of models across a diverse set of existing NLU tasks. |
392 | On Computation and Generalization of Generative Adversarial Networks under Spectrum Control | Haoming Jiang, Zhehui Chen, Minshuo Chen, Feng Liu, Dingding Wang, Tuo Zhao | Motivated by their discovery, we propose a new framework for training GANs, which allows more flexible spectrum control (e.g., making the weight matrices of the discriminator have slow singular value decays). |
393 | Large-Scale Study of Curiosity-Driven Learning | Yuri Burda, Harri Edwards, Deepak Pathak, Amos Storkey, Trevor Darrell, Alexei A. Efros | In this paper: (a) We perform the first large-scale study of purely curiosity-driven learning, i.e. without any extrinsic rewards, across $54$ standard benchmark environments, including the Atari game suite. |
394 | Unsupervised Discovery of Parts, Structure, and Dynamics | Zhenjia Xu, Zhijian Liu, Chen Sun, Kevin Murphy, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu | In this paper, we propose a novel formulation that simultaneously learns a hierarchical, disentangled object representation and a dynamics model for object parts from unlabeled videos. |
395 | Music Transformer: Generating Music with Long-Term Structure | Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Ian Simon, Curtis Hawthorne, Noam Shazeer, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, Douglas Eck | We propose an algorithm that reduces the intermediate memory requirements to linear in the sequence length. |
396 | BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning | Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Salem Lahlou, Lucas Willems, Chitwan Saharia, Thien Huu Nguyen, Yoshua Bengio | We introduce the BabyAI research platform, with the goal of supporting investigations towards including humans in the loop for grounded language learning. |
397 | Analyzing Inverse Problems with Invertible Neural Networks | Lynton Ardizzone, Jakob Kruse, Carsten Rother, Ullrich Köthe | We prove theoretically and verify experimentally, on artificial data and real-world problems from medicine and astrophysics, that INNs are a powerful analysis tool to find multi-modalities in parameter space, uncover parameter correlations, and identify unrecoverable parameters. |
398 | RelGAN: Relational Generative Adversarial Networks for Text Generation | Weili Nie, Nina Narodytska, Ankit Patel | In this work, we propose RelGAN, a new GAN architecture for text generation, consisting of three main components: a relational memory based generator for the long-distance dependency modeling, the Gumbel-Softmax relaxation for training GANs on discrete data, and multiple embedded representations in the discriminator to provide a more informative signal for the generator updates. |
399 | The Singular Values of Convolutional Layers | Hanie Sedghi, Vineet Gupta, Philip M. Long | We characterize the singular values of the linear transformation associated with a standard 2D multi-channel convolutional layer, enabling their efficient computation. |
400 | An Empirical study of Binary Neural Networks’ Optimisation | Milad Alizadeh, Javier Fernández-Marqués, Nicholas D. Lane, Yarin Gal | In this work, we empirically identify and study the effectiveness of the various ad-hoc techniques commonly used in the literature, providing best-practices for efficient training of binary models. |
401 | Approximability of Discriminators Implies Diversity in GANs | Yu Bai, Tengyu Ma, Andrej Risteski | The theoretical work of Arora et al. (2017a) suggests a dilemma about GANs’ statistical properties: powerful discriminators cause overfitting, whereas weak discriminators cannot detect mode collapse. By contrast, we show in this paper that GANs can in principle learn distributions in Wasserstein distance (or KL-divergence in many cases) with polynomial sample complexity, if the discriminator class has strong distinguishing power against the particular generator class (instead of against all possible generators). |
402 | Learning Embeddings into Entropic Wasserstein Spaces | Charlie Frogner, Farzaneh Mirzazadeh, Justin Solomon | We propose to exploit this flexibility by learning an embedding that captures the semantic information in the Wasserstein distance between embedded distributions. |
403 | DeepOBS: A Deep Learning Optimizer Benchmark Suite | Frank Schneider, Lukas Balles, Philipp Hennig | As the primary contribution, we present DeepOBS, a Python package of deep learning optimization benchmarks. |
404 | InfoBot: Transfer and Exploration via the Information Bottleneck | Anirudh Goyal, Riashat Islam, DJ Strouse, Zafarali Ahmed, Hugo Larochelle, Matthew Botvinick, Yoshua Bengio, Sergey Levine | We propose to learn about decision states from prior experience. |
405 | The Comparative Power of ReLU Networks and Polynomial Kernels in the Presence of Sparse Latent Structure | Frederic Koehler, Andrej Risteski | We give an almost-tight theoretical analysis of the performance of both neural networks and polynomials for this problem, as well as verify our theory with simulations. |
406 | Learning Implicitly Recurrent CNNs Through Parameter Sharing | Pedro Savarese, Michael Maire | We introduce a parameter sharing scheme, in which different layers of a convolutional neural network (CNN) are defined by a learned linear combination of parameter tensors from a global bank of templates. |
407 | Learning Particle Dynamics for Manipulating Rigid Bodies, Deformable Objects, and Fluids | Yunzhu Li, Jiajun Wu, Russ Tedrake, Joshua B. Tenenbaum, Antonio Torralba | In this paper, we propose to learn a particle-based simulator for complex control tasks. |
408 | Regularized Learning for Domain Adaptation under Label Shifts | Kamyar Azizzadenesheli, Anqi Liu, Fanny Yang, Animashree Anandkumar | We propose Regularized Learning under Label shifts (RLLS), a principled and a practical domain-adaptation algorithm to correct for shifts in the label distribution between a source and a target domain. |
409 | Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs | Sachin Kumar, Yulia Tsvetkov | We propose a general technique for replacing the softmax layer with a continuous embedding layer. |
410 | Relational Forward Models for Multi-Agent Learning | Andrea Tacchetti, H. Francis Song, Pedro A. M. Mediano, Vinicius Zambaldi, János Kramár, Neil C. Rabinowitz, Thore Graepel, Matthew Botvinick, Peter W. Battaglia | Here we introduce Relational Forward Models (RFM) for multi-agent learning, networks that can learn to make accurate predictions of agents’ future behavior in multi-agent environments. |
411 | Imposing Category Trees Onto Word-Embeddings Using A Geometric Construction | Tiansi Dong, Christian Bauckhage, Hailong Jin, Juanzi Li, Olaf Cremers, Daniel Speicher, Armin B. Cremers, Joerg Zimmermann | We present a novel method to precisely impose tree-structured category information onto word-embeddings, resulting in ball embeddings in higher dimensional spaces (N-balls for short). |
412 | Two-Timescale Networks for Nonlinear Value Function Approximation | Wesley Chung, Somjit Nath, Ajin Joseph, Martha White | In this work, we provide a two-timescale network (TTN) architecture that enables linear methods to be used to learn values, with a nonlinear representation learned at a slower timescale. |
413 | Diversity-Sensitive Conditional Generative Adversarial Networks | Dingdong Yang, Seunghoon Hong, Yunseok Jang, Tianchen Zhao, Honglak Lee | We propose a simple yet highly effective method that addresses the mode-collapse problem in the Conditional Generative Adversarial Network (cGAN). |
414 | Query-Efficient Hard-label Black-box Attack: An Optimization-based Approach | Minhao Cheng, Thong Le, Pin-Yu Chen, Huan Zhang, JinFeng Yi, Cho-Jui Hsieh | We study the problem of attacking machine learning models in the hard-label black-box setting, where no model information is revealed except that the attacker can make queries to probe the corresponding hard-label decisions. |
415 | Rethinking the Value of Network Pruning | Zhuang Liu, Mingjie Sun, Tinghui Zhou, Gao Huang, Trevor Darrell | In this work, we make several surprising observations which contradict common beliefs. |
416 | Hyperbolic Attention Networks | Caglar Gulcehre, Misha Denil, Mateusz Malinowski, Ali Razavi, Razvan Pascanu, Karl Moritz Hermann, Peter Battaglia, Victor Bapst, David Raposo, Adam Santoro | Recent approaches have successfully demonstrated the benefits of learning the parameters of shallow networks in hyperbolic space. |
417 | Learning from Positive and Unlabeled Data with a Selection Bias | Masahiro Kato, Takeshi Teshima, Junya Honda | In this paper, we propose a method to partially identify the classifier. |
418 | Adv-BNN: Improved Adversarial Defense through Robust Bayesian Neural Network | Xuanqing Liu, Yao Li, Chongruo Wu, Cho-Jui Hsieh | We present a new algorithm to train a robust neural network against adversarial attacks. Our algorithm is motivated by the following two ideas. |
419 | Optimal Completion Distillation for Sequence Learning | Sara Sabour, William Chan, Mohammad Norouzi | We present Optimal Completion Distillation (OCD), a training procedure for optimizing sequence to sequence models based on edit distance. |
420 | Caveats for information bottleneck in deterministic scenarios | Artemy Kolchinsky, Brendan D. Tracey, Steven Van Kuyk | To address problem (1), we propose a functional that, unlike the IB Lagrangian, can recover the IB curve in all cases. |
421 | Deep Learning 3D Shapes Using Alt-az Anisotropic 2-Sphere Convolution | Min Liu, Fupin Yao, Chiho Choi, Ayan Sinha, Karthik Ramani | In this paper, we present a method for applying deep learning to 3D surfaces using their spherical descriptors and alt-az anisotropic convolution on 2-sphere. |
422 | Small nonlinearities in activation functions create bad local minima in neural networks | Chulhee Yun, Suvrit Sra, Ali Jadbabaie | We investigate the loss surface of neural networks. |
423 | Information Theoretic lower bounds on negative log likelihood | Luis A. Lastras-Montaño | In this article we use rate-distortion theory, a branch of information theory devoted to the problem of lossy compression, to shed light on an important problem in latent variable modeling of data: is there room to improve the model? |
424 | Preferences Implicit in the State of the World | Rohin Shah, Dmitrii Krasheninnikov, Jordan Alexander, Pieter Abbeel, Anca Dragan | This motivates our key insight: when a robot is deployed in an environment that humans act in, the state of the environment is already optimized for what humans want. |
425 | A Kernel Random Matrix-Based Approach for Sparse PCA | Mohamed El Amine Seddik, Mohamed Tamaazousti, Romain Couillet | In this paper, we present a random matrix approach to recover sparse principal components from n p-dimensional vectors. |
426 | Bayesian Prediction of Future Street Scenes using Synthetic Likelihoods | Apratim Bhattacharyya, Mario Fritz, Bernt Schiele | In this work, we propose a novel Bayesian formulation for anticipating future scene states which leverages synthetic likelihoods that encourage the learning of diverse models to accurately capture the multi-modal nature of future scene states. |
427 | There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average | Ben Athiwaratkun, Marc Finzi, Pavel Izmailov, Andrew Gordon Wilson | Motivated by these observations, we propose to train consistency-based methods with Stochastic Weight Averaging (SWA), a recent approach which averages weights along the trajectory of SGD with a modified learning rate schedule. |
428 | Large-Scale Answerer in Questioner’s Mind for Visual Dialog Question Generation | Sang-Woo Lee, Tong Gao, Sohee Yang, Jaejun Yoo, Jung-Woo Ha | To address this, we propose AQM+ that can deal with a large-scale problem and ask a question that is more coherent to the current context of the dialog. |
429 | Graph HyperNetworks for Neural Architecture Search | Chris Zhang, Mengye Ren, Raquel Urtasun | In this work, we propose the Graph HyperNetwork (GHN) to amortize the search cost: given an architecture, it directly generates the weights by running inference on a graph neural network. |
430 | DELTA: DEEP LEARNING TRANSFER USING FEATURE MAP WITH ATTENTION FOR CONVOLUTIONAL NETWORKS | Xingjian Li, Haoyi Xiong, Hanchao Wang, Yuxuan Rao, Liping Liu, Jun Huan | In this paper, we propose a novel regularized transfer learning framework DELTA, namely DEep Learning Transfer using Feature Map with Attention. |
431 | textTOvec: DEEP CONTEXTUALIZED NEURAL AUTOREGRESSIVE TOPIC MODELS OF LANGUAGE WITH DISTRIBUTED COMPOSITIONAL PRIOR | Pankaj Gupta, Yatin Chaudhary, Florian Buettner, Hinrich Schuetze | In this work, we incorporate language structure by combining a neural autoregressive topic model (TM) with an LSTM based language model (LSTM-LM) in a single probabilistic framework. |
432 | Amortized Bayesian Meta-Learning | Sachin Ravi, Alex Beatson | We propose a meta-learning method which efficiently amortizes hierarchical variational inference across tasks, learning a prior distribution over neural network weights so that a few steps of Bayes by Backprop will produce a good task-specific approximate posterior. |
433 | Probabilistic Recursive Reasoning for Multi-Agent Reinforcement Learning | Ying Wen, Yaodong Yang, Rui Luo, Jun Wang, Wei Pan | In this paper, we start from level-$1$ recursion and introduce a probabilistic recursive reasoning (PR2) framework for multi-agent reinforcement learning. |
434 | Learning Neural PDE Solvers with Convergence Guarantees | Jun-Ting Hsieh, Shengjia Zhao, Stephan Eismann, Lucia Mirabella, Stefano Ermon | In contrast to existing hand-crafted solutions, we propose an approach to learn a fast iterative solver tailored to a specific domain. |
435 | A new dog learns old tricks: RL finds classic optimization algorithms | Weiwei Kong, Christopher Liaw, Aranyak Mehta, D. Sivakumar | This paper introduces a novel framework for learning algorithms to solve online combinatorial optimization problems. |
436 | Deep Graph Infomax | Petar Velickovic, William Fedus, William L. Hamilton, Pietro Liò, Yoshua Bengio, R Devon Hjelm | We present Deep Graph Infomax (DGI), a general approach for learning node representations within graph-structured data in an unsupervised manner. |
437 | Theoretical Analysis of Auto Rate-Tuning by Batch Normalization | Sanjeev Arora, Zhiyuan Li, Kaifeng Lyu | It is shown that even if we fix the learning rate of scale-invariant parameters (e.g., weights of each layer with BN) to a constant (say, 0.3), gradient descent still approaches a stationary point (i.e., a solution where gradient is zero) in the rate of T^{-1/2} in T iterations, asymptotically matching the best bound for gradient descent with well-tuned learning rates. |
438 | Per-Tensor Fixed-Point Quantization of the Back-Propagation Algorithm | Charbel Sakr, Naresh Shanbhag | We describe a precision assignment methodology for neural network training in which all network parameters, i.e., activations and weights in the feedforward path, gradients and weight accumulators in the feedback path, are assigned close to minimal precision. |
439 | FUNCTIONAL VARIATIONAL BAYESIAN NEURAL NETWORKS | Shengyang Sun, Guodong Zhang, Jiaxin Shi, Roger Grosse | Based on this, we introduce a practical training objective which approximates the functional ELBO using finite measurement sets and the spectral Stein gradient estimator. |
440 | NADPEx: An on-policy temporally consistent exploration method for deep reinforcement learning | Sirui Xie, Junning Huang, Lanxin Lei, Chunxiao Liu, Zheng Ma, Wei Zhang, Liang Lin | In this work, we introduce a novel on-policy temporally consistent exploration strategy – Neural Adaptive Dropout Policy Exploration (NADPEx) – for deep reinforcement learning agents. |
441 | SPIGAN: Privileged Adversarial Learning from Simulation | Kuan-Hui Lee, German Ros, Jie Li, Adrien Gaidon | We propose a new unsupervised domain adaptation algorithm, called SPIGAN, relying on Simulator Privileged Information (PI) and Generative Adversarial Networks (GAN). |
442 | Generating Multi-Agent Trajectories using Programmatic Weak Supervision | Eric Zhan, Stephan Zheng, Yisong Yue, Long Sha, Patrick Lucey | We present a hierarchical framework that can effectively learn such sequential generative models. |
443 | Label super-resolution networks | Kolya Malkin, Caleb Robinson, Le Hou, Rachel Soobitsky, Jacob Czawlytko, Dimitris Samaras, Joel Saltz, Lucas Joppa, Nebojsa Jojic | We present a deep learning-based method for super-resolving coarse (low-resolution) labels assigned to groups of image pixels into pixel-level (high-resolution) labels, given the joint distribution between those low- and high-resolution labels. |
444 | ANYTIME MINIBATCH: EXPLOITING STRAGGLERS IN ONLINE DISTRIBUTED OPTIMIZATION | Nuwan Ferdinand, Haider Al-Lawati, Stark Draper, Matthew Nokleby | To mitigate the impact of stragglers, we propose an online distributed optimization method called Anytime Minibatch. |
445 | Sample Efficient Adaptive Text-to-Speech | Yutian Chen, Yannis Assael, Brendan Shillingford, David Budden, Scott Reed, Heiga Zen, Quan Wang, Luis C. Cobo, Andrew Trask, Ben Laurie, Caglar Gulcehre, Aäron van den Oord, Oriol Vinyals, Nando de Freitas | We present a meta-learning approach for adaptive text-to-speech (TTS) with few data. |
446 | Practical lossless compression with latent variables using bits back coding | James Townsend, Thomas Bird, David Barber | We present ‘Bits Back with ANS’ (BB-ANS), a scheme to perform lossless compression with latent variable models at a near optimal rate. |
447 | Kernel RNN Learning (KeRNL) | Christopher Roth, Ingmar Kanitscheider, Ila Fiete | We describe Kernel RNN Learning (KeRNL), a reduced-rank, temporal eligibility trace-based approximation to backpropagation through time (BPTT) for training recurrent neural networks (RNNs) that gives competitive performance to BPTT on long time-dependence tasks. |
448 | Deep, Skinny Neural Networks are not Universal Approximators | Jesse Johnson | In this paper, we examine the topological constraints that the architecture of a neural network imposes on the level sets of all the functions that it is able to approximate. |
449 | Large Scale Graph Learning From Smooth Signals | Vassilis Kalofolias, Nathanaël Perraudin | Our algorithm uses known approximate nearest neighbor techniques to reduce the number of variables, and automatically selects the correct parameters of the model, requiring a single intuitive input: the desired edge density. |
450 | Overcoming Catastrophic Forgetting for Continual Learning via Model Adaptation | Wenpeng Hu, Zhou Lin, Bing Liu, Chongyang Tao, Zhengwei Tao, Jinwen Ma, Dongyan Zhao, Rui Yan | In this paper, we propose a very different approach, called Parameter Generation and Model Adaptation (PGMA), to dealing with the problem. |
451 | Analysis of Quantized Models | Lu Hou, Ruiliang Zhang, James T. Kwok | In recent years, many weight-quantized models have been proposed. |
452 | Deep learning generalizes because the parameter-function map is biased towards simple functions | Guillermo Valle-Perez, Chico Q. Camargo, Ard A. Louis | In this paper, we provide a new explanation. |
453 | Learning when to Communicate at Scale in Multiagent Cooperative and Competitive Tasks | Amanpreet Singh, Tushar Jain, Sainbayar Sukhbaatar | In this paper, we present Individualized Controlled Continuous Communication Model (IC3Net) which has better training efficiency than simple continuous communication model, and can be applied to semi-cooperative and competitive settings along with the cooperative settings. |
454 | Synthetic Datasets for Neural Program Synthesis | Richard Shin, Neel Kant, Kavi Gupta, Chris Bender, Brandon Trabucco, Rishabh Singh, Dawn Song | The goal of program synthesis is to automatically generate programs in a particular language from corresponding specifications, e.g. input-output behavior. Many current approaches achieve impressive results after training on randomly generated I/O examples in limited domain-specific languages (DSLs), as with string transformations in RobustFill. However, we empirically discover that applying test input generation techniques for languages with control flow and rich input space causes deep networks to generalize poorly to certain data distributions; to correct this, we propose a new methodology for controlling and evaluating the bias of synthetic data distributions over both programs and specifications. We demonstrate, using the Karel DSL and a small Calculator DSL, that training deep networks on these distributions leads to improved cross-distribution generalization performance. |
455 | DPSNet: End-to-end Deep Plane Sweep Stereo | Sunghoon Im, Hae-Gon Jeon, Stephen Lin, In So Kweon | In this paper, we present a convolutional neural network called DPSNet (Deep Plane Sweep Network) whose design is inspired by best practices of traditional geometry-based approaches. |
456 | Conditional Network Embeddings | Bo Kang, Jefrey Lijffijt, Tijl De Bie | In recent years various methods for NE have been introduced, all following a similar strategy: defining a notion of similarity between nodes (typically some distance measure within the network), a distance measure in the embedding space, and a loss function that penalizes large distances for similar nodes and small distances for dissimilar nodes. A difficulty faced by existing methods is that certain networks are fundamentally hard to embed due to their structural properties: (approximate) multipartiteness, certain degree distributions, assortativity, etc. |
457 | Defensive Quantization: When Efficiency Meets Robustness | Ji Lin, Chuang Gan, Song Han | This paper aims to raise people’s awareness about the security of the quantized models, and we designed a novel quantization methodology to jointly optimize the efficiency and robustness of deep learning models. |
458 | GO Gradient for Expectation-Based Objectives | Yulai Cong, Miaoyun Zhao, Ke Bai, Lawrence Carin | To address these limitations, we propose a General and One-sample (GO) gradient that ($i$) applies to many distributions associated with non-reparameterizable continuous or discrete random variables, and ($ii$) has the same low-variance as the reparameterization trick. |
459 | h-detach: Modifying the LSTM Gradient Towards Better Optimization | Bhargav Kanuparthi, Devansh Arpit, Giancarlo Kerg, Nan Rosemary Ke, Ioannis Mitliagkas, Yoshua Bengio | We introduce a simple stochastic algorithm (\textit{h}-detach) that is specific to LSTM optimization and targeted towards addressing this problem. |
460 | An analytic theory of generalization dynamics and transfer learning in deep linear networks | Andrew K. Lampinen, Surya Ganguli | We develop an analytic theory of the nonlinear dynamics of generalization in deep linear networks, both within and across tasks. |
461 | Differentiable Learning-to-Normalize via Switchable Normalization | Ping Luo, Jiamin Ren, Zhanglin Peng, Ruimao Zhang, Jingyu Li | We address a learning-to-normalize problem by proposing Switchable Normalization (SN), which learns to select different normalizers for different normalization layers of a deep neural network. |
462 | SOM-VAE: Interpretable Discrete Representation Learning on Time Series | Vincent Fortuin, Matthias Hüser, Francesco Locatello, Heiko Strathmann, Gunnar Rätsch | This is due to non-intuitive mappings from data features to salient properties of the representation and non-smoothness over time. To address this problem, we propose a new representation learning framework building on ideas from interpretable discrete dimensionality reduction and deep generative modeling. |
463 | Hierarchical Generative Modeling for Controllable Speech Synthesis | Wei-Ning Hsu, Yu Zhang, Ron J. Weiss, Heiga Zen, Yonghui Wu, Yuxuan Wang, Yuan Cao, Ye Jia, Zhifeng Chen, Jonathan Shen, Patrick Nguyen, Ruoming Pang | This paper proposes a neural end-to-end text-to-speech (TTS) model which can control latent attributes in the generated speech that are rarely annotated in the training data, such as speaking style, accent, background noise, and recording conditions. |
464 | Learning Factorized Multimodal Representations | Yao-Hung Hubert Tsai, Paul Pu Liang, Amir Zadeh, Louis-Philippe Morency, Ruslan Salakhutdinov | In this paper, we propose to optimize for a joint generative-discriminative objective across multimodal data and labels. |
465 | Composing Complex Skills by Learning Transition Policies | Youngwoon Lee*, Shao-Hua Sun*, Sriram Somasundaram, Edward S. Hu, Joseph J. Lim | To empower machines with this ability, we propose a method that can learn transition policies which effectively connect primitive skills to perform sequential tasks without handcrafted rewards. |
466 | Human-level Protein Localization with Convolutional Neural Networks | Elisabeth Rumetshofer, Markus Hofmarcher, Clemens Röhrl, Sepp Hochreiter, Günter Klambauer | We present the largest comparison of CNN architectures including GapNet-PL for protein localization in HTI images of human cells. |
467 | Environment Probing Interaction Policies | Wenxuan Zhou, Lerrel Pinto, Abhinav Gupta | In this work, we propose the “Environment-Probing” Interaction (EPI) policy, a policy that probes a new environment to extract an implicit understanding of that environment’s behavior. |
468 | Lagging Inference Networks and Posterior Collapse in Variational Autoencoders | Junxian He, Daniel Spokoyny, Graham Neubig, Taylor Berg-Kirkpatrick | In this paper, we investigate posterior collapse from the perspective of training dynamics. |
469 | A2BCD: Asynchronous Acceleration with Optimal Complexity | Robert Hannah, Fei Feng, Wotao Yin | A2BCD: Asynchronous Acceleration with Optimal Complexity. |
470 | Learning to Infer and Execute 3D Shape Programs | Yonglong Tian, Andrew Luo, Xingyuan Sun, Kevin Ellis, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu | In this paper, we propose 3D shape programs, integrating bottom-up recognition systems with top-down, symbolic program structure to capture both low-level geometry and high-level structural priors for 3D shapes. |
471 | Deep Decoder: Concise Image Representations from Untrained Non-convolutional Networks | Reinhard Heckel, Paul Hand | Contrary to classical tools such as wavelets, image-generating deep neural networks have a large number of parameters—typically a multiple of their output dimension—and need to be trained on large datasets. In this paper, we propose an untrained simple image model, called the deep decoder, which is a deep neural network that can generate natural images from very few weight parameters. The deep decoder has a simple architecture with no convolutions and fewer weight parameters than the output dimensionality. |
472 | SNAS: stochastic neural architecture search | Sirui Xie, Hehui Zheng, Chunxiao Liu, Liang Lin | In this work, NAS is reformulated as an optimization problem on parameters of a joint distribution for the search space in a cell. |
473 | Revealing interpretable object representations from human behavior | Charles Y. Zheng, Francisco Pereira, Chris I. Baker, Martin N. Hebart | Revealing interpretable object representations from human behavior |
474 | AntisymmetricRNN: A Dynamical System View on Recurrent Neural Networks | Bo Chang, Minmin Chen, Eldad Haber, Ed H. Chi | In this paper, we draw connections between recurrent networks and ordinary differential equations. |
475 | Global-to-local Memory Pointer Networks for Task-Oriented Dialogue | Chien-Sheng Wu, Richard Socher, Caiming Xiong | We propose the global-to-local memory pointer (GLMP) networks to address this issue. |
476 | InstaGAN: Instance-aware Image-to-Image Translation | Sangwoo Mo, Minsu Cho, Jinwoo Shin | To tackle the issues, we propose a novel method, coined instance-aware GAN (InstaGAN), that incorporates the instance information (e.g., object segmentation masks) and improves multi-instance transfiguration. |
477 | Deep Layers as Stochastic Solvers | Adel Bibi, Bernard Ghanem, Vladlen Koltun, Rene Ranftl | We provide a novel perspective on the forward pass through a block of layers in a deep network. |
478 | Learning Multi-Level Hierarchies with Hindsight | Andrew Levy, George Konidaris, Robert Platt, Kate Saenko | To address this problem, we introduce a framework that can learn multiple levels of policies in parallel. |