Paper Digest: ICLR 2020 Highlights
The International Conference on Learning Representations (ICLR) is one of the top machine learning conferences in the world. ICLR 2020 will be held in Addis Ababa, Ethiopia. The conference received 2,594 paper submissions, of which 48 were accepted as 10-minute oral presentations, 107 as 4-minute spotlight presentations, and 532 as poster presentations. Around 200 papers also published their code.
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights to quickly get the main idea of each paper.
If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated with new conference digests.
Paper Digest Team
team@paperdigest.org
TABLE 1: ICLR 2020 Oral Papers
# | Title | Authors | Highlight | Code |
---|---|---|---|---|
1 | CATER: A diagnostic dataset for Compositional Actions & TEmporal Reasoning | Rohit Girdhar, Deva Ramanan | We propose a new video understanding benchmark, with tasks that by-design require temporal reasoning to be solved, unlike most existing video datasets. | |
2 | BackPACK: Packing more into Backprop | Felix Dangel, Frederik Kunstner, Philipp Hennig | To address this problem, we introduce BackPACK, an efficient framework built on top of PyTorch, that extends the backpropagation algorithm to extract additional information from first- and second-order derivatives. | code |
3 | GenDICE: Generalized Offline Estimation of Stationary Values | Ruiyi Zhang*, Bo Dai*, Lihong Li, Dale Schuurmans | In this paper, we proposed a novel algorithm, GenDICE, for general stationary distribution correction estimation, which can handle both discounted and average off-policy evaluation on multiple behavior-agnostic samples. | |
4 | Principled Weight Initialization for Hypernetworks | Oscar Chang, Lampros Flokas, Hod Lipson | The first principled weight initialization method for hypernetworks | |
5 | On the Convergence of FedAvg on Non-IID Data | Xiang Li, Kaixuan Huang, Wenhao Yang, Shusen Wang, Zhihua Zhang | In this paper, we analyze the convergence of \texttt{FedAvg} on non-iid data and establish a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the number of SGDs. | |
6 | Data-dependent Gaussian Prior Objective for Language Generation | Zuchao Li, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Zhuosheng Zhang, Hai Zhao | We introduce an extra data-dependent Gaussian prior objective to augment the current MLE training, which is designed to capture the prior knowledge in the ground-truth data. | code |
7 | Contrastive Learning of Structured World Models | Thomas Kipf, Elise van der Pol, Max Welling | Contrastively-trained Structured World Models (C-SWMs) learn object-oriented state representations and a relational model of an environment from raw pixel input. | code |
8 | Neural Network Branching for Neural Network Verification | Jingyue Lu, M. Pawan Kumar | We propose a novel learning to branch framework using graph neural networks to improve branch and bound based neural network verification methods. | |
9 | Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity | Jingzhao Zhang, Tianxing He, Suvrit Sra, Ali Jadbabaie | Gradient clipping provably accelerates gradient descent for non-smooth non-convex functions (a minimal sketch of the clipped update follows this table). | |
10 | Posterior sampling for multi-agent reinforcement learning: solving extensive games with imperfect information | Yichi Zhou, Jialian Li, Jun Zhu | In this work, we extend PSRL to two-player zero-sum extensive games with imperfect information (TZIEG), which is a class of multi-agent systems. | |
11 | Mogrifier LSTM | Gábor Melis, Tomáš Kočiský, Phil Blunsom | An LSTM extension with state-of-the-art language modelling results. | |
12 | Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech | David Harwath, Wei-Ning Hsu, James Glass | Vector quantization layers incorporated into a self-supervised neural model of speech audio learn hierarchical and discrete linguistic units (phone-like, word-like) when trained with a visual-grounding objective. | |
13 | Mirror-Generative Neural Machine Translation | Zaixiang Zheng, Hao Zhou, Shujian Huang, Lei Li, Xin-Yu Dai, Jiajun Chen | In this paper, we propose the mirror-generative NMT (MGNMT), a single unified architecture that simultaneously integrates the source to target translation model, the target to source translation model, and two language models. | |
14 | Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning | Ruqi Zhang, Chunyuan Li, Jianyi Zhang, Changyou Chen, Andrew Gordon Wilson | In particular, we propose a cyclical stepsize schedule, where larger steps discover new modes, and smaller steps characterize each mode. | |
15 | Your classifier is secretly an energy based model and you should treat it like one | Will Grathwohl, Kuan-Chieh Wang, Joern-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, Kevin Swersky | We show that there is a hidden generative model inside of every classifier. We demonstrate how to train this model and show the many benefits of doing so. | |
16 | Dynamics-Aware Unsupervised Skill Discovery | Archit Sharma, Shixiang Gu, Sergey Levine, Vikash Kumar, Karol Hausman | We propose an unsupervised skill discovery method which enables model-based planning for hierarchical reinforcement learning. | |
17 | Optimal Strategies Against Generative Attacks | Roy Mor, Erez Peterfreund, Matan Gavish, Amir Globerson | We cast the problem as a maximin game, characterize the optimal strategy for both attacker and authenticator in the general case, and provide the optimal strategies in closed form for the case of Gaussian source distributions. | code |
18 | GraphZoom: A Multi-level Spectral Approach for Accurate and Scalable Graph Embedding | Chenhui Deng, Zhiqiang Zhao, Yongyu Wang, Zhiru Zhang, Zhuo Feng | A multi-level spectral approach to improving the quality and scalability of unsupervised graph embedding. | code |
19 | Harnessing Structures for Value-Based Planning and Reinforcement Learning | Yuzhe Yang, Guo Zhang, Zhi Xu, Dina Katabi | We propose a generic framework that allows for exploiting the low-rank structure in both planning and deep reinforcement learning. | |
20 | Comparing Rewinding and Fine-tuning in Neural Network Pruning | Alex Renda, Jonathan Frankle, Michael Carbin | Instead of fine-tuning after pruning, rewind weights to their values earlier in training and re-train the networks to achieve higher accuracy when pruning neural networks. | code |
21 | Meta-Q-Learning | Rasool Fakoor, Pratik Chaudhari, Stefano Soatto, Alexander J. Smola | MQL is a simple off-policy meta-RL algorithm that recycles data from the meta-training replay buffer to adapt to new tasks. | |
22 | Mathematical Reasoning in Latent Space | Dennis Lee, Christian Szegedy, Markus Rabe, Sarah Loos, Kshitij Bansal | Learning to reason about higher order logic formulas in the latent space. | |
23 | A Theory of Usable Information under Computational Constraints | Yilun Xu, Shengjia Zhao, Jiaming Song, Russell Stewart, Stefano Ermon | We propose a new framework for reasoning about information in complex systems. | |
24 | Geometric Analysis of Nonconvex Optimization Landscapes for Overcomplete Learning | Qing Qu, Yuexiang Zhai, Xiao Li, Yuqian Zhang, Zhihui Zhu | In this work, we provide new theoretical insights for several important representation learning problems: learning \emph{(i)} sparsely used overcomplete dictionaries and \emph{(ii)} convolutional dictionaries. | |
25 | Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds | Jordan T. Ash, Chicheng Zhang, Akshay Krishnamurthy, John Langford, Alekh Agarwal | We introduce a new batch active learning algorithm that’s robust to model architecture, batch size, and dataset. | |
26 | Understanding and Robustifying Differentiable Architecture Search | Arber Zela, Thomas Elsken, Tonmoy Saikia, Yassine Marrakchi, Thomas Brox, Frank Hutter | We study the failure modes of DARTS (Differentiable Architecture Search) by looking at the eigenvalues of the Hessian of validation loss w.r.t. the architecture and propose robustifications based on our analysis. | code |
27 | A Closer Look at Deep Policy Gradients | Andrew Ilyas, Logan Engstrom, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, Aleksander Madry | To this end, we propose a fine-grained analysis of state-of-the-art methods based on key elements of this framework: gradient estimation, value prediction, and optimization landscapes. | |
28 | Implementation Matters in Deep RL: A Case Study on PPO and TRPO | Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, Aleksander Madry | We study the roots of algorithmic progress in deep policy gradient algorithms through a case study on two popular algorithms, Proximal Policy Optimization and Trust Region Policy Optimization. | code |
29 | Fast Task Inference with Variational Intrinsic Successor Features | Steven Hansen, Will Dabney, Andre Barreto, David Warde-Farley, Tom Van de Wiele, Volodymyr Mnih | We introduce Variational Intrinsic Successor FeatuRes (VISR), a novel algorithm which learns controllable features that can be leveraged to provide fast task inference through the successor features framework. | |
30 | Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distribution Tasks | Donghyun Na, Hae Beom Lee, Hayeon Lee, Saehoon Kim, Minseop Park, Eunho Yang, Sung Ju Hwang | A novel meta-learning model that adaptively balances the effect of the meta-learning and task-specific learning, and also class-specific learning within each task. | code |
31 | RNA Secondary Structure Prediction By Learning Unrolled Algorithms | Xinshi Chen, Yu Li, Ramzan Umarov, Xin Gao, Le Song | A DL model for RNA secondary structure prediction, which uses an unrolled algorithm in the architecture to enforce constraints. | |
32 | Watch the Unobserved: A Simple Approach to Parallelizing Monte Carlo Tree Search | Anji Liu, Jianshu Chen, Mingze Yu, Yu Zhai, Xuewen Zhou, Ji Liu | We developed an effective parallel UCT algorithm that achieves linear speedup and suffers negligible performance loss. | |
33 | Target-Embedding Autoencoders for Supervised Representation Learning | Daniel Jarrett, Mihaela van der Schaar | This paper analyzes a framework for improving generalization in a purely supervised setting, where the target space is high-dimensional. | |
34 | Reformer: The Efficient Transformer | Nikita Kitaev, Lukasz Kaiser, Anselm Levskaya | Efficient Transformer with locality-sensitive hashing and reversible layers | code |
35 | Rotation-invariant clustering of functional cell types in primary visual cortex | Ivan Ustyuzhaninov, Santiago A. Cadena, Emmanouil Froudarakis, Paul G. Fahey, Edgar Y. Walker, Erick Cobos, Jacob Reimer, Fabian H. Sinz, Andreas S. Tolias, Matthias Bethge, Alexander S. Ecker | We classify mouse V1 neurons into putative functional cell types based on their representations in a CNN predicting neural responses | |
36 | Causal Discovery with Reinforcement Learning | Shengyu Zhu, Ignavier Ng, Zhitang Chen | We apply reinforcement learning to score-based causal discovery and achieve promising results on both synthetic and real datasets | |
37 | Intrinsically Motivated Discovery of Diverse Patterns in Self-Organizing Systems | Chris Reinke, Mayalen Etcheverry, Pierre-Yves Oudeyer | We study how an unsupervised exploration and feature learning approach addresses efficiently a new problem: automatic discovery of diverse self-organized patterns in high-dim complex systems such as the game of life. | code |
38 | Restricting the Flow: Information Bottlenecks for Attribution | Karl Schulz, Leon Sixt, Federico Tombari, Tim Landgraf | We apply the informational bottleneck concept to attribution. | code |
39 | Building Deep Equivariant Capsule Networks | Sairaam Venkatraman, S. Balasubramanian, R. Raghunatha Sarma | A new scalable, group-equivariant model for capsule networks that preserves compositionality under transformations, and is empirically more transformation-robust than older capsule network models. | code |
40 | A Generalized Training Approach for Multiagent Learning | Paul Muller, Shayegan Omidshafiei, Mark Rowland, Karl Tuyls, Julien Perolat, Siqi Liu, Daniel Hennes, Luke Marris, Marc Lanctot, Edward Hughes, Zhe Wang, Guy Lever, Nicolas Heess, Thore Graepel, Remi Munos | This paper investigates a population-based training regime based on game-theoretic principles called Policy-Space Response Oracles (PSRO). | |
41 | High Fidelity Speech Synthesis with Adversarial Networks | Mikolaj Binkowski, Jeff Donahue, Sander Dieleman, Aidan Clark, Erich Elsen, Norman Casagrande, Luis C. Cobo, Karen Simonyan | We introduce GAN-TTS, a Generative Adversarial Network for Text-to-Speech, which achieves Mean Opinion Score (MOS) 4.2. | |
42 | SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference | Lasse Espeholt, Raphaël Marinier, Piotr Stanczyk, Ke Wang, Marcin Michalski | SEED RL is a scalable and efficient deep reinforcement learning agent with accelerated central inference. It achieves state-of-the-art results, reduces cost, and can process millions of frames per second. | code |
43 | Meta-Learning with Warped Gradient Descent | Sebastian Flennerhag, Andrei A. Rusu, Razvan Pascanu, Francesco Visin, Hujun Yin, Raia Hadsell | We propose a novel framework for meta-learning a gradient-based update rule that scales to beyond few-shot learning and is applicable to any form of learning, including continual learning. | |
44 | Convolutional Conditional Neural Processes | Jonathan Gordon, Wessel P. Bruinsma, Andrew Y. K. Foong, James Requeima, Yann Dubois, Richard E. Turner | We extend deep sets to functional embeddings and Neural Processes to include translation equivariant members | |
45 | Gradient Descent Maximizes the Margin of Homogeneous Neural Networks | Kaifeng Lyu, Jian Li | We study the implicit bias of gradient descent and prove under a minimal set of assumptions that the parameter direction of homogeneous models converges to KKT points of a natural margin maximization problem. | |
46 | Adversarial Training and Provable Defenses: Bridging the Gap | Mislav Balunovic, Martin Vechev | We propose a novel combination of adversarial training and provable defenses which produces a model with state-of-the-art accuracy and certified robustness on CIFAR-10. | |
47 | Differentiable Reasoning over a Virtual Knowledge Base | Bhuwan Dhingra, Manzil Zaheer, Vidhisha Balachandran, Graham Neubig, Ruslan Salakhutdinov, William W. Cohen | Differentiable multi-hop access to a textual knowledge base of indexed contextual representations | |
48 | Federated Learning with Matched Averaging | Hongyi Wang, Mikhail Yurochkin, Yuekai Sun, Dimitris Papailiopoulos, Yasaman Khazaeni | Communication efficient federated learning with layer-wise matching |
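Row 9 above (Zhang et al.) credits gradient clipping with provably faster training on non-smooth non-convex objectives. As a quick illustration, here is a minimal sketch of a clipped gradient-descent step in PyTorch; the toy model, data, and the threshold `max_norm=1.0` are placeholders of ours, not values from the paper.

```python
import torch
from torch import nn

# Toy model and data; only the clipping step reflects the paper's subject.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
x, y = torch.randn(32, 10), torch.randn(32, 1)

for _ in range(100):
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    # Rescale the gradient so its global L2 norm is at most max_norm.
    # This adaptive step size is what the analysis credits for faster
    # convergence on non-smooth non-convex objectives.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
```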
TABLE 2: ICLR 2020 Spotlights
# | Title | Authors | Highlight | Code |
---|---|---|---|---|
1 | Program Guided Agent | Shao-Hua Sun, Te-Lin Wu, Joseph J. Lim | We propose a modular framework that can accomplish tasks specified by programs and achieve zero-shot generalization to more complex tasks. | |
2 | Sparse Coding with Gated Learned ISTA | Kailun Wu, Yiwen Guo, Ziang Li, Changshui Zhang | We propose gated mechanisms to enhance learned ISTA for sparse coding, with theoretical guarantees on the superiority of the method. | |
3 | Graph Neural Networks Exponentially Lose Expressive Power for Node Classification | Kenta Oono, Taiji Suzuki | We relate the asymptotic behavior of graph neural networks to the graph spectra of underlying graphs and give principled guidelines for normalizing weights. | code |
4 | Multi-Scale Representation Learning for Spatial Feature Distributions using Grid Cells | Gengchen Mai, Krzysztof Janowicz, Bo Yan, Rui Zhu, Ling Cai, Ni Lao | We propose a representation learning model called Space2vec to encode the absolute positions and spatial relationships of places. | |
5 | InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization | Fan-Yun Sun, Jordan Hoffman, Vikas Verma, Jian Tang | Inspired by recent progress in unsupervised representation learning, we propose a novel method called InfoGraph for learning graph-level representations. | |
6 | On Robustness of Neural Ordinary Differential Equations | Hanshu Yan, Jiawei Du, Vincent Tan, Jiashi Feng | In this work, we fill this important gap by exploring robustness properties of neural ODEs both empirically and theoretically. | |
7 | Defending Against Physically Realizable Attacks on Image Classification | Tong Wu, Liang Tong, Yevgeniy Vorobeychik | Defending Against Physically Realizable Attacks on Image Classification | |
8 | Estimating Gradients for Discrete Random Variables by Sampling without Replacement | Wouter Kool, Herke van Hoof, Max Welling | We derive a low-variance, unbiased gradient estimator for expectations over discrete random variables based on sampling without replacement | code |
9 | Learning to Control PDEs with Differentiable Physics | Philipp Holl, Nils Thuerey, Vladlen Koltun | We train a combination of neural networks to predict optimal trajectories for complex physical systems. | |
10 | Intensity-Free Learning of Temporal Point Processes | Oleksandr Shchur, Marin Biloš, Stephan Günnemann | Learn in temporal point processes by modeling the conditional density, not the conditional intensity. | code |
11 | A Signal Propagation Perspective for Pruning Neural Networks at Initialization | Namhoon Lee, Thalaiyasingam Ajanthan, Stephen Gould, Philip H. S. Torr | We formally characterize the initialization conditions for effective pruning at initialization and analyze the signal propagation properties of the resulting pruned networks which leads to a method to enhance their trainability and pruning results. | |
12 | Rethinking the Security of Skip Connections in ResNet-like Neural Networks | Dongxian Wu, Yisen Wang, Shu-Tao Xia, James Bailey, Xingjun Ma | We identify the security weakness of skip connections in ResNet-like neural networks | |
13 | WHITE NOISE ANALYSIS OF NEURAL NETWORKS | Ali Borji, Sikun Lin | A white noise analysis of modern deep neural networks is presented to characterize their biases at the whole-network level or the single-neuron level. | |
14 | Neural Machine Translation with Universal Visual Representation | Zhuosheng Zhang, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Zuchao Li, Hai Zhao | This work proposed a universal visual representation for neural machine translation (NMT) using retrieved images with similar topics to source sentence, extending image applicability in NMT. | |
15 | Tranquil Clouds: Neural Networks for Learning Temporally Coherent Features in Point Clouds | Lukas Prantl, Nuttapong Chentanez, Stefan Jeschke, Nils Thuerey | We propose a generative neural network approach for temporally coherent point clouds. | |
16 | PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search | Yuhui Xu, Lingxi Xie, Xiaopeng Zhang, Xin Chen, Guo-Jun Qi, Qi Tian, Hongkai Xiong | Allowing partial channel connection in super-networks to regularize and accelerate differentiable architecture search | code |
17 | Online and stochastic optimization beyond Lipschitz continuity: A Riemannian approach | Kimon Antonakopoulos, E. Veronica Belmega, Panayotis Mertikopoulos | We introduce a novel version of Lipschitz objective continuity that allows stochastic mirror descent methodologies to achieve optimal convergence rates in problems with singularities. | |
18 | Enhancing Adversarial Defense by k-Winners-Take-All | Chang Xiao, Peilin Zhong, Changxi Zheng | We propose a simple change to existing neural network structures for better defending against gradient-based adversarial attacks, using the k-winners-take-all activation function (a minimal sketch follows this table). | code |
19 | Encoding word order in complex embeddings | Benyou Wang, Donghao Zhao, Christina Lioma, Qiuchi Li, Peng Zhang, Jakob Grue Simonsen | We present a novel and principled solution for modeling both the global absolute positions of words and their order relationships. | code |
20 | DDSP: Differentiable Digital Signal Processing | Jesse Engel, Lamtharn (Hanoi) Hantrakul, Chenjie Gu, Adam Roberts | Better audio synthesis by combining interpretable DSP with end-to-end learning. | |
21 | Cross-Domain Few-Shot Classification via Learned Feature-Wise Transformation | Hung-Yu Tseng, Hsin-Ying Lee, Jia-Bin Huang, Ming-Hsuan Yang | In this work, we address the problem of few-shot classification under domain shifts for metric-based methods. | |
22 | Ridge Regression: Structure, Cross-Validation, and Sketching | Sifan Liu, Edgar Dobriban | We study the structure of ridge regression in a high-dimensional asymptotic framework, and get insights about cross-validation and sketching. | code |
23 | Finite Depth and Width Corrections to the Neural Tangent Kernel | Boris Hanin, Mihai Nica | The neural tangent kernel of a randomly initialized ReLU net exhibits non-trivial fluctuations as long as the depth and width are comparable. | |
24 | Meta-Learning without Memorization | Mingzhang Yin, George Tucker, Mingyuan Zhou, Sergey Levine, Chelsea Finn | We identify and formalize the memorization problem in meta-learning and solve it with a novel meta-regularization method, which greatly expands the domains to which meta-learning can be applied effectively. | |
25 | Influence-Based Multi-Agent Exploration | Tonghan Wang*, Jianhao Wang*, Yi Wu, Chongjie Zhang | We present two exploration methods: exploration via information-theoretic influence (EITI) and exploration via decision-theoretic influence (EDTI), by exploiting the role of interaction in coordinated behaviors of agents. | code |
26 | HOPPITY: LEARNING GRAPH TRANSFORMATIONS TO DETECT AND FIX BUGS IN PROGRAMS | Elizabeth Dinella, Hanjun Dai, Ziyang Li, Mayur Naik, Le Song, Ke Wang | A learning-based approach for detecting and fixing bugs in JavaScript. | |
27 | Sliced Cramer Synaptic Consolidation for Preserving Deeply Learned Representations | Soheil Kolouri, Nicholas A. Ketz, Andrea Soltoggio, Praveen K. Pilly | A novel framework for overcoming catastrophic forgetting by preserving the distribution of the network's output at an arbitrary layer. | |
28 | How much Position Information Do Convolutional Neural Networks Encode? | Md Amirul Islam*, Sen Jia*, Neil D. B. Bruce | Our work shows positional information has been implicitly encoded in a network. This information is important for detecting position-dependent features, e.g. semantic and saliency. | |
29 | Hamiltonian Generative Networks | Aleksandar Botev, Irina Higgins, Andrew Jaegle, Sebastian Racaniere, Danilo J. Rezende, Peter Toth | We introduce a class of generative models that reliably learn Hamiltonian dynamics from high-dimensional observations. The learnt Hamiltonian can be applied to sequence modeling or as a normalising flow. | |
30 | COPHY: Counterfactual Learning of Physical Dynamics | Fabien Baradel, Natalia Neverova, Julien Mille, Greg Mori, Christian Wolf | We develop the COPHY benchmark to assess the capacity of the state-of-the-art models for causal physical reasoning in a synthetic 3D environment and propose a model for learning the physical dynamics in a counterfactual setting. | |
31 | Estimating counterfactual treatment outcomes over time through adversarially balanced representations | Ioana Bica, Ahmed M Alaa, James Jordon, Mihaela van der Schaar | In this paper, we introduce the Counterfactual Recurrent Network (CRN), a novel sequence-to-sequence model that leverages the increasingly available patient observational data to estimate treatment effects over time and answer such medical questions. | |
32 | Gradientless Descent: High-Dimensional Zeroth-Order Optimization | Daniel Golovin, John Karro, Greg Kochanski, Chansoo Lee, Xingyou Song, Qiuyi Zhang | Gradientless Descent is a provably efficient gradient-free algorithm that is monotone-invariant and fast for high-dimensional zero-th order optimization. | |
33 | Conditional Learning of Fair Representations | Han Zhao, Amanda Coston, Tameem Adel, Geoffrey J. Gordon | We propose a novel algorithm for learning fair representations that can simultaneously mitigate two notions of disparity among different demographic subgroups. | |
34 | Inductive Matrix Completion Based on Graph Neural Networks | Muhan Zhang, Yixin Chen | We propose an inductive matrix completion model without using side information. | |
35 | Duration-of-Stay Storage Assignment under Uncertainty | Michael Lingzhi Li, Elliott Wolf, Daniel Wintz | We develop a new storage assignment framework with a novel neural network that enables large efficiency gains in the warehouse. | code |
36 | Emergence of functional and structural properties of the head direction system by optimization of recurrent neural networks | Christopher J. Cueva, Peter Y. Wang, Matthew Chin, Xue-Xin Wei | Artificial neural networks trained with gradient descent are capable of recapitulating both realistic neural activity and the anatomical organization of a biological circuit. | |
37 | Deep neuroethology of a virtual rodent | Josh Merel, Diego Aldarondo, Jesse Marshall, Yuval Tassa, Greg Wayne, Bence Olveczky | We built a physical simulation of a rodent, trained it to solve a set of tasks, and analyzed the resulting networks. | |
38 | Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation | Ziyang Tang*, Yihao Feng*, Lihong Li, Dengyong Zhou, Qiang Liu | We develop a new doubly robust estimator based on the infinite horizon density ratio and off policy value estimation. | |
39 | Learning Compositional Koopman Operators for Model-Based Control | Yunzhu Li, Hao He, Jiajun Wu, Dina Katabi, Antonio Torralba | Learning compositional Koopman operators for efficient system identification and model-based control. | |
40 | CLEVRER: Collision Events for Video Representation and Reasoning | Kexin Yi, Chuang Gan, Yunzhu Li, Pushmeet Kohli, Jiajun Wu, Antonio Torralba, Joshua B. Tenenbaum | We present a diagnostic dataset for the systematic study of temporal and causal reasoning in videos. | |
41 | The Logical Expressiveness of Graph Neural Networks | Pablo Barceló, Egor V. Kostylev, Mikael Monet, Jorge Pérez, Juan Reutter, Juan Pablo Silva | We characterize the expressive power of GNNs in terms of classical logical languages, separating different GNNs and showing connections with standard notions in Knowledge Representation. | code |
42 | The Break-Even Point on the Optimization Trajectories of Deep Neural Networks | Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho*, Krzysztof Geras* | In the early phase of training of deep neural networks there exists a “break-even point” which determines properties of the entire optimization trajectory. | |
43 | ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut | A new pretraining method that establishes new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters compared to BERT-large. | |
44 | Disentangling neural mechanisms for perceptual grouping | Junkyung Kim, Drew Linsley, Kalpit Thakkar, Thomas Serre | Horizontal and top-down feedback connections are responsible for complementary perceptual grouping strategies in biological and recurrent vision systems. | code |
45 | Learning to Plan in High Dimensions via Neural Exploration-Exploitation Trees | Binghong Chen, Bo Dai, Qinjie Lin, Guo Ye, Han Liu, Le Song | We propose a meta path planning algorithm which exploits a novel attention-based neural module that can learn generalizable structures from prior experiences to drastically reduce the sample requirement for solving new path planning problems. | code |
46 | Symplectic Recurrent Neural Networks | Zhengdao Chen, Jianyu Zhang, Martin Arjovsky, Léon Bottou | We propose Symplectic Recurrent Neural Networks (SRNNs) as learning algorithms that capture the dynamics of physical systems from observed trajectories. | |
47 | Asymptotics of Wide Networks from Feynman Diagrams | Ethan Dyer, Guy Gur-Ari | A general method for computing the asymptotic behavior of wide networks using Feynman diagrams | |
48 | Learning The Difference That Makes A Difference With Counterfactually-Augmented Data | Divyansh Kaushik, Eduard Hovy, Zachary Lipton | Humans in the loop revise documents to accord with counterfactual labels, resulting resource helps to reduce reliance on spurious associations. | code |
49 | Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning? | Simon S. Du, Sham M. Kakade, Ruosong Wang, Lin F. Yang | Exponential lower bounds for value-based and policy-based reinforcement learning with function approximation. | |
50 | Simplified Action Decoder for Deep Multi-Agent Reinforcement Learning | Hengyuan Hu, Jakob N Foerster | We develop Simplified Action Decoder, a simple MARL algorithm that beats previous SOTA on Hanabi by a big margin across 2- to 5-player games. | code |
51 | Network Deconvolution | Chengxi Ye, Matthew Evanusa, Hua He, Anton Mitrokhin, Thomas Goldstein, James A. Yorke, Cornelia Fermuller, Yiannis Aloimonos | We propose a method called network deconvolution that resembles animal vision system to train convolution networks better. | code |
52 | Neural Symbolic Reader: Scalable Integration of Distributed and Symbolic Representations for Reading Comprehension | Xinyun Chen, Chen Liang, Adams Wei Yu, Denny Zhou, Dawn Song, Quoc Le | In this work, we propose the Neural Symbolic Reader (NeRd), which includes a reader, e.g., BERT, to encode the passage and question, and a programmer, e.g., LSTM, to generate a program that is executed to produce the answer. | |
53 | Real or Not Real, that is the Question | Yuanbo Xiangli*, Yubin Deng*, Bo Dai*, Chen Change Loy, Dahua Lin | While generative adversarial networks (GANs) have been widely adopted in various topics, in this paper we generalize the standard GAN to a new perspective by treating realness as a random variable that can be estimated from multiple angles. | |
54 | Dream to Control: Learning Behaviors by Latent Imagination | Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi | We present Dreamer, an agent that learns long-horizon behaviors purely by latent imagination using analytic value gradients. | |
55 | A Probabilistic Formulation of Unsupervised Text Style Transfer | Junxian He, Xinyi Wang, Graham Neubig, Taylor Berg-Kirkpatrick | We formulate a probabilistic latent sequence model to tackle unsupervised text style transfer, and show its effectiveness across a suite of unsupervised text style transfer tasks. | |
56 | Emergent Tool Use From Multi-Agent Autocurricula | Bowen Baker, Ingmar Kanitscheider, Todor Markov, Yi Wu, Glenn Powell, Bob McGrew, Igor Mordatch | Through multi-agent competition, the simple objective of hide-and-seek, and standard reinforcement learning algorithms at scale, we find that agents create a self-supervised autocurriculum inducing multiple distinct rounds of emergent strategy, many of which require sophisticated tool use and coordination. | |
57 | NAS-Bench-102: Extending the Scope of Reproducible Neural Architecture Search | Xuanyi Dong, Yi Yang | A NAS benchmark applicable to almost any NAS algorithm. | code |
58 | Strategies for Pre-training Graph Neural Networks | Weihua Hu*, Bowen Liu*, Joseph Gomes, Marinka Zitnik, Percy Liang, Vijay Pande, Jure Leskovec | We develop a strategy for pre-training Graph Neural Networks (GNNs) and systematically study its effectiveness on multiple datasets, GNN architectures, and diverse downstream tasks. | |
59 | Behaviour Suite for Reinforcement Learning | Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepezvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, Hado Van Hasselt | Bsuite is a collection of carefully-designed experiments that investigate the core capabilities of RL agents. | code |
60 | FreeLB: Enhanced Adversarial Training for Language Understanding | Chen Zhu, Yu Cheng, Zhe Gan, Siqi Sun, Thomas Goldstein | In this work, we propose a novel adversarial training algorithm – FreeLB, that promotes higher robustness and invariance in the embedding space, by adding adversarial perturbations to word embeddings and minimizing the resultant adversarial risk inside different regions around input samples. | |
61 | Kernelized Wasserstein Natural Gradient | M Arbel, A Gretton, W Li, G Montufar | Estimator for the Wasserstein natural gradient | |
62 | And the Bit Goes Down: Revisiting the Quantization of Neural Networks | Pierre Stock, Armand Joulin, Rémi Gribonval, Benjamin Graham, Hervé Jégou | Using a structured quantization technique aiming at better in-domain reconstruction to compress convolutional neural networks | code |
63 | A Latent Morphology Model for Open-Vocabulary Neural Machine Translation | Duygu Ataman, Wilker Aziz, Alexandra Birch | In this paper, we propose to translate words by modeling word formation through a hierarchical latent variable model which mimics the process of morphological inflection. | |
64 | Understanding Why Neural Networks Generalize Well Through GSNR of Parameters | Jinlong Liu, Yunzhi Bai, Guoqing Jiang, Ting Chen, Huayan Wang | In this paper, we provide a novel perspective on these issues using the gradient signal to noise ratio (GSNR) of parameters during training process of DNNs. | |
65 | Model Based Reinforcement Learning for Atari | Lukasz Kaiser, Mohammad Babaeizadeh, Piotr Milos, Blazej Osinski, Roy H Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George Tucker, Henryk Michalewski | We use video prediction models, a model-based reinforcement learning algorithm and 2h of gameplay per game to train agents for 26 Atari games. | code |
66 | Disagreement-Regularized Imitation Learning | Kiante Brantley, Wen Sun, Mikael Henaff | Method for addressing covariate shift in imitation learning using ensemble uncertainty | |
67 | Stable Rank Normalization for Improved Generalization in Neural Networks and GANs | Amartya Sanyal, Philip H. Torr, Puneet K. Dokania | We propose Stable Rank Normalisation, a new regulariser based on recent generalization bounds, and show how to optimize it, with extensive experiments. | |
68 | Measuring the Reliability of Reinforcement Learning Algorithms | Stephanie C.Y. Chan, Anoop Korattikara, Sam Fishman, John Canny, Sergio Guadarrama | A novel set of metrics for measuring reliability of reinforcement learning algorithms (+ accompanying statistical tests) | |
69 | Sequential Latent Knowledge Selection for Knowledge-Grounded Dialogue | Byeongchang Kim, Jaewoo Ahn, Gunhee Kim | Our approach is the first attempt to leverage a sequential latent variable model for knowledge selection in multi-turn knowledge-grounded dialogue. It achieves new state-of-the-art performance on the Wizard of Wikipedia benchmark. | |
70 | Neural Tangents: Fast and Easy Infinite Neural Networks in Python | Roman Novak, Lechao Xiao, Jiri Hron, Jaehoon Lee, Jascha Sohl-Dickstein, Samuel S. Schoenholz | Keras for infinite neural networks. | code |
71 | Self-labelling via simultaneous clustering and representation learning | Asano YM., Rupprecht C., Vedaldi A. | We propose a self-supervised learning formulation that simultaneously learns feature representations and useful dataset labels by optimizing the common cross-entropy loss for features _and_ labels, while maximizing information. | |
72 | The intriguing role of module criticality in the generalization of deep networks | Niladri Chatterji, Behnam Neyshabur, Hanie Sedghi | We study the phenomenon that some modules of DNNs are more \emph{critical} than others. Our analysis leads us to propose a complexity measure, that is able to explain the superior generalization performance of some architectures over others. | |
73 | Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks | Sanjeev Arora, Simon S. Du, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang, Dingli Yu | We verify neural tangent kernel is powerful on small data via experiments on UCI datasets, small CIFAR 10 and low-shot learning on VOC07. | |
74 | Differentiation of Blackbox Combinatorial Solvers | Marin Vlastelica Pogancic, Anselm Paulus, Vit Musil, Georg Martius, Michal Rolinek | In this work, we present a method that implements an efficient backward pass through blackbox implementations of combinatorial solvers with linear objective functions. | code |
75 | Scaling Autoregressive Video Models | Dirk Weissenborn, Oscar Täckström, Jakob Uszkoreit | We present a novel autoregressive video generation model that achieves strong results on popular datasets and produces encouraging continuations of real-world videos. | |
76 | The Ingredients of Real World Robotic Reinforcement Learning | Henry Zhu, Justin Yu, Abhishek Gupta, Dhruv Shah, Kristian Hartikainen, Avi Singh, Vikash Kumar, Sergey Levine | System to learn robotic tasks in the real world with reinforcement learning without instrumentation | |
77 | Meta-Learning Acquisition Functions for Transfer Learning in Bayesian Optimization | Michael Volpp, Lukas Froehlich, Kirsten Fischer, Andreas Doerr, Stefan Falkner, Frank Hutter, Christian Daniel | We perform efficient and flexible transfer learning in the framework of Bayesian optimization through meta-learned neural acquisition functions. | code |
78 | Maximum Likelihood Constraint Inference for Inverse Reinforcement Learning | Dexter R.R. Scobee, S. Shankar Sastry | Our method infers constraints on task execution by leveraging the principle of maximum entropy to quantify how demonstrations differ from expected, un-constrained behavior. | code |
79 | Spectral Embedding of Regularized Block Models | Nathan De Lara, Thomas Bonald | Graph regularization forces spectral embedding to focus on the largest clusters, making the representation less sensitive to noise. | code |
80 | Towards Hierarchical Importance Attribution: Explaining Compositional Semantics for Neural Sequence Models | Xisen Jin, Junyi Du, Zhongyu Wei, Xiangyang Xue, Xiang Ren | We propose a measure of phrase importance and algorithms for hierarchical explanation of neural sequence model predictions. | |
81 | word2ket: Space-efficient Word Embeddings inspired by Quantum Entanglement | Aliakbar Panahi, Seyran Saeedi, Tom Arodz | We use ideas from quantum computing to propose word embeddings that use far fewer trainable parameters. | |
82 | What Can Neural Networks Reason About? | Keyulu Xu, Jingling Li, Mozhi Zhang, Simon S. Du, Ken-ichi Kawarabayashi, Stefanie Jegelka | We develop a theoretical framework to characterize which reasoning tasks a neural network can learn well. | code |
83 | Training individually fair ML models with sensitive subspace robustness | Mikhail Yurochkin, Amanda Bower, Yuekai Sun | Algorithm for training individually fair classifier using adversarial robustness | code |
84 | Learning from Rules Generalizing Labeled Exemplars | Abhijeet Awasthi, Sabyasachi Ghosh, Rasna Goyal, Sunita Sarawagi | Coupled rule-exemplar supervision and an implication loss help to jointly learn to denoise rules and imply labels. | code |
85 | Directional Message Passing for Molecular Graphs | Johannes Klicpera, Janek Groß, Stephan Günnemann | Directional message passing incorporates spatial directional information to improve graph neural networks. | code |
86 | Explanation by Progressive Exaggeration | Sumedha Singla, Brian Pollack, Junxiang Chen, Kayhan Batmanghelich | A method to explain a classifier, by generating visual perturbation of an image by exaggerating or diminishing the semantic features that the classifier associates with a target label. | |
87 | Compression based bound for non-compressed network: unified generalization error analysis of large compressible deep neural network | Taiji Suzuki | In this paper, we give a unified framework that can convert compression-based bounds to those for non-compressed original networks. | |
88 | At Stability’s Edge: How to Adjust Hyperparameters to Preserve Minima Selection in Asynchronous Training of Neural Networks? | Niv Giladi, Mor Shpigel Nacson, Elad Hoffer, Daniel Soudry | How to prevent stale gradients (in asynchronous SGD) from changing minima stability and degrade steady state generalization? | code |
89 | Disentanglement through Nonlinear ICA with General Incompressible-flow Networks (GIN) | Peter Sorrenson, Ullrich Köthe | Recent breakthrough work by Khemakhem et al. (2019) on nonlinear ICA has answered this question for a broad class of conditional generative processes. We extend this important result in a direction relevant for application to real-world data. | |
90 | Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps | Tri Dao, Nimit Sohoni, Albert Gu, Matthew Eichhorn, Amit Blonder, Megan Leszczynski, Atri Rudra | We propose a differentiable family of “kaleidoscope matrices,” prove that all structured matrices can be represented in this form, and use them to replace hand-crafted linear maps in deep learning models. | code |
91 | Improving Generalization in Meta Reinforcement Learning using Neural Objectives | Louis Kirsch, Sjoerd van Steenkiste, Juergen Schmidhuber | We introduce MetaGenRL, a novel meta reinforcement learning algorithm. Unlike prior work, MetaGenRL can generalize to new environments that are entirely different from those used for meta-training. | |
92 | Drawing Early-Bird Tickets: Toward More Efficient Training of Deep Networks | Haoran You, Chaojian Li, Pengfei Xu, Yonggan Fu, Yue Wang, Xiaohan Chen, Yingyan Lin, Zhangyang Wang, Richard G. Baraniuk | In this paper, we discover for the first time that the winning tickets can be identified at the very early training stage, which we term as early-bird (EB) tickets, via low-cost training schemes (e.g., early stopping and low-precision training) at large learning rates. | |
93 | Truth or backpropaganda? An empirical investigation of deep learning theory | Micah Goldblum, Jonas Geiping, Avi Schwarzschild, Michael Moeller, Tom Goldstein | We study the prevalence of local minima in loss landscapes, whether small-norm parameter vectors generalize better (and whether this explains the advantages of weight decay), whether wide-network theories (like the neural tangent kernel) describe the behaviors of classifiers, and whether the rank of weight matrices can be linked to generalization and robustness in real-world networks. | |
94 | Neural Arithmetic Units | Andreas Madsen, Alexander Rosenberg Johansen | We present two new neural network components: the Neural Addition Unit (NAU), which can learn to add and subtract; and Neural Multiplication Unit (NMU) that can multiply subsets of a vector. | code |
95 | DeepSphere: a graph-based spherical CNN | Michaël Defferrard, Martino Milani, Frédérick Gusset, Nathanaël Perraudin | A graph-based spherical CNN that strikes an interesting balance of trade-offs for a wide variety of applications. | |
96 | SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models | Yucen Luo, Alex Beatson, Mohammad Norouzi, Jun Zhu, David Duvenaud, Ryan P. Adams, Ricky T. Q. Chen | We create an unbiased estimator for the log probability of latent variable models, extending such models to a larger scope of applications. | |
97 | Deep Learning For Symbolic Mathematics | Guillaume Lample, François Charton | We train a neural network to compute function integrals, and to solve complex differential equations. | |
98 | Making Sense of Reinforcement Learning and Probabilistic Inference | Brendan O'Donoghue, Ian Osband, Catalin Ionescu | Popular algorithms that cast "RL as Inference" ignore the role of uncertainty and exploration. We highlight the importance of these issues and present a coherent framework for RL and inference that handles them gracefully. | |
99 | Unbiased Contrastive Divergence Algorithm for Training Energy-Based Latent Variable Models | Yixuan Qiu, Lingsong Zhang, Xiao Wang | We have developed a new training algorithm for energy-based latent variable models that completely removes the bias of contrastive divergence. | |
100 | A Mutual Information Maximization Perspective of Language Representation Learning | Lingpeng Kong, Cyprien de Masson d’Autume, Lei Yu, Wang Ling, Zihang Dai, Dani Yogatama | We provide an example by drawing inspirations from related methods based on mutual information maximization that have been successful in computer vision, and introduce a simple self-supervised objective that maximizes the mutual information between a global sentence representation and n-grams in the sentence. | |
101 | Energy-based models for atomic-resolution protein conformations | Yilun Du, Joshua Meier, Jerry Ma, Rob Fergus, Alexander Rives | Energy-based models trained on crystallized protein structures predict native side chain configuration and automatically discover molecular energy features. | |
102 | Depth-Width Trade-offs for ReLU Networks via Sharkovsky's Theorem | Vaggos Chatziafratis, Sai Ganesh Nagarajan, Ioannis Panageas, Xiao Wang | In this work, we point to a new connection between DNN expressivity and Sharkovsky's Theorem from dynamical systems, which enables us to characterize the depth-width trade-offs of ReLU networks. | code |
103 | Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint | Jimmy Ba, Murat Erdogdu, Taiji Suzuki, Denny Wu, Tianzong Zhang | Derived population risk of two-layer neural networks in high dimensions and examined presence / absence of “double descent”. | |
104 | Reconstructing continuous distributions of 3D protein structure from cryo-EM images | Ellen D. Zhong, Tristan Bepler, Joseph H. Davis, Bonnie Berger | We propose a deep generative model of volumes for 3D cryo-EM reconstruction from unlabelled 2D images and show that it can learn continuous deformations in protein structure. | |
105 | PROGRESSIVE LEARNING AND DISENTANGLEMENT OF HIERARCHICAL REPRESENTATIONS | Zhiyuan Li, Jaideep Vitthal Murkute, Prashnna Kumar Gyawali, Linwei Wang | We propose a progressive learning method to improve the learning and disentanglement of latent representations at different levels of abstraction. | |
106 | AN EXPONENTIAL LEARNING RATE SCHEDULE FOR BATCH NORMALIZED NETWORKS | Zhiyuan Li, Sanjeev Arora | We propose an exponential learning rate schedule for networks with BatchNorm, which surprisingly performs well in practice and is provably equivalent to popular LR schedules like Step Decay. | |
107 | Geom-GCN: Geometric Graph Convolutional Networks | Hongbin Pei, Bingzhe Wei, Kevin Chen-Chuan Chang, Yu Lei, Bo Yang | From the observations on classical neural network and network geometry, we propose a novel geometric aggregation scheme for graph neural networks to overcome the two weaknesses. | code |
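For row 18 above (Xiao et al.), the defense hinges on a k-winners-take-all activation: keep the k largest activations in each vector and zero out the rest. Below is a minimal sketch; the function name `kwta` and the tie-handling rule are our choices for illustration, and the paper's actual implementation (e.g., parameterizing sparsity as a ratio of retained units) may differ.

```python
import torch

def kwta(x: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k largest entries along the last dimension, zero the rest.

    Note: if several entries tie with the k-th largest value, all of them
    survive, so slightly more than k entries may remain nonzero.
    """
    threshold = x.topk(k, dim=-1).values[..., -1:]  # k-th largest per row
    return torch.where(x >= threshold, x, torch.zeros_like(x))

activations = torch.randn(4, 16)
sparse = kwta(activations, k=4)  # roughly 12 of 16 entries per row are zeroed
```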
TABLE 3: ICLR 2020 Posters
# | Title | Authors | Highlight | Code |
---|---|---|---|---|
1 | Large Batch Optimization for Deep Learning: Training BERT in 76 minutes | Yang You, Jing Li, Sashank Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, James Demmel, Kurt Keutzer, Cho-Jui Hsieh | A fast optimizer for general applications and large-batch training. | |
2 | SELF: Learning to Filter Noisy Labels with Self-Ensembling | Duc Tam Nguyen, Chaithanya Kumar Mummadi, Thi Phuong Nhung Ngo, Thi Hoai Phuong Nguyen, Laura Beggel, Thomas Brox | We propose a self-ensemble framework to train more robust deep learning models under noisy labeled datasets. | |
3 | Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation | Yu Chen, Lingfei Wu, Mohammed J. Zaki | To address these limitations, in this paper, we propose a reinforcement learning (RL) based graph-to sequence (Graph2Seq) model for QG. | |
4 | Sharing Knowledge in Multi-Task Deep Reinforcement Learning | Carlo D’Eramo, Davide Tateo, Andrea Bonarini, Marcello Restelli, Jan Peters | A study on the benefit of sharing representation in Multi-Task Reinforcement Learning. | code |
5 | On the Weaknesses of Reinforcement Learning for Neural Machine Translation | Leshem Choshen, Lior Fox, Zohar Aizenbud, Omri Abend | Performance gains from reinforcement learning practices in machine translation might not come from better predictions. | |
6 | StructPool: Structured Graph Pooling via Conditional Random Fields | Hao Yuan, Shuiwang Ji | A novel graph pooling method considering relationships between different nodes via conditional random fields. | |
7 | Learning deep graph matching with channel-independent embedding and Hungarian attention | Tianshu Yu, Runzhong Wang, Junchi Yan, Baoxin Li | We proposed a deep graph matching method with novel channel-independent embedding and Hungarian loss, which achieved state-of-the-art performance. | |
8 | Graph inference learning for semi-supervised classification | Chunyan Xu, Zhen Cui, Xiaobin Hong, Tong Zhang, Jian Yang, Wei Liu | We propose a novel graph inference learning framework by building structure relations to infer unknown node labels from those labeled nodes in an end-to-end way. | |
9 | SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards | Siddharth Reddy, Anca D. Dragan, Sergey Levine | A simple and effective alternative to adversarial imitation learning: initialize the experience replay buffer with demonstrations, set their reward to +1, set the reward for all other data to 0, and run Q-learning or soft actor-critic to train (a minimal sketch of this relabeling follows this table). | |
10 | Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data | Sergei Popov, Stanislav Morozov, Artem Babenko | We propose a new DNN architecture for deep learning on tabular data | code |
11 | Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification | Yixiao Ge, Dapeng Chen, Hongsheng Li | A framework that conducts online refinement of pseudo labels with a novel soft softmax-triplet loss for unsupervised domain adaptation on person re-identification. | code |
12 | Automatically Discovering and Learning New Visual Categories with Ranking Statistics | Kai Han, Sylvestre-Alvise Rebuffi, Sebastien Ehrhardt, Andrea Vedaldi, Andrew Zisserman | A method to automatically discover new categories in unlabelled data, by effectively transferring knowledge from labelled data of other different categories using feature rank statistics. | |
13 | Maxmin Q-learning: Controlling the Estimation Bias of Q-learning | Qingfeng Lan, Yangchen Pan, Alona Fyshe, Martha White | We propose a new variant of Q-learning algorithm called Maxmin Q-learning which provides a parameter-tuning mechanism to flexibly control bias. | |
14 | Federated Adversarial Domain Adaptation | Xingchao Peng, Zijun Huang, Yizhe Zhu, Kate Saenko | We present a principled approach to the problem of federated domain adaptation, which aims to align the representations learned among the different nodes with the data distribution of the target node. | code |
15 | Depth-Adaptive Transformer | Maha Elbayad, Jiatao Gu, Edouard Grave, Michael Auli | Sequence model that dynamically adjusts the amount of computation for each input. | |
16 | DeepHoyer: Learning Sparser Neural Network with Differentiable Scale-Invariant Sparsity Measures | Huanrui Yang, Wei Wen, Hai Li | We propose almost everywhere differentiable and scale invariant regularizers for DNN pruning, which can lead to supremum sparsity through standard SGD training. | |
17 | Evaluating The Search Phase of Neural Architecture Search | Kaicheng Yu, Christian Sciuto, Martin Jaggi, Claudiu Musat, Mathieu Salzmann | We empirically disprove a fundamental hypothesis of the widely-adopted weight-sharing strategy in neural architecture search and explain why state-of-the-art NAS algorithms perform similarly to random search. | |
18 | Diverse Trajectory Forecasting with Determinantal Point Processes | Ye Yuan, Kris M. Kitani | We learn a diversity sampling function with DPPs to obtain a diverse set of samples from a generative model. | |
19 | Prox-SGD: Training Structured Neural Networks under Regularization and Constraints | Yang Yang, Yaxiong Yuan, Avraam Chatzimichailidis, Ruud JG van Sloun, Lei Lei, Symeon Chatzinotas | We propose a convergent proximal-type stochastic gradient descent algorithm for constrained nonsmooth nonconvex optimization problems | |
20 | LAMAL: LAnguage Modeling Is All You Need for Lifelong Language Learning | Fan-Keng Sun, Cheng-Hao Ho, Hung-Yi Lee | Language modeling is all you need for lifelong language learning. | code |
21 | Learning Expensive Coordination: An Event-Based Deep RL Approach | Zhenyu Shi, Runsheng Yu, Xinrun Wang, Rundong Wang, Youzhi Zhang, Hanjiang Lai, Bo An | We propose an event-based policy gradient to train the leader and an action abstraction policy gradient to train the followers in leader-follower Markov game. | |
22 | Curvature Graph Network | Ze Ye, Kin Sum Liu, Tengfei Ma, Jie Gao, Chao Chen | We propose a novel network architecture that incorporates advanced graph structural features. | |
23 | Distance-Based Learning from Errors for Confidence Calibration | Chen Xing, Sercan Arik, Zizhao Zhang, Tomas Pfister | To improve confidence calibration of DNNs, we propose a novel training method, distance-based learning from errors (DBLE). | code |
24 | Deep Learning of Determinantal Point Processes via Proper Spectral Sub-gradient | Tianshu Yu, Yikang Li, Baoxin Li | We proposed a specific back-propagation method via proper spectral sub-gradient to integrate determinantal point process to deep learning framework. | |
25 | N-BEATS: Neural basis expansion analysis for interpretable time series forecasting | Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, Yoshua Bengio | A novel deep interpretable architecture that achieves state of the art on three large scale univariate time series forecasting datasets | |
26 | Automated Relational Meta-learning | Huaxiu Yao, Xian Wu, Zhiqiang Tao, Yaliang Li, Bolin Ding, Zhenhui Li | Addressing task heterogeneity problem in meta-learning by introducing meta-knowledge graph | |
27 | To Relieve Your Headache of Training an MRF, Take AdVIL | Chongxuan Li, Chao Du, Kun Xu, Max Welling, Jun Zhu, Bo Zhang | We propose a black-box algorithm called AdVIL to perform inference and learning on a general Markov random field. | code |
28 | Linear Symmetric Quantization of Neural Networks for Low-precision Integer Hardware | Xiandong Zhao, Ying Wang, Xuyi Cai, Cheng Liu, Lei Zhang | We introduce an efficient quantization process that allows for performance acceleration on specialized integer-only neural network accelerator. | code |
29 | Weakly Supervised Clustering by Exploiting Unique Class Count | Mustafa Umit Oner, Hwee Kuan Lee, Wing-Kin Sung | A weakly supervised learning-based clustering framework that performs comparably to fully supervised learning models by exploiting unique class count. | code |
30 | Scalable and Order-robust Continual Learning with Additive Parameter Decomposition | Jaehong Yoon, Saehoon Kim, Eunho Yang, Sung Ju Hwang | To tackle these practical challenges, we propose a novel continual learning method that is scalable as well as order-robust, which instead of learning a completely shared set of weights, represents the parameters for each task as a sum of task-shared and sparse task-adaptive parameters. | code |
31 | Continual Learning with Adaptive Weights (CLAW) | Tameem Adel, Han Zhao, Richard E. Turner | A continual learning framework which learns to automatically adapt its architecture based on a proposed variational inference algorithm. | |
32 | Transferable Perturbations of Deep Feature Distributions | Nathan Inkawhich, Kevin Liang, Lawrence Carin, Yiran Chen | We show that perturbations based on intermediate feature distributions yield more transferable adversarial examples and allow for analysis of the effects of adversarial perturbations on intermediate representations. | |
33 | A Learning-based Iterative Method for Solving Vehicle Routing Problems | Hao Lu, Xingwen Zhang, Shuang Yang | In this paper, we present the first learning based approach for CVRP that is efficient in solving speed and at the same time outperforms OR methods. | |
34 | Poly-encoders: Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring | Samuel Humeau, Kurt Shuster, Marie-Anne Lachaux, Jason Weston | In this work, we develop a new transformer architecture, the Poly-encoder, that learns global rather than token level self-attention features. | |
35 | AutoQ: Automated Kernel-Wise Neural Network Quantization | Qian Lou, Feng Guo, Minje Kim, Lantao Liu, Lei Jiang | Accurate, Fast and Automated Kernel-Wise Neural Network Quantization with Mixed Precision using Hierarchical Deep Reinforcement Learning | |
36 | Understanding Architectures Learnt by Cell-based Neural Architecture Search | Yao Shu, Wei Wang, Shaofeng Cai | We empirically and theoretically show that the common connection pattern contributes to a smooth loss landscape and more accurate gradient information, and therefore fast convergence. | |
37 | SVQN: Sequential Variational Soft Q-Learning Networks | Shiyu Huang, Hang Su, Jun Zhu, Ting Chen | SVQNs formalize the inference of hidden states and maximum-entropy reinforcement learning under a unified graphical model and optimize the two modules jointly. | |
38 | Ranking Policy Gradient | Kaixiang Lin, Jiayu Zhou | We propose ranking policy gradient, which learns the optimal rank of actions to maximize return, and a general off-policy learning framework with the properties of optimality preservation, variance reduction, and sample efficiency. | |
39 | On Mutual Information Maximization for Representation Learning | Michael Tschannen, Josip Djolonga, Paul K. Rubenstein, Sylvain Gelly, Mario Lucic | The success of recent mutual information (MI)-based representation learning approaches strongly depends on the inductive bias in both the choice of network architectures and the parametrization of the employed MI estimators. | code |
40 | Observational Overfitting in Reinforcement Learning | Xingyou Song, Yiding Jiang, Yilun Du, Behnam Neyshabur | We isolate one factor of RL generalization by analyzing the case when the agent only overfits to the observations. We show that architectural implicit regularizations occur in this regime. | |
41 | Enhancing Transformation-Based Defenses Against Adversarial Attacks with a Distribution Classifier | Connie Kou, Hwee Kuan Lee, Teck Khim Ng, Ee-Chien Chang | We enhance existing transformation-based defenses by using a distribution classifier on the distribution of softmax outputs obtained from transformed images. | |
42 | Additive Powers-of-Two Quantization: A Non-uniform Discretization for Neural Networks | Yuhang Li, Xin Dong, Wei Wang | We propose Additive Powers-of-Two (APoT) quantization, an efficient non-uniform quantization scheme that attends to the bell-shaped and long-tailed distribution of weights in neural networks. | |
43 | Lazy-CFR: fast and near-optimal regret minimization for extensive games with imperfect information | Yichi Zhou, Tongzheng Ren, Jialian Li, Dong Yan, Jun Zhu | In this paper, we present Lazy-CFR, a CFR algorithm that adopts a lazy update strategy to avoid traversing the whole game tree in each round. | |
44 | Knowledge Consistency between Neural Networks and Beyond | Ruofan Liang, Tianlin Li, Longfei Li, Quanshi Zhang | This paper aims to analyze knowledge consistency between pre-trained deep neural networks. | |
45 | Image-guided Neural Object Rendering | Justus Thies, Michael Zollhöfer, Christian Theobalt, Marc Stamminger, Matthias Nießner | We propose a learned image-guided rendering technique that combines the benefits of image-based rendering and GAN-based image synthesis while considering view-dependent effects. | |
46 | Implicit Bias of Gradient Descent based Adversarial Training on Separable Data | Yan Li, Ethan X. Fang, Huan Xu, Tuo Zhao | The solution of gradient descent based adversarial training converges in direction to a robust max-margin solution adapted to the adversary's geometry; using L2 perturbations also shows a significant speed-up in convergence compared to clean training. | |
47 | TabFact: A Large-scale Dataset for Table-based Fact Verification | Wenhu Chen, Hongmin Wang, Jianshu Chen, Yunkai Zhang, Hong Wang, Shiyang Li, Xiyou Zhou, William Yang Wang | We propose a new dataset to investigate the entailment problem with a semi-structured table as the premise | |
48 | ES-MAML: Simple Hessian-Free Meta Learning | Xingyou Song, Wenbo Gao, Yuxiang Yang, Krzysztof Choromanski, Aldo Pacchiano, Yunhao Tang | We provide a new framework for MAML in the ES/blackbox setting, and show that it allows deterministic and linear policies, better exploration, and non-differentiable adaptation operators. | |
49 | Neural Stored-program Memory | Hung Le, Truyen Tran, Svetha Venkatesh | A neural simulation of a Universal Turing Machine | |
50 | Hierarchical Foresight: Self-Supervised Learning of Long-Horizon Tasks via Visual Subgoal Generation | Suraj Nair, Chelsea Finn | Hierarchical visual foresight learns to generate visual subgoals that break down long-horizon tasks into subtasks, using only self-supervision. | code |
51 | Multi-agent Reinforcement Learning for Networked System Control | Tianshu Chu, Sandeep Chinchali, Sachin Katti | This paper proposes a new formulation and a new communication protocol for networked multi-agent control problems | code |
52 | FSPool: Learning Set Representations with Featurewise Sort Pooling | Yan Zhang, Jonathon Hare, Adam Prügel-Bennett | Sort in the encoder and undo the sorting in the decoder to avoid the responsibility problem in set auto-encoders | code |
53 | Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction | Taeuk Kim, Jihun Choi, Daniel Edmiston, Sang-goo Lee | In line with such interest, we propose a novel method that assists us in investigating the extent to which pre-trained LMs capture the syntactic notion of constituency. | |
54 | Dynamically Pruned Message Passing Networks for Large-scale Knowledge Graph Reasoning | Xiaoran Xu, Wei Feng, Yunsheng Jiang, Xiaohui Xie, Zhiqing Sun, Zhi-Hong Deng | We propose to learn an input-dependent subgraph, dynamically and selectively expanded, to explicitly model a sequential reasoning process. | code |
55 | Mixup Inference: Better Exploiting Mixup to Defend Adversarial Attacks | Tianyu Pang*, Kun Xu*, Jun Zhu | We exploit the global linearity of the mixup-trained models in inference to break the locality of the adversarial perturbations. | code |
56 | Theory and Evaluation Metrics for Learning Disentangled Representations | Kien Do, Truyen Tran | We make two theoretical contributions to disentanglement learning by (a) defining precise semantics of disentangled representations, and (b) establishing robust metrics for evaluation. | |
57 | Measuring Compositional Generalization: A Comprehensive Method on Realistic Data | Daniel Keysers, Nathanael Schärli, Nathan Scales, Hylke Buisman, Daniel Furrer, Sergii Kashubin, Nikola Momchev, Danila Sinopalnikov, Lukasz Stafiniak, Tibor Tihon, Dmitry Tsarkov, Xiao Wang, Marc van Zee, Olivier Bousquet | Benchmark and method to measure compositional generalization by maximizing divergence of compound frequency at small divergence of atom frequency. | code |
58 | Rethinking Softmax Cross-Entropy Loss for Adversarial Robustness | Tianyu Pang, Kun Xu, Yinpeng Dong, Chao Du, Ning Chen, Jun Zhu | Applying the softmax function in training leads to indirect and unexpected supervision on features. We propose a new training objective to explicitly induce dense feature regions for locally sufficient samples to benefit adversarial robustness. | code |
59 | The Implicit Bias of Depth: How Incremental Learning Drives Generalization | Daniel Gissin, Shai Shalev-Shwartz, Amit Daniely | We study the sparsity-inducing bias of deep models, caused by their learning dynamics. | |
60 | The Variational Bandwidth Bottleneck: Stochastic Evaluation on an Information Budget | Anirudh Goyal, Yoshua Bengio, Matthew Botvinick, Sergey Levine | Training agents with adaptive computation based on information bottleneck can promote generalization. | |
61 | Learning the Arrow of Time for Problems in Reinforcement Learning | Nasim Rahaman, Steffen Wolf, Anirudh Goyal, Roman Remme, Yoshua Bengio | We learn the arrow of time for MDPs and use it to measure reachability, detect side-effects and obtain a curiosity reward signal. | code |
62 | Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives | Anirudh Goyal, Shagun Sodhani, Jonathan Binas, Xue Bin Peng, Sergey Levine, Yoshua Bengio | Learning an implicit master policy, as an explicit master policy in HRL can fail to generalize. | |
63 | Robust Local Features for Improving the Generalization of Adversarial Training | Chuanbiao Song, Kun He, Jiadong Lin, Liwei Wang, John E. Hopcroft | We propose a new adversarial training approach, Robust Local Features for Adversarial Training (RLFAT), that significantly improves both adversarially robust generalization and standard generalization. | code |
64 | Analysis of Video Feature Learning in Two-Stream CNNs on the Example of Zebrafish Swim Bout Classification | Bennet Breier, Arno Onken | We demonstrate the utility of a recent AI explainability technique by visualizing the learned features of a CNN trained on binary classification of zebrafish movements. | code |
65 | Learning Disentangled Representations for CounterFactual Regression | Negar Hassanpour, Russell Greiner | This paper conceptualizes this line of thought and provides a path to explore it further. We propose an algorithm to (1) identify disentangled representations of the underlying factors from any given observational dataset D and (2) leverage this knowledge to reduce, as well as account for, the negative impact of selection bias on estimating treatment effects from D. | code |
66 | Exploration in Reinforcement Learning with Deep Covering Options | Yuu Jinnai, Jee Won Park, Marlos C. Machado, George Konidaris | We introduce a method to automatically discover task-agnostic options that encourage exploration for reinforcement learning. | |
67 | AE-OT: A New Generative Model Based on Extended Semi-discrete Optimal Transport | Dongsheng An, Yang Guo, Na Lei, Zhongxuan Luo, Shing-Tung Yau, Xianfeng Gu | In this work, we give a theoretical explanation of both problems via Figalli's regularity theory of optimal transportation maps. | |
68 | Logic and the 2-Simplicial Transformer | James Clift, Dmitry Doryn, Daniel Murfet, James Wallbridge | We introduce the 2-simplicial Transformer and show that this architecture is a useful inductive bias for logical reasoning in the context of deep reinforcement learning. | code |
69 | Watch, Try, Learn: Meta-Learning from Demonstrations and Rewards | Allan Zhou, Eric Jang, Daniel Kappler, Alex Herzog, Mohi Khansari, Paul Wohlhart, Yunfei Bai, Mrinal Kalakrishnan, Sergey Levine, Chelsea Finn | In this work, we propose a method that can learn to learn from both demonstrations and trial-and-error experience with sparse reward feedback. | code |
70 | Fooling Detection Alone is Not Enough: Adversarial Attack against Multiple Object Tracking | Yunhan Jia, Yantao Lu, Junjie Shen, Qi Alfred Chen, Hao Chen, Zhenyu Zhong, Tao Wei | We study the adversarial machine learning attacks against the Multiple Object Tracking mechanisms for the first time. | code |
71 | DivideMix: Learning with Noisy Labels as Semi-supervised Learning | Junnan Li, Steven C.H. Hoi, Richard Socher | We propose a novel framework for learning with noisy labels by leveraging semi-supervised learning. | |
72 | Improving Adversarial Robustness Requires Revisiting Misclassified Examples | Yisen Wang, Difan Zou, Jinfeng Yi, James Bailey, Xingjun Ma, Quanquan Gu | By differentiating misclassified and correctly classified data, we propose a new misclassification aware defense that improves the state-of-the-art adversarial robustness. | |
73 | V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control | H. Francis Song, Abbas Abdolmaleki, Jost Tobias Springenberg, Aidan Clark, Hubert Soyer, Jack W. Rae, Seb Noury, Arun Ahuja, Siqi Liu, Dhruva Tirumala, Nicolas Heess, Dan Belov, Martin Riedmiller, Matthew M. Botvinick | A state-value function-based version of MPO that achieves good results in a wide range of tasks in discrete and continuous control. | |
74 | Attributes Obfuscation with Complex-Valued Features | Liyao Xiang, Hao Zhang, Haotian Ma, Yifan Zhang, Jie Ren, Quanshi Zhang | We propose a generic method to revise a conventional neural network so that adversarially inferring the input becomes harder while the network still yields useful outputs. | |
75 | Accelerating SGD with momentum for over-parameterized learning | Chaoyue Liu, Mikhail Belkin | This work proves the non-acceleration of Nesterov SGD with any hyper-parameters, and proposes a new algorithm that provably accelerates SGD in the over-parameterized setting. | code |
76 | A critical analysis of self-supervision, or what we can learn from a single image | Asano YM., Rupprecht C., Vedaldi A. | We evaluate self-supervised feature learning methods and find that with sufficient data augmentation early layers can be learned using just one image. This is informative about self-supervision and the role of augmentations. | |
77 | Disentangling Factors of Variations Using Few Labels | Francesco Locatello, Michael Tschannen, Stefan Bauer, Gunnar Rätsch, Bernhard Schölkopf, Olivier Bachem | In this paper, we investigate the impact of such supervision on state-of-the-art disentanglement methods and perform a large scale study, training over 52000 models under well-defined and reproducible experimental conditions. | |
78 | Functional vs. parametric equivalence of ReLU networks | Mary Phuong, Christoph H. Lampert | We prove that there exist ReLU networks whose parameters are almost uniquely determined by the function they implement. | |
79 | Input Complexity and Out-of-distribution Detection with Likelihood-based Generative Models | Joan Serrà, David Álvarez, Vicenç Gómez, Olga Slizovskaia, José F. Núñez, Jordi Luque | We posit that generative models' likelihoods are excessively influenced by the input's complexity, and propose a way to compensate for it when detecting out-of-distribution inputs | |
80 | RTFM: Generalising to New Environment Dynamics via Reading | Victor Zhong, Tim Rocktäschel, Edward Grefenstette | We show that language understanding via reading is a promising way to learn policies that generalise to new environments. | |
81 | What graph neural networks cannot learn: depth vs width | Andreas Loukas | Several graph problems are impossible unless the product of a graph neural network’s depth and width exceeds (a function of) the graph size. | |
82 | Progressive Memory Banks for Incremental Domain Adaptation | Nabiha Asghar, Lili Mou, Kira A. Selby, Kevin D. Pantasdo, Pascal Poupart, Xin Jiang | We present a neural memory-based architecture for incremental domain adaptation, and provide theoretical and empirical results. | code |
83 | Automated curriculum generation through setter-solver interactions | Andrew Lampinen, Sebastien Racaniere, Adam Santoro, David Reichert, Vlad Firoiu, Timothy Lillicrap | We investigate automatic curriculum generation and identify a number of losses useful to learn to generate a curriculum of tasks. | |
84 | On Identifiability in Transformers | Gino Brunner, Yang Liu, Damian Pascual, Oliver Richter, Massimiliano Ciaramita, Roger Wattenhofer | We investigate the identifiability and interpretability of attention distributions and tokens within contextual embeddings in the self-attention based BERT model. | |
85 | Exploring Model-based Planning with Policy Networks | Tingwu Wang, Jimmy Ba | How to achieve state-of-the-art performance by combining policy networks with model-based planning | |
86 | Learning Self-Correctable Policies and Value Functions from Demonstrations with Negative Sampling | Yuping Luo, Huazhe Xu, Tengyu Ma | We introduce a notion of conservatively-extrapolated value functions, which provably lead to policies that can self-correct to stay close to the demonstration states, and learn them with a novel negative sampling technique. | |
87 | Geometric Insights into the Convergence of Nonlinear TD Learning | David Brandfonbrener, Joan Bruna | Here we take a first step towards extending theoretical convergence guarantees to TD learning with nonlinear function approximation. | |
88 | Few-shot Text Classification with Distributional Signatures | Yujia Bao, Menghua Wu, Shiyu Chang, Regina Barzilay | Meta-learning methods used for vision, directly applied to NLP, perform worse than nearest neighbors on new classes; we can do better with distributional signatures. | code |
89 | Escaping Saddle Points Faster with Stochastic Momentum | Jun-Kun Wang, Chi-Heng Lin, Jacob Abernethy | Higher momentum parameter $\beta$ helps for escaping saddle points faster | |
90 | Adversarial Policies: Attacking Deep Reinforcement Learning | Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, Stuart Russell | Deep RL policies can be attacked by other agents taking actions so as to create natural observations that are adversarial. | code |
91 | VideoFlow: A Conditional Flow-Based Model for Stochastic Video Generation | Manoj Kumar, Mohammad Babaeizadeh, Dumitru Erhan, Chelsea Finn, Sergey Levine, Laurent Dinh, Durk Kingma | We demonstrate that flow-based generative models offer a viable and competitive approach to generative modeling of video. | code |
92 | GLAD: Learning Sparse Graph Recovery | Harsh Shrivastava, Xinshi Chen, Binghong Chen, Guanghui Lan, Srinivas Aluru, Han Liu, Le Song | A data-driven learning algorithm based on unrolling the Alternating Minimization optimization for sparse graph recovery. | code |
93 | Pruned Graph Scattering Transforms | Vassilis N. Ioannidis, Siheng Chen, Georgios B. Giannakis | The present work addresses some limitations of GSTs by introducing a novel so-termed pruned (p)GST approach. | |
94 | Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model | Wenhan Xiong, Jingfei Du, William Yang Wang, Veselin Stoyanov | In this work, we further investigate the extent to which pretrained models such as BERT capture knowledge using a zero-shot fact completion task. | |
95 | Can gradient clipping mitigate label noise? | Aditya Krishna Menon, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar | Gradient clipping doesn’t endow robustness to label noise, but a simple loss-based variant does. | |
96 | Editable Neural Networks | Anton Sinitsin, Vsevolod Plokhotnyuk, Dmitry Pyrkin, Sergei Popov, Artem Babenko | Training neural networks so you can efficiently patch them later. | code |
97 | Learning Execution through Neural Code Fusion | Zhan Shi, Kevin Swersky, Daniel Tarlow, Parthasarathy Ranganathan, Milad Hashemi | In this work, we propose a new approach using GNNs to learn fused representations of general source code and its execution. | code |
98 | FasterSeg: Searching for Faster Real-time Semantic Segmentation | Wuyang Chen, Xinyu Gong, Xianming Liu, Qian Zhang, Yuan Li, Zhangyang Wang | We present a real-time segmentation model automatically discovered by a multi-scale NAS framework, running 30% faster than state-of-the-art models. | code |
99 | Difference-Seeking Generative Adversarial Network–Unseen Sample Generation | Yi Lin Sung, Sung-Hsien Hsieh, Soo-Chang Pei, Chun-Shien Lu | We propose a novel GAN framework to generate unseen data. | code |
100 | Stochastic AUC Maximization with Deep Neural Networks | Mingrui Liu, Zhuoning Yuan, Yiming Ying, Tianbao Yang | The paper designs two algorithms for the stochastic AUC maximization problem with state-of-the-art complexities when using a deep neural network as the predictive model, which are also verified by empirical studies. | code |
101 | Semantically-Guided Representation Learning for Self-Supervised Monocular Depth | Vitor Guizilini, Rui Hou, Jie Li, Rares Ambrus, Adrien Gaidon | We propose a novel semantically-guided architecture for self-supervised monocular depth estimation | |
102 | MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius | Runtian Zhai, Chen Dan, Di He, Huan Zhang, Boqing Gong, Pradeep Ravikumar, Cho-Jui Hsieh, Liwei Wang | We propose MACER: a provable defense algorithm that trains robust models by maximizing the certified radius. It does not use adversarial training but performs better than all existing provable l2-defenses. | code |
103 | Detecting and Diagnosing Adversarial Images with Class-Conditional Capsule Reconstructions | Yao Qin, Nicholas Frosst, Sara Sabour, Colin Raffel, Garrison Cottrell, Geoffrey Hinton | These results suggest that CapsNets use features that are more aligned with human perception and address the central issue raised by adversarial examples. | |
104 | Adversarial Example Detection and Classification with Asymmetrical Adversarial Training | Xuwang Yin, Soheil Kolouri, Gustavo K Rohde | A new generative modeling technique based on asymmetrical adversarial training, and its applications to adversarial example detection and robust classification | code |
105 | Variational Recurrent Models for Solving Partially Observable Control Tasks | Dongqi Han, Kenji Doya, Jun Tani | A deep RL algorithm for solving POMDPs by auto-encoding the underlying states using a variational recurrent model | code |
106 | Population-Guided Parallel Policy Search for Reinforcement Learning | Whiyoung Jung, Giseung Park, Youngchul Sung | In this paper, a new population-guided parallel learning scheme is proposed to enhance the performance of off-policy reinforcement learning (RL). | |
107 | Compositional languages emerge in a neural iterated learning model | Yi Ren, Shangmin Guo, Matthieu Labeau, Shay B. Cohen, Simon Kirby | We use the iterated learning framework to facilitate the dominance of highly compositional language in multi-agent games. | |
108 | Black-Box Adversarial Attack with Transferable Model-based Embedding | Zhichao Huang, Tong Zhang | We present a new method that combines transfer-based and score-based black-box adversarial attacks, improving the success rate and query efficiency of black-box attacks across different network architectures. | code |
109 | I Am Going MAD: Maximum Discrepancy Competition for Comparing Classifiers Adaptively | Haotao Wang, Tianlong Chen, Zhangyang Wang, Kede Ma | We present an efficient and adaptive framework for comparing image classifiers to maximize the discrepancies between the classifiers, in place of comparing on fixed test sets. | |
110 | Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models | Cheolhyoung Lee, Kyunghyun Cho, Wanmo Kang | In this paper, we introduce a new regularization technique, to which we refer as "mixout", motivated by dropout. | |
111 | Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP | Yuanhao Wang, Kefan Dong, Xiaoyu Chen, Liwei Wang | We adapt Q-learning with a UCB exploration bonus to infinite-horizon MDPs with discounted rewards without accessing a generative model, improving on the previously best known result. | |
112 | Deep Network classification by Scattering and Homotopy dictionary learning | John Zarka, Louis Thiry, Tomas Angles, Stephane Mallat | A scattering transform followed by supervised dictionary learning reaches a higher accuracy than AlexNet on ImageNet. | |
113 | Data-Independent Neural Pruning via Coresets | Ben Mussay, Margarita Osadchy, Vladimir Braverman, Samson Zhou, Dan Feldman | We propose an efficient, provable and data independent method for network compression via neural pruning using coresets of neurons — a novel construction proposed in this paper. | |
114 | Bounds on Over-Parameterization for Guaranteed Existence of Descent Paths in Shallow ReLU Networks | Arsalan Sharifnassab, Saber Salehkaleybar, S. Jamaloddin Golestani | In this perspective, our results provide a somewhat sharp characterization of the over-parameterization required for “existence of descent paths” in the loss landscape. | |
115 | Novelty Detection Via Blurring | Sungik Choi, Sae-Young Chung | We propose a novel OOD detector that employs blurred images as adversarial examples. Our model achieves strong OOD detection performance in various domains. | |
116 | Nonlinearities in activations substantially shape the loss surfaces of neural networks | Fengxiang He, Bohan Wang, Dacheng Tao | This paper presents how the loss surfaces of nonlinear neural networks are substantially shaped by the nonlinearities in activations. | |
117 | Relational State-Space Model for Stochastic Multi-Object Systems | Fan Yang, Ling Chen, Fan Zhou, Yusong Gao, Wei Cao | A deep hierarchical state-space model in which the state transitions of correlated objects are coordinated by graph neural networks. | |
118 | Learning Efficient Parameter Server Synchronization Policies for Distributed SGD | Rong Zhu, Sheng Yang, Andreas Pfadler, Zhengping Qian, Jingren Zhou | We apply a reinforcement learning based approach to learning optimal synchronization policies used for Parameter Server-based distributed training of SGD. | |
119 | Action Semantics Network: Considering the Effects of Actions in Multiagent Systems | Weixun Wang, Tianpei Yang, Yong Liu, Jianye Hao, Xiaotian Hao, Yujing Hu, Yingfeng Chen, Changjie Fan, Yang Gao | Our proposed ASN characterizes different actions’ influence on other agents using neural networks based on the action semantics between them. | code |
120 | Vid2Game: Controllable Characters Extracted from Real-World Videos | Oran Gafni, Lior Wolf, Yaniv Taigman | We extract a controllable model from a video of a person performing a certain activity. | |
121 | Self-Adversarial Learning with Comparative Discrimination for Text Generation | Wangchunshu Zhou, Tao Ge, Ke Xu, Furu Wei, Ming Zhou | We propose a self-adversarial learning (SAL) paradigm which improves the generator in a self-play fashion for improving GANs’ performance in text generation. | |
122 | Robust training with ensemble consensus | Jisoo Lee, Sae-Young Chung | This work presents a method of generating and using ensembles effectively to identify noisy examples in the presence of annotation noise. | |
123 | Identifying through Flows for Recovering Latent Representations | Shen Li, Bryan Hooi, Gim Hee Lee | In contrast, we propose an identifiable framework for estimating latent representations using a flow-based model (iFlow). | |
124 | Certified Robustness for Top-k Predictions against Adversarial Perturbations via Randomized Smoothing | Jinyuan Jia, Xiaoyu Cao, Binghui Wang, Neil Zhenqiang Gong | We study the certified robustness for top-k predictions via randomized smoothing under Gaussian noise and derive a tight robustness bound in L_2 norm. | |
125 | Optimistic Exploration even with a Pessimistic Initialisation | Tabish Rashid, Bei Peng, Wendelin Boehmer, Shimon Whiteson | We augment the Q-value estimates with a count-based bonus that ensures optimism during action selection and bootstrapping, even if the Q-value estimates are pessimistic. | |
126 | VL-BERT: Pre-training of Generic Visual-Linguistic Representations | Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, Jifeng Dai | VL-BERT is a simple yet powerful pre-trainable generic representation for visual-linguistic tasks. It is pre-trained on a massive-scale caption dataset and text-only corpus, and can be fine-tuned for various downstream visual-linguistic tasks. | |
127 | Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation | Hang Gao, Xizhou Zhu, Stephen Lin, Jifeng Dai | Don’t deform your convolutions — deform your kernels. | |
128 | Ensemble Distribution Distillation | Andrey Malinin, Bruno Mlodozeniec, Mark Gales | We distill an ensemble of models into a single model, capturing both the improved classification performance and information about the diversity of the ensemble, which is useful for uncertainty estimation. | |
129 | Gap-Aware Mitigation of Gradient Staleness | Saar Barkai, Ido Hakimi, Assaf Schuster | A new distributed, asynchronous, SGD-based algorithm, which achieves state-of-the-art accuracy on existing architectures using staleness penalization without having to re-tune the hyperparameters. | code |
130 | Counterfactuals uncover the modular structure of deep generative models | Michel Besserve, Arash Mehrjou, Remy Sun, Bernhard Schoelkopf | We develop a framework to find modular internal representations in generative models and manipulate them to generate counterfactual examples. | code |
131 | Physics-as-Inverse-Graphics: Unsupervised Physical Parameter Estimation from Video | Miguel Jaques, Michael Burke, Timothy Hospedales | We propose a model that is able to perform physical parameter estimation of systems from video, where the differential equations governing the scene dynamics are known, but labeled states or objects are not available. | |
132 | An Inductive Bias for Distances: Neural Nets that Respect the Triangle Inequality | Silviu Pitis, Harris Chan, Kiarash Jamali, Jimmy Ba | We propose novel neural network architectures, guaranteed to satisfy the triangle inequality, for purposes of (asymmetric) metric learning and modeling graph distances. | |
133 | A Constructive Prediction of the Generalization Error Across Scales | Jonathan S. Rosenfeld, Amir Rosenfeld, Yonatan Belinkov, Nir Shavit | We predict the generalization error and specify the model which attains it across model/data scales. | |
134 | Scalable Neural Methods for Reasoning With a Symbolic Knowledge Base | William W. Cohen, Haitian Sun, R. Alex Hofer, Matthew Siegler | A scalable differentiable neural module that implements reasoning on symbolic KBs. | |
135 | CLN2INV: Learning Loop Invariants with Continuous Logic Networks | Gabriel Ryan, Justin Wong, Jianan Yao, Ronghui Gu, Suman Jana | We introduce the Continuous Logic Network (CLN), a novel neural architecture for automatically learning loop invariants and general SMT formulas. | |
136 | NAS evaluation is frustratingly hard | Antoine Yang, Pedro M. Esperança, Fabio M. Carlucci | A study of how different components in the NAS pipeline contribute to the final accuracy. Also, a benchmark of 8 methods on 5 datasets. | code |
137 | Efficient and Information-Preserving Future Frame Prediction and Beyond | Wei Yu, Yichao Lu, Steve Easterbrook, Sanja Fidler | We propose CrevNet, a Conditionally Reversible Network that uses reversible architectures to build a bijective two-way autoencoder and its complementary recurrent predictor. | code |
138 | Order Learning and Its Application to Age Estimation | Kyungsun Lim, Nyeong-Ho Shin, Young-Yoon Lee, Chang-Su Kim | The notion of order learning is proposed and it is applied to regression problems in computer vision | |
139 | ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning | Weihao Yu, Zihang Jiang, Yanfei Dong, Jiashi Feng | We introduce ReClor, a reading comprehension dataset requiring logical reasoning, and find that current state-of-the-art models struggle with real logical reasoning, with performance near that of random guessing. | |
140 | AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures | Michael S. Ryoo, AJ Piergiovanni, Mingxing Tan, Anelia Angelova | We search for multi-stream neural architectures with better connectivity and spatio-temporal interactions for video understanding. | |
141 | Adversarially Robust Representations with Smooth Encoders | Taylan Cemgil, Sumedh Ghaisas, Krishnamurthy (Dj) Dvijotham, Pushmeet Kohli | We propose a method for computing adversarially robust representations in an entirely unsupervised way. | |
142 | From Variational to Deterministic Autoencoders | Partha Ghosh, Mehdi S. M. Sajjadi, Antonio Vergari, Michael Black, Bernhard Scholkopf | Deterministic regularized autoencoders can learn a smooth, meaningful latent space as VAEs do, without having to force an arbitrarily chosen prior (i.e., Gaussian). | |
143 | Computation Reallocation for Object Detection | Feng Liang, Ronghao Guo, Chen Lin, Ming Sun, Wei Wu, Junjie Yan, Wanli Ouyang | We propose CR-NAS to reallocate engaged computation resources across different resolutions and spatial positions. | |
144 | Finding and Visualizing Weaknesses of Deep Reinforcement Learning Agents | Christian Rupprecht, Cyril Ibrahim, Christopher J. Pal | We generate critical states of trained RL agents to visualize potential weaknesses. | |
145 | A Fair Comparison of Graph Neural Networks for Graph Classification | Federico Errica, Marco Podda, Davide Bacciu, Alessio Micheli | We provide a rigorous comparison of different Graph Neural Networks for graph classification. | code |
146 | Size-free generalization bounds for convolutional neural networks | Philip M. Long, Hanie Sedghi | We prove generalization bounds for convolutional neural networks that take weight-tying into account | |
147 | SAdam: A Variant of Adam for Strongly Convex Functions | Guanghui Wang, Shiyin Lu, Quan Cheng, Weiwei Tu, Lijun Zhang | A variant of Adam for strongly convex functions | code |
148 | Continual Learning with Bayesian Neural Networks for Non-Stationary Data | Richard Kurle, Botond Cseke, Alexej Klushyn, Patrick van der Smagt, Stephan Günnemann | This work addresses continual learning for non-stationary data, using Bayesian neural networks and memory-based online variational Bayes. | |
149 | Multiplicative Interactions and Where to Find Them | Siddhant M. Jayakumar, Jacob Menick, Wojciech M. Czarnecki, Jonathan Schwarz, Jack Rae, Simon Osindero, Yee Whye Teh, Tim Harley, Razvan Pascanu | We explore the role of multiplicative interaction as a unifying framework to describe a range of classical and modern neural network architectural motifs, such as gating, attention layers, hypernetworks, and dynamic convolutions amongst others. | |
150 | Few-Shot Learning on Graphs via Super-Classes Based on Graph Spectral Measures | Jatin Chauhan, Deepak Nathani, Manohar Kaul | We propose to study the problem of few-shot graph classification in graph neural networks (GNNs) to recognize unseen classes, given limited labeled graph examples. | code |
151 | On Computation and Generalization of Generative Adversarial Imitation Learning | Minshuo Chen, Yizhou Wang, Tianyi Liu, Zhuoran Yang, Xingguo Li, Zhaoran Wang, Tuo Zhao | To bridge such a gap between theory and practice, this paper investigates the theoretical properties of GAIL. | |
152 | A Target-Agnostic Attack on Deep Models: Exploiting Security Vulnerabilities of Transfer Learning | Shahbaz Rezaei, Xin Liu | In this paper, we show that without any additional knowledge other than the pre-trained model, an attacker can launch an effective and efficient brute force attack that can craft instances of input to trigger each target class with high confidence. | code |
153 | Low-Resource Knowledge-Grounded Dialogue Generation | Xueliang Zhao, Wei Wu, Chongyang Tao, Can Xu, Dongyan Zhao, Rui Yan | Motivated by the challenge in practice, we consider knowledge-grounded dialogue generation under a natural assumption that only limited training examples are available. | |
154 | Deep 3D Pan via Local adaptive “t-shaped” convolutions with global and local adaptive dilations | Juan Luis Gonzalez Bello, Munchurl Kim | Novel architecture for stereoscopic view synthesis at arbitrary camera shifts utilizing adaptive t-shaped kernels with adaptive dilations. | |
155 | Tree-Structured Attention with Hierarchical Accumulation | Xuan-Phi Nguyen, Shafiq Joty | In this paper, we attempt to bridge this gap with Hierarchical Accumulation to encode parse tree structures into self-attention at constant time complexity. | |
156 | The asymptotic spectrum of the Hessian of DNN throughout training | Arthur Jacot, Franck Gabriel, Clement Hongler | Description of the limiting spectrum of the Hessian of the loss surface of DNNs in the infinite-width limit. | |
157 | Actor-Critic Provably Finds Nash Equilibria of Linear-Quadratic Mean-Field Games | Zuyue Fu, Zhuoran Yang, Yongxin Chen, Zhaoran Wang | Actor-Critic method with function approximation finds the Nash equilibrium pairs in mean-field games with theoretical guarantee. | |
158 | In Search for a SAT-friendly Binarized Neural Network Architecture | Nina Narodytska, Hongce Zhang, Aarti Gupta, Toby Walsh | Formal analysis of Binarized Neural Networks | |
159 | Generative Ratio Matching Networks | Akash Srivastava, Kai Xu, Michael U. Gutmann, Charles Sutton | In this work, we take their insight of using kernels as fixed adversaries further and present a novel method for training deep generative models that does not involve saddlepoint optimization. | code |
160 | Learning to Represent Programs with Property Signatures | Augustus Odena, Charles Sutton | We represent a computer program using a set of simpler programs and use this representation to improve program synthesis techniques. | |
161 | V4D: 4D Convolutional Neural Networks for Video-level Representation Learning | Shiwen Zhang, Sheng Guo, Weilin Huang, Matthew R. Scott, Limin Wang | A novel 4D CNN structure for video-level representation learning, surpassing recent 3D CNNs. | |
162 | Option Discovery using Deep Skill Chaining | Akhil Bagaria, George Konidaris | We present a new hierarchical reinforcement learning algorithm which can solve high-dimensional goal-oriented tasks far more reliably than non-hierarchical agents and other state-of-the-art skill discovery techniques. | code |
163 | Quantifying the Cost of Reliable Photo Authentication via High-Performance Learned Lossy Representations | Pawel Korus, Nasir Memon | We learn an efficient lossy image compression codec which can be optimized to facilitate reliable photo manipulation detection at fractional cost in payload/quality and even at low bitrates. | |
164 | On the Variance of the Adaptive Learning Rate and Beyond | Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Jiawei Han | If warmup is the answer, what is the question? | code |
165 | Dynamical Distance Learning for Semi-Supervised and Unsupervised Skill Discovery | Kristian Hartikainen, Xinyang Geng, Tuomas Haarnoja, Sergey Levine | We show how to automatically learn dynamical distances in the reinforcement learning setting and use them to provide well-shaped reward functions for reaching new goals. | |
166 | A Theoretical Analysis of the Number of Shots in Few-Shot Learning | Tianshi Cao, Marc T Law, Sanja Fidler | The paper analyzes the effect of shot number on prototypical networks and proposes a robust method when the shot number differs from meta-training to meta-testing time. | |
167 | Unsupervised Model Selection for Variational Disentangled Representation Learning | Sunny Duan, Loic Matthey, Andre Saraiva, Nick Watters, Chris Burgess, Alexander Lerchner, Irina Higgins | We introduce a method for unsupervised disentangled model selection for VAE-based disentangled representation learning approaches. | |
168 | Extracting and Leveraging Feature Interaction Interpretations | Michael Tsang, Dehua Cheng, Hanpeng Liu, Xue Feng, Eric Zhou, Yan Liu | Proposed a method to extract and leverage interpretations of feature interactions | |
169 | Understanding the Limitations of Variational Mutual Information Estimators | Jiaming Song, Stefano Ermon | We theoretically show that, under some conditions, estimators such as MINE exhibit variance that could grow exponentially with the true amount of underlying MI. | |
170 | GENESIS: Generative Scene Inference and Sampling with Object-Centric Latent Representations | Martin Engelcke, Adam R. Kosiorek, Oiwi Parker Jones, Ingmar Posner | We present the first object-centric generative model of 3D visual scenes capable of both decomposing and generating scenes. | |
171 | Language GANs Falling Short | Massimo Caccia, Lucas Caccia, William Fedus, Hugo Larochelle, Joelle Pineau, Laurent Charlin | GANs have been applied to text generation and are believed to be SOTA. However, we propose a new evaluation protocol demonstrating that maximum-likelihood trained models are still better. | code |
172 | Stochastic Conditional Generative Networks with Basis Decomposition | Ze Wang, Xiuyuan Cheng, Guillermo Sapiro, Qiang Qiu | To address this, we introduce BasisGAN, a stochastic conditional multi-mode image generator. | |
173 | Learned Step Size Quantization | Steven K. Esser, Jeffrey L. McKinstry, Deepika Bablani, Rathinakumar Appuswamy, Dharmendra S. Modha | A method for learning the quantization configuration of low-precision networks that achieves state-of-the-art performance for quantized networks. | |
174 | On the “steerability” of generative adversarial networks | Ali Jahanian*, Lucy Chai*, Phillip Isola | We show that although current GANs can fit standard datasets very well, they still fall short of being comprehensive models of the visual manifold. | |
175 | Reinforced active learning for image segmentation | Arantxa Casanova, Pedro O. Pinheiro, Negar Rostamzadeh, Christopher J. Pal | Learning a labeling policy with reinforcement learning to reduce labeling effort for the task of semantic segmentation | |
176 | Sign Bits Are All You Need for Black-Box Attacks | Abdullah Al-Dujaili, Una-May O’Reilly | We present a sign-based, rather than magnitude-based, gradient estimation approach that shifts gradient estimation from continuous to binary black-box optimization. | code |
177 | Deep Semi-Supervised Anomaly Detection | Lukas Ruff, Robert A. Vandermeulen, Nico Görnitz, Alexander Binder, Emmanuel Müller, Klaus-Robert Müller, Marius Kloft | We introduce Deep SAD, a deep method for general semi-supervised anomaly detection that especially takes advantage of labeled anomalies. | code |
178 | Budgeted Training: Rethinking Deep Neural Network Training Under Resource Constraints | Mengtian Li, Ersin Yumer, Deva Ramanan | Introduce a formal setting for budgeted training and propose a budget-aware linear learning rate schedule | |
179 | Minimizing FLOPs to Learn Efficient Sparse Representations | Biswajit Paria, Chih-Kuan Yeh, Ning Xu, Barnabas Poczos, Pradeep Ravikumar, Ian E.H. Yen | We propose an approach to learn sparse high dimensional representations that are fast to search, by incorporating a surrogate of the number of operations directly into the loss function. | |
180 | Reanalysis of Variance Reduced Temporal Difference Learning | Tengyu Xu, Zhe Wang, Yi Zhou, Yingbin Liang | This paper provides a rigorous study of variance-reduced TD learning and characterizes its advantage over vanilla TD learning | |
181 | Imitation Learning via Off-Policy Distribution Matching | Ilya Kostrikov, Ofir Nachum, Jonathan Tompson | In this work, we show how the original distribution ratio estimation objective may be transformed in a principled manner to yield a completely off-policy objective. | |
182 | Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML | Aniruddh Raghu, Maithra Raghu, Samy Bengio, Oriol Vinyals | The success of MAML relies on feature reuse from the meta-initialization, which also yields a natural simplification of the algorithm, with the inner loop removed for the network body, as well as other insights on the head and body. | |
183 | Augmenting Genetic Algorithms with Deep Neural Networks for Exploring the Chemical Space | AkshatKumar Nigam, Pascal Friederich, Mario Krenn, Alan Aspuru-Guzik | Tackling inverse design via genetic algorithms augmented with deep neural networks. | |
184 | Improved Sample Complexities for Deep Neural Networks and Robust Classification via an All-Layer Margin | Colin Wei, Tengyu Ma | We propose a new notion of margin that has a direct relationship with neural net generalization, and obtain improved generalization bounds for neural nets and robust classification by analyzing this margin. | |
185 | Identity Crisis: Memorization and Generalization Under Extreme Overparameterization | Chiyuan Zhang, Samy Bengio, Moritz Hardt, Michael C. Mozer, Yoram Singer | We study the interplay between memorization and generalization of overparameterized networks in the extreme case of a single training example and an identity-mapping task. | |
186 | ReMixMatch: Semi-Supervised Learning with Distribution Matching and Augmentation Anchoring | David Berthelot, Nicholas Carlini, Ekin D. Cubuk, Alex Kurakin, Kihyuk Sohn, Han Zhang, Colin Raffel | We introduce Distribution Matching and Augmentation Anchoring, two improvements to MixMatch which produce state-of-the-art results and enable surprisingly strong performance with only 40 labels on CIFAR-10 and SVHN. | |
187 | Adaptive Structural Fingerprints for Graph Attention Networks | Kai Zhang, Yaokang Zhu, Jun Wang, Jie Zhang | Exploiting rich structural details in graph-structured data via adaptive "structural fingerprints" | code |
188 | CAQL: Continuous Action Q-Learning | Moonkyung Ryu, Yinlam Chow, Ross Anderson, Christian Tjandraatmadja, Craig Boutilier | A general framework of value-based reinforcement learning for continuous control | |
189 | Learning Heuristics for Quantified Boolean Formulas through Reinforcement Learning | Gil Lederman, Markus Rabe, Sanjit Seshia, Edward A. Lee | We use RL to automatically learn a branching heuristic within a state-of-the-art QBF solver, on industrial problems. | |
190 | Pure and Spurious Critical Points: a Geometric Study of Linear Networks | Matthew Trager, Kathlén Kohn, Joan Bruna | We introduce a natural distinction between pure critical points, which only depend on the functional space, and spurious critical points, which arise from the parameterization. | code |
191 | Neural Text Generation With Unlikelihood Training | Sean Welleck, Ilia Kulikov, Stephen Roller, Emily Dinan, Kyunghyun Cho, Jason Weston | We propose a new objective, unlikelihood training, which forces unlikely generations to be assigned lower probability by the model (a minimal sketch of this objective appears after the table). | code |
192 | Semi-Supervised Generative Modeling for Controllable Speech Synthesis | Raza Habib, Soroosh Mariooryad, Matt Shannon, Eric Battenberg, RJ Ryan, Daisy Stanton, David Kao, Tom Bagby | We present a novel generative model that combines state-of-the-art neural text-to-speech (TTS) with semi-supervised probabilistic latent variable models. | |
193 | Dynamic Time Lag Regression: Predicting What & When | Mandar Chandorkar, Cyril Furtlehner, Bala Poduval, Enrico Camporeale, Michele Sebag | We propose a new regression framework for temporal phenomena having non-stationary time-lag dependencies. | code |
194 | Scalable Model Compression by Entropy Penalized Reparameterization | Deniz Oktay, Johannes Ballé, Abhinav Shrivastava, Saurabh Singh | An end-to-end trainable model compression method optimizing accuracy jointly with the expected model size. | |
195 | AMRL: Aggregated Memory For Reinforcement Learning | Jacob Beck, Kamil Ciosek, Sam Devlin, Sebastian Tschiatschek, Cheng Zhang, Katja Hofmann | In deep RL, order-invariant functions can be used in conjunction with standard memory modules to mitigate gradient decay and improve resilience to noise. | |
196 | Efficient Riemannian Optimization on the Stiefel Manifold via the Cayley Transform | Jun Li, Fuxin Li, Sinisa Todorovic | To address this challenge, we present two main contributions: (1) A new efficient retraction map based on an iterative Cayley transform for optimization updates, and (2) An implicit vector transport mechanism based on the combination of a projection of the momentum and the Cayley transform on the Stiefel manifold. | |
197 | Unpaired Point Cloud Completion on Real Scans Using Adversarial Training | Xuelin Chen, Baoquan Chen, Niloy J. Mitra | We develop a first approach that works directly on input point clouds, does not require paired training data, and hence can directly be applied to real scans for scan completion. | |
198 | Adjustable Real-time Style Transfer | Mohammad Babaeizadeh, Golnaz Ghiasi | Stochastic style transfer with adjustable features. | |
199 | Stochastic Weight Averaging in Parallel: Large-Batch Training That Generalizes Well | Vipul Gupta, Santiago Akle Serrano, Dennis DeCoste | We propose SWAP, a distributed algorithm for large-batch training of neural networks. | |
200 | Short and Sparse Deconvolution — A Geometric Approach | Yenson Lau, Qing Qu, Han-Wen Kuo, Pengcheng Zhou, Yuqian Zhang, John Wright | We leverage the key ideas from this theory (sphere constraints, data-driven initialization) to develop a {\em practical} algorithm, which performs well on data arising from a range of application areas. | |
201 | Selection via Proxy: Efficient Data Selection for Deep Learning | Cody Coleman, Christopher Yeh, Stephen Mussmann, Baharan Mirzasoleiman, Peter Bailis, Percy Liang, Jure Leskovec, Matei Zaharia | We can significantly improve the computational efficiency of data selection in deep learning by using a much smaller proxy model to perform the selection. | |
202 | Global Relational Models of Source Code | Vincent J. Hellendoorn, Charles Sutton, Rishabh Singh, Petros Maniatis | Models of source code that combine global and structural features learn more powerful representations of programs. | |
203 | Detecting Extrapolation with Local Ensembles | David Madras, James Atwood, Alexander D'Amour | We present local ensembles, a method for detecting extrapolation in trained models, which approximates the variance of an ensemble using local second-order information. | |
204 | Learning to Link | Maria-Florina Balcan, Travis Dick, Manuel Lang | We show how to use data to automatically learn low-loss linkage procedures and metrics for specific clustering applications. | |
205 | Adversarially robust transfer learning | Ali Shafahi, Parsa Saadatpanah, Chen Zhu, Amin Ghiasi, Christoph Studer, David Jacobs, Tom Goldstein | Robust models have robust feature extractors which can be useful for transferring robustness to other domains | |
206 | Overlearning Reveals Sensitive Attributes | Congzheng Song, Vitaly Shmatikov | Overlearning means that a model trained for a seemingly simple objective implicitly learns to recognize attributes and concepts that are (1) not part of the learning objective, and (2) sensitive from a privacy or bias perspective. | code |
207 | Bridging Mode Connectivity in Loss Landscapes and Adversarial Robustness | Pu Zhao, Pin-Yu Chen, Payel Das, Karthikeyan Natesan Ramamurthy, Xue Lin | A novel approach using mode connectivity in loss landscapes to mitigate adversarial effects, repair tampered models and evaluate adversarial robustness | |
208 | Differentially Private Meta-Learning | Jeffrey Li, Mikhail Khodak, Sebastian Caldas, Ameet Talwalkar | We conduct the first formal study of privacy in this setting and formalize the notion of task-global differential privacy as a practical relaxation of more commonly studied threat models. | |
209 | One-Shot Pruning of Recurrent Neural Networks by Jacobian Spectrum Evaluation | Shunshi Zhang, Bradly C. Stadie | New Objective for One-Shot Pruning Recurrent Neural Networks | |
210 | Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples | Eleni Triantafillou, Tyler Zhu, Vincent Dumoulin, Pascal Lamblin, Utku Evci, Kelvin Xu, Ross Goroshin, Carles Gelada, Kevin Swersky, Pierre-Antoine Manzagol, Hugo Larochelle | We propose a new large-scale diverse environment for few-shot learning, and evaluate popular models’ performance on it, revealing important research challenges. | code |
211 | Are Transformers universal approximators of sequence-to-sequence functions? | Chulhee Yun, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank Reddi, Sanjiv Kumar | We prove that Transformer networks are universal approximators of sequence-to-sequence functions. | |
212 | Going beyond Token-level Pre-training for Embedding-based Large-scale Retrieval | Wei-Cheng Chang, Felix X. Yu, Yin-Wen Chang, Yiming Yang, Sanjiv Kumar | We consider large-scale retrieval problems such as question-answering retrieval and present a comprehensive study of how different sentence-level pre-training tasks improve on BERT-style token-level pre-training for two-tower Transformer models. | |
213 | Deep Imitative Models for Flexible Inference, Planning, and Control | Nicholas Rhinehart, Rowan McAllister, Sergey Levine | In this paper, we propose Imitative Models to combine the benefits of IL and goal-directed planning: probabilistic predictive models of desirable behavior able to plan interpretable expert-like trajectories to achieve specified goals. | |
214 | CM3: Cooperative Multi-goal Multi-stage Multi-agent Reinforcement Learning | Jiachen Yang, Alireza Nakhaei, David Isele, Kikuo Fujimura, Hongyuan Zha | A modular method for fully cooperative multi-goal multi-agent reinforcement learning, based on curriculum learning for efficient exploration and credit assignment for action-goal interactions. | |
215 | Robust And Interpretable Blind Image Denoising Via Bias-Free Convolutional Neural Networks | Sreyas Mohan, Zahra Kadkhodaie, Eero P. Simoncelli, Carlos Fernandez-Granda | We study the generalization properties of deep convolutional neural networks for image denoising in the presence of varying noise levels. | |
216 | Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets | Mingrui Liu, Youssef Mroueh, Jerret Ross, Wei Zhang, Xiaodong Cui, Payel Das, Tianbao Yang | This paper provides a novel analysis of adaptive gradient algorithms for solving non-convex non-concave min-max problems such as GANs, and explains via empirical studies why adaptive gradient methods outperform their non-adaptive counterparts. | |
217 | DeepV2D: Video to Depth with Differentiable Structure from Motion | Zachary Teed, Jia Deng | DeepV2D predicts depth from a video clip by composing elements of classical SfM into a fully differentiable network. | |
218 | Learning Space Partitions for Nearest Neighbor Search | Yihe Dong, Piotr Indyk, Ilya Razenshteyn, Tal Wagner | We use supervised learning (and in particular deep learning) to produce better space partitions for fast nearest neighbor search. | code |
219 | Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP | Haonan Yu, Sergey Edunov, Yuandong Tian, Ari S. Morcos | We find that the lottery ticket phenomenon is present in both NLP and RL, and that it can be used to train compressed Transformers to high performance | |
220 | Sign-OPT: A Query-Efficient Hard-label Adversarial Attack | Minhao Cheng, Simranjit Singh, Patrick H. Chen, Pin-Yu Chen, Sijia Liu, Cho-Jui Hsieh | In this paper, we adopt the same optimization formulation but propose to directly estimate the sign of the gradient at any direction instead of the gradient itself, which enjoys the benefit of a single query. Using this single-query oracle for retrieving the sign of a directional derivative, we develop a novel query-efficient Sign-OPT approach for hard-label black-box attacks. | |
221 | Toward Amortized Ranking-Critical Training For Collaborative Filtering | Sam Lobel, Chunyuan Li, Jianfeng Gao, Lawrence Carin | We apply the actor-critic methodology from reinforcement learning to collaborative filtering, resulting in improved performance across a variety of latent-variable models | code |
222 | Intrinsic Motivation for Encouraging Synergistic Behavior | Rohan Chitnis, Shubham Tulsiani, Saurabh Gupta, Abhinav Gupta | We propose a formulation of intrinsic motivation that is suitable as an exploration bias in multi-agent sparse-reward synergistic tasks, by encouraging agents to affect the world in ways that would not be achieved if they were acting individually. | |
223 | Chameleon: Adaptive Code Optimization For Expedited Deep Neural Network Compilation | Byung Hoon Ahn, Prannoy Pilligundla, Hadi Esmaeilzadeh | Reinforcement learning and Adaptive Sampling for Optimized Compilation of Deep Neural Networks. | |
224 | The function of contextual illusions | Drew Linsley, Junkyung Kim, Alekh Ashok, Thomas Serre | Contextual illusions are a feature, not a bug, of neural routines optimized for contour detection. | code |
225 | Locality and Compositionality in Zero-Shot Learning | Tristan Sylvain, Linda Petrini, Devon Hjelm | An analysis of the effects of compositionality and locality on representation learning for zero-shot learning. | |
226 | Understanding Knowledge Distillation in Non-autoregressive Machine Translation | Chunting Zhou, Jiatao Gu, Graham Neubig | We systematically examine why knowledge distillation is crucial to the training of non-autoregressive translation (NAT) models, and propose methods to further improve the distilled data to best match the capacity of an NAT model. | |
227 | Thieves on Sesame Street! Model Extraction of BERT-based APIs | Kalpesh Krishna, Gaurav Singh Tomar, Ankur P. Parikh, Nicolas Papernot, Mohit Iyyer | Outputs of modern NLP APIs on nonsensical text provide strong signals about model internals, allowing adversaries to steal the APIs. | |
228 | Fast is better than free: Revisiting adversarial training | Eric Wong, Leslie Rice, J. Zico Kolter | FGSM-based adversarial training, with randomization, works just as well as PGD-based adversarial training: we can use this to train a robust classifier in 6 minutes on CIFAR10, and 12 hours on ImageNet, on a single machine. | code |
229 | DBA: Distributed Backdoor Attacks against Federated Learning | Chulin Xie, Keli Huang, Pin-Yu Chen, Bo Li | We propose a novel distributed backdoor attack on federated learning and show that it is not only more effective than standard centralized attacks, but also harder to defend against with existing robust FL methods. | |
230 | DeFINE: Deep Factorized Input Word Embeddings for Neural Sequence Modeling | Sachin Mehta, Rik Koncel-Kedziorski, Mohammad Rastegari, Hannaneh Hajishirzi | DeFINE uses a deep, hierarchical, sparse network with new skip connections to learn better word embeddings efficiently. | |
231 | Sampling-Free Learning of Bayesian Quantized Neural Networks | Jiahao Su, Milan Cvitkovic, Furong Huang | We propose Bayesian quantized networks, for which we learn a posterior distribution over their quantized parameters. | |
232 | Learning to solve the credit assignment problem | Benjamin James Lansdell, Prashanth Ravi Prakash, Konrad Paul Kording | Perturbations can be used to train feedback weights to learn in fully connected and convolutional neural networks | |
233 | Four Things Everyone Should Know to Improve Batch Normalization | Cecilia Summers, Michael J. Dinneen | Four things that improve batch normalization across all batch sizes | code |
234 | Pseudo-LiDAR++: Accurate Depth for 3D Object Detection in Autonomous Driving | Yurong You*, Yan Wang*, Wei-Lun Chao*, Divyansh Garg, Geoff Pleiss, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger | In this paper, we provide substantial advances to the pseudo-LiDAR framework through improvements in stereo depth estimation. | |
235 | SlowMo: Improving Communication-Efficient Distributed SGD with Slow Momentum | Jianyu Wang, Vinayak Tantia, Nicolas Ballas, Michael Rabbat | SlowMo improves the optimization and generalization performance of communication-efficient decentralized algorithms without sacrificing speed. | |
236 | MetaPix: Few-Shot Video Retargeting | Jessica Lee, Deva Ramanan, Rohit Girdhar | Video retargeting typically requires a large amount of target data to be effective, which may not always be available; we propose a meta-learning approach that improves over popular baselines while producing temporally coherent frames. | |
237 | Learning to Learn by Zeroth-Order Oracle | Yangjun Ruan, Yuanhao Xiong, Sashank Reddi, Sanjiv Kumar, Cho-Jui Hsieh | A novel variant of the learning-to-learn framework for zeroth-order optimization that learns both the update rule and the Gaussian sampling rule. | code |
238 | Decentralized Distributed PPO: Mastering PointGoal Navigation | Erik Wijmans, Abhishek Kadian, Ari Morcos, Stefan Lee, Irfan Essa, Devi Parikh, Manolis Savva, Dhruv Batra | We present Decentralized Distributed Proximal Policy Optimization (DD-PPO), a method for distributed reinforcement learning in resource-intensive simulated environments. | code |
239 | PAC Confidence Sets for Deep Neural Networks via Calibrated Prediction | Sangdon Park, Osbert Bastani, Nikolai Matni, Insup Lee | We propose an algorithm combining calibrated prediction and generalization bounds from learning theory to construct confidence sets for deep neural networks with PAC guarantees—i.e., the confidence set for a given input contains the true label with high probability. | |
240 | Precision Gating: Improving Neural Network Efficiency with Dynamic Dual-Precision Activations | Yichi Zhang, Ritchie Zhao, Weizhe Hua, Nayun Xu, Edward Suh, Zhiru Zhang | We propose precision gating, an end-to-end trainable dual-precision activation quantization technique for deep neural networks. | code |
241 | Locally Constant Networks | Guang-He Lee, Tommi S. Jaakkola | A novel neural architecture which implicitly learns an (oblique) decision tree. | |
242 | Span Recovery for Deep Neural Networks with Applications to Input Obfuscation | Rajesh Jayaram, David P. Woodruff, Qiuyi Zhang | We provably recover the span of a deep multi-layered neural network with latent structure and empirically apply efficient span recovery algorithms to attack networks by obfuscating inputs. | code |
243 | Improving Neural Language Generation with Spectrum Control | Lingxiao Wang, Jing Huang, Kevin Huang, Ziniu Hu, Guangtao Wang, Quanquan Gu | In this paper, we propose a novel spectrum control approach to address this degeneration problem. | |
244 | Learn to Explain Efficiently via Neural Logic Inductive Learning | Yuan Yang, Le Song | An efficient differentiable ILP model that learns first-order logic rules that can explain the data. | code |
245 | Improved memory in recurrent neural networks with sequential non-normal dynamics | Emin Orhan, Xaq Pitkow | A feedforward, chain-like motif (1->2->3->…) is proposed as a useful inductive bias for better memory in RNNs; amazingly, it works. | |
246 | Neural Module Networks for Reasoning over Text | Nitish Gupta, Kevin Lin, Dan Roth, Sameer Singh, Matt Gardner | This paper extends neural module networks to answer compositional questions against text by introducing differentiable modules that perform reasoning over text and symbols in a probabilistic manner. | |
247 | Higher-Order Function Networks for Learning Composable 3D Object Representations | Eric Mitchell, Selim Engin, Volkan Isler, Daniel D Lee | Neural nets can encode complex 3D objects into the parameters of other (surprisingly small) neural nets | |
248 | Variational Hetero-Encoder Randomized GANs for Joint Image-Text Modeling | Hao Zhang, Bo Chen, Long Tian, Zhengjue Wang, Mingyuan Zhou | A novel Bayesian deep learning framework that captures and relates hierarchical semantic and visual concepts, performing well on a variety of image and text modeling and generation tasks. | code |
249 | Towards Fast Adaptation of Neural Architectures with Meta Learning | Dongze Lian, Yin Zheng, Yintao Xu, Yanxiong Lu, Leyu Lin, Peilin Zhao, Junzhou Huang, Shenghua Gao | In order to tackle the transferability of NAS and conduct fast adaptation of neural architectures, we propose a novel Transferable Neural Architecture Search method based on meta-learning in this paper, which is termed as T-NAS. | |
250 | Graph Constrained Reinforcement Learning for Natural Language Action Spaces | Prithviraj Ammanabrolu, Matthew Hausknecht | We present KG-A2C, a reinforcement learning agent that builds a dynamic knowledge graph while exploring and generates natural language using a template-based action space – outperforming all current agents on a wide set of text-based games. | |
251 | Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control | Nir Levine, Yinlam Chow, Rui Shu, Ang Li, Mohammad Ghavamzadeh, Hung Bui | Learning embeddings for control with high-dimensional observations. | |
252 | Augmenting Non-Collaborative Dialog Systems with Explicit Semantic and Strategic Dialog History | Yiheng Zhou, Yulia Tsvetkov, Alan W Black, Zhou Yu | We propose to model both semantic and tactic history using finite state transducers (FSTs). | |
253 | BERTScore: Evaluating Text Generation with BERT | Tianyi Zhang*, Varsha Kishore*, Felix Wu*, Kilian Q. Weinberger, Yoav Artzi | We propose BERTScore, an automatic evaluation metric for text generation, which correlates better with human judgments and provides stronger model selection performance than existing metrics. | |
254 | Neural Execution of Graph Algorithms | Petar Velickovic, Rex Ying, Matilde Padovano, Raia Hadsell, Charles Blundell | We supervise graph neural networks to imitate intermediate and step-wise outputs of classical graph algorithms, recovering highly favourable insights. | |
255 | On the Need for Topology-Aware Generative Models for Manifold-Based Defenses | Uyeong Jang, Susmit Jha, Somesh Jha | This paper asks the following question: do the generative models used in manifold-based defenses need to be topology-aware? Our paper suggests the answer is yes. | |
256 | FSNet: Compression of Deep Convolutional Neural Networks by Filter Summary | Yingzhen Yang, Jiahui Yu, Nebojsa Jojic, Jun Huan, Thomas S. Huang | We present a novel method for compressing deep Convolutional Neural Networks (CNNs) by weight sharing through a new representation of convolutional filters. | |
257 | Capsules with Inverted Dot-Product Attention Routing | Yao-Hung Hubert Tsai, Nitish Srivastava, Hanlin Goh, Ruslan Salakhutdinov | We present a new routing method for Capsule networks that performs on par with ResNet-18 on CIFAR-10/CIFAR-100. | |
258 | Composition-based Multi-Relational Graph Convolutional Networks | Shikhar Vashishth, Soumya Sanyal, Vikram Nitin, Partha Talukdar | A Composition-based Graph Convolutional framework for multi-relational graphs. | code |
259 | Gradient-Based Neural DAG Learning | Sébastien Lachapelle, Philippe Brouillard, Tristan Deleu, Simon Lacoste-Julien | We propose a new score-based approach to structure/causal learning, leveraging neural networks and a recent continuous constrained formulation of this problem. | code |
260 | The Local Elasticity of Neural Networks | Hangfeng He, Weijie Su | This paper presents a phenomenon in neural networks that we refer to as local elasticity. | |
261 | Composing Task-Agnostic Policies with Deep Reinforcement Learning | Ahmed H. Qureshi, Jacob J. Johnson, Yuzhe Qin, Taylor Henderson, Byron Boots, Michael C. Yip | We propose a novel reinforcement learning-based skill transfer and composition method that composes the agent’s primitive policies to solve unseen tasks. | code |
262 | Convergence Behaviour of Some Gradient-Based Methods on Bilinear Zero-Sum Games | Guojun Zhang, Yaoliang Yu | We systematically analyze the convergence behaviour of popular gradient algorithms for solving bilinear games, with both simultaneous and alternating updates. | code |
263 | Discovering Motor Programs by Recomposing Demonstrations | Tanmay Shankar, Shubham Tulsiani, Lerrel Pinto, Abhinav Gupta | We learn a space of motor primitives from unannotated robot demonstrations, and show these primitives are semantically meaningful and can be composed for new robot tasks. | |
264 | Learning from Explanations with Neural Module Execution Tree | Yujia Qin, Ziqi Wang, Wenxuan Zhou, Jun Yan, Qinyuan Ye, Xiang Ren, Leonardo Neves, Zhiyuan Liu | In this paper, we propose a novel Neural Module Execution Tree (NMET) framework for augmenting sequence classification with natural language explanations. | code |
265 | Jelly Bean World: A Testbed for Never-Ending Learning | Emmanouil Antonios Platanios, Abulhair Saparov, Tom Mitchell | To this end, we propose the Jelly Bean World testbed. | |
266 | Coherent Gradients: An Approach to Understanding Generalization in Gradient Descent-based Optimization | Sat Chatterjee | We propose a hypothesis for why gradient descent generalizes based on how per-example gradients interact with each other. | |
267 | Probabilistic Connection Importance Inference and Lossless Compression of Deep Neural Networks | Xin Xing, Long Sha, Pengyu Hong, Zuofeng Shang, Jun S. Liu | We here propose a probabilistic importance inference approach for pruning DNNs. | |
268 | MEMO: A Deep Network for Flexible Combination of Episodic Memories | Andrea Banino, Adrià Puigdomènech Badia, Raphael Köster, Martin J. Chadwick, Vinicius Zambaldi, Demis Hassabis, Caswell Barry, Matthew Botvinick, Dharshan Kumaran, Charles Blundell | A memory architecture that supports inferential reasoning. | |
269 | Economy Statistical Recurrent Units For Inferring Nonlinear Granger Causality | Saurabh Khanna, Vincent Y. F. Tan | A new recurrent neural network architecture for detecting pairwise Granger causality between nonlinearly interacting time series. | code |
270 | Bayesian Meta Sampling for Fast Uncertainty Adaptation | Zhenyi Wang, Yang Zhao, Ping Yu, Ruiyi Zhang, Changyou Chen | We propose a Bayesian meta-sampling method for adapting model uncertainty in meta-learning. | |
271 | Non-Autoregressive Dialog State Tracking | Hung Le, Steven C.H. Hoi, Richard Socher | We propose the first non-autoregressive neural model for Dialogue State Tracking (DST), achieving state-of-the-art accuracy (49.04%) on the MultiWOZ 2.1 benchmark and reducing inference latency by an order of magnitude. | |
272 | Extreme Tensoring for Low-Memory Preconditioning | Xinyi Chen, Naman Agarwal, Elad Hazan, Cyril Zhang, Yi Zhang | We propose \emph{extreme tensoring} for high-dimensional stochastic optimization, showing that an optimizer needs very little memory to benefit from adaptive preconditioning. | |
273 | Incremental RNN: A Dynamical View | Anil Kag, Ziming Zhang, Venkatesh Saligrama | Incremental RNNs resolve the exploding/vanishing gradient problem by updating state vectors based on the difference between the previous state and the state predicted by an ODE. | |
274 | The Early Phase of Neural Network Training | Jonathan Frankle, David J. Schwab, Ari S. Morcos | We thoroughly investigate neural network learning dynamics over the early phase of training, finding that these changes are crucial and difficult to approximate, though extended pretraining can recover them. | |
275 | NeurQuRI: Neural Question Requirement Inspector for Answerability Prediction in Machine Reading Comprehension | Seohyun Back, Sai Chetan Chinthakindi, Akhil Kedia, Haejun Lee, Jaegul Choo | We propose a neural question requirement inspection model called NeurQuRI that extracts a list of conditions from the question, each of which should be satisfied by the candidate answer generated by an MRC model. | |
276 | Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization | Junjie Yan, Ruosi Wan, Xiangyu Zhang, Wei Zhang, Yichen Wei, Jian Sun | We propose a novel normalization method to handle small batch size cases. | code |
277 | Single episode transfer for differing environmental dynamics in reinforcement learning | Jiachen Yang, Brenden Petersen, Hongyuan Zha, Daniel Faissol | Single episode policy transfer in a family of environments with related dynamics, via optimized probing for rapid inference of latent variables and immediate execution of a universal policy. | |
278 | Generalization through Memorization: Nearest Neighbor Language Models | Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, Mike Lewis | We extend a pre-trained neural language model by linearly interpolating it with a k-nearest neighbors model, achieving new state-of-the-art results on Wikitext-103 with no additional training. | |
279 | Transformer-XH: Multi-hop question answering with eXtra Hop attention | Chen Zhao, Chenyan Xiong, Corby Rosset, Xia Song, Paul Bennett, Saurabh Tiwary | We present Transformer-XH, which upgrades Transformer with eXtra Hop attentions to intrinsically model structured texts in a data driven way. It leads to a simpler yet state-of-the-art multi-hop QA system. | code |
280 | Synthesizing Programmatic Policies that Inductively Generalize | Jeevana Priya Inala, Osbert Bastani, Zenna Tavares, Armando Solar-Lezama | An approach to learn program policies that inductively generalize. | |
281 | Decoding As Dynamic Programming For Recurrent Autoregressive Models | Najam Zaidi, Trevor Cohn, Gholamreza Haffari | Approximate inference using dynamic programming for autoregressive models. | |
282 | Deep Double Descent: Where Bigger Models and More Data Hurt | Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, Ilya Sutskever | We demonstrate, and characterize, realistic settings where bigger models are worse, and more data hurts. | |
283 | Intriguing Properties of Adversarial Training at Scale | Cihang Xie, Alan Yuille | The first rigorous diagnosis of large-scale adversarial training on ImageNet. | |
284 | Shifted and Squeezed 8-bit Floating Point format for Low-Precision Training of Deep Neural Networks | Leopold Cambier, Anahita Bhiwandiwalla, Ting Gong, Oguz H. Elibol, Mehran Nekuii, Hanlin Tang | We propose a novel 8-bit format that eliminates the need for loss scaling, stochastic rounding, and other low precision techniques | |
285 | Distributed Bandit Learning: Near-Optimal Regret with Efficient Communication | Yuanhao Wang, Jiachen Hu, Xiaoyu Chen, Liwei Wang | Our goal is to design communication protocols with near-optimal regret and little communication cost, which is measured by the total amount of transmitted data. | |
286 | Biologically inspired sleep algorithm for increased generalization and adversarial robustness in deep neural networks | Timothy Tadros, Giri Krishnan, Ramyaa Ramyaa, Maxim Bazhenov | We describe a biologically inspired sleep algorithm that increases an artificial neural network’s ability to extract the gist of a training set and improves its robustness to adversarial attacks and general distortions. | |
287 | A Closer Look at the Optimization Landscapes of Generative Adversarial Networks | Hugo Berard, Gauthier Gidel, Amjad Almahairi, Pascal Vincent, Simon Lacoste-Julien | By proposing new visualization techniques, we give better insight into GAN optimization in practical settings; we show that GANs on challenging datasets exhibit rotational behavior and do not converge to Nash equilibria. | code |
288 | On the Global Convergence of Training Deep Linear ResNets | Difan Zou, Philip M. Long, Quanquan Gu | Under certain conditions on the input and output linear transformations, both GD and SGD can achieve global convergence for training deep linear ResNets. | |
289 | Towards a Deep Network Architecture for Structured Smoothness | Haroun Habeeb, Oluwasanmi Koyejo | A feedforward layer to incorporate structured smoothness into a deep learning model | |
290 | Revisiting Self-Training for Neural Sequence Generation | Junxian He, Jiatao Gu, Jiajun Shen, Marc’Aurelio Ranzato | We revisit self-training as a semi-supervised learning method for neural sequence generation problems, and show that self-training can be quite successful with injected noise. | |
291 | Denoising and Regularization via Exploiting the Structural Bias of Convolutional Generators | Reinhard Heckel, Mahdi Soltanolkotabi | In this paper we take a step towards demystifying this experimental phenomenon by attributing the effect to particular architectural choices of convolutional networks, namely fixed convolutional operations. | code |
292 | Variational Autoencoders for Highly Multivariate Spatial Point Processes Intensities | Baichuan Yuan, Xiaowei Wang, Andrea Bertozzi, Hongxia Yang | To bridge this gap, we introduce a declustering based hidden variable model that leads to an efficient inference procedure via a variational autoencoder (VAE). | |
293 | Model-Augmented Actor-Critic: Backpropagating through Paths | Ignasi Clavera, Yao Fu, Pieter Abbeel | Policy gradient through backpropagation through time using learned models and Q-functions. SOTA results in reinforcement learning benchmark environments. | |
294 | LambdaNet: Probabilistic Type Inference using Graph Neural Networks | Jiayi Wei, Maruth Goyal, Greg Durrett, Isil Dillig | This paper proposes a probabilistic type inference scheme for TypeScript based on a graph neural network. | |
295 | From Inference to Generation: End-to-end Fully Self-supervised Generation of Human Face from Speech | Hyeong-Seok Choi, Changdae Park, Kyogu Lee | This paper proposes a method of end-to-end multi-modal generation of human face from speech based on a self-supervised learning framework. | |
296 | Visual Representation Learning with 3D View-Contrastive Inverse Graphics Networks | Adam W. Harley, Fangyu Li, Shrinidhi K. Lakshmikanth, Xian Zhou, Hsiao-Yu Fish Tung, Katerina Fragkiadaki | We show that with the right loss and architecture, view-predictive learning improves 3D object detection. | |
297 | Decoupling Representation and Classifier for Long-Tailed Recognition | Bingyi Kang, Saining Xie, Marcus Rohrbach, Zhicheng Yan, Albert Gordo, Jiashi Feng, Yannis Kalantidis | In this work, we decouple the learning procedure into representation learning and classification, and systematically explore how different balancing strategies affect them for long-tailed recognition. | |
298 | Robust Reinforcement Learning for Continuous Control with Model Misspecification | Daniel J. Mankowitz, Nir Levine, Rae Jeong, Abbas Abdolmaleki, Jost Tobias Springenberg, Yuanyuan Shi, Jackie Kay, Todd Hester, Timothy Mann, Martin Riedmiller | A framework for incorporating robustness to model misspecification into continuous control Reinforcement Learning algorithms. | |
299 | Cross-lingual Alignment vs Joint Training: A Comparative Study and A Simple Unified Framework | Zirui Wang*, Jiateng Xie*, Ruochen Xu, Yiming Yang, Graham Neubig, Jaime G. Carbonell | We conduct a comparative study of cross-lingual alignment vs joint training methods and unify these two previously exclusive paradigms in a new framework. | |
300 | Training Recurrent Neural Networks Online by Learning Explicit State Variables | Somjit Nath, Vincent Liu, Alan Chan, Adam White, Martha White | In this work, we reformulate the RNN training objective to explicitly learn state vectors; this breaks the dependence across time and so avoids the need to estimate gradients far back in time. | |
301 | Uncertainty-guided Continual Learning with Bayesian Neural Networks | Sayna Ebrahimi, Mohamed Elhoseiny, Trevor Darrell, Marcus Rohrbach | A regularization-based approach for continual learning using Bayesian neural networks to predict parameters’ importance | |
302 | Curriculum Loss: Robust Learning and Generalization against Label Corruption | Yueming Lyu, Ivor W. Tsang | A novel loss that bridges curriculum learning and robust learning. | |
303 | Picking Winning Tickets Before Training by Preserving Gradient Flow | Chaoqi Wang, Guodong Zhang, Roger Grosse | We introduce a criterion for pruning networks before training by preserving gradient flow. | |
304 | Generative Models for Effective ML on Private, Decentralized Datasets | Sean Augenstein, H. Brendan McMahan, Daniel Ramage, Swaroop Ramaswamy, Peter Kairouz, Mingqing Chen, Rajiv Mathews, Blaise Aguera y Arcas | Generative Models + Federated Learning + Differential Privacy gives data scientists a way to analyze private, decentralized data (e.g., on mobile devices) where direct inspection is prohibited. | code |
305 | Inductive representation learning on temporal graphs | Da Xu, Chuanwei Ruan, Evren Korpeoglu, Sushant Kumar, Kannan Achan | We propose the temporal graph attention (TGAT) layer to effectively aggregate temporal-topological neighborhood features and to learn time-feature interactions. | code |
306 | BatchEnsemble: an Alternative Approach to Efficient Ensemble and Lifelong Learning | Yeming Wen, Dustin Tran, Jimmy Ba | We introduce BatchEnsemble, an efficient method for ensembling and lifelong learning that can be used to improve the accuracy and uncertainty estimates of any neural network, much like typical ensemble methods. | |
307 | Towards neural networks that provably know when they don’t know | Alexander Meinke, Matthias Hein | In this paper we propose a new approach to out-of-distribution (OOD) detection that overcomes both problems. | |
308 | Iterative energy-based projection on a normal data manifold for anomaly localization | David Dehaene, Oriel Frigo, Sébastien Combrexelle, Pierre Eline | We use gradient descent on a regularized autoencoder loss to correct anomalous images. | |
309 | Towards Stable and Efficient Training of Verifiably Robust Neural Networks | Huan Zhang, Hongge Chen, Chaowei Xiao, Sven Gowal, Robert Stanforth, Bo Li, Duane Boning, Cho-Jui Hsieh | We propose a new certified adversarial training method, CROWN-IBP, that achieves state-of-the-art robustness for L_inf norm adversarial perturbations. | |
310 | Frequency-based Search-control in Dyna | Yangchen Pan, Jincheng Mei, Amir-massoud Farahmand, Martha White | Acquire states from high-frequency regions for search-control in Dyna. | |
311 | Learning representations for binary-classification without backpropagation | Mathias Lechner | The first feedback alignment algorithm with provable learning guarantees for networks with a single output neuron. | code |
312 | Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks | Ziwei Ji, Matus Telgarsky | This work shows that $\mathcal{O}(1/\epsilon)$ iterations of gradient descent on two-layer networks of any width exceeding $\mathrm{polylog}(n, 1/\epsilon, 1/\delta)$, given $\Omega(1/\epsilon^2)$ training examples, suffice to achieve a test error of $\epsilon$. | |
313 | Physics-aware Difference Graph Networks for Sparsely-Observed Dynamics | Sungyong Seo*, Chuizheng Meng*, Yan Liu | We propose physics-aware difference graph networks designed to effectively learn spatial differences to model sparsely observed dynamics. | |
314 | HiLLoC: lossless image compression with hierarchical latent variable models | James Townsend, Thomas Bird, Julius Kunze, David Barber | We scale up lossless compression with latent variables, beating existing approaches on full-size ImageNet images. | code |
315 | IMPACT: Importance Weighted Asynchronous Architectures with Clipped Target Networks | Michael Luo, Jiahao Yao, Richard Liaw, Eric Liang, Ion Stoica | IMPACT helps RL agents train faster by decreasing training wall-clock time and increasing sample efficiency simultaneously. | |
316 | On Bonus Based Exploration Methods In The Arcade Learning Environment | Adrien Ali Taiga, William Fedus, Marlos C. Machado, Aaron Courville, Marc G. Bellemare | We find that existing bonus-based exploration methods have not been able to address the exploration-exploitation trade-off in the Arcade Learning Environment. | |
317 | Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation | Xinjie Fan, Yizhe Zhang, Zhendong Wang, Mingyuan Zhou | To stabilize this method for contextual generation of categorical sequences, we estimate the gradient by evaluating a set of correlated Monte Carlo rollouts. | code |
318 | Smoothness and Stability in GANs | Casey Chu, Kentaro Minami, Kenji Fukumizu | We develop a principled theoretical framework for understanding and enforcing the stability of various types of GANs | |
319 | SNOW: Subscribing to Knowledge via Channel Pooling for Transfer & Lifelong Learning | Chungkuk Yoo, Bumsoo Kang, Minsik Cho | We propose SNOW, an efficient way of transfer and lifelong learning by subscribing knowledge of a source model for new tasks through a novel channel pooling block. | |
320 | Empirical Studies on the Properties of Linear Regions in Deep Neural Networks | Xiao Zhang, Dongrui Wu | This paper provides a novel and meticulous perspective on DNNs: instead of just counting the number of linear regions, we study their local properties, such as the inspheres, the directions of the corresponding hyperplanes, the decision boundaries, and the relevance of the surrounding regions. | |
321 | Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning | Ali Mousavi, Lihong Li, Qiang Liu, Denny Zhou | We present a novel approach for the off-policy estimation problem in infinite-horizon RL. | |
322 | PairNorm: Tackling Oversmoothing in GNNs | Lingxiao Zhao, Leman Akoglu | We propose a normalization layer for GNN models to solve the oversmoothing problem. | code |
323 | Unsupervised Clustering using Pseudo-semi-supervised Learning | Divam Gupta, Ramachandran Ramjee, Nipun Kwatra, Muthian Sivathanu | Using ensembles and pseudo labels for unsupervised clustering | code |
324 | Simple and Effective Regularization Methods for Training on Noisily Labeled Data with Generalization Guarantee | Wei Hu, Zhiyuan Li, Dingli Yu | This paper proposes and analyzes two simple and intuitive regularization methods: (i) regularization by the distance between the network parameters to initialization, and (ii) adding a trainable auxiliary variable to the network output for each training example. | |
325 | Controlling generative models with continuous factors of variations | Antoine Plumerault, Hervé Le Borgne, Céline Hudelot | A model to control the generation of images with GANs and beta-VAEs with regard to the scale and position of objects. | |
326 | Symplectic ODE-Net: Learning Hamiltonian Dynamics with Control | Yaofeng Desmond Zhong, Biswadip Dey, Amit Chakraborty | This work enforces Hamiltonian dynamics with control to learn system models from embedded position and velocity data, and exploits this physically-consistent dynamics to synthesize model-based control via energy shaping. | |
327 | Understanding $\ell_4$-based Dictionary Learning: Interpretation, Stability, and Robustness | Yuexiang Zhai, Hermish Mehta, Zhengyuan Zhou, Yi Ma | We compare $\ell_4$-norm based dictionary learning with PCA and ICA, and show its stability as well as robustness. | |
328 | Quantum Algorithms for Deep Convolutional Neural Networks | Iordanis Kerenidis, Jonas Landman, Anupam Prakash | We provide the first algorithm for quantum computers implementing a universal convolutional neural network with a speedup. | code |
329 | Self-Supervised Learning of Appliance Usage | Chen-Yu Hsu, Abbas Zeitoun, Guang-He Lee, Dina Katabi, Tommi Jaakkola | We learn appliance usage patterns in homes without labels, using self-supervised learning with energy and location data | |
330 | Deep Graph Matching Consensus | Matthias Fey, Jan E. Lenssen, Christopher Morris, Jonathan Masci, Nils M. Kriege | We develop a deep graph matching architecture which refines initial correspondences based on a neighborhood consensus error. | |
331 | Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks | Yu Bai, Jason D. Lee | Wide neural networks can escape the NTK regime and couple with quadratic models, with provably nice optimization landscape and better generalization. | |
332 | Dynamic Sparse Training: Find Efficient Sparse Network From Scratch With Trainable Masked Layers | Junjie Liu, Zhe Xu, Runbin Shi, Ray C. C. Cheung, Hayden K. H. So | We present a novel network pruning method that can find the optimal sparse structure during the training process with a trainable pruning threshold. | code |
333 | Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by Enabling Input-Adaptive Inference | Ting-Kuei Hu, Tianlong Chen, Haotao Wang, Zhangyang Wang | Is it possible to co-design model accuracy, robustness and efficiency to achieve their triple wins? Yes! | |
334 | Neural Policy Gradient Methods: Global Optimality and Rates of Convergence | Lingxiao Wang, Qi Cai, Zhuoran Yang, Zhaoran Wang | In detail, we prove that neural natural policy gradient converges to a globally optimal policy at a sublinear rate. Also, we show that neural vanilla policy gradient converges sublinearly to a stationary point. | |
335 | Double Neural Counterfactual Regret Minimization | Hui Li, Kailiang Hu, Shaohua Zhang, Yuan Qi, Le Song | We propose a double neural framework to solve large-scale imperfect-information games. | |
336 | GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation | Chence Shi*, Minkai Xu*, Zhaocheng Zhu, Weinan Zhang, Ming Zhang, Jian Tang | A flow-based autoregressive model for molecular graph generation. Reaching state-of-the-art results on molecule generation and property optimization. | code |
337 | The Gambler’s Problem and Beyond | Baoxiang Wang, Shuai Li, Jiajin Li, Siu On Chan | The optimal value function is fractal and is like a Cantor function. | |
338 | Multilingual Alignment of Contextual Word Representations | Steven Cao, Nikita Kitaev, Dan Klein | We propose procedures for evaluating and strengthening contextual embedding alignment and show that they both improve multilingual BERT’s zero-shot XNLI transfer and provide useful insights into the model. | |
339 | The Curious Case of Neural Text Degeneration | Ari Holtzman, Jan Buys, Leo Du, Maxwell Forbes, Yejin Choi | Current language generation systems either aim for high likelihood and devolve into generic repetition, or miscalibrate their stochasticity; we provide evidence of both and propose a solution: Nucleus Sampling. | |
340 | Graph Convolutional Reinforcement Learning | Jiechuan Jiang, Chen Dun, Tiejun Huang, Zongqing Lu | To tackle these difficulties, we propose graph convolutional reinforcement learning, where graph convolution adapts to the dynamics of the underlying graph of the multi-agent environment, and relation kernels capture the interplay between agents by their relation representations. | code |
341 | Meta-Learning Deep Energy-Based Memory Models | Sergey Bartunov, Jack Rae, Simon Osindero, Timothy Lillicrap | Deep associative memory models using arbitrary neural networks as a storage. | |
342 | Exploratory Not Explanatory: Counterfactual Analysis of Saliency Maps for Deep RL | Akanksha Atrey, Kaleigh Clary, David Jensen | Proposing a new counterfactual-based methodology to evaluate the hypotheses generated from saliency maps about deep RL agent behavior. | |
343 | Fast Neural Network Adaptation via Parameters Remapping | Jiemin Fang*, Yuzhu Sun*, Kangjian Peng*, Qian Zhang, Yuan Li, Wenyu Liu, Xinggang Wang | In this paper, we propose FNA, a fast neural network adaptation method that can efficiently adapt a manually designed ImageNet network to new segmentation/detection tasks. | |
344 | Guiding Program Synthesis by Learning to Generate Examples | Larissa Laich, Pavol Bielik, Martin Vechev | In this paper we address this challenge via an iterative approach that finds ambiguities in the provided specification and learns to resolve these by generating additional input-output examples. | |
345 | SNODE: Spectral Discretization of Neural ODEs for System Identification | Alessio Quaglino, Marco Gallieri, Jonathan Masci, Jan Koutník | This paper proposes the use of spectral element methods for fast and accurate training of Neural Ordinary Differential Equations for system identification. | |
346 | Generalized Convolutional Forest Networks for Domain Generalization and Visual Recognition | Jongbin Ryu, GiTaek Kwon, Ming-Hsuan Yang, Jongwoo Lim | In this work, we propose generalized convolutional forest networks to learn a feature space that maximizes the strength of individual tree classifiers while minimizing their correlation. | |
347 | Once for All: Train One Network and Specialize it for Efficient Deployment | Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, Song Han | We introduce techniques to train a single once-for-all network that fits many hardware platforms. | |
348 | Multi-Agent Interactions Modeling with Correlated Policies | Minghuan Liu, Ming Zhou, Weinan Zhang, Yuzheng Zhuang, Jun Wang, Wulong Liu, Yong Yu | Modeling complex multi-agent interactions under a multi-agent imitation learning framework, with explicit modeling of correlated policies by approximating opponents’ policies. | code |
349 | PCMC-Net: Feature-based Pairwise Choice Markov Chains | Alix Lhéritier | We propose a generic neural network architecture equipping Pairwise Choice Markov Chain choice models with amortized, automatic-differentiation-based inference using alternatives’ and individuals’ features. | |
350 | Implementing Inductive Bias for Different Navigation Tasks through Diverse RNN Attractors | Tie Xu, Omri Barak | Task-agnostic pre-training can shape an RNN’s attractor landscape and form diverse inductive biases for different navigation tasks. | code |
351 | Query2box: Reasoning over Knowledge Graphs in Vector Space Using Box Embeddings | Hongyu Ren*, Weihua Hu*, Jure Leskovec | Answering a wide class of logical queries over knowledge graphs with box embeddings in vector space | |
352 | Rethinking the Hyperparameters for Fine-tuning | Hao Li, Pratik Chaudhari, Hao Yang, Michael Lam, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto | This paper re-examines several common practices of setting hyper-parameters for fine-tuning. | |
353 | Plug and Play Language Model: A simple baseline for controlled language generation | Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane Hung, Eric Frank, Piero Molino, Jason Yosinski, Rosanne Liu | We control the topic and sentiment of text generation (almost) without any training. | code |
354 | Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks | Wei Hu, Lechao Xiao, Jeffrey Pennington | We provide for the first time a rigorous proof that orthogonal initialization speeds up convergence relative to Gaussian initialization, for deep linear networks. | |
355 | RGBD-GAN: Unsupervised 3D Representation Learning From Natural Image Datasets via RGBD Image Synthesis | Atsuhiro Noguchi, Tatsuya Harada | RGBD image generation for unsupervised camera parameter conditioning | |
356 | Towards Verified Robustness under Text Deletion Interventions | Johannes Welbl, Po-Sen Huang, Robert Stanforth, Sven Gowal, Krishnamurthy (Dj) Dvijotham, Martin Szummer, Pushmeet Kohli | Formal verification of a specification on a model’s prediction undersensitivity using Interval Bound Propagation | |
357 | Jacobian Adversarially Regularized Networks for Robustness | Alvin Chan, Yi Tay, Yew Soon Ong, Jie Fu | We show that training classifiers to produce salient input Jacobian matrices with a GAN-like regularization can boost adversarial robustness. | |
358 | Thinking While Moving: Deep Reinforcement Learning with Concurrent Control | Ted Xiao, Eric Jang, Dmitry Kalashnikov, Sergey Levine, Julian Ibarz, Karol Hausman, Alexander Herzog | Reinforcement learning formulation that allows agents to think and act at the same time, demonstrated on real-world robotic grasping. | |
359 | Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning | Qian Long*, Zihan Zhou*, Abhinav Gupta, Fei Fang, Yi Wu†, Xiaolong Wang† | In this paper, we introduce Evolutionary Population Curriculum (EPC), a curriculum learning paradigm that scales up Multi-Agent Reinforcement Learning (MARL) by progressively increasing the population of training agents in a stage-wise manner. | |
360 | ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning | A text encoder trained to distinguish real input tokens from plausible fakes efficiently learns effective language representations. | |
361 | Emergent Systematic Generalization In a Situated Agent | Felix Hill, Andrew Lampinen, Rosalia Schneider, Stephen Clark, Matthew Botvinick, James L. McClelland, Adam Santoro | We isolate the environmental and training factors that contribute to strong emergent systematic generalization in a situated language-learning agent | |
362 | Abstract Diagrammatic Reasoning with Multiplex Graph Networks | Duo Wang, Mateja Jamnik, Pietro Lio | MXGNet is a multilayer, multiplex graph based architecture which achieves good performance on various diagrammatic reasoning tasks. | |
363 | A Baseline for Few-Shot Image Classification | Guneet Singh Dhillon, Pratik Chaudhari, Avinash Ravichandran, Stefano Soatto | Transductive fine-tuning of a deep network is a strong baseline for few-shot image classification and outperforms the state-of-the-art on all standard benchmarks. | |
364 | Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering | Akari Asai, Kazuma Hashimoto, Hannaneh Hajishirzi, Richard Socher, Caiming Xiong | Graph-based recurrent retriever that learns to retrieve reasoning paths over Wikipedia Graph outperforms the most recent state of the art on HotpotQA by more than 10 points. | |
365 | Padé Activation Units: End-to-end Learning of Flexible Activation Functions in Deep Networks | Alejandro Molina, Patrick Schramowski, Kristian Kersting | We introduce PAUs, new learnable activation functions for neural networks. They free network designers from the activation selection process and increase test prediction accuracy. | |
366 | A Framework for Robustness Certification of Smoothed Classifiers Using f-Divergences | Krishnamurthy (Dj) Dvijotham, Jamie Hayes, Borja Balle, Zico Kolter, Chongli Qin, Andras Gyorgy, Kai Xiao, Sven Gowal, Pushmeet Kohli | Develop a general framework to establish certified robustness of ML models against various classes of adversarial perturbations | |
367 | Contrastive Representation Distillation | Yonglong Tian, Dilip Krishnan, Phillip Isola | Representation/knowledge distillation by maximizing mutual information between teacher and student | |
368 | Certified Defenses for Adversarial Patches | Ping-yeh Chiang*, Renkun Ni*, Ahmed Abdelkader, Chen Zhu, Chris Studor, Tom Goldstein | Motivated by this finding, we present an extension of certified defense algorithms and propose significantly faster variants for robust training against patch attacks. | |
369 | Sample Efficient Policy Gradient Methods with Recursive Variance Reduction | Pan Xu, Felicia Gao, Quanquan Gu | In this work, we aim to reduce the sample complexity of existing policy gradient methods. | |
370 | Deep Symbolic Superoptimization Without Human Knowledge | Hui Shi, Yang Zhang, Xinyun Chen, Yuandong Tian, Jishen Zhao | We thus propose HISS, a reinforcement learning framework for symbolic superoptimization that keeps humans out of the loop. | |
371 | Explain Your Move: Understanding Agent Actions Using Focused Feature Saliency | Piyush Gupta, Nikaash Puri, Sukriti Verma, Dhruv Kayastha, Shripad Deshmukh, Balaji Krishnamurthy, Sameer Singh | We propose a model-agnostic approach to explain the behaviour of black-box deep RL agents, trained to play Atari and board games, by highlighting relevant features of an input state. | code |
372 | Universal Approximation with Certified Networks | Maximilian Baader, Matthew Mirman, Martin Vechev | We prove that for a large class of functions f there exists an interval certified robust network approximating f up to arbitrary precision. | |
373 | Measuring and Improving the Use of Graph Information in Graph Neural Networks | Yifan Hou, Jian Zhang, James Cheng, Kaili Ma, Richard T. B. Ma, Hongzhi Chen, Ming-Chang Yang | This paper introduces a context-surrounding GNN framework and proposes two smoothness metrics to measure the quantity and quality of information obtained from graph data. | |
374 | State-only Imitation with Transition Dynamics Mismatch | Tanmay Gangwani, Jian Peng | An algorithm for imitation with state-only expert demonstrations; builds on adversarial IRL, with experiments on transition dynamics mismatch between expert and imitator. | |
375 | Adversarial AutoAugment | Xinyu Zhang, Qiang Wang, Jian Zhang, Zhao Zhong | We introduce the idea of adversarial learning into automatic data augmentation to improve the generalization of a target network. | |
376 | Meta Dropout: Learning to Perturb Latent Features for Generalization | Hae Beom Lee, Taewook Nam, Eunho Yang, Sung Ju Hwang | To tackle this challenge, we propose a novel regularization method, meta-dropout, which learns to perturb the latent features of training examples for generalization in a meta-learning framework. | code |
377 | Rényi Fair Inference | Sina Baharlouei, Maher Nouiehed, Meisam Razaviyayn | In this paper, we use Rényi correlation as a measure of fairness of machine learning models and develop a general training framework to impose fairness. | |
378 | Learning transport cost from subset correspondence | Ruishan Liu, Akshay Balsubramani, James Zou | In this work, we investigate how to learn the cost function using a small amount of side information which is often available. | code |
379 | BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget | Jack Turner, Elliot J. Crowley, Michael O’Boyle, Amos Storkey, Gavin Gray | A simple and effective method for reducing large neural networks to flexible parameter targets based on block substitution. | |
380 | Variance Reduction With Sparse Gradients | Melih Elibol, Lihua Lei, Michael I. Jordan | We use sparsity to improve the computational complexity of variance reduction methods. | code |
381 | Abductive Commonsense Reasoning | Chandra Bhagavatula, Ronan Le Bras, Chaitanya Malaviya, Keisuke Sakaguchi, Ari Holtzman, Hannah Rashkin, Doug Downey, Wen-tau Yih, Yejin Choi | We present the first study that investigates the viability of language-based abductive reasoning. We introduce a challenge dataset, ART, that consists of over 20k commonsense narrative contexts and 200k explanations. | |
382 | Discrepancy Ratio: Evaluating Model Performance When Even Experts Disagree on the Truth | Igor Lovchinsky, Alon Daks, Israel Malkin, Pouya Samangouei, Ardavan Saeedi, Yang Liu, Swami Sankaranarayanan, Tomer Gafner, Ben Sternlieb, Patrick Maher, Nathan Silberman | A framework for evaluating model performance when even experts disagree on what the ground truth is. | |
383 | Weakly Supervised Disentanglement with Guarantees | Rui Shu, Yining Chen, Abhishek Kumar, Stefano Ermon, Ben Poole | We construct a theoretical framework for weakly supervised disentanglement and conduct extensive experiments to back up the theory. | code |
384 | Nesterov Accelerated Gradient and Scale Invariance for Adversarial Attacks | Jiadong Lin, Chuanbiao Song, Kun He, Liwei Wang, John E. Hopcroft | We propose a Nesterov Iterative Fast Gradient Sign Method (NI-FGSM) and a Scale-Invariant attack Method (SIM) that can boost the transferability of adversarial examples for image classification. | code |
385 | Fantastic Generalization Measures and Where to Find Them | Yiding Jiang*, Behnam Neyshabur*, Dilip Krishnan, Hossein Mobahi, Samy Bengio | We empirically study generalization measures over more than 2000 models, identify common pitfalls in the existing practice of studying generalization measures, and provide some new bounds based on measures in our study. | code |
386 | Robustness Verification for Transformers | Zhouxing Shi, Huan Zhang, Kai-Wei Chang, Minlie Huang, Cho-Jui Hsieh | We propose the first algorithm for verifying the robustness of Transformers. | |
387 | A Simple Randomization Technique for Generalization in Deep Reinforcement Learning | Kimin Lee, Kibok Lee, Jinwoo Shin, Honglak Lee | We propose a simple randomization technique for improving generalization in deep reinforcement learning across tasks with various unseen visual patterns. | |
388 | Tensor Decompositions for Temporal Knowledge Base Completion | Timothée Lacroix, Guillaume Obozinski, Nicolas Usunier | We propose new tensor decompositions and associated regularizers to obtain state-of-the-art performance on temporal knowledge base completion. | code |
389 | On Universal Equivariant Set Networks | Nimrod Segol, Yaron Lipman | Settling permutation equivariance universality for popular deep models. | |
390 | Provable robustness against all adversarial $l_p$-perturbations for $p\geq 1$ | Francesco Croce, Matthias Hein | We introduce a method to train models with provable robustness wrt all the $l_p$-norms for $p\geq 1$ simultaneously. | |
391 | Don’t Use Large Mini-batches, Use Local SGD | Tao Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi | As a remedy, we propose a \emph{post-local} SGD and show that it significantly improves the generalization performance compared to large-batch training on standard benchmarks while enjoying the same efficiency (time-to-accuracy) and scalability. | |
392 | Kernel of CycleGAN as a principal homogeneous space | Nikita Moriakov, Jonas Adler, Jonas Teuwen | The space of approximate solutions of CycleGAN admits a lot of symmetry, and an identity loss does not fix this. | |
393 | Distributionally Robust Neural Networks | Shiori Sagawa*, Pang Wei Koh*, Tatsunori B. Hashimoto, Percy Liang | Overparameterized neural networks can be distributionally robust, but only when you account for generalization. | |
394 | On Solving Minimax Optimization Locally: A Follow-the-Ridge Approach | Yuanhao Wang, Guodong Zhang, Jimmy Ba | In this paper, we propose Follow-the-Ridge (FR), a novel algorithm that provably converges to and only converges to local minimax. | |
395 | A Neural Dirichlet Process Mixture Model for Task-Free Continual Learning | Soochan Lee, Junsoo Ha, Dongsu Zhang, Gunhee Kim | We propose an expansion-based approach for task-free continual learning for the first time. Our model consists of a set of neural network experts and expands the number of experts under the Bayesian nonparametric principle. | |
396 | Hyper-SAGNN: a self-attention based graph neural network for hypergraphs | Ruochi Zhang, Yuesong Zou, Jian Ma | We develop a new self-attention based graph neural network called Hyper-SAGNN applicable to homogeneous and heterogeneous hypergraphs with variable hyperedge sizes that can fulfill tasks like node classification and hyperedge prediction. | code |
397 | Neural Epitome Search for Architecture-Agnostic Network Compression | Daquan Zhou, Xiaojie Jin, Qibin Hou, Kaixin Wang, Jianchao Yang, Jiashi Feng | We present a novel neural network compression method which can reuse the parameters efficiently to reduce the model size. | |
398 | On the Equivalence between Node Embeddings and Structural Graph Representations | Balasubramaniam Srinivasan, Bruno Ribeiro | We develop the foundations of a unifying theoretical framework connecting node embeddings and structural graph representations through invariant theory | |
399 | Probability Calibration for Knowledge Graph Embedding Models | Pedro Tabacof, Luca Costabello | We propose a novel method to calibrate knowledge graph embedding models without the need for negative examples. | |
400 | Why Not to Use Zero Imputation? Correcting Sparsity Bias in Training Neural Networks | Joonyoung Yi, Juhyuk Lee, Sung Ju Hwang, Eunho Yang | In this paper, we introduce the variable sparsity problem (VSP), which describes a phenomenon where the output of a predictive model varies significantly with respect to the rate of missingness in the given input, and show that it adversely affects model performance. | |
401 | DropEdge: Towards Deep Graph Convolutional Networks on Node Classification | Yu Rong, Wenbing Huang, Tingyang Xu, Junzhou Huang | This paper proposes DropEdge, a novel and flexible technique to alleviate the over-smoothing and overfitting issues in deep Graph Convolutional Networks. | code |
402 | Mask Based Unsupervised Content Transfer | Ron Mokady, Sagie Benaim, Lior Wolf, Amit Bermano | We consider the problem of translating, in an unsupervised manner, between two domains where one contains some additional information compared to the other. | code |
403 | U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation | Junho Kim, Minjae Kim, Hyeonwoo Kang, Kwang Hee Lee | We propose a novel method for unsupervised image-to-image translation, which incorporates a new attention module and a new learnable normalization function in an end-to-end manner. | code |
404 | Inductive and Unsupervised Representation Learning on Graph Structured Objects | Lichen Wang, Bo Zong, Qianqian Ma, Wei Cheng, Jingchao Ni, Wenchao Yu, Yanchi Liu, Dongjin Song, Haifeng Chen, Yun Fu | This paper proposes a novel framework for graph similarity learning in the inductive and unsupervised setting. | |
405 | Batch-shaping for learning conditional channel gated networks | Babak Ehteshami Bejnordi, Tijmen Blankevoort, Max Welling | A method that trains large capacity neural networks with significantly improved accuracy and lower dynamic computational cost | |
406 | Learning Robust Representations via Multi-View Information Bottleneck | Marco Federici, Anjan Dutta, Patrick Forré, Nate Kushman, Zeynep Akata | We extend the information bottleneck method to the unsupervised multi-view setting and show state-of-the-art results on standard datasets. | code |
407 | Deep probabilistic subsampling for task-adaptive compressed sensing | Iris A.M. Huijben, Bastiaan S. Veeling, Ruud J.G. van Sloun | In this work, we demonstrate that the deep learning paradigm can be extended to incorporate a subsampling scheme that is jointly optimized under a desired minimum sample rate. | |
408 | Robust anomaly detection and backdoor attack detection via differential privacy | Min Du, Ruoxi Jia, Dawn Song | This paper shows that differential privacy could improve the utility of outlier detection, novelty detection and backdoor attack detection, through both a theoretical analysis and extensive experimental results (constructed and real-world). | code |
409 | Learning to Guide Random Search | Ozan Sener, Vladlen Koltun | We improve the sample-efficiency of the random search for functions defined on low-dimensional manifolds. Our method jointly learns the underlying manifold and optimizes the function. | |
410 | Lagrangian Fluid Simulation with Continuous Convolutions | Benjamin Ummenhofer, Lukas Prantl, Nils Thürey, Vladlen Koltun | We learn particle-based fluid simulation with convolutional networks. | |
411 | Reinforced Genetic Algorithm Learning for Optimizing Computation Graphs | Aditya Paliwal, Felix Gimeno, Vinod Nair, Yujia Li, Miles Lubin, Pushmeet Kohli, Oriol Vinyals | We use deep RL to learn a policy that directs the search of a genetic algorithm to better optimize the execution cost of computation graphs, and show improved results on real-world TensorFlow graphs. | |
412 | Compressive Transformers for Long-Range Sequence Modelling | Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Chloe Hillier, Timothy P. Lillicrap | A long-range Transformer using a compressive memory; achieves SOTA on the WikiText-103 and enwik8 LM benchmarks, and releases a new book-level LM benchmark, PG-19. | |
413 | A Stochastic Derivative Free Optimization Method with Momentum | Eduard Gorbunov, Adel Bibi, Ozan Sener, El Houcine Bergou, Peter Richtarik | We develop and analyze a new derivative free optimization algorithm with momentum and importance sampling with applications to continuous control. | |
414 | Understanding and Improving Information Transfer in Multi-Task Learning | Sen Wu, Hongyang Zhang, Christopher Ré | A theoretical study of multi-task learning with practical implications for improving multi-task training and transfer learning. | |
415 | Learning To Explore Using Active Neural Mapping | Devendra Singh Chaplot, Saurabh Gupta, Dhiraj Gandhi, Abhinav Gupta, Ruslan Salakhutdinov | A modular and hierarchical approach to learn policies for exploring 3D environments. | |
416 | EMPIR: Ensembles of Mixed Precision Deep Networks for Increased Robustness Against Adversarial Attacks | Sanchari Sen, Balaraman Ravindran, Anand Raghunathan | We propose ensembles of mixed-precision DNNs as a new form of defense against adversarial attacks | |
417 | Quantifying Point-Prediction Uncertainty in Neural Networks via Residual Estimation with an I/O Kernel | Xin Qiu, Elliot Meyerson, Risto Miikkulainen | Learning to Estimate Point-Prediction Uncertainty and Correct Output in Neural Networks | code |
418 | B-Spline CNNs on Lie groups | Erik J Bekkers | The paper describes a flexible framework for building CNNs that are equivariant to a large class of transformation groups. | code |
419 | Neural Outlier Rejection for Self-Supervised Keypoint Learning | Jiexiong Tang, Rares Ambrus, Vitor Guizilini, Hanme Kim | Learning to extract distinguishable keypoints via a proxy task: outlier rejection. | |
420 | Reducing Transformer Depth on Demand with Structured Dropout | Angela Fan, Edouard Grave, Armand Joulin | LayerDrop, a form of structured dropout that allows you to train one model at training time and prune it to any desired depth at test time. | |
421 | Cross-Lingual Ability of Multilingual BERT: An Empirical Study | Karthikeyan K, Zihan Wang, Stephen Mayhew, Dan Roth | Cross-Lingual Ability of Multilingual BERT: An Empirical Study | |
422 | Spatially Parallel Attention and Component Extraction for Scene Decomposition | Sungjin Ahn, Zhixuan Lin, Weihao Sun, Skand Vishwanath Peri, Gautam Singh, Yi-Fu Wu, Fei Deng, Jindong Jiang | We propose a generative latent variable model for unsupervised scene decomposition that provides factorized object representation per foreground object while also decomposing background segments of complex morphology. | |
423 | RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments | Roberta Raileanu, Tim Rocktäschel | Instead of rewarding agents for predicting the next state, reward them for taking actions that lead to changes in the state. | |
424 | On the geometry and learning low-dimensional embeddings for directed graphs | Thorben Funke, Tian Guo, Alen Lancic, Nino Antulov-Fantulin | We propose a novel node embedding of directed graphs to statistical manifolds, and analyze connections to divergence, geometry, and efficient learning procedures. | |
425 | Efficient Probabilistic Logic Reasoning with Graph Neural Networks | Yuyu Zhang, Xinshi Chen, Yuan Yang, Arun Ramamurthy, Bo Li, Yuan Qi, Le Song | We employ graph neural networks in the variational EM framework for efficient inference and learning of Markov Logic Networks. | code |
426 | GraphSAINT: Graph Sampling Based Inductive Learning Method | Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, Viktor Prasanna | We propose a graph sampling based minibatch construction method for training Graph Convolutional Networks. | code |
427 | You Only Train Once: Loss-Conditional Training of Deep Networks | Alexey Dosovitskiy, Josip Djolonga | A method to train a single model simultaneously minimizing a family of loss functions instead of training a set of per-loss models. | |
428 | Projection Based Constrained Policy Optimization | Tsung-Yen Yang, Justinian Rosca, Karthik Narasimhan, Peter J. Ramadge | We propose a new algorithm that learns constraint-satisfying policies, and provide theoretical analysis and empirical demonstration in the context of reinforcement learning with constraints. | code |
429 | Infinite-Horizon Differentiable Model Predictive Control | Sebastian East, Marco Gallieri, Jonathan Masci, Jan Koutnik, Mark Cannon | This paper proposes a differentiable linear quadratic Model Predictive Control (MPC) framework for safe imitation learning. | |
430 | Combining Q-Learning and Search with Amortized Value Estimates | Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Tobias Pfaff, Theophane Weber, Lars Buesing, Peter W. Battaglia | We propose a model-based method called “Search with Amortized Value Estimates” (SAVE) which leverages both real and planned experience by combining Q-learning with Monte-Carlo Tree Search, achieving strong performance with very small search budgets. | |
431 | Training Generative Adversarial Networks from Incomplete Observations using Factorised Discriminators | Daniel Stoller, Sebastian Ewert, Simon Dixon | We decompose the discriminator in a GAN in a principled way so that each component can be independently trained on different parts of the input. The resulting “FactorGAN” can be used for semi-supervised learning and in missing data scenarios. | code |
432 | Decentralized Deep Learning with Arbitrary Communication Compression | Anastasia Koloskova*, Tao Lin*, Sebastian U Stich, Martin Jaggi | We propose Choco-SGD—decentralized SGD with compressed communication—for non-convex objectives and show its strong performance in various deep learning applications (on-device learning, datacenter case). | code |
433 | Toward Evaluating Robustness of Deep Reinforcement Learning with Continuous Control | Tsui-Wei Weng, Krishnamurthy (Dj) Dvijotham*, Jonathan Uesato*, Kai Xiao*, Sven Gowal*, Robert Stanforth*, Pushmeet Kohli | We study adversarial attacks on continuous-control agents in deep RL and propose a two-step algorithm based on learned model dynamics. | |
434 | Gradient $\ell_1$ Regularization for Quantization Robustness | Milad Alizadeh, Arash Behboodi, Mart van Baalen, Christos Louizos, Tijmen Blankevoort, Max Welling | We show that regularizing the $\ell_1$-norm of gradients improves robustness to post-training quantization in neural networks (a minimal sketch appears after the table). | |
435 | SpikeGrad: An ANN-equivalent Computation Model for Implementing Backpropagation with Spikes | Johannes C. Thiele, Olivier Bichler, Antoine Dupret | An implementation of the backpropagation algorithm using spiking neurons for forward and backward propagation. | |
436 | On the Relationship between Self-Attention and Convolutional Layers | Jean-Baptiste Cordonnier, Andreas Loukas, Martin Jaggi | A self-attention layer can perform convolution and often learns to do so in practice. | code |
437 | Learning-Augmented Data Stream Algorithms | Tanqiu Jiang, Yi Li, Honghao Lin, Yisong Ruan, David P. Woodruff | In this paper we explore the full power of such an oracle, showing that it can be applied to a wide array of problems in data streams, sometimes resulting in the first optimal bounds for such problems. | code |
438 | Structured Object-Aware Physics Prediction for Video Modeling and Planning | Jannik Kossen, Karl Stelzner, Marcel Hussing, Claas Voelcker, Kristian Kersting | We propose a structured object-aware video prediction model, which explicitly reasons about objects and demonstrate that it provides high-quality long term video predictions for planning. | code |
439 | Incorporating BERT into Neural Machine Translation | Jinhua Zhu, Yingce Xia, Lijun Wu, Di He, Tao Qin, Wengang Zhou, Houqiang Li, Tieyan Liu | We propose a new algorithm named BERT-fused NMT, in which we first use BERT to extract representations for an input sequence, and then the representations are fused with each layer of the encoder and decoder of the NMT model through attention mechanisms. | code |
440 | MMA Training: Direct Input Space Margin Maximization through Adversarial Training | Gavin Weiguang Ding, Yash Sharma, Kry Yik Chau Lui, Ruitong Huang | We propose MMA training to directly maximize input space margin in order to improve adversarial robustness primarily by removing the requirement of specifying a fixed distortion bound. | |
441 | Infinite-horizon Off-Policy Policy Evaluation with Multiple Behavior Policies | Xinyun Chen, Lu Wang, Yizhe Hang, Heng Ge, Hongyuan Zha | A new partially policy-agnostic method for infinite-horizon off-policy policy evaluation with multiple known or unknown behavior policies. | |
442 | vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations | Alexei Baevski, Steffen Schneider, Michael Auli | Learn to quantize the speech signal so that algorithms requiring discrete inputs, such as BERT, can be applied to audio data. | |
443 | Meta-learning curiosity algorithms | Ferran Alet*, Martin F. Schneider*, Tomas Lozano-Perez, Leslie Pack Kaelbling | Meta-learning curiosity algorithms by searching through a rich space of programs yields novel mechanisms that generalize across very different reinforcement-learning domains. | code |
444 | Making Efficient Use of Demonstrations to Solve Hard Exploration Problems | Caglar Gulcehre, Tom Le Paine, Bobak Shahriari, Misha Denil, Matt Hoffman, Hubert Soyer, Richard Tanburn, Steven Kapturowski, Neil Rabinowitz, Duncan Williams, Gabriel Barth-Maron, Ziyu Wang, Nando de Freitas, Worlds Team | We introduce R2D3, an agent that makes efficient use of demonstrations to solve hard exploration problems in partially observable environments with highly variable initial conditions. | |
445 | VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning | Luisa Zintgraf, Kyriacos Shiarlis, Maximilian Igl, Sebastian Schulze, Yarin Gal, Katja Hofmann, Shimon Whiteson | VariBAD opens a path to tractable approximate Bayes-optimal exploration for deep RL using ideas from meta-learning, Bayesian RL, and approximate variational inference. | |
446 | Lookahead: A Far-sighted Alternative of Magnitude-based Pruning | Sejun Park*, Jaeho Lee*, Sangwoo Mo, Jinwoo Shin | We study a multi-layer generalization of magnitude-based pruning. | code |
447 | Spike-based causal inference for weight alignment | Jordan Guerguiev, Konrad Kording, Blake Richards | We present a learning rule for feedback weights in a spiking neural network that addresses the weight transport problem. | code |
448 | Empirical Bayes Transductive Meta-Learning with Synthetic Gradients | Xu Hu, Pablo Moreno, Yang Xiao, Xi Shen, Guillaume Obozinski, Neil Lawrence | We propose a meta-learning approach that learns from multiple tasks in a transductive setting, by leveraging unlabeled information in the query set to learn a more powerful meta-model. | code |
449 | Keep Doing What Worked: Behavior Modelling Priors for Offline Reinforcement Learning | Noah Siegel, Jost Tobias Springenberg, Felix Berkenkamp, Abbas Abdolmaleki, Michael Neunert, Thomas Lampe, Roland Hafner, Nicolas Heess, Martin Riedmiller | We develop a method for stable offline reinforcement learning from logged data. The key is to regularize the RL policy towards a learned “advantage weighted” model of the data. | |
450 | Understanding the Limitations of Conditional Generative Models | Ethan Fetaya, Joern-Henrik Jacobsen, Will Grathwohl, Richard Zemel | In this work, we investigate robust classification with likelihood-based generative models from a theoretical and practical perspective to determine whether they can deliver on their promises. | |
451 | Demystifying Inter-Class Disentanglement | Aviv Gabbay, Yedid Hoshen | Latent Optimization for Representation Disentanglement | code |
452 | Mixed-curvature Variational Autoencoders | Ondrej Skopek, Gary Bécigneul, Octavian-Eugen Ganea | Variational Autoencoders with latent spaces modeled as products of constant curvature Riemannian manifolds improve on image reconstruction over single-manifold variants. | code |
453 | BinaryDuo: Reducing Gradient Mismatch in Binary Activation Network by Coupling Binary Activations | Hyungjun Kim, Kyungsu Kim, Jinseok Kim, Jae-Joon Kim | In this work, we introduce coordinate discrete gradient (CDG) to better estimate the gradient mismatch. | code |
454 | Model-based reinforcement learning for biological sequence design | Christof Angermueller, David Dohan, David Belanger, Ramya Deshpande, Kevin Murphy, Lucy Colwell | We augment model-free policy learning with a sequence-level surrogate reward function and a count-based visitation bonus, and demonstrate effectiveness in the large-batch, low-round regime seen in designing DNA and protein sequences. | |
455 | BayesOpt Adversarial Attack | Binxin Ru, Adam Cobb, Arno Blaas, Yarin Gal | We propose a query-efficient black-box attack which uses Bayesian optimisation in combination with Bayesian model selection to optimise over the adversarial perturbation and the optimal degree of search space dimension reduction. | code |
456 | Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies | Sungryull Sohn, Hyunjae Woo, Jongwook Choi, Honglak Lee | A novel meta-RL method that infers latent subtask structure | |
457 | Hypermodels for Exploration | Vikranth Dwaracherla, Xiuyuan Lu, Morteza Ibrahimi, Ian Osband, Zheng Wen, Benjamin Van Roy | Hypermodels can encode posterior distributions similar to large ensembles at much smaller computational cost. This can facilitate significant improvements in exploration. | |
458 | RaPP: Novelty Detection with Reconstruction along Projection Pathway | Ki Hyun Kim, Sangwoo Shim, Yongsub Lim, Jongseob Jeon, Jeongwoo Choi, Byungchan Kim, Andre S. Yoon | A new methodology for novelty detection by utilizing hidden space activation values obtained from a deep autoencoder. | code |
459 | Dynamics-Aware Embeddings | William Whitney, Rajat Agarwal, Kyunghyun Cho, Abhinav Gupta | State and action embeddings which incorporate the dynamics improve exploration and RL from pixels. | code |
460 | Functional Regularisation for Continual Learning with Gaussian Processes | Michalis K. Titsias, Jonathan Schwarz, Alexander G. de G. Matthews, Razvan Pascanu, Yee Whye Teh | Using inducing point sparse Gaussian process methods to overcome catastrophic forgetting in neural networks. | |
461 | You CAN Teach an Old Dog New Tricks! On Training Knowledge Graph Embeddings | Daniel Ruffinelli, Samuel Broscheit, Rainer Gemulla | We study the impact of training strategies on the performance of knowledge graph embeddings. | |
462 | AdvectiveNet: An Eulerian-Lagrangian Fluidic Reservoir for Point Cloud Processing | Xingzhe He, Helen Lu Cao, Bo Zhu | We present a new grid-particle learning method to process point clouds motivated by computational fluid dynamics. | code |
463 | Never Give Up: Learning Directed Exploration Strategies | Adrià Puigdomènech Badia, Pablo Sprechmann, Alex Vitvitskyi, Daniel Guo, Bilal Piot, Steven Kapturowski, Olivier Tieleman, Martin Arjovsky, Alexander Pritzel, Andrew Bolt, Charles Blundell | We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies. | |
464 | Fair Resource Allocation in Federated Learning | Tian Li, Maziar Sanjabi, Ahmad Beirami, Virginia Smith | We propose a novel optimization objective that encourages fairness in heterogeneous federated networks, and develop a scalable method to solve it. | |
465 | Smooth markets: A basic mechanism for organizing gradient-based learners | David Balduzzi, Wojciech M. Czarnecki, Edward Hughes, Joel Leibo, Ian Gemp, Tom Anthony, Georgios Piliouras, Thore Graepel | We introduce a class of n-player games suited to gradient-based methods. | |
466 | StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding | Wei Wang, Bin Bi, Ming Yan, Chen Wu, Jiangnan Xia, Zuyi Bao, Liwei Peng, Luo Si | Inspired by the linearization exploration work of Elman, we extend BERT to a new model, StructBERT, by incorporating language structures into pre-training. | |
467 | Training binary neural networks with real-to-binary convolutions | Brais Martinez, Jing Yang, Adrian Bulat, Georgios Tzimiropoulos | This paper shows how to train binary networks to within a few percentage points (~3-5%) of their full-precision counterpart with a negligible increase in computational cost. | |
468 | Permutation Equivariant Models for Compositional Generalization in Language | Jonathan Gordon, David Lopez-Paz, Marco Baroni, Diane Bouchacourt | We propose a link between permutation equivariance and compositional generalization, and provide equivariant language models | |
469 | Continual learning with hypernetworks | Johannes von Oswald, Christian Henning, João Sacramento, Benjamin F. Grewe | To overcome this problem, we present a novel approach based on task-conditioned hypernetworks, i.e., networks that generate the weights of a target model based on task identity. | |
470 | Phase Transitions for the Information Bottleneck in Representation Learning | Tailin Wu, Ian Fischer | We give a theoretical analysis of the Information Bottleneck objective to understand and predict observed phase transitions. | |
471 | Variational Template Machine for Data-to-Text Generation | Rong Ye, Wenxian Shi, Hao Zhou, Zhongyu Wei, Lei Li | We propose the variational template machine (VTM), a novel method to generate text descriptions from data tables. | |
472 | Memory-Based Graph Networks | Amir hosein Khasahmadi, Kaveh Hassani, Parsa Moradi, Leo Lee, Quaid Morris | We introduce efficient memory layers for graph neural networks | |
473 | AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty | Dan Hendrycks*, Norman Mu*, Ekin Dogus Cubuk, Barret Zoph, Justin Gilmer, Balaji Lakshminarayanan | We obtain state-of-the-art robustness to data shifts, and we maintain calibration under data shift even when accuracy drops. | |
474 | AtomNAS: Fine-Grained End-to-End Neural Architecture Search | Jieru Mei, Yingwei Li, Xiaochen Lian, Xiaojie Jin, Linjie Yang, Alan Yuille, Jianchao Yang | A new state of the art on ImageNet in the mobile setting | code |
475 | Residual Energy-Based Models for Text Generation | Yuntian Deng, Anton Bakhtin, Myle Ott, Arthur Szlam | We show that energy-based models, when trained on the residual of an auto-regressive language model, can be used effectively and efficiently to generate text. | |
476 | A closer look at the approximation capabilities of neural networks | Kai Fong Ernest Chong | A quantitative refinement of the universal approximation theorem via an algebraic approach. | |
477 | Deep Audio Priors Emerge From Harmonic Convolutional Networks | Zhoutong Zhang, Yunyun Wang, Chuang Gan, Jiajun Wu, Joshua B. Tenenbaum, Antonio Torralba, William T. Freeman | A new operation called Harmonic Convolution makes deep network model audio priors without training. | code |
478 | Expected Information Maximization: Using the I-Projection for Mixture Density Estimation | Philipp Becker, Oleg Arenz, Gerhard Neumann | A novel, non-adversarial approach to learning latent variable models in general, and mixture models in particular, by computing the I-Projection solely based on samples. | code |
479 | A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms | Yoshua Bengio, Tristan Deleu, Nasim Rahaman, Nan Rosemary Ke, Sebastien Lachapelle, Olexa Bilaniuk, Anirudh Goyal, Christopher Pal | This paper proposes a meta-learning objective based on speed of adaptation to transfer distributions to discover a modular decomposition and causal variables. | code |
480 | On the interaction between supervision and self-play in emergent communication | Ryan Lowe*, Abhinav Gupta*, Jakob Foerster, Douwe Kiela, Joelle Pineau | In this paper, we investigate the relationship between two categories of learning signals, with the ultimate goal of improving sample efficiency: imitating human language data via supervised learning, and maximizing reward in a simulated multi-agent environment via self-play (as done in emergent communication). We introduce the term \textit{supervised self-play (S2P)} for algorithms using both of these signals. | |
481 | Dynamic Model Pruning with Feedback | Tao Lin, Sebastian U. Stich, Luis Barba, Daniil Dmitriev, Martin Jaggi | We propose a novel model compression method that generates a sparse trained model without additional overhead: by allowing (i) dynamic allocation of the sparsity pattern and (ii) incorporating feedback signal to reactivate prematurely pruned weights we obtain a performant sparse model in one single training pass (retraining is not needed, but can further improve the performance). | |
482 | Latent Normalizing Flows for Many-to-Many Cross Domain Mappings | Shweta Mahajan, Iryna Gurevych, Stefan Roth | Therefore, we propose a novel semi-supervised framework, which models shared information between domains and domain-specific information separately. | code |
483 | Transferring Optimality Across Data Distributions via Homotopy Methods | Matilde Gargiani, Andrea Zanelli, Quoc Tran Dinh, Moritz Diehl, Frank Hutter | We propose a new homotopy-based method to transfer “optimality knowledge” across different data distributions in order to speed up training of deep models. | |
484 | Regularizing activations in neural networks via distribution matching with the Wasserstein metric | Taejong Joo, Donggu Kang, Byunghoon Kim | We propose the projected error function regularization loss (PER) that encourages activations to follow the standard normal distribution. | |
485 | Mutual Information Gradient Estimation for Representation Learning | Liangjian Wen, Yiji Zhou, Lirong He, Mingyuan Zhou, Zenglin Xu | Therefore, we propose the Mutual Information Gradient Estimator (MIGE) for representation learning based on score estimation of implicit distributions. | |
486 | Efficient Transformer for Mobile Applications | Zhanghao Wu*, Zhijian Liu*, Ji Lin, Yujun Lin, Song Han | In this paper, we investigate the mobile setting (under 500M Mult-Adds) for NLP tasks to facilitate deployment on edge devices. | |
487 | A Function Space View of Bounded Norm Infinite Width ReLU Nets: The Multivariate Case | Greg Ongie, Rebecca Willett, Daniel Soudry, Nathan Srebro | We characterize the space of functions realizable as a ReLU network with an unbounded number of units (infinite width), but where the Euclidean norm of the weights is bounded. | |
488 | Adversarial Lipschitz Regularization | Dávid Terjék | An alternative to the gradient penalty. | code |
489 | Compositional Continual Language Learning | Yuanpeng Li, Liang Zhao, Kenneth Church, Mohamed Elhoseiny | Inspired by that, in this paper, we propose a method for compositional continual learning of sequence-to-sequence models. | code |
490 | End to End Trainable Active Contours via Differentiable Rendering | Shir Gur, Tal Shaharabany, Lior Wolf | We present an image segmentation method that iteratively evolves a polygon. | |
491 | Provable Filter Pruning for Efficient Neural Networks | Lucas Liebenwein, Cenk Baykal, Harry Lang, Dan Feldman, Daniela Rus | A sampling-based filter pruning approach for convolutional neural networks exhibiting provable guarantees on the size and performance of the pruned network. | |
492 | How the Choice of Activation Affects Training of Overparametrized Neural Nets | Abhishek Panigrahi, Abhishek Shetty, Navin Goyal | We provide theoretical results about the effect of the activation function on the training of highly overparametrized 2-layer neural networks | code |
493 | Lipschitz constant estimation for Neural Networks via sparse polynomial optimization | Fabian Latorre, Paul Rolland, Volkan Cevher | We introduce LiPopt, a polynomial optimization framework for computing increasingly tighter upper bounds on the Lipschitz constant of neural networks. | code |
494 | State Alignment-based Imitation Learning | Fangchen Liu, Zhan Ling, Tongzhou Mu, Hao Su | We propose a novel state alignment-based imitation learning method to train the imitator by following the state sequences in the expert demonstrations as much as possible. | |
495 | Learning to Group: A Bottom-Up Framework for 3D Part Discovery in Unseen Categories | Tiange Luo, Kaichun Mo, Zhiao Huang, Siyu Hu, Jiarui Xu, Liwei Wang, Hao Su | We propose a learning-based iterative grouping framework which learns a grouping policy to progressively merge small part proposals into bigger ones in a bottom-up fashion, achieving state-of-the-art performance in the open-context setting. | |
496 | Discriminative Particle Filter Reinforcement Learning for Complex Partial Observations | Xiao Ma, Peter Karkus, Nan Ye, David Hsu, Wee Sun Lee | We introduce DPFRL, a framework for reinforcement learning under partial and complex observations with a fully differentiable discriminative particle filter. | |
497 | Unrestricted Adversarial Examples via Semantic Manipulation | Anand Bhattad, Min Jin Chong, Kaizhao Liang, Bo Li, David Forsyth | We introduce unrestricted perturbations that manipulate semantically meaningful image-based visual descriptors — color and texture — in order to generate effective and photorealistic adversarial examples. | code |
498 | Classification-Based Anomaly Detection for General Data | Liron Bergman, Yedid Hoshen | An anomaly detection method that uses random-transformation classification to generalize to non-image data. | |
499 | Scale-Equivariant Steerable Networks | Ivan Sosnovik, Michal Szmaja, Arnold Smeulders | In this work, we pay attention to scale changes, which regularly appear in various tasks due to the changing distances between the objects and the camera. First, we introduce the general theory for building scale-equivariant convolutional networks with steerable filters. | |
500 | On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning | Jian Li, Xuanyuan Luo, Mingda Qiao | We give some generalization error bounds of noisy gradient methods such as SGLD, Langevin dynamics, noisy momentum and so forth. | |
501 | Consistency Regularization for Generative Adversarial Networks | Han Zhang, Zizhao Zhang, Augustus Odena, Honglak Lee | In this work, we propose a simple and effective training stabilizer based on the notion of Consistency Regularization – a popular technique in the Semi-Supervised Learning literature (a minimal sketch appears after the table). | |
502 | Differentiable learning of numerical rules in knowledge graphs | Po-Wei Wang, Daria Stepanova, Csaba Domokos, J. Zico Kolter | We present an efficient approach to integrating numerical comparisons into differentiable rule learning in knowledge graphs | |
503 | Learning to Move with Affordance Maps | William Qi, Ravi Teja Mullapudi, Saurabh Gupta, Deva Ramanan | We address the task of autonomous exploration and navigation using spatial affordance maps that can be learned in a self-supervised manner; these outperform classic geometric baselines while being more sample-efficient than contemporary RL algorithms. | |
504 | Neural tangent kernels, transportation mappings, and universal approximation | Ziwei Ji, Matus Telgarsky, Ruicheng Xian | The NTK linearization is a universal approximator, even when looking arbitrarily close to initialization | |
505 | Scalable Object-Oriented Sequential Generative Models | Jindong Jiang, Sepehr Janghorbani, Gerard De Melo, Sungjin Ahn | In this paper, we propose SCALOR, a generative model for Scalable Sequential Object-Oriented Representation. | |
506 | Prediction Poisoning: Towards Defenses Against DNN Model Stealing Attacks | Tribhuvanesh Orekondy, Bernt Schiele, Mario Fritz | We propose the first approach that can resist DNN model stealing/extraction attacks | |
507 | Domain Adaptive Multiflow Networks | Róger Bermúdez-Chacón, Mathieu Salzmann, Pascal Fua | A Multiflow Network is a dynamic architecture for domain adaptation that learns potentially different computational graphs per domain, so as to map them to a common representation where inference can be performed in a domain-agnostic fashion. | |
508 | Differentiable Programming for Physical Simulation | Yuanming Hu, Luke Anderson, Tzu-Mao Li, Qi Sun, Nathan Carr, Jonathan Ragan-Kelley, Fredo Durand | We study the problem of learning and optimizing through physical simulations via differentiable programming, using our proposed DiffSim programming language and compiler. | code |
509 | Pitfalls of In-Domain Uncertainty Estimation and Ensembling in Deep Learning | Arsenii Ashukha, Alexander Lyzhov, Dmitry Molchanov, Dmitry Vetrov | We highlight the problems with common metrics of in-domain uncertainty and perform a broad study of modern ensembling techniques. | |
510 | Episodic Reinforcement Learning with Associative Memory | Guangxiang Zhu*, Zichuan Lin*, Guangwen Yang, Chongjie Zhang | To improve the sample efficiency of reinforcement learning, we propose a novel framework, called Episodic Reinforcement Learning with Associative Memory (ERLAM), which associates related experience trajectories to enable reasoning about effective strategies. | |
511 | Sub-policy Adaptation for Hierarchical Reinforcement Learning | Alexander Li, Carlos Florensa, Ignasi Clavera, Pieter Abbeel | We propose HiPPO, a stable Hierarchical Reinforcement Learning algorithm that can train several levels of the hierarchy simultaneously, giving good performance both in skill discovery and adaptation. | code |
512 | Critical initialisation in continuous approximations of binary neural networks | George Stamatescu, Federica Gerace, Carlo Lucibello, Ian Fuss, Langford White | Signal propagation theory applied to continuous surrogates of binary nets; counter-intuitive initialisation; reparameterisation trick not helpful. | |
513 | Deep Orientation Uncertainty Learning based on a Bingham Loss | Igor Gilitschenski, Wilko Schwarting, Roshni Sahoo, Alexander Amini, Sertac Karaman | A method for learning uncertainties over orientations using the Bingham Distribution | |
514 | Co-Attentive Equivariant Neural Networks: Focusing Equivariance On Transformations Co-Occurring in Data | David W. Romero Guzmán, Mark Hoogendoorn | We utilize attention to restrict equivariant neural networks to the set of co-occurring transformations in data. | code |
515 | Mixed Precision DNNs: All you need is a good parametrization | Stefan Uhlich, Lukas Mauch, Fabien Cardinaux, Kazuki Yoshiyama, Javier Alonso Garcia, Stephen Tiedemann, Thomas Kemp, Akira Nakamura | We show that a suitable parametrization of the quantizer is the key to achieving stable training and good final performance. | |
516 | Information Geometry of Orthogonal Initializations and Training | Piotr Aleksander Sokół, Il Memming Park | Nearly isometric DNN initializations imply low parameter-space curvature and a lower condition number, but that’s not always great. | code |
517 | Extreme Classification via Adversarial Softmax Approximation | Robert Bamler, Stephan Mandt | An efficient, unbiased approximation of the softmax loss function for extreme classification | code |
518 | Learning Nearly Decomposable Value Functions Via Communication Minimization | Tonghan Wang*, Jianhao Wang*, Chongyi Zheng, Chongjie Zhang | To address this limitation, this paper presents a novel framework for learning nearly decomposable value functions with communication, with which agents act on their own most of the time but occasionally send messages to other agents in order for effective coordination. | |
519 | Robust Subspace Recovery Layer for Unsupervised Anomaly Detection | Chieh-Hsin Lai, Dongmian Zou, Gilad Lerman | This work proposes an autoencoder with a novel robust subspace recovery layer for unsupervised anomaly detection and demonstrates state-of-the-art results on various datasets. | |
520 | Learning to Coordinate Manipulation Skills via Skill Behavior Diversification | Youngwoon Lee, Jingyun Yang, Joseph J. Lim | We propose to tackle complex tasks of multiple agents by learning composable primitive skills and coordination of the skills. | |
521 | NAS-Bench-1Shot1: Benchmarking and Dissecting One-Shot Neural Architecture Search | Arber Zela, Julien Siems, Frank Hutter | In order to allow a scientific study of these components, we introduce a general framework for one-shot NAS that can be instantiated to many recently-introduced variants and introduce a general benchmarking framework that draws on the recent large-scale tabular benchmark NAS-Bench-101 for cheap anytime evaluations of one-shot NAS methods. | code |
522 | Conservative Uncertainty Estimation By Fitting Prior Networks | Kamil Ciosek, Vincent Fortuin, Ryota Tomioka, Katja Hofmann, Richard Turner | We provide theoretical support for deep-learning uncertainty estimates obtained by fitting random priors. | |
523 | Understanding Generalization in Recurrent Neural Networks | Zhuozhuo Tu, Fengxiang He, Dacheng Tao | In this work, we develop the theory for analyzing the generalization performance of recurrent neural networks. | |
524 | The Shape of Data: Intrinsic Distance for Data Distributions | Anton Tsitsulin, Marina Munkhoeva, Davide Mottin, Panagiotis Karras, Alex Bronstein, Ivan Oseledets, Emmanuel Mueller | We propose a metric for comparing data distributions based on their geometry while not relying on any positional information. | code |
525 | How to 0wn the NAS in Your Spare Time | Sanghyun Hong, Michael Davinroy, Yigitcan Kaya, Dana Dachman-Soled, Tudor Dumitras | We design an algorithm that reconstructs the key components of a novel deep learning system by exploiting a small amount of information leakage from a cache side-channel attack, Flush+Reload. | |
526 | Enabling Deep Spiking Neural Networks with Hybrid Conversion and Spike Timing Dependent Backpropagation | Nitin Rathi, Gopalakrishnan Srinivasan, Priyadarshini Panda, Kaushik Roy | To address these challenges, we present a computationally-efficient training technique for deep SNNs. | code |
527 | Breaking Certified Defenses: Semantic Adversarial Examples with Spoofed Robustness Certificates | Amin Ghiasi, Ali Shafahi, Tom Goldstein | We present an attack that maintains the imperceptibility property of adversarial examples while being outside of the certified radius. | |
528 | Query-efficient Meta Attack to Deep Neural Networks | Jiawei Du, Hu Zhang, Joey Tianyi Zhou, Yi Yang, Jiashi Feng | In this work, we propose a meta attack approach that is capable of attacking a targeted model with much fewer queries. | |
529 | Massively Multilingual Sparse Word Representations | Gábor Berend | We propose an efficient algorithm for determining multilingually comparable sparse word representations that we release for 27 typologically diverse languages. | code |
530 | Monotonic Multihead Attention | Xutai Ma, Juan Miguel Pino, James Cross, Liezl Puzon, Jiatao Gu | Make the transformer streamable with monotonic attention. | |
531 | Gradients as Features for Deep Representation Learning | Fangzhou Mu, Yingyu Liang, Yin Li | Given a pre-trained model, we explore the per-sample gradients of a task-specific loss with respect to the model parameters, and construct a linear model that combines these gradients with the activations of the model. | |
532 | Pay Attention to Features, Transfer Learn Faster CNNs | Kafeng Wang, Xitong Gao, Yiren Zhao, Xingjian Li, Dejing Dou, Cheng-Zhong Xu | We introduce attentive feature distillation and selection to fine-tune a large model and produce a faster one. |
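To give a flavor of what some of these methods look like in practice, we close with three minimal code sketches. They are written from the one-sentence highlights above, not from the authors' code, so all class names, hyperparameters, and design details are our own illustrative assumptions.

The first sketch illustrates the structured dropout of paper 420 (LayerDrop): each layer is randomly skipped during training, so the trained stack can later be pruned to a shallower depth without retraining.

```python
import torch
import torch.nn as nn

class LayerDropStack(nn.Module):
    """A stack of layers in which each layer is randomly skipped during training."""

    def __init__(self, layers, drop_prob=0.2):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.drop_prob = drop_prob

    def forward(self, x):
        for layer in self.layers:
            # Skip each layer independently with probability drop_prob while
            # training, so the network learns to tolerate missing layers.
            if self.training and torch.rand(1).item() < self.drop_prob:
                continue
            x = layer(x)
        return x

    def pruned(self, keep_every=2):
        # At test time, keep every k-th layer to obtain a shallower model;
        # training already simulated such sub-networks, so no retraining is needed.
        return LayerDropStack(list(self.layers)[::keep_every], drop_prob=0.0)
```

Here `LayerDropStack`, `drop_prob`, and the every-k-th-layer pruning rule are all placeholders; the paper should be consulted for the actual layer-selection strategy.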
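The second sketch illustrates the gradient $\ell_1$ regularization of paper 434. The highlight only says that the $\ell_1$-norm of gradients is regularized; we assume here that the gradient is taken with respect to the weights (a natural reading, since post-training quantization perturbs the weights) and that the penalty weight `lam` is a tunable hyperparameter.

```python
import torch

def quantization_robust_loss(model, criterion, x, y, lam=0.05):
    """Task loss plus the l1 norm of its gradient w.r.t. the model weights."""
    loss = criterion(model(x), y)
    params = [p for p in model.parameters() if p.requires_grad]
    # create_graph=True so the penalty term itself can be backpropagated.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    # A small gradient norm means the loss surface is locally flat in weight
    # space, so weight perturbations such as quantization change the loss little.
    return loss + lam * sum(g.abs().sum() for g in grads)
```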
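The last sketch illustrates the consistency regularization of paper 501: the discriminator of a GAN is penalized for changing its output under semantics-preserving augmentations of real images. The augmentation function and penalty weight below are, again, illustrative assumptions rather than the authors' settings.

```python
import torch

def consistency_penalty(discriminator, real_images, augment, weight=10.0):
    """Penalize discriminator sensitivity to augmentations of real images."""
    d_real = discriminator(real_images)
    d_aug = discriminator(augment(real_images))
    # Squared difference between outputs on an image and its augmented copy;
    # added to the usual discriminator loss as a training stabilizer.
    return weight * ((d_real - d_aug) ** 2).mean()
```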