Most Influential ICLR Papers
The International Conference on Learning Representations (ICLR) is one of the top machine learning conferences in the world. The Paper Digest Team analyzes all papers published at ICLR in past years and presents the 15 most influential papers for each year. This ranking is constructed automatically from citations in both research papers and granted patents, and is updated frequently to reflect the most recent changes. To find the most influential papers from other conferences and journals, visit the Best Paper Digest page. Note: the most influential papers may or may not include those that won best paper awards. (Version: 2021-03)
If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. To search for papers with highlights, related papers, patents, grants, experts and organizations, please visit our search console. You are also welcome to follow us on Twitter and LinkedIn to get updates on new conference digests.
Paper Digest Team
team@paperdigest.org
TABLE 1: Most Influential ICLR Papers
Year | Rank | Paper | Author(s) |
---|---|---|---|
2020 | 1 | ALBERT: A Lite BERT For Self-supervised Learning Of Language Representations (IF:7) Highlight: A new pretraining method that establishes new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters compared to BERT-large. | ZHENZHONG LAN et. al. |
2020 | 2 | On The Variance Of The Adaptive Learning Rate And Beyond (IF:6) Highlight: If warmup is the answer, what is the question? | LIYUAN LIU et. al. |
2020 | 3 | The Curious Case Of Neural Text Degeneration (IF:5) Highlight: Current language generation systems either aim for high likelihood and devolve into generic repetition or miscalibrate their stochasticity; we provide evidence of both and propose a solution: Nucleus Sampling (an illustrative sketch appears after the table). | Ari Holtzman; Jan Buys; Leo Du; Maxwell Forbes; Yejin Choi; |
2020 | 4 | BERTScore: Evaluating Text Generation With BERT (IF:5) Highlight: We propose BERTScore, an automatic evaluation metric for text generation, which correlates better with human judgments and provides stronger model selection performance than existing metrics. | Tianyi Zhang*; Varsha Kishore*; Felix Wu*; Kilian Q. Weinberger; Yoav Artzi; |
2020 | 5 | ELECTRA: Pre-training Text Encoders As Discriminators Rather Than Generators (IF:5) Highlight: A text encoder trained to distinguish real input tokens from plausible fakes efficiently learns effective language representations. | Kevin Clark; Minh-Thang Luong; Quoc V. Le; Christopher D. Manning; |
2020 | 6 | Reformer: The Efficient Transformer (IF:4) Highlight: Efficient Transformer with locality-sensitive hashing and reversible layers. | Nikita Kitaev; Lukasz Kaiser; Anselm Levskaya; |
2020 | 7 | VL-BERT: Pre-training Of Generic Visual-Linguistic Representations (IF:4) Highlight: VL-BERT is a simple yet powerful pre-trainable generic representation for visual-linguistic tasks. It is pre-trained on a massive-scale caption dataset and a text-only corpus, and can be fine-tuned for various downstream visual-linguistic tasks. | WEIJIE SU et. al. |
2020 | 8 | On The Convergence Of FedAvg On Non-IID Data (IF:4) Highlight: In this paper, we analyze the convergence of FedAvg on non-iid data and establish a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the number of SGDs. | Xiang Li; Kaixuan Huang; Wenhao Yang; Shusen Wang; Zhihua Zhang; |
2020 | 9 | Evaluating The Search Phase Of Neural Architecture Search (IF:4) Highlight: We empirically disprove a fundamental hypothesis of the widely-adopted weight sharing strategy in neural architecture search and explain why state-of-the-art NAS algorithms perform similarly to random search. | Kaicheng Yu; Christian Sciuto; Martin Jaggi; Claudiu Musat; Mathieu Salzmann; |
2020 | 10 | Meta-Dataset: A Dataset Of Datasets For Learning To Learn From Few Examples (IF:4) Highlight: We propose a new large-scale diverse environment for few-shot learning, and evaluate popular models’ performance on it, revealing important research challenges. | ELENI TRIANTAFILLOU et. al. |
2020 | 11 | Once For All: Train One Network And Specialize It For Efficient Deployment (IF:4) Highlight: We introduce techniques to train a single once-for-all network that fits many hardware platforms. | Han Cai; Chuang Gan; Tianzhe Wang; Zhekai Zhang; Song Han; |
2020 | 12 | Don’t Use Large Mini-batches, Use Local SGD (IF:4) Highlight: As a remedy, we propose a post-local SGD and show that it significantly improves the generalization performance compared to large-batch training on standard benchmarks while enjoying the same efficiency (time-to-accuracy) and scalability. | Tao Lin; Sebastian U. Stich; Kumar Kshitij Patel; Martin Jaggi; |
2020 | 13 | Large Batch Optimization For Deep Learning: Training BERT In 76 Minutes (IF:4) Highlight: A fast optimizer for general applications and large-batch training. | YANG YOU et. al. |
2020 | 14 | Deep Double Descent: Where Bigger Models And More Data Hurt (IF:4) Highlight: We demonstrate, and characterize, realistic settings where bigger models are worse, and more data hurts. | PREETUM NAKKIRAN et. al. |
2020 | 15 | Emergent Tool Use From Multi-Agent Autocurricula (IF:4) Highlight: Through multi-agent competition, the simple objective of hide-and-seek, and standard reinforcement learning algorithms at scale, we find that agents create a self-supervised autocurriculum inducing multiple distinct rounds of emergent strategy, many of which require sophisticated tool use and coordination. | BOWEN BAKER et. al. |
2019 | 1 | Large Scale GAN Training For High Fidelity Natural Image Synthesis (IF:8) Highlight: GANs benefit from scaling up. | Andrew Brock; Jeff Donahue; Karen Simonyan; |
2019 | 2 | GLUE: A Multi-Task Benchmark And Analysis Platform For Natural Language Understanding (IF:8) Highlight: We present a multi-task benchmark and analysis platform for evaluating generalization in natural language understanding systems. | ALEX WANG et. al. |
2019 | 3 | DARTS: Differentiable Architecture Search (IF:8) Highlight: We propose a differentiable architecture search algorithm for both convolutional and recurrent networks, achieving competitive performance with the state of the art using orders of magnitude less computation resources. | Hanxiao Liu; Karen Simonyan; Yiming Yang; |
2019 | 4 | How Powerful Are Graph Neural Networks? (IF:7) Highlight: We develop theoretical foundations for the expressive power of GNNs and design a provably most powerful GNN. | Keyulu Xu*; Weihua Hu*; Jure Leskovec; Stefanie Jegelka; |
2019 | 5 | Decoupled Weight Decay Regularization (IF:7) Highlight: Novel variants of optimization methods that combine the benefits of both adaptive and non-adaptive methods. | Ilya Loshchilov; Frank Hutter; |
2019 | 6 | ProxylessNAS: Direct Neural Architecture Search On Target Task And Hardware (IF:7) Highlight: Proxy-less neural architecture search for directly learning architectures on a large-scale target task (ImageNet) while reducing the cost to the same level as normal training. | Han Cai; Ligeng Zhu; Song Han; |
2019 | 7 | The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks (IF:7) Highlight: Feedforward neural networks that can have weights pruned after training could have had the same weights pruned before training. | Jonathan Frankle; Michael Carbin; |
2019 | 8 | ImageNet-trained CNNs Are Biased Towards Texture; Increasing Shape Bias Improves Accuracy And Robustness (IF:7) Highlight: ImageNet-trained CNNs are biased towards object texture (instead of shape like humans). Overcoming this major difference between human and machine vision yields improved detection performance and previously unseen robustness to image distortions. | ROBERT GEIRHOS et. al. |
2019 | 9 | Robustness May Be At Odds With Accuracy (IF:7) Highlight: We show that adversarial robustness might come at the cost of standard classification performance, but also yields unexpected benefits. | Dimitris Tsipras; Shibani Santurkar; Logan Engstrom; Alexander Turner; Aleksander Madry; |
2019 | 10 | Learning Deep Representations By Mutual Information Estimation And Maximization (IF:7) Highlight: We learn deep representations by maximizing mutual information, leveraging structure in the objective, and are able to compete with fully supervised classifiers with comparable architectures. | R DEVON HJELM et. al. |
2019 | 11 | Gradient Descent Provably Optimizes Over-parameterized Neural Networks (IF:6) Highlight: We prove gradient descent achieves zero training loss at a linear rate on over-parameterized neural networks. | Simon S. Du; Xiyu Zhai; Barnabas Poczos; Aarti Singh; |
2019 | 12 | Benchmarking Neural Network Robustness To Common Corruptions And Perturbations (IF:6) Highlight: We propose ImageNet-C to measure classifier corruption robustness and ImageNet-P to measure perturbation robustness. | Dan Hendrycks; Thomas Dietterich; |
2019 | 13 | Rethinking The Value Of Network Pruning (IF:6) Highlight: In structured network pruning, fine-tuning a pruned model only gives comparable performance with training it from scratch. | Zhuang Liu; Mingjie Sun; Tinghui Zhou; Gao Huang; Trevor Darrell; |
2019 | 14 | Meta-Learning With Latent Embedding Optimization (IF:6) Highlight: Latent Embedding Optimization (LEO) is a novel gradient-based meta-learner with state-of-the-art performance on the challenging 5-way 1-shot and 5-shot miniImageNet and tieredImageNet classification tasks. | ANDREI A. RUSU et. al. |
2019 | 15 | A Closer Look At Few-shot Classification (IF:6) Highlight: A detailed empirical study of few-shot classification that reveals challenges in the standard evaluation setting and points to a new direction. | Wei-Yu Chen; Yen-Cheng Liu; Zsolt Kira; Yu-Chiang Frank Wang; Jia-Bin Huang; |
2018 | 1 | Towards Deep Learning Models Resistant To Adversarial Attacks (IF:8) Highlight: We provide a principled, optimization-based re-look at the notion of adversarial examples, and develop methods that produce models that are adversarially robust against a wide range of adversaries. | Aleksander Madry; Aleksandar Makelov; Ludwig Schmidt; Dimitris Tsipras; Adrian Vladu; |
2018 | 2 | Progressive Growing Of GANs For Improved Quality, Stability, And Variation (IF:9) Highlight: We train generative adversarial networks in a progressive fashion, enabling us to generate high-resolution images with high quality. | Tero Karras; Timo Aila; Samuli Laine; Jaakko Lehtinen; |
2018 | 3 | Graph Attention Networks (IF:9) Highlight: A novel approach to processing graph-structured data by neural networks, leveraging attention over a node’s neighborhood. Achieves state-of-the-art results on transductive citation network tasks and an inductive protein-protein interaction task. | PETAR VELICKOVIC et. al. |
2018 | 4 | Spectral Normalization For Generative Adversarial Networks (IF:8) Highlight: We propose a novel weight normalization technique called spectral normalization to stabilize the training of the discriminator of GANs. | Takeru Miyato; Toshiki Kataoka; Masanori Koyama; Yuichi Yoshida; |
2018 | 5 | Mixup: Beyond Empirical Risk Minimization (IF:8) Highlight: Training on convex combinations between random training examples and their labels improves generalization in deep neural networks (an illustrative sketch appears after the table). | Hongyi Zhang; Moustapha Cisse; Yann N. Dauphin; David Lopez-Paz; |
2018 | 6 | Ensemble Adversarial Training: Attacks And Defenses (IF:8) Highlight: Adversarial training with single-step methods overfits, and remains vulnerable to simple black-box and white-box attacks. We show that including adversarial examples from multiple sources helps defend against black-box attacks. | FLORIAN TRAMÈR et. al. |
2018 | 7 | On The Convergence Of Adam And Beyond (IF:8) Highlight: We investigate the convergence of popular optimization algorithms like Adam and RMSProp, and propose new variants of these methods which provably converge to an optimal solution in convex settings. | Sashank J. Reddi; Satyen Kale; Sanjiv Kumar; |
2018 | 8 | Word Translation Without Parallel Data (IF:8) Highlight: Aligning languages without the Rosetta Stone: with no parallel data, we construct bilingual dictionaries using adversarial training, cross-domain local scaling, and an accurate proxy criterion for cross-validation. | Guillaume Lample; Alexis Conneau; Marc’Aurelio Ranzato; Ludovic Denoyer; Hervé Jégou; |
2018 | 9 | A Deep Reinforced Model For Abstractive Summarization (IF:7) Highlight: A summarization model combining a new intra-attention and reinforcement learning method to increase summary ROUGE scores and quality for long sequences. | Romain Paulus; Caiming Xiong; Richard Socher; |
2018 | 10 | Unsupervised Representation Learning By Predicting Image Rotations (IF:7) Highlight: In our work we propose to learn image features by training ConvNets to recognize the 2d rotation that is applied to the image that it gets as input. | Spyros Gidaris; Praveer Singh; Nikos Komodakis; |
2018 | 11 | Regularizing And Optimizing LSTM Language Models (IF:7) Highlight: Effective regularization and optimization strategies for LSTM-based language models achieve SOTA on PTB and WT2. | Stephen Merity; Nitish Shirish Keskar; Richard Socher; |
2018 | 12 | Unsupervised Machine Translation Using Monolingual Corpora Only (IF:7) Highlight: We propose a new unsupervised machine translation model that can learn without using parallel corpora; experimental results show impressive performance on multiple corpora and pairs of languages. | Guillaume Lample; Alexis Conneau; Ludovic Denoyer; Marc’Aurelio Ranzato; |
2018 | 13 | Countering Adversarial Images Using Input Transformations (IF:7) Highlight: We apply a model-agnostic defense strategy against adversarial examples and achieve 60% white-box accuracy and 90% black-box accuracy against major attack algorithms. | Chuan Guo; Mayank Rana; Moustapha Cisse; Laurens van der Maaten; |
2018 | 14 | A Simple Neural Attentive Meta-Learner (IF:7) Highlight: A simple RNN-based meta-learner that achieves SOTA performance on popular benchmarks. | Nikhil Mishra; Mostafa Rohaninejad; Xi Chen; Pieter Abbeel; |
2018 | 15 | Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models (IF:7) Highlight: Defense-GAN uses a Generative Adversarial Network to defend against white-box and black-box attacks in classification models. | Pouya Samangouei; Maya Kabkab; Rama Chellappa; |
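For readers unfamiliar with some of the techniques named above, two brief illustrations follow. First, the 2020 #3 entry proposes Nucleus (top-p) Sampling for text generation. The snippet below is a minimal NumPy sketch of the idea, not the authors' reference implementation; the toy vocabulary, logits, and the threshold p=0.9 are made-up values for illustration.

```python
import numpy as np

def nucleus_sample(logits, p=0.9, rng=None):
    """Sample one token id with nucleus (top-p) sampling: keep the smallest set of
    highest-probability tokens whose cumulative mass reaches p, renormalize, and sample."""
    if rng is None:
        rng = np.random.default_rng()
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]              # tokens sorted by probability, highest first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1  # smallest prefix with mass >= p (at least 1 token)
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return int(rng.choice(nucleus, p=nucleus_probs))

# Toy vocabulary of 5 tokens with made-up logits.
logits = np.array([2.0, 1.0, 0.5, -1.0, -3.0])
print(nucleus_sample(logits, p=0.9))
```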
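Second, the 2018 #5 entry (mixup) trains on convex combinations of random example pairs and their labels. The sketch below shows one way to form such a mixed batch with NumPy; it is an illustration under common settings (a Beta(0.2, 0.2) mixing coefficient, one-hot labels), not the authors' exact implementation.

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """Mix a batch: convex combinations of randomly paired examples and their labels.
    x: (batch, ...) inputs; y: (batch, num_classes) one-hot labels."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)        # mixing coefficient drawn from Beta(alpha, alpha)
    perm = rng.permutation(len(x))      # random partner for each example
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_mixed = lam * y + (1.0 - lam) * y[perm]
    return x_mixed, y_mixed

# Toy batch: 4 two-dimensional inputs, 3 classes.
x = np.random.rand(4, 2)
y = np.eye(3)[[0, 2, 1, 0]]             # one-hot labels
x_mixed, y_mixed = mixup_batch(x, y)
```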