Most Influential ICLR Papers (2024-05)
The International Conference on Learning Representations (ICLR) is one of the top machine learning conferences in the world. The Paper Digest Team analyzes all papers published at ICLR over the years and presents the 15 most influential papers for each year. This ranking is constructed automatically from citations in both research papers and granted patents, and is updated frequently to reflect the most recent changes. To find the latest version of this list, or the most influential papers from other conferences and journals, please visit the Best Paper Digest page. Note: the most influential papers may or may not include the papers that won the best paper awards. (Version: 2024-05)
To search or review ICLR papers on a specific topic, please use the search by venue (ICLR) and review by venue (ICLR) services. To browse the most productive ICLR authors by year, ranked by the number of accepted papers, see the list of most productive ICLR authors.
Based in New York, Paper Digest is dedicated to producing high-quality text analysis results that people can actually use on a daily basis. Since 2018, we have been serving users across the world with a number of exclusive services to track, search, review and rewrite scientific literature.
You are welcome to follow us on Twitter and LinkedIn to stay updated with new conference digests.
Paper Digest Team
New York City, New York, 10017
team@paperdigest.org
TABLE 1: Most Influential ICLR Papers (2024-05)
Year | Rank | Paper | Author(s) |
---|---|---|---|
2024 | 1 | MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models (IF: 7) Highlight: We believe that the enhanced multi-modal generation capabilities of GPT-4 stem from the utilization of sophisticated large language models (LLM). To examine this phenomenon, we present MiniGPT-4, which aligns a frozen visual encoder with a frozen advanced LLM, Vicuna, using one projection layer. | Deyao Zhu; Jun Chen; Xiaoqian Shen; Xiang Li; Mohamed Elhoseiny; |
2024 | 2 | SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis (IF: 6) Highlight: We present Stable Diffusion XL (SDXL), a latent diffusion model for text-to-image synthesis. | DUSTIN PODELL et al. |
2024 | 3 | FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning (IF: 5) Highlight: We observe that the inefficiency is due to suboptimal work partitioning between different thread blocks and warps on the GPU, causing either low-occupancy or unnecessary shared memory reads/writes. We propose FlashAttention-2, with better work partitioning to address these issues. | Tri Dao; |
2024 | 4 | Teaching Large Language Models to Self-Debug (IF: 5) Highlight: However, for complex programming tasks, generating the correct solution in one go becomes challenging, thus some prior works have designed program repair approaches to improve code generation performance. In this work, we propose self-debugging, which teaches a large language model to debug its predicted program. | Xinyun Chen; Maxwell Lin; Nathanael Schärli; Denny Zhou; |
2024 | 5 | WizardCoder: Empowering Code Large Language Models with Evol-Instruct (IF: 5) Highlight: In this paper, we present Code Evol-Instruct, a novel approach that adapts the Evol-Instruct method to the realm of code, enhancing Code LLMs to create novel models, WizardCoder. | ZIYANG LUO et al. |
2024 | 6 | ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs (IF: 5) Highlight: This is in contrast to the excellent tool-use capabilities of state-of-the-art (SOTA) closed-source LLMs, e.g., ChatGPT. To bridge this gap, we introduce ToolLLM, a general tool-use framework encompassing data construction, model training, and evaluation. | YUJIA QIN et al. |
2024 | 7 | Let’s Verify Step By Step (IF: 4) Highlight: We conduct our own investigation, finding that process supervision significantly outperforms outcome supervision for training models to solve problems from the challenging MATH dataset. | HUNTER LIGHTMAN et al. |
2024 | 8 | MVDream: Multi-view Diffusion for 3D Generation (IF: 4) Highlight: We introduce MVDream, a diffusion model that is able to generate consistent multi-view images from a given text prompt. | YICHUN SHI et al. |
2024 | 9 | AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models Without Specific Tuning (IF: 4) Highlight: In this paper, we present AnimateDiff, a practical framework for animating personalized T2I models without requiring model-specific tuning. | YUWEI GUO et al. |
2024 | 10 | DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation (IF: 4) Highlight: In this paper, we propose DreamGaussian, a novel 3D content generation framework that achieves both efficiency and quality simultaneously. | Jiaxiang Tang; Jiawei Ren; Hang Zhou; Ziwei Liu; Gang Zeng; |
2024 | 11 | Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors (IF: 4) Highlight: We present “Magic123”, a two-stage coarse-to-fine approach for high-quality, textured 3D mesh generation from a single image in the wild using *both 2D and 3D priors*. | GUOCHENG QIAN et al. |
2024 | 12 | Stochastic Controlled Averaging for Federated Learning with Communication Compression (IF: 4) Highlight: In this paper, we revisit the seminal stochastic controlled averaging method by proposing an equivalent but more efficient/simplified formulation with halved uplink communication costs, building upon which we propose two compressed FL algorithms, SCALLION and SCAFCOM, to support unbiased and biased compression, respectively. | Xinmeng Huang; Ping Li; Xiaoyun Li; |
2024 | 13 | Large Language Models As Optimizers (IF: 4) Highlight: In this work, we propose Optimization by PROmpting (OPRO), a simple and effective approach to leverage large language models (LLMs) as optimizers, where the optimization task is described in natural language. | CHENGRUN YANG et al. |
2024 | 14 | SyncDreamer: Generating Multiview-consistent Images from A Single-view Image (IF: 4) Highlight: In this paper, we present a novel diffusion model called SyncDreamer that generates multiview-consistent images from a single-view image. | YUAN LIU et al. |
2024 | 15 | MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework (IF: 4) Highlight: Here we introduce MetaGPT, an innovative meta-programming framework incorporating efficient human workflows into LLM-based multi-agent collaborations. | SIRUI HONG et al. |
2023 | 1 | Self-Consistency Improves Chain of Thought Reasoning in Language Models (IF: 8) Highlight: In this paper, we propose a new decoding strategy, self-consistency, to replace the naive greedy decoding used in chain-of-thought prompting. | XUEZHI WANG et al. |
2023 | 2 | DreamFusion: Text-to-3D Using 2D Diffusion (IF: 8) Highlight: Adapting this approach to 3D synthesis would require large-scale datasets of labeled 3D or multiview data and efficient architectures for denoising 3D data, neither of which currently exist. In this work, we circumvent these limitations by using a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis. | Ben Poole; Ajay Jain; Jonathan T. Barron; Ben Mildenhall; |
2023 | 3 | An Image Is Worth One Word: Personalizing Text-to-Image Generation Using Textual Inversion (IF: 7) Highlight: In other words, we ask: how can we use language-guided models to turn *our* cat into a painting, or imagine a new product based on *our* favorite toy? Here we present a simple approach that allows such creative freedom. | RINON GAL et al. |
2023 | 4 | ReAct: Synergizing Reasoning and Acting in Language Models (IF: 7) Highlight: In this paper, we explore the use of LLMs to generate both reasoning traces and task-specific actions in an interleaved manner, allowing for greater synergy between the two: reasoning traces help the model induce, track, and update action plans as well as handle exceptions, while actions allow it to interface with external sources, such as knowledge bases or environments, to gather additional information. | SHUNYU YAO et al. |
2023 | 5 | Make-A-Video: Text-to-Video Generation Without Text-Video Data (IF: 7) Highlight: We propose Make-A-Video — an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). | URIEL SINGER et al. |
2023 | 6 | GLM-130B: An Open Bilingual Pre-trained Model (IF: 7) Highlight: We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model with 130 billion parameters. | AOHAN ZENG et al. |
2023 | 7 | DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection (IF: 7) Highlight: We present DINO (DETR with Improved deNoising anchOr boxes), a strong end-to-end object detector. | HAO ZHANG et al. |
2023 | 8 | CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis (IF: 6) Highlight: The prevalence of large language models advances the state-of-the-art for program synthesis, though limited training resources and data impede open access to such models. To democratize this, we train and release a family of large language models up to 16.1B parameters, called CODEGEN, on natural language and programming language data, and open source the training library JAXFORMER. | ERIK NIJKAMP et al. |
2023 | 9 | Conditional Positional Encodings for Vision Transformers (IF: 6) Highlight: We propose a conditional positional encoding (CPE) scheme for vision Transformers. | Xiangxiang Chu; Zhi Tian; Bo Zhang; Xinlong Wang; Chunhua Shen; |
2023 | 10 | PaLI: A Jointly-Scaled Multilingual Language-Image Model (IF: 6) Highlight: We present PaLI, a model that extends this approach to the joint modeling of language and vision. | XI CHEN et al. |
2023 | 11 | Large Language Models Are Human-Level Prompt Engineers (IF: 6) Highlight: Inspired by classical program synthesis and the human approach to prompt engineering, we propose Automatic Prompt Engineer (APE) for automatic instruction generation and selection. | YONGCHAO ZHOU et al. |
2023 | 12 | Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language (IF: 6) Highlight: We investigate how multimodal prompt engineering can use language as the intermediate representation to combine complementary knowledge from different pretrained (potentially multimodal) language models for a variety of tasks. | ANDY ZENG et al. |
2023 | 13 | InCoder: A Generative Model for Code Infilling and Synthesis (IF: 6) Highlight: We introduce InCoder, a unified generative model that can perform program synthesis (via left-to-right generation) as well as editing (via masking and infilling). | DANIEL FRIED et al. |
2023 | 14 | Human Motion Diffusion Model (IF: 6) Highlight: In this paper, we introduce Motion Diffusion Model (MDM), a carefully adapted classifier-free diffusion-based generative model for human motion data. | GUY TEVET et al. |
2023 | 15 | Quantifying Memorization Across Neural Language Models (IF: 6) Highlight: We describe three log-linear relationships that quantify the degree to which LMs emit memorized training data. | NICHOLAS CARLINI et al. |
2022 | 1 | LoRA: Low-Rank Adaptation of Large Language Models (IF: 8) Highlight: Finetuning updates have a low intrinsic rank which allows us to train only the rank decomposition matrices of certain weights, yielding better performance and practical benefits. | EDWARD J HU et al. |
2022 | 2 | Multitask Prompted Training Enables Zero-Shot Task Generalization (IF: 8) Highlight: Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale, we develop a system for easily mapping general natural language tasks into a human-readable prompted form. | VICTOR SANH et al. |
2022 | 3 | SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations (IF: 7) Highlight: To address these issues, we introduce a new image synthesis and editing method, Stochastic Differential Editing (SDEdit), based on a diffusion model generative prior, which synthesizes realistic images by iteratively denoising through a stochastic differential equation (SDE). | CHENLIN MENG et al. |
2022 | 4 | MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer (IF: 7) Highlight: Light-weight and general-purpose vision transformers for mobile devices. | Sachin Mehta; Mohammad Rastegari; |
2022 | 5 | VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning (IF: 7) Highlight: Variance regularization prevents collapse in self-supervised representation learning. | Adrien Bardes; Jean Ponce; Yann LeCun; |
2022 | 6 | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (IF: 7) Highlight: In this work, we relax these constraints and present a minimalist pretraining framework, named Simple Visual Language Model (SimVLM). | ZIRUI WANG et al. |
2022 | 7 | How Attentive Are Graph Attention Networks? (IF: 7) Highlight: We identify that Graph Attention Networks (GAT) compute a very weak form of attention. We show its empirical implications and propose a fix. | Shaked Brody; Uri Alon; Eran Yahav; |
2022 | 8 | Open-vocabulary Object Detection Via Vision and Language Knowledge Distillation (IF: 7) Highlight: We propose using knowledge distillation to train an object detector that can detect objects with arbitrary text inputs, outperforming its supervised counterparts on rare categories. | Xiuye Gu; Tsung-Yi Lin; Weicheng Kuo; Yin Cui; |
2022 | 9 | Towards A Unified View of Parameter-Efficient Transfer Learning (IF: 7) Highlight: We propose a unified framework for several state-of-the-art parameter-efficient tuning methods. | Junxian He; Chunting Zhou; Xuezhe Ma; Taylor Berg-Kirkpatrick; Graham Neubig; |
2022 | 10 | FILIP: Fine-grained Interactive Language-Image Pre-Training (IF: 6) Highlight: We introduce a large-scale Fine-grained Interactive Language-Image Pretraining (FILIP) to achieve finer-level alignment through a new cross-modal late interaction mechanism, which can boost the performance on more grounded vision and language tasks. Furthermore, we construct a new large-scale image-text pair dataset called FILIP300M for pre-training. | LEWEI YAO et al. |
2022 | 11 | DAB-DETR: Dynamic Anchor Boxes Are Better Queries for DETR (IF: 6) Highlight: We present in this paper a novel query formulation using dynamic anchor boxes for DETR and offer a deeper understanding of the role of queries in DETR. | SHILONG LIU et al. |
2022 | 12 | An Explanation of In-context Learning As Implicit Bayesian Inference (IF: 6) Highlight: In-context learning emerges both theoretically and empirically when the pretraining distribution is a mixture distribution, resulting in the language model implicitly performing Bayesian inference in its forward pass. | Sang Michael Xie; Aditi Raghunathan; Percy Liang; Tengyu Ma; |
2022 | 13 | Pseudo Numerical Methods for Diffusion Models on Manifolds (IF: 6) Highlight: We propose PNDMs, a new kind of numerical method, to accelerate diffusion models on manifolds. | Luping Liu; Yi Ren; Zhijie Lin; Zhou Zhao; |
2022 | 14 | Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation (IF: 6) Highlight: We show that our simple position method enables transformer LMs to efficiently and accurately perform inference on longer sequences than they were trained on. | Ofir Press; Noah Smith; Mike Lewis; |
2022 | 15 | Language-driven Semantic Segmentation (IF: 6) Highlight: We present a language-driven approach that enables synthesis of zero-shot semantic segmentation models from arbitrary label sets at test time. | Boyi Li; Kilian Q Weinberger; Serge Belongie; Vladlen Koltun; Rene Ranftl; |
2021 | 1 | An Image Is Worth 16×16 Words: Transformers for Image Recognition at Scale (IF: 8) Highlight: Transformers applied directly to image patches and pre-trained on large datasets work really well on image classification. | ALEXEY DOSOVITSKIY et al. |
2021 | 2 | Deformable DETR: Deformable Transformers for End-to-End Object Detection (IF: 8) Highlight: Deformable DETR is an efficient and fast-converging end-to-end object detector. It mitigates the high complexity and slow convergence issues of DETR via a novel sampling-based efficient attention mechanism. | XIZHOU ZHU et al. |
2021 | 3 | Denoising Diffusion Implicit Models (IF: 8) Highlight: We show and justify a GAN-like iterative generative model with relatively fast sampling, high sample quality and without any adversarial training. | Jiaming Song; Chenlin Meng; Stefano Ermon; |
2021 | 4 | Score-Based Generative Modeling Through Stochastic Differential Equations (IF: 8) Highlight: A general framework for training and sampling from score-based models that unifies and generalizes previous methods, allows likelihood computation, and enables controllable generation. | YANG SONG et al. |
2021 | 5 | DEBERTA: DECODING-ENHANCED BERT WITH DISENTANGLED ATTENTION (IF: 8) Highlight: A new model architecture DeBERTa is proposed that improves the BERT and RoBERTa models using disentangled attention and an enhanced mask decoder. | Pengcheng He; Xiaodong Liu; Jianfeng Gao; Weizhu Chen; |
2021 | 6 | Fourier Neural Operator for Parametric Partial Differential Equations (IF: 8) Highlight: A novel neural operator based on Fourier transformation for learning partial differential equations. | ZONGYI LI et al. |
2021 | 7 | Measuring Massive Multitask Language Understanding (IF: 8) Highlight: We test language models on 57 different multiple-choice tasks. | DAN HENDRYCKS et al. |
2021 | 8 | Rethinking Attention with Performers (IF: 8) Highlight: We introduce Performers, linear full-rank-attention Transformers via provable random feature approximation methods, without relying on sparsity or low-rankness. | KRZYSZTOF MARCIN CHOROMANSKI et al. |
2021 | 9 | Adaptive Federated Optimization (IF: 7) Highlight: We propose adaptive federated optimization techniques, and highlight their improved performance over popular methods such as FedAvg. | SASHANK J. REDDI et al. |
2021 | 10 | FastSpeech 2: Fast and High-Quality End-to-End Text to Speech (IF: 7) Highlight: We propose a non-autoregressive TTS model named FastSpeech 2 to better solve the one-to-many mapping problem in TTS and surpass autoregressive models in voice quality. | YI REN et al. |
2021 | 11 | DiffWave: A Versatile Diffusion Model for Audio Synthesis (IF: 7) Highlight: DiffWave is a versatile diffusion probabilistic model for waveform generation, which matches the state-of-the-art neural vocoder in terms of quality and can generate abundant realistic voices in time-domain without any conditional information. | Zhifeng Kong; Wei Ping; Jiaji Huang; Kexin Zhao; Bryan Catanzaro; |
2021 | 12 | Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval (IF: 7) Highlight: This paper improves the learning of dense text retrieval using ANCE, which selects global negatives with bigger gradient norms using an asynchronously updated ANN index. | LEE XIONG et al. |
2021 | 13 | Sharpness-aware Minimization for Efficiently Improving Generalization (IF: 7) Highlight: Motivated by the connection between geometry of the loss landscape and generalization, we introduce a procedure for simultaneously minimizing loss value and loss sharpness. | Pierre Foret; Ariel Kleiner; Hossein Mobahi; Behnam Neyshabur; |
2021 | 14 | In Search of Lost Domain Generalization (IF: 7) Highlight: Our ERM baseline achieves state-of-the-art performance across many domain generalization benchmarks. | Ishaan Gulrajani; David Lopez-Paz; |
2021 | 15 | Prototypical Contrastive Learning of Unsupervised Representations (IF: 7) Highlight: We propose an unsupervised representation learning method that bridges contrastive learning with clustering in an EM framework. | Junnan Li; Pan Zhou; Caiming Xiong; Steven Hoi; |
2020 | 1 | ALBERT: A Lite BERT For Self-supervised Learning Of Language Representations (IF: 9) Highlight: A new pretraining method that establishes new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters compared to BERT-large. | ZHENZHONG LAN et al. |
2020 | 2 | BERTScore: Evaluating Text Generation With BERT (IF: 8) Highlight: We propose BERTScore, an automatic evaluation metric for text generation, which correlates better with human judgments and provides stronger model selection performance than existing metrics. | Tianyi Zhang*; Varsha Kishore*; Felix Wu*; Kilian Q. Weinberger; Yoav Artzi; |
2020 | 3 | The Curious Case Of Neural Text Degeneration (IF: 8) Highlight: Current language generation systems either aim for high likelihood and devolve into generic repetition or miscalibrate their stochasticity; we provide evidence of both and propose a solution: Nucleus Sampling. | Ari Holtzman; Jan Buys; Leo Du; Maxwell Forbes; Yejin Choi; |
2020 | 4 | ELECTRA: Pre-training Text Encoders As Discriminators Rather Than Generators (IF: 9) Highlight: A text encoder trained to distinguish real input tokens from plausible fakes efficiently learns effective language representations. | Kevin Clark; Minh-Thang Luong; Quoc V. Le; Christopher D. Manning; |
2020 | 5 | Reformer: The Efficient Transformer (IF: 8) Highlight: Efficient Transformer with locality-sensitive hashing and reversible layers. | Nikita Kitaev; Lukasz Kaiser; Anselm Levskaya; |
2020 | 6 | On The Convergence Of FedAvg On Non-IID Data (IF: 8) Highlight: In this paper, we analyze the convergence of FedAvg on non-iid data and establish a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the number of SGDs. | Xiang Li; Kaixuan Huang; Wenhao Yang; Shusen Wang; Zhihua Zhang; |
2020 | 7 | On The Variance Of The Adaptive Learning Rate And Beyond (IF: 8) Highlight: If warmup is the answer, what is the question? | LIYUAN LIU et al. |
2020 | 8 | VL-BERT: Pre-training Of Generic Visual-Linguistic Representations (IF: 8) Highlight: VL-BERT is a simple yet powerful pre-trainable generic representation for visual-linguistic tasks. It is pre-trained on the massive-scale caption dataset and text-only corpus, and can be finetuned for various down-stream visual-linguistic tasks. | WEIJIE SU et al. |
2020 | 9 | DropEdge: Towards Deep Graph Convolutional Networks On Node Classification (IF: 8) Highlight: This paper proposes DropEdge, a novel and flexible technique to alleviate the over-smoothing and overfitting issues in deep Graph Convolutional Networks. | Yu Rong; Wenbing Huang; Tingyang Xu; Junzhou Huang; |
2020 | 10 | Once For All: Train One Network And Specialize It For Efficient Deployment (IF: 8) Highlight: We introduce techniques to train a single once-for-all network that fits many hardware platforms. | Han Cai; Chuang Gan; Tianzhe Wang; Zhekai Zhang; Song Han; |
2020 | 11 | AugMix: A Simple Data Processing Method To Improve Robustness And Uncertainty (IF: 8) Highlight: We obtain state-of-the-art results on robustness to data shifts, and we maintain calibration under data shift even when accuracy drops. | DAN HENDRYCKS* et al. |
2020 | 12 | Strategies For Pre-training Graph Neural Networks (IF: 7) Highlight: We develop a strategy for pre-training Graph Neural Networks (GNNs) and systematically study its effectiveness on multiple datasets, GNN architectures, and diverse downstream tasks. | WEIHUA HU* et al. |
2020 | 13 | Fast Is Better Than Free: Revisiting Adversarial Training (IF: 7) Highlight: FGSM-based adversarial training, with randomization, works just as well as PGD-based adversarial training: we can use this to train a robust classifier in 6 minutes on CIFAR10, and 12 hours on ImageNet, on a single machine. | Eric Wong; Leslie Rice; J. Zico Kolter; |
2020 | 14 | Decoupling Representation And Classifier For Long-Tailed Recognition (IF: 7) Highlight: In this work, we decouple the learning procedure into representation learning and classification, and systematically explore how different balancing strategies affect them for long-tailed recognition. | BINGYI KANG et al. |
2020 | 15 | Dream To Control: Learning Behaviors By Latent Imagination (IF: 7) Highlight: We present Dreamer, an agent that learns long-horizon behaviors purely by latent imagination using analytic value gradients. | Danijar Hafner; Timothy Lillicrap; Jimmy Ba; Mohammad Norouzi; |
2019 | 1 | Decoupled Weight Decay Regularization (IF: 9) Highlight: Novel variants of optimization methods that combine the benefits of both adaptive and non-adaptive methods. | Ilya Loshchilov; Frank Hutter; |
2019 | 2 | How Powerful Are Graph Neural Networks? (IF: 9) Highlight: We develop theoretical foundations for the expressive power of GNNs and design a provably most powerful GNN. | Keyulu Xu*; Weihua Hu*; Jure Leskovec; Stefanie Jegelka; |
2019 | 3 | GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding (IF: 9) Highlight: We present a multi-task benchmark and analysis platform for evaluating generalization in natural language understanding systems. | ALEX WANG et al. |
2019 | 4 | Large Scale GAN Training for High Fidelity Natural Image Synthesis (IF: 9) Highlight: GANs benefit from scaling up. | Andrew Brock; Jeff Donahue; Karen Simonyan; |
2019 | 5 | DARTS: Differentiable Architecture Search (IF: 9) Highlight: We propose a differentiable architecture search algorithm for both convolutional and recurrent networks, achieving competitive performance with the state of the art using orders of magnitude less computation resources. | Hanxiao Liu; Karen Simonyan; Yiming Yang; |
2019 | 6 | The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks (IF: 8) Highlight: Feedforward neural networks that can have weights pruned after training could have had the same weights pruned before training. | Jonathan Frankle; Michael Carbin; |
2019 | 7 | Benchmarking Neural Network Robustness to Common Corruptions and Perturbations (IF: 8) Highlight: We propose ImageNet-C to measure classifier corruption robustness and ImageNet-P to measure perturbation robustness. | Dan Hendrycks; Thomas Dietterich; |
2019 | 8 | Learning Deep Representations By Mutual Information Estimation and Maximization (IF: 9) Highlight: We learn deep representation by maximizing mutual information, leveraging structure in the objective, and are able to compete with fully supervised classifiers with comparable architectures. | R DEVON HJELM et al. |
2019 | 9 | ImageNet-trained CNNs Are Biased Towards Texture; Increasing Shape Bias Improves Accuracy and Robustness (IF: 8) Highlight: ImageNet-trained CNNs are biased towards object texture (instead of shape like humans). Overcoming this major difference between human and machine vision yields improved detection performance and previously unseen robustness to image distortions. | ROBERT GEIRHOS et al. |
2019 | 10 | Deep Graph Infomax (IF: 8) Highlight: A new method for unsupervised representation learning on graphs, relying on maximizing mutual information between local and global representations in a graph. State-of-the-art results, competitive with supervised learning. | PETAR VELICKOVIC et al. |
2019 | 11 | ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware (IF: 8) Highlight: Proxy-less neural architecture search for directly learning architectures on a large-scale target task (ImageNet) while reducing the cost to the same level as normal training. | Han Cai; Ligeng Zhu; Song Han; |
2019 | 12 | RotatE: Knowledge Graph Embedding By Relational Rotation in Complex Space (IF: 8) Highlight: A new state-of-the-art approach for knowledge graph embedding. | Zhiqing Sun; Zhi-Hong Deng; Jian-Yun Nie; Jian Tang; |
2019 | 13 | Robustness May Be at Odds with Accuracy (IF: 8) Highlight: We show that adversarial robustness might come at the cost of standard classification performance, but also yields unexpected benefits. | Dimitris Tsipras; Shibani Santurkar; Logan Engstrom; Alexander Turner; Aleksander Madry; |
2019 | 14 | A Closer Look at Few-shot Classification (IF: 9) Highlight: A detailed empirical study in few-shot classification that reveals challenges in the standard evaluation setting and shows a new direction. | Wei-Yu Chen; Yen-Cheng Liu; Zsolt Kira; Yu-Chiang Frank Wang; Jia-Bin Huang; |
2019 | 15 | Predict Then Propagate: Graph Neural Networks Meet Personalized PageRank (IF: 8) Highlight: Personalized propagation of neural predictions (PPNP) improves graph neural networks by separating them into prediction and propagation via personalized PageRank. | Johannes Klicpera; Aleksandar Bojchevski; Stephan Günnemann; |
2018 | 1 | Graph Attention Networks (IF: 9) Highlight: A novel approach to processing graph-structured data by neural networks, leveraging attention over a node’s neighborhood. Achieves state-of-the-art results on transductive citation network tasks and an inductive protein-protein interaction task. | PETAR VELICKOVIC et al. |
2018 | 2 | Towards Deep Learning Models Resistant to Adversarial Attacks (IF: 9) Highlight: We provide a principled, optimization-based re-look at the notion of adversarial examples, and develop methods that produce models that are adversarially robust against a wide range of adversaries. | Aleksander Madry; Aleksandar Makelov; Ludwig Schmidt; Dimitris Tsipras; Adrian Vladu; |
2018 | 3 | Mixup: Beyond Empirical Risk Minimization (IF: 8) Highlight: Training on convex combinations between random training examples and their labels improves generalization in deep neural networks. | Hongyi Zhang; Moustapha Cisse; Yann N. Dauphin; David Lopez-Paz; |
2018 | 4 | Progressive Growing of GANs for Improved Quality, Stability, and Variation (IF: 9) Highlight: We train generative adversarial networks in a progressive fashion, enabling us to generate high-resolution images with high quality. | Tero Karras; Timo Aila; Samuli Laine; Jaakko Lehtinen; |
2018 | 5 | Spectral Normalization for Generative Adversarial Networks (IF: 9) Highlight: We propose a novel weight normalization technique called spectral normalization to stabilize the training of the discriminator of GANs. | Takeru Miyato; Toshiki Kataoka; Masanori Koyama; Yuichi Yoshida; |
2018 | 6 | Unsupervised Representation Learning By Predicting Image Rotations (IF: 9) Highlight: In our work we propose to learn image features by training ConvNets to recognize the 2d rotation that is applied to the image that it gets as input. | Spyros Gidaris; Praveer Singh; Nikos Komodakis; |
2018 | 7 | Ensemble Adversarial Training: Attacks and Defenses (IF: 9) Highlight: Adversarial training with single-step methods overfits, and remains vulnerable to simple black-box and white-box attacks. We show that including adversarial examples from multiple sources helps defend against black-box attacks. | FLORIAN TRAMÈR et al. |
2018 | 8 | Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting (IF: 8) Highlight: A neural sequence model that learns to forecast on a directed graph. | Yaguang Li; Rose Yu; Cyrus Shahabi; Yan Liu; |
2018 | 9 | On The Convergence of Adam and Beyond (IF: 9) Highlight: We investigate the convergence of popular optimization algorithms like Adam and RMSProp, and propose new variants of these methods which provably converge to the optimal solution in convex settings. | Sashank J. Reddi; Satyen Kale; Sanjiv Kumar; |
2018 | 10 | Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks (IF: 8) Highlight: We propose ODIN, a simple and effective method that does not require any change to a pre-trained neural network. | Shiyu Liang; Yixuan Li; R. Srikant; |
2018 | 11 | Word Translation Without Parallel Data (IF: 9) Highlight: Aligning languages without the Rosetta Stone: with no parallel data, we construct bilingual dictionaries using adversarial training, cross-domain local scaling, and an accurate proxy criterion for cross-validation. | Guillaume Lample; Alexis Conneau; Marc’Aurelio Ranzato; Ludovic Denoyer; Hervé Jégou; |
2018 | 12 | Active Learning for Convolutional Neural Networks: A Core-Set Approach (IF: 8) Highlight: We approach the problem of active learning as a core-set selection problem and show that this approach is especially useful in the batch active learning setting, which is crucial when training CNNs. | Ozan Sener; Silvio Savarese; |
2018 | 13 | A Deep Reinforced Model for Abstractive Summarization (IF: 9) Highlight: A summarization model combining a new intra-attention and reinforcement learning method to increase summary ROUGE scores and quality for long sequences. | Romain Paulus; Caiming Xiong; Richard Socher; |
2018 | 14 | Mixed Precision Training (IF: 9) Highlight: Since this format has a narrower range than single-precision we propose three techniques for preventing the loss of critical information. | PAULIUS MICIKEVICIUS et al. |
2018 | 15 | FastGCN: Fast Learning with Graph Convolutional Networks Via Importance Sampling (IF: 9) Highlight: Such an interpretation allows for the use of Monte Carlo approaches to consistently estimate the integrals, which in turn leads to a batched training scheme as we propose in this work—FastGCN. | Jie Chen; Tengfei Ma; Cao Xiao; |