Most Influential ICML Papers (2023-09)
The International Conference on Machine Learning (ICML) is one of the top machine learning conferences in the world. The Paper Digest Team analyzes all papers published at ICML in past years and presents the 15 most influential papers for each year. This ranking is constructed automatically from citations in both research papers and granted patents, and is updated frequently to reflect the most recent changes. To find the latest version of this list or the most influential papers from other conferences/journals, please visit the Best Paper Digest page. Note: the most influential papers may or may not include the papers that won the best paper awards. (Version: 2023-09)
To search or review ICML papers on a specific topic, please use the search by venue (ICML) and review by venue (ICML) services. To browse the most productive ICML authors by year, ranked by number of accepted papers, see the list of most productive ICML authors.
Based in New York, Paper Digest is dedicated to producing high-quality text analysis results that people can actually use on a daily basis. Since 2018, we have been serving users across the world with a number of exclusive services to track, search, review and rewrite scientific literature.
You are welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.
Paper Digest Team
New York City, New York, 10017
team@paperdigest.org
TABLE 1: Most Influential ICML Papers (2023-09)
Year | Rank | Paper | Author(s) |
---|---|---|---|
2023 | 1 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models IF:6 Highlight: This paper proposes BLIP-2, a generic and efficient pre-training strategy that bootstraps vision-language pre-training from off-the-shelf frozen pre-trained image encoders and frozen large language models. |
Junnan Li; Dongxu Li; Silvio Savarese; Steven Hoi; |
2023 | 2 | Robust Speech Recognition Via Large-Scale Weak Supervision IF:6 Highlight: We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet. |
ALEC RADFORD et al. |
2023 | 3 | PaLM-E: An Embodied Multimodal Language Model IF:5 Highlight: We propose embodied language models to directly incorporate real-world continuous sensor modalities into language models and thereby establish the link between words and percepts. |
DANNY DRIESS et al. |
2023 | 4 | PAL: Program-aided Language Models IF:4 Highlight: In this paper, we present Program-Aided Language models (PAL): a novel approach that uses the LLM to read natural language problems and generate programs as the intermediate reasoning steps, but offloads the solution step to a runtime such as a Python interpreter. |
LUYU GAO et al. |
2023 | 5 | Muse: Text-To-Image Generation Via Masked Generative Transformers IF:4 Highlight: We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse learns to predict randomly masked image tokens. |
HUIWEN CHANG et al. |
2023 | 6 | The Flan Collection: Designing Data and Methods for Effective Instruction Tuning IF:4 Highlight: We study the design decisions of publicly available instruction tuning methods, by reproducing and breaking down the development of Flan 2022 (Chung et al., 2022). |
SHAYNE LONGPRE et al. |
2023 | 7 | SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models IF:3 Highlight: We propose SmoothQuant, a training-free, accuracy-preserving, and general-purpose post-training quantization (PTQ) solution to enable 8-bit weight, 8-bit activation (W8A8) quantization for LLMs. |
GUANGXUAN XIAO et al. |
2023 | 8 | AudioLDM: Text-to-Audio Generation with Latent Diffusion Models IF:3 Highlight: In this study, we propose AudioLDM, a TTA system that is built on a latent space to learn continuous audio representations from contrastive language-audio pretraining (CLAP) embeddings. |
HAOHE LIU et al. |
2023 | 9 | Consistency Models IF:3 Highlight: Diffusion models have significantly advanced the fields of image, audio, and video generation, but they depend on an iterative sampling process that causes slow generation. To overcome this limitation, we propose consistency models, a new family of models that generate high quality samples by directly mapping noise to data. |
Yang Song; Prafulla Dhariwal; Mark Chen; Ilya Sutskever; |
2023 | 10 | Composer: Creative and Controllable Image Synthesis with Composable Conditions IF:3 Highlight: This work offers a new generation paradigm that allows flexible control of the output image, such as spatial layout and palette, while maintaining the synthesis quality and model creativity. |
LIANGHUA HUANG et al. |
2023 | 11 | Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models IF:3 Highlight: Its application to audio still lags behind for two main reasons: the lack of large-scale datasets with high-quality text-audio pairs, and the complexity of modeling long continuous audio data. In this work, we propose Make-An-Audio with a prompt-enhanced diffusion model that addresses these gaps by 1) introducing pseudo prompt enhancement with a distill-then-reprogram approach, it alleviates data scarcity with orders of magnitude concept compositions by using language-free audios; 2) leveraging spectrogram autoencoder to predict the self-supervised audio representation instead of waveforms. |
RONGJIE HUANG et al. |
2023 | 12 | Scaling Laws for Reward Model Overoptimization IF:3 Highlight: In this work, we use a synthetic setup in which a fixed “gold-standard” reward model plays the role of humans, providing labels used to train a proxy reward model. |
Leo Gao; John Schulman; Jacob Hilton; |
2023 | 13 | Large Language Models Can Be Easily Distracted By Irrelevant Context IF:3 Highlight: In this work, we investigate the *distractibility* of large language models, i.e., how the model prediction can be distracted by irrelevant context. |
FREDA SHI et al. |
2023 | 14 | MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation IF:3 Highlight: In this work, we present MultiDiffusion, a unified framework that enables versatile and controllable image generation, using a pre-trained text-to-image diffusion model, without any further training or finetuning. |
Omer Bar-Tal; Lior Yariv; Yaron Lipman; Tali Dekel; |
2023 | 15 | Large Language Models Struggle to Learn Long-Tail Knowledge IF:3 Highlight: In this paper, we study the relationship between the knowledge memorized by large language models and the information in pre-training datasets scraped from the web. |
Nikhil Kandpal; Haikang Deng; Adam Roberts; Eric Wallace; Colin Raffel; |
2022 | 1 | GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models IF:8 Highlight: We explore diffusion models for the problem of text-conditional image synthesis and compare two different guidance strategies: CLIP guidance and classifier-free guidance. |
ALEXANDER QUINN NICHOL et al. |
2022 | 2 | BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation IF:7 Highlight: In this paper, we propose BLIP, a new VLP framework which transfers flexibly to both vision-language understanding and generation tasks. |
Junnan Li; Dongxu Li; Caiming Xiong; Steven Hoi; |
2022 | 3 | Data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language IF:6 Highlight: To get us closer to general self-supervised learning, we present data2vec, a framework that uses the same learning method for either speech, NLP or computer vision. |
ALEXEI BAEVSKI et al. |
2022 | 4 | Improving Language Models By Retrieving from Trillions of Tokens IF:6 Highlight: We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. |
SEBASTIAN BORGEAUD et al. |
2022 | 5 | Model Soups: Averaging Weights of Multiple Fine-tuned Models Improves Accuracy Without Increasing Inference Time IF:5 Highlight: The conventional recipe for maximizing model accuracy is to (1) train multiple models with various hyperparameters and (2) pick the individual model which performs best on a held-out validation set, discarding the remainder. In this paper, we revisit the second step of this procedure in the context of fine-tuning large pre-trained models, where fine-tuned models often appear to lie in a single low error basin. |
MITCHELL WORTSMAN et al. |
2022 | 6 | Language Models As Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents IF:5 Highlight: In this paper, we investigate the possibility of grounding high-level tasks, expressed in natural language (e.g. “make breakfast”), to a chosen set of actionable steps (e.g. “open fridge”). |
Wenlong Huang; Pieter Abbeel; Deepak Pathak; Igor Mordatch; |
2022 | 7 | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts IF:5 Highlight: In this paper, we propose and develop a family of language models named GLaM (Generalist Language Model), which uses a sparsely activated mixture-of-experts architecture to scale the model capacity while also incurring substantially less training cost compared to dense variants. |
NAN DU et al. |
2022 | 8 | FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting IF:5 Highlight: To address these problems, we propose to combine Transformer with the seasonal-trend decomposition method, in which the decomposition method captures the global profile of time series while Transformers capture more detailed structures. |
TIAN ZHOU et al. |
2022 | 9 | Equivariant Diffusion for Molecule Generation in 3D IF:4 Highlight: This work introduces a diffusion model for molecule generation in 3D that is equivariant to Euclidean transformations. |
Emiel Hoogeboom; Víctor Garcia Satorras; Clément Vignac; Max Welling; |
2022 | 10 | Scaling Out-of-Distribution Detection for Real-World Settings IF:4 Highlight: To set the stage for more realistic out-of-distribution detection, we depart from small-scale settings and explore large-scale multiclass and multi-label settings with high-resolution images and thousands of classes. |
DAN HENDRYCKS et al. |
2022 | 11 | Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts IF:4 Highlight: To this end, we propose a new method called X-VLM to perform ‘multi-grained vision language pre-training.’ |
Yan Zeng; Xinsong Zhang; Hang Li; |
2022 | 12 | Planning with Diffusion for Flexible Behavior Synthesis IF:4 Highlight: In this paper, we consider what it would look like to fold as much of the trajectory optimization pipeline as possible into the modeling problem, such that sampling from the model and planning with it become nearly identical. |
Michael Janner; Yilun Du; Joshua Tenenbaum; Sergey Levine; |
2022 | 13 | Out-of-Distribution Detection with Deep Nearest Neighbors IF:4 Highlight: In this paper, we explore the efficacy of non-parametric nearest-neighbor distance for OOD detection, which has been largely overlooked in the literature. |
Yiyou Sun; Yifei Ming; Xiaojin Zhu; Yixuan Li; |
2022 | 14 | Diffusion Models for Adversarial Purification IF:4 Highlight: In this work, we propose DiffPure that uses diffusion models for adversarial purification: Given an adversarial example, we first diffuse it with a small amount of noise following a forward diffusion process, and then recover the clean image through a reverse generative process. |
WEILI NIE et al. |
2022 | 15 | Learning Inverse Folding from Millions of Predicted Structures IF:4 Highlight: We consider the problem of predicting a protein sequence from its backbone atom coordinates. |
CHLOE HSU et al. |
2021 | 1 | Learning Transferable Visual Models From Natural Language Supervision IF:8 Highlight: We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. |
ALEC RADFORD et al. |
2021 | 2 | Training Data-efficient Image Transformers & Distillation Through Attention IF:8 Highlight: In this work, we produce competitive convolution-free transformers trained on ImageNet only using a single computer in less than 3 days. |
HUGO TOUVRON et al. |
2021 | 3 | Zero-Shot Text-to-Image Generation IF:8 Highlight: We describe a simple approach for this task based on a transformer that autoregressively models the text and image tokens as a single stream of data. |
ADITYA RAMESH et al. |
2021 | 4 | Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision IF:8 Highlight: In this paper, we leverage a noisy dataset of over one billion image alt-text pairs, obtained without expensive filtering or post-processing steps in the Conceptual Captions dataset. |
CHAO JIA et al. |
2021 | 5 | Barlow Twins: Self-Supervised Learning Via Redundancy Reduction IF:8 Highlight: We propose an objective function that naturally avoids collapse by measuring the cross-correlation matrix between the outputs of two identical networks fed with distorted versions of a sample, and making it as close to the identity matrix as possible. |
Jure Zbontar; Li Jing; Ishan Misra; Yann LeCun; Stephane Deny; |
2021 | 6 | Improved Denoising Diffusion Probabilistic Models IF:8 Highlight: We show that with a few simple modifications, DDPMs can also achieve competitive log-likelihoods while maintaining high sample quality. |
Alexander Quinn Nichol; Prafulla Dhariwal; |
2021 | 7 | EfficientNetV2: Smaller Models and Faster Training IF:7 Highlight: This paper introduces EfficientNetV2, a new family of convolutional networks that have faster training speed and better parameter efficiency than previous models. |
Mingxing Tan; Quoc Le; |
2021 | 8 | Is Space-Time Attention All You Need for Video Understanding? IF:7 Highlight: We present a convolution-free approach to video classification built exclusively on self-attention over space and time. |
Gedas Bertasius; Heng Wang; Lorenzo Torresani; |
2021 | 9 | ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision IF:7 Highlight: In this paper, we present a minimal VLP model, Vision-and-Language Transformer (ViLT), monolithic in the sense that the processing of visual inputs is drastically simplified to just the same convolution-free manner that we process textual inputs. |
Wonjae Kim; Bokyung Son; Ildoo Kim; |
2021 | 10 | WILDS: A Benchmark of In-the-Wild Distribution Shifts IF:7 Highlight: To address this gap, we present WILDS, a curated benchmark of 10 datasets reflecting a diverse range of distribution shifts that naturally arise in real-world applications, such as shifts across hospitals for tumor identification; across camera traps for wildlife monitoring; and across time and location in satellite imaging and poverty mapping. |
PANG WEI KOH et al. |
2021 | 11 | Calibrate Before Use: Improving Few-shot Performance of Language Models IF:7 Highlight: GPT-3 can perform numerous tasks when provided a natural language prompt that contains a few training examples. We show that this type of few-shot learning can be unstable: the choice of prompt format, training examples, and even the order of the examples can cause accuracy to vary from near chance to near state-of-the-art. |
Zihao Zhao; Eric Wallace; Shi Feng; Dan Klein; Sameer Singh; |
2021 | 12 | Out-of-Distribution Generalization Via Risk Extrapolation (REx) IF:7 Highlight: We motivate this approach, Risk Extrapolation (REx), as a form of robust optimization over a perturbation set of extrapolated domains (MM-REx), and propose a penalty on the variance of training risks (V-REx) as a simpler variant. |
DAVID KRUEGER et al. |
2021 | 13 | Perceiver: General Perception with Iterative Attention IF:6 Highlight: In this paper we introduce the Perceiver, a model that builds upon Transformers and hence makes few architectural assumptions about the relationship between its inputs, but that also scales to hundreds of thousands of inputs, like ConvNets. |
ANDREW JAEGLE et al. |
2021 | 14 | E(n) Equivariant Graph Neural Networks IF:6 Highlight: This paper introduces a new model to learn graph neural networks equivariant to rotations, translations, reflections and permutations called E(n)-Equivariant Graph Neural Networks (EGNNs). |
Víctor Garcia Satorras; Emiel Hoogeboom; Max Welling; |
2021 | 15 | Ditto: Fair and Robust Federated Learning Through Personalization IF:6 Highlight: In this work, we identify that robustness to data and model poisoning attacks and fairness, measured as the uniformity of performance across devices, are competing constraints in statistically heterogeneous networks. |
Tian Li; Shengyuan Hu; Ahmad Beirami; Virginia Smith; |
2020 | 1 | A Simple Framework for Contrastive Learning of Visual Representations IF:9 Highlight: This paper presents a simple framework for contrastive representation learning. |
Ting Chen; Simon Kornblith; Mohammad Norouzi; Geoffrey Hinton; |
2020 | 2 | PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization IF:8 Highlight: In this work, we propose pre-training large Transformer-based encoder-decoder models on massive text corpora with a new self-supervised objective. |
Jingqing Zhang; Yao Zhao; Mohammad Saleh; Peter Liu; |
2020 | 3 | SCAFFOLD: Stochastic Controlled Averaging for Federated Learning IF:8 Highlight: As a solution, we propose a new algorithm (SCAFFOLD) which uses control variates (variance reduction) to correct for the ‘client drift’. |
SAI PRANEETH REDDY KARIMIREDDY et al. |
2020 | 4 | Data-Efficient Image Recognition with Contrastive Predictive Coding IF:8 Highlight: We therefore revisit and improve Contrastive Predictive Coding, an unsupervised objective for learning such representations. |
Olivier Henaff; |
2020 | 5 | Reliable Evaluation of Adversarial Robustness with An Ensemble of Diverse Parameter-free Attacks IF:8 Highlight: In this paper we first propose two extensions of the PGD-attack overcoming failures due to suboptimal step size and problems of the objective function. We then combine our novel attacks with two complementary existing ones to form a parameter-free, computationally affordable and user-independent ensemble of attacks to test adversarial robustness. |
Francesco Croce; Matthias Hein; |
2020 | 6 | Generative Pretraining From Pixels IF:8 Highlight: Inspired by progress in unsupervised representation learning for natural language, we examine whether similar models can learn useful representations for images. |
MARK CHEN et al. |
2020 | 7 | Simple and Deep Graph Convolutional Networks IF:7 Highlight: In this paper, we study the problem of designing and analyzing deep graph convolutional networks. |
Ming Chen; Zhewei Wei; Zengfeng Huang; Bolin Ding; Yaliang Li; |
2020 | 8 | Transformers Are RNNs: Fast Autoregressive Transformers with Linear Attention IF:7 Highlight: To address this limitation, we express the self-attention as a linear dot-product of kernel feature maps and make use of the associativity property of matrix products to reduce the complexity from $\mathcal{O}(N^2)$ to $\mathcal{O}(N)$, where $N$ is the sequence length. |
Angelos Katharopoulos; Apoorv Vyas; Nikolaos Pappas; Francois Fleuret; |
2020 | 9 | Contrastive Multi-View Representation Learning on Graphs IF:7 Highlight: We introduce a self-supervised approach for learning node and graph level representations by contrasting structural views of graphs. |
Kaveh Hassani; Amir Hosein Khasahmadi; |
2020 | 10 | Do We Really Need to Access The Source Data? Source Hypothesis Transfer for Unsupervised Domain Adaptation IF:7 Highlight: In this work we tackle a novel setting where only a trained source model is available and investigate how we can effectively utilize such a model without source data to solve UDA problems. |
Jian Liang; Dapeng Hu; Jiashi Feng; |
2020 | 11 | Overfitting in Adversarially Robust Deep Learning IF:7 Highlight: In this paper, we empirically study this phenomenon in the setting of adversarially trained deep networks, which are trained to minimize the loss under worst-case adversarial perturbations. |
Eric Wong; Leslie Rice; Zico Kolter; |
2020 | 12 | On Layer Normalization in The Transformer Architecture IF:6 Highlight: Specifically, we prove with mean field theory that at initialization, for the original-designed Post-LN Transformer, which places the layer normalization between the residual blocks, the expected gradients of the parameters near the output layer are large. |
RUIBIN XIONG et al. |
2020 | 13 | Implicit Geometric Regularization for Learning Shapes IF:6 Highlight: In this paper we offer a new paradigm for computing high fidelity implicit neural representations directly from raw data (i.e., point clouds, with or without normal information). |
Amos Gropp; Lior Yariv; Niv Haim; Matan Atzmon; Yaron Lipman; |
2020 | 14 | On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems IF:6 Highlight: In this paper, we present the complexity results on two-time-scale GDA for solving nonconvex-concave minimax problems, showing that the algorithm can find a stationary point of the function $\Phi(\cdot) := \max_{\mathbf{y} \in \mathcal{Y}} f(\cdot, \mathbf{y})$ efficiently. |
Tianyi Lin; Chi Jin; Michael Jordan; |
2020 | 15 | Agent57: Outperforming The Atari Human Benchmark IF:6 Highlight: We propose Agent57, the first deep RL agent that outperforms the standard human benchmark on all 57 Atari games. |
ADRIÀ PUIGDOMENECH BADIA et al. |
2019 | 1 | EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks IF:8 Highlight: In this paper, we systematically study model scaling and identify that carefully balancing network depth, width, and resolution can lead to better performance. |
Mingxing Tan; Quoc Le; |
2019 | 2 | Self-Attention Generative Adversarial Networks IF:9 Highlight: In this paper, we propose the Self-Attention Generative Adversarial Network (SAGAN) which allows attention-driven, long-range dependency modeling for image generation tasks. |
Han Zhang; Ian Goodfellow; Dimitris Metaxas; Augustus Odena; |
2019 | 3 | Simplifying Graph Convolutional Networks IF:8 Highlight: In this paper, we reduce this excess complexity through successively removing nonlinearities and collapsing weight matrices between consecutive layers. |
FELIX WU et al. |
2019 | 4 | Theoretically Principled Trade-off Between Robustness and Accuracy IF:8 Highlight: In this work, we decompose the prediction error for adversarial examples (robust error) as the sum of the natural (classification) error and boundary error, and provide a differentiable upper bound using the theory of classification-calibrated loss, which is shown to be the tightest possible upper bound uniform over all probability distributions and measurable predictors. |
HONGYANG ZHANG et al. |
2019 | 5 | Parameter-Efficient Transfer Learning for NLP IF:8 Highlight: As an alternative, we propose transfer with adapter modules. |
NEIL HOULSBY et al. |
2019 | 6 | Certified Adversarial Robustness Via Randomized Smoothing IF:8 Highlight: We show how to turn any classifier that classifies well under Gaussian noise into a new classifier that is certifiably robust to adversarial perturbations under the L2 norm. |
Jeremy Cohen; Elan Rosenfeld; Zico Kolter; |
2019 | 7 | A Convergence Theory for Deep Learning Via Over-Parameterization IF:8 Highlight: In this work, we prove that simple algorithms such as stochastic gradient descent (SGD) can find global minima on the training objective of DNNs in polynomial time. |
Zeyuan Allen-Zhu; Yuanzhi Li; Zhao Song; |
2019 | 8 | Challenging Common Assumptions in The Unsupervised Learning of Disentangled Representations IF:9 Highlight: In this paper, we provide a sober look at recent progress in the field and challenge some common assumptions. |
FRANCESCO LOCATELLO et al. |
2019 | 9 | Do ImageNet Classifiers Generalize to ImageNet? IF:8 Highlight: We build new test sets for the CIFAR-10 and ImageNet datasets. |
Benjamin Recht; Rebecca Roelofs; Ludwig Schmidt; Vaishaal Shankar; |
2019 | 10 | Gradient Descent Finds Global Minima of Deep Neural Networks IF:7 Highlight: The current paper proves gradient descent achieves zero training loss in polynomial time for a deep over-parameterized neural network with residual connections (ResNet). |
Simon Du; Jason Lee; Haochuan Li; Liwei Wang; Xiyu Zhai; |
2019 | 11 | Learning Latent Dynamics for Planning from Pixels IF:7 Highlight: We propose the Deep Planning Network (PlaNet), a purely model-based agent that learns the environment dynamics from images and chooses actions through fast online planning in latent space. |
DANIJAR HAFNER et al. |
2019 | 12 | Off-Policy Deep Reinforcement Learning Without Exploration IF:7 Highlight: In this paper, we demonstrate that due to errors introduced by extrapolation, standard off-policy deep reinforcement learning algorithms, such as DQN and DDPG, are incapable of learning with data uncorrelated to the distribution under the current policy, making them ineffective for this fixed batch setting. |
Scott Fujimoto; David Meger; Doina Precup; |
2019 | 13 | Manifold Mixup: Better Representations By Interpolating Hidden States IF:8 Highlight: To address these issues, we propose Manifold Mixup, a simple regularizer that encourages neural networks to predict less confidently on interpolations of hidden representations. |
VIKAS VERMA et al. |
2019 | 14 | MASS: Masked Sequence to Sequence Pre-training for Language Generation IF:8 Highlight: Inspired by the success of BERT, we propose MAsked Sequence to Sequence pre-training (MASS) for the encoder-decoder based language generation tasks. |
Kaitao Song; Xu Tan; Tao Qin; Jianfeng Lu; Tie-Yan Liu; |
2019 | 15 | Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper analyzes training and generalization for a simple 2-layer ReLU net with random initialization, and provides the following improvements over recent works: (i) Using a tighter characterization of training speed than recent papers, an explanation for why training a neural net with random labels leads to slower training, as originally observed in [Zhang et al. ICLR’17]. (ii) Generalization bound independent of network size, using a data-dependent complexity measure. |
Sanjeev Arora; Simon Du; Wei Hu; Zhiyuan Li; Ruosong Wang; |
2018 | 1 | Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with A Stochastic Actor IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework. |
Tuomas Haarnoja; Aurick Zhou; Pieter Abbeel; Sergey Levine; |
2018 | 2 | Addressing Function Approximation Error in Actor-Critic Methods IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We evaluate our method on the suite of OpenAI gym tasks, outperforming the state of the art in every environment tested. |
Scott Fujimoto; Herke van Hoof; David Meger; |
2018 | 3 | Obfuscated Gradients Give A False Sense of Security: Circumventing Defenses to Adversarial Examples IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We describe characteristic behaviors of defenses exhibiting the effect, and for each of the three types of obfuscated gradients we discover, we develop attack techniques to overcome it. |
Anish Athalye; Nicholas Carlini; David Wagner; |
2018 | 4 | CyCADA: Cycle-Consistent Adversarial Domain Adaptation IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a model which adapts between domains using both generative image space alignment and latent representation space alignment. |
JUDY HOFFMAN et. al. |
2018 | 5 | Representation Learning on Graphs with Jumping Knowledge Networks IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We analyze some important properties of these models, and propose a strategy to overcome those. |
KEYULU XU et. al. |
2018 | 6 | Provable Defenses Against Adversarial Examples Via The Convex Outer Adversarial Polytope IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations on the training data. |
Eric Wong; Zico Kolter; |
2018 | 7 | Synthesizing Robust Adversarial Examples IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We demonstrate the existence of robust 3D adversarial objects, and we present the first algorithm for synthesizing examples that are adversarial over a chosen distribution of transformations. |
Anish Athalye; Logan Engstrom; Andrew Ilyas; Kevin Kwok; |
2018 | 8 | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. |
LASSE ESPEHOLT et. al. |
2018 | 9 | Deep One-Class Classification IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we introduce a new anomaly detection method, Deep Support Vector Data Description, which is trained on an anomaly detection based objective. |
LUKAS RUFF et. al. |
2018 | 10 | QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We evaluate QMIX on a challenging set of StarCraft II micromanagement tasks, and show that QMIX significantly outperforms existing value-based multi-agent reinforcement learning methods. |
TABISH RASHID et. al. |
2018 | 11 | Which Training Methods for GANs Do Actually Converge? IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show that the requirement of absolute continuity is necessary: we describe a simple yet prototypical counterexample showing that in the more realistic case of distributions that are not absolutely continuous, unregularized GAN training is not always convergent. |
Lars Mescheder; Andreas Geiger; Sebastian Nowozin; |
2018 | 12 | Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these challenges, we introduce Concept Activation Vectors (CAVs), which provide an interpretation of a neural net’s internal state in terms of human-friendly concepts. |
BEEN KIM et. al. |
2018 | 13 | Image Transformer IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we generalize a recently proposed model architecture based on self-attention, the Transformer, to a sequence modeling formulation of image generation with a tractable likelihood. |
NIKI PARMAR et. al. |
2018 | 14 | MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To overcome the overfitting on corrupted labels, we propose a novel technique of learning another neural network, called MentorNet, to supervise the training of the base deep networks, namely, StudentNet. |
Lu Jiang; Zhengyuan Zhou; Thomas Leung; Li-Jia Li; Li Fei-Fei; |
2018 | 15 | Learning to Reweight Examples for Robust Deep Learning IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast to past reweighting methods, which typically consist of functions of the cost value of each example, in this work we propose a novel meta-learning algorithm that learns to assign weights to training examples based on their gradient directions. |
Mengye Ren; Wenyuan Zeng; Bin Yang; Raquel Urtasun; |
2017 | 1 | Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning. |
Chelsea Finn; Pieter Abbeel; Sergey Levine; |
2017 | 2 | Wasserstein Generative Adversarial Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new algorithm named WGAN, an alternative to traditional GAN training. |
Martin Arjovsky; Soumith Chintala; Léon Bottou; |
2017 | 3 | Neural Message Passing for Quantum Chemistry IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we reformulate existing models into a single common framework we call Message Passing Neural Networks (MPNNs) and explore additional novel variations within this framework. |
Justin Gilmer; Samuel S. Schoenholz; Patrick F. Riley; Oriol Vinyals; George E. Dahl; |
2017 | 4 | Axiomatic Attribution for Deep Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. |
Mukund Sundararajan; Ankur Taly; Qiqi Yan; |
2017 | 5 | On Calibration of Modern Neural Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We discover that modern neural networks, unlike those from a decade ago, are poorly calibrated. |
Chuan Guo; Geoff Pleiss; Yu Sun; Kilian Q. Weinberger; |
2017 | 6 | Convolutional Sequence to Sequence Learning IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce an architecture based entirely on convolutional neural networks. |
Jonas Gehring; Michael Auli; David Grangier; Denis Yarats; Yann N. Dauphin; |
2017 | 7 | Learning Important Features Through Propagating Activation Differences IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we present DeepLIFT (Deep Learning Important FeaTures), a method for decomposing the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input. |
Avanti Shrikumar; Peyton Greenside; Anshul Kundaje; |
2017 | 8 | Conditional Image Synthesis with Auxiliary Classifier GANs IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we introduce new methods for the improved training of generative adversarial networks (GANs) for image synthesis. |
Augustus Odena; Christopher Olah; Jonathon Shlens; |
2017 | 9 | Understanding Black-box Predictions Via Influence Functions IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we use influence functions — a classic technique from robust statistics — to trace a model’s prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a given prediction. |
Pang Wei Koh; Percy Liang; |
2017 | 10 | Curiosity-driven Exploration By Self-supervised Prediction IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We formulate curiosity as the error in an agent’s ability to predict the consequence of its own actions in a visual feature space learned by a self-supervised inverse dynamics model. |
Deepak Pathak; Pulkit Agrawal; Alexei A. Efros; Trevor Darrell; |
2017 | 11 | Deep Transfer Learning with Joint Adaptation Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present joint adaptation networks (JAN), which learn a transfer network by aligning the joint distributions of multiple domain-specific layers across domains based on a joint maximum mean discrepancy (JMMD) criterion. |
Mingsheng Long; Han Zhu; Jianmin Wang; Michael I. Jordan; |
2017 | 12 | Language Modeling with Gated Convolutional Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we develop a finite context approach through stacked convolutions, which can be more efficient since they allow parallelization over sequential tokens. |
Yann N. Dauphin; Angela Fan; Michael Auli; David Grangier; |
2017 | 13 | Learning to Discover Cross-Domain Relations with Generative Adversarial Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a method based on a generative adversarial network that learns to discover relations between different domains (DiscoGAN). |
Taeksoo Kim; Moonsu Cha; Hyunsoo Kim; Jung Kwon Lee; Jiwon Kim; |
2017 | 14 | Continual Learning Through Synaptic Intelligence IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we introduce intelligent synapses that bring some of this biological complexity into artificial neural networks. |
Friedemann Zenke; Ben Poole; Surya Ganguli; |
2017 | 15 | Large-Scale Evolution of Image Classifiers IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our goal is to minimize human participation, so we employ evolutionary algorithms to discover such networks automatically. |
ESTEBAN REAL et. al. |
2016 | 1 | Asynchronous Methods For Deep Reinforcement Learning IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers. |
VOLODYMYR MNIH et. al. |
2016 | 2 | Dropout As A Bayesian Approximation: Representing Model Uncertainty In Deep Learning IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we develop a new theoretical framework casting dropout training in deep neural networks (NNs) as approximate Bayesian inference in deep Gaussian processes. |
Yarin Gal; Zoubin Ghahramani; |
2016 | 3 | Dueling Network Architectures For Deep Reinforcement Learning IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a new neural network architecture for model-free reinforcement learning. |
ZIYU WANG et. al. |
2016 | 4 | Generative Adversarial Text To Image Synthesis IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we develop a novel deep architecture and GAN formulation to effectively bridge these advances in text and image modeling, translating visual concepts from characters to pixels. |
SCOTT REED et. al. |
2016 | 5 | Deep Speech 2: End-to-End Speech Recognition In English And Mandarin IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech, two vastly different languages. |
DARIO AMODEI et. al. |
2016 | 6 | Pixel Recurrent Neural Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions. |
Aäron van den Oord; Nal Kalchbrenner; Koray Kavukcuoglu; |
2016 | 7 | Unsupervised Deep Embedding For Clustering Analysis IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Deep Embedded Clustering (DEC), a method that simultaneously learns feature representations and cluster assignments using deep neural networks. |
Junyuan Xie; Ross Girshick; Ali Farhadi; |
2016 | 8 | Complex Embeddings For Simple Link Prediction IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As in previous studies, we propose to solve this problem through latent factorization. |
Théo Trouillon; Johannes Welbl; Sebastian Riedel; Eric Gaussier; Guillaume Bouchard; |
2016 | 9 | Learning Convolutional Neural Networks For Graphs IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a framework for learning convolutional neural networks for arbitrary graphs. |
Mathias Niepert; Mohamed Ahmed; Konstantin Kutzkov; |
2016 | 10 | Autoencoding Beyond Pixels Using A Learned Similarity Metric IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an autoencoder that leverages learned representations to better measure similarities in data space. |
Anders Boesen Lindbo Larsen; Søren Kaae Sønderby; Hugo Larochelle; Ole Winther; |
2016 | 11 | Benchmarking Deep Reinforcement Learning For Continuous Control IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure. |
Yan Duan; Xi Chen; Rein Houthooft; John Schulman; Pieter Abbeel; |
2016 | 12 | Revisiting Semi-Supervised Learning With Graph Embeddings IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a semi-supervised learning framework based on graph embeddings. |
Zhilin Yang; William Cohen; Ruslan Salakhutdinov; |
2016 | 13 | Group Equivariant Convolutional Networks IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Group equivariant Convolutional Neural Networks (G-CNNs), a natural generalization of convolutional neural networks that reduces sample complexity by exploiting symmetries. |
Taco Cohen; Max Welling; |
2016 | 14 | Meta-Learning With Memory-Augmented Neural Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we demonstrate the ability of a memory-augmented neural network to rapidly assimilate new data, and leverage this data to make accurate predictions after only a few samples. |
Adam Santoro; Sergey Bartunov; Matthew Botvinick; Daan Wierstra; Timothy Lillicrap; |
2016 | 15 | CryptoNets: Applying Neural Networks To Encrypted Data With High Throughput And Accuracy IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we will present a method to convert learned neural networks to CryptoNets, neural networks that can be applied to encrypted data. |
RAN GILAD-BACHRACH et. al. |
2015 | 1 | Batch Normalization: Accelerating Deep Network Training By Reducing Internal Covariate Shift IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. |
Sergey Ioffe; Christian Szegedy; |
2015 | 2 | Show, Attend And Tell: Neural Image Caption Generation With Visual Attention IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by recent work in machine translation and object detection, we introduce an attention based model that automatically learns to describe the content of images. |
KELVIN XU et. al. |
2015 | 3 | Trust Region Policy Optimization IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this article, we describe a method for optimizing control policies, with guaranteed monotonic improvement. |
John Schulman; Sergey Levine; Pieter Abbeel; Michael Jordan; Philipp Moritz; |
2015 | 4 | Unsupervised Domain Adaptation By Backpropagation IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we propose a new approach to domain adaptation in deep architectures that can be trained on a large amount of labeled data from the source domain and a large amount of unlabeled data from the target domain (no labeled target-domain data is necessary). |
Yaroslav Ganin; Victor Lempitsky; |
2015 | 5 | Learning Transferable Features With Deep Adaptation Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new Deep Adaptation Network (DAN) architecture, which generalizes deep convolutional neural network to the domain adaptation scenario. |
Mingsheng Long; Yue Cao; Jianmin Wang; Michael Jordan; |
2015 | 6 | Variational Inference With Normalizing Flows IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new approach for specifying flexible, arbitrarily complex and scalable approximate posterior distributions. |
Danilo Rezende; Shakir Mohamed; |
2015 | 7 | Unsupervised Learning Of Video Representations Using LSTMs IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We use Long Short Term Memory (LSTM) networks to learn representations of video sequences. |
Nitish Srivastava; Elman Mansimov; Ruslan Salakhutdinov; |
2015 | 8 | Deep Unsupervised Learning Using Nonequilibrium Thermodynamics IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we develop an approach that simultaneously achieves both flexibility and tractability. |
Jascha Sohl-Dickstein; Eric Weiss; Niru Maheswaranathan; Surya Ganguli; |
2015 | 9 | Deep Learning With Limited Numerical Precision IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the effect of limited precision data representation and computation on neural network training. |
Suyog Gupta; Ankur Agrawal; Kailash Gopalakrishnan; Pritish Narayanan; |
2015 | 10 | From Word Embeddings To Document Distances IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present the Word Mover’s Distance (WMD), a novel distance function between text documents. |
Matt Kusner; Yu Sun; Nicholas Kolkin; Kilian Weinberger; |
2015 | 11 | DRAW: A Recurrent Neural Network For Image Generation IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces the Deep Recurrent Attentive Writer (DRAW) architecture for image generation with neural networks. |
Karol Gregor; Ivo Danihelka; Alex Graves; Danilo Rezende; Daan Wierstra; |
2015 | 12 | An Empirical Exploration Of Recurrent Network Architectures IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim to determine whether the LSTM architecture is optimal or whether much better architectures exist. |
Rafal Jozefowicz; Wojciech Zaremba; Ilya Sutskever; |
2015 | 13 | Weight Uncertainty In Neural Network IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new, efficient, principled and backpropagation-compatible algorithm for learning a probability distribution on the weights of a neural network, called Bayes by Backprop. |
Charles Blundell; Julien Cornebise; Koray Kavukcuoglu; Daan Wierstra; |
2015 | 14 | Compressing Neural Networks With The Hashing Trick IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel network architecture, HashedNets, that exploits inherent redundancy in neural networks to achieve drastic reductions in model sizes. |
Wenlin Chen; James Wilson; Stephen Tyree; Kilian Weinberger; Yixin Chen; |
2015 | 15 | An Embarrassingly Simple Approach To Zero-shot Learning IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we describe a zero-shot learning approach that can be implemented in just one line of code, yet it is able to outperform state of the art approaches on standard datasets. |
Bernardino Romera-Paredes; Philip Torr; |
2014 | 1 | Distributed Representations Of Sentences And Documents IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an unsupervised algorithm that learns vector representations of sentences and text documents. |
Quoc Le; Tomas Mikolov; |
2014 | 2 | DeCAF: A Deep Convolutional Activation Feature For Generic Visual Recognition IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We evaluate whether features extracted from the activation of a deep convolutional network trained in a fully supervised fashion on a large, fixed set of object recognition tasks can be re-purposed to novel generic tasks. |
JEFF DONAHUE et. al. |
2014 | 3 | Stochastic Backpropagation And Approximate Inference In Deep Generative Models IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our algorithm introduces a recognition model to represent an approximate posterior distribution and uses this for optimisation of a variational lower bound. |
Danilo Jimenez Rezende; Shakir Mohamed; Daan Wierstra; |
2014 | 4 | Deterministic Policy Gradient Algorithms IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. |
DAVID SILVER et. al. |
2014 | 5 | Towards End-To-End Speech Recognition With Recurrent Neural Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a speech recognition system that directly transcribes audio data with text, without requiring an intermediate phonetic representation. |
Alex Graves; Navdeep Jaitly; |
2014 | 6 | Stochastic Gradient Hamiltonian Monte Carlo IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore the properties of such a stochastic gradient HMC approach. |
Tianqi Chen; Emily Fox; Carlos Guestrin; |
2014 | 7 | Neural Variational Inference And Learning In Belief Networks IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a fast non-iterative approximate inference method that uses a feedforward network to implement efficient exact sampling from the variational posterior. |
Andriy Mnih; Karol Gregor; |
2014 | 8 | Recurrent Convolutional Neural Networks For Scene Labeling IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an approach that consists of a recurrent convolutional neural network which allows us to consider a large input context while limiting the capacity of the model. |
Pedro Pinheiro; Ronan Collobert; |
2014 | 9 | Multimodal Neural Language Models IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce two multimodal neural language models: models of natural language that can be conditioned on other modalities. |
Ryan Kiros; Ruslan Salakhutdinov; Rich Zemel; |
2014 | 10 | Fast Computation Of Wasserstein Barycenters IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present new algorithms to compute the mean of a set of N empirical probability measures under the optimal transport metric. |
Marco Cuturi; Arnaud Doucet; |
2014 | 11 | Learning Character-level Representations For Part-of-Speech Tagging IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a deep neural network that learns character-level representations of words and associates them with usual word representations to perform POS tagging. |
Cicero Dos Santos; Bianca Zadrozny; |
2014 | 12 | Communication-Efficient Distributed Optimization Using An Approximate Newton-type Method IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel Newton-type method for distributed optimization, which is particularly well suited for stochastic optimization and learning problems. |
Ohad Shamir; Nati Srebro; Tong Zhang; |
2014 | 13 | Large-scale Multi-label Learning With Missing Labels IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we directly address both these problems by studying the multi-label problem in a generic empirical risk minimization (ERM) framework. |
Hsiang-Fu Yu; Prateek Jain; Purushottam Kar; Inderjit Dhillon; |
2014 | 14 | A Clockwork RNN IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a simple, yet powerful modification to the simple RNN (SRN) architecture, the Clockwork RNN (CW-RNN), in which the hidden layer is partitioned into separate modules, each processing inputs at its own temporal granularity, making computations only at its prescribed clock rate. |
Jan Koutnik; Klaus Greff; Faustino Gomez; Juergen Schmidhuber; |
2014 | 15 | Accelerated Proximal Stochastic Dual Coordinate Ascent For Regularized Loss Minimization IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a proximal version of the stochastic dual coordinate ascent method and show how to accelerate the method using an inner-outer iteration procedure. |
Shai Shalev-Shwartz; Tong Zhang; |
2013 | 1 | On The Difficulty Of Training Recurrent Neural Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we attempt to improve the understanding of the underlying issues by exploring these problems from an analytical, a geometric and a dynamical systems perspective. |
Razvan Pascanu; Tomas Mikolov; Yoshua Bengio; |
2013 | 2 | On The Importance Of Initialization And Momentum In Deep Learning IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train both DNNs and RNNs (on datasets with long-term dependencies) to levels of performance that were previously achievable only with Hessian-Free optimization. |
Ilya Sutskever; James Martens; George Dahl; Geoffrey Hinton; |
2013 | 3 | Regularization Of Neural Networks Using DropConnect IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce DropConnect, a generalization of DropOut, for regularizing large fully-connected layers within neural networks. |
Li Wan; Matthew Zeiler; Sixin Zhang; Yann LeCun; Rob Fergus; |
2013 | 4 | Maxout Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the problem of designing models to leverage a recently introduced approximate model averaging technique called dropout. |
Ian Goodfellow; David Warde-Farley; Mehdi Mirza; Aaron Courville; Yoshua Bengio; |
2013 | 5 | Making A Science Of Model Search: Hyperparameter Optimization In Hundreds Of Dimensions For Vision Architectures IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a meta-modeling approach to support automated hyperparameter optimization, with the goal of providing practical tools that replace hand-tuning with a reproducible and unbiased optimization process. |
James Bergstra; Daniel Yamins; David Cox; |
2013 | 6 | Deep Canonical Correlation Analysis IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Deep Canonical Correlation Analysis (DCCA), a method to learn complex nonlinear transformations of two views of data such that the resulting representations are highly linearly correlated. |
Galen Andrew; Raman Arora; Jeff Bilmes; Karen Livescu; |
2013 | 7 | Learning Fair Representations IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a learning algorithm for fair classification that achieves both group fairness (the proportion of members in a protected group receiving positive classification is identical to the proportion in the population as a whole), and individual fairness (similar individuals should be treated similarly). |
Rich Zemel; Yu Wu; Kevin Swersky; Toni Pitassi; Cynthia Dwork; |
2013 | 8 | Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a new general framework for convex optimization over matrix factorizations, where every Frank-Wolfe iteration will consist of a low-rank update, and discuss the broad application areas of this approach. |
Martin Jaggi; |
2013 | 9 | Domain Generalization Via Invariant Feature Representation IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Domain-Invariant Component Analysis (DICA), a kernel-based optimization algorithm that learns an invariant transformation by minimizing the dissimilarity across domains, whilst preserving the functional relationship between input and output variables. |
Krikamol Muandet; David Balduzzi; Bernhard Schölkopf; |
2013 | 10 | Thompson Sampling For Contextual Bandits With Linear Payoffs IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we design and analyze Thompson Sampling algorithm for the stochastic contextual multi-armed bandit problem with linear payoff functions, when the contexts are provided by an adaptive adversary. |
Shipra Agrawal; Navin Goyal; |
2013 | 11 | Guided Policy Search IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a guided policy search algorithm that uses trajectory optimization to direct policy learning and avoid poor local optima. |
Sergey Levine; Vladlen Koltun; |
2013 | 12 | Deep Learning With COTS HPC Systems IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present technical details and results from our own system based on Commodity Off-The-Shelf High Performance Computing (COTS HPC) technology: a cluster of GPU servers with Infiniband interconnects and MPI. |
ADAM COATES et. al. |
2013 | 13 | Domain Adaptation Under Target And Conditional Shift IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We exploit importance reweighting or sample transformation to find the learning machine that works well on test data, and propose to estimate the weights or transformations by reweighting or transforming training data to reproduce the covariate distribution on the test domain. |
Kun Zhang; Bernhard Schölkopf; Krikamol Muandet; Zhikun Wang; |
2013 | 14 | Gaussian Process Kernels For Pattern Discovery And Extrapolation IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce simple closed form kernels that can be used with Gaussian processes to discover patterns and enable extrapolation. |
Andrew Wilson; Ryan Adams; |
2013 | 15 | Stochastic Gradient Descent For Non-smooth Optimization: Convergence Results And Optimal Averaging Schemes IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the performance of SGD without such smoothness assumptions, as well as a running average scheme to convert the SGD iterates to a solution with optimal optimization accuracy. |
Ohad Shamir; Tong Zhang; |
2012 | 1 | Building High-level Features Using Large Scale Unsupervised Learning IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the challenge of building feature detectors for high-level concepts from only unlabeled data. |
QUOC LE et. al. |
2012 | 2 | Poisoning Attacks Against Support Vector Machines IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As we demonstrate in this contribution, an intelligent adversary can to some extent predict the change of the SVM decision function in response to malicious input and use this ability to construct malicious data points. |
Battista Biggio; Blaine Nelson; Pavel Laskov; |
2012 | 3 | Conversational Speech Transcription Using Context-Dependent Deep Neural Networks IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, combine the classic artificial-neural-network HMMs with traditional context-dependent acoustic modeling and … |
Dong Yu; Frank Seide; Gang Li; |
2012 | 4 | Marginalized Denoising Autoencoders For Domain Adaptation IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a variation, marginalized SDA (mSDA). |
Minmin Chen; Zhixiang Xu; Kilian Weinberger; Fei Sha; |
2012 | 5 | Modeling Temporal Dependencies In High-Dimensional Sequences: Application To Polyphonic Music Generation And Transcription IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a probabilistic model based on distribution estimators conditioned on a recurrent neural network that is able to discover temporal dependencies in high-dimensional sequences. |
Nicolas Boulanger-Lewandowski; Yoshua Bengio; Pascal Vincent; |
2012 | 6 | Making Gradient Descent Optimal For Strongly Convex Stochastic Optimization IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the optimality of SGD in a stochastic setting. |
Alexander Rakhlin; Ohad Shamir; Karthik Sridharan; |
2012 | 7 | A Fast And Simple Algorithm For Training Neural Probabilistic Language Models IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a fast and simple algorithm for training NPLMs based on noise-contrastive estimation, a newly introduced procedure for estimating unnormalized continuous distributions. |
Andriy Mnih; Yee Whye Teh; |
2012 | 8 | High Dimensional Semiparametric Gaussian Copula Graphical Models IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a semiparametric approach named nonparanormal SKEPTIC for efficiently and robustly estimating high dimensional undirected graphical models. |
Han Liu; Fang Han; Ming Yuan; John Lafferty; Larry Wasserman; |
2012 | 9 | Learning Task Grouping And Overlap In Multi-task Learning IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a framework for multi-task learning that enables one to selectively share the information across the tasks. |
Abhishek Kumar; Hal Daume III; |
2012 | 10 | Fast Approximation Of Matrix Coherence And Statistical Leverage IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Interestingly, to achieve our result we judiciously apply random projections on both sides of A. |
Michael Mahoney; Petros Drineas; Malik Magdon-Ismail; David Woodruff; |
2012 | 11 | Variational Bayesian Inference With Stochastic Search IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an algorithm based on stochastic optimization that allows for direct optimization of the variational lower bound in all models. |
John Paisley; David Blei; Michael Jordan; |
2012 | 12 | Parallelizing Exploration-Exploitation Tradeoffs With Gaussian Process Bandit Optimization IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We formalize the task as a multi-armed bandit problem, where the unknown payoff function is sampled from a Gaussian process (GP), and instead of a single arm, in each round we pull a batch of several arms in parallel. |
Thomas Desautels; Andreas Krause; Joel Burdick; |
2012 | 13 | Learning To Label Aerial Images From Noisy Data IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose two robust loss functions for dealing with these kinds of label noise and use the loss functions to train a deep neural network on two challenging aerial image datasets. |
Volodymyr Mnih; Geoffrey Hinton; |
2012 | 14 | Revisiting K-means: New Algorithms Via Bayesian Nonparametrics IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we revisit the k-means clustering algorithm from a Bayesian nonparametric viewpoint. |
Brian Kulis; Michael Jordan; |
2012 | 15 | PAC Subset Selection In Stochastic Multi-armed Bandits IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Whereas their formal analysis is restricted to the worst case sample complexity of algorithms, in this paper, we design and analyze an algorithm (“LUCB”) with improved expected sample complexity. |
Shivaram Kalyanakrishnan; Ambuj Tewari; Peter Auer; Peter Stone; |
2011 | 1 | Multimodal Deep Learning IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel application of deep networks to learn features over multiple modalities. |
JIQUAN NGIAM et. al. |
2011 | 2 | Bayesian Learning Via Stochastic Gradient Langevin Dynamics IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. |
Max Welling; Yee Whye Teh; |
2011 | 3 | A Three-Way Model For Collective Learning On Multi-Relational Data IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we present a novel approach to relational learning based on the factorization of a three-way tensor. |
Maximilian Nickel; Volker Tresp; Hans-Peter Kriegel; |
2011 | 4 | Domain Adaptation For Large-Scale Sentiment Classification: A Deep Learning Approach IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a deep learning approach which learns to extract a meaningful representation for each review in an unsupervised fashion. |
Xavier Glorot; Antoine Bordes; Yoshua Bengio; |
2011 | 5 | Parsing Natural Scenes And Natural Language With Recursive Neural Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a max-margin structure prediction architecture based on recursive neural networks that can successfully recover such structure both in complex scene images as well as sentences. |
Richard Socher; Cliff Chiung-Yu Lin; Andrew Ng; Chris Manning; |
2011 | 6 | Generating Text With Recurrent Neural Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we demonstrate the power of RNNs trained with the new Hessian-Free optimizer (HF) by applying them to character-level language modeling tasks. |
Ilya Sutskever; James Martens; Geoffrey Hinton; |
2011 | 7 | PILCO: A Model-Based And Data-Efficient Approach To Policy Search IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. |
Marc Deisenroth; Carl Rasmussen; |
2011 | 8 | Contractive Auto-Encoders: Explicit Invariance During Feature Extraction IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present in this paper a novel approach for training deterministic auto-encoders. |
Salah Rifai; Pascal Vincent; Xavier Muller; Xavier Glorot; Yoshua Bengio; |
2011 | 9 | Hashing With Graphs IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel graph-based hashing method which automatically discovers the neighborhood structure inherent in the data to learn appropriate compact codes. |
Wei Liu; Jun Wang; Sanjiv Kumar; Shih-Fu Chang; |
2011 | 10 | On Optimization Methods For Deep Learning IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show that more sophisticated off-the-shelf optimization methods such as Limited memory BFGS (L-BFGS) and Conjugate gradient (CG) with linesearch can significantly simplify and speed up the process of pretraining deep algorithms. |
QUOC LE et. al. |
2011 | 11 | Minimal Loss Hashing For Compact Binary Codes IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a method for learning similarity-preserving hash functions that map high-dimensional data onto binary codes. |
Mohammad Norouzi; David Fleet; |
2011 | 12 | A Co-training Approach For Multi-view Spectral Clustering IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a spectral clustering algorithm for the multi-view setting where we have access to multiple views of the data, each of which can be independently used for clustering. |
Abhishek Kumar; Hal Daume III; |
2011 | 13 | The Importance Of Encoding Versus Training With Sparse Coding And Vector Quantization IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate the reasons for the success of sparse coding over VQ by decoupling these phases, allowing us to separate out the contributions of the training and encoding in a controlled way. |
Adam Coates; Andrew Ng; |
2011 | 14 | Learning Recurrent Neural Networks With Hessian-Free Optimization IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we resolve the long-outstanding problem of how to effectively train recurrent neural networks (RNNs) on complex and difficult sequence modeling problems which may contain long-term data dependencies. |
James Martens; Ilya Sutskever; |
2011 | 15 | Doubly Robust Policy Evaluation And Learning IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we leverage the strength and overcome the weaknesses of the two approaches by applying the doubly robust technique to the problems of policy evaluation and optimization. |
Miroslav Dudik; John Langford; Lihong Li; |
2010 | 1 | Rectified Linear Units Improve Restricted Boltzmann Machines IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Restricted Boltzmann machines were developed using binary stochastic hidden units. These can be generalized by replacing each binary unit by an infinite number of copies that all … |
Vinod Nair; Geoffrey Hinton; |
2010 | 2 | 3D Convolutional Neural Networks For Human Action Recognition IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we develop a novel 3D CNN model for action recognition. |
Shuiwang Ji; Wei Xu; Ming Yang; Kai Yu; |
2010 | 3 | Gaussian Process Optimization In The Bandit Setting: No Regret And Experimental Design IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We formalize this task as a multi-armed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low RKHS norm. |
Niranjan Srinivas; Andreas Krause; Sham Kakade; Matthias Seeger; |
2010 | 4 | Robust Subspace Segmentation By Low-Rank Representation IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose low-rank representation (LRR) to segment data drawn from a union of multiple linear (or affine) subspaces. |
Guangcan Liu; Zhouchen Lin; Yong Yu; |
2010 | 5 | Learning Fast Approximations Of Sparse Coding IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We proposed two versions of a very fast algorithm that produces approximate estimates of the sparse code that can be used to compute good visual features, or to initialize exact iterative algorithms. |
Karol Gregor; Yann LeCun; |
2010 | 6 | Deep Learning Via Hessian-free Optimization IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop a 2nd-order optimization method based on the “Hessian-free” approach, and apply it to training deep auto-encoders. |
James Martens; |
2010 | 7 | Large Graph Construction For Scalable Semi-Supervised Learning IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address the scalability issue plaguing graph-based semi-supervised learning via a small number of anchor points which adequately cover the entire point cloud. |
Wei Liu; Junfeng He; Shih-Fu Chang; |
2010 | 8 | Estimation Of (near) Low-rank Matrices With Noise And High-dimensional Scaling IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study an instance of high-dimensional statistical inference in which the goal is to use $N$ noisy observations to estimate a matrix $\Theta^* \in \mathbb{R}^{k \times p}$ that is assumed to be either exactly low rank, or "near" low-rank, meaning that it can be well-approximated by a matrix with low rank. |
Sahand Negahban; Martin Wainwright; |
2010 | 9 | Application Of Machine Learning To Epileptic Seizure Detection IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present and evaluate a machine learning approach to constructing patient-specific classifiers that detect the onset of an epileptic seizure through analysis of the scalp EEG, a non-invasive measure of the brain’s electrical activity. |
Ali Shoeb; John Guttag; |
2010 | 10 | Bayes Optimal Multilabel Classification Via Probabilistic Classifier Chains IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The goal of this paper is to elaborate on this postulate in a critical way. |
Krzysztof Dembczynski; Weiwei Cheng; Eyke Huellermeier; |
2010 | 11 | Tree-Guided Group Lasso For Multi-Task Regression With Structured Sparsity IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our goal is to recover the common set of relevant inputs for each output cluster. |
Seyoung Kim; Eric Xing; |
2010 | 12 | Proximal Methods For Sparse Hierarchical Dictionary Learning IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to combine two approaches for modeling data admitting sparse representations: on the one hand, dictionary learning has proven effective for various signal processing tasks. |
Rodolphe Jenatton; Julien Mairal; Guillaume Obozinski; Francis Bach; |
2010 | 13 | Distance Dependent Chinese Restaurant Processes IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop the distance dependent Chinese restaurant process (CRP), a flexible class of distributions over partitions that allows for non-exchangeability. |
David Blei; Peter Frazier; |
2010 | 14 | Metric Learning To Rank IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a general metric learning algorithm, based on the structural SVM framework, to learn a metric such that rankings of data induced by distance from a query can be optimized against various ranking measures, such as AUC, Precision-at-k, MRR, MAP or NDCG. |
Brian McFee; Gert Lanckriet; |
2010 | 15 | Sequential Projection Learning For Hashing With Compact Codes IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel data-dependent projection learning method such that each hash function is designed to correct the errors made by the previous one sequentially. |
Jun Wang; Sanjiv Kumar; Shih-Fu Chang; |
2009 | 1 | Curriculum Learning IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we formalize such training strategies in the context of machine learning, and call them "curriculum learning". |
Yoshua Bengio; Jérôme Louradour; Ronan Collobert; Jason Weston; |
2009 | 2 | Online Dictionary Learning For Sparse Coding IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a new online optimization algorithm for dictionary learning, based on stochastic approximations, which scales up gracefully to large datasets with millions of training samples. |
Julien Mairal; Francis Bach; Jean Ponce; Guillermo Sapiro; |
2009 | 3 | Group Lasso With Overlap And Graph Lasso IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new penalty function which, when used as regularization for empirical risk minimization procedures, leads to sparse estimators. |
Laurent Jacob; Guillaume Obozinski; Jean-Philippe Vert; |
2009 | 4 | Information Theoretic Measures For Clusterings Comparison: Is A Correction For Chance Necessary? IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we discuss the necessity of correction for chance for information theoretic based measures for clusterings comparison. |
Nguyen Xuan Vinh; Julien Epps; James Bailey; |
2009 | 5 | Learning Structural SVMs With Latent Variables IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a large-margin formulation and algorithm for structured output prediction that allows the use of latent variables. |
Chun-Nam John Yu; Thorsten Joachims; |
2009 | 6 | Multi-view Clustering Via Canonical Correlation Analysis IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we consider constructing such projections using multiple views of the data, via Canonical Correlation Analysis (CCA). |
Kamalika Chaudhuri; Sham M. Kakade; Karen Livescu; Karthik Sridharan; |
2009 | 7 | Large-scale Deep Unsupervised Learning Using Graphics Processors IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we suggest massively parallel methods to help resolve these problems. |
Rajat Raina; Anand Madhavan; Andrew Y. Ng; |
2009 | 8 | Fast Gradient-descent Methods For Temporal-difference Learning With Linear Function Approximation IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we introduce two new related algorithms with better convergence rates. |
RICHARD S. SUTTON et. al. |
2009 | 9 | Identifying Suspicious URLs: An Application Of Large-scale Online Learning IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores online learning approaches for detecting malicious Web sites (those involved in criminal scams) using lexical and host-based features of the associated URLs. |
Justin Ma; Lawrence K. Saul; Stefan Savage; Geoffrey M. Voelker; |
2009 | 10 | An Accelerated Gradient Method For Trace Norm Minimization IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we exploit the special structure of the trace norm, based on which we propose an extended gradient algorithm that converges as O(1/k). |
Shuiwang Ji; Jieping Ye; |
2009 | 11 | Learning With Structured Sparsity IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper investigates a new learning formulation called structured sparsity, which is a natural extension of the standard sparsity concept in statistical learning and compressive sensing. |
Junzhou Huang; Tong Zhang; Dimitris Metaxas; |
2009 | 12 | More Generality In Efficient Multiple Kernel Learning IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we observe that existing MKL formulations can be extended to learn general kernel combinations subject to general regularization. |
Manik Varma; Bodla Rakesh Babu; |
2009 | 13 | Incorporating Domain Knowledge Into Topic Modeling Via Dirichlet Forest Priors IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present its construction, and inference via collapsed Gibbs sampling. |
David Andrzejewski; Xiaojin Zhu; Mark Craven; |
2009 | 14 | Multi-instance Learning By Treating Instances As Non-I.I.D. Samples IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose two simple yet effective methods. |
Zhi-Hua Zhou; Yu-Yin Sun; Yu-Feng Li; |
2009 | 15 | Factored Conditional Restricted Boltzmann Machines For Modeling Motion Style IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a new model, based on the CRBM that preserves its most important computational properties and includes multiplicative three-way interactions that allow the effective interaction weight between two units to be modulated by the dynamic state of a third unit. |
Graham W. Taylor; Geoffrey E. Hinton; |
2008 | 1 | Extracting And Composing Robust Features With Denoising Autoencoders IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce and motivate a new training principle for unsupervised learning of a representation based on the idea of making the learned representations robust to partial corruption of the input pattern. |
Pascal Vincent; Hugo Larochelle; Yoshua Bengio; Pierre-Antoine Manzagol; |
2008 | 2 | A Unified Architecture For Natural Language Processing: Deep Neural Networks With Multitask Learning IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words and the likelihood that the sentence makes sense (grammatically and semantically) using a language model. |
Ronan Collobert; Jason Weston; |
2008 | 3 | Bayesian Probabilistic Matrix Factorization Using Markov Chain Monte Carlo IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we present a fully Bayesian treatment of the Probabilistic Matrix Factorization (PMF) model in which model capacity is controlled automatically by integrating over all model parameters and hyperparameters. |
Ruslan Salakhutdinov; Andriy Mnih; |
2008 | 4 | Deep Learning Via Semi-supervised Embedding IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show how nonlinear embedding algorithms popular for use with shallow semi-supervised learning techniques such as kernel methods can be applied to deep multilayer architectures, either as a regularizer at the output layer, or on each layer of the architecture. |
Jason Weston; Frédéric Ratle; Ronan Collobert; |
2008 | 5 | A Dual Coordinate Descent Method For Large-scale Linear SVM IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a novel dual coordinate descent method for linear SVM with L1- and L2-loss functions. |
Cho-Jui Hsieh; Kai-Wei Chang; Chih-Jen Lin; S. Sathiya Keerthi; S. Sundararajan; |
2008 | 6 | Training Restricted Boltzmann Machines Using Approximations To The Likelihood Gradient IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A new algorithm for training Restricted Boltzmann Machines is introduced. |
Tijmen Tieleman; |
2008 | 7 | Classification Using Discriminative Restricted Boltzmann Machines IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we argue that RBMs provide a self-contained framework for deriving competitive non-linear classifiers. |
Hugo Larochelle; Yoshua Bengio; |
2008 | 8 | Listwise Approach To Learning To Rank: Theory And Algorithm IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper aims to conduct a study on the listwise approach to learning to rank. |
Fen Xia; Tie-Yan Liu; Jue Wang; Wensheng Zhang; Hang Li; |
2008 | 9 | Grassmann Discriminant Analysis: A Unifying View On Subspace-based Learning IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we propose a discriminant learning framework for problems in which data consist of linear subspaces instead of vectors. |
Jihun Hamm; Daniel D. Lee; |
2008 | 10 | An Empirical Evaluation Of Supervised Learning In High Dimensions IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we perform an empirical evaluation of supervised learning on high-dimensional data. |
Rich Caruana; Nikos Karampatziakis; Ainur Yessenalina; |
2008 | 11 | Learning Diverse Rankings With Multi-armed Bandits IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present two online learning algorithms that directly learn a diverse ranking of documents based on users’ clicking behavior. |
Filip Radlinski; Robert Kleinberg; Thorsten Joachims; |
2008 | 12 | On The Quantitative Analysis Of Deep Belief Networks IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that Annealed Importance Sampling (AIS) can be used to efficiently estimate the partition function of an RBM, and we present a novel AIS scheme for comparing RBM’s with different architectures. |
Ruslan Salakhutdinov; Iain Murray; |
2008 | 13 | Hierarchical Sampling For Active Learning IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an active learning scheme that exploits cluster structure in data. |
Sanjoy Dasgupta; Daniel Hsu; |
2008 | 14 | Confidence-weighted Linear Classification IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce confidence-weighted linear classifiers, which add parameter confidence information to linear classifiers. |
Mark Dredze; Koby Crammer; Fernando Pereira; |
2008 | 15 | Bolasso: Model Consistent Lasso Estimation Through The Bootstrap IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a detailed asymptotic analysis of model consistency of the Lasso. |
Francis R. Bach; |
2007 | 1 | Pegasos: Primal Estimated Sub-GrAdient SOlver For SVM IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We describe and analyze a simple and effective iterative algorithm for solving the optimization problem cast by Support Vector Machines (SVM). |
Shai Shalev-Shwartz; Yoram Singer; Nathan Srebro; |
2007 | 2 | Restricted Boltzmann Machines For Collaborative Filtering IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we show how a class of two-layer undirected graphical models, called Restricted Boltzmann Machines (RBM’s), can be used to model tabular data, such as user’s ratings of movies. |
Ruslan Salakhutdinov; Andriy Mnih; Geoffrey Hinton; |
2007 | 3 | Information-theoretic Metric Learning IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present an information-theoretic approach to learning a Mahalanobis distance function. |
Jason V. Davis; Brian Kulis; Prateek Jain; Suvrit Sra; Inderjit S. Dhillon; |
2007 | 4 | Learning To Rank: From Pairwise Approach To Listwise Approach IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The paper proposes a new probabilistic method for the approach. |
Zhe Cao; Tao Qin; Tie-Yan Liu; Ming-Feng Tsai; Hang Li; |
2007 | 5 | Self-taught Learning: Transfer Learning From Unlabeled Data IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We describe an approach to self-taught learning that uses sparse coding to construct higher-level features using the unlabeled data. |
Rajat Raina; Alexis Battle; Honglak Lee; Benjamin Packer; Andrew Y. Ng; |
2007 | 6 | Boosting For Transfer Learning IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel transfer learning framework called TrAdaBoost, which extends boosting-based learning algorithms (Freund & Schapire, 1997). |
Wenyuan Dai; Qiang Yang; Gui-Rong Xue; Yong Yu; |
2007 | 7 | An Empirical Evaluation Of Deep Architectures On Problems With Many Factors Of Variation IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, several learning algorithms relying on models with deep architectures have been proposed. |
Hugo Larochelle; Dumitru Erhan; Aaron Courville; James Bergstra; Yoshua Bengio; |
2007 | 8 | Spectral Feature Selection For Supervised And Unsupervised Learning IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work exploits intrinsic properties underlying supervised and unsupervised feature selection algorithms, and proposes a unified framework for feature selection based on spectral graph theory. |
Zheng Zhao; Huan Liu; |
2007 | 9 | Experimental Perspectives On Learning From Imbalanced Data IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We address these and other issues in this work, showing that sampling in many cases will improve classifier performance. |
Jason Van Hulse; Taghi M. Khoshgoftaar; Amri Napolitano; |
2007 | 10 | Three New Graphical Models For Statistical Language Modelling IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose three new probabilistic language models that define the distribution of the next word in a sequence given several preceding words by using distributed representations of those words. |
Andriy Mnih; Geoffrey Hinton; |
2007 | 11 | Combining Online And Offline Knowledge In UCT IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider three approaches for combining offline and online value functions in the UCT algorithm. |
Sylvain Gelly; David Silver; |
2007 | 12 | Spectral Clustering And Transductive Learning With Multiple Views IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider spectral clustering and transductive inference for data with multiple views. |
Dengyong Zhou; Christopher J. C. Burges; |
2007 | 13 | Discriminative Learning For Differing Training And Test Distributions IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We address classification problems for which the training instances are governed by a distribution that is allowed to differ arbitrarily from the test distribution—problems also referred to as classification under covariate shift. |
Steffen Bickel; Michael Brückner; Tobias Scheffer; |
2007 | 14 | Uncovering Shared Structures In Multiclass Classification IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper suggests a method for multiclass learning with many classes by simultaneously learning shared characteristics common to the classes, and predictors for the classes in terms of these characteristics. |
Yonatan Amit; Michael Fink; Nathan Srebro; Shimon Ullman; |
2007 | 15 | Supervised Feature Selection Via Dependence Estimation IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a framework for filtering features that employs the Hilbert-Schmidt Independence Criterion (HSIC) as a measure of dependence between the features and the labels. |
Le Song; Alex Smola; Arthur Gretton; Karsten M. Borgwardt; Justin Bedo; |
2006 | 1 | The Relationship Between Precision-Recall And ROC Curves IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A corollary is the notion of an achievable PR curve, which has properties much like the convex hull in ROC space; we show an efficient algorithm for computing this curve. |
Jesse Davis; Mark Goadrich; |
2006 | 2 | Connectionist Temporal Classification: Labelling Unsegmented Sequence Data With Recurrent Neural Networks IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems. |
Alex Graves; Santiago Fernández; Faustino Gomez; Jürgen Schmidhuber; |
2006 | 3 | Dynamic Topic Models IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The approach is to use state space models on the natural parameters of the multinomial distributions that represent the topics. |
David M. Blei; John D. Lafferty; |
2006 | 4 | An Empirical Comparison Of Supervised Learning Algorithms IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a large-scale empirical comparison between ten supervised learning methods: SVMs, neural nets, logistic regression, naive bayes, memory-based learning, random forests, decision trees, bagged trees, boosted trees, and boosted stumps. |
Rich Caruana; Alexandru Niculescu-Mizil; |
2006 | 5 | Topic Modeling: Beyond Bag-of-words IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, I explore a hierarchical generative probabilistic model that incorporates both n-gram statistics and latent topic variables by extending a unigram topic model to include properties of a hierarchical Dirichlet bigram language model. |
Hanna M. Wallach; |
2006 | 6 | Cover Trees For Nearest Neighbor IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a tree data structure for fast nearest neighbor operations in general n-point metric spaces (where the data set consists of n points). |
Alina Beygelzimer; Sham Kakade; John Langford; |
2006 | 7 | Label Propagation Through Linear Neighborhoods IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A novel semi-supervised learning approach is proposed based on a linear neighborhood model, which assumes that each data point can be linearly reconstructed from its neighborhood. |
Fei Wang; Changshui Zhang; |
2006 | 8 | Maximum Margin Planning IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this approach, we learn mappings from features to cost so that an optimal policy in an MDP with these costs mimics the expert’s behavior. |
Nathan D. Ratliff; J. Andrew Bagnell; Martin A. Zinkevich; |
2006 | 9 | Pachinko Allocation: DAG-structured Mixture Models Of Topic Correlations IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce the pachinko allocation model (PAM), which captures arbitrary, nested, and possibly sparse correlations between topics using a directed acyclic graph (DAG). |
Wei Li; Andrew McCallum; |
2006 | 10 | Fast Time Series Classification Using Numerosity Reduction IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an additional technique, numerosity reduction, to speed up one-nearest-neighbor DTW. |
Xiaopeng Xi; Eamonn Keogh; Christian Shelton; Li Wei; Chotirat Ann Ratanamahatana; |
2006 | 11 | Agnostic Active Learning IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We state and analyze the first active learning algorithm which works in the presence of arbitrary forms of noise. |
Maria-Florina Balcan; Alina Beygelzimer; John Langford; |
2006 | 12 | Probabilistic Inference For Solving Discrete And Continuous State Markov Decision Processes IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we present an Expectation Maximization algorithm for computing optimal policies. |
Marc Toussaint; Amos Storkey; |
2006 | 13 | Batch Mode Active Learning And Its Application To Medical Image Classification IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a framework for "batch mode active learning" that applies the Fisher information matrix to select a number of informative examples simultaneously. |
Steven C. H. Hoi; Rong Jin; Jianke Zhu; Michael R. Lyu; |
2006 | 14 | PAC Model-free Reinforcement Learning IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For a Markov Decision Process with finite state (size S) and action spaces (size A per state), we propose a new algorithm—Delayed Q-Learning. |
Alexander L. Strehl; Lihong Li; Eric Wiewiora; John Langford; Michael L. Littman; |
2006 | 15 | Practical Solutions To The Problem Of Diagonal Dominance In Kernel Document Clustering IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we investigate the implications of diagonal dominance for unsupervised kernel methods, specifically in the task of document clustering. |
Derek Greene; Pádraig Cunningham; |
2005 | 1 | Learning To Rank Using Gradient Descent IF:10 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate using gradient descent methods for learning ranking functions; we propose a simple probabilistic cost function, and we introduce RankNet, an implementation of these ideas using a neural network to model the underlying ranking function. |
Chris Burges et al. |
2005 | 2 | Predicting Good Probabilities With Supervised Learning IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We examine the relationship between the predictions made by different learning algorithms and true posterior probabilities. |
Alexandru Niculescu-Mizil; Rich Caruana; |
2005 | 3 | Fast Maximum Margin Matrix Factorization For Collaborative Prediction IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we investigate a direct gradient-based optimization method for MMMF and demonstrate it on large collaborative prediction problems. |
Jason D. M. Rennie; Nathan Srebro; |
2005 | 4 | A Support Vector Method For Multivariate Performance Measures IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a Support Vector Method for optimizing multivariate nonlinear performance measures like the F1-score. |
Thorsten Joachims; |
2005 | 5 | Comparing Clusterings: An Axiomatic View IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper views clusterings as elements of a lattice. Distances between clusterings are analyzed in their relationship to the lattice. From this vantage point, we first give an … |
Marina Meilă; |
2005 | 6 | Learning Structured Prediction Models: A Large Margin Approach IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our goal is to learn parameters such that inference using the model reproduces correct answers on the training data. |
Ben Taskar; Vassil Chatalbashev; Daphne Koller; Carlos Guestrin; |
2005 | 7 | Non-negative Tensor Factorization With Applications To Statistics And Computer Vision IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We derive algorithms for finding a non-negative n-dimensional tensor factorization (n-NTF) which includes the non-negative matrix factorization (NMF) as a particular case when n = 2. |
Amnon Shashua; Tamir Hazan; |
2005 | 8 | Semi-supervised Graph Clustering: A Kernel Approach IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we unify vector-based and graph-based approaches. |
Brian Kulis; Sugato Basu; Inderjit Dhillon; Raymond Mooney; |
2005 | 9 | Near-optimal Sensor Placements In Gaussian Processes IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a mutual information criterion, and show that it produces better placements. |
Carlos Guestrin; Andreas Krause; Ajit Paul Singh; |
2005 | 10 | Beyond The Point Cloud: From Transductive To Semi-supervised Learning IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show how to turn transductive and standard supervised learning algorithms into semi-supervised learners. |
Vikas Sindhwani; Partha Niyogi; Mikhail Belkin; |
2005 | 11 | High Speed Obstacle Avoidance Using Monocular Vision And Reinforcement Learning IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an approach in which supervised learning is first used to estimate depths from single monocular images. |
Jeff Michels; Ashutosh Saxena; Andrew Y. Ng; |
2005 | 12 | Learning From Labeled And Unlabeled Data On A Directed Graph IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a general framework for learning from labeled and unlabeled data on a directed graph in which the structure of the graph including the directionality of the edges is considered. |
Dengyong Zhou; Jiayuan Huang; Bernhard Schölkopf; |
2005 | 13 | Learning Gaussian Processes From Multiple Tasks IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the problem of multi-task learning, that is, learning multiple related functions. |
Kai Yu; Volker Tresp; Anton Schwaighofer; |
2005 | 14 | Reinforcement Learning With Gaussian Processes IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a new generative model for the value function, deduced from its relation with the discounted return. |
Yaakov Engel; Shie Mannor; Ron Meir; |
2005 | 15 | Bayesian Hierarchical Clustering IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel algorithm for agglomerative hierarchical clustering based on evaluating marginal likelihoods of a probabilistic model. |
Katherine A. Heller; Zoubin Ghahramani; |
2004 | 1 | Apprenticeship Learning Via Inverse Reinforcement Learning IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We think of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and give an algorithm for learning the task demonstrated by the expert. |
Pieter Abbeel; Andrew Y. Ng; |
2004 | 2 | A Maximum Entropy Approach To Species Distribution Modeling IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the use of maximum-entropy techniques for this problem, specifically, sequential-update algorithms that can handle a very large number of features. |
Steven J. Phillips; Miroslav Dudík; Robert E. Schapire; |
2004 | 3 | Multiple Kernel Learning, Conic Duality, And The SMO Algorithm IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel dual formulation of the QCQP as a second-order cone programming problem, and show how to exploit the technique of Moreau-Yosida regularization to yield a formulation to which SMO techniques can be applied. |
Francis R. Bach; Gert R. G. Lanckriet; Michael I. Jordan; |
2004 | 4 | K-means Clustering Via Principal Component Analysis IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we prove that principal components are the continuous solutions to the discrete cluster membership indicators for K-means clustering. |
Chris Ding; Xiaofeng He; |
2004 | 5 | Support Vector Machine Learning For Interdependent And Structured Output Spaces IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to generalize multiclass Support Vector Machine learning in a formulation that involves features extracted jointly from inputs and outputs. |
Ioannis Tsochantaridis; Thomas Hofmann; Thorsten Joachims; Yasemin Altun; |
2004 | 6 | Solving Large Scale Linear Prediction Problems Using Stochastic Gradient Descent Algorithms IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study stochastic gradient descent (SGD) algorithms on regularized forms of linear prediction methods. |
Tong Zhang; |
2004 | 7 | Integrating Constraints And Metric Learning In Semi-supervised Clustering IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper provides new methods for the two approaches as well as presents a new semi-supervised clustering algorithm that integrates both of these techniques in a uniform, principled framework. |
Mikhail Bilenko; Sugato Basu; Raymond J. Mooney; |
2004 | 8 | Dynamic Conditional Random Fields: Factorized Probabilistic Models For Labeling And Segmenting Sequence Data IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present dynamic conditional random fields (DCRFs), a generalization of linear-chain conditional random fields (CRFs) in which each time slice contains a set of state variables and edges—a distributed state representation as in dynamic Bayesian networks (DBNs)—and parameters are tied across slices. |
Charles Sutton; Khashayar Rohanimanesh; Andrew McCallum; |
2004 | 9 | Learning And Evaluating Classifiers Under Sample Selection Bias IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we formalize the sample selection bias problem in machine learning terms and study analytically and experimentally how a number of well-known classifier learning methods are affected by it. |
Bianca Zadrozny; |
2004 | 10 | Ensemble Selection From Libraries Of Models IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a method for constructing ensembles from libraries of thousands of models. |
Rich Caruana; Alexandru Niculescu-Mizil; Geoff Crew; Alex Ksikes; |
2004 | 11 | Active Learning Using Pre-clustering IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The main contribution of the paper is a formal framework that incorporates clustering into active learning. |
Hieu T. Nguyen; Arnold Smeulders; |
2004 | 12 | A Kernel View Of The Dimensionality Reduction Of Manifolds IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show how all three algorithms (Isomap, graph Laplacian eigenmaps, and locally linear embedding) can be described as kernel PCA on specially constructed Gram matrices, and illustrate the similarities and differences between the algorithms with representative examples. |
Jihun Ham; Daniel D. Lee; Sebastian Mika; Bernhard Schölkopf; |
2004 | 13 | Learning A Kernel Matrix For Nonlinear Dimensionality Reduction IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate how to learn a kernel matrix for high dimensional data that lies on or near a low dimensional manifold. |
Kilian Q. Weinberger; Fei Sha; Lawrence K. Saul; |
2004 | 14 | Solving Cluster Ensemble Problems By Bipartite Graph Partitioning IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new reduction method that constructs a bipartite graph from a given cluster ensemble. |
Xiaoli Zhang Fern; Carla E. Brodley; |
2004 | 15 | Generalized Low Rank Approximations Of Matrices IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unlike the problem of low rank approximations of a single matrix, which was well studied in the past, the proposed algorithm in this paper does not admit a closed form solution in general. |
Jieping Ye; |