Most Influential ICML Papers
The International Conference on Machine Learning (ICML) is one of the top machine learning conferences in the world. The Paper Digest Team analyzes all papers published at ICML in past years and presents the 15 most influential papers for each year. This ranking is constructed automatically from citations in both research papers and granted patents, and is updated frequently to reflect the most recent changes. To find the most influential papers from other conferences and journals, visit the Best Paper Digest page. Note: the most influential papers may or may not include the papers that won best paper awards. (Version: 2021-05)
If you do not want to miss any interesting academic papers, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. To search for papers with highlights, related papers, patents, grants, experts, and organizations, please visit our search console. You are also welcome to follow us on Twitter and LinkedIn for updates on new conference digests.
Paper Digest Team
team@paperdigest.org
TABLE 1: Most Influential ICML Papers
Year | Rank | Paper | Author(s) |
---|---|---|---|
2020 | 1 | A Simple Framework For Contrastive Learning Of Visual Representations IF:7 Highlight: This paper presents a simple framework for contrastive representation learning. | Ting Chen; Simon Kornblith; Mohammad Norouzi; Geoffrey Hinton |
2020 | 2 | Data-Efficient Image Recognition With Contrastive Predictive Coding IF:6 Highlight: We therefore revisit and improve Contrastive Predictive Coding, an unsupervised objective for learning such representations. | Olivier Henaff |
2020 | 3 | PEGASUS: Pre-training With Extracted Gap-sentences For Abstractive Summarization IF:4 Highlight: In this work, we propose pre-training large Transformer-based encoder-decoder models on massive text corpora with a new self-supervised objective. | Jingqing Zhang; Yao Zhao; Mohammad Saleh; Peter Liu |
2020 | 4 | On Gradient Descent Ascent For Nonconvex-Concave Minimax Problems IF:4 Highlight: In this paper, we present the complexity results on two-time-scale GDA for solving nonconvex-concave minimax problems, showing that the algorithm can find a stationary point of the function $\Phi(\cdot) := \max_{\mathbf{y} \in \mathcal{Y}} f(\cdot, \mathbf{y})$ efficiently. | Tianyi Lin; Chi Jin; Michael Jordan |
2020 | 5 | Reliable Evaluation Of Adversarial Robustness With An Ensemble Of Diverse Parameter-free Attacks IF:3 Highlight: In this paper we first propose two extensions of the PGD-attack overcoming failures due to suboptimal step size and problems of the objective function. We then combine our novel attacks with two complementary existing ones to form a parameter-free, computationally affordable and user-independent ensemble of attacks to test adversarial robustness. | Francesco Croce; Matthias Hein |
2020 | 6 | What Is Local Optimality In Nonconvex-Nonconcave Minimax Optimization? IF:3 Highlight: The main contribution of this paper is to propose a proper mathematical definition of local optimality for this sequential setting (local minimax), as well as to present its properties and existence results. | Chi Jin; Praneeth Netrapalli; Michael Jordan |
2020 | 7 | Skew-Fit: State-Covering Self-Supervised Reinforcement Learning IF:3 Highlight: In this paper, we propose a formal exploration objective for goal-reaching policies that maximizes state coverage. | Vitchyr Pong et al. |
2020 | 8 | Generative Pretraining From Pixels IF:3 Highlight: Inspired by progress in unsupervised representation learning for natural language, we examine whether similar models can learn useful representations for images. | Mark Chen et al. |
2020 | 9 | SCAFFOLD: Stochastic Controlled Averaging For Federated Learning IF:3 Highlight: As a solution, we propose a new algorithm (SCAFFOLD) which uses control variates (variance reduction) to correct for the ‘client drift’. | Sai Praneeth Reddy Karimireddy et al. |
2020 | 10 | Leveraging Procedural Generation To Benchmark Reinforcement Learning IF:3 Highlight: We introduce Procgen Benchmark, a suite of 16 procedurally generated game-like environments designed to benchmark both sample efficiency and generalization in reinforcement learning. | Karl Cobbe; Chris Hesse; Jacob Hilton; John Schulman |
2020 | 11 | How Good Is The Bayes Posterior In Deep Neural Networks Really? IF:3 Highlight: In this work we cast doubt on the current understanding of Bayes posteriors in popular deep neural networks: we demonstrate through careful MCMC sampling that the posterior predictive induced by the Bayes posterior yields systematically worse predictions when compared to simpler methods including point estimates obtained from SGD. | Florian Wenzel et al. |
2020 | 12 | Provably Efficient Exploration In Policy Optimization IF:3 Highlight: To bridge such a gap, this paper proposes an Optimistic variant of the Proximal Policy Optimization algorithm (OPPO), which follows an "optimistic version" of the policy gradient direction. | Qi Cai; Zhuoran Yang; Chi Jin; Zhaoran Wang |
2020 | 13 | Agent57: Outperforming The Atari Human Benchmark IF:3 Highlight: We propose Agent57, the first deep RL agent that outperforms the standard human benchmark on all 57 Atari games. | Adrià Puigdomenech Badia et al. |
2020 | 14 | Overfitting In Adversarially Robust Deep Learning IF:3 Highlight: In this paper, we empirically study this phenomenon in the setting of adversarially trained deep networks, which are trained to minimize the loss under worst-case adversarial perturbations. | Eric Wong; Leslie Rice; Zico Kolter |
2020 | 15 | The Many Shapley Values For Model Explanation IF:3 Highlight: In this paper, we use the axiomatic approach to study the differences between some of the many operationalizations of the Shapley value for attribution, and propose a technique called Baseline Shapley (BShap) that is backed by a proper uniqueness result. | Mukund Sundararajan; Amir Najmi |
2019 | 1 | EfficientNet: Rethinking Model Scaling For Convolutional Neural Networks IF:8 Highlight: In this paper, we systematically study model scaling and identify that carefully balancing network depth, width, and resolution can lead to better performance. | Mingxing Tan; Quoc Le |
2019 | 2 | Self-Attention Generative Adversarial Networks IF:8 Highlight: In this paper, we propose the Self-Attention Generative Adversarial Network (SAGAN) which allows attention-driven, long-range dependency modeling for image generation tasks. | Han Zhang; Ian Goodfellow; Dimitris Metaxas; Augustus Odena |
2019 | 3 | A Convergence Theory For Deep Learning Via Over-Parameterization IF:6 Highlight: In this work, we prove simple algorithms such as stochastic gradient descent (SGD) can find Global Minima on the training objective of DNNs in Polynomial Time. | Zeyuan Allen-Zhu; Yuanzhi Li; Zhao Song |
2019 | 4 | Theoretically Principled Trade-off Between Robustness And Accuracy IF:6 Highlight: In this work, we decompose the prediction error for adversarial examples (robust error) as the sum of the natural (classification) error and boundary error, and provide a differentiable upper bound using the theory of classification-calibrated loss, which is shown to be the tightest possible upper bound uniform over all probability distributions and measurable predictors. | Hongyang Zhang et al. |
2019 | 5 | Certified Adversarial Robustness Via Randomized Smoothing IF:6 Highlight: We show how to turn any classifier that classifies well under Gaussian noise into a new classifier that is certifiably robust to adversarial perturbations under the L2 norm. | Jeremy Cohen; Elan Rosenfeld; Zico Kolter |
2019 | 6 | Gradient Descent Finds Global Minima Of Deep Neural Networks IF:6 Highlight: The current paper proves gradient descent achieves zero training loss in polynomial time for a deep over-parameterized neural network with residual connections (ResNet). | Simon Du; Jason Lee; Haochuan Li; Liwei Wang; Xiyu Zhai |
2019 | 7 | Simplifying Graph Convolutional Networks IF:6 Highlight: In this paper, we reduce this excess complexity through successively removing nonlinearities and collapsing weight matrices between consecutive layers. | Felix Wu et al. |
2019 | 8 | Challenging Common Assumptions In The Unsupervised Learning Of Disentangled Representations IF:6 Highlight: In this paper, we provide a sober look at recent progress in the field and challenge some common assumptions. | Francesco Locatello et al. |
2019 | 9 | Fine-Grained Analysis Of Optimization And Generalization For Overparameterized Two-Layer Neural Networks IF:6 Highlight: This paper analyzes training and generalization for a simple 2-layer ReLU net with random initialization, and provides the following improvements over recent works: (i) Using a tighter characterization of training speed than recent papers, an explanation for why training a neural net with random labels leads to slower training, as originally observed in [Zhang et al. ICLR’17]. (ii) Generalization bound independent of network size, using a data-dependent complexity measure. | Sanjeev Arora; Simon Du; Wei Hu; Zhiyuan Li; Ruosong Wang |
2019 | 10 | Learning Latent Dynamics For Planning From Pixels IF:6 Highlight: We propose the Deep Planning Network (PlaNet), a purely model-based agent that learns the environment dynamics from images and chooses actions through fast online planning in latent space. | Danijar Hafner et al. |
2019 | 11 | MASS: Masked Sequence To Sequence Pre-training For Language Generation IF:6 Highlight: Inspired by the success of BERT, we propose MAsked Sequence to Sequence pre-training (MASS) for the encoder-decoder based language generation tasks. | Kaitao Song; Xu Tan; Tao Qin; Jianfeng Lu; Tie-Yan Liu |
2019 | 12 | Do ImageNet Classifiers Generalize To ImageNet? IF:5 Highlight: We build new test sets for the CIFAR-10 and ImageNet datasets. | Benjamin Recht; Rebecca Roelofs; Ludwig Schmidt; Vaishaal Shankar |
2019 | 13 | Off-Policy Deep Reinforcement Learning Without Exploration IF:5 Highlight: In this paper, we demonstrate that due to errors introduced by extrapolation, standard off-policy deep reinforcement learning algorithms, such as DQN and DDPG, are incapable of learning with data uncorrelated to the distribution under the current policy, making them ineffective for this fixed batch setting. | Scott Fujimoto; David Meger; Doina Precup |
2019 | 14 | Quantifying Generalization In Reinforcement Learning IF:5 Highlight: In this paper, we investigate the problem of overfitting in deep reinforcement learning. | Karl Cobbe; Oleg Klimov; Chris Hesse; Taehoon Kim; John Schulman |
2019 | 15 | Exploring The Landscape Of Spatial Robustness IF:4 Highlight: In this work, we thoroughly investigate the vulnerability of neural network–based classifiers to rotations and translations. | Logan Engstrom; Brandon Tran; Dimitris Tsipras; Ludwig Schmidt; Aleksander Madry |
2018 | 1 | Obfuscated Gradients Give A False Sense Of Security: Circumventing Defenses To Adversarial Examples IF:8 Highlight: We describe characteristic behaviors of defenses exhibiting the effect, and for each of the three types of obfuscated gradients we discover, we develop attack techniques to overcome it. | Anish Athalye; Nicholas Carlini; David Wagner |
2018 | 2 | Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning With A Stochastic Actor IF:8 Highlight: In this paper, we propose soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework. | Tuomas Haarnoja; Aurick Zhou; Pieter Abbeel; Sergey Levine |
2018 | 3 | CyCADA: Cycle-Consistent Adversarial Domain Adaptation IF:8 Highlight: We propose a model which adapts between domains using both generative image space alignment and latent representation space alignment. | Judy Hoffman et al. |
2018 | 4 | Addressing Function Approximation Error In Actor-Critic Methods IF:7 Highlight: We evaluate our method on the suite of OpenAI gym tasks, outperforming the state of the art in every environment tested. | Scott Fujimoto; Herke van Hoof; David Meger |
2018 | 5 | Provable Defenses Against Adversarial Examples Via The Convex Outer Adversarial Polytope IF:7 Highlight: We propose a method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations on the training data. | Eric Wong; Zico Kolter |
2018 | 6 | Synthesizing Robust Adversarial Examples IF:7 Highlight: We demonstrate the existence of robust 3D adversarial objects, and we present the first algorithm for synthesizing examples that are adversarial over a chosen distribution of transformations. | Anish Athalye; Logan Engstrom; Andrew Ilyas; Kevin Kwok |
2018 | 7 | IMPALA: Scalable Distributed Deep-RL With Importance Weighted Actor-Learner Architectures IF:7 Highlight: In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. | Lasse Espeholt et al. |
2018 | 8 | Disentangling By Factorising IF:6 Highlight: We propose FactorVAE, a method that disentangles by encouraging the distribution of representations to be factorial and hence independent across the dimensions. | Hyunjik Kim; Andriy Mnih |
2018 | 9 | Which Training Methods For GANs Do Actually Converge? IF:6 Highlight: In this paper, we show that the requirement of absolute continuity is necessary: we describe a simple yet prototypical counterexample showing that in the more realistic case of distributions that are not absolutely continuous, unregularized GAN training is not always convergent. | Lars Mescheder; Andreas Geiger; Sebastian Nowozin |
2018 | 10 | Parallel WaveNet: Fast High-Fidelity Speech Synthesis IF:6 Highlight: This paper introduces Probability Density Distillation, a new method for training a parallel feed-forward network from a trained WaveNet with no significant difference in quality. | Aaron van den Oord et al. |
2018 | 11 | MentorNet: Learning Data-Driven Curriculum For Very Deep Neural Networks On Corrupted Labels IF:6 Highlight: To overcome the overfitting on corrupted labels, we propose a novel technique of learning another neural network, called MentorNet, to supervise the training of the base deep networks, namely, StudentNet. | Lu Jiang; Zhengyuan Zhou; Thomas Leung; Li-Jia Li; Li Fei-Fei |
2018 | 12 | Representation Learning On Graphs With Jumping Knowledge Networks IF:6 Highlight: We analyze some important properties of these models, and propose a strategy to overcome those. | Keyulu Xu et al. |
2018 | 13 | Black-box Adversarial Attacks With Limited Queries And Information IF:6 Highlight: We define three realistic threat models that more accurately characterize many real-world classifiers: the query-limited setting, the partial-information setting, and the label-only setting. | Andrew Ilyas; Logan Engstrom; Anish Athalye; Jessy Lin |
2018 | 14 | Learning To Reweight Examples For Robust Deep Learning IF:6 Highlight: In contrast to past reweighting methods, which typically consist of functions of the cost value of each example, in this work we propose a novel meta-learning algorithm that learns to assign weights to training examples based on their gradient directions. | Mengye Ren; Wenyuan Zeng; Bin Yang; Raquel Urtasun |
2018 | 15 | Learning Representations And Generative Models For 3D Point Clouds IF:6 Highlight: In this paper, we look at geometric data represented as point clouds. | Panos Achlioptas; Olga Diamanti; Ioannis Mitliagkas; Leonidas Guibas |
2017 | 1 | Model-Agnostic Meta-Learning For Fast Adaptation Of Deep Networks IF:8 Highlight: We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning. | Chelsea Finn; Pieter Abbeel; Sergey Levine |
2017 | 2 | Wasserstein Generative Adversarial Networks IF:9 Highlight: We introduce a new algorithm named WGAN, an alternative to traditional GAN training. | Martin Arjovsky; Soumith Chintala; Léon Bottou |
2017 | 3 | Convolutional Sequence To Sequence Learning IF:9 Highlight: We introduce an architecture based entirely on convolutional neural networks. | Jonas Gehring; Michael Auli; David Grangier; Denis Yarats; Yann N. Dauphin |
2017 | 4 | Neural Message Passing For Quantum Chemistry IF:8 Highlight: In this paper, we reformulate existing models into a single common framework we call Message Passing Neural Networks (MPNNs) and explore additional novel variations within this framework. | Justin Gilmer; Samuel S. Schoenholz; Patrick F. Riley; Oriol Vinyals; George E. Dahl |
2017 | 5 | Conditional Image Synthesis With Auxiliary Classifier GANs IF:9 Highlight: In this paper we introduce new methods for the improved training of generative adversarial networks (GANs) for image synthesis. | Augustus Odena; Christopher Olah; Jonathon Shlens |
2017 | 6 | On Calibration Of Modern Neural Networks IF:8 Highlight: We discover that modern neural networks, unlike those from a decade ago, are poorly calibrated. | Chuan Guo; Geoff Pleiss; Yu Sun; Kilian Q. Weinberger |
2017 | 7 | Axiomatic Attribution For Deep Networks IF:8 Highlight: We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. | Mukund Sundararajan; Ankur Taly; Qiqi Yan |
2017 | 8 | Learning To Discover Cross-Domain Relations With Generative Adversarial Networks IF:9 Highlight: We propose a method based on a generative adversarial network that learns to discover relations between different domains (DiscoGAN). | Taeksoo Kim; Moonsu Cha; Hyunsoo Kim; Jung Kwon Lee; Jiwon Kim |
2017 | 9 | Learning Important Features Through Propagating Activation Differences IF:7 Highlight: Here we present DeepLIFT (Deep Learning Important FeaTures), a method for decomposing the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input. | Avanti Shrikumar; Peyton Greenside; Anshul Kundaje |
2017 | 10 | Understanding Black-box Predictions Via Influence Functions IF:8 Highlight: In this paper, we use influence functions, a classic technique from robust statistics, to trace a model’s prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a given prediction. | Pang Wei Koh; Percy Liang |
2017 | 11 | Curiosity-driven Exploration By Self-supervised Prediction IF:8 Highlight: We formulate curiosity as the error in an agent’s ability to predict the consequence of its own actions in a visual feature space learned by a self-supervised inverse dynamics model. | Deepak Pathak; Pulkit Agrawal; Alexei A. Efros; Trevor Darrell |
2017 | 12 | Language Modeling With Gated Convolutional Networks IF:8 Highlight: In this paper we develop a finite context approach through stacked convolutions, which can be more efficient since they allow parallelization over sequential tokens. | Yann N. Dauphin; Angela Fan; Michael Auli; David Grangier |
2017 | 13 | Large-Scale Evolution Of Image Classifiers IF:8 Highlight: Our goal is to minimize human participation, so we employ evolutionary algorithms to discover such networks automatically. | Esteban Real et al. |
2017 | 14 | Deep Transfer Learning With Joint Adaptation Networks IF:8 Highlight: In this paper, we present joint adaptation networks (JAN), which learn a transfer network by aligning the joint distributions of multiple domain-specific layers across domains based on a joint maximum mean discrepancy (JMMD) criterion. | Mingsheng Long; Han Zhu; Jianmin Wang; Michael I. Jordan |
2017 | 15 | Continual Learning Through Synaptic Intelligence IF:7 Highlight: In this study, we introduce intelligent synapses that bring some of this biological complexity into artificial neural networks. | Friedemann Zenke; Ben Poole; Surya Ganguli |
2016 | 1 | Asynchronous Methods For Deep Reinforcement Learning IF:9 Highlight: We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers. | Volodymyr Mnih et al. |
2016 | 2 | Dropout As A Bayesian Approximation: Representing Model Uncertainty In Deep Learning IF:9 Highlight: In this paper we develop a new theoretical framework casting dropout training in deep neural networks (NNs) as approximate Bayesian inference in deep Gaussian processes. | Yarin Gal; Zoubin Ghahramani |
2016 | 3 | Deep Speech 2: End-to-End Speech Recognition In English And Mandarin IF:8 Highlight: We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech, two vastly different languages. | Dario Amodei et al. |
2016 | 4 | Generative Adversarial Text To Image Synthesis IF:9 Highlight: In this work, we develop a novel deep architecture and GAN formulation to effectively bridge these advances in text and image modeling, translating visual concepts from characters to pixels. | Scott Reed et al. |
2016 | 5 | Dueling Network Architectures For Deep Reinforcement Learning IF:9 Highlight: In this paper, we present a new neural network architecture for model-free reinforcement learning. | Ziyu Wang et al. |
2016 | 6 | Pixel Recurrent Neural Networks IF:9 Highlight: We present a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions. | Aaron van den Oord; Nal Kalchbrenner; Koray Kavukcuoglu |
2016 | 7 | Autoencoding Beyond Pixels Using A Learned Similarity Metric IF:8 Highlight: We present an autoencoder that leverages learned representations to better measure similarities in data space. | Anders Boesen Lindbo Larsen; Søren Kaae Sønderby; Hugo Larochelle; Ole Winther |
2016 | 8 | Learning Convolutional Neural Networks For Graphs IF:8 Highlight: We propose a framework for learning convolutional neural networks for arbitrary graphs. | Mathias Niepert; Mohamed Ahmed; Konstantin Kutzkov |
2016 | 9 | Benchmarking Deep Reinforcement Learning For Continuous Control IF:8 Highlight: In this work, we present a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure. | Yan Duan; Xi Chen; Rein Houthooft; John Schulman; Pieter Abbeel |
2016 | 10 | Unsupervised Deep Embedding For Clustering Analysis IF:8 Highlight: In this paper, we propose Deep Embedded Clustering (DEC), a method that simultaneously learns feature representations and cluster assignments using deep neural networks. | Junyuan Xie; Ross Girshick; Ali Farhadi |
2016 | 11 | Ask Me Anything: Dynamic Memory Networks For Natural Language Processing IF:8 Highlight: We introduce the dynamic memory network (DMN), a neural network architecture which processes input sequences and questions, forms episodic memories, and generates relevant answers. | Ankit Kumar et al. |
2016 | 12 | Complex Embeddings For Simple Link Prediction IF:8 Highlight: As in previous studies, we propose to solve this problem through latent factorization. | Théo Trouillon; Johannes Welbl; Sebastian Riedel; Eric Gaussier; Guillaume Bouchard |
2016 | 13 | Revisiting Semi-Supervised Learning With Graph Embeddings IF:7 Highlight: We present a semi-supervised learning framework based on graph embeddings. | Zhilin Yang; William Cohen; Ruslan Salakhutdinov |
2016 | 14 | CryptoNets: Applying Neural Networks To Encrypted Data With High Throughput And Accuracy IF:7 Highlight: In this work, we will present a method to convert learned neural networks to CryptoNets, neural networks that can be applied to encrypted data. | Ran Gilad-Bachrach et al. |
2016 | 15 | Continuous Deep Q-Learning With Model-based Acceleration IF:7 Highlight: In this paper, we explore algorithms and representations to reduce the sample complexity of deep reinforcement learning for continuous control tasks. | Shixiang Gu; Timothy Lillicrap; Ilya Sutskever; Sergey Levine |
2015 | 1 | Batch Normalization: Accelerating Deep Network Training By Reducing Internal Covariate Shift IF:10 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. |
Sergey Ioffe; Christian Szegedy; |
2015 | 2 | Show, Attend And Tell: Neural Image Caption Generation With Visual Attention IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: Inspired by recent work in machine translation and object detection, we introduce an attention based model that automatically learns to describe the content of images. |
KELVIN XU et. al. |
2015 | 3 | Trust Region Policy Optimization IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this article, we describe a method for optimizing control policies, with guaranteed monotonic improvement. |
John Schulman; Sergey Levine; Pieter Abbeel; Michael Jordan; Philipp Moritz; |
2015 | 4 | Unsupervised Domain Adaptation By Backpropagation IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here, we propose a new approach to domain adaptation in deep architectures that can be trained on large amount of labeled data from the source domain and large amount of unlabeled data from the target domain (no labeled target-domain data is necessary). |
Yaroslav Ganin; Victor Lempitsky; |
2015 | 5 | Learning Transferable Features With Deep Adaptation Networks IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a new Deep Adaptation Network (DAN) architecture, which generalizes deep convolutional neural network to the domain adaptation scenario. |
Mingsheng Long; Yue Cao; Jianmin Wang; Michael Jordan; |
2015 | 6 | Unsupervised Learning Of Video Representations Using LSTMs IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We use Long Short Term Memory (LSTM) networks to learn representations of video sequences. |
Nitish Srivastava; Elman Mansimov; Ruslan Salakhutdinov; |
2015 | 7 | DRAW: A Recurrent Neural Network For Image Generation IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper introduces the Deep Recurrent Attentive Writer (DRAW) architecture for image generation with neural networks. |
Karol Gregor; Ivo Danihelka; Alex Graves; Danilo Rezende; Daan Wierstra; |
2015 | 8 | Variational Inference With Normalizing Flows IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce a new approach for specifying flexible, arbitrarily complex and scalable approximate posterior distributions. |
Danilo Rezende; Shakir Mohamed; |
2015 | 9 | Deep Learning With Limited Numerical Precision IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We study the effect of limited precision data representation and computation on neural network training. |
Suyog Gupta; Ankur Agrawal; Kailash Gopalakrishnan; Pritish Narayanan; |
2015 | 10 | An Empirical Exploration Of Recurrent Network Architectures IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we aim to determine whether the LSTM architecture is optimal or whether much better architectures exist. |
Rafal Jozefowicz; Wojciech Zaremba; Ilya Sutskever; |
2015 | 11 | From Word Embeddings To Document Distances IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present the Word Mover’s Distance (WMD), a novel distance function between text documents. |
Matt Kusner; Yu Sun; Nicholas Kolkin; Kilian Weinberger; |
2015 | 12 | Compressing Neural Networks With The Hashing Trick IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a novel network architecture, HashedNets, that exploits inherent redundancy in neural networks to achieve drastic reductions in model sizes. |
Wenlin Chen; James Wilson; Stephen Tyree; Kilian Weinberger; Yixin Chen; |
2015 | 13 | An Embarrassingly Simple Approach To Zero-shot Learning IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we describe a zero-shot learning approach that can be implemented in just one line of code, yet it is able to outperform state of the art approaches on standard datasets. |
Bernardino Romera-Paredes; Philip Torr; |
2015 | 14 | Online Tracking By Learning Discriminative Saliency Map With Convolutional Neural Network IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose an online visual tracking algorithm by learning discriminative saliency map using Convolutional Neural Network (CNN). |
Seunghoon Hong; Tackgeun You; Suha Kwak; Bohyung Han; |
2015 | 15 | Gated Feedback Recurrent Neural Networks IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a novel recurrent neural network (RNN) architecture. |
Junyoung Chung; Caglar Gulcehre; Kyunghyun Cho; Yoshua Bengio; |
2014 | 1 | Distributed Representations Of Sentences And Documents IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an unsupervised algorithm that learns vector representations of sentences and text documents. |
Quoc Le; Tomas Mikolov; |
2014 | 2 | DeCAF: A Deep Convolutional Activation Feature For Generic Visual Recognition IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We evaluate whether features extracted from the activation of a deep convolutional network trained in a fully supervised fashion on a large, fixed set of object recognition tasks can be re-purposed to novel generic tasks. |
JEFF DONAHUE et. al. |
2014 | 3 | Stochastic Backpropagation And Approximate Inference In Deep Generative Models IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: Our algorithm introduces a recognition model to represent an approximate posterior distribution and uses this for optimisation of a variational lower bound. |
Danilo Jimenez Rezende; Shakir Mohamed; Daan Wierstra; |
2014 | 4 | Deterministic Policy Gradient Algorithms IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. |
DAVID SILVER et. al. |
2014 | 5 | Towards End-To-End Speech Recognition With Recurrent Neural Networks IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents a speech recognition system that directly transcribes audio data with text, without requiring an intermediate phonetic representation. |
Alex Graves; Navdeep Jaitly; |
2014 | 6 | Recurrent Convolutional Neural Networks For Scene Labeling IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose an approach that consists of a recurrent convolutional neural network which allows us to consider a large input context while limiting the capacity of the model. |
Pedro Pinheiro; Ronan Collobert; |
2014 | 7 | Neural Variational Inference And Learning In Belief Networks IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a fast non-iterative approximate inference method that uses a feedforward network to implement efficient exact sampling from the variational posterior. |
Andriy Mnih; Karol Gregor; |
2014 | 8 | Multimodal Neural Language Models IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce two multimodal neural language models: models of natural language that can be conditioned on other modalities. |
Ryan Kiros; Ruslan Salakhutdinov; Rich Zemel; |
2014 | 9 | Stochastic Gradient Hamiltonian Monte Carlo IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we explore the properties of such a stochastic gradient HMC approach. |
Tianqi Chen; Emily Fox; Carlos Guestrin; |
2014 | 10 | Learning Character-level Representations For Part-of-Speech Tagging IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a deep neural network that learns character-level representation of words and associate them with usual word representations to perform POS tagging. |
Cicero Dos Santos; Bianca Zadrozny; |
2014 | 11 | Accelerated Proximal Stochastic Dual Coordinate Ascent For Regularized Loss Minimization IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce a proximal version of the stochastic dual coordinate ascent method and show how to accelerate the method using an inner-outer iteration procedure. |
Shai Shalev-Shwartz; Tong Zhang; |
2014 | 12 | Fast Computation Of Wasserstein Barycenters IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present new algorithms to compute the mean of a set of N empirical probability measures under the optimal transport metric. |
Marco Cuturi; Arnaud Doucet; |
2014 | 13 | Large-scale Multi-label Learning With Missing Labels IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we directly address both these problems by studying the multi-label problem in a generic empirical risk minimization (ERM) framework. |
Hsiang-Fu Yu; Prateek Jain; Purushottam Kar; Inderjit Dhillon; |
2014 | 14 | A Clockwork RNN IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper introduces a simple, yet powerful modification to the simple RNN (SRN) architecture, the Clockwork RNN (CW-RNN), in which the hidden layer is partitioned into separate modules, each processing inputs at its own temporal granularity, making computations only at its prescribed clock rate. |
Jan Koutnik; Klaus Greff; Faustino Gomez; Juergen Schmidhuber; |
2014 | 15 | Deep Generative Stochastic Networks Trainable By Backprop IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce a novel training principle for probabilistic models that is an alternative to maximum likelihood. |
Yoshua Bengio; Eric Laufer; Guillaume Alain; Jason Yosinski; |
2013 | 1 | On The Difficulty Of Training Recurrent Neural Networks IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we attempt to improve the understanding of the underlying issues by exploring these problems from an analytical, a geometric and a dynamical systems perspective. |
Razvan Pascanu; Tomas Mikolov; Yoshua Bengio; |
2013 | 2 | On The Importance Of Initialization And Momentum In Deep Learning IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we show that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train both DNNs and RNNs (on datasets with long-term dependencies) to levels of performance that were previously achievable only with Hessian-Free optimization. |
Ilya Sutskever; James Martens; George Dahl; Geoffrey Hinton; |
2013 | 3 | Regularization Of Neural Networks Using DropConnect IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce DropConnect, a generalization of DropOut, for regularizing large fully-connected layers within neural networks. |
Li Wan; Matthew Zeiler; Sixin Zhang; Yann LeCun; Rob Fergus; |
2013 | 4 | Maxout Networks IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We consider the problem of designing models to leverage a recently introduced approximate model averaging technique called dropout. |
Ian Goodfellow; David Warde-Farley; Mehdi Mirza; Aaron Courville; Yoshua Bengio; |
2013 | 5 | Deep Canonical Correlation Analysis IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce Deep Canonical Correlation Analysis (DCCA), a method to learn complex nonlinear transformations of two views of data such that the resulting representations are highly linearly correlated. |
Galen Andrew; Raman Arora; Jeff Bilmes; Karen Livescu; |
2013 | 6 | Making A Science Of Model Search: Hyperparameter Optimization In Hundreds Of Dimensions For Vision Architectures IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a meta-modeling approach to support automated hyperparameter optimization, with the goal of providing practical tools that replace hand-tuning with a reproducible and unbiased optimization process. |
James Bergstra; Daniel Yamins; David Cox; |
2013 | 7 | Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a new general framework for convex optimization over matrix factorizations, where every Frank-Wolfe iteration will consist of a low-rank update, and discuss the broad application areas of this approach. |
Martin Jaggi; |
2013 | 8 | Learning Fair Representations IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a learning algorithm for fair classification that achieves both group fairness (the proportion of members in a protected group receiving positive classification is identical to the proportion in the population as a whole), and individual fairness (similar individuals should be treated similarly). |
Rich Zemel; Yu Wu; Kevin Swersky; Toni Pitassi; Cynthia Dwork; |
2013 | 9 | Deep Learning With COTS HPC Systems IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present technical details and results from our own system based on Commodity Off-The-Shelf High Performance Computing (COTS HPC) technology: a cluster of GPU servers with Infiniband interconnects and MPI. |
ADAM COATES et. al. |
2013 | 10 | Guided Policy Search IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a guided policy search algorithm that uses trajectory optimization to direct policy learning and avoid poor local optima. |
Sergey Levine; Vladlen Koltun; |
2013 | 11 | Thompson Sampling For Contextual Bandits With Linear Payoffs IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we design and analyze Thompson Sampling algorithm for the stochastic contextual multi-armed bandit problem with linear payoff functions, when the contexts are provided by an adaptive adversary. |
Shipra Agrawal; Navin Goyal; |
2013 | 12 | Connecting The Dots With Landmarks: Discriminatively Learning Domain-Invariant Features For Unsupervised Domain Adaptation IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel approach for learning such features. |
Boqing Gong; Kristen Grauman; Fei Sha; |
2013 | 13 | No More Pesky Learning Rates IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a method to automatically adjust multiple learning rates so as to minimize the expected error at any one time. |
Tom Schaul; Sixin Zhang; Yann LeCun; |
2013 | 14 | Stochastic Gradient Descent For Non-smooth Optimization: Convergence Results And Optimal Averaging Schemes IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we investigate the performance of SGD without such smoothness assumptions, as well as a running average scheme to convert the SGD iterates to a solution with optimal optimization accuracy. |
Ohad Shamir; Tong Zhang; |
2013 | 15 | Gaussian Process Kernels For Pattern Discovery And Extrapolation IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce simple closed form kernels that can be used with Gaussian processes to discover patterns and enable extrapolation. |
Andrew Wilson; Ryan Adams; |
2012 | 1 | Building High-level Features Using Large Scale Unsupervised Learning IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We consider the challenge of building feature detectors for high-level concepts from only unlabeled data. |
QUOC LE et. al. |
2012 | 2 | Conversational Speech Transcription Using Context-Dependent Deep Neural Networks IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Abstract: Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, combine the classic artificial-neural-network HMMs with traditional context-dependent acoustic modeling and … |
Dong Yu; Frank Seide; Gang Li; |
2012 | 3 | Poisoning Attacks Against Support Vector Machines IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: As we demonstrate in this contribution, an intelligent adversary can to some extent predict the change of the SVM decision function in response to malicious input and use this ability to construct malicious data points. |
Battista Biggio; Blaine Nelson; Pavel Laskov; |
2012 | 4 | Marginalized Denoising Autoencoders For Domain Adaptation IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce a variation, marginalized SDA (mSDA). |
Minmin Chen; Zhixiang Xu; Kilian Weinberger; Fei Sha; |
2012 | 5 | Modeling Temporal Dependencies In High-Dimensional Sequences: Application To Polyphonic Music Generation And Transcription IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce a probabilistic model based on distribution estimators conditioned on a recurrent neural network that is able to discover temporal dependencies in high-dimensional sequences. |
Nicolas Boulanger-Lewandowski; Yoshua Bengio; Pascal Vincent; |
2012 | 6 | A Fast And Simple Algorithm For Training Neural Probabilistic Language Models IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a fast and simple algorithm for training NPLMs based on noise-contrastive estimation, a newly introduced procedure for estimating unnormalized continuous distributions. |
Andriy Mnih; Yee Whye Teh; |
2012 | 7 | Making Gradient Descent Optimal For Strongly Convex Stochastic Optimization IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we investigate the optimality of SGD in a stochastic setting. |
Alexander Rakhlin; Ohad Shamir; Karthik Sridharan; |
2012 | 8 | Learning Task Grouping And Overlap In Multi-task Learning IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a framework for multi-task learning that enables one to selectively share the information across the tasks. |
Abhishek Kumar; Hal Daume III; |
2012 | 9 | High Dimensional Semiparametric Gaussian Copula Graphical Models IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a semiparametric approach named nonparanormal SKEPTIC for efficiently and robustly estimating high dimensional undirected graphical models. |
Han Liu; Fang Han; Ming Yuan; John Lafferty; Larry Wasserman; |
2012 | 10 | Fast Approximation Of Matrix Coherence And Statistical Leverage IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: Interestingly, to achieve our result we judiciously apply random projections on both sides of A. |
Michael Mahoney; Petros Drineas; Malik Magdon-Ismail; David Woodruff; |
2012 | 11 | Variational Bayesian Inference With Stochastic Search IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present an algorithm based on stochastic optimization that allows for direct optimization of the variational lower bound in all models. |
John Paisley; David Blei; Michael Jordan; |
2012 | 12 | Revisiting K-means: New Algorithms Via Bayesian Nonparametrics IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we revisit the k-means clustering algorithm from a Bayesian nonparametric viewpoint. |
Brian Kulis; Michael Jordan; |
2012 | 13 | Learning To Label Aerial Images From Noisy Data IF:5 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose two robust loss functions for dealing with these kinds of label noise and use the loss functions to train a deep neural network on two challenging aerial image datasets. |
Volodymyr Mnih; Geoffrey Hinton; |
2012 | 14 | Parallelizing Exploration-Exploitation Tradeoffs With Gaussian Process Bandit Optimization IF:5 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We formalize the task as a multi-armed bandit problem, where the unknown payoff function is sampled from a Gaussian process (GP), and instead of a single arm, in each round we pull a batch of several arms in parallel. |
Thomas Desautels; Andreas Krause; Joel Burdick; |
2012 | 15 | A Joint Model Of Language And Perception For Grounded Attribute Learning IF:5 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present an approach for joint learning of language and perception models for grounded attribute induction. |
Cynthia Matuszek; Nicholas FitzGerald; Luke Zettlemoyer; Liefeng Bo; Dieter Fox; |
2011 | 1 | Multimodal Deep Learning IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a novel application of deep networks to learn features over multiple modalities. |
JIQUAN NGIAM et. al. |
2011 | 2 | Domain Adaptation For Large-Scale Sentiment Classification: A Deep Learning Approach IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a deep learning approach which learns to extract a meaningful representation for each review in an unsupervised fashion. |
Xavier Glorot; Antoine Bordes; Yoshua Bengio; |
2011 | 3 | Parsing Natural Scenes And Natural Language With Recursive Neural Networks IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce a max-margin structure prediction architecture based on recursive neural networks that can successfully recover such structure both in complex scene images as well as sentences. |
Richard Socher; Cliff Chiung-Yu Lin; Andrew Ng; Chris Manning; |
2011 | 4 | Bayesian Learning Via Stochastic Gradient Langevin Dynamics IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. |
Max Welling; Yee Whye Teh; |
2011 | 5 | Generating Text With Recurrent Neural Networks IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we demonstrate the power of RNNs trained with the new Hessian-Free optimizer (HF) by applying them to character-level language modeling tasks. |
Ilya Sutskever; James Martens; Geoffrey Hinton; |
2011 | 6 | Contractive Auto-Encoders: Explicit Invariance During Feature Extraction IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present in this paper a novel approach for training deterministic auto-encoders. |
Salah Rifai; Pascal Vincent; Xavier Muller; Xavier Glorot; Yoshua Bengio; |
2011 | 7 | A Three-Way Model For Collective Learning On Multi-Relational Data IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here, we present a novel approach to relational learning based on the factorization of a three-way tensor. |
Maximilian Nickel; Volker Tresp; Hans-Peter Kriegel; |
2011 | 8 | PILCO: A Model-Based And Data-Efficient Approach To Policy Search IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. |
Marc Deisenroth; Carl Rasmussen; |
2011 | 9 | Hashing With Graphs IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel graph-based hashing method which automatically discovers the neighborhood structure inherent in the data to learn appropriate compact codes. |
Wei Liu; Jun Wang; Sanjiv Kumar; Shih-Fu Chang; |
2011 | 10 | On Optimization Methods For Deep Learning IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we show that more sophisticated off-the-shelf optimization methods such as Limited memory BFGS (L-BFGS) and Conjugate gradient (CG) with linesearch can significantly simplify and speed up the process of pretraining deep algorithms. |
QUOC LE et. al. |
2011 | 11 | Minimal Loss Hashing For Compact Binary Codes IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a method for learning similarity-preserving hash functions that map high-dimensional data onto binary codes. |
Mohammad Norouzi; David Fleet; |
2011 | 12 | The Importance Of Encoding Versus Training With Sparse Coding And Vector Quantization IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we investigate the reasons for the success of sparse coding over VQ by decoupling these phases, allowing us to separate out the contributions of the training and encoding in a controlled way. |
Adam Coates; Andrew Ng; |
2011 | 13 | A Co-training Approach For Multi-view Spectral Clustering IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a spectral clustering algorithm for the multi-view setting where we have access to multiple views of the data, each of which can be independently used for clustering. |
Abhishek Kumar; Hal Daume III; |
2011 | 14 | Learning Recurrent Neural Networks With Hessian-Free Optimization IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work we resolve the long-outstanding problem of how to effectively train recurrent neural networks (RNNs) on complex and difficult sequence modeling problems which may contain long-term data dependencies. |
James Martens; Ilya Sutskever; |
2011 | 15 | Noisy Matrix Decomposition Via Convex Relaxation: Optimal Rates In High Dimensions IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We analyze a class of estimators based on a convex relaxation for solving high-dimensional matrix decomposition problems. |
Alekh Agarwal; Sahand Negahban; Martin Wainwright; |
2010 | 1 | Rectified Linear Units Improve Restricted Boltzmann Machines IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Abstract: Restricted Boltzmann machines were developed using binary stochastic hidden units. These can be generalized by replacing each binary unit by an infinite number of copies that all … |
Vinod Nair; Geoffrey Hinton; |
2010 | 2 | 3D Convolutional Neural Networks For Human Action Recognition IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we develop a novel 3D CNN model for action recognition. |
Shuiwang Ji; Wei Xu; Ming Yang; Kai Yu; |
2010 | 3 | Robust Subspace Segmentation By Low-Rank Representation IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose low-rank representation (LRR) to segment data drawn from a union of multiple linear (or affine) subspaces. |
Guangcan Liu; Zhouchen Lin; Yong Yu; |
2010 | 4 | Gaussian Process Optimization In The Bandit Setting: No Regret And Experimental Design IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We formalize this task as a multi-armed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low RKHS norm. |
Niranjan Srinivas; Andreas Krause; Sham Kakade; Matthias Seeger; |
2010 | 5 | Learning Fast Approximations Of Sparse Coding IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We proposed two versions of a very fast algorithm that produces approximate estimates of the sparse code that can be used to compute good visual features, or to initialize exact iterative algorithms. |
Karol Gregor; Yann LeCun; |
2010 | 6 | Deep Learning Via Hessian-free Optimization IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We develop a 2nd-order optimization method based on the “Hessian-free” approach, and apply it to training deep auto-encoders. |
James Martens; |
2010 | 7 | Large Graph Construction For Scalable Semi-Supervised Learning IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we address the scalability issue plaguing graph-based semi-supervised learning via a small number of anchor points which adequately cover the entire point cloud. |
Wei Liu; Junfeng He; Shih-Fu Chang; |
2010 | 8 | Estimation Of (near) Low-rank Matrices With Noise And High-dimensional Scaling IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We study an instance of high-dimensional statistical inference in which the goal is to use $N$ noisy observations to estimate a matrix $\Theta^* \in \mathbb{R}^{k \times p}$ that is assumed to be either exactly low rank, or "near" low-rank, meaning that it can be well-approximated by a matrix with low rank. |
Sahand Negahban; Martin Wainwright; |
2010 | 9 | Tree-Guided Group Lasso For Multi-Task Regression With Structured Sparsity IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: Our goal is to recover the common set of relevant inputs for each output cluster. |
Seyoung Kim; Eric Xing; |
2010 | 10 | Bayes Optimal Multilabel Classification Via Probabilistic Classifier Chains IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: The goal of this paper is to elaborate on this postulate in a critical way. |
Krzysztof Dembczynski; Weiwei Cheng; Eyke Huellermeier; |
2010 | 11 | Proximal Methods For Sparse Hierarchical Dictionary Learning IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose to combine two approaches for modeling data admitting sparse representations: on the one hand, dictionary learning has proven effective for various signal processing tasks. |
Rodolphe Jenatton; Julien Mairal; Guillaume Obozinski; Francis Bach; |
2010 | 12 | Application Of Machine Learning To Epileptic Seizure Detection IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present and evaluate a machine learning approach to constructing patient-specific classifiers that detect the onset of an epileptic seizure through analysis of the scalp EEG, a non-invasive measure of the brain's electrical activity. |
Ali Shoeb; John Guttag; |
2010 | 13 | Sequential Projection Learning For Hashing With Compact Codes IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel data-dependent projection learning method such that each hash function is designed to correct the errors made by the previous one sequentially. |
Jun Wang; Sanjiv Kumar; Shih-Fu Chang; |
2010 | 14 | Distance Dependent Chinese Restaurant Processes IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We develop the distance dependent Chinese restaurant process (CRP), a flexible class of distributions over partitions that allows for non-exchangeability. |
David Blei; Peter Frazier; |
2010 | 15 | Metric Learning To Rank IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a general metric learning algorithm, based on the structural SVM framework, to learn a metric such that rankings of data induced by distance from a query can be optimized against various ranking measures, such as AUC, Precision-at-k, MRR, MAP or NDCG. |
Brian McFee; Gert Lanckriet; |
2009 | 1 | Curriculum Learning IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here, we formalize such training strategies in the context of machine learning, and call them "curriculum learning". |
Yoshua Bengio; Jérôme Louradour; Ronan Collobert; Jason Weston; |
2009 | 2 | Online Dictionary Learning For Sparse Coding IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a new online optimization algorithm for dictionary learning, based on stochastic approximations, which scales up gracefully to large datasets with millions of training samples. |
Julien Mairal; Francis Bach; Jean Ponce; Guillermo Sapiro; |
2009 | 3 | Group Lasso With Overlap And Graph Lasso IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a new penalty function which, when used as regularization for empirical risk minimization procedures, leads to sparse estimators. |
Laurent Jacob; Guillaume Obozinski; Jean-Philippe Vert; |
2009 | 4 | Learning Structural SVMs With Latent Variables IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a large-margin formulation and algorithm for structured output prediction that allows the use of latent variables. |
Chun-Nam John Yu; Thorsten Joachims; |
2009 | 5 | Multi-view Clustering Via Canonical Correlation Analysis IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here, we consider constructing such projections using multiple views of the data, via Canonical Correlation Analysis (CCA). |
Kamalika Chaudhuri; Sham M. Kakade; Karen Livescu; Karthik Sridharan; |
2009 | 6 | Large-scale Deep Unsupervised Learning Using Graphics Processors IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we suggest massively parallel methods to help resolve these problems. |
Rajat Raina; Anand Madhavan; Andrew Y. Ng; |
2009 | 7 | Information Theoretic Measures For Clusterings Comparison: Is A Correction For Chance Necessary? IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we discuss the necessity of correction for chance for information theoretic based measures for clusterings comparison. |
Nguyen Xuan Vinh; Julien Epps; James Bailey; |
2009 | 8 | An Accelerated Gradient Method For Trace Norm Minimization IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we exploit the special structure of the trace norm, based on which we propose an extended gradient algorithm that converges as O(1/k). |
Shuiwang Ji; Jieping Ye; |
2009 | 9 | Learning With Structured Sparsity IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper investigates a new learning formulation called structured sparsity, which is a natural extension of the standard sparsity concept in statistical learning and compressive sensing. |
Junzhou Huang; Tong Zhang; Dimitris Metaxas; |
2009 | 10 | Identifying Suspicious URLs: An Application Of Large-scale Online Learning IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper explores online learning approaches for detecting malicious Web sites (those involved in criminal scams) using lexical and host-based features of the associated URLs. |
Justin Ma; Lawrence K. Saul; Stefan Savage; Geoffrey M. Voelker; |
2009 | 11 | Fast Gradient-descent Methods For Temporal-difference Learning With Linear Function Approximation IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we introduce two new related algorithms with better convergence rates. |
Richard S. Sutton et al.
2009 | 12 | More Generality In Efficient Multiple Kernel Learning IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we observe that existing MKL formulations can be extended to learn general kernel combinations subject to general regularization. |
Manik Varma; Bodla Rakesh Babu; |
2009 | 13 | Incorporating Domain Knowledge Into Topic Modeling Via Dirichlet Forest Priors IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present its construction, and inference via collapsed Gibbs sampling. |
David Andrzejewski; Xiaojin Zhu; Mark Craven; |
2009 | 14 | Factored Conditional Restricted Boltzmann Machines For Modeling Motion Style IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a new model, based on the CRBM that preserves its most important computational properties and includes multiplicative three-way interactions that allow the effective interaction weight between two units to be modulated by the dynamic state of a third unit. |
Graham W. Taylor; Geoffrey E. Hinton; |
2009 | 15 | Supervised Learning From Multiple Experts: Whom To Trust When Everyone Lies A Bit IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We describe a probabilistic approach for supervised learning when we have multiple experts/annotators providing (possibly noisy) labels but no absolute gold standard. |
Vikas C. Raykar et al.
2008 | 1 | Extracting And Composing Robust Features With Denoising Autoencoders IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce and motivate a new training principle for unsupervised learning of a representation based on the idea of making the learned representations robust to partial corruption of the input pattern. |
Pascal Vincent; Hugo Larochelle; Yoshua Bengio; Pierre-Antoine Manzagol; |
2008 | 2 | A Unified Architecture For Natural Language Processing: Deep Neural Networks With Multitask Learning IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words and the likelihood that the sentence makes sense (grammatically and semantically) using a language model. |
Ronan Collobert; Jason Weston; |
2008 | 3 | Bayesian Probabilistic Matrix Factorization Using Markov Chain Monte Carlo IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we present a fully Bayesian treatment of the Probabilistic Matrix Factorization (PMF) model in which model capacity is controlled automatically by integrating over all model parameters and hyperparameters. |
Ruslan Salakhutdinov; Andriy Mnih; |
2008 | 4 | A Dual Coordinate Descent Method For Large-scale Linear SVM IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents a novel dual coordinate descent method for linear SVM with L1- and L2-loss functions. |
Cho-Jui Hsieh; Kai-Wei Chang; Chih-Jen Lin; S. Sathiya Keerthi; S. Sundararajan; |
2008 | 5 | Training Restricted Boltzmann Machines Using Approximations To The Likelihood Gradient IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: A new algorithm for training Restricted Boltzmann Machines is introduced. |
Tijmen Tieleman; |
2008 | 6 | Classification Using Discriminative Restricted Boltzmann Machines IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we argue that RBMs provide a self-contained framework for deriving competitive non-linear classifiers. |
Hugo Larochelle; Yoshua Bengio; |
2008 | 7 | Listwise Approach To Learning To Rank: Theory And Algorithm IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper aims to conduct a study on the listwise approach to learning to rank. |
Fen Xia; Tie-Yan Liu; Jue Wang; Wensheng Zhang; Hang Li; |
2008 | 8 | Grassmann Discriminant Analysis: A Unifying View On Subspace-based Learning IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we propose a discriminant learning framework for problems in which data consist of linear subspaces instead of vectors. |
Jihun Hamm; Daniel D. Lee; |
2008 | 9 | Deep Learning Via Semi-supervised Embedding IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We show how nonlinear embedding algorithms popular for use with shallow semi-supervised learning techniques such as kernel methods can be applied to deep multilayer architectures, either as a regularizer at the output layer, or on each layer of the architecture. |
Jason Weston; Frédéric Ratle; Ronan Collobert; |
2008 | 10 | An Empirical Evaluation Of Supervised Learning In High Dimensions IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we perform an empirical evaluation of supervised learning on high-dimensional data. |
Rich Caruana; Nikos Karampatziakis; Ainur Yessenalina; |
2008 | 11 | Learning Diverse Rankings With Multi-armed Bandits IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present two online learning algorithms that directly learn a diverse ranking of documents based on users’ clicking behavior. |
Filip Radlinski; Robert Kleinberg; Thorsten Joachims; |
2008 | 12 | On The Quantitative Analysis Of Deep Belief Networks IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We show that Annealed Importance Sampling (AIS) can be used to efficiently estimate the partition function of an RBM, and we present a novel AIS scheme for comparing RBM’s with different architectures. |
Ruslan Salakhutdinov; Iain Murray; |
2008 | 13 | Confidence-weighted Linear Classification IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce confidence-weighted linear classifiers, which add parameter confidence information to linear classifiers. |
Mark Dredze; Koby Crammer; Fernando Pereira; |
2008 | 14 | Fast Support Vector Machine Training And Classification On Graphics Processors IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We describe a solver for Support Vector Machine training running on a GPU, using the Sequential Minimal Optimization algorithm and an adaptive first and second order working set selection heuristic, which achieves speedups of 9-35x over LIBSVM running on a traditional processor. |
Bryan Catanzaro; Narayanan Sundaram; Kurt Keutzer; |
2008 | 15 | Hierarchical Sampling For Active Learning IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present an active learning scheme that exploits cluster structure in data. |
Sanjoy Dasgupta; Daniel Hsu; |
2007 | 1 | Pegasos: Primal Estimated Sub-GrAdient SOlver For SVM IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We describe and analyze a simple and effective iterative algorithm for solving the optimization problem cast by Support Vector Machines (SVM). |
Shai Shalev-Shwartz; Yoram Singer; Nathan Srebro; |
2007 | 2 | Information-theoretic Metric Learning IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present an information-theoretic approach to learning a Mahalanobis distance function. |
Jason V. Davis; Brian Kulis; Prateek Jain; Suvrit Sra; Inderjit S. Dhillon; |
2007 | 3 | Restricted Boltzmann Machines For Collaborative Filtering IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we show how a class of two-layer undirected graphical models, called Restricted Boltzmann Machines (RBM’s), can be used to model tabular data, such as user’s ratings of movies. |
Ruslan Salakhutdinov; Andriy Mnih; Geoffrey Hinton; |
2007 | 4 | Learning To Rank: From Pairwise Approach To Listwise Approach IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: The paper proposes a new probabilistic method for the approach. |
Zhe Cao; Tao Qin; Tie-Yan Liu; Ming-Feng Tsai; Hang Li; |
2007 | 5 | Self-taught Learning: Transfer Learning From Unlabeled Data IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We describe an approach to self-taught learning that uses sparse coding to construct higher-level features using the unlabeled data. |
Rajat Raina; Alexis Battle; Honglak Lee; Benjamin Packer; Andrew Y. Ng; |
2007 | 6 | Boosting For Transfer Learning IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a novel transfer learning framework called TrAdaBoost, which extends boosting-based learning algorithms (Freund & Schapire, 1997). |
Wenyuan Dai; Qiang Yang; Gui-Rong Xue; Yong Yu; |
2007 | 7 | An Empirical Evaluation Of Deep Architectures On Problems With Many Factors Of Variation IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: Recently, several learning algorithms relying on models with deep architectures have been proposed. |
Hugo Larochelle; Dumitru Erhan; Aaron Courville; James Bergstra; Yoshua Bengio; |
2007 | 8 | Spectral Feature Selection For Supervised And Unsupervised Learning IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: This work exploits intrinsic properties underlying supervised and unsupervised feature selection algorithms, and proposes a unified framework for feature selection based on spectral graph theory. |
Zheng Zhao; Huan Liu; |
2007 | 9 | Three New Graphical Models For Statistical Language Modelling IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose three new probabilistic language models that define the distribution of the next word in a sequence given several preceding words by using distributed representations of those words. |
Andriy Mnih; Geoffrey Hinton; |
2007 | 10 | Experimental Perspectives On Learning From Imbalanced Data IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We address these and other issues in this work, showing that sampling in many cases will improve classifier performance. |
Jason Van Hulse; Taghi M. Khoshgoftaar; Amri Napolitano; |
2007 | 11 | Combining Online And Offline Knowledge In UCT IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We consider three approaches for combining offline and online value functions in the UCT algorithm. |
Sylvain Gelly; David Silver; |
2007 | 12 | Spectral Clustering And Transductive Learning With Multiple Views IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We consider spectral clustering and transductive inference for data with multiple views. |
Dengyong Zhou; Christopher J. C. Burges; |
2007 | 13 | Discriminative Learning For Differing Training And Test Distributions IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We address classification problems for which the training instances are governed by a distribution that is allowed to differ arbitrarily from the test distribution—problems also referred to as classification under covariate shift. |
Steffen Bickel; Michael Brückner; Tobias Scheffer; |
2007 | 14 | Uncovering Shared Structures In Multiclass Classification IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper suggests a method for multiclass learning with many classes by simultaneously learning shared characteristics common to the classes, and predictors for the classes in terms of these characteristics. |
Yonatan Amit; Michael Fink; Nathan Srebro; Shimon Ullman; |
2007 | 15 | More Efficiency In Multiple Kernel Learning IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we address the MKL problem through an adaptive 2-norm regularization formulation. |
Alain Rakotomamonjy; Francis Bach; Stéphane Canu; Yves Grandvalet; |
2006 | 1 | The Relationship Between Precision-Recall And ROC Curves IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: A corollary is the notion of an achievable PR curve, which has properties much like the convex hull in ROC space; we show an efficient algorithm for computing this curve. |
Jesse Davis; Mark Goadrich; |
2006 | 2 | Connectionist Temporal Classification: Labelling Unsegmented Sequence Data With Recurrent Neural Networks IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems. |
Alex Graves; Santiago Fernández; Faustino Gomez; Jürgen Schmidhuber; |
2006 | 3 | Dynamic Topic Models IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: The approach is to use state space models on the natural parameters of the multinomial distributions that represent the topics. |
David M. Blei; John D. Lafferty; |
2006 | 4 | An Empirical Comparison Of Supervised Learning Algorithms IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a large-scale empirical comparison between ten supervised learning methods: SVMs, neural nets, logistic regression, naive bayes, memory-based learning, random forests, decision trees, bagged trees, boosted trees, and boosted stumps. |
Rich Caruana; Alexandru Niculescu-Mizil; |
2006 | 5 | Topic Modeling: Beyond Bag-of-words IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, I explore a hierarchical generative probabilistic model that incorporates both n-gram statistics and latent topic variables by extending a unigram topic model to include properties of a hierarchical Dirichlet bigram language model. |
Hanna M. Wallach; |
2006 | 6 | Cover Trees For Nearest Neighbor IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a tree data structure for fast nearest neighbor operations in general n-point metric spaces (where the data set consists of n points). |
Alina Beygelzimer; Sham Kakade; John Langford; |
2006 | 7 | Pachinko Allocation: DAG-structured Mixture Models Of Topic Correlations IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce the pachinko allocation model (PAM), which captures arbitrary, nested, and possibly sparse correlations between topics using a directed acyclic graph (DAG). |
Wei Li; Andrew McCallum; |
2006 | 8 | Fast Time Series Classification Using Numerosity Reduction IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose an additional technique, numerosity reduction, to speed up one-nearest-neighbor DTW. |
Xiaopeng Xi; Eamonn Keogh; Christian Shelton; Li Wei; Chotirat Ann Ratanamahatana; |
2006 | 9 | Maximum Margin Planning IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this approach, we learn mappings from features to cost so an optimal policy in an MDP with these cost mimics the expert’s behavior. |
Nathan D. Ratliff; J. Andrew Bagnell; Martin A. Zinkevich; |
2006 | 10 | Probabilistic Inference For Solving Discrete And Continuous State Markov Decision Processes IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here we present an Expectation Maximization algorithm for computing optimal policies. |
Marc Toussaint; Amos Storkey; |
2006 | 11 | Label Propagation Through Linear Neighborhoods IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: A novel semi-supervised learning approach is proposed based on a linear neighborhood model, which assumes that each data point can be linearly reconstructed from its neighborhood. |
Fei Wang; Changshui Zhang; |
2006 | 12 | Batch Mode Active Learning And Its Application To Medical Image Classification IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a framework for "batch mode active learning" that applies the Fisher information matrix to select a number of informative examples simultaneously. |
Steven C. H. Hoi; Rong Jin; Jianke Zhu; Michael R. Lyu; |
2006 | 13 | Agnostic Active Learning IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We state and analyze the first active learning algorithm which works in the presence of arbitrary forms of noise. |
Maria-Florina Balcan; Alina Beygelzimer; John Langford; |
2006 | 14 | PAC Model-free Reinforcement Learning IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: For a Markov Decision Process with finite state (size S) and action spaces (size A per state), we propose a new algorithm—Delayed Q-Learning. |
Alexander L. Strehl; Lihong Li; Eric Wiewiora; John Langford; Michael L. Littman; |
2006 | 15 | Accelerated Training Of Conditional Random Fields With Stochastic Gradient Methods IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We apply Stochastic Meta-Descent (SMD), a stochastic gradient optimization method with gain vector adaptation, to the training of Conditional Random Fields (CRFs). |
S. V. N. Vishwanathan; Nicol N. Schraudolph; Mark W. Schmidt; Kevin P. Murphy; |
2005 | 1 | Learning To Rank Using Gradient Descent IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We investigate using gradient descent methods for learning ranking functions; we propose a simple probabilistic cost function, and we introduce RankNet, an implementation of these ideas using a neural network to model the underlying ranking function. |
Chris Burges et al.
2005 | 2 | Fast Maximum Margin Matrix Factorization For Collaborative Prediction IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here, we investigate a direct gradient-based optimization method for MMMF and demonstrate it on large collaborative prediction problems. |
Jason D. M. Rennie; Nathan Srebro; |
2005 | 3 | A Support Vector Method For Multivariate Performance Measures IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents a Support Vector Method for optimizing multivariate nonlinear performance measures like the F1-score. |
Thorsten Joachims; |
2005 | 4 | Predicting Good Probabilities With Supervised Learning IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We examine the relationship between the predictions made by different learning algorithms and true posterior probabilities. |
Alexandru Niculescu-Mizil; Rich Caruana; |
2005 | 5 | Comparing Clusterings: An Axiomatic View IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Abstract: This paper views clusterings as elements of a lattice. Distances between clusterings are analyzed in their relationship to the lattice. From this vantage point, we first give an … |
Marina Meilă; |
2005 | 6 | Learning Structured Prediction Models: A Large Margin Approach IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: Our goal is to learn parameters such that inference using the model reproduces correct answers on the training data. |
Ben Taskar; Vassil Chatalbashev; Daphne Koller; Carlos Guestrin; |
2005 | 7 | Non-negative Tensor Factorization With Applications To Statistics And Computer Vision IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We derive algorithms for finding a non-negative n-dimensional tensor factorization (n-NTF) which includes the non-negative matrix factorization (NMF) as a particular case when n = 2. |
Amnon Shashua; Tamir Hazan; |
2005 | 8 | Near-optimal Sensor Placements In Gaussian Processes IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a mutual information criteria, and show that it produces better placements. |
Carlos Guestrin; Andreas Krause; Ajit Paul Singh; |
2005 | 9 | Beyond The Point Cloud: From Transductive To Semi-supervised Learning IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we show how to turn transductive and standard supervised learning algorithms into semi-supervised learners. |
Vikas Sindhwani; Partha Niyogi; Mikhail Belkin; |
2005 | 10 | Learning From Labeled And Unlabeled Data On A Directed Graph IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a general framework for learning from labeled and unlabeled data on a directed graph in which the structure of the graph including the directionality of the edges is considered. |
Dengyong Zhou; Jiayuan Huang; Bernhard Schölkopf; |
2005 | 11 | High Speed Obstacle Avoidance Using Monocular Vision And Reinforcement Learning IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present an approach in which supervised learning is first used to estimate depths from single monocular images. |
Jeff Michels; Ashutosh Saxena; Andrew Y. Ng; |
2005 | 12 | Learning Gaussian Processes From Multiple Tasks IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We consider the problem of multi-task learning, that is, learning multiple related functions. |
Kai Yu; Volker Tresp; Anton Schwaighofer; |
2005 | 13 | Reinforcement Learning With Gaussian Processes IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a new generative model for the value function, deduced from its relation with the discounted return. |
Yaakov Engel; Shie Mannor; Ron Meir; |
2005 | 14 | Bayesian Hierarchical Clustering IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a novel algorithm for agglomerative hierarchical clustering based on evaluating marginal likelihoods of a probabilistic model. |
Katherine A. Heller; Zoubin Ghahramani; |
2005 | 15 | Learning The Structure Of Markov Logic Networks IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we develop an algorithm for learning the structure of MLNs from relational databases, combining ideas from inductive logic programming (ILP) and feature induction in Markov networks. |
Stanley Kok; Pedro Domingos; |
2004 | 1 | Apprenticeship Learning Via Inverse Reinforcement Learning IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We think of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and give an algorithm for learning the task demonstrated by the expert. |
Pieter Abbeel; Andrew Y. Ng; |
2004 | 2 | A Maximum Entropy Approach To Species Distribution Modeling IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose the use of maximum-entropy techniques for this problem, specifically, sequential-update algorithms that can handle a very large number of features. |
Steven J. Phillips; Miroslav Dudík; Robert E. Schapire; |
2004 | 3 | Multiple Kernel Learning, Conic Duality, And The SMO Algorithm IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a novel dual formulation of the QCQP as a second-order cone programming problem, and show how to exploit the technique of Moreau-Yosida regularization to yield a formulation to which SMO techniques can be applied. |
Francis R. Bach; Gert R. G. Lanckriet; Michael I. Jordan; |
2004 | 4 | Support Vector Machine Learning For Interdependent And Structured Output Spaces IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose to generalize multiclass Support Vector Machine learning in a formulation that involves features extracted jointly from inputs and outputs. |
Ioannis Tsochantaridis; Thomas Hofmann; Thorsten Joachims; Yasemin Altun; |
2004 | 5 | K-means Clustering Via Principal Component Analysis IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here we prove that principal components are the continuous solutions to the discrete cluster membership indicators for K-means clustering. |
Chris Ding; Xiaofeng He; |
2004 | 6 | Solving Large Scale Linear Prediction Problems Using Stochastic Gradient Descent Algorithms IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study stochastic gradient descent (SGD) algorithms on regularized forms of linear prediction methods. |
Tong Zhang; |
2004 | 7 | Dynamic Conditional Random Fields: Factorized Probabilistic Models For Labeling And Segmenting Sequence Data IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present dynamic conditional random fields (DCRFs), a generalization of linear-chain conditional random fields (CRFs) in which each time slice contains a set of state variables and edges—a distributed state representation as in dynamic Bayesian networks (DBNs)—and parameters are tied across slices. |
Charles Sutton; Khashayar Rohanimanesh; Andrew McCallum; |
2004 | 8 | Integrating Constraints And Metric Learning In Semi-supervised Clustering IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper provides new methods for the two approaches as well as presents a new semi-supervised clustering algorithm that integrates both of these techniques in a uniform, principled framework. |
Mikhail Bilenko; Sugato Basu; Raymond J. Mooney; |
2004 | 9 | Learning And Evaluating Classifiers Under Sample Selection Bias IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we formalize the sample selection bias problem in machine learning terms and study analytically and experimentally how a number of well-known classifier learning methods are affected by it. |
Bianca Zadrozny; |
2004 | 10 | Ensemble Selection From Libraries Of Models IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a method for constructing ensembles from libraries of thousands of models. |
Rich Caruana; Alexandru Niculescu-Mizil; Geoff Crew; Alex Ksikes; |
2004 | 11 | A Kernel View Of The Dimensionality Reduction Of Manifolds IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We show how all three algorithms can be described as kernel PCA on specially constructed Gram matrices, and illustrate the similarities and differences between the algorithms with representative examples. |
Jihun Ham; Daniel D. Lee; Sebastian Mika; Bernhard Schölkopf; |
2004 | 12 | Learning A Kernel Matrix For Nonlinear Dimensionality Reduction IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We investigate how to learn a kernel matrix for high dimensional data that lies on or near a low dimensional manifold. |
Kilian Q. Weinberger; Fei Sha; Lawrence K. Saul; |
2004 | 13 | Solving Cluster Ensemble Problems By Bipartite Graph Partitioning IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce a new reduction method that constructs a bipartite graph from a given cluster ensemble. |
Xiaoli Zhang Fern; Carla E. Brodley; |
2004 | 14 | Active Learning Using Pre-clustering IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: The main contribution of the paper is a formal framework that incorporates clustering into active learning. |
Hieu T. Nguyen; Arnold Smeulders; |
2004 | 15 | Margin Based Feature Selection – Theory And Algorithms IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we introduce a margin based feature selection criterion and apply it to measure the quality of sets of features. |
Ran Gilad-Bachrach; Amir Navot; Naftali Tishby; |