Most Influential ICML Papers (2021-02)
The International Conference on Machine Learning (ICML) is one of the top machine learning conferences in the world. The Paper Digest Team analyzes all papers published at ICML in past years and presents the 10 most influential papers for each year. This ranking is constructed automatically from citations in both research papers and granted patents, and is updated frequently to reflect the most recent changes. To find the latest updates on the most influential papers from different conferences and journals, visit the Best Paper Digest page. Note: the most influential papers may or may not include the papers that won the best paper awards. (Version: 2021-02)
If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. To search for papers with highlights, related papers, patents, grants, experts, and organizations, please visit our search console. You are also welcome to follow us on Twitter and LinkedIn to get updated with new conference digests.
Paper Digest Team
team@paperdigest.org
TABLE 1: Most Influential ICML Papers (2021-02)
Year | Rank | Paper | Author(s) |
---|---|---|---|
2020 | 1 | A Simple Framework For Contrastive Learning Of Visual Representations (IF:6) Highlight: This paper presents a simple framework for contrastive representation learning. | Ting Chen; Simon Kornblith; Mohammad Norouzi; Geoffrey Hinton; |
2020 | 2 | Data-Efficient Image Recognition With Contrastive Predictive Coding (IF:5) Highlight: We therefore revisit and improve Contrastive Predictive Coding, an unsupervised objective for learning such representations. | Olivier Henaff; |
2020 | 3 | What Is Local Optimality In Nonconvex-Nonconcave Minimax Optimization? (IF:3) Highlight: The main contribution of this paper is to propose a proper mathematical definition of local optimality for this sequential setting—local minimax, as well as to present its properties and existence results. | Chi Jin; Praneeth Netrapalli; Michael Jordan; |
2020 | 4 | Skew-Fit: State-Covering Self-Supervised Reinforcement Learning (IF:3) Highlight: In this paper, we propose a formal exploration objective for goal-reaching policies that maximizes state coverage. | Vitchyr Pong et al. |
2020 | 5 | Leveraging Procedural Generation To Benchmark Reinforcement Learning (IF:3) Highlight: We introduce Procgen Benchmark, a suite of 16 procedurally generated game-like environments designed to benchmark both sample efficiency and generalization in reinforcement learning. | Karl Cobbe; Chris Hesse; Jacob Hilton; John Schulman; |
2020 | 6 | Agent57: Outperforming The Atari Human Benchmark (IF:3) Highlight: We propose Agent57, the first deep RL agent that outperforms the standard human benchmark on all 57 Atari games. | Adrià Puigdomenech Badia et al. |
2020 | 7 | SCAFFOLD: Stochastic Controlled Averaging For Federated Learning (IF:3) Highlight: As a solution, we propose a new algorithm (SCAFFOLD) which uses control variates (variance reduction) to correct for the 'client drift'. | Sai Praneeth Reddy Karimireddy et al. |
2020 | 8 | Reliable Evaluation Of Adversarial Robustness With An Ensemble Of Diverse Parameter-free Attacks (IF:3) Highlight: In this paper we first propose two extensions of the PGD attack, overcoming failures due to suboptimal step size and problems of the objective function. We then combine our novel attacks with two complementary existing ones to form a parameter-free, computationally affordable, and user-independent ensemble of attacks to test adversarial robustness. | Francesco Croce; Matthias Hein; |
2020 | 9 | Provably Efficient Exploration In Policy Optimization (IF:3) Highlight: To bridge such a gap, this paper proposes an Optimistic variant of the Proximal Policy Optimization algorithm (OPPO), which follows an "optimistic version" of the policy gradient direction. | Qi Cai; Zhuoran Yang; Chi Jin; Zhaoran Wang; |
2020 | 10 | Rigging The Lottery: Making All Tickets Winners (IF:3) Highlight: In this paper we introduce a method to train sparse neural networks with a fixed parameter count and a fixed computational cost throughout training, without sacrificing accuracy relative to existing dense-to-sparse training methods. | Utku Evci; Trevor Gale; Jacob Menick; Pablo Samuel Castro; Erich Elsen; |
2019 | 1 | EfficientNet: Rethinking Model Scaling For Convolutional Neural Networks (IF:8) Highlight: In this paper, we systematically study model scaling and identify that carefully balancing network depth, width, and resolution can lead to better performance. | Mingxing Tan; Quoc Le; |
2019 | 2 | Self-Attention Generative Adversarial Networks (IF:7) Highlight: In this paper, we propose the Self-Attention Generative Adversarial Network (SAGAN), which allows attention-driven, long-range dependency modeling for image generation tasks. | Han Zhang; Ian Goodfellow; Dimitris Metaxas; Augustus Odena; |
2019 | 3 | A Convergence Theory For Deep Learning Via Over-Parameterization (IF:6) Highlight: In this work, we prove that simple algorithms such as stochastic gradient descent (SGD) can find global minima on the training objective of DNNs in polynomial time. | Zeyuan Allen-Zhu; Yuanzhi Li; Zhao Song; |
2019 | 4 | Theoretically Principled Trade-off Between Robustness And Accuracy (IF:6) Highlight: In this work, we decompose the prediction error for adversarial examples (robust error) as the sum of the natural (classification) error and boundary error, and provide a differentiable upper bound using the theory of classification-calibrated loss, which is shown to be the tightest possible upper bound uniform over all probability distributions and measurable predictors. | Hongyang Zhang et al. |
2019 | 5 | Gradient Descent Finds Global Minima Of Deep Neural Networks (IF:6) Highlight: The current paper proves that gradient descent achieves zero training loss in polynomial time for a deep over-parameterized neural network with residual connections (ResNet). | Simon Du; Jason Lee; Haochuan Li; Liwei Wang; Xiyu Zhai; |
2019 | 6 | Certified Adversarial Robustness Via Randomized Smoothing (IF:6) Highlight: We show how to turn any classifier that classifies well under Gaussian noise into a new classifier that is certifiably robust to adversarial perturbations under the L2 norm. | Jeremy Cohen; Elan Rosenfeld; Zico Kolter; |
2019 | 7 | Challenging Common Assumptions In The Unsupervised Learning Of Disentangled Representations (IF:6) Highlight: In this paper, we provide a sober look at recent progress in the field and challenge some common assumptions. | Francesco Locatello et al. |
2019 | 8 | Simplifying Graph Convolutional Networks (IF:6) Highlight: In this paper, we reduce this excess complexity by successively removing nonlinearities and collapsing weight matrices between consecutive layers. | Felix Wu et al. |
2019 | 9 | MASS: Masked Sequence To Sequence Pre-training For Language Generation (IF:5) Highlight: Inspired by the success of BERT, we propose MAsked Sequence to Sequence pre-training (MASS) for encoder-decoder based language generation tasks. | Kaitao Song; Xu Tan; Tao Qin; Jianfeng Lu; Tie-Yan Liu; |
2019 | 10 | Quantifying Generalization In Reinforcement Learning (IF:4) Highlight: In this paper, we investigate the problem of overfitting in deep reinforcement learning. | Karl Cobbe; Oleg Klimov; Chris Hesse; Taehoon Kim; John Schulman; |
2018 | 1 | Obfuscated Gradients Give A False Sense Of Security: Circumventing Defenses To Adversarial Examples (IF:8) Highlight: We describe characteristic behaviors of defenses exhibiting the effect, and for each of the three types of obfuscated gradients we discover, we develop attack techniques to overcome it. | Anish Athalye; Nicholas Carlini; David Wagner; |
2018 | 2 | CyCADA: Cycle-Consistent Adversarial Domain Adaptation (IF:7) Highlight: We propose a model which adapts between domains using both generative image space alignment and latent representation space alignment. | Judy Hoffman et al. |
2018 | 3 | Provable Defenses Against Adversarial Examples Via The Convex Outer Adversarial Polytope (IF:7) Highlight: We propose a method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations on the training data. | Eric Wong; Zico Kolter; |
2018 | 4 | Addressing Function Approximation Error In Actor-Critic Methods (IF:7) Highlight: We evaluate our method on the suite of OpenAI Gym tasks, outperforming the state of the art in every environment tested. | Scott Fujimoto; Herke van Hoof; David Meger; |
2018 | 5 | Efficient Neural Audio Synthesis (IF:6) Highlight: With a focus on text-to-speech synthesis, we describe a set of general techniques for reducing sampling time while maintaining high output quality. | Nal Kalchbrenner et al. |
2018 | 6 | Parallel WaveNet: Fast High-Fidelity Speech Synthesis (IF:6) Highlight: This paper introduces Probability Density Distillation, a new method for training a parallel feed-forward network from a trained WaveNet with no significant difference in quality. | Aaron van den Oord et al. |
2018 | 7 | Which Training Methods For GANs Do Actually Converge? (IF:6) Highlight: In this paper, we show that the requirement of absolute continuity is necessary: we describe a simple yet prototypical counterexample showing that in the more realistic case of distributions that are not absolutely continuous, unregularized GAN training is not always convergent. | Lars Mescheder; Andreas Geiger; Sebastian Nowozin; |
2018 | 8 | Learning Representations And Generative Models For 3D Point Clouds (IF:6) Highlight: In this paper, we look at geometric data represented as point clouds. | Panos Achlioptas; Olga Diamanti; Ioannis Mitliagkas; Leonidas Guibas; |
2018 | 9 | Learning To Reweight Examples For Robust Deep Learning (IF:6) Highlight: In contrast to past reweighting methods, which typically consist of functions of the cost value of each example, in this work we propose a novel meta-learning algorithm that learns to assign weights to training examples based on their gradient directions. | Mengye Ren; Wenyuan Zeng; Bin Yang; Raquel Urtasun; |
2018 | 10 | Black-box Adversarial Attacks With Limited Queries And Information (IF:6) Highlight: We define three realistic threat models that more accurately characterize many real-world classifiers: the query-limited setting, the partial-information setting, and the label-only setting. | Andrew Ilyas; Logan Engstrom; Anish Athalye; Jessy Lin; |
2017 | 1 | Learning To Discover Cross-Domain Relations With Generative Adversarial Networks (IF:9) Highlight: We propose a method based on a generative adversarial network that learns to discover relations between different domains (DiscoGAN). | Taeksoo Kim; Moonsu Cha; Hyunsoo Kim; Jung Kwon Lee; Jiwon Kim; |
2017 | 2 | Wasserstein Generative Adversarial Networks (IF:8) Highlight: We introduce a new algorithm named WGAN, an alternative to traditional GAN training. | Martin Arjovsky; Soumith Chintala; Léon Bottou; |
2017 | 3 | Neural Message Passing For Quantum Chemistry (IF:8) Highlight: In this paper, we reformulate existing models into a single common framework we call Message Passing Neural Networks (MPNNs) and explore additional novel variations within this framework. | Justin Gilmer; Samuel S. Schoenholz; Patrick F. Riley; Oriol Vinyals; George E. Dahl; |
2017 | 4 | Model-Agnostic Meta-Learning For Fast Adaptation Of Deep Networks (IF:8) Highlight: We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning. | Chelsea Finn; Pieter Abbeel; Sergey Levine; |
2017 | 5 | Convolutional Sequence To Sequence Learning (IF:8) Highlight: We introduce an architecture based entirely on convolutional neural networks. | Jonas Gehring; Michael Auli; David Grangier; Denis Yarats; Yann N. Dauphin; |
2017 | 6 | On Calibration Of Modern Neural Networks (IF:8) Highlight: We discover that modern neural networks, unlike those from a decade ago, are poorly calibrated. | Chuan Guo; Geoff Pleiss; Yu Sun; Kilian Q. Weinberger; |
2017 | 7 | Understanding Black-box Predictions Via Influence Functions (IF:8) Highlight: In this paper, we use influence functions — a classic technique from robust statistics — to trace a model's prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a given prediction. | Pang Wei Koh; Percy Liang; |
2017 | 8 | Learning Important Features Through Propagating Activation Differences (IF:7) Highlight: Here we present DeepLIFT (Deep Learning Important FeaTures), a method for decomposing the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input. | Avanti Shrikumar; Peyton Greenside; Anshul Kundaje; |
2017 | 9 | Language Modeling With Gated Convolutional Networks (IF:7) Highlight: In this paper we develop a finite context approach through stacked convolutions, which can be more efficient since they allow parallelization over sequential tokens. | Yann N. Dauphin; Angela Fan; Michael Auli; David Grangier; |
2017 | 10 | Curiosity-driven Exploration By Self-supervised Prediction (IF:7) Highlight: We formulate curiosity as the error in an agent's ability to predict the consequence of its own actions in a visual feature space learned by a self-supervised inverse dynamics model. | Deepak Pathak; Pulkit Agrawal; Alexei A. Efros; Trevor Darrell; |
2016 | 1 | Generative Adversarial Text To Image Synthesis (IF:9) Highlight: In this work, we develop a novel deep architecture and GAN formulation to effectively bridge these advances in text and image modeling, translating visual concepts from characters to pixels. | Scott Reed et al. |
2016 | 2 | Dueling Network Architectures For Deep Reinforcement Learning (IF:9) Highlight: In this paper, we present a new neural network architecture for model-free reinforcement learning. | Ziyu Wang et al. |
2016 | 3 | Deep Speech 2: End-to-End Speech Recognition In English And Mandarin (IF:8) Highlight: We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech, two vastly different languages. | Dario Amodei et al. |
2016 | 4 | Autoencoding Beyond Pixels Using A Learned Similarity Metric (IF:8) Highlight: We present an autoencoder that leverages learned representations to better measure similarities in data space. | Anders Boesen Lindbo Larsen; Søren Kaae Sønderby; Hugo Larochelle; Ole Winther; |
2016 | 5 | Learning Convolutional Neural Networks For Graphs (IF:8) Highlight: We propose a framework for learning convolutional neural networks for arbitrary graphs. | Mathias Niepert; Mohamed Ahmed; Konstantin Kutzkov; |
2016 | 6 | Asynchronous Methods For Deep Reinforcement Learning (IF:8) Highlight: We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers. | Volodymyr Mnih et al. |
2016 | 7 | Dropout As A Bayesian Approximation: Representing Model Uncertainty In Deep Learning (IF:8) Highlight: In this paper we develop a new theoretical framework casting dropout training in deep neural networks (NNs) as approximate Bayesian inference in deep Gaussian processes. | Yarin Gal; Zoubin Ghahramani; |
2016 | 8 | Ask Me Anything: Dynamic Memory Networks For Natural Language Processing (IF:8) Highlight: We introduce the dynamic memory network (DMN), a neural network architecture which processes input sequences and questions, forms episodic memories, and generates relevant answers. | Ankit Kumar et al. |
2016 | 9 | Benchmarking Deep Reinforcement Learning For Continuous Control (IF:7) Highlight: In this work, we present a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure. | Yan Duan; Xi Chen; Rein Houthooft; John Schulman; Pieter Abbeel; |
2016 | 10 | CryptoNets: Applying Neural Networks To Encrypted Data With High Throughput And Accuracy (IF:7) Highlight: In this work, we present a method to convert learned neural networks to CryptoNets, neural networks that can be applied to encrypted data. | Ran Gilad-Bachrach et al. |
2015 | 1 | Deep Learning With Limited Numerical Precision (IF:9) Highlight: We study the effect of limited-precision data representation and computation on neural network training. | Suyog Gupta; Ankur Agrawal; Kailash Gopalakrishnan; Pritish Narayanan; |
2015 | 2 | Unsupervised Domain Adaptation By Backpropagation (IF:9) Highlight: Here, we propose a new approach to domain adaptation in deep architectures that can be trained on large amounts of labeled data from the source domain and large amounts of unlabeled data from the target domain (no labeled target-domain data is necessary). | Yaroslav Ganin; Victor Lempitsky; |
2015 | 3 | DRAW: A Recurrent Neural Network For Image Generation (IF:9) Highlight: This paper introduces the Deep Recurrent Attentive Writer (DRAW) architecture for image generation with neural networks. | Karol Gregor; Ivo Danihelka; Alex Graves; Danilo Rezende; Daan Wierstra; |
2015 | 4 | An Empirical Exploration Of Recurrent Network Architectures (IF:9) Highlight: In this work, we aim to determine whether the LSTM architecture is optimal or whether much better architectures exist. | Rafal Jozefowicz; Wojciech Zaremba; Ilya Sutskever; |
2015 | 5 | From Word Embeddings To Document Distances (IF:9) Highlight: We present the Word Mover's Distance (WMD), a novel distance function between text documents. | Matt Kusner; Yu Sun; Nicholas Kolkin; Kilian Weinberger; |
2015 | 6 | Learning Transferable Features With Deep Adaptation Networks (IF:8) Highlight: In this paper, we propose a new Deep Adaptation Network (DAN) architecture, which generalizes deep convolutional neural networks to the domain adaptation scenario. | Mingsheng Long; Yue Cao; Jianmin Wang; Michael Jordan; |
2015 | 7 | Show, Attend And Tell: Neural Image Caption Generation With Visual Attention (IF:8) Highlight: Inspired by recent work in machine translation and object detection, we introduce an attention-based model that automatically learns to describe the content of images. | Kelvin Xu et al. |
2015 | 8 | Trust Region Policy Optimization (IF:8) Highlight: In this article, we describe a method for optimizing control policies, with guaranteed monotonic improvement. | John Schulman; Sergey Levine; Pieter Abbeel; Michael Jordan; Philipp Moritz; |
2015 | 9 | Unsupervised Learning Of Video Representations Using LSTMs (IF:8) Highlight: We use Long Short-Term Memory (LSTM) networks to learn representations of video sequences. | Nitish Srivastava; Elman Mansimov; Ruslan Salakhutdinov; |
2015 | 10 | Variational Inference With Normalizing Flows (IF:8) Highlight: We introduce a new approach for specifying flexible, arbitrarily complex, and scalable approximate posterior distributions. | Danilo Rezende; Shakir Mohamed; |
2014 | 1 | DeCAF: A Deep Convolutional Activation Feature For Generic Visual Recognition (IF:9) Highlight: We evaluate whether features extracted from the activation of a deep convolutional network trained in a fully supervised fashion on a large, fixed set of object recognition tasks can be re-purposed to novel generic tasks. | Jeff Donahue et al. |
2014 | 2 | Towards End-To-End Speech Recognition With Recurrent Neural Networks (IF:9) Highlight: This paper presents a speech recognition system that directly transcribes audio data with text, without requiring an intermediate phonetic representation. | Alex Graves; Navdeep Jaitly; |
2014 | 3 | Stochastic Backpropagation And Approximate Inference In Deep Generative Models (IF:9) Highlight: Our algorithm introduces a recognition model to represent an approximate posterior distribution and uses this for optimisation of a variational lower bound. | Danilo Jimenez Rezende; Shakir Mohamed; Daan Wierstra; |
2014 | 4 | Deterministic Policy Gradient Algorithms (IF:9) Highlight: In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. | David Silver et al. |
2014 | 5 | Multimodal Neural Language Models (IF:7) Highlight: We introduce two multimodal neural language models: models of natural language that can be conditioned on other modalities. | Ryan Kiros; Ruslan Salakhutdinov; Rich Zemel; |
2014 | 6 | Recurrent Convolutional Neural Networks For Scene Labeling (IF:7) Highlight: We propose an approach that consists of a recurrent convolutional neural network which allows us to consider a large input context while limiting the capacity of the model. | Pedro Pinheiro; Ronan Collobert; |
2014 | 7 | Neural Variational Inference And Learning In Belief Networks (IF:7) Highlight: We propose a fast non-iterative approximate inference method that uses a feedforward network to implement efficient exact sampling from the variational posterior. | Andriy Mnih; Karol Gregor; |
2014 | 8 | A Clockwork RNN (IF:6) Highlight: This paper introduces a simple yet powerful modification to the simple RNN (SRN) architecture, the Clockwork RNN (CW-RNN), in which the hidden layer is partitioned into separate modules, each processing inputs at its own temporal granularity and making computations only at its prescribed clock rate. | Jan Koutnik; Klaus Greff; Faustino Gomez; Juergen Schmidhuber; |
2014 | 9 | Learning Character-level Representations For Part-of-Speech Tagging (IF:6) Highlight: In this paper, we propose a deep neural network that learns character-level representations of words and associates them with usual word representations to perform POS tagging. | Cicero Dos Santos; Bianca Zadrozny; |
2014 | 10 | Large-scale Multi-label Learning With Missing Labels (IF:6) Highlight: In this paper, we directly address both these problems by studying the multi-label problem in a generic empirical risk minimization (ERM) framework. | Hsiang-Fu Yu; Prateek Jain; Purushottam Kar; Inderjit Dhillon; |
2013 | 1 | On The Importance Of Initialization And Momentum In Deep Learning (IF:9) Highlight: In this paper, we show that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train both DNNs and RNNs (on datasets with long-term dependencies) to levels of performance that were previously achievable only with Hessian-free optimization. | Ilya Sutskever; James Martens; George Dahl; Geoffrey Hinton; |
2013 | 2 | Regularization Of Neural Networks Using DropConnect (IF:9) Highlight: We introduce DropConnect, a generalization of Dropout, for regularizing large fully connected layers within neural networks. | Li Wan; Matthew Zeiler; Sixin Zhang; Yann LeCun; Rob Fergus; |
2013 | 3 | On The Difficulty Of Training Recurrent Neural Networks (IF:8) Highlight: In this paper we attempt to improve the understanding of the underlying issues by exploring these problems from an analytical, a geometric, and a dynamical systems perspective. | Razvan Pascanu; Tomas Mikolov; Yoshua Bengio; |
2013 | 4 | Maxout Networks (IF:8) Highlight: We consider the problem of designing models to leverage a recently introduced approximate model averaging technique called dropout. | Ian Goodfellow; David Warde-Farley; Mehdi Mirza; Aaron Courville; Yoshua Bengio; |
2013 | 5 | Deep Canonical Correlation Analysis (IF:7) Highlight: We introduce Deep Canonical Correlation Analysis (DCCA), a method to learn complex nonlinear transformations of two views of data such that the resulting representations are highly linearly correlated. | Galen Andrew; Raman Arora; Jeff Bilmes; Karen Livescu; |
2013 | 6 | Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization (IF:7) Highlight: We present a new general framework for convex optimization over matrix factorizations, where every Frank-Wolfe iteration will consist of a low-rank update, and discuss the broad application areas of this approach. | Martin Jaggi; |
2013 | 7 | Deep Learning With COTS HPC Systems (IF:7) Highlight: In this paper, we present technical details and results from our own system based on Commodity Off-The-Shelf High Performance Computing (COTS HPC) technology: a cluster of GPU servers with Infiniband interconnects and MPI. | Adam Coates et al. |
2013 | 8 | Making A Science Of Model Search: Hyperparameter Optimization In Hundreds Of Dimensions For Vision Architectures (IF:7) Highlight: In this work, we propose a meta-modeling approach to support automated hyperparameter optimization, with the goal of providing practical tools that replace hand-tuning with a reproducible and unbiased optimization process. | James Bergstra; Daniel Yamins; David Cox; |
2013 | 9 | Learning Fair Representations (IF:7) Highlight: We propose a learning algorithm for fair classification that achieves both group fairness (the proportion of members in a protected group receiving positive classification is identical to the proportion in the population as a whole) and individual fairness (similar individuals should be treated similarly). | Rich Zemel; Yu Wu; Kevin Swersky; Toni Pitassi; Cynthia Dwork; |
2013 | 10 | Guided Policy Search (IF:6) Highlight: We present a guided policy search algorithm that uses trajectory optimization to direct policy learning and avoid poor local optima. | Sergey Levine; Vladlen Koltun; |
2012 | 1 | Conversational Speech Transcription Using Context-Dependent Deep Neural Networks IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Abstract: Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, combine the classic artificial-neural-network HMMs with traditional context-dependent acoustic modeling and … |
Dong Yu; Frank Seide; Gang Li; |
2012 | 2 | Marginalized Denoising Autoencoders For Domain Adaptation IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce a variation, marginalized SDA (mSDA). |
Minmin Chen; Zhixiang Xu; Kilian Weinberger; Fei Sha; |
2012 | 3 | Poisoning Attacks Against Support Vector Machines IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: As we demonstrate in this contribution, an intelligent adversary can to some extent predict the change of the SVM decision function in response to malicious input and use this ability to construct malicious data points. |
Battista Biggio; Blaine Nelson; Pavel Laskov; |
2012 | 4 | A Fast And Simple Algorithm For Training Neural Probabilistic Language Models IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a fast and simple algorithm for training NPLMs based on noise-contrastive estimation, a newly introduced procedure for estimating unnormalized continuous distributions. |
Andriy Mnih; Yee Whye Teh; |
2012 | 5 | Making Gradient Descent Optimal For Strongly Convex Stochastic Optimization IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we investigate the optimality of SGD in a stochastic setting. |
Alexander Rakhlin; Ohad Shamir; Karthik Sridharan; |
2012 | 6 | Learning Task Grouping And Overlap In Multi-task Learning IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a framework for multi-task learning that enables one to selectively share the information across the tasks. |
Abhishek Kumar; Hal Daume III; |
2012 | 7 | High Dimensional Semiparametric Gaussian Copula Graphical Models IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a semiparametric approach named nonparanormal SKEPTIC for efficiently and robustly estimating high dimensional undirected graphical models. |
Han Liu; Fang Han; Ming Yuan; John Lafferty; Larry Wasserman; |
2012 | 8 | Fast Approximation Of Matrix Coherence And Statistical Leverage IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: Interestingly, to achieve our result we judiciously apply random projections on both sides of A. |
Michael Mahoney; Petros Drineas; Malik Magdon-Ismail; David Woodruff; |
2012 | 9 | Variational Bayesian Inference With Stochastic Search IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present an algorithm based on stochastic optimization that allows for direct optimization of the variational lower bound in all models. |
John Paisley; David Blei; Michael Jordan; |
2012 | 10 | Revisiting K-means: New Algorithms Via Bayesian Nonparametrics IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we revisit the k-means clustering algorithm from a Bayesian nonparametric viewpoint. |
Brian Kulis; Michael Jordan; |
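The Bayesian nonparametric view in this paper yields a k-means-like procedure (DP-means) that opens a new cluster whenever a point is farther than a penalty λ from every existing centroid. A minimal sketch on illustrative data; the penalty `lam`, the fixed iteration count, and the toy blobs are assumptions, not the paper's settings:

```python
import numpy as np

def dp_means(X, lam, n_iter=20):
    """DP-means sketch: k-means-style updates, except a point whose
    squared distance to every centroid exceeds lam opens a new cluster.
    Emptied clusters are not handled (kept minimal)."""
    centroids = np.array([X[0]])
    for _ in range(n_iter):
        labels = np.empty(len(X), dtype=int)
        for i, x in enumerate(X):
            d2 = np.sum((centroids - x) ** 2, axis=1)
            j = int(np.argmin(d2))
            if d2[j] > lam:                      # too far: new cluster
                centroids = np.vstack([centroids, x])
                j = len(centroids) - 1
            labels[i] = j
        # re-estimate centroids as cluster means
        centroids = np.array([X[labels == j].mean(axis=0)
                              for j in range(len(centroids))])
    return centroids, labels

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)),
               rng.normal(5.0, 0.1, (50, 2))])
centroids, labels = dp_means(X, lam=1.0)   # recovers the two blobs
```

Unlike k-means, no cluster count is fixed in advance; λ alone trades off fit against the number of clusters.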
2011 | 1 | Multimodal Deep Learning IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a novel application of deep networks to learn features over multiple modalities. |
Jiquan Ngiam et al.
2011 | 2 | Parsing Natural Scenes And Natural Language With Recursive Neural Networks IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce a max-margin structure prediction architecture based on recursive neural networks that can successfully recover such structure both in complex scene images as well as sentences. |
Richard Socher; Cliff Chiung-Yu Lin; Andrew Ng; Chris Manning; |
2011 | 3 | Generating Text With Recurrent Neural Networks IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we demonstrate the power of RNNs trained with the new Hessian-Free optimizer (HF) by applying them to character-level language modeling tasks. |
Ilya Sutskever; James Martens; Geoffrey Hinton; |
2011 | 4 | Domain Adaptation For Large-Scale Sentiment Classification: A Deep Learning Approach IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a deep learning approach which learns to extract a meaningful representation for each review in an unsupervised fashion. |
Xavier Glorot; Antoine Bordes; Yoshua Bengio; |
2011 | 5 | Bayesian Learning Via Stochastic Gradient Langevin Dynamics IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. |
Max Welling; Yee Whye Teh; |
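The proposed update is stochastic gradient Langevin dynamics: a mini-batch gradient step on the log posterior plus injected Gaussian noise whose variance matches the step size. A sketch for a toy Gaussian-mean model (the data, the fixed step size, and the crude burn-in are illustrative; the paper anneals the step size):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000
data = rng.normal(2.0, 1.0, N)        # x_i ~ N(theta, 1), theta unknown

def sgld(data, n_steps=20000, batch=32, eps=1e-3, prior_var=10.0):
    """Sample theta from p(theta | data) via mini-batch gradients:
    theta += eps/2 * grad_logpost_estimate + Normal(0, eps)."""
    N = len(data)
    theta, samples = 0.0, []
    for _ in range(n_steps):
        xb = data[rng.integers(0, N, batch)]
        # prior N(0, prior_var) gradient + rescaled mini-batch likelihood
        grad = -theta / prior_var + (N / batch) * np.sum(xb - theta)
        theta += 0.5 * eps * grad + rng.normal(0.0, np.sqrt(eps))
        samples.append(theta)
    return np.array(samples[n_steps // 2:])     # discard burn-in

samples = sgld(data)   # centered near the posterior mean (about 2)
```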
2011 | 6 | A Three-Way Model For Collective Learning On Multi-Relational Data IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here, we present a novel approach to relational learning based on the factorization of a three-way tensor. |
Maximilian Nickel; Volker Tresp; Hans-Peter Kriegel; |
2011 | 7 | Contractive Auto-Encoders: Explicit Invariance During Feature Extraction IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present in this paper a novel approach for training deterministic auto-encoders. |
Salah Rifai; Pascal Vincent; Xavier Muller; Xavier Glorot; Yoshua Bengio; |
2011 | 8 | On Optimization Methods For Deep Learning IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we show that more sophisticated off-the-shelf optimization methods such as Limited memory BFGS (L-BFGS) and Conjugate gradient (CG) with linesearch can significantly simplify and speed up the process of pretraining deep algorithms. |
Quoc Le et al.
2011 | 9 | Hashing With Graphs IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel graph-based hashing method which automatically discovers the neighborhood structure inherent in the data to learn appropriate compact codes. |
Wei Liu; Jun Wang; Sanjiv Kumar; Shih-Fu Chang; |
2011 | 10 | PILCO: A Model-Based And Data-Efficient Approach To Policy Search IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. |
Marc Deisenroth; Carl Rasmussen; |
2010 | 1 | Rectified Linear Units Improve Restricted Boltzmann Machines IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Abstract: Restricted Boltzmann machines were developed using binary stochastic hidden units. These can be generalized by replacing each binary unit by an infinite number of copies that all … |
Vinod Nair; Geoffrey Hinton; |
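The construction in the abstract replaces each binary unit by copies whose biases are shifted by −0.5, −1.5, −2.5, …; the summed activation is closely approximated by the softplus log(1+eˣ), whose fast deterministic approximation is the rectified linear unit max(0, x). A numerical check of that approximation (a sketch, not the paper's code):

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
softplus = lambda x: np.log1p(np.exp(x))
relu = lambda x: np.maximum(0.0, x)

def stepped_sigmoid_sum(x, n_copies=100):
    """Summed activation probabilities of binary copies whose biases
    are shifted by -0.5, -1.5, -2.5, ..."""
    offsets = np.arange(n_copies) + 0.5
    return sigmoid(x[None, :] - offsets[:, None]).sum(axis=0)

x = np.linspace(-5.0, 5.0, 101)
gap = np.max(np.abs(stepped_sigmoid_sum(x) - softplus(x)))  # small gap
```

For x well above zero, softplus itself is within log(1+e⁻ˣ) of max(0, x), which is why the cheap ReLU works as the unit's activation.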
2010 | 2 | 3D Convolutional Neural Networks For Human Action Recognition IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we develop a novel 3D CNN model for action recognition. |
Shuiwang Ji; Wei Xu; Ming Yang; Kai Yu; |
2010 | 3 | Gaussian Process Optimization In The Bandit Setting: No Regret And Experimental Design IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We formalize this task as a multi-armed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low RKHS norm. |
Niranjan Srinivas; Andreas Krause; Sham Kakade; Matthias Seeger; |
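The resulting algorithm, GP-UCB, repeatedly queries the point maximizing μ(x) + √β_t σ(x) under the GP posterior. A sketch on a 1-D grid; the RBF length scale, the β_t schedule, and the target function are illustrative choices, not the paper's:

```python
import numpy as np

def rbf(a, b, ls=0.3):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(Xtr, ytr, Xte, noise=1e-6):
    """Posterior mean and std. dev. of a zero-mean GP with RBF kernel."""
    K = rbf(Xtr, Xtr) + noise * np.eye(len(Xtr))
    Ks = rbf(Xtr, Xte)                            # (n_train, n_test)
    mu = Ks.T @ np.linalg.solve(K, ytr)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mu, np.sqrt(np.clip(var, 0.0, None))

f = lambda x: np.sin(3.0 * x)          # unknown payoff (illustrative)
grid = np.linspace(0.0, 2.0, 200)
X, y = np.array([1.5]), f(np.array([1.5]))
for t in range(1, 16):
    mu, sd = gp_posterior(X, y, grid)
    beta = 2.0 * np.log(len(grid) * t ** 2)   # a simple beta_t schedule
    x_next = grid[np.argmax(mu + np.sqrt(beta) * sd)]  # UCB rule
    X, y = np.append(X, x_next), np.append(y, f(x_next))
```

Early rounds are dominated by the σ term (exploration); as the posterior tightens, the μ term takes over and queries concentrate near the optimum.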
2010 | 4 | Robust Subspace Segmentation By Low-Rank Representation IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose low-rank representation(LRR) to segment data drawn from a union of multiple linear (or affine) subspaces. |
Guangcan Liu; Zhouchen Lin; Yong Yu; |
2010 | 5 | Learning Fast Approximations Of Sparse Coding IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We proposed two versions of a very fast algorithm that produces approximate estimates of the sparse code that can be used to compute good visual features, or to initialize exact iterative algorithms. |
Karol Gregor; Yann LeCun; |
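The fast approximators in this paper are built by unrolling ISTA, the standard iterative shrinkage-thresholding solver for the sparse coding objective, into a trainable feed-forward network. A sketch of the baseline ISTA iteration being approximated (the dictionary, sparsity level, and `lam` are illustrative):

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(x, W, lam, n_iter=1000):
    """Minimize 0.5*||x - W z||^2 + lam*||z||_1 by proximal gradient:
    gradient step on the smooth part, then soft-thresholding."""
    L = np.linalg.norm(W, 2) ** 2     # Lipschitz constant of the gradient
    z = np.zeros(W.shape[1])
    for _ in range(n_iter):
        z = soft_threshold(z - W.T @ (W @ z - x) / L, lam / L)
    return z

rng = np.random.default_rng(0)
W = rng.normal(size=(20, 50)) / np.sqrt(20)   # overcomplete dictionary
z_true = np.zeros(50)
z_true[[3, 17, 41]] = [1.5, -2.0, 1.0]
x = W @ z_true
z = ista(x, W, lam=0.01)                      # sparse code for x
```

LISTA replaces the fixed matrices and threshold in this loop with a small number of learned layers, trading many cheap iterations for a few trained ones.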
2010 | 6 | Deep Learning Via Hessian-free Optimization IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We develop a 2nd-order optimization method based on the “Hessian-free” approach, and apply it to training deep auto-encoders. |
James Martens; |
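Hessian-free (truncated Newton) optimization never forms the Hessian: each step solves H p = −g by conjugate gradient, which needs only Hessian-vector products, and those are obtainable from gradients alone. A sketch on a small quadratic; the finite-difference product and the problem are illustrative (the paper uses exact products plus damping):

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b       # gradient of f(x) = 0.5 x'Ax - b'x

def hess_vec(x, v, eps=1e-6):
    """Hessian-vector product from two gradient evaluations only."""
    return (grad(x + eps * v) - grad(x)) / eps

def newton_cg_step(x, n_cg=10):
    """One truncated-Newton step: solve H p = -grad(x) by conjugate
    gradient, touching the Hessian only through hess_vec."""
    g = grad(x)
    p = np.zeros_like(x)
    r = -g.copy()                # residual of the linear system
    d = r.copy()                 # search direction
    for _ in range(n_cg):
        if np.linalg.norm(r) < 1e-8:
            break
        Hd = hess_vec(x, d)
        alpha = (r @ r) / (d @ Hd)
        p = p + alpha * d
        r_new = r - alpha * Hd
        d = r_new + ((r_new @ r_new) / (r @ r)) * d
        r = r_new
    return x + p

x_new = newton_cg_step(np.zeros(2))   # one step solves the quadratic
```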
2010 | 7 | Large Graph Construction For Scalable Semi-Supervised Learning IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we address the scalability issue plaguing graph-based semi-supervised learning via a small number of anchor points which adequately cover the entire point cloud. |
Wei Liu; Junfeng He; Shih-Fu Chang; |
2010 | 8 | Sequential Projection Learning For Hashing With Compact Codes IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel data-dependent projection learning method such that each hash function is designed to correct the errors made by the previous one sequentially. |
Jun Wang; Sanjiv Kumar; Shih-Fu Chang; |
2010 | 9 | Tree-Guided Group Lasso For Multi-Task Regression With Structured Sparsity IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: Our goal is to recover the common set of relevant inputs for each output cluster. |
Seyoung Kim; Eric Xing; |
2010 | 10 | Bayes Optimal Multilabel Classification Via Probabilistic Classifier Chains IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: The goal of this paper is to elaborate on this postulate in a critical way. |
Krzysztof Dembczynski; Weiwei Cheng; Eyke Huellermeier; |
2009 | 1 | Curriculum Learning IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here, we formalize such training strategies in the context of machine learning, and call them "curriculum learning". |
Yoshua Bengio; Jérôme Louradour; Ronan Collobert; Jason Weston; |
2009 | 2 | Online Dictionary Learning For Sparse Coding IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a new online optimization algorithm for dictionary learning, based on stochastic approximations, which scales up gracefully to large datasets with millions of training samples. |
Julien Mairal; Francis Bach; Jean Ponce; Guillermo Sapiro; |
2009 | 3 | Group Lasso With Overlap And Graph Lasso IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a new penalty function which, when used as regularization for empirical risk minimization procedures, leads to sparse estimators. |
Laurent Jacob; Guillaume Obozinski; Jean-Philippe Vert; |
2009 | 4 | Large-scale Deep Unsupervised Learning Using Graphics Processors IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we suggest massively parallel methods to help resolve these problems. |
Rajat Raina; Anand Madhavan; Andrew Y. Ng; |
2009 | 5 | Learning Structural SVMs With Latent Variables IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a large-margin formulation and algorithm for structured output prediction that allows the use of latent variables. |
Chun-Nam John Yu; Thorsten Joachims; |
2009 | 6 | Multi-view Clustering Via Canonical Correlation Analysis IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here, we consider constructing such projections using multiple views of the data, via Canonical Correlation Analysis (CCA). |
Kamalika Chaudhuri; Sham M. Kakade; Karen Livescu; Karthik Sridharan; |
2009 | 7 | Information Theoretic Measures For Clusterings Comparison: Is A Correction For Chance Necessary? IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we discuss the necessity of correction for chance for information theoretic based measures for clusterings comparison. |
Nguyen Xuan Vinh; Julien Epps; James Bailey; |
2009 | 8 | An Accelerated Gradient Method For Trace Norm Minimization IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we exploit the special structure of the trace norm, based on which we propose an extended gradient algorithm that converges as O(1/k). |
Shuiwang Ji; Jieping Ye; |
2009 | 9 | Supervised Learning From Multiple Experts: Whom To Trust When Everyone Lies A Bit IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We describe a probabilistic approach for supervised learning when we have multiple experts/annotators providing (possibly noisy) labels but no absolute gold standard. |
Vikas C. Raykar et al.
2009 | 10 | Identifying Suspicious URLs: An Application Of Large-scale Online Learning IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper explores online learning approaches for detecting malicious Web sites (those involved in criminal scams) using lexical and host-based features of the associated URLs. |
Justin Ma; Lawrence K. Saul; Stefan Savage; Geoffrey M. Voelker; |
2008 | 1 | A Unified Architecture For Natural Language Processing: Deep Neural Networks With Multitask Learning IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words and the likelihood that the sentence makes sense (grammatically and semantically) using a language model. |
Ronan Collobert; Jason Weston; |
2008 | 2 | Extracting And Composing Robust Features With Denoising Autoencoders IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce and motivate a new training principle for unsupervised learning of a representation based on the idea of making the learned representations robust to partial corruption of the input pattern. |
Pascal Vincent; Hugo Larochelle; Yoshua Bengio; Pierre-Antoine Manzagol; |
2008 | 3 | Bayesian Probabilistic Matrix Factorization Using Markov Chain Monte Carlo IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we present a fully Bayesian treatment of the Probabilistic Matrix Factorization (PMF) model in which model capacity is controlled automatically by integrating over all model parameters and hyperparameters. |
Ruslan Salakhutdinov; Andriy Mnih; |
2008 | 4 | Training Restricted Boltzmann Machines Using Approximations To The Likelihood Gradient IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: A new algorithm for training Restricted Boltzmann Machines is introduced. |
Tijmen Tieleman; |
2008 | 5 | A Dual Coordinate Descent Method For Large-scale Linear SVM IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents a novel dual coordinate descent method for linear SVM with L1- and L2-loss functions. |
Cho-Jui Hsieh; Kai-Wei Chang; Chih-Jen Lin; S. Sathiya Keerthi; S. Sundararajan; |
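For the hinge (L1) loss each coordinate step has a closed form: the dual variable α_i moves along the gradient of the dual objective and is clipped to [0, C], while w = Σ α_i y_i x_i is kept incrementally up to date. A sketch of that L1-loss case on illustrative data (no bias term, none of the paper's shrinking heuristics):

```python
import numpy as np

def dcd_svm(X, y, C=1.0, n_epochs=20, seed=0):
    """Dual coordinate descent for L1-loss linear SVM (sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha, w = np.zeros(n), np.zeros(d)
    Q = np.einsum('ij,ij->i', X, X)          # Q_ii = ||x_i||^2
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            G = y[i] * (w @ X[i]) - 1.0      # dual gradient at coordinate i
            a_new = np.clip(alpha[i] - G / Q[i], 0.0, C)
            w += (a_new - alpha[i]) * y[i] * X[i]   # keep w in sync
            alpha[i] = a_new
    return w

rng = np.random.default_rng(0)
y = np.repeat([1.0, -1.0], 100)
X = rng.normal(size=(200, 2)) + 2.0 * y[:, None]   # two separated blobs
w = dcd_svm(X, y)
acc = np.mean(np.sign(X @ w) == y)
```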
2008 | 6 | Classification Using Discriminative Restricted Boltzmann Machines IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we argue that RBMs provide a self-contained framework for deriving competitive non-linear classifiers. |
Hugo Larochelle; Yoshua Bengio; |
2008 | 7 | Listwise Approach To Learning To Rank: Theory And Algorithm IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper aims to conduct a study on the listwise approach to learning to rank. |
Fen Xia; Tie-Yan Liu; Jue Wang; Wensheng Zhang; Hang Li; |
2008 | 8 | Grassmann Discriminant Analysis: A Unifying View On Subspace-based Learning IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we propose a discriminant learning framework for problems in which data consist of linear subspaces instead of vectors. |
Jihun Hamm; Daniel D. Lee; |
2008 | 9 | An Empirical Evaluation Of Supervised Learning In High Dimensions IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we perform an empirical evaluation of supervised learning on high-dimensional data. |
Rich Caruana; Nikos Karampatziakis; Ainur Yessenalina; |
2008 | 10 | Deep Learning Via Semi-supervised Embedding IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We show how nonlinear embedding algorithms popular for use with shallow semi-supervised learning techniques such as kernel methods can be applied to deep multilayer architectures, either as a regularizer at the output layer, or on each layer of the architecture. |
Jason Weston; Frédéric Ratle; Ronan Collobert; |
2007 | 1 | Information-theoretic Metric Learning IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present an information-theoretic approach to learning a Mahalanobis distance function. |
Jason V. Davis; Brian Kulis; Prateek Jain; Suvrit Sra; Inderjit S. Dhillon; |
2007 | 2 | Pegasos: Primal Estimated Sub-GrAdient SOlver For SVM IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We describe and analyze a simple and effective iterative algorithm for solving the optimization problem cast by Support Vector Machines (SVM). |
Shai Shalev-Shwartz; Yoram Singer; Nathan Srebro; |
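Pegasos runs stochastic sub-gradient descent on λ/2‖w‖² plus the hinge loss, with step size 1/(λt). A sketch on illustrative data, omitting the optional projection step and mini-batching that the paper also covers:

```python
import numpy as np

def pegasos(X, y, lam=0.1, n_iter=2000, seed=0):
    """Stochastic sub-gradient SVM solver with step size 1/(lam*t)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, n_iter + 1):
        i = rng.integers(n)
        eta = 1.0 / (lam * t)
        w *= (1.0 - eta * lam)                # shrink: regularizer step
        if y[i] * (w @ X[i]) < 1.0:           # hinge sub-gradient active
            w += eta * y[i] * X[i]
    return w

rng = np.random.default_rng(0)
y = np.repeat([1.0, -1.0], 100)
X = rng.normal(size=(200, 2)) + 2.0 * y[:, None]  # two separated blobs
w = pegasos(X, y)
acc = np.mean(np.sign(X @ w) == y)
```

The 1/(λt) schedule is the point of the analysis: it gives Õ(1/ε) iterations to reach accuracy ε, with per-iteration cost independent of the training-set size.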
2007 | 3 | Restricted Boltzmann Machines For Collaborative Filtering IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we show how a class of two-layer undirected graphical models, called Restricted Boltzmann Machines (RBM’s), can be used to model tabular data, such as user’s ratings of movies. |
Ruslan Salakhutdinov; Andriy Mnih; Geoffrey Hinton; |
2007 | 4 | Learning To Rank: From Pairwise Approach To Listwise Approach IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: The paper proposes a new probabilistic method for the approach. |
Zhe Cao; Tao Qin; Tie-Yan Liu; Ming-Feng Tsai; Hang Li; |
2007 | 5 | Self-taught Learning: Transfer Learning From Unlabeled Data IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We describe an approach to self-taught learning that uses sparse coding to construct higher-level features using the unlabeled data. |
Rajat Raina; Alexis Battle; Honglak Lee; Benjamin Packer; Andrew Y. Ng; |
2007 | 6 | Boosting For Transfer Learning IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a novel transfer learning framework called TrAdaBoost, which extends boosting-based learning algorithms (Freund & Schapire, 1997). |
Wenyuan Dai; Qiang Yang; Gui-Rong Xue; Yong Yu; |
2007 | 7 | An Empirical Evaluation Of Deep Architectures On Problems With Many Factors Of Variation IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: Recently, several learning algorithms relying on models with deep architectures have been proposed. |
Hugo Larochelle; Dumitru Erhan; Aaron Courville; James Bergstra; Yoshua Bengio; |
2007 | 8 | Three New Graphical Models For Statistical Language Modelling IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose three new probabilistic language models that define the distribution of the next word in a sequence given several preceding words by using distributed representations of those words. |
Andriy Mnih; Geoffrey Hinton; |
2007 | 9 | Spectral Feature Selection For Supervised And Unsupervised Learning IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: This work exploits intrinsic properties underlying supervised and unsupervised feature selection algorithms, and proposes a unified framework for feature selection based on spectral graph theory. |
Zheng Zhao; Huan Liu; |
2007 | 10 | Combining Online And Offline Knowledge In UCT IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We consider three approaches for combining offline and online value functions in the UCT algorithm. |
Sylvain Gelly; David Silver; |
2006 | 1 | Connectionist Temporal Classification: Labelling Unsegmented Sequence Data With Recurrent Neural Networks IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems. |
Alex Graves; Santiago Fernández; Faustino Gomez; Jürgen Schmidhuber; |
2006 | 2 | An Empirical Comparison Of Supervised Learning Algorithms IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a large-scale empirical comparison between ten supervised learning methods: SVMs, neural nets, logistic regression, naive bayes, memory-based learning, random forests, decision trees, bagged trees, boosted trees, and boosted stumps. |
Rich Caruana; Alexandru Niculescu-Mizil; |
2006 | 3 | Dynamic Topic Models IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: The approach is to use state space models on the natural parameters of the multinomial distributions that represent the topics. |
David M. Blei; John D. Lafferty; |
2006 | 4 | The Relationship Between Precision-Recall And ROC Curves IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: A corollary is the notion of an achievable PR curve, which has properties much like the convex hull in ROC space; we show an efficient algorithm for computing this curve. |
Jesse Davis; Mark Goadrich; |
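Both curves come from the same confusion-matrix counts swept over a score threshold: ROC plots TPR against FPR, while PR plots precision against recall (which equals TPR), which is why the paper can relate domination in one space to domination in the other. A sketch computing both from one ranking (toy scores and labels):

```python
import numpy as np

def curves(scores, labels):
    """ROC and PR points from one ranking, sweeping the threshold."""
    order = np.argsort(-scores)
    labels = labels[order]
    tp = np.cumsum(labels)                       # true positives so far
    fp = np.cumsum(1 - labels)                   # false positives so far
    P, N = labels.sum(), (1 - labels).sum()
    fpr, tpr = fp / N, tp / P                    # ROC axes
    recall, precision = tp / P, tp / (tp + fp)   # PR axes
    return fpr, tpr, recall, precision

scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4])
labels = np.array([1, 1, 0, 1, 0, 0])
fpr, tpr, rec, prec = curves(scores, labels)
```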
2006 | 5 | Topic Modeling: Beyond Bag-of-words IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, I explore a hierarchical generative probabilistic model that incorporates both n-gram statistics and latent topic variables by extending a unigram topic model to include properties of a hierarchical Dirichlet bigram language model. |
Hanna M. Wallach; |
2006 | 6 | Cover Trees For Nearest Neighbor IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a tree data structure for fast nearest neighbor operations in general n-point metric spaces (where the data set consists of n points). |
Alina Beygelzimer; Sham Kakade; John Langford; |
2006 | 7 | Pachinko Allocation: DAG-structured Mixture Models Of Topic Correlations IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce the pachinko allocation model (PAM), which captures arbitrary, nested, and possibly sparse correlations between topics using a directed acyclic graph (DAG). |
Wei Li; Andrew McCallum; |
2006 | 8 | Fast Time Series Classification Using Numerosity Reduction IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose an additional technique, numerosity reduction, to speed up one-nearest-neighbor DTW. |
Xiaopeng Xi; Eamonn Keogh; Christian Shelton; Li Wei; Chotirat Ann Ratanamahatana; |
2006 | 9 | Maximum Margin Planning IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this approach, we learn mappings from features to costs so that an optimal policy in an MDP with these costs mimics the expert’s behavior. |
Nathan D. Ratliff; J. Andrew Bagnell; Martin A. Zinkevich; |
2006 | 10 | Probabilistic Inference For Solving Discrete And Continuous State Markov Decision Processes IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here we present an Expectation Maximization algorithm for computing optimal policies. |
Marc Toussaint; Amos Storkey; |
2005 | 1 | Learning To Rank Using Gradient Descent IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We investigate using gradient descent methods for learning ranking functions; we propose a simple probabilistic cost function, and we introduce RankNet, an implementation of these ideas using a neural network to model the underlying ranking function. |
Chris Burges et al.
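RankNet models P(i ranked above j) = σ(s_i − s_j) and trains the scoring function by cross-entropy over preference pairs. A sketch with a linear scorer standing in for the paper's neural network; the data, learning rate, and step count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                 # items with 3 features
true_w = np.array([1.0, -2.0, 0.5])
rel = X @ true_w                             # latent relevance ordering

w = np.zeros(3)
for step in range(2000):
    i, j = rng.integers(50, size=2)
    if rel[i] == rel[j]:
        continue
    if rel[i] < rel[j]:
        i, j = j, i                          # make i the preferred item
    d = X[i] - X[j]
    # pair loss = log(1 + exp(-(s_i - s_j))) with s = w.x;
    # step along its negative gradient in w:
    w += 0.1 * d / (1.0 + np.exp(w @ d))

# learned direction should align with the true relevance direction
cos = w @ true_w / (np.linalg.norm(w) * np.linalg.norm(true_w))
```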
2005 | 2 | Fast Maximum Margin Matrix Factorization For Collaborative Prediction IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here, we investigate a direct gradient-based optimization method for MMMF and demonstrate it on large collaborative prediction problems. |
Jasson D. M. Rennie; Nathan Srebro; |
2005 | 3 | A Support Vector Method For Multivariate Performance Measures IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents a Support Vector Method for optimizing multivariate nonlinear performance measures like the F1-score. |
Thorsten Joachims; |
2005 | 4 | Predicting Good Probabilities With Supervised Learning IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We examine the relationship between the predictions made by different learning algorithms and true posterior probabilities. |
Alexandru Niculescu-Mizil; Rich Caruana; |
2005 | 5 | Learning Structured Prediction Models: A Large Margin Approach IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: Our goal is to learn parameters such that inference using the model reproduces correct answers on the training data. |
Ben Taskar; Vassil Chatalbashev; Daphne Koller; Carlos Guestrin; |
2005 | 6 | Comparing Clusterings: An Axiomatic View IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Abstract: This paper views clusterings as elements of a lattice. Distances between clusterings are analyzed in their relationship to the lattice. From this vantage point, we first give an … |
Marina Meilă; |
2005 | 7 | Non-negative Tensor Factorization With Applications To Statistics And Computer Vision IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We derive algorithms for finding a non-negative n-dimensional tensor factorization (n-NTF) which includes the non-negative matrix factorization (NMF) as a particular case when n = 2. |
Amnon Shashua; Tamir Hazan; |
2005 | 8 | Learning From Labeled And Unlabeled Data On A Directed Graph IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a general framework for learning from labeled and unlabeled data on a directed graph in which the structure of the graph including the directionality of the edges is considered. |
Dengyong Zhou; Jiayuan Huang; Bernhard Schölkopf; |
2005 | 9 | Learning The Structure Of Markov Logic Networks IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we develop an algorithm for learning the structure of MLNs from relational databases, combining ideas from inductive logic programming (ILP) and feature induction in Markov networks. |
Stanley Kok; Pedro Domingos; |
2005 | 10 | Near-optimal Sensor Placements In Gaussian Processes IF:6 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a mutual information criterion, and show that it produces better placements. |
Carlos Guestrin; Andreas Krause; Ajit Paul Singh; |
2004 | 1 | Support Vector Machine Learning For Interdependent And Structured Output Spaces IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose to generalize multiclass Support Vector Machine learning in a formulation that involves features extracted jointly from inputs and outputs. |
Ioannis Tsochantaridis; Thomas Hofmann; Thorsten Joachims; Yasemin Altun; |
2004 | 2 | K-means Clustering Via Principal Component Analysis IF:9 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here we prove that principal components are the continuous solutions to the discrete cluster membership indicators for K-means clustering. |
Chris Ding; Xiaofeng He; |
2004 | 3 | Apprenticeship Learning Via Inverse Reinforcement Learning IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We think of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and give an algorithm for learning the task demonstrated by the expert. |
Pieter Abbeel; Andrew Y. Ng; |
2004 | 4 | A Maximum Entropy Approach To Species Distribution Modeling IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose the use of maximum-entropy techniques for this problem, specifically, sequential-update algorithms that can handle a very large number of features. |
Steven J. Phillips; Miroslav Dudík; Robert E. Schapire; |
2004 | 5 | Multiple Kernel Learning, Conic Duality, And The SMO Algorithm IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a novel dual formulation of the QCQP as a second-order cone programming problem, and show how to exploit the technique of Moreau-Yosida regularization to yield a formulation to which SMO techniques can be applied. |
Francis R. Bach; Gert R. G. Lanckriet; Michael I. Jordan; |
2004 | 6 | Integrating Constraints And Metric Learning In Semi-supervised Clustering IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper provides new methods for the two approaches as well as presents a new semi-supervised clustering algorithm that integrates both of these techniques in a uniform, principled framework. |
Mikhail Bilenko; Sugato Basu; Raymond J. Mooney; |
2004 | 7 | Solving Large Scale Linear Prediction Problems Using Stochastic Gradient Descent Algorithms IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study stochastic gradient descent (SGD) algorithms on regularized forms of linear prediction methods. |
Tong Zhang; |
2004 | 8 | Dynamic Conditional Random Fields: Factorized Probabilistic Models For Labeling And Segmenting Sequence Data IF:8 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present dynamic conditional random fields (DCRFs), a generalization of linear-chain conditional random fields (CRFs) in which each time slice contains a set of state variables and edges—a distributed state representation as in dynamic Bayesian networks (DBNs)—and parameters are tied across slices. |
Charles Sutton; Khashayar Rohanimanesh; Andrew McCallum; |
2004 | 9 | Ensemble Selection From Libraries Of Models IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a method for constructing ensembles from libraries of thousands of models. |
Rich Caruana; Alexandru Niculescu-Mizil; Geoff Crew; Alex Ksikes; |
2004 | 10 | Learning And Evaluating Classifiers Under Sample Selection Bias IF:7 Related Papers Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we formalize the sample selection bias problem in machine learning terms and study analytically and experimentally how a number of well-known classifier learning methods are affected by it. |
Bianca Zadrozny; |