Paper Digest: ICML 2014 Highlights

June 20, 2014 | October 6, 2019 | admin

The International Conference on Machine Learning (ICML) is one of the top machine learning conferences in the world. In 2014, it is to be held in Beijing, China.

To help the AI community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.

We thank all authors for writing these interesting papers, and readers for reading our digests. If you do not want to miss any interesting AI paper, you are welcome to sign up for our free paper digest service to get new paper updates customized to your own interests on a daily basis.

Paper Digest Team
team@paperdigest.org

TABLE 1: ICML 2014 Papers

No.	Title	Authors	Highlight
1	A Discriminative Latent Variable Model for Online Clustering	Rajhans Samdani, Kai-Wei Chang, Dan Roth	This paper presents a latent variable structured prediction model for discriminative supervised clustering of items called the Latent Left-linking Model (L3M).
2	Kernel Mean Estimation and Stein Effect	Krikamol Muandet, Kenji Fukumizu, Bharath Sriperumbudur, Arthur Gretton, Bernhard Schoelkopf	Focusing on a subset of this class, we propose efficient shrinkage estimators for the kernel mean.
3	Demystifying Information-Theoretic Clustering	Greg Ver Steeg, Aram Galstyan, Fei Sha, Simon DeDeo	We propose a novel method for clustering data which is grounded in information-theoretic principles and requires no parametric assumptions.
4	Covering Number for Efficient Heuristic-based POMDP Planning	Zongzhang Zhang, David Hsu, Wee Sun Lee	In this paper, we use the covering number to characterize the size of the search space reachable under heuristics and connect the complexity of POMDP planning to the effectiveness of heuristics.
5	The Coherent Loss Function for Classification	Wenzhuo Yang, Melvyn Sim, Huan Xu	To address the intractability, previous methods consider minimizing the cumulative loss – the sum of convex surrogates of the 0-1 loss of each sample. In this paper, we revisit this paradigm and develop instead an axiomatic framework by proposing a set of salient properties on functions for binary classification and then propose the coherent loss approach, which is a tractable upper-bound of the empirical classification error over the entire sample set.
6	Fast Stochastic Alternating Direction Method of Multipliers	Wenliang Zhong, James Kwok	We propose a new stochastic alternating direction method of multipliers (ADMM) algorithm, which incrementally approximates the full gradient in the linearized ADMM formulation.
7	Active Detection via Adaptive Submodularity	Yuxin Chen, Hiroaki Shioi, Cesar Fuentes Montesinos, Lian Pin Koh, Serge Wich, Andreas Krause	In this paper, we propose a principled approach to active object detection, and show that for a rich class of base detector algorithms, one can derive a natural sequential decision problem for deciding when to invoke expert supervision.
8	Accelerated Proximal Stochastic Dual Coordinate Ascent for Regularized Loss Minimization	Shai Shalev-Shwartz, Tong Zhang	We introduce a proximal version of the stochastic dual coordinate ascent method and show how to accelerate the method using an inner-outer iteration procedure.
9	An Adaptive Accelerated Proximal Gradient Method and its Homotopy Continuation for Sparse Optimization	Qihang Lin, Lin Xiao	This method incorporates a restarting scheme to automatically estimate the strong convexity parameter and achieves a nearly optimal iteration complexity.
10	Recurrent Convolutional Neural Networks for Scene Labeling	Pedro Pinheiro, Ronan Collobert	We propose an approach that consists of a recurrent convolutional neural network which allows us to consider a large input context while limiting the capacity of the model.
11	A Statistical Perspective on Algorithmic Leveraging	Ping Ma, Michael Mahoney, Bin Yu	Based on these theoretical results, we propose and analyze two new leveraging algorithms: one constructs a smaller least-squares problem with “shrinked” leverage scores (SLEV), and the other solves a smaller and unweighted (or biased) least-squares problem (LEVUNW).
12	Thompson Sampling for Complex Online Problems	Aditya Gopalan, Shie Mannor, Yishay Mansour	We consider stochastic multi-armed bandit problems with complex actions over a set of basic arms, where the decision maker plays a complex action rather than a basic arm in each round.
13	Boosting multi-step autoregressive forecasts	Souhaib Ben Taieb, Rob Hyndman	To address this issue, we propose a new forecasting strategy which boosts traditional recursive linear forecasts with a direct strategy using a boosting autoregression procedure at each horizon.
14	A Statistical Convergence Perspective of Algorithms for Rank Aggregation from Pairwise Data	Arun Rajkumar, Shivani Agarwal	In this paper, we consider this question in a natural setting where pairwise comparisons are drawn randomly and independently from some underlying probability distribution.
15	Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations	Timothy Mann, Shie Mannor	We show how options, a class of control structures encompassing primitive and temporally extended actions, can play a valuable role in planning in MDPs with continuous state-spaces.
16	Latent Bandits.	Odalric-Ambrym Maillard, Shie Mannor	In each setting, we introduce specific algorithms and derive non-trivial regret performance.
17	Fast Allocation of Gaussian Process Experts	Trung Nguyen, Edwin Bonilla	We propose a scalable nonparametric Bayesian regression model based on a mixture of Gaussian process (GP) experts and the inducing points formalism underpinning sparse GP approximations.
18	Von Mises-Fisher Clustering Models	Siddharth Gopal, Yiming Yang	This paper proposes a suite of models for clustering high-dimensional data on a unit sphere based on Von Mises-Fisher (vMF) distribution and for discovering more intuitive clusters than existing approaches.
19	Convergence rates for persistence diagram estimation in Topological Data Analysis	Frédéric Chazal, Marc Glisse, Catherine Labruère, Bertrand Michel	We show that the use of persistent homology can be naturally considered in general statistical frameworks.
20	Buffer k-d Trees: Processing Massive Nearest Neighbor Queries on GPUs	Fabian Gieseke, Justin Heinermann, Cosmin Oancea, Christian Igel	We present a new approach for combining k-d trees and graphics processing units for nearest neighbor search.
21	Austerity in MCMC Land: Cutting the Metropolis-Hastings Budget	Anoop Korattikara, Yutian Chen, Max Welling	We introduce an approximate MH rule based on a sequential hypothesis test that allows us to accept or reject samples with high confidence using only a fraction of the data required for the exact MH rule.
22	Understanding the Limiting Factors of Topic Modeling via Posterior Contraction Analysis	Jian Tang, Zhaoshi Meng, Xuanlong Nguyen, Qiaozhu Mei, Ming Zhang	We present theorems elucidating the posterior contraction rates of the topics as the amount of data increases, and a thorough supporting empirical study using synthetic and real data sets, including news and web-based articles and tweet messages.
23	The Inverse Regression Topic Model	Maxim Rabinovich, David Blei	In this paper, we introduce the inverse regression topic model (IRTM), a mixed-membership extension of MNIR that combines the strengths of both methodologies.
24	A Consistent Histogram Estimator for Exchangeable Graph Models	Stanley Chan, Edoardo Airoldi	In this paper, we propose a histogram estimator of a graphon that is provably consistent and numerically efficient.
25	Latent Variable Copula Inference for Bundle Pricing from Retail Transaction Data	Benjamin Letham, Wei Sun, Anshul Sheopuri	We develop a statistically consistent and computationally tractable inference procedure for fitting a copula model over correlated valuations, using only sales transaction data for the individual items.
26	Towards Minimax Online Learning with Unknown Time Horizon	Haipeng Luo, Robert Schapire	For the random horizon setting with restricted losses, we derive a fully optimal minimax algorithm.
27	Factorized Point Process Intensities: A Spatial Analysis of Professional Basketball	Andrew Miller, Luke Bornn, Ryan Adams, Kirk Goldsberry	Modeling shot attempt data as a point process, we create a low dimensional representation of offensive player types in the NBA.
28	Margins, Kernels and Non-linear Smoothed Perceptrons	Aaditya Ramdas, Javier Peña	We focus on the problem of finding a non-linear classification function that lies in a Reproducing Kernel Hilbert Space (RKHS) both from the primal point of view (finding a perfect separator when one exists) and the dual point of view (giving a certificate of non-existence), with special focus on generalizations of two classical schemes – the Perceptron (primal) and Von-Neumann (dual) algorithms.
29	Robust RegBayes: Selectively Incorporating First-Order Logic Domain Knowledge into Bayesian Models	Shike Mei, Jun Zhu, Jerry Zhu	We present a novel and more direct approach by imposing First-Order Logic (FOL) rules on the posterior distribution.
30	Learning Theory and Algorithms for revenue optimization in second price auctions with reserve	Mehryar Mohri, Andres Munoz Medina	We cast the problem of selecting the reserve price to optimize revenue as a learning problem and present a full theoretical analysis dealing with the complex properties of the corresponding loss function (it is non-convex and discontinuous).
31	Low-density Parity Constraints for Hashing-Based Discrete Integration	Stefano Ermon, Carla Gomes, Ashish Sabharwal, Bart Selman	Inspired by the success of LDPC codes, we propose the use of low-density parity constraints to make inference more tractable in practice.
32	Prediction with Limited Advice and Multiarmed Bandits with Paid Observations	Yevgeny Seldin, Peter Bartlett, Koby Crammer, Yasin Abbasi-Yadkori	We present an algorithm that achieves O(\sqrt{(N/M) T \ln N}) regret on T rounds of this game.
33	Bayesian Nonparametric Multilevel Clustering with Group-Level Contexts	Tien Vu Nguyen, Dinh Phung, Xuanlong Nguyen, Swetha Venkatesh, Hung Bui	We present a Bayesian nonparametric framework for multilevel clustering which utilizes group-level context information to simultaneously discover low-dimensional structures of the group contents and partitions groups into clusters.
34	Large-Margin Metric Learning for Constrained Partitioning Problems	Rémi Lajugie, Francis Bach, Sylvain Arlot	We aim at learning a Mahalanobis metric for these unsupervised problems, leading to feature weighting and/or selection.
35	Wasserstein Propagation for Semi-Supervised Learning	Justin Solomon, Raif Rustamov, Leonidas Guibas, Adrian Butscher	Thus, this paper introduces a technique for graph-based semi-supervised learning of histograms, derived from the theory of optimal transportation.
36	Max-Margin Infinite Hidden Markov Models	Aonan Zhang, Jun Zhu, Bo Zhang	Our paper introduces max-margin infinite HMMs (M2iHMMs), new infinite HMMs that explore the max-margin principle for discriminative learning.
37	Efficient Approximation of Cross-Validation for Kernel Methods using Bouligand Influence Function	Yong Liu, Shali Jiang, Shizhong Liao	In this paper, we present a novel strategy for approximating the cross-validation based on the Bouligand influence function (BIF), which only requires the solution of the algorithm once.
38	Generalized Exponential Concentration Inequality for Renyi Divergence Estimation	Shashank Singh, Barnabas Poczos	The main contribution of our work is to provide such a bound for an estimator of Renyi divergence for a smooth Holder class of densities on the d-dimensional unit cube.
39	Boosting with Online Binary Learners for the Multiclass Bandit Problem	Shang-Tse Chen, Hsuan-Tien Lin, Chi-Jen Lu	In this paper, we propose an approach that systematically converts existing online binary classifiers to promising bandit learners with strong theoretical guarantee.
40	Optimal Budget Allocation: Theoretical Guarantee and Efficient Algorithm	Tasuku Soma, Naonori Kakimura, Kazuhiro Inaba, Ken-ichi Kawarabayashi	We consider the budget allocation problem over bipartite influence model proposed by Alon et al.
41	Computing Parametric Ranking Models via Rank-Breaking	Hossein Azari Soufiani, David Parkes, Lirong Xia	We characterize the breakings for which the estimator is consistent for random utility models (RUMs) including Plackett-Luce and Normal-RUM, develop a general sufficient condition for a full breaking to be the only consistent breaking, and provide a trichotomy theorem in regard to single-edge breakings.
42	Tracking Adversarial Targets	Yasin Abbasi-Yadkori, Peter Bartlett, Varun Kanade	We present an efficient algorithm for this problem and show that, under standard conditions on the linear system, its regret with respect to an optimal linear policy grows as O(\log^2 T), where T is the number of rounds of the game.
43	Online Bayesian Passive-Aggressive Learning	Tianlin Shi, Jun Zhu	This paper presents online Bayesian Passive-Aggressive (BayesPA) learning, which subsumes the online PA and extends naturally to incorporate latent variables and perform nonparametric Bayesian inference, thus providing great flexibility for explorative analysis.
44	Deterministic Policy Gradient Algorithms	David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, Martin Riedmiller	In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions.
45	Modeling Correlated Arrival Events with Latent Semi-Markov Processes	Wenzhao Lian, Vinayak Rao, Brian Eriksson, Lawrence Carin	In this work, we model such data as generated by a latent collection of continuous-time binary semi-Markov processes, corresponding to external events appearing and disappearing.
46	Towards scaling up Markov chain Monte Carlo: an adaptive subsampling approach	Rémi Bardenet, Arnaud Doucet, Chris Holmes	This paper describes a methodology that aims to scale up the Metropolis-Hastings (MH) algorithm in this context.
47	Diagnosis determination: decision trees optimizing simultaneously worst and expected testing cost	Ferdinando Cicalese, Eduardo Laber, Aline Medeiros Saettler	We provide an algorithm that builds a strategy (decision tree) with both expected cost and worst cost which are at most an O(\log n) factor away from, respectively, the minimum possible expected cost and the minimum possible worst cost.
48	Condensed Filter Tree for Cost-Sensitive Multi-Label Classification	Chun-Liang Li, Hsuan-Tien Lin	In this paper, we propose a novel algorithm, called condensed filter tree (CFT), for optimizing any criteria in CSMLC.
49	On Measure Concentration of Random Maximum A-Posteriori Perturbations	Francesco Orabona, Tamir Hazan, Anand Sarwate, Tommi Jaakkola	More efficient algorithms use sequential sampling strategies based on the expected value of low dimensional MAP perturbations.
50	Bias in Natural Actor-Critic Algorithms	Philip Thomas	We show that several popular discounted reward natural actor-critics, including the popular NAC-LSTD and eNAC algorithms, do not generate unbiased estimates of the natural policy gradient as claimed.
51	Dimension-free Concentration Bounds on Hankel Matrices for Spectral Learning	François Denis, Mattias Gybels, Amaury Habrard	Spectral methods propose elegant solutions to the problem of inferring weighted automata from finite samples of variable-length strings drawn from an unknown target distribution.
52	On Modelling Non-linear Topical Dependencies	Zhixing Li, Siqiang Wen, Juanzi Li, Peng Zhang, Jie Tang	In this paper, sentences are represented as dependency trees and a Global Topic Random Field (GTRF) is presented to model the non-linear dependencies between words.
53	A Deep and Tractable Density Estimator	Benigno Uria, Iain Murray, Hugo Larochelle	In this work we introduce an efficient procedure to simultaneously train a NADE model for each possible ordering of the variables, by sharing parameters across all these models.
54	(Near) Dimension Independent Risk Bounds for Differentially Private Learning	Prateek Jain, Abhradeep Guha Thakurta	In this paper, we study the problem of differentially private risk minimization where the goal is to provide differentially private algorithms that have small excess risk.
55	Quasi-Monte Carlo Feature Maps for Shift-Invariant Kernels	Jiyan Yang, Vikas Sindhwani, Haim Avron, Michael Mahoney	In this paper, we propose to use Quasi-Monte Carlo (QMC) approximations instead where the relevant integrands are evaluated on a low-discrepancy sequence of points as opposed to random point sets as in the Monte Carlo approach.
56	Discriminative Features via Generalized Eigenvectors	Nikos Karampatziakis, Paul Mineiro	In this paper we investigate scalable techniques for inducing discriminative features by taking advantage of simple second order structure in the data.
57	Forward-Backward Greedy Algorithms for General Convex Smooth Functions over A Cardinality Constraint	Ji Liu, Jieping Ye, Ryohei Fujimaki	In this paper, we systematically analyze the theoretical properties of both algorithms.
58	Online Learning in Markov Decision Processes with Changing Cost Sequences	Travis Dick, Andras Gyorgy, Csaba Szepesvari	In this paper we consider online learning in finite Markov decision processes (MDPs) with changing cost sequences under full and bandit-information.
59	Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms	Richard Combes, Alexandre Proutiere	For discrete unimodal bandits, we derive asymptotic lower bounds for the regret achieved under any algorithm, and propose OSUB, an algorithm whose regret matches this lower bound.
60	Maximum Mean Discrepancy for Class Ratio Estimation: Convergence Bounds and Kernel Selection	Arun Iyer, Saketha Nath, Sunita Sarawagi	In this paper we investigate the use of maximum mean discrepancy (MMD) in a reproducing kernel Hilbert space (RKHS) for estimating such ratios.
61	Asymptotically consistent estimation of the number of change points in highly dependent time series	Azadeh Khaleghi, Daniil Ryabko	Based on this reduction, an algorithm is proposed that finds the number of change points and locates the changes.
62	Coordinate-descent for learning orthogonal matrices through Givens rotations	Uri Shalit, Gal Chechik	Here we propose a framework for optimizing orthogonal matrices, that is the parallel of coordinate-descent in Euclidean spaces.
63	Densifying One Permutation Hashing via Rotation for Fast Near Neighbor Search	Anshumali Shrivastava, Ping Li	In this paper, we propose a hashing technique which generates all the necessary hash evaluations needed for similarity search, using one single permutation.
64	A Divide-and-Conquer Solver for Kernel Support Vector Machines	Cho-Jui Hsieh, Si Si, Inderjit Dhillon	In this paper, we propose and analyze a novel divide-and-conquer solver for kernel SVMs (DC-SVM).
65	Nuclear Norm Minimization via Active Subspace Selection	Cho-Jui Hsieh, Peder Olsen	We describe a novel approach to optimizing matrix problems involving nuclear norm regularization and apply it to the matrix completion problem.
66	Provable Bounds for Learning Some Deep Representations	Sanjeev Arora, Aditya Bhaskara, Rong Ge, Tengyu Ma	We give algorithms with provable guarantees that learn a class of deep nets in the generative model view popularized by Hinton and others.
67	Large-scale Multi-label Learning with Missing Labels	Hsiang-Fu Yu, Prateek Jain, Purushottam Kar, Inderjit Dhillon	In this paper, we directly address both these problems by studying the multi-label problem in a generic empirical risk minimization (ERM) framework.
68	Learning Graphs with a Few Hubs	Rashish Tandon, Pradeep Ravikumar	We consider the problem of recovering the graph structure of a “hub-networked” Ising model given iid samples, under high-dimensional settings, where the number of nodes p could potentially be larger than the number of samples n. By a “hub-networked” graph, we mean a graph with a few “hub nodes” with very large degrees.
69	Agnostic Bayesian Learning of Ensembles	Alexandre Lacoste, Mario Marchand, François Laviolette, Hugo Larochelle	We propose a method for producing ensembles of predictors based on holdout estimations of their generalization performances.
70	Towards an optimal stochastic alternating direction method of multipliers	Samaneh Azadi, Suvrit Sra	This paper presents two new SADMM methods: (i) the first attains the minimax optimal rate of O(1/k) for nonsmooth strongly-convex stochastic problems; while (ii) the second progresses towards an optimal rate by exhibiting an O(1/k^2) rate for the smooth part.
71	Spherical Hamiltonian Monte Carlo for Constrained Target Distributions	Shiwei Lan, Bo Zhou, Babak Shahbaba	For such problems, we propose a novel Markov Chain Monte Carlo (MCMC) method that provides a general and computationally efficient framework for handling boundary conditions.
72	Efficient Continuous-Time Markov Chain Estimation	Monir Hajiaghayi, Bonnie Kirkpatrick, Liangliang Wang, Alexandre Bouchard-Côté	We propose a particle-based Monte Carlo approach where the holding times are marginalized analytically.
73	DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition	Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, Trevor Darrell	We evaluate whether features extracted from the activation of a deep convolutional network trained in a fully supervised fashion on a large, fixed set of object recognition tasks can be re-purposed to novel generic tasks.
74	Making the Most of Bag of Words: Sentence Regularization with Alternating Direction Method of Multipliers	Dani Yogatama, Noah Smith	We introduce a learning algorithm that exploits this intuition by encoding it in a regularizer.
75	Narrowing the Gap: Random Forests In Theory and In Practice	Misha Denil, David Matheson, Nando De Freitas	In this paper we contribute to this understanding in two ways.
76	Coherent Matrix Completion	Yudong Chen, Srinadh Bhojanapalli, Sujay Sanghavi, Rachel Ward	Here, we show that nuclear norm minimization can recover an arbitrary n \times n matrix of rank r from O(nr log^2(n)) revealed entries, provided that revealed entries are drawn proportionally to the local row and column coherences (closely related to leverage scores) of the underlying matrix.
77	Admixture of Poisson MRFs: A Topic Model with Word Dependencies	David Inouye, Pradeep Ravikumar, Inderjit Dhillon	This paper introduces a new topic model based on an admixture of Poisson Markov Random Fields (APM), which can model dependencies between words as opposed to previous independent topic models such as PLSA (Hofmann, 1999), LDA (Blei et al., 2003) or SAM (Reisinger et al., 2010).
78	True Online TD(lambda)	Harm Seijen, Rich Sutton	In this paper we introduce a new forward view that takes into account the possibility of changing estimates and a new variant of TD(lambda) that exactly achieves it.
79	Memory Efficient Kernel Approximation	Si Si, Cho-Jui Hsieh, Inderjit Dhillon	Based on this observation, we propose a new kernel approximation algorithm – Memory Efficient Kernel Approximation (MEKA), which considers both low-rank and clustering structure of the kernel matrix.
80	Learning Sum-Product Networks with Direct and Indirect Variable Interactions	Amirmohammad Rooshenas, Daniel Lowd	In this paper, we present ID-SPN, a new algorithm for learning SPN structure that unifies the two approaches.
81	Hamiltonian Monte Carlo Without Detailed Balance	Jascha Sohl-Dickstein, Mayur Mudigonda, Michael DeWeese	We present a method for performing Hamiltonian Monte Carlo that largely eliminates sample rejection.
82	Filtering with Abstract Particles	Jacob Steinhardt, Percy Liang	We present a new filtering method that addresses this issue by using “abstract particles” that each represent an entire region of the state space.
83	Stochastic Dual Coordinate Ascent with Alternating Direction Method of Multipliers	Taiji Suzuki	We propose a new stochastic dual coordinate ascent technique that can be applied to a wide range of regularized learning problems.
84	Deep Supervised and Convolutional Generative Stochastic Network for Protein Secondary Structure Prediction	Jian Zhou, Olga Troyanskaya	Here we present a new supervised generative stochastic network (GSN) based method to predict local secondary structure with deep hierarchical representations.
85	An Efficient Approach for Assessing Hyperparameter Importance	Frank Hutter, Holger Hoos, Kevin Leyton-Brown	This paper describes efficient methods that can be used to gain such insight, leveraging random forest models fit on the data already gathered by Bayesian optimization.
86	An Information Geometry of Statistical Manifold Learning	Ke Sun, Stéphane Marchand-Maillet	We develop a manifold learning theory in a hypothesis space consisting of models.
87	Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem	Masrour Zoghi, Shimon Whiteson, Remi Munos, Maarten Rijke	This paper proposes a new method for the K-armed dueling bandit problem, a variation on the regular K-armed bandit problem that offers only relative feedback about pairs of arms.
88	Compact Random Feature Maps	Raffay Hamid, Ying Xiao, Alex Gittens, Dennis Decoste	We show how structured random matrices can be used to efficiently generate CRAFTMaps, and present a single-pass algorithm using CRAFTMaps to learn non-linear multi-class classifiers.
89	Concentration in unbounded metric spaces and algorithmic stability	Aryeh Kontorovich	To this end, we introduce the notion of the subgaussian diameter, which is a distribution-dependent refinement of the metric diameter.
90	Heavy-tailed regression with a generalized median-of-means	Daniel Hsu, Sivan Sabato	This work proposes a simple and computationally efficient estimator for linear regression, and other smooth and strongly convex loss minimization problems.
91	Spectral Bandits for Smooth Graph Functions	Michal Valko, Remi Munos, Branislav Kveton, Tomáš Kocák	In this paper, we study a bandit problem where the payoffs of arms are smooth on a graph.
92	Robust Principal Component Analysis with Complex Noise	Qian Zhao, Deyu Meng, Zongben Xu, Wangmeng Zuo, Lei Zhang	We propose a generative RPCA model under the Bayesian framework by modeling data noise as a mixture of Gaussians (MoG).
93	Scalable Semidefinite Relaxation for Maximum A Posterior Estimation	Qixing Huang, Yuxin Chen, Leonidas Guibas	In this paper, we propose a novel semidefinite relaxation formulation (referred to as SDR) to estimate the MAP assignment.
94	Square Deal: Lower Bounds and Improved Relaxations for Tensor Recovery	Cun Mu, Bo Huang, John Wright, Donald Goldfarb	We introduce a simple, new convex relaxation, which partially bridges this gap.
95	Automated inference of point of view from user interactions in collective intelligence venues	Sanmay Das, Allen Lavoie	We introduce a statistical framework which classifies point of view based on user interactions.
96	Rank-One Matrix Pursuit for Matrix Completion	Zheng Wang, Ming-Jun Lai, Zhaosong Lu, Wei Fan, Hasan Davulcu, Jieping Ye	In this paper, we present an efficient and scalable algorithm for matrix completion.
97	Near-Optimal Joint Object Matching via Convex Relaxation	Yuxin Chen, Leonidas Guibas, Qixing Huang	In this paper, we propose an algorithm to jointly match multiple objects that exhibit only partial similarities, where the provided pairwise feature correspondences can be densely corrupted.
98	Convex Total Least Squares	Dmitry Malioutov, Nikolai Slavov	We describe a fast solution based on augmented Lagrangian formulation, and apply our approach to an important class of biological problems that use population average measurements to infer cell-type and physiological-state specific expression levels that are very hard to measure directly.
99	On p-norm Path Following in Multiple Kernel Learning for Non-linear Feature Selection	Pratik Jawanpuria, Manik Varma, Saketha Nath	Our objective is to develop formulations and algorithms for efficiently computing the feature selection path – i.e. the variation in classification accuracy as the fraction of selected features is varied from null to unity.
100	Gradient Hard Thresholding Pursuit for Sparsity-Constrained Optimization	Xiaotong Yuan, Ping Li, Tong Zhang	In this paper, we generalize HTP from compressed sensing to a generic problem setup of sparsity-constrained convex optimization.
101	A Unified Framework for Consistency of Regularized Loss Minimizers	Jean Honorio, Tommi Jaakkola	We characterize a family of regularized loss minimization problems that satisfy three properties: scaled uniform convergence, super-norm regularization, and norm-loss monotonicity.
102	Geodesic Distance Function Learning via Heat Flow on Vector Fields	Binbin Lin, Ji Yang, Xiaofei He, Jieping Ye	In this paper, we propose to learn the distance function directly on the manifold without embedding.
103	Near-Optimally Teaching the Crowd to Classify	Adish Singla, Ilija Bogunovic, Gabor Bartok, Amin Karbasi, Andreas Krause	We propose a natural stochastic model of the learners, modeling them as randomly switching among hypotheses based on observed feedback.
104	On the convergence of no-regret learning in selfish routing	Walid Krichene, Benjamin Drighès, Alexandre Bayen	We consider a model in which players use regret-minimizing algorithms as the learning mechanism, and study the resulting dynamics.
105	Improving offline evaluation of contextual bandit algorithms via bootstrapping techniques	Jérémie Mary, Philippe Preux, Olivier Nicol	After highlighting the limitations of the previous methods, we present a new method, based on bootstrapping techniques.
106	Scaling Up Robust MDPs using Function Approximation	Aviv Tamar, Shie Mannor, Huan Xu	In this work we employ a reinforcement learning approach to tackle this planning problem: we develop a robust approximate dynamic programming method based on a projected fixed point equation to approximately solve large scale robust MDPs.
107	Marginal Structured SVM with Hidden Variables	Wei Ping, Qiang Liu, Alex Ihler	In this work, we propose the marginal structured SVM (MSSVM) for structured prediction with hidden variables.
108	Linear and Parallel Learning of Markov Random Fields	Yariv Mizrahi, Misha Denil, Nando De Freitas	We introduce a new embarrassingly parallel parameter learning algorithm for Markov random fields which is efficient for a large class of practical models.
109	Pitfalls in the use of Parallel Inference for the Dirichlet Process	Yarin Gal, Zoubin Ghahramani	In this paper we show that the approach suggested is impractical due to an extremely unbalanced distribution of the data.
110	Optimal PAC Multiple Arm Identification with Applications to Crowdsourcing	Yuan Zhou, Xi Chen, Jian Li	We propose a new PAC algorithm, which, with probability at least 1-δ, identifies a set of K arms with regret at most ε.
111	Deep Generative Stochastic Networks Trainable by Backprop	Yoshua Bengio, Eric Laufer, Guillaume Alain, Jason Yosinski	We introduce a novel training principle for probabilistic models that is an alternative to maximum likelihood.
112	A Highly Scalable Parallel Algorithm for Isotropic Total Variation Models	Jie Wang, Qingyang Li, Sen Yang, Wei Fan, Peter Wonka, Jieping Ye	In this paper, we propose a highly scalable parallel algorithm for TV models that is based on a novel decomposition strategy of the problem domain.
113	Statistical-Computational Phase Transitions in Planted Models: The High-Dimensional Setting	Yudong Chen, Jiaming Xu	The planted models assume that a graph is generated from some unknown clusters by randomly placing edges between nodes according to their cluster memberships; the task is to recover the clusters given the graph.
114	Gaussian Process Optimization with Mutual Information	Emile Contal, Vianney Perchet, Nicolas Vayatis	In this paper, we analyze a generic algorithm scheme for sequential global optimization using Gaussian processes.
115	Aggregating Ordinal Labels from Crowds by Minimax Conditional Entropy	Dengyong Zhou, Qiang Liu, John Platt, Christopher Meek	We propose a method to aggregate noisy ordinal labels collected from a crowd of workers or annotators.
116	Exchangeable Variable Models	Mathias Niepert, Pedro Domingos	We introduce exchangeable variable models (EVMs) as a novel class of probabilistic models whose basic building blocks are partially exchangeable sequences, a generalization of exchangeable sequences.
117	Clustering in the Presence of Background Noise	Shai Ben-David, Nika Haghtalab	We propose a simple and efficient method to turn any centroid-based clustering algorithm into a noise-robust one, and prove robustness guarantees for our method with respect to these measures.
118	Safe Screening with Variational Inequalities and Its Application to Lasso	Jun Liu, Zheng Zhao, Jie Wang, Jieping Ye	In this paper, we propose an approach called “Sasvi” (Safe screening with variational inequalities).
119	Learning the Consistent Behavior of Common Users for Target Node Prediction across Social Networks	Shan-Hung Wu, Hao-Heng Chien, Kuan-Hua Lin, Philip Yu	In this paper, we propose learning the consistent behavior of common users to help the knowledge transfer.
120	Signal recovery from Pooling Representations	Joan Bruna Estrach, Arthur Szlam, Yann LeCun	We address this latter question by computing the upper and lower Lipschitz bounds of ℓ_p pooling operators for p = 1, 2, ∞, as well as their half-rectified equivalents, which give sufficient conditions for the design of invertible pooling layers.
121	PAC-inspired Option Discovery in Lifelong Reinforcement Learning	Emma Brunskill, Lihong Li	In this work, we provide the first formal analysis of the sample complexity, a measure of learning speed, of reinforcement learning with options.
122	Multi-label Classification via Feature-aware Implicit Label Space Encoding	Zijia Lin, Guiguang Ding, Mingqing Hu, Jianmin Wang	In this paper, we propose a novel method termed FaIE to perform LSDR via Feature-aware Implicit label space Encoding.
123	Scalable Gaussian Process Structured Prediction for Grid Factor Graph Applications	Sebastien Bratieres, Novi Quadrianto, Sebastian Nowozin, Zoubin Ghahramani	Here we explore a scalable approach to learning GPstruct models based on ensemble learning, with weak learners (predictors) trained on subsets of the latent variables and bootstrap data, which can easily be distributed.
124	Anomaly Ranking as Supervised Bipartite Ranking	Stephan Clémençon, Sylvain Robbiano	In this paper, it is proved that, in the case where the data generating probability distribution has compact support, anomaly ranking is equivalent to (supervised) bipartite ranking, where the goal is to discriminate between the underlying probability distribution and the uniform distribution with same support.
125	Hierarchical Quasi-Clustering Methods for Asymmetric Networks	Gunnar Carlsson, Facundo Mémoli, Alejandro Ribeiro, Santiago Segarra	This paper introduces hierarchical quasi-clustering methods, a generalization of hierarchical clustering for asymmetric networks where the output structure preserves the asymmetry of the input data.
126	Rectangular Tiling Process	Masahiro Nakano, Katsuhiko Ishiguro, Akisato Kimura, Takeshi Yamada, Naonori Ueda	In this paper, we propose a new probabilistic model of arbitrary partitioning called the rectangular tiling process (RTP).
127	Two-Stage Metric Learning	Jun Wang, Ke Sun, Fei Sha, Stéphane Marchand-Maillet, Alexandros Kalousis	In this paper, we present a novel two-stage metric learning algorithm.
128	Stochastic Inference for Scalable Probabilistic Modeling of Binary Matrices	Jose Miguel Hernandez-Lobato, Neil Houlsby, Zoubin Ghahramani	We derive an efficient stochastic inference algorithm for PMF models of fully observed binary matrices.
129	Elementary Estimators for High-Dimensional Linear Regression	Eunho Yang, Aurelie Lozano, Pradeep Ravikumar	In this paper, we attempt to address this scaling issue at the source, by asking whether one can build simpler, possibly closed-form estimators that nonetheless come with statistical guarantees comparable to those of regularized likelihood estimators.
130	Elementary Estimators for Sparse Covariance Matrices and other Structured Moments	Eunho Yang, Aurelie Lozano, Pradeep Ravikumar	We propose a class of elementary convex estimators, which in many cases are available in closed form, for estimating general structured moments.
131	Graph-based Semi-supervised Learning: Realizing Pointwise Smoothness Probabilistically	Yuan Fang, Kevin Chang, Hady Lauw	In this paper, we study two complementary dimensions of smoothness: its pointwise nature and probabilistic modeling.
132	Bayesian Max-margin Multi-Task Learning with Data Augmentation	Chengtao Li, Jun Zhu, Jianfei Chen	We present Bayesian max-margin multi-task learning, which conjoins the two schools of methods, thus allowing the discriminative max-margin methods to enjoy the great flexibility of Bayesian methods on incorporating rich prior information as well as performing nonparametric Bayesian feature learning with the latent dimensionality resolved from data.
133	Sparse Reinforcement Learning via Convex Optimization	Zhiwei Qin, Weichang Li, Firdaus Janoos	We propose two new algorithms for the sparse reinforcement learning problem based on different formulations.
134	Gaussian Process Classification and Active Learning with Multiple Annotators	Filipe Rodrigues, Francisco Pereira, Bernardete Ribeiro	In this paper, we extend GP classification in order to account for multiple annotators with different levels of expertise.
135	Structured Prediction of Network Response	Hongyu Su, Aristides Gionis, Juho Rousu	We introduce the following network response problem: given a complex network and an action, predict the subnetwork that responds to the action, that is, which nodes perform the action and which directed edges relay the action to the adjacent nodes. To solve the problem, we present an approximate inference method through a semidefinite programming (SDP) relaxation, as well as a more scalable greedy heuristic algorithm.
136	An Analysis of State-Relevance Weights and Sampling Distributions on L1-Regularized Approximate Linear Programming Approximation Accuracy	Gavin Taylor, Connor Geer, David Piekut	In this paper, we discuss and explain the effects of choices in the state-relevance weights and sampling distribution on approximation quality, using both theoretical and experimental illustrations.
137	Optimization Equivalence of Divergences Improves Neighbor Embedding	Zhirong Yang, Jaakko Peltonen, Samuel Kaski	Through the equivalences we represent several nonlinear dimensionality reduction and graph drawing methods in a generalized stochastic neighbor embedding setting, where information divergences are minimized between similarities in input and output spaces, and the optimal connection scalar provides a natural choice for the tradeoff between attractive and repulsive forces.
138	An Asynchronous Parallel Stochastic Coordinate Descent Algorithm	Ji Liu, Steve Wright, Christopher Re, Victor Bittorf, Srikrishna Sridhar	We describe an asynchronous parallel stochastic coordinate descent algorithm for minimizing smooth unconstrained or separably constrained functions.
139	Consistency of Causal Inference under the Additive Noise Model	Samory Kpotufe, Eleni Sgouritsa, Dominik Janzing, Bernhard Schölkopf	We analyze a family of methods for statistical causal inference from samples under the so-called Additive Noise Model.
140	Globally Convergent Parallel MAP LP Relaxation Solver using the Frank-Wolfe Algorithm	Alexander Schwing, Tamir Hazan, Marc Pollefeys, Raquel Urtasun	In this paper we suggest decoupling the quadratic program based on the Frank-Wolfe approach.
141	Linear Programming for Large-Scale Markov Decision Problems	Alan Malek, Yasin Abbasi-Yadkori, Peter Bartlett	We propose two techniques, one based on stochastic convex optimization, and one based on constraint sampling.
142	Linear Time Solver for Primal SVM	Feiping Nie, Yizhen Huang, Heng Huang	This paper presents a new L2-norm regularized primal SVM solver using Augmented Lagrange Multipliers, with linear-time computational cost for Lp-norm loss functions.
143	Memory (and Time) Efficient Sequential Monte Carlo	Seong-Hwan Jun, Alexandre Bouchard-Côté	Our contribution is a simple scheme that makes the memory cost of SMC methods depend on the number of distinct particles that survive resampling.
144	Scaling SVM and Least Absolute Deviations via Exact Data Reduction	Jie Wang, Peter Wonka, Jieping Ye	Motivated by this observation, we present fast and efficient screening rules to discard non-support vectors by analyzing the dual problem of SVM via variational inequalities (DVI).
145	Latent Semantic Representation Learning for Scene Classification	Xin Li, Yuhong Guo	In this work, we address this problem by proposing a novel patch-based latent variable model to integrate latent contextual representation learning and classification model training in one joint optimization framework.
146	Least Squares Revisited: Scalable Approaches for Multi-class Prediction	Alekh Agarwal, Sham Kakade, Nikos Karampatziakis, Le Song, Gregory Valiant	On the theoretical front, we present several variants with convergence guarantees.
147	Local algorithms for interactive clustering	Pranjal Awasthi, Maria Balcan, Konstantin Voevodski	We study the design of interactive clustering algorithms for data sets satisfying natural stability assumptions.
148	Model-Based Relational RL When Object Existence is Partially Observable	Ngo Ahn Vien, Marc Toussaint	We propose a computationally efficient extension of model-based relational RL methods that approximates these beliefs using discrete uncertainty predicates.
149	A new Q(lambda) with interim forward view and Monte Carlo equivalence	Rich Sutton, Ashique Rupam Mahmood, Doina Precup, Hado Hasselt	In this paper, we introduce a new version of Q(lambda) that does exactly that, without significantly increased algorithmic complexity.
150	On Robustness and Regularization of Structural Support Vector Machines	Mohamad Ali Torkamani, Daniel Lowd	In this paper, we explore the problem of learning robust models for structured prediction problems.
151	Guess-Averse Loss Functions For Cost-Sensitive Multiclass Boosting	Oscar Beijbom, Mohammad Saberian, David Kriegman, Nuno Vasconcelos	Guess-Averse Loss Functions For Cost-Sensitive Multiclass Boosting
152	Multimodal Neural Language Models	Ryan Kiros, Ruslan Salakhutdinov, Rich Zemel	We introduce two multimodal neural language models: models of natural language that can be conditioned on other modalities.
153	Fast large-scale optimization by unifying stochastic gradient and quasi-Newton methods	Jascha Sohl-Dickstein, Ben Poole, Surya Ganguli	We present an algorithm for minimizing a sum of functions that combines the computational efficiency of stochastic gradient descent (SGD) with the second order curvature information leveraged by quasi-Newton methods.
154	Alternating Minimization for Mixed Linear Regression	Xinyang Yi, Constantine Caramanis, Sujay Sanghavi	In this paper we provide a new initialization procedure for EM, based on finding the leading two eigenvectors of an appropriate matrix.
155	Stochastic Neighbor Compression	Matt Kusner, Stephen Tyree, Kilian Weinberger, Kunal Agrawal	We present Stochastic Neighbor Compression (SNC), an algorithm to compress a dataset for the purpose of k-nearest neighbor (kNN) classification.
156	Robust Learning under Uncertain Test Distributions: Relating Covariate Shift to Model Misspecification	Junfeng Wen, Chun-Nam Yu, Russell Greiner	Our empirical studies, on UCI datasets and a real-world cancer prognostic prediction dataset, show that our analysis applies, and that our RCSA works effectively.
157	Nonparametric Estimation of Multi-View Latent Variable Models	Le Song, Animashree Anandkumar, Bo Dai, Bo Xie	In this paper, we propose a kernel method for learning multi-view latent variable models, allowing each mixture component to be nonparametric and learned from data in an unsupervised fashion.
158	Structured Generative Models of Natural Source Code	Chris Maddison, Daniel Tarlow	We study the problem of building generative models of natural source code (NSC); that is, source code written and understood by humans.
159	A Single-Pass Algorithm for Efficiently Recovering Sparse Cluster Centers of High-dimensional Data	Jinfeng Yi, Lijun Zhang, Jun Wang, Rong Jin, Anil Jain	In this work, we focus on the problem of clustering high-dimensional data with sparse centers.
160	Statistical analysis of stochastic gradient methods for generalized linear models	Panagiotis Toulis, Edoardo Airoldi, Jason Rennie	We study the statistical properties of stochastic gradient descent (SGD) using explicit and implicit updates for fitting generalized linear models (GLMs).
161	Coding for Random Projections	Ping Li, Michael Mitzenmacher, Anshumali Shrivastava	In this paper, we study a number of simple coding schemes, focusing on the task of similarity estimation and on an application to training linear classifiers.
162	Fast Computation of Wasserstein Barycenters	Marco Cuturi, Arnaud Doucet	We present new algorithms to compute the mean of a set of N empirical probability measures under the optimal transport metric.
163	Global graph kernels using geometric embeddings	Fredrik Johansson, Vinay Jethava, Devdatt Dubhashi, Chiranjib Bhattacharyya	This paper presents two graph kernels defined on unlabeled graphs which capture global properties of graphs using the celebrated Lovász number and its associated orthonormal representation.
164	Topic Modeling using Topics from Many Domains, Lifelong Learning and Big Data	Zhiyuan Chen, Bing Liu	To address this problem, several knowledge-based topic models have been proposed to incorporate prior domain knowledge from the user.
165	K-means recovers ICA filters when independent components are sparse	Alon Vinnikov, Shai Shalev-Shwartz	The goal of this work is to shed light on the success of K-means with whitening for the task of unsupervised feature learning.
166	Learning Mixtures of Linear Classifiers	Yuekai Sun, Stratis Ioannidis, Andrea Montanari	We consider a discriminative learning (regression) problem, whereby the regression function is a convex combination of k linear classifiers.
167	The Falling Factorial Basis and Its Statistical Applications	Yu-Xiang Wang, Alex Smola, Ryan Tibshirani	We study a novel spline-like basis, which we name the falling factorial basis, bearing many similarities to the classic truncated power basis.
168	Nonmyopic ε-Bayes-Optimal Active Learning of Gaussian Processes	Trong Nghia Hoang, Bryan Kian Hsiang Low, Patrick Jaillet, Mohan Kankanhalli	This paper presents a novel nonmyopic ε-Bayes-optimal active learning (ε-BAL) approach that jointly and naturally optimizes the trade-off.
169	A Unifying View of Representer Theorems	Andreas Argyriou, Francesco Dinuzzo	In this paper we propose a unified view, which generalizes the concept of representer theorems and extends necessary and sufficient conditions for such theorems to hold.
170	Online Clustering of Bandits	Claudio Gentile, Shuai Li, Giovanni Zappella	We introduce a novel algorithmic approach to content recommendation based on adaptive clustering of exploration-exploitation (“bandit”) strategies.
171	Cold-start Active Learning with Robust Ordinal Matrix Factorization	Neil Houlsby, Jose Miguel Hernandez-Lobato, Zoubin Ghahramani	We present a new matrix factorization model for rating data and a corresponding active learning strategy to address the cold-start problem.
172	Multivariate Maximal Correlation Analysis	Hoang Vu Nguyen, Emmanuel Müller, Jilles Vreeken, Pavel Efros, Klemens Böhm	We propose MAC, a novel multivariate correlation measure designed for discovering multi-dimensional patterns.
173	Efficient Label Propagation	Yasuhiro Fujiwara, Go Irie	This paper proposes an efficient label propagation algorithm that guarantees exactly the same labeling results as those yielded by optimal labeling scores.
174	Estimating Diffusion Network Structures: Recovery Conditions, Sample Complexity & Soft-thresholding Algorithm	Hadi Daneshmand, Manuel Gomez-Rodriguez, Le Song, Bernhard Schoelkopf	In this paper, we investigate the network structure inference problem for a general family of continuous-time diffusion models using an l1-regularized likelihood maximization framework.
175	Coupled Group Lasso for Web-Scale CTR Prediction in Display Advertising	Ling Yan, Wu-Jun Li, Gui-Rong Xue, Dingyi Han	In this paper, we propose a novel model, called coupled group lasso (CGL), for CTR prediction in display advertising.
176	Putting MRFs on a Tensor Train	Alexander Novikov, Anton Rodomanov, Anton Osokin, Dmitry Vetrov	In this paper we present a new framework for dealing with probabilistic graphical models.
177	Efficient Algorithms for Robust One-bit Compressive Sensing	Lijun Zhang, Jinfeng Yi, Rong Jin	In this paper, we study the vector recovery problem from noisy one-bit measurements, and develop two novel algorithms with formal theoretical guarantees.
178	Learning Complex Neural Network Policies with Trajectory Optimization	Sergey Levine, Vladlen Koltun	In this work, we introduce a policy search algorithm that can directly learn high-dimensional, general-purpose policies, represented by neural networks.
179	Composite Quantization for Approximate Nearest Neighbor Search	Ting Zhang, Chao Du, Jingdong Wang	This paper presents a novel compact coding approach, composite quantization, for approximate nearest neighbor search.
180	Local Ordinal Embedding	Yoshikazu Terada, Ulrike Luxburg	We study the problem of ordinal embedding: given a set of ordinal constraints of the form distance(i,j) < distance(k,l) for some quadruples (i,j,k,l) of indices, the goal is to construct a point configuration \hat{x}_1, …, \hat{x}_n in R^p that preserves these constraints as well as possible.
181	Reducing Dueling Bandits to Cardinal Bandits	Nir Ailon, Zohar Karnin, Thorsten Joachims	We present algorithms for reducing the Dueling Bandits problem to the conventional (stochastic) Multi-Armed Bandits problem.
182	Large-margin Weakly Supervised Dimensionality Reduction	Chang Xu, Dacheng Tao, Chao Xu, Yong Rui	A novel framework is proposed that integrates two aspects of the large margin principle (angle and distance), which simultaneously encourage angle consistency between preference pairs and maximize the distance between examples in preference pairs.
183	Joint Inference of Multiple Label Types in Large Networks	Deepayan Chakrabarti, Stanislav Funiak, Jonathan Chang, Sofus Macskassy	We tackle the problem of inferring node labels in a partially labeled graph where each node in the graph has multiple label types and each label type has a large number of possible labels.
184	Hard-Margin Active Linear Regression	Elad Hazan, Zohar Karnin	We consider the fundamental problem of linear regression in which the designer can actively choose observations.
185	Maximum Margin Multiclass Nearest Neighbors	Aryeh Kontorovich, Roi Weiss	We develop a general framework for margin-based multicategory classification in metric spaces.
186	Combinatorial Partial Monitoring Game with Linear Feedback and Its Applications	Tian Lin, Bruno Abrahao, Robert Kleinberg, John Lui, Wei Chen	In this paper, we propose the model of combinatorial partial monitoring games with linear feedback, a model which simultaneously addresses limited feedback, infinite outcome space of the environment and exponentially large action space of the player.
187	Sparse meta-Gaussian information bottleneck	Melani Rey, Volker Roth, Thomas Fuchs	We present a new sparse compression technique based on the information bottleneck (IB) principle, which takes into account side information.
188	Nonparametric Estimation of Renyi Divergence and Friends	Akshay Krishnamurthy, Kirthevasan Kandasamy, Barnabas Poczos, Larry Wasserman	We consider nonparametric estimation of L_2, Renyi-α and Tsallis-α divergences between continuous distributions.
189	Robust Inverse Covariance Estimation under Noisy Measurements	Jun-Kun Wang, Shou-de Lin	This paper proposes a robust method to estimate the inverse covariance under noisy measurements.
190	Bayesian Optimization with Inequality Constraints	Jacob Gardner, Matt Kusner, Zhixiang Xu, Kilian Weinberger, John Cunningham	Here we present constrained Bayesian optimization, which places a prior distribution on both the objective and the constraint functions.
191	Circulant Binary Embedding	Felix Yu, Sanjiv Kumar, Yunchao Gong, Shih-Fu Chang	To address this problem, we propose Circulant Binary Embedding (CBE) which generates binary codes by projecting the data with a circulant matrix.
192	Multiple Testing under Dependence via Semiparametric Graphical Models	Jie Liu, Chunming Zhang, Elizabeth Burnside, David Page	We propose a novel semiparametric approach for multiple testing under dependence, which estimates f1 adaptively.
193	Making Fisher Discriminant Analysis Scalable	Bojun Tu, Zhihua Zhang, Shusen Wang, Hui Qian	In this paper we present a theoretical analysis of the approximation error of a two-stage algorithm.
194	Hierarchical Dirichlet Scaling Process	Dongwoo Kim, Alice Oh	We present the hierarchical Dirichlet scaling process (HDSP), a Bayesian nonparametric mixed membership model for multi-labeled data.
195	Approximation Analysis of Stochastic Gradient Langevin Dynamics by using Fokker-Planck Equation and Ito Process	Issei Sato, Hiroshi Nakagawa	We theoretically analyze the SGLD algorithm with constant stepsize in two ways.
196	A PAC-Bayesian bound for Lifelong Learning	Anastasia Pentina, Christoph Lampert	In this work we study lifelong learning from a theoretical perspective.
197	Communication-Efficient Distributed Optimization using an Approximate Newton-type Method	Ohad Shamir, Nati Srebro, Tong Zhang	We present a novel Newton-type method for distributed optimization, which is particularly well suited for stochastic optimization and learning problems.
198	Concept Drift Detection Through Resampling	Maayan Harel, Shie Mannor, Ran El-Yaniv, Koby Crammer	We present theoretical guarantees for the proposed procedure based on the stability of the underlying learning algorithms.
199	Anti-differentiating approximation algorithms: A case study with min-cuts, spectral, and flow	David Gleich, Michael Mahoney	We explore this concept with a case study of approximation algorithms for finding locally-biased partitions in data graphs, demonstrating connections between min-cut objectives, a personalized version of the popular PageRank vector, and the highly effective “push” procedure for computing an approximation to personalized PageRank.
200	A Bayesian Wilcoxon signed-rank test based on the Dirichlet process	Alessio Benavoli, Giorgio Corani, Francesca Mangili, Marco Zaffalon, Fabrizio Ruggeri	We propose a nonparametric Bayesian version of the Wilcoxon signed-rank test using a Dirichlet process (DP) based prior.
201	Min-Max Problems on Factor Graphs	Siamak Ravanbakhsh, Christopher Srinivasa, Brendan Frey, Russell Greiner	We study the min-max problem in factor graphs, which seeks the assignment that minimizes the maximum value over all factors.
202	Distributed Stochastic Gradient MCMC	Sungjin Ahn, Babak Shahbaba, Max Welling	Here we introduce the first fully distributed MCMC algorithm based on stochastic gradients.
203	Nearest Neighbors Using Compact Sparse Codes	Anoop Cherian	In this paper, we propose a novel scheme for approximate nearest neighbor (ANN) retrieval based on dictionary learning and sparse coding.
204	Optimal Mean Robust Principal Component Analysis	Feiping Nie, Jianjun Yuan, Heng Huang	In this paper, we propose novel robust PCA objective functions that remove the optimal mean automatically.
205	Preference-Based Rank Elicitation using Statistical Models: The Case of Mallows	Robert Busa-Fekete, Eyke Huellermeier, Balázs Szörényi	We address the problem of rank elicitation assuming that the underlying data generating process is characterized by a probability distribution on the set of all rankings (total orders) of a given set of items.
206	Hierarchical Conditional Random Fields for Outlier Detection: An Application to Detecting Epileptogenic Cortical Malformations	Bilal Ahmed, Thomas Thesen, Karen Blackmon, Yijun Zhao, Orrin Devinsky, Ruben Kuzniecky, Carla Brodley	We cast the problem of detecting and isolating regions of abnormal cortical tissue in the MRIs of epilepsy patients in an image segmentation framework.
207	A Physics-Based Model Prior for Object-Oriented MDPs	Jonathan Scholz, Martin Levihn, Charles Isbell, David Wingate	We present a physics-based approach that exploits modern simulation tools to efficiently parameterize physical dynamics.
208	Outlier Path: A Homotopy Algorithm for Robust SVM	Shinya Suzumura, Kohei Ogawa, Masashi Sugiyama, Ichiro Takeuchi	In this paper, we address these two issues simultaneously in an integrated way by introducing a novel homotopy approach to RSVM learning.
209	Ensemble-Based Tracking: Aggregating Crowdsourced Structured Time Series Data	Naiyan Wang, Dit-Yan Yeung	We propose a factorial hidden Markov model (FHMM) for ensemble-based tracking by learning jointly the unknown trajectory of the target and the reliability of each tracker in the ensemble.
210	Latent Confusion Analysis by Normalized Gamma Construction	Issei Sato, Hisashi Kashima, Hiroshi Nakagawa	We aim to summarize the workers’ confusion matrices with a small number of latent principal confusion matrices, because many personal confusion matrices are difficult to analyze.
211	Finito: A faster, permutable incremental gradient method for big data problems	Aaron Defazio, Justin Domke, Tiberio Caetano	In this work we introduce a new method in this class with a theoretical convergence rate four times faster than existing methods, for sums with sufficiently many terms.
212	Ensemble Methods for Structured Prediction	Corinna Cortes, Vitaly Kuznetsov, Mehryar Mohri	We present a series of learning algorithms and theoretical guarantees for designing accurate ensembles of structured prediction tasks.
213	Standardized Mutual Information for Clustering Comparisons: One Step Further in Adjustment for Chance	Simone Romano, James Bailey, Vinh Nguyen, Karin Verspoor	In this paper, we argue that a further type of statistical adjustment for the mutual information is also beneficial – an adjustment to correct selection bias.
214	Preserving Modes and Messages via Diverse Particle Selection	Jason Pacheco, Silvia Zuffi, Michael Black, Erik Sudderth	We develop a particle-based max-product algorithm which maintains a diverse set of posterior mode hypotheses, and is robust to initialization.
215	Nonlinear Information-Theoretic Compressive Measurement Design	Liming Wang, Abolfazl Razi, Miguel Rodrigues, Robert Calderbank, Lawrence Carin	We investigate design of general nonlinear functions for mapping high-dimensional data into a lower-dimensional (compressive) space.
216	Dual Query: Practical Private Query Release for High Dimensional Data	Marco Gaboardi, Emilio Jesus Gallego Arias, Justin Hsu, Aaron Roth, Zhiwei Steven Wu	We present a practical, differentially private algorithm for answering a large number of queries on high dimensional datasets.
217	Deep Boosting	Corinna Cortes, Mehryar Mohri, Umar Syed	We present a new ensemble learning algorithm, DeepBoost, which can use as base classifiers a hypothesis set containing deep decision trees, or members of other rich or complex families, and succeed in achieving high accuracy without overfitting the data.
218	Distributed Representations of Sentences and Documents	Quoc Le, Tomas Mikolov	In this paper, we propose an unsupervised algorithm that learns vector representations of sentences and text documents.
219	Understanding Protein Dynamics with L1-Regularized Reversible Hidden Markov Models	Robert McGibbon, Bharath Ramsundar, Mohammad Sultan, Gert Kiss, Vijay Pande	We present a machine learning framework for modeling protein dynamics.
220	Online Multi-Task Learning for Policy Gradient Methods	Haitham Bou Ammar, Eric Eaton, Paul Ruvolo, Matthew Taylor	To make agents more sample-efficient, we developed a multi-task policy gradient method to learn decision making tasks consecutively, transferring knowledge between tasks to accelerate learning.
221	Affinity Weighted Embedding	Jason Weston, Ron Weiss, Hector Yee	We propose a new class of models which aim to provide improved performance while retaining many of the benefits of the existing class of embedding models.
222	Learning the Parameters of Determinantal Point Process Kernels	Raja Hafiz Affandi, Emily Fox, Ryan Adams, Ben Taskar	Here we propose Bayesian methods for learning the DPP kernel parameters.
223	Discrete Chebyshev Classifiers	Elad Eban, Elad Mezuman, Amir Globerson	Here we present a framework for discriminative learning given a set of statistics.
224	Deep AutoRegressive Networks	Karol Gregor, Ivo Danihelka, Andriy Mnih, Charles Blundell, Daan Wierstra	We introduce a deep, generative autoencoder capable of learning hierarchies of distributed representations from data.
225	A Convergence Rate Analysis for LogitBoost, MART and Their Variant	Peng Sun, Tong Zhang, Jie Zhou	We analyze their convergence rates based on a new weak learnability formulation.
226	Inferning with High Girth Graphical Models	Uri Heinemann, Amir Globerson	Motivated by this, we propose an algorithm that always returns models of this type, and hence in the models it returns inference is approximately correct.
227	Learning Latent Variable Gaussian Graphical Models	Zhaoshi Meng, Brian Eriksson, Al Hero	In this paper, we focus on a family of latent variable Gaussian graphical models (LVGGM), where the model is conditionally sparse given latent variables, but marginally non-sparse.
228	Stochastic Backpropagation and Approximate Inference in Deep Generative Models	Danilo Jimenez Rezende, Shakir Mohamed, Daan Wierstra	Our algorithm introduces a recognition model to represent an approximate posterior distribution and uses this for optimisation of a variational lower bound.
229	One Practical Algorithm for Both Stochastic and Adversarial Bandits	Yevgeny Seldin, Aleksandrs Slivkins	We present an algorithm for multiarmed bandits that achieves almost optimal performance in both stochastic and adversarial regimes without prior knowledge about the nature of the environment.
230	Robust and Efficient Kernel Hyperparameter Paths with Guarantees	Joachim Giesen, Soeren Laue, Patrick Wieschollek	We use this algorithm to compute approximate kernel hyperparameter solution paths for support vector machines and robust kernel regression.
231	Active Transfer Learning under Model Shift	Xuezhi Wang, Tzu-Kuo Huang, Jeff Schneider	We propose two transfer learning algorithms that allow changes in all marginal and conditional distributions but assume the changes are smooth in order to achieve transfer between the tasks.
232	Approximate Policy Iteration Schemes: A Comparison	Bruno Scherrer	For all algorithms, we describe performance bounds and compare them, paying particular attention to the concentrability constants involved, the number of iterations, and the memory required.
233	Stable and Efficient Representation Learning with Nonnegativity Constraints	Tsung-Han Lin, H. T. Kung	In this work, we provide extensive analysis and experimental results to examine and validate the stability advantage of NOMP.
234	Sample Efficient Reinforcement Learning with Gaussian Processes	Robert Grande, Thomas Walsh, Jonathan How	This paper derives sample complexity results for using Gaussian Processes (GPs) in both model-based and model-free reinforcement learning (RL).
235	Memory and Computation Efficient PCA via Very Sparse Random Projections	Farhad Pourkamali Anaraki, Shannon Hughes	In this paper, we propose an approach to principal component estimation that utilizes projections onto very sparse random vectors with Bernoulli-generated nonzero entries.
236	Time-Regularized Interrupting Options (TRIO)	Timothy Mann, Daniel Mankowitz, Shie Mannor	Therefore we introduce a regularization term that favors longer duration skills.
237	Randomized Nonlinear Component Analysis	David Lopez-Paz, Suvrit Sra, Alex Smola, Zoubin Ghahramani, Bernhard Schoelkopf	In this paper we leverage randomness to design scalable new variants of nonlinear PCA and CCA; our ideas extend to key multivariate analysis tools such as spectral clustering or LDA.
238	High Order Regularization for Semi-Supervised Learning of Structured Output Problems	Yujia Li, Rich Zemel	We propose a new max-margin framework for semi-supervised structured output learning, that allows the use of powerful discrete optimization algorithms and high order regularizers defined directly on model predictions for the unlabeled examples.
239	Transductive Learning with Multi-class Volume Approximation	Gang Niu, Bo Dai, Christoffel Plessis, Masashi Sugiyama	In this paper, we propose a novel generalization to multiple classes, allowing applications of the large volume principle on more learning problems such as multi-class, multi-label and serendipitous learning in a transductive manner.
240	Methods of Moments for Learning Stochastic Languages: Unified Presentation and Empirical Comparison	Borja Balle, William Hamilton, Joelle Pineau	In this work, we provide a unified presentation and empirical comparison of three general moment-based methods in the context of modelling stochastic languages.
241	Effective Bayesian Modeling of Groups of Related Count Time Series	Nicolas Chapados	This paper introduces a hierarchical Bayesian formulation applicable to count time series that can easily account for explanatory variables and share statistical strength across groups of related time series.
242	Variational Inference for Sequential Distance Dependent Chinese Restaurant Process	Sergey Bartunov, Dmitry Vetrov	In this paper we propose a novel variational inference method for the important sequential case of ddCRP (seqddCRP) by revealing its connection with the Laplacian of the random graph constructed by the process.
243	Discovering Latent Network Structure in Point Process Data	Scott Linderman, Ryan Adams	To enable analysis of these implicit networks, we develop a probabilistic model that combines mutually-exciting point processes with random graph models.
244	A Kernel Independence Test for Random Processes	Kacper Chwialkowski, Arthur Gretton	A non-parametric approach to the problem of testing the independence of two random processes is developed.
245	Learning to Disentangle Factors of Variation with Manifold Interaction	Scott Reed, Kihyuk Sohn, Yuting Zhang, Honglak Lee	We propose to learn manifold coordinates for the relevant factors of variation and to model their joint interaction.
246	Learning Modular Structures from Network Data and Node Variables	Elham Azizi, Edoardo Airoldi, James Galagan	Here, we propose an extended model that leverages direct observations about the network in addition to node-specific variables.
247	Probabilistic Partial Canonical Correlation Analysis	Yusuke Mukuta, Harada	In this paper, we have addressed these problems by proposing a probabilistic interpretation of partial CCA and deriving a Bayesian estimation method based on the probabilistic model.
248	Skip Context Tree Switching	Marc Bellemare, Joel Veness, Erik Talvitie	In this paper we show how to generalize this technique to the class of K-skip prediction suffix trees.
249	Lower Bounds for the Gibbs Sampler over Mixtures of Gaussians	Christopher Tosh, Sanjoy Dasgupta	In this paper, we present lower bounds for the mixing time of the Gibbs sampler over Gaussian mixture models with Dirichlet priors.
250	Marginalized Denoising Auto-encoders for Nonlinear Representations	Minmin Chen, Kilian Weinberger, Fei Sha, Yoshua Bengio	In this paper we present the marginalized Denoising Auto-encoder (mDAE), which (approximately) marginalizes out the corruption during training.
251	Gaussian Processes for Bayesian Estimation in Ordinary Differential Equations	David Barber, Yali Wang	We propose a Gaussian process model that directly links state derivative information with system observations, simplifying previous approaches and providing a natural generative model.
252	Fast Multi-stage Submodular Maximization	Kai Wei, Rishabh Iyer, Jeff Bilmes	We introduce a new multi-stage algorithmic framework for submodular maximization.
253	Programming by Feedback	Marc Schoenauer, Riad Akrour, Michele Sebag, Jean-Christophe Souplet	This paper advocates a new ML-based programming framework, called Programming by Feedback (PF), which involves a sequence of interactions between the active computer and the user.
254	Probabilistic Matrix Factorization with Non-random Missing Data	Jose Miguel Hernandez-Lobato, Neil Houlsby, Zoubin Ghahramani	We propose a probabilistic matrix factorization model for collaborative filtering that learns from data that is missing not at random (MNAR).
255	Pursuit-Evasion Without Regret, with an Application to Trading	Lili Dworkin, Michael Kearns, Yuriy Nevmyvaka	We propose a state-based variant of the classical online learning problem of tracking the best expert.
256	The f-Adjusted Graph Laplacian: a Diagonal Modification with a Geometric Interpretation	Sven Kurras, Ulrike Luxburg, Gilles Blanchard	Our goal is to re-weight the graph’s edges such that all cuts and volumes behave as if the graph was built on a different sample drawn from an alternative density q.
257	Riemannian Pursuit for Big Matrix Recovery	Mingkui Tan, Ivor W. Tsang, Li Wang, Bart Vandereycken, Sinno Jialin Pan	In this paper, we therefore propose an efficient method, called Riemannian Pursuit (RP), that aims to address these two problems simultaneously.
258	Dynamic Programming Boosting for Discriminative Macro-Action Discovery	Leonidas Lefakis, Francois Fleuret	Our main contribution is a novel supervised learning algorithm which extends the classical Boosting framework by combining it with dynamic programming.
259	Online Stochastic Optimization under Correlated Bandit Feedback	Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill	In this paper we consider the problem of online stochastic optimization of a locally smooth function under bandit feedback.
260	Weighted Graph Clustering with Non-Uniform Uncertainties	Yudong Chen, Shiau Hong Lim, Huan Xu	We propose a clustering algorithm that is based on optimizing an appropriate weighted objective, where larger weights are given to observations with lower uncertainty.
261	GeNGA: A Generalization of Natural Gradient Ascent with Positive and Negative Convergence Results	Philip Thomas	In our first contribution, we derive generalized natural gradient ascent (GeNGA), a generalization of NGA which allows for positive semidefinite non-smooth metric tensors.
262	A Bayesian Framework for Online Classifier Ensemble	Qinxun Bai, Henry Lam, Stan Sclaroff	We propose a Bayesian framework for recursively estimating the classifier weights in online learning of a classifier ensemble.
263	Adaptivity and Optimism: An Improved Exponentiated Gradient Algorithm	Jacob Steinhardt, Percy Liang	We present an adaptive variant of the exponentiated gradient algorithm.
264	Gaussian Approximation of Collective Graphical Models	Liping Liu, Daniel Sheldon, Thomas Dietterich	The Collective Graphical Model (CGM) models a population of independent and identically distributed individuals when only collective statistics (i.e., counts of individuals) are observed.
265	On learning to localize objects with minimal supervision	Hyun Oh Song, Ross Girshick, Stefanie Jegelka, Julien Mairal, Zaid Harchaoui, Trevor Darrell	In this paper, we propose a new method that achieves this goal with only image-level labels of whether the objects are present or not.
266	Multiresolution Matrix Factorization	Risi Kondor, Nedelina Teneva, Vikas Garg	Inspired by ideas from multiresolution analysis, this paper introduces a new notion of matrix factorization that can capture structure in matrices at multiple different scales.
267	Learnability of the Superset Label Learning Problem	Liping Liu, Thomas Dietterich	In this paper, we analyze Empirical Risk Minimizing learners that use the superset error as the empirical risk measure.
268	Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits	Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert Schapire	We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of K actions in response to the observed context, and observes the reward only for that action.
269	Structured Recurrent Temporal Restricted Boltzmann Machines	Roni Mittelman, Benjamin Kuipers, Silvio Savarese, Honglak Lee	In this work we propose a new class of RTRBM, which explicitly uses a dependency graph to model the structure in the problem and to define the energy function.
270	Scalable and Robust Bayesian Inference via the Median Posterior	Stanislav Minsker, Sanvesh Srivastava, Lizhen Lin, David Dunson	We propose a novel general approach to Bayesian inference that is scalable and robust to corruption in the data.
271	Kernel Adaptive Metropolis-Hastings	Dino Sejdinovic, Heiko Strathmann, Maria Lomeli Garcia, Christophe Andrieu, Arthur Gretton	A Kernel Adaptive Metropolis-Hastings algorithm is introduced, for the purpose of sampling from a target distribution with strongly nonlinear support.
272	Input Warping for Bayesian Optimization of Non-Stationary Functions	Jasper Snoek, Kevin Swersky, Rich Zemel, Ryan Adams	We develop a methodology for automatically learning a wide family of bijective transformations or warpings of the input space using the Beta cumulative distribution function.
273	Stochastic Gradient Hamiltonian Monte Carlo	Tianqi Chen, Emily Fox, Carlos Guestrin	In this paper, we explore the properties of such a stochastic gradient HMC approach.
274	A Deep Semi-NMF Model for Learning Hidden Representations	George Trigeorgis, Konstantinos Bousmalis, Stefanos Zafeiriou, Bjoern Schuller	In this work we propose a novel model, Deep Semi-NMF, that is able to learn hidden representations that lend themselves to an interpretation of clustering according to different, unknown attributes of a given dataset.
275	Asynchronous Distributed ADMM for Consensus Optimization	Ruiliang Zhang, James Kwok	In this paper, we propose an asynchronous ADMM algorithm by using two conditions to control the asynchrony: partial barrier and bounded delay.
276	Spectral Regularization for Max-Margin Sequence Tagging	Ariadna Quattoni, Borja Balle, Xavier Carreras, Amir Globerson	We frame max-margin learning of latent variable structured prediction models as a convex optimization problem, making use of scoring functions computed by input-output observable operator models.
277	Learning by Stretching Deep Networks	Gaurav Pandey, Ambedkar Dukkipati	In this paper, we propose a technique, called ‘stretching’, that allows the same models to perform considerably better with very little training.
278	Nonnegative Sparse PCA with Provable Guarantees	Megasthenis Asteris, Dimitris Papailiopoulos, Alexandros Dimakis	We introduce a novel algorithm to compute nonnegative sparse principal components of positive semidefinite (PSD) matrices.
279	Active Learning of Parameterized Skills	Bruno Da Silva, George Konidaris, Andrew Barto	We introduce a method for actively learning parameterized skills.
280	Learning Ordered Representations with Nested Dropout	Oren Rippel, Michael Gelbart, Ryan Adams	In this paper, we present results on ordered representations of data in which different dimensions have different degrees of importance.
281	Learning the Irreducible Representations of Commutative Lie Groups	Taco Cohen, Max Welling	We present a new probabilistic model of compact commutative Lie groups that produces invariant-equivariant and disentangled representations of data.
282	Towards End-To-End Speech Recognition with Recurrent Neural Networks	Alex Graves, Navdeep Jaitly	This paper presents a speech recognition system that directly transcribes audio data with text, without requiring an intermediate phonetic representation.
283	Multi-period Trading Prediction Markets with Connections to Machine Learning	Jinli Hu, Amos Storkey	We present a new model for prediction markets, in which we use risk measures to model agents and introduce a market maker to describe the trading process.
284	Efficient Gradient-Based Inference through Transformations between Bayes Nets and Neural Nets	Diederik Kingma, Max Welling	We show that either of these types of models can often be transformed into an instance of the other, by switching between centered and differentiable non-centered parameterizations of the latent variables.
285	Neural Variational Inference and Learning in Belief Networks	Andriy Mnih, Karol Gregor	We propose a fast non-iterative approximate inference method that uses a feedforward network to implement efficient exact sampling from the variational posterior.
286	Scalable Bayesian Low-Rank Decomposition of Incomplete Multiway Tensors	Piyush Rai, Yingjian Wang, Shengbo Guo, Gary Chen, David Dunson, Lawrence Carin	We present a scalable Bayesian framework for low-rank decomposition of multiway tensor data with missing observations.
287	Beta Diffusion Trees	Creighton Heaukulani, David Knowles, Zoubin Ghahramani	We define the beta diffusion tree, a random tree structure with a set of leaves that defines a collection of overlapping subsets of objects, known as a feature allocation.
288	Learning Character-level Representations for Part-of-Speech Tagging	Cicero Dos Santos, Bianca Zadrozny	In this paper, we propose a deep neural network that learns character-level representations of words and associates them with usual word representations to perform POS tagging.
289	Saddle Points and Accelerated Perceptron Algorithms	Adams Wei Yu, Fatma Kilinc-Karzan, Jaime Carbonell	In this paper, we consider the problem of finding a linear (binary) classifier or providing a near-infeasibility certificate if there is none.
290	Robust Distance Metric Learning via Simultaneous L1-Norm Minimization and Maximization	Hua Wang, Feiping Nie, Heng Huang	As an important theoretical contribution of this paper, we systematically derive an efficient iterative algorithm to solve the general L1-norm minmax problem, which is rarely studied in the literature.
291	Learning from Contagion (Without Timestamps)	Kareem Amin, Hoda Heidari, Michael Kearns	We introduce and study new models for learning from contagion processes in a network.
292	Stochastic Variational Inference for Bayesian Time Series Models	Matthew Johnson, Alan Willsky	In this paper we develop SVI algorithms for several common Bayesian time series models, namely the hidden Markov model (HMM), hidden semi-Markov model (HSMM), and the nonparametric HDP-HMM and HDP-HSMM.
293	A Clockwork RNN	Jan Koutnik, Klaus Greff, Faustino Gomez, Juergen Schmidhuber	This paper introduces a simple, yet powerful modification to the simple RNN (SRN) architecture, the Clockwork RNN (CW-RNN), in which the hidden layer is partitioned into separate modules, each processing inputs at its own temporal granularity, making computations only at its prescribed clock rate.
294	Estimating Latent-Variable Graphical Models using Moments and Likelihoods	Arun Tejasvi Chaganty, Percy Liang	In this work, we show that the method of moments in conjunction with a composite marginal likelihood objective yields consistent parameter estimates for a much broader class of directed and undirected graphical models, including loopy graphs with high treewidth.
295	Universal Matrix Completion	Srinadh Bhojanapalli, Prateek Jain	In this work, we address these issues by providing a universal recovery guarantee for matrix completion that works for a variety of sampling schemes.
296	Finding Dense Subgraphs via Low-Rank Bilinear Optimization	Dimitris Papailiopoulos, Ioannis Mitliagkas, Alexandros Dimakis, Constantine Caramanis	In this work, we develop a novel algorithm for DkS that searches a low-dimensional space for provably good solutions.
297	Compositional Morphology for Word Representations and Language Modelling	Jan Botha, Phil Blunsom	This paper presents a scalable method for integrating compositional morphological representations into a vector-based probabilistic language model.
298	Learning Polynomials with Neural Networks	Alexandr Andoni, Rina Panigrahy, Gregory Valiant, Li Zhang	In this paper, we present several positive theoretical results to support the effectiveness of neural networks.
299	Exponential Family Matrix Completion under Structural Constraints	Suriya Gunasekar, Pradeep Ravikumar, Joydeep Ghosh	In this paper, we provide a vastly unified framework for generalized matrix completion by considering a matrix completion setting wherein the matrix entries are sampled from any member of the rich family of exponential family distributions, and impose general structural constraints on the underlying matrix, as captured by a general regularizer R(.).
300	Sample-based approximate regularization	Philip Bachman, Amir-Massoud Farahmand, Doina Precup	We introduce a method for regularizing linearly parameterized functions using general derivative-based penalties, which relies on sampling as well as finite-difference approximations of the relevant derivatives.
301	A Compilation Target for Probabilistic Programming Languages	Brooks Paige, Frank Wood	Forward inference techniques such as sequential Monte Carlo and particle Markov chain Monte Carlo for probabilistic programming can be implemented in any programming language by creative use of standardized operating system functionality including processes, forking, mutexes, and shared memory.
302	Adaptive Monte Carlo via Bandit Allocation	James Neufeld, Andras Gyorgy, Csaba Szepesvari, Dale Schuurmans	We consider the problem of sequentially choosing between a set of unbiased Monte Carlo estimators to minimize the mean-squared-error (MSE) of a final combined estimate.
303	Efficient Dimensionality Reduction for High-Dimensional Network Estimation	Safiye Celik, Benjamin Logsdon, Su-In Lee	We propose module graphical lasso (MGL), an aggressive dimensionality reduction and network estimation technique for a high-dimensional Gaussian graphical model (GGM).
304	Deterministic Anytime Inference for Stochastic Continuous-Time Markov Processes	E. Busra Celikkaya, Christian Shelton	We describe a deterministic anytime method for calculating filtered and smoothed distributions in large variable-based continuous time Markov processes.
305	Doubly Stochastic Variational Bayes for non-Conjugate Inference	Michalis Titsias, Miguel Lázaro-Gredilla	We propose a simple and effective variational inference algorithm based on stochastic optimisation that can be widely applied for Bayesian non-conjugate inference in continuous parameter spaces.
306	Efficient Learning of Mahalanobis Metrics for Ranking	Daryl Lim, Gert Lanckriet	We develop an efficient algorithm to learn a Mahalanobis distance metric by directly optimizing a ranking loss.
307	GEV-Canonical Regression for Accurate Binary Class Probability Estimation when One Class is Rare	Arpit Agarwal, Harikrishna Narasimhan, Shivaram Kalyanakrishnan, Shivani Agarwal	In this paper, we use tools from the theory of proper composite losses (Buja et al., 2005; Reid & Williamson, 2010) to construct a canonical underlying CPE loss corresponding to the GEV link, which yields a convex proper composite loss that we call the GEV-canonical loss; this loss is tailored for the task of CPE when one class is rare, and is easy to minimize using an IRLS-type algorithm similar to that used for logistic regression.
308	A reversible infinite HMM using normalised random measures	David Knowles, Zoubin Ghahramani, Konstantina Palla	We present a nonparametric prior over reversible Markov chains.
309	Structured Low-Rank Matrix Factorization: Optimality, Algorithm, and Applications to Image Processing	Benjamin Haeffele, Eric Young, Rene Vidal	In this paper we explore a matrix factorization technique suitable for large datasets that captures additional structure in the factors by using a projective tensor norm, which includes classical image regularizers such as total variation and the nuclear norm as particular cases.
310	Influence Function Learning in Information Diffusion Networks	Nan Du, Yingyu Liang, Maria Balcan, Le Song	In this paper, we exploit the insight that the influence functions in many diffusion models are coverage functions, and propose a novel parameterization of such functions using a convex combination of random basis functions.