Paper Digest: AISTATS 2015 Highlights
The International Conference on Artificial Intelligence and Statistics (AISTATS) is an interdisciplinary gathering of researchers at the intersection of computer science, artificial intelligence, machine learning, statistics, and related areas.
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to use these machine-generated highlights/summaries to quickly get the main idea of each paper.
If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.
Paper Digest Team
team@paperdigest.org
TABLE 1: AISTATS 2015 Papers
No. | Title | Authors | Highlight |
---|---|---|---|
1 | Nonparametric Bayesian Factor Analysis for Dynamic Count Matrices | Ayan Acharya, Joydeep Ghosh, Mingyuan Zhou | We apply the model to text and music analysis, with state-of-the-art results. |
2 | Parameter Estimation of Generalized Linear Models without Assuming their Link Function | Sreangsu Acharyya, Joydeep Ghosh | We propose a parameter-recovery-facilitating, jointly convex, regularized loss functional that is optimized globally over the parameter vector as well as the link function, with the best rates possible under a first-order oracle model. |
3 | Spectral Gap Error Bounds for Improving CUR Matrix Decomposition and the Nyström Method | David Anderson, Simon Du, Michael Mahoney, Christopher Melgaard, Kunming Wu, Ming Gu | Here, we introduce novel *spectral gap* error bounds that judiciously exploit the potentially rapid spectrum decay in the input matrix, a most common occurrence in machine learning and data analysis. |
4 | Global Multi-armed Bandits with Hölder Continuity | Onur Atan, Cem Tekin, Mihaela van der Schaar | In this paper, we formalize a new class of multi-armed bandit methods, Global Multi-armed Bandit (GMAB), in which arms are globally informative through a global parameter, i.e., choosing an arm reveals information about all the arms. |
5 | Efficient Sparse Clustering of High-Dimensional Non-spherical Gaussian Mixtures | Martin Azizyan, Aarti Singh, Larry Wasserman | The method we propose is a combination of a recent approach for learning parameters of a Gaussian mixture model and sparse linear discriminant analysis (LDA). |
6 | Unifying Local Consistency and MAX SAT Relaxations for Scalable Inference with Rounding Guarantees | Stephen Bach, Bert Huang, Lise Getoor | We prove the equivalence of first-order local consistency relaxations and the MAX SAT relaxation of Goemans and Williamson (1994) for a class of MRFs we refer to as logical MRFs. |
7 | Near-optimal max-affine estimators for convex regression | Gabor Balazs, András György, Csaba Szepesvari | This paper considers least squares estimators for regression problems over convex, uniformly bounded, uniformly Lipschitz function classes minimizing the empirical risk over max-affine functions (the maximum of finitely many affine functions). |
8 | Convex Multi-Task Learning by Clustering | Aviad Barzilai, Koby Crammer | We propose a scalable optimization algorithm for finding the optimal solution. |
9 | Gaussian Processes for Bayesian hypothesis tests on regression functions | Alessio Benavoli, Francesca Mangili | In this paper we show that Gaussian processes can also be employed as a universal tool for developing a large variety of Bayesian statistical hypothesis tests for regression functions. |
10 | Sparse Solutions to Nonnegative Linear Systems and Applications | Aditya Bhaskara, Ananda Suresh, Morteza Zadimoghaddam | We give an efficient algorithm for finding sparse approximate solutions to linear systems of equations with nonnegative coefficients. |
11 | Generalized Linear Models for Aggregated Data | Avradeep Bhowmik, Joydeep Ghosh, Oluwasanmi Koyejo | Based on this relationship, we propose a simple algorithm to estimate the model parameters and individual level inferences via alternating imputation and standard generalized linear model fitting. |
12 | Accurate and conservative estimates of MRF log-likelihood using reverse annealing | Yuri Burda, Roger Grosse, Ruslan Salakhutdinov | We present the Reverse AIS Estimator (RAISE), a stochastic lower bound on the log-likelihood of an approximation to the original MRF model. |
13 | Stochastic Spectral Descent for Restricted Boltzmann Machines | David Carlson, Volkan Cevher, Lawrence Carin | We introduce a new method called “Stochastic Spectral Descent” that updates parameters in the normed space. |
14 | Implementable confidence sets in high dimensional regression | Alexandra Carpentier | We focus on the problem of constructing adaptive and honest confidence sets for the sparse parameter θ, i.e., we want to construct a confidence set for θ that contains θ with high probability, and that is as small as possible. |
15 | Online Ranking with Top-1 Feedback | Sougata Chaudhuri, Ambuj Tewari | We consider a novel top-1 feedback model: at the end of each round, the relevance score for only the top ranked object is revealed. We provide a comprehensive set of results regarding learnability under this challenging setting. |
16 | One-bit Compressed Sensing with the k-Support Norm | Sheng Chen, Arindam Banerjee | In this paper, we investigate 1-bit CS problems for sparse signals using the recently proposed k-support norm. |
17 | Efficient Second-Order Gradient Boosting for Conditional Random Fields | Tianqi Chen, Sameer Singh, Ben Taskar, Carlos Guestrin | We incorporate second-order information by deriving a Markov chain mixing rate bound to quantify the dependencies, and introduce a gradient boosting algorithm that iteratively optimizes an adaptive upper bound of the objective function. |
18 | Filtered Search for Submodular Maximization with Controllable Approximation Bounds | Wenlin Chen, Yixin Chen, Kilian Weinberger | In this paper, we propose a filtered search (FS) framework that allows the user to set an arbitrary approximation bound guarantee with a “tunable knob”, from 0 (arbitrarily bad) to 1 (globally optimal). |
19 | Predictive Inverse Optimal Control for Linear-Quadratic-Gaussian Systems | Xiangli Chen, Brian Ziebart | In this work, we extend predictive inverse optimal control to the linear-quadratic-Gaussian control setting. |
20 | Exact Bayesian Learning of Ancestor Relations in Bayesian Networks | Yetian Chen, Lingjian Meng, Jin Tian | In this paper, we develop dynamic programming (DP) algorithms to compute the exact posterior probabilities of ancestor relations in Bayesian networks. |
21 | Model Selection for Topic Models via Spectral Decomposition | Dehua Cheng, Xinran He, Yan Liu | Topic models have achieved significant successes in analyzing large-scale text corpora. |
22 | The Loss Surfaces of Multilayer Networks | Anna Choromanska, Mikael Henaff, Michael Mathieu, Gerard Ben Arous, Yann LeCun | We study the connection between the highly non-convex loss function of a simple model of the fully-connected feed-forward neural network and the Hamiltonian of the spherical spin-glass model under the assumptions of: i) variable independence, ii) redundancy in network parametrization, and iii) uniformity. |
23 | Averaged Least-Mean-Squares: Bias-Variance Trade-offs and Optimal Sampling Distributions | Alexandre Defossez, Francis Bach | We consider the least-squares regression problem and provide a detailed asymptotic analysis of the performance of averaged constant-step-size stochastic gradient descent. |
24 | A Topic Modeling Approach to Ranking | Weicong Ding, Prakash Ishwar, Venkatesh Saligrama | We propose a topic modeling approach to the prediction of preferences in pairwise comparisons. |
25 | A totally unimodular view of structured sparsity | Marwa El Halabi, Volkan Cevher | This paper describes a simple framework for structured sparse recovery based on convex optimization. |
26 | Back to the Past: Source Identification in Diffusion Networks from Partially Observed Cascades | Mehrdad Farajtabar, Manuel Gomez Rodriguez, Mohammad Zamani, Nan Du, Hongyuan Zha, Le Song | In this paper, we tackle this problem by developing a two-stage framework, which first learns a continuous-time diffusion network based on historical diffusion traces and then identifies the source of an incomplete diffusion trace by maximizing the likelihood of the trace under the learned model. |
27 | Graph Approximation and Clustering on a Budget | Ethan Fetaya, Ohad Shamir, Shimon Ullman | We consider the problem of learning from a similarity matrix (such as spectral clustering and low-dimensional embedding) when computing pairwise similarities is costly and only a limited number of entries can be observed. |
28 | A Sufficient Statistics Construction of Exponential Family Lévy Measure Densities for Nonparametric Conjugate Models | Robert Finn, Brian Kulis | We seek to address the problem of obtaining a general construction of prior distributions over infinite-dimensional spaces possessing distributional properties amenable to conjugacy. |
29 | Computational Complexity of Linear Large Margin Classification With Ramp Loss | Søren Frejstrup Maibing, Christian Igel | This article addresses the fundamental question of the computational complexity of determining whether there is a hypothesis class with a hypothesis such that the upper bound on the generalization error is below a certain value. |
30 | Learning Deep Sigmoid Belief Networks with Data Augmentation | Zhe Gan, Ricardo Henao, David Carlson, Lawrence Carin | Deep directed generative models are developed. |
31 | Efficient Estimation of Mutual Information for Strongly Dependent Variables | Shuyang Gao, Greg Ver Steeg, Aram Galstyan | We introduce a new estimator that is robust to local non-uniformity, works well with limited data, and is able to capture relationship strengths over many orders of magnitude. |
32 | On Anomaly Ranking and Excess-Mass Curves | Nicolas Goix, Anne Sabourin, Stéphan Clémençon | Extensions to the multivariate setting are far from straightforward, and it is precisely the main purpose of this paper to introduce a novel and convenient (functional) criterion for measuring the performance of a scoring function regarding the anomaly ranking task, referred to as the Excess-Mass curve (EM-curve). |
33 | Modeling Skill Acquisition Over Time with Sequence and Topic Modeling | José González-Brenes | We propose three novel data-driven methods that bridge sequence modeling with topic models to infer students’ time-varying knowledge. |
34 | Consistent Collective Matrix Completion under Joint Low Rank Structure | Suriya Gunasekar, Makoto Yamada, Dawei Yin, Yi Chang | The sample complexity requirements derived in the paper are optimal up to logarithmic factors and significantly improve upon the requirements obtained by trivial extensions of standard matrix completion. |
35 | The Bayesian Echo Chamber: Modeling Social Influence via Linguistic Accommodation | Fangjian Guo, Charles Blundell, Hanna Wallach, Katherine Heller | We present the Bayesian Echo Chamber, a new Bayesian generative model for social interaction data. |
36 | Preserving Privacy of Continuous High-dimensional Data with Minimax Filters | Jihun Hamm | Minimax filters that achieve the optimal privacy-utility trade-off from broad families of filters and losses/classifiers are defined, and algorithms for learning the filters in batch or distributed settings are presented. |
37 | A Consistent Method for Graph Based Anomaly Localization | Satoshi Hara, Tetsuro Morimura, Toshihiro Takahashi, Hiroki Yanagisawa, Taiji Suzuki | In this paper, we propose an anomaly localization algorithm with a consistency guarantee on its results. |
38 | Metric recovery from directed unweighted graphs | Tatsunori Hashimoto, Yi Sun, Tommi Jaakkola | We analyze directed, unweighted graphs obtained from points x_i ∈ ℝ^d by connecting vertex i to vertex j iff ‖x_i − x_j‖ < ε(x_i). (A toy construction of such a graph is sketched after the table.) |
39 | Scalable Variational Gaussian Process Classification | James Hensman, Alexander Matthews, Zoubin Ghahramani | We show how to scale the model within a variational inducing point framework, out-performing the state of the art on benchmark datasets. |
40 | Stochastic Structured Variational Inference | Matthew Hoffman, David Blei | We show how to relax the mean-field approximation to allow arbitrary dependencies between global parameters and local hidden variables, producing better parameter estimates by reducing bias, sensitivity to local optima, and sensitivity to hyperparameters. |
41 | Reliable and Scalable Variational Inference for the Hierarchical Dirichlet Process | Michael Hughes, Dae Il Kim, Erik Sudderth | We introduce a new variational inference objective for hierarchical Dirichlet process admixture models. |
42 | Cross-domain recommendation without shared users or items by sharing latent vector distributions | Tomoharu Iwata, Koh Takeuchi | We propose a cross-domain recommendation method for predicting the ratings of items in different domains, where neither users nor items are shared across domains. |
43 | Submodular Point Processes with Applications to Machine Learning | Rishabh Iyer, Jeffrey Bilmes | In this paper, we analyze the computational complexity of probabilistic inference in SPPs. |
44 | Online Optimization: Competing with Dynamic Comparators | Ali Jadbabaie, Alexander Rakhlin, Shahin Shahrampour, Karthik Sridharan | In this paper, we address these two directions together. |
45 | Estimating the accuracies of multiple classifiers without labeled data | Ariel Jaffe, Boaz Nadler, Yuval Kluger | In this paper, focusing on the binary case, we present simple, computationally efficient algorithms to solve these questions. |
46 | Sparse Dueling Bandits | Kevin Jamieson, Sumeet Katariya, Atul Deshpande, Robert Nowak | This paper focuses on a new approach for finding the best arm according to the Borda criterion using noisy comparisons. |
47 | Consensus Message Passing for Layered Graphical Models | Varun Jampani, S. M. Ali Eslami, Daniel Tarlow, Pushmeet Kohli, John Winn | With these models in mind, we introduce a modification to message passing that learns to exploit their layered structure by passing ‘consensus’ messages that guide inference towards good solutions. |
48 | Robust Cost Sensitive Support Vector Machine | Shuichi Katsumata, Akiko Takeda | In this paper we consider robust classification and show an equivalence between robust and regularized classifications. |
49 | On Approximate Non-submodular Minimization via Tree-Structured Supermodularity | Yoshinobu Kawahara, Rishabh Iyer, Jeffrey Bilmes | We address the problem of minimizing non-submodular functions where the supermodularity is restricted to tree-structured pairwise terms. |
50 | Sparse Submodular Probabilistic PCA | Rajiv Khanna, Joydeep Ghosh, Russell Poldrack, Oluwasanmi Koyejo | We propose a novel approach for sparse probabilistic principal component analysis that combines a low-rank representation for the latent factors and loadings with a novel sparse variational inference approach for estimating distributions of latent variables subject to sparse support constraints. |
51 | Latent feature regression for multivariate count data | Arto Klami, Abhishek Tripathi, Johannes Sirola, Lauri Väre, Frederic Roulland | We consider the problem of regression on multivariate count data and present a Gibbs sampler for a latent feature regression model suitable for both under- and overdispersed response variables. |
52 | Dimensionality estimation without distances | Matthäus Kleindessner, Ulrike von Luxburg | We provide two estimators for this situation, a naive one and a more elaborate one. |
53 | A Bayes consistent 1-NN classifier | Aryeh Kontorovich, Roi Weiss | We show that a simple modification of the 1-nearest neighbor classifier yields a strongly Bayes consistent learner. |
54 | DART: Dropouts meet Multiple Additive Regression Trees | Rashmi Korlakai Vinayak, Ran Gilad-Bachrach | In this work, we explore a different approach to address the problem, that of employing dropouts, a tool that has been recently proposed in the context of learning deep neural networks. |
55 | On Estimating L_2^2 Divergence | Akshay Krishnamurthy, Kirthevasan Kandasamy, Barnabas Poczos, Larry Wasserman | We give a comprehensive theoretical characterization of a nonparametric estimator for the L_2^2 divergence between two continuous distributions. |
56 | Tensor Factorization via Matrix Factorization | Volodymyr Kuleshov, Arun Chaganty, Percy Liang | In this paper, we propose a new algorithm for CP tensor factorization that uses random projections to reduce the problem to simultaneous matrix diagonalization. |
57 | Low-Rank Spectral Learning with Weighted Loss Functions | Alex Kulesza, Nan Jiang, Satinder Singh | In this paper we prove that when learning predictive state representations those problematic cases disappear if we introduce a particular weighted loss function and learn using sufficiently large sets of statistics; our main result is a bound on the loss of the learned low-rank model in terms of the singular values that are discarded. |
58 | Symmetric Iterative Proportional Fitting | Sven Kurras | Since IPF inherently generates non-symmetric matrices, we introduce two symmetrized variants of IPF. |
59 | Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits | Branislav Kveton, Zheng Wen, Azin Ashkan, Csaba Szepesvari | In this paper, we close the problem of computationally and sample efficient learning in stochastic combinatorial semi-bandits. |
60 | Sequential Kernel Herding: Frank-Wolfe Optimization for Particle Filtering | Simon Lacoste-Julien, Fredrik Lindsten, Francis Bach | In this paper, we propose to replace the random sampling step in a particle filter by Frank-Wolfe optimization. |
61 | Particle Gibbs for Bayesian Additive Regression Trees | Balaji Lakshminarayanan, Daniel Roy, Yee Whye Teh | We present a novel sampler for BART based on the Particle Gibbs (PG) algorithm (Andrieu et al., 2010) and a top-down particle filtering algorithm for Bayesian decision trees (Lakshminarayanan et al., 2013). |
62 | Deeply-Supervised Nets | Chen-Yu Lee, Saining Xie, Patrick Gallagher, Zhengyou Zhang, Zhuowen Tu | We propose deeply-supervised nets (DSN), a method that simultaneously minimizes classification error and improves the directness and transparency of the hidden layer learning process. |
63 | Preferential Attachment in Graphs with Affinities | Jay Lee, Manzil Zaheer, Stephan Günnemann, Alex Smola | We propose a random graph model based on both node attributes and preferential attachment. |
64 | Bayesian Hierarchical Clustering with Exponential Family: Small-Variance Asymptotics and Reducibility | Juho Lee, Seungjin Choi | In this paper we relax BHC into a non-probabilistic formulation, exploring small-variance asymptotics in conjugate-exponential models. |
65 | Modelling Policies in MDPs in Reproducing Kernel Hilbert Space | Guy Lever, Ronnie Stafford | We present a framework for performing gradient-based policy optimization in the RKHS, deriving the functional gradient of the return for our policy, which has a simple form and can be estimated efficiently. |
66 | Scalable Optimization of Randomized Operational Decisions in Adversarial Classification Settings | Bo Li, Yevgeniy Vorobeychik | To overcome scalability limitations, we introduce a novel method for estimating a compact parity basis representation for the operational decision function. |
67 | Toward Minimax Off-policy Value Estimation | Lihong Li, Remi Munos, Csaba Szepesvari | This paper studies the off-policy evaluation problem, where one aims to estimate the value of a target policy based on a sample of observations collected by another policy. |
68 | Compressed Sensing with Very Sparse Gaussian Random Projections | Ping Li, Cun-Hui Zhang | In this paper, following a well-known experimental setup, we show that, at the same number of measurements, the recovery accuracies of our proposed method are similar to those of standard L1 decoding. |
69 | Max-Margin Zero-Shot Learning for Multi-class Classification | Xin Li, Yuhong Guo | In this paper, we propose a semi-supervised max-margin learning framework that integrates the semi-supervised classification problem over observed classes and the unsupervised clustering problem over unseen classes together to tackle zero-shot multi-class classification. |
70 | Conditional Restricted Boltzmann Machines for Multi-label Learning with Incomplete Labels | Xin Li, Feipeng Zhao, Yuhong Guo | In this paper, we develop a novel conditional restricted Boltzmann machine model to address multi-label learning with incomplete labels. |
71 | Sparsistency of ℓ1-Regularized M-Estimators | Yen-Huan Li, Jonathan Scarlett, Pradeep Ravikumar, Volkan Cevher | For this purpose, we propose the local structured smoothness condition (LSSC) on the loss function. |
72 | Similarity Learning for High-Dimensional Sparse Data | Kuan Liu, Aurélien Bellet, Fei Sha | In this paper, we propose a method that can efficiently learn a similarity measure from high-dimensional sparse data. |
73 | Tradeoffs for Space, Time, Data and Risk in Unsupervised Learning | Mario Lucic, Mesrob Ohannessian, Amin Karbasi, Andreas Krause | Using k-means clustering as a prototypical unsupervised learning problem, we show how we can strategically summarize the data (control space) in order to trade off risk and time when data is generated by a probabilistic model. |
74 | Active Pointillistic Pattern Search | Yifei Ma, Dougal Sutherland, Roman Garnett, Jeff Schneider | We introduce the problem of active pointillistic pattern search (APPS), which seeks to discover regions of a domain exhibiting desired behavior with limited observations. |
75 | The Security of Latent Dirichlet Allocation | Shike Mei, Xiaojin Zhu | We present an efficient solution (up to local optima) using a descent method and implicit functions. |
76 | A Spectral Algorithm for Inference in Hidden semi-Markov Models | Igor Melnyk, Arindam Banerjee | In this paper, we introduce a novel spectral algorithm to perform inference in HSMMs. |
77 | Efficient Training of Structured SVMs via Soft Constraints | Ofer Meshi, Nathan Srebro, Tamir Hazan | In this work we observe that relaxing these agreement constraints and replacing them with soft constraints yields a much easier optimization problem. |
78 | Variance Reduction via Antithetic Markov Chains | James Neufeld, Dale Schuurmans, Michael Bowling | We present a Monte Carlo integration method, antithetic Markov chain sampling (AMCS), that incorporates local Markov transitions in an underlying importance sampler. |
79 | Fast Function to Function Regression | Junier Oliva, William Neiswanger, Barnabas Poczos, Eric Xing, Hy Trac, Shirley Ho, Jeff Schneider | We analyze the problem of regression when both input covariates and output responses are functions from a nonparametric function class. |
80 | Reactive bandits with attitude | Pedro Ortega, Kee-Eung Kim, Daniel Lee | When the underlying stochastic distribution is Gaussian, we derive an analytic solution for the long-run optimal player strategy for different regimes of the bandit. |
81 | Feature Selection for Linear SVM with Provable Guarantees | Saurabh Paul, Malik Magdon-Ismail, Petros Drineas | We give two provably accurate feature-selection techniques for the linear SVM. |
82 | On Theoretical Properties of Sum-Product Networks | Robert Peharz, Sebastian Tschiatschek, Franz Pernkopf, Pedro Domingos | In this paper we fill some gaps in the theoretical foundation of SPNs. |
83 | Robust sketching for multiple square-root LASSO problems | Vu Pham, Laurent El Ghaoui | We introduce a robust framework for solving multiple square-root LASSO problems, based on a sketch of the learning data that uses low-rank approximations. |
84 | Deep Exponential Families | Rajesh Ranganath, Linpeng Tang, Laurent Charlin, David Blei | We describe deep exponential families (DEFs), a class of latent variable models that are inspired by the hidden structures used in deep neural networks. |
85 | On the High Dimensional Power of a Linear-Time Two Sample Test under Mean-shift Alternatives | Sashank Reddi, Aaditya Ramdas, Barnabas Poczos, Aarti Singh, Larry Wasserman | The main contribution of this paper is to explicitly characterize the power of a popular nonparametric two sample test, designed for general alternatives, under a mean-shift alternative in the high-dimensional setting. |
86 | A Scalable Algorithm for Structured Kernel Feature Selection | Shaogang Ren, Shuai Huang, John Onofrey, Xenios Papademetris, Xiaoning Qian | In this paper we propose a stochastic optimization algorithm that can efficiently address this computational problem on account of the redundant kernel representations of the given data. |
87 | Learning Efficient Anomaly Detectors from K-NN Graphs | Jonathan Root, Jing Qian, Venkatesh Saligrama | We propose a non-parametric anomaly detection algorithm for high dimensional data. |
88 | Gamma Processes, Stick-Breaking, and Variational Inference | Anirban Roychowdhury, Brian Kulis | In this paper, we present a variational inference framework for models involving gamma process priors. |
89 | Direct Density-Derivative Estimation and Its Application in KL-Divergence Approximation | Hiroaki Sasaki, Yung-Kyun Noh, Masashi Sugiyama | In this paper, we give a direct method to approximate the density derivative without estimating the density itself. |
90 | Non-Uniform Stochastic Average Gradient Method for Training Conditional Random Fields | Mark Schmidt, Reza Babanezhad, Mohamed Ahmed, Aaron Defazio, Ann Clifton, Anoop Sarkar | We describe a practical implementation that uses structure in the CRF gradient to reduce the memory requirement of this linearly-convergent stochastic gradient method, propose a non-uniform sampling scheme that substantially improves practical performance, and analyze the rate of convergence of the SAGA variant under non-uniform sampling. |
91 | Sensor Selection for Crowdsensing Dynamical Systems | Francois Schnitzler, Jia Yuan Yu, Shie Mannor | To achieve low estimation error, we propose a Thompson sampling approach combining submodular optimization and a scalable online variational inference algorithm to maintain the posterior distribution over the variance. |
92 | A Rate of Convergence for Mixture Proportion Estimation, with Application to Learning from Noisy Labels | Clayton Scott | In this work we establish a rate of convergence for mixture proportion estimation under an appropriate distributional assumption, and argue that this rate of convergence is useful for analyzing weakly supervised learning algorithms that build on MPE. |
93 | Inference of Cause and Effect with Unsupervised Inverse Regression | Eleni Sgouritsa, Dominik Janzing, Philipp Hennig, Bernhard Schölkopf | To this end, we propose a method for estimating a conditional from samples of the corresponding marginal, which we call unsupervised inverse GP regression. |
94 | Estimation from Pairwise Comparisons: Sharp Minimax Bounds with Topology Dependence | Nihar Shah, Sivaraman Balakrishnan, Joseph Bradley, Abhay Parekh, Kannan Ramchandran, Martin Wainwright | The Bradley-Terry-Luce (BTL) and Thurstone models are the most widely used parametric models for such pairwise comparison data. |
95 | Exploiting Symmetries to Construct Efficient MCMC Algorithms With an Application to SLAM | Roshan Shariff, András György, Csaba Szepesvari | In this paper we propose a variation of the MH algorithm based on group moves, where the next state is obtained by first choosing a random transformation of the state space and then applying this transformation to the current state. |
96 | Learning Where to Sample in Structured Prediction | Tianlin Shi, Jacob Steinhardt, Percy Liang | In this paper, we propose a heterogeneous approach that dynamically allocates computation to the different parts. |
97 | State Space Methods for Efficient Inference in Student-t Process Regression | Arno Solin, Simo Särkkä | We show how a large class of temporal TP regression models can be reformulated as state space models, and how a forward filtering and backward smoothing recursion can be derived for solving the inference analytically in linear time complexity. |
98 | Learning from Data with Heterogeneous Noise using SGD | Shuang Song, Kamalika Chaudhuri, Anand Sarwate | In this paper, we adopt instead a model in which data is observed through heterogeneous noise, where the noise level reflects the quality of the data source. |
99 | Data modeling with the elliptical gamma distribution | Suvrit Sra, Reshad Hosseini, Lucas Theis, Matthias Bethge | We study mixture modeling using the elliptical gamma (EG) distribution, a non-Gaussian distribution that allows heavy and light tail and peak behaviors. |
100 | WASP: Scalable Bayes via barycenters of subset posteriors | Sanvesh Srivastava, Volkan Cevher, Quoc Dinh, David Dunson | We propose a simple, general, and highly efficient approach, which first runs a posterior sampling algorithm in parallel on different machines for subsets of a large data set. |
101 | Calibration of conditional composite likelihood for Bayesian inference on Gibbs random fields | Julien Stoehr, Nial Friel | This paper provides a means to calibrate the posterior distribution resulting from using a composite likelihood and illustrates its performance in several examples. |
102 | A Dirichlet Process Mixture Model for Spherical Data | Julian Straub, Jason Chang, Oren Freifeld, John Fisher III | For this purpose we propose a Dirichlet process mixture model of Gaussian distributions in distinct tangent spaces to the sphere (DP-TGMM). |
103 | Inferring Block Structure of Graphical Models in Exponential Families | Siqi Sun, Hai Wang, Jinbo Xu | In this paper, we propose a novel generative model for describing the block structure in general exponential families, and optimize it by an Expectation-Maximization (EM) algorithm with variational Bayes. |
104 | Two-stage sampled learning theory on distributions | Zoltan Szabo, Arthur Gretton, Barnabas Poczos, Bharath Sriperumbudur | In this paper, we provide theoretical guarantees for a remarkably simple algorithmic alternative to solve the distribution regression problem: embed the distributions to a reproducing kernel Hilbert space, and learn a ridge regressor from the embeddings to the outputs. |
105 | Predicting Preference Reversals via Gaussian Process Uncertainty Aversion | Rikiya Takahashi, Tetsuro Morimura | In order to accurately predict choice decisions involving preference reversals, which existing econometric methods have failed to incorporate, the authors introduce a new cognitive choice model whose parameters are efficiently fitted with a global convex optimization algorithm. |
106 | Streaming Variational Inference for Bayesian Nonparametric Mixture Models | Alex Tank, Nicholas Foti, Emily Fox | We work within this general framework and present a streaming variational inference algorithm for NRM mixture models based on assumed density filtering. |
107 | Missing at Random in Graphical Models | Jin Tian | In this paper, we assume the missing data model is represented as a directed acyclic graph that not only encodes the dependencies among the variables but also explicitly portrays the causal mechanisms responsible for the missingness process. |
108 | Particle Gibbs with Ancestor Sampling for Probabilistic Programs | Jan-Willem van de Meent, Hongseok Yang, Vikash Mansinghka, Frank Wood | We present empirical results that demonstrate nontrivial performance gains. |
109 | Learning of Non-Parametric Control Policies with High-Dimensional State Features | Herke Van Hoof, Jan Peters, Gerhard Neumann | In this paper, we develop a policy search algorithm that integrates robust policy updates and kernel embeddings. |
110 | Maximally Informative Hierarchical Representations of High-Dimensional Data | Greg Ver Steeg, Aram Galstyan | We present bounds on how informative a representation is about input data. |
111 | Falling Rule Lists | Fulton Wang, Cynthia Rudin | We provide a Bayesian framework for learning falling rule lists that does not rely on traditional greedy decision tree learning methods. |
112 | Multi-Manifold Modeling in Non-Euclidean spaces | Xu Wang, Konstantinos Slavakis, Gilad Lerman | This paper advocates a novel framework for segmenting a dataset on a Riemannian manifold M into clusters lying around low-dimensional submanifolds of M. Important examples of M, for which the proposed algorithm is computationally efficient, include the sphere, the set of positive definite matrices, and the Grassmannian. |
113 | Column Subset Selection with Missing Data via Active Sampling | Yining Wang, Aarti Singh | In this paper, we propose and analyze two sampling based algorithms for column subset selection without access to the complete input matrix. |
114 | Trend Filtering on Graphs | Yu-Xiang Wang, James Sharpnack, Alex Smola, Ryan Tibshirani | We introduce a family of adaptive estimators on graphs, based on penalizing the ℓ1 norm of discrete graph differences. (A minimal evaluation of this penalty is sketched after the table.) |
115 | A Greedy Homotopy Method for Regression with Nonconvex Constraints | Fabian Wauthier, Peter Donnelly | The goal of this paper is to estimate sparse linear regression models where, for a given partition G of the input variables, the selected variables are chosen from a diverse set of groups in G. |
116 | Revisiting the Limits of MAP Inference by MWSS on Perfect Graphs | Adrian Weller | Here we introduce novel techniques and consider all cases, demonstrating that this greatly expands the set of tractable models. |
117 | Understanding and Evaluating Sparse Linear Discriminant Analysis | Yi Wu, David Wipf, Jeong-Min Yun | Linear discriminant analysis (LDA) represents a simple yet powerful technique for partitioning a p-dimensional feature vector into one of K classes based on a linear projection learned from N labeled observations. |
118 | Stochastic Block Transition Models for Dynamic Networks | Kevin Xu | In this paper, I propose a stochastic block transition model (SBTM) for dynamic networks that is inspired by the well-known stochastic block model (SBM) for static networks and previous dynamic extensions of the SBM. |
119 | Majorization-Minimization for Manifold Embedding | Zhirong Yang, Jaakko Peltonen, Samuel Kaski | We propose a new MM procedure that yields fast MM algorithms for a wide variety of manifold embedding problems. |
120 | À la Carte – Learning Fast Kernels | Zichao Yang, Andrew Wilson, Alex Smola, Le Song | We introduce a family of fast, flexible, general purpose, and lightly parametrized kernel learning methods, derived from Fastfood basis function expansions. |
121 | Minimizing Nonconvex Non-Separable Functions | Yaoliang Yu, Xun Zheng, Micol Marchetti-Bowick, Eric Xing | To address this issue, we propose a new proximal gradient meta-algorithm by rigorously extending the proximal average to the nonconvex setting. |
122 | A Simple Homotopy Algorithm for Compressive Sensing | Lijun Zhang, Tianbao Yang, Rong Jin, Zhi-Hua Zhou | In this paper, we consider the problem of recovering the s largest elements of an arbitrary vector from noisy measurements. |
123 | Scalable Nonparametric Multiway Data Analysis | Shandian Zhe, Zenglin Xu, Xinqi Chu, Yuan Qi, Youngja Park | To address these issues, we propose a scalable nonparametric tensor decomposition model. |
124 | Infinite Edge Partition Models for Overlapping Community Detection and Link Prediction | Mingyuan Zhou | A hierarchical gamma process infinite edge partition model is proposed to factorize the binary adjacency matrix of an unweighted undirected relational network under a Bernoulli-Poisson link. |
125 | Power-Law Graph Cuts | Xiangyang Zhou, Jiaxin Zhang, Brian Kulis | To achieve our goals, we treat the Pitman-Yor exchangeable partition probability function (EPPF) as a regularizer to graph cut objectives. |
126 | The Log-Shift Penalty for Adaptive Estimation of Multiple Gaussian Graphical Models | Yuancheng Zhu, Rina Foygel Barber | To estimate multiple related Gaussian graphical models on the same set of variables, we formulate a hierarchical model, which leads to an optimization problem with a nonconvex log-shift penalty function. |
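Two of the highlights above describe constructions concrete enough to sketch in code. The first sketch relates to paper 38 (Hashimoto et al.): building a directed, unweighted graph by connecting vertex i to vertex j whenever ‖x_i − x_j‖ < ε(x_i). This is only a minimal illustration; in particular, taking ε(x_i) to be the distance to the k-th nearest neighbor is our assumption for the sketch, not the paper's construction.

```python
import numpy as np

def directed_epsilon_graph(X, k=5):
    """Toy sketch: connect i -> j iff ||x_i - x_j|| < eps(x_i).

    eps(x_i) is taken here to be the distance from x_i to its k-th
    nearest neighbor -- an illustrative assumption, not the paper's choice.
    """
    n = X.shape[0]
    # Pairwise Euclidean distances, shape (n, n).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Per-vertex threshold: distance to the k-th nearest neighbor
    # (column 0 of each sorted row is the point itself, at distance 0).
    eps = np.sort(D, axis=1)[:, k]
    # Directed, unweighted adjacency matrix; thresholds differ per row,
    # so A is generally asymmetric. Self-loops are removed.
    A = (D < eps[:, None]) & ~np.eye(n, dtype=bool)
    return A

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))    # 100 latent points in R^3
A = directed_epsilon_graph(X, k=5)
print(A.sum(axis=1)[:10])            # out-degrees of the first ten vertices
```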
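The second sketch relates to paper 114 (Wang et al.): evaluating the ℓ1 norm of first-order discrete graph differences, Σ_{(i,j)∈E} |β_i − β_j|, the graph analogue of the fused lasso penalty. Only this lowest-order penalty evaluation is shown; the paper's estimators and higher-order graph differences are beyond the sketch.

```python
import numpy as np

def graph_difference_penalty(beta, edges):
    """First-order graph-difference penalty: sum over edges of |beta_i - beta_j|.

    Only the lowest-order member of the penalty family is shown here;
    higher-order graph differences and the full estimator are not.
    """
    i, j = np.asarray(edges).T
    return np.abs(beta[i] - beta[j]).sum()

# Usage on a 4-cycle: a piecewise-constant signal pays only at the two
# edges where its value changes.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
beta = np.array([1.0, 1.0, 3.0, 3.0])
print(graph_difference_penalty(beta, edges))  # |0| + |-2| + |0| + |2| = 4.0
```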