Paper Digest: AISTATS 2017 Highlights

June 17, 2017 | June 18, 2020 | admin

Readers can also read this highlights article on our console, which lets users filter papers by keyword and find related papers.

The International Conference on Artificial Intelligence and Statistics (AISTATS) is an interdisciplinary gathering of researchers at the intersection of computer science, artificial intelligence, machine learning, statistics, and related areas.

To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.

If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.

Paper Digest Team
team@paperdigest.org

TABLE 1: AISTATS 2017 Papers

No.	Title	Authors	Highlight
1	Minimax Gaussian Classification & Clustering	Tianyang Li, Xinyang Yi, Constantine Caramanis, Pradeep Ravikumar	We present minimax bounds for classification and clustering error in the setting where covariates are drawn from a mixture of two isotropic Gaussian distributions.
2	Conditions beyond treewidth for tightness of higher-order LP relaxations	Mark Rowland, Aldo Pacchiano, Adrian Weller	We consider binary pairwise models and introduce new methods which allow us to demonstrate refined conditions for tightness of LP relaxations in the Sherali-Adams hierarchy.
3	Large-Scale Data-Dependent Kernel Approximation	Catalin Ionescu, Alin Popa, Cristian Sminchisescu	Here we derive an approximate large-scale learning procedure for data-dependent kernels that is efficient and performs well in practice.
4	Clustering from Multiple Uncertain Experts	Yale Chang, Junxiang Chen, Michael Cho, Peter Castaldi, Ed Silverman, Jennifer Dy	To model the uncertainty in constraints from different experts, we build a probabilistic model for pairwise constraints through jointly modeling each expert’s accuracy and the mapping from features to latent cluster assignments.
5	Online Nonnegative Matrix Factorization with General Divergences	Renbo Zhao, Vincent Tan, Huan Xu	We develop a unified and systematic framework for performing online nonnegative matrix factorization under a wide variety of important divergences.
6	ASAGA: Asynchronous Parallel SAGA	Rémi Leblond, Fabian Pedregosa, Simon Lacoste-Julien	We describe ASAGA, an asynchronous parallel version of the incremental gradient algorithm SAGA that enjoys fast linear convergence rates.
7	Lower Bounds on Active Learning for Graphical Model Selection	Jonathan Scarlett, Volkan Cevher	We consider the problem of estimating the underlying graph associated with a Markov random field, with the added twist that the decoding algorithm can iteratively choose which subsets of nodes to sample based on the previous samples, resulting in an active learning setting.
8	Non-square matrix sensing without spurious local minima via the Burer-Monteiro approach	Dohyung Park, Anastasios Kyrillidis, Constantine Caramanis, Sujay Sanghavi	In this paper, we complement recent findings on the non-convex geometry of the analogous PSD setting [5], and show that matrix factorization does not introduce any spurious local minima, under RIP.
9	Sparse Accelerated Exponential Weights	Pierre Gaillard, Olivier Wintenberger	We introduce SAEW, a new procedure that accelerates exponential weights procedures with the slow rate $1/\sqrt{T}$ to procedures achieving the fast rate $1/T$.
10	On the Learnability of Fully-Connected Neural Networks	Yuchen Zhang, Jason Lee, Martin Wainwright, Michael I. Jordan	In this paper, we characterize the learnability of fully-connected neural networks via both positive and negative results.
11	An Information-Theoretic Route from Generalization in Expectation to Generalization in Probability	Ibrahim Alabdulmohsin	In this paper, we answer this question by proving that, while a generalization in expectation does not imply a generalization in probability, a uniform generalization in expectation does imply concentration.
12	Nearly Instance Optimal Sample Complexity Bounds for Top-k Arm Selection	Lijie Chen, Jian Li, Mingda Qiao	In this paper, we make progress towards a complete characterization of the instance-wise sample complexity bounds for the Best-k-Arm problem.
13	Guaranteed Non-convex Optimization: Submodular Maximization over Continuous Domains	Andrew An Bian, Baharan Mirzasoleiman, Joachim Buhmann, Andreas Krause	Specifically, i) We introduce the weak DR property that gives a unified characterization of submodularity for all set, integer-lattice and continuous functions; ii) for maximizing monotone DR-submodular continuous functions under general down-closed convex constraints, we propose a Frank-Wolfe variant with (1-1/e) approximation guarantee, and sub-linear convergence rate; iii) for maximizing general non-monotone submodular continuous functions subject to box constraints, we propose a DoubleGreedy algorithm with 1/3 approximation guarantee.
14	Tensor-Dictionary Learning with Deep Kruskal-Factor Analysis	Andrew Stevens, Yunchen Pu, Yannan Sun, Gregory Spell, Lawrence Carin	A multi-way factor analysis model is introduced for tensor-variate data of any order.
15	Consistent and Efficient Nonparametric Different-Feature Selection	Satoshi Hara, Takayuki Katsuki, Hiroki Yanagisawa, Takafumi Ono, Ryo Okamoto, Shigeki Takeuchi	We propose a feature selection method to find features that describe a difference in two probability distributions.
16	Annular Augmentation Sampling	Francois Fagan, Jalaj Bhandari, John Cunningham	In this work, we introduce an auxiliary variable MCMC scheme that samples from an annular augmented space, translating to a great circle path around the hypercube of the binary sample space.
17	Less than a Single Pass: Stochastically Controlled Stochastic Gradient	Lihua Lei, Michael Jordan	We develop and analyze a procedure for gradient-based optimization that we refer to as stochastically controlled stochastic gradient (SCSG).
18	Learning Time Series Detection Models from Temporally Imprecise Labels	Roy Adams, Ben Marlin	In this paper, we consider a new low-quality label learning problem: learning time series detection models from temporally imprecise labels.
19	Learning Cost-Effective and Interpretable Treatment Regimes	Himabindu Lakkaraju, Cynthia Rudin	In this work, we aim to automate this task of learning cost-effective, interpretable and actionable treatment regimes.
20	Linear Thompson Sampling Revisited	Marc Abeille, Alessandro Lazaric	We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting.
21	A Sub-Quadratic Exact Medoid Algorithm	James Newling, Francois Fleuret	We present a new algorithm, ‘trimed’, for obtaining the medoid of a set, that is, the element of the set which minimises the mean distance to all other elements.
22	Minimax Density Estimation for Growing Dimension	Daniel McDonald	This paper presents minimax rates for density estimation when the data dimension $d$ is allowed to grow with the number of observations $n$ rather than remaining fixed as in previous analyses.
23	Estimating Density Ridges by Direct Estimation of Density-Derivative-Ratios	Hiroaki Sasaki, Takafumi Kanamori, Masashi Sugiyama	To overcome these problems, we propose a novel method that directly estimates the ratios without going through density estimation and division.
24	Learning Theory for Conditional Risk Minimization	Alexander Zimin, Christoph Lampert	In this work we study the learnability of stochastic processes with respect to the conditional risk, i.e. the existence of a learning algorithm that improves its next-step performance with the amount of observed data.
25	Near-optimal Bayesian Active Learning with Correlated and Noisy Tests	Yuxin Chen, Hamed Hassani, Andreas Krause	We propose ECED, a novel, efficient active learning algorithm, and prove strong theoretical guarantees that hold with correlated, noisy tests.
26	Learning Nash Equilibrium for General-Sum Markov Games from Batch Data	Julien Perolat, Florian Strub, Bilal Piot, Olivier Pietquin	In this paper, we introduce a new definition of $ε$-Nash equilibrium in MGs which grasps the strategy’s quality for multiplayer games.
27	Distance Covariance Analysis	Benjamin Cowley, Joao Semedo, Amin Zandvakili, Matthew Smith, Adam Kohn, Byron Yu	We propose a dimensionality reduction method to identify linear projections that capture interactions between two or more sets of variables.
28	Phase Retrieval Meets Statistical Learning Theory: A Flexible Convex Relaxation	Sohail Bahmani, Justin Romberg	We propose a flexible convex relaxation for the phase retrieval problem that operates in the natural domain of the signal.
29	Regret Bounds for Lifelong Learning	Pierre Alquier, The Tien Mai, Massimiliano Pontil	We propose a lifelong learning strategy which refines the underlying data representation used by the within-task algorithm, thereby transferring information from one task to the next.
30	Poisson intensity estimation with reproducing kernels	Seth Flaxman, Yee Whye Teh, Dino Sejdinovic	In this paper we develop a new, computationally tractable Reproducing Kernel Hilbert Space (RKHS) formulation for the inhomogeneous Poisson process.
31	Generalized Pseudolikelihood Methods for Inverse Covariance Estimation	Alnur Ali, Kshitij Khare, Sang-Yun Oh, Bala Rajaratnam	We present a fast algorithm as well as screening rules that make computing the PseudoNet estimate over a range of tuning parameters tractable.
32	Removing Phase Transitions from Gibbs Measures	Ian Fellows, Mark Handcock	We introduce a modification to the Gibbs distribution that reduces the effects of phase transitions, and with properly chosen hyper-parameters, provably removes all multiphase behavior.
33	Performance Bounds for Graphical Record Linkage	Rebecca C. Steorts, Matthew Barnes, Willie Neiswanger	We provide an upper bound using the KL divergence and a lower bound on the minimum probability of misclassifying a latent entity.
34	Regret Bounds for Transfer Learning in Bayesian Optimisation	Alistair Shilton, Sunil Gupta, Santu Rana, Svetha Venkatesh	The second algorithm proposes a new way to model the difference between the source and target as a Gaussian process which is then used to adapt the source data.
35	Scaling Submodular Maximization via Pruned Submodularity Graphs	Tianyi Zhou, Hua Ouyang, Jeff Bilmes, Yi Chang, Carlos Guestrin	We propose a new random pruning method (called “submodular sparsification (SS)”) to reduce the cost of submodular maximization.
36	Localized Lasso for High-Dimensional Regression	Makoto Yamada, Koh Takeuchi, Tomoharu Iwata, John Shawe-Taylor, Samuel Kaski	We introduce the localized Lasso, which learns models that are both interpretable and highly predictive in problems with high dimensionality d and small sample size n.
37	Encrypted Accelerated Least Squares Regression	Pedro Esperanca, Louis Aslett, Chris Holmes	In this paper we present detailed analysis of coordinate and accelerated gradient descent algorithms which are capable of fitting least squares and penalised ridge regression models, using data encrypted under a fully homomorphic encryption scheme.
38	Random Consensus Robust PCA	Daniel Pimentel-Alarcon, Robert Nowak	This paper presents R2PCA, a random consensus method for robust principal component analysis.
39	Gray-box Inference for Structured Gaussian Process Models	Pietro Galliani, Amir Dezfouli, Edwin Bonilla, Novi Quadrianto	We develop an automated variational inference method for Bayesian structured prediction problems with Gaussian process (GP) priors and linear-chain likelihoods.
40	Frank-Wolfe Algorithms for Saddle Point Problems	Gauthier Gidel, Tony Jebara, Simon Lacoste-Julien	We extend the Frank-Wolfe (FW) optimization algorithm to solve constrained smooth convex-concave saddle point (SP) problems.
41	A Framework for Optimal Matching for Causal Inference	Nathan Kallus	We propose a novel framework for matching estimators for causal effect from observational data that is based on minimizing the dual norm of estimation error when expressed as an operator.
42	Quantifying the accuracy of approximate diffusions and Markov chains	Jonathan Huggins, James Zou	With the growth of large-scale datasets, the computational cost associated with simulating these stochastic processes can be considerable, and many algorithms have been proposed to approximate the underlying Markov chain or diffusion.
43	Stochastic Rank-1 Bandits	Sumeet Katariya, Branislav Kveton, Csaba Szepesvari, Claire Vernade, Zheng Wen	We propose a computationally-efficient algorithm for solving our problem, which we call Rank1Elim.
44	On the Troll-Trust Model for Edge Sign Prediction in Social Networks	Géraud Le Falher, Nicolo Cesa-Bianchi, Claudio Gentile, Fabio Vitale	We show that these heuristics can be understood, and rigorously analyzed, as approximators to the Bayes optimal classifier for a simple probabilistic model of the edge labels.
45	Online Optimization of Smoothed Piecewise Constant Functions	Vincent Cohen-Addad, Varun Kanade	We give algorithms that achieve sublinear regret in the full information and bandit settings.
46	Combinatorial Topic Models using Small-Variance Asymptotics	Ke Jiang, Suvrit Sra, Brian Kulis	In contrast, we approach topic modeling via combinatorial optimization, and take a small-variance limit of LDA to derive a new objective function.
47	ConvNets with Smooth Adaptive Activation Functions for Regression	Le Hou, Dimitris Samaras, Tahsin Kurc, Yi Gao, Joel Saltz	In this paper, we propose and apply AAFs on CNNs for regression tasks. We empirically evaluated CNNs with SAAFs and achieved state-of-the-art results on age and pose estimation datasets.
48	Rapid Mixing Swendsen-Wang Sampler for Stochastic Partitioned Attractive Models	Sejun Park, Yunhun Jang, Andreas Galanis, Jinwoo Shin, Daniel Stefankovic, Eric Vigoda	In this paper, we study the Swendsen-Wang dynamics, which is a more sophisticated Markov chain designed to overcome bottlenecks that impede the Gibbs sampler.
49	Efficient Rank Aggregation via Lehmer Codes	Pan Li, Arya Mazumdar, Olgica Milenkovic	We propose a novel rank aggregation method based on converting permutations into their corresponding Lehmer codes or other subdiagonal images.
50	Nonlinear ICA of Temporally Dependent Stationary Sources	Aapo Hyvarinen, Hiroshi Morioka	We introduce a nonlinear generative model where the independent sources are assumed to be temporally dependent, non-Gaussian, and stationary, and we observe arbitrarily nonlinear mixtures of them.
51	Stochastic Difference of Convex Algorithm and its Application to Training Deep Boltzmann Machines	Atsushi Nitanda, Taiji Suzuki	In this paper, we propose a stochastic variant of DC algorithm and give computational complexities to converge to a stationary point under several situations.
52	Global Convergence of Non-Convex Gradient Descent for Computing Matrix Squareroot	Prateek Jain, Chi Jin, Sham Kakade, Praneeth Netrapalli	A key contribution of our work is the general proof technique which we believe should further excite research in understanding deterministic and stochastic variants of simple non-convex gradient descent algorithms with good global convergence rates for other problems in machine learning and numerical linear algebra.
53	Reparameterization Gradients through Acceptance-Rejection Sampling Algorithms	Christian Naesseth, Francisco Ruiz, Scott Linderman, David Blei	We propose a new method that lets us leverage reparameterization gradients even when variables are outputs of an acceptance-rejection sampling algorithm.
54	Asymptotically exact inference in differentiable generative models	Matthew Graham, Amos Storkey	We present a method for performing efficient MCMC inference in such models when conditioning on observations of the model output.
55	Decentralized Collaborative Learning of Personalized Models over Networks	Paul Vanhaesebrouck, Aurélien Bellet, Marc Tommasi	The question addressed in this paper is: how can agents improve upon their locally trained model by communicating with other agents that have similar objectives?
56	Contextual Bandits with Latent Confounders: An NMF Approach	Rajat Sen, Karthikeyan Shanmugam, Murat Kocaoglu, Alex Dimakis, Sanjay Shakkottai	This insight enables us to propose an $ε$-greedy NMF-Bandit algorithm that designs a sequence of interventions (selecting specific arms), that achieves a balance between learning this low-dimensional structure and selecting the best arm to minimize regret.
57	Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets	Aaron Klein, Stefan Falkner, Simon Bartels, Philipp Hennig, Frank Hutter	To accelerate hyperparameter optimization, we propose a generative model for the validation error as a function of training set size, which is learned during the optimization process and allows exploration of preliminary configurations on small subsets, by extrapolating to the full dataset.
58	Least-Squares Log-Density Gradient Clustering for Riemannian Manifolds	Mina Ashizawa, Hiroaki Sasaki, Tomoya Sakai, Masashi Sugiyama	In this paper, we combine these ideas and propose a novel mode-seeking algorithm for Riemannian manifolds with direct density-gradient estimation.
59	Fast column generation for atomic norm regularization	Marina Vinyes, Guillaume Obozinski	We consider optimization problems that consist in minimizing a quadratic function under an atomic norm regularization or constraint.
60	Bayesian Hybrid Matrix Factorisation for Data Integration	Thomas Brouwer, Pietro Lio	We introduce a novel Bayesian hybrid matrix factorisation model (HMF) for data integration, based on combining multiple matrix factorisation methods, that can be used for in- and out-of-matrix prediction of missing values.
61	Co-Occurring Directions Sketching for Approximate Matrix Multiply	Youssef Mroueh, Etienne Marcheret, Vaibhava Goel	We introduce co-occurring directions sketching, a deterministic algorithm for approximate matrix product (AMM), in the streaming model.
62	Exploration-Exploitation in MDPs with Options	Ronan Fruit, Alessandro Lazaric	In this paper, we derive an upper and lower bound on the regret of a variant of UCRL using options.
63	Local Perturb-and-MAP for Structured Prediction	Gedas Bertasius, Qiang Liu, Lorenzo Torresani, Jianbo Shi	In this work, we present a new Local Perturb-and-MAP (locPMAP) framework that replaces the global optimization with a local optimization by exploiting our observed connection between locPMAP and the pseudolikelihood of the original CRF model.
64	Gradient Boosting on Stochastic Data Streams	Hanzhang Hu, Wen Sun, Arun Venkatraman, Martial Hebert, Andrew Bagnell	In this work, we investigate the problem of adapting batch gradient boosting for minimizing convex loss functions to the online setting, where the loss at each iteration is sampled i.i.d. from an unknown distribution.
65	Online Learning and Blackwell Approachability with Partial Monitoring: Optimal Convergence Rates	Joon Kwon, Vianney Perchet	We construct, for the first time, approachability algorithms with convergence rate of order $O(T^{-1/2})$ when the signal is independent of the decision and of order $O(T^{-1/3})$ in the case of general signals.
66	Tensor Decompositions via Two-Mode Higher-Order SVD (HOSVD)	Miaoyan Wang, Yun Song	Here, we present a new method built on Kruskal’s uniqueness theorem to decompose symmetric, nearly orthogonally decomposable tensors.
67	Beta calibration: a well-founded and easily implemented improvement on logistic calibration for binary classifiers	Meelis Kull, Telmo Silva Filho, Peter Flach	In this paper we solve all these problems with a richer class of calibration maps based on the beta distribution.
68	Detecting Dependencies in Sparse, Multivariate Databases Using Probabilistic Programming and Non-parametric Bayes	Feras Saad, Vikash Mansinghka	This paper proposes an approach that combines probabilistic programming, information theory, and non-parametric Bayes.
69	High-dimensional Time Series Clustering via Cross-Predictability	Dezhi Hong, Quanquan Gu, Kamin Whitehouse	In this paper, we explore a new similarity metric called “cross-predictability”: the degree to which a future value in each time series is predicted by past values of the others.
70	Minimax Approach to Variable Fidelity Data Interpolation	Alexey Zaytsev, Evgeny Burnaev	In this paper we obtain minimax interpolation errors for single and variable fidelity scenarios for a multivariate Gaussian process regression.
71	Data Driven Resource Allocation for Distributed Learning	Travis Dick, Mu Li, Venkata Krishna Pillutla, Colin White, Nina Balcan, Alex Smola	We present an in-depth analysis of this model, providing new algorithms with provable worst-case guarantees, analysis proving existing scalable heuristics perform well in natural non worst-case conditions, and techniques for extending a dispatching rule from a small sample to the entire distribution.
72	Learning Nonparametric Forest Graphical Models with Prior Information	Yuancheng Zhu, Zhe Liu, Siqi Sun	We present a framework for incorporating prior information into nonparametric estimation of graphical models.
73	Sparse Randomized Partition Trees for Nearest Neighbor Search	Kaushik Sinha, Omid Keivani	Inspired by the fast Johnson-Lindenstrauss transform, in this paper, we propose a sparse version of randomized partition tree where each internal node needs to store only a few non-zero entries, as opposed to all $d$ entries, leading to significant space savings without sacrificing much in terms of nearest neighbor search accuracy.
74	Horde of Bandits using Gaussian Markov Random Fields	Sharan Vaswani, Mark Schmidt, Laks Lakshmanan	Despite its effectiveness, the existing GOB model can only be applied to small problems due to its quadratic time-dependence on the number of nodes.
75	Random projection design for scalable implicit smoothing of randomly observed stochastic processes	Francois Belletti, Evan Sparks, Alexandre Bayen, Joseph Gonzalez	In this paper we present a novel estimator for cross-covariance of randomly observed time series which unravels the dynamics of an unobserved stochastic process.
76	Trading off Rewards and Errors in Multi-Armed Bandits	Akram Erraqabi, Alessandro Lazaric, Michal Valko, Emma Brunskill, Yun-En Liu	In this paper, we formalize this tradeoff and introduce the ForcingBalance algorithm whose performance is provably close to the best possible tradeoff strategy.
77	Adaptive ADMM with Spectral Penalty Parameter Selection	Zheng Xu, Mario Figueiredo, Tom Goldstein	We tackle this weakness of ADMM by proposing a method that adaptively tunes the penalty parameter to achieve fast convergence.
78	The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits	Tor Lattimore, Csaba Szepesvari	We analyse the asymptotic regret and show matching upper and lower bounds on what is achievable.
79	Dynamic Collaborative Filtering With Compound Poisson Factorization	Ghassen Jerfel, Mehmet Basbug, Barbara Engelhardt	Here, we propose a new conjugate and numerically stable dynamic matrix factorization (DCPF) based on hierarchical Poisson factorization that models the smoothly drifting latent factors using gamma-Markov chains.
80	Rank Aggregation and Prediction with Item Features	Kai-Yang Chiang, Cho-Jui Hsieh, Inderjit Dhillon	Observing that traditional rank aggregation methods disregard features, while models adapted from learning-to-rank task are sensitive to feature noise, we propose a general model to learn a total ranking by balancing between comparisons and feature information jointly.
81	Robust and Efficient Computation of Eigenvectors in a Generalized Spectral Method for Constrained Clustering	Chengming Jiang, Huiqing Xie, Zhaojun Bai	In this paper, we provide solutions to these two critical issues.
82	Information-theoretic limits of Bayesian network structure learning	Asish Ghoshal, Jean Honorio	In this paper, we study the information-theoretic limits of learning the structure of Bayesian networks (BNs), on discrete as well as continuous random variables, from a finite number of samples.
83	Markov Chain Truncation for Doubly-Intractable Inference	Colin Wei, Iain Murray	We demonstrate how to construct unbiased estimates for 1/Z given access to black-box importance sampling estimators for Z.
84	Regression Uncertainty on the Grassmannian	Yi Hong, Xiao Yang, Roland Kwitt, Martin Styner, Marc Niethammer	This paper develops an approach to compute confidence intervals for geodesic regression models.
85	Attributing Hacks	Ziqi Liu, Alex Smola, Kyle Soska, Yu-Xiang Wang, Qinghua Zheng	In this paper, we describe an algorithm for estimating the provenance of hacks on websites.
86	Unsupervised Sequential Sensor Acquisition	Manjesh Hanawal, Csaba Szepesvari, Venkatesh Saligrama	Our objective is to learn strategies for selecting tests to optimize accuracy and costs.
87	A Stochastic Nonconvex Splitting Method for Symmetric Nonnegative Matrix Factorization	Songtao Lu, Mingyi Hong, Zhengdao Wang	In this paper, we consider a stochastic SymNMF problem in which the observation matrix is generated in a random and sequential manner.
88	Hierarchically-partitioned Gaussian Process Approximation	Byung-Jun Lee, Jongmin Lee, Kee-Eung Kim	In this paper, we introduce a hierarchical model based on local GP for large-scale datasets, which stacks inducing points over inducing points in layers.
89	Scalable Learning of Non-Decomposable Objectives	Elad Eban, Mariano Schain, Alan Mackey, Ariel Gordon, Ryan Rifkin, Gal Elidan	In this work we present a unified framework that, using straightforward building block bounds, allows for highly scalable optimization of a wide range of ranking-based objectives.
90	CPSG-MCMC: Clustering-Based Preprocessing method for Stochastic Gradient MCMC	Tianfan Fu, Zhihua Zhang	In this paper, we propose an effective subsampling strategy to reduce the variance based on a failed attempt to do importance sampling.
91	Comparison-Based Nearest Neighbor Search	Siavash Haghiri, Debarghya Ghoshdastidar, Ulrike von Luxburg	We focus on a simple yet effective algorithm that recursively splits the space by first selecting two random pivot points and then assigning all other points to the closer of the two (comparison tree).
92	A Unified Optimization View on Generalized Matching Pursuit and Frank-Wolfe	Francesco Locatello, Rajiv Khanna, Michael Tschannen, Martin Jaggi	In this paper we take a unified view on both classes of methods, leading to the first explicit convergence rates of matching pursuit methods in an optimization sense, for general sets of atoms.
93	Faster Coordinate Descent via Adaptive Importance Sampling	Dmytro Perekrestenko, Volkan Cevher, Martin Jaggi	In this work, we introduce new adaptive rules for the random selection of their updates.
94	Conjugate-Computation Variational Inference: Converting Variational Inference in Non-Conjugate Models to Inferences in Conjugate Models	Mohammad Khan, Wu Lin	In this paper, we propose a new algorithm called Conjugate-computation Variational Inference (CVI) which brings the best of the two worlds together – it uses conjugate computations for the conjugate terms and employs stochastic gradients for the rest.
95	Hit-and-Run for Sampling and Planning in Non-Convex Spaces	Yasin Abbasi-Yadkori, Peter Bartlett, Victor Gabillon, Alan Malek	We propose the Hit-and-Run algorithm for planning and sampling problems in non-convex spaces.
96	DP-EM: Differentially Private Expectation Maximization	Mijung Park, James Foulds, Kamalika Chaudhuri, Max Welling	We propose a practical private EM algorithm that overcomes this challenge using two innovations: (1) a novel moment perturbation formulation for differentially private EM (DP-EM), and (2) the use of two recently developed composition methods to bound the privacy “cost” of multiple EM iterations: the moments accountant (MA) and zero-mean concentrated differential privacy (zCDP).
97	On the Hyperprior Choice for the Global Shrinkage Parameter in the Horseshoe Prior	Juho Piironen, Aki Vehtari	The horseshoe prior has proven to be a noteworthy alternative for sparse Bayesian estimation, but as shown in this paper, the results can be sensitive to the prior choice for the global shrinkage hyperparameter.
98	Bayesian Learning and Inference in Recurrent Switching Linear Dynamical Systems	Scott Linderman, Matthew Johnson, Andrew Miller, Ryan Adams, David Blei, Liam Paninski	Building on switching linear dynamical systems (SLDS), we develop a model class and Bayesian inference algorithms that not only discover these dynamical units but also, by learning how transition probabilities depend on observations or continuous latent states, explain their switching behavior.
99	Efficient Algorithm for Sparse Tensor-variate Gaussian Graphical Models via Gradient Descent	Pan Xu, Tingting Zhang, Quanquan Gu	In order to estimate the precision matrices, we propose a sparsity constrained maximum likelihood estimator.
100	Minimax-optimal semi-supervised regression on unknown manifolds	Amit Moscovich, Ariel Jaffe, Boaz Nadler	We consider semi-supervised regression when the predictor variables are drawn from an unknown manifold.
101	Improved Strongly Adaptive Online Learning using Coin Betting	Kwang-Sung Jun, Francesco Orabona, Stephen Wright, Rebecca Willett	This paper describes a new parameter-free online learning algorithm for changing environments.
102	Black-box Importance Sampling	Qiang Liu, Jason Lee	We address this problem by studying black-box importance sampling methods that calculate importance weights for samples generated from any unknown proposal or black-box mechanism.
103	Fairness Constraints: Mechanisms for Fair Classification	Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, Krishna P. Gummadi	In this paper, we introduce a flexible mechanism to design fair classifiers by leveraging a novel intuitive measure of decision boundary (un)fairness.
104	Frequency Domain Predictive Modelling with Aggregated Data	Avradeep Bhowmik, Joydeep Ghosh, Oluwasanmi Koyejo	In this manuscript we investigate the problem of predictive linear modelling in the scenario where data is aggregated in a non-uniform manner across targets and features.
105	A Unified Computational and Statistical Framework for Nonconvex Low-rank Matrix Estimation	Lingxiao Wang, Xiao Zhang, Quanquan Gu	We propose a unified framework for estimating low-rank matrices through nonconvex optimization based on gradient descent algorithm.
106	A New Class of Private Chi-Square Hypothesis Tests	Ryan Rogers, Daniel Kifer	In this paper, we develop new test statistics for hypothesis testing over differentially private data.
107	A Learning Theory of Ranking Aggregation	Anna Korba, Stéphan Clemencon, Eric Sibony	This paper develops a statistical learning theory for ranking aggregation in a general probabilistic setting (avoiding any rigid ranking model assumptions), assessing the generalization ability of empirical ranking medians.
108	Anomaly Detection in Extreme Regions via Empirical MV-sets on the Sphere	Albert Thomas, Stéphan Clemencon, Alexandre Gramfort, Anne Sabourin	This paper presents an unsupervised algorithm for anomaly detection in extreme regions.
109	Structured adaptive and random spinners for fast machine learning computations	Mariusz Bojarski, Anna Choromanska, Krzysztof Choromanski, Francois Fagan, Cedric Gouy-Pailler, Anne Morvan, Nouri Sakr, Tamas Sarlos, Jamal Atif	The proposed framework comes with theoretical guarantees characterizing the capacity of the structured model in reference to its unstructured counterpart and is based on a general theoretical principle that we describe in the paper.
110	Complementary Sum Sampling for Likelihood Approximation in Large Scale Classification	Aleksandar Botev, Bowen Zheng, David Barber	We consider training probabilistic classifiers in the case that the number of classes is too large to perform exact normalisation over all classes.
111	Learning Optimal Interventions	Jonas Mueller, David Reshef, George Du, Tommi Jaakkola	Our goal is to identify beneficial interventions from observational data.
112	A Lower Bound on the Partition Function of Attractive Graphical Models in the Continuous Case	Nicholas Ruozzi	In this work, we use graph covers to extend several such results from the discrete case to the continuous case.
113	Scalable Variational Inference for Super Resolution Microscopy	Ruoxi Sun, Evan Archer, Liam Paninski	In this paper we develop new Bayesian image processing methods that extend the reach of super-resolution microscopy even further.
114	Linear Convergence of Stochastic Frank Wolfe Variants	Donald Goldfarb, Garud Iyengar, Chaoxu Zhou	In this paper, we show that the Away-step Stochastic Frank-Wolfe (ASFW) and Pairwise Stochastic Frank-Wolfe (PSFW) algorithms converge linearly in expectation.
115	Sequential Graph Matching with Sequential Monte Carlo	Seong-Hwan Jun, Samuel W.K. Wong, James Zidek, Alexandre Bouchard-Cote	We develop a novel probabilistic model for graph matchings and develop practical inference methods for supervised and unsupervised learning of the parameters of this model.
116	Fast rates with high probability in exp-concave statistical learning	Nishant Mehta	We present an algorithm for the statistical learning setting with a bounded exp-concave loss in $d$ dimensions that obtains excess risk $O(d \log(1/δ)/n)$ with probability $1 - δ$.
117	Generalization Error of Invariant Classifiers	Jure Sokolic, Raja Giryes, Guillermo Sapiro, Miguel Rodrigues	This paper studies the generalization error of invariant classifiers.
118	Learning with Feature Feedback: from Theory to Practice	Stefanos Poulis, Sanjoy Dasgupta	In this paper, we examine a particular type of feature feedback that has been used, with some success, in information retrieval and in computer vision.
119	Optimistic Planning for the Stochastic Knapsack Problem	Ciara Pike-Burke, Steffen Grunewalder	We derive and study an optimistic planning algorithm specifically designed for the stochastic knapsack problem.
120	Identifying Groups of Strongly Correlated Variables through Smoothed Ordered Weighted $L_1$-norms	Raman Sankaran, Francis Bach, Chiranjib Bhattacharya	In this paper we take a submodular perspective and show that OWL can be posed as the Lovász extension of a suitably defined submodular function.
121	Tracking Objects with Higher Order Interactions via Delayed Column Generation	Shaofei Wang, Steffen Wolf, Charless Fowlkes, Julian Yarkony	We present a relaxation of this combinatorial problem that uses a column generation formulation where the pricing problem is solved via dynamic programming to efficiently explore the space of tracks.
122	Belief Propagation in Conditional RBMs for Structured Prediction	Wei Ping, Alex Ihler	In this work, we present a matrix-based implementation of belief propagation algorithms on CRBMs, which is easily scalable to tens of thousands of visible and hidden units.
123	Sketching Meets Random Projection in the Dual: A Provable Recovery Algorithm for Big and High-dimensional Data	Jialei Wang, Jason Lee, Mehrdad Mahdavi, Mladen Kolar, Nati Srebro	In this paper, we study sketching from an optimization point of view.
124	Finite-sum Composition Optimization via Variance Reduced Gradient Descent	Xiangru Lian, Mengdi Wang, Ji Liu	In this paper, we consider the finite-sum scenario for composition optimization: $\min_x f(x) := \frac{1}{n} \sum_{i=1}^{n} F_i \left( \frac{1}{m} \sum_{j=1}^{m} G_j(x) \right)$.
125	A Fast and Scalable Joint Estimator for Learning Multiple Related Sparse Gaussian Graphical Models	Beilun Wang, Ji Gao, Yanjun Qi	We propose a novel approach, FASJEM, for fast and scalable joint structure-estimation of multiple sGGMs at a large scale.
126	Communication-efficient Distributed Sparse Linear Discriminant Analysis	Lu Tian, Quanquan Gu	We propose a communication-efficient distributed estimation method for sparse linear discriminant analysis (LDA) in the high dimensional regime.
127	Sketchy Decisions: Convex Low-Rank Matrix Optimization with Optimal Storage	Alp Yurtsever, Madeleine Udell, Joel Tropp, Volkan Cevher	It presents the first algorithm that uses optimal storage and provably computes a low-rank approximation of a solution.
128	Modal-set estimation with an application to clustering	Heinrich Jiang, Samory Kpotufe	We present a procedure that can estimate – with statistical consistency guarantees – any local-maxima of a density, under benign distributional conditions.
129	Compressed Least Squares Regression revisited	Martin Slawski	As a fix, we subsequently present a modified analysis with meaningful implications that much better reflects empirical results with simulated and real data.
130	Diverse Neural Network Learns True Target Functions	Bo Xie, Yingyu Liang, Le Song	In this paper, we answer these questions by analyzing one-hidden-layer neural networks with ReLU activation, and show that despite the non-convexity, neural networks with diverse units have no spurious local minima.
131	Local Group Invariant Representations via Orbit Embeddings	Anant Raj, Abhishek Kumar, Youssef Mroueh, Tom Fletcher, Bernhard Schoelkopf	We consider transformations that form a group and propose an approach based on kernel methods to derive local group invariant representations.
132	Relativistic Monte Carlo	Xiaoyu Lu, Valerio Perrone, Leonard Hasenclever, Yee Whye Teh, Sebastian Vollmer	In order to alleviate these problems we propose relativistic Hamiltonian Monte Carlo, a version of HMC based on relativistic dynamics that introduces a maximum velocity on particles.
133	Thompson Sampling for Linear-Quadratic Control Problems	Marc Abeille, Alessandro Lazaric	We consider the exploration-exploitation tradeoff in linear quadratic (LQ) control problems, where the state dynamics is linear and the cost function is quadratic in states and controls.
134	Fast Classification with Binary Prototypes	Kai Zhong, Ruiqi Guo, Sanjiv Kumar, Bowei Yan, David Simcha, Inderjit Dhillon	In this work, we propose a new technique for fast k-nearest neighbor (k-NN) classification in which the original database is represented via a small set of learned binary prototypes.
135	Prediction Performance After Learning in Gaussian Process Regression	Johan Wagberg, Dave Zachariah, Thomas Schon, Petre Stoica	This paper considers the quantification of the prediction performance in Gaussian process regression.
136	Communication-Efficient Learning of Deep Networks from Decentralized Data	Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, Blaise Aguera y Arcas	We present a practical method for the federated learning of deep networks based on iterative model averaging, and conduct an extensive empirical evaluation, considering five different model architectures and four datasets.
137	Learning Structured Weight Uncertainty in Bayesian Neural Networks	Shengyang Sun, Changyou Chen, Lawrence Carin	In this paper, we consider the matrix variate Gaussian (MVG) distribution to model structured correlations within the weights of a DNN.
138	Signal-based Bayesian Seismic Monitoring	David Moore, Stuart Russell	We formulate this task as Bayesian inference and propose a generative model of seismic events and signals across a network of spatially distributed stations.
139	Learning the Network Structure of Heterogeneous Data via Pairwise Exponential Markov Random Fields	Youngsuk Park, David Hallac, Stephen Boyd, Jure Leskovec	Here, we define the pairwise exponential Markov random field (PE-MRF), an approach capable of modeling exponential family distributions in heterogeneous domains.
140	Discovering and Exploiting Additive Structure for Bayesian Optimization	Jacob Gardner, Chuan Guo, Kilian Weinberger, Roman Garnett, Roger Grosse	We propose an efficient algorithm based on Metropolis-Hastings sampling and demonstrate its efficacy empirically on synthetic and real-world data sets.
141	Lipschitz Density-Ratios, Structured Data, and Data-driven Tuning	Samory Kpotufe	Lipschitz Density-Ratios, Structured Data, and Data-driven Tuning
142	Spatial Decompositions for Large Scale SVMs	Philipp Thomann, Ingrid Blaschzyk, Mona Meister, Ingo Steinwart	In this work we investigate a decomposition strategy that learns on small, spatially defined data chunks.
143	Inference Compilation and Universal Probabilistic Programming	Tuan Anh Le, Atilim Gunes Baydin, Frank Wood	We introduce a method for using deep neural networks to amortize the cost of inference in models from the family induced by universal probabilistic programming languages, establishing a framework that combines the strengths of probabilistic programming and deep learning methods.
144	Active Positive Semidefinite Matrix Completion: Algorithms, Theory and Applications	Aniruddha Bhargava, Ravi Ganti, Rob Nowak	In this paper we provide simple, computationally efficient, active algorithms for completion of symmetric positive semidefinite matrices.
145	Information Projection and Approximate Inference for Structured Sparse Variables	Rajiv Khanna, Joydeep Ghosh, Russell Poldrack, Oluwasanmi Koyejo	This manuscript goes beyond classical sparsity by proposing efficient algorithms for approximate inference via information projection that are applicable to any structure on the set of variables that admits enumeration using matroid or knapsack constraints.
146	On the Interpretability of Conditional Probability Estimates in the Agnostic Setting	Yihan Gao, Aditya Parameswaran, Jian Peng	In this paper, we define a novel measure for the calibration property together with its empirical counterpart, and prove a uniform convergence result between them.
147	Linking Micro Event History to Macro Prediction in Point Process Models	Yichen Wang, Xiaojing Ye, Haomin Zhou, Hongyuan Zha, Le Song	In this paper, we propose a unifying framework with a jump stochastic differential equation model that systematically links the microscopic event data and macroscopic inference, and the theory to approximate its probability distribution.
148	Initialization and Coordinate Optimization for Multi-way Matching	Da Tang, Tony Jebara	We propose a coordinate update algorithm that directly optimizes the target objective.
149	Optimal Recovery of Tensor Slices	Vivek Farias, Andrew Li	We consider the problem of large scale matrix recovery given side information in the form of additional matrices of conforming dimension.
150	Efficient Online Multiclass Prediction on Graphs via Surrogate Losses	Alexander Rakhlin, Karthik Sridharan	We develop computationally efficient algorithms for online multi-class prediction.
151	Distribution of Gaussian Process Arc Lengths	Justin Bewsher, Alessandra Tosi, Michael Osborne, Stephen Roberts	We present the first treatment of the arc length of the GP with more than a single output dimension.
152	Distributed Adaptive Sampling for Kernel Matrix Approximation	Daniele Calandriello, Alessandro Lazaric, Michal Valko	In this paper, we introduce SQUEAK, a new algorithm for kernel approximation based on RLS sampling that sequentially processes the dataset, storing a dictionary which creates accurate kernel matrix approximations with a number of points that only depends on the effective dimension $d_{\text{eff}}(γ)$ of the dataset.
153	Binary and Multi-Bit Coding for Stable Random Projections	Ping Li	In this paper, we develop an estimation procedure for the $l_α$ norm of the signal, where $0 < α \leq 2$, from binary or multi-bit measurements.
154	Spectral Methods for Correlated Topic Models	Forough Arabshahi, Anima Anandkumar	In this paper we propose guaranteed spectral methods for learning a broad range of topic models, which generalize the popular Latent Dirichlet Allocation (LDA).
155	Label Filters for Large Scale Multilabel Classification	Alexandru Niculescu-Mizil, Ehsan Abbasnejad	To alleviate this problem we propose a two step approach where computationally efficient label filters pre-select a small set of candidate labels before the base multiclass or multilabel classifier is applied.
156	Learning from Conditional Distributions via Dual Embeddings	Bo Dai, Niao He, Yunpeng Pan, Byron Boots, Le Song	To address these challenges, we propose a novel approach which employs a new min-max reformulation of the learning from conditional distribution problem.
157	Sequential Multiple Hypothesis Testing with Type I Error Control	Alan Malek, Sumeet Katariya, Yinlam Chow, Mohammad Ghavamzadeh	This work studies multiple hypothesis testing in the setting when we obtain data sequentially and may choose when to stop sampling.
158	A Maximum Matching Algorithm for Basis Selection in Spectral Learning	Ariadna Quattoni, Xavier Carreras, Matthias Gallé	We present a solution to scale spectral algorithms for learning sequence functions.
159	Value-Aware Loss Function for Model-based Reinforcement Learning	Amir-Massoud Farahmand, Andre Barreto, Daniel Nikovski	We introduce a loss function that takes the structure of the value function into account.
160	Convergence Rate of Stochastic k-means	Cheng Tang, Claire Monteleoni	We analyze online (Bottou & Bengio, 1994) and mini-batch (Sculley, 2010) k-means variants.
161	Automated Inference with Adaptive Batches	Soham De, Abhay Yadav, David Jacobs, Tom Goldstein	We propose alternative “big batch” SGD schemes that adaptively grow the batch size over time to maintain a nearly constant signal-to-noise ratio in the gradient approximation.
162	Scalable Convex Multiple Sequence Alignment via Entropy-Regularized Dual Decomposition	Jiong Zhang, Ian En-Hsu Yen, Pradeep Ravikumar, Inderjit Dhillon	In this work, we propose an accelerated dual decomposition algorithm that exploits entropy regularization to induce closed-form solutions for each atomic-norm-constrained subproblem, giving a single-loop algorithm of iteration complexity linear in the problem size (total length of all sequences).
163	Robust Causal Estimation in the Large-Sample Limit without Strict Faithfulness	Ioan Gabriel Bucur, Tom Claassen, Tom Heskes	We introduce an alternative approach by replacing strict faithfulness with a prior that reflects the existence of many ’weak’ (irrelevant) and ’strong’ interactions.
164	Learning Graphical Games from Behavioral Data: Sufficient and Necessary Conditions	Asish Ghoshal, Jean Honorio	In this paper we obtain sufficient and necessary conditions on the number of samples required for exact recovery of the pure-strategy Nash equilibria (PSNE) set of a graphical game from noisy observations of joint actions.
165	Non-Count Symmetries in Boolean & Multi-Valued Prob. Graphical Models	Ankit Anand, Ritesh Noothigattu, Parag Singla, Mausam	In this paper, we present the first algorithms to compute non-count symmetries in both Boolean-valued and multi-valued domains.
166	Greedy Direction Method of Multiplier for MAP Inference of Large Output Domain	Xiangru Huang, Ian En-Hsu Yen, Ruohan Zhang, Qixing Huang, Pradeep Ravikumar, Inderjit Dhillon	In this paper, we introduce an effective MAP inference method for problems with large output domains.
167	Scalable Greedy Feature Selection via Weak Submodularity	Rajiv Khanna, Ethan Elenberg, Alex Dimakis, Sahand Negahban, Joydeep Ghosh	In this paper we show that, contrary to previously held opinion, submodularity is not required to obtain approximation guarantees for these two algorithms.