Paper Digest: AISTATS 2013 Highlights
The International Conference on Artificial Intelligence and Statistics (AISTATS) is an interdisciplinary gathering of researchers at the intersection of computer science, artificial intelligence, machine learning, statistics, and related areas.
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.
If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated with new conference digests.
Paper Digest Team
team@paperdigest.org
TABLE 1: AISTATS 2013 Papers
# | Title | Authors | Highlight
---|---|---|---
1 | Bayesian learning of joint distributions of objects | Anjishnu Banerjee, Jared Murray, David Dunson | We consider a general framework for nonparametric Bayes joint modeling through mixture models that incorporate dependence across data types through a joint mixing measure. |
2 | Permutation estimation and minimax rates of identifiability | Olivier Collier, Arnak Dalalyan | We address this problem from a statistical point of view and provide a theoretical analysis of the accuracy of several natural estimators. |
3 | A unifying representation for a class of dependent random measures | Nicholas Foti, Joseph Futoma, Daniel Rockmore, Sinead Williamson | We present a general construction for dependent random measures based on thinning Poisson processes on an augmented space. |
4 | Diagonal Orthant Multinomial Probit Models | James Johndrow, David Dunson, Kristian Lum | To address these problems, we propose a new class of diagonal orthant (DO) multinomial models. |
5 | Distributed Learning of Gaussian Graphical Models via Marginal Likelihoods | Zhaoshi Meng, Dennis Wei, Ami Wiesel, Alfred Hero III | In this paper, we propose a general framework for distributed estimation based on a maximum marginal likelihood (MML) approach. |
6 | Sparse Principal Component Analysis for High Dimensional Multivariate Time Series | Zhaoran Wang, Fang Han, Han Liu | We study sparse principal component analysis (sparse PCA) for high dimensional multivariate vector autoregressive (VAR) time series. |
7 | A Competitive Test for Uniformity of Monotone Distributions | Jayadev Acharya, Ashkan Jafarpour, Alon Orlitsky, Ananda Suresh | We propose a test that takes random samples drawn from a monotone distribution and decides whether or not the distribution is uniform. |
8 | Clustering Oligarchies | Margareta Ackerman, Shai Ben-David, David Loker, Sivan Sabato | k-means and several related techniques are robust when data is clusterable, and we provide a quantitative analysis capturing the precise relationship between clusterability and robustness. |
9 | Reconstructing ecological networks with hierarchical Bayesian regression and Mondrian processes | Andrej Aderhold, Dirk Husmeier, V. Anne Smith | Here, we describe a novel Bayesian regression and Mondrian process model (BRAMP) for reconstructing species interaction networks from observed field data. |
10 | Nystrom Approximation for Large-Scale Determinantal Processes | Raja Hafiz Affandi, Alex Kulesza, Emily Fox, Ben Taskar | In this paper we derive new error bounds for the Nystrom-approximated DPP and present empirical results to corroborate them. |
11 | Further Optimal Regret Bounds for Thompson Sampling | Shipra Agrawal, Navin Goyal | In this paper, we provide a novel regret analysis for Thompson Sampling that proves the first near-optimal problem-independent bound of O(\sqrt{NT \ln T}) on the expected regret of this algorithm. |
12 | Distributed and Adaptive Darting Monte Carlo through Regenerations | Sungjin Ahn, Yutian Chen, Max Welling | We propose an adaptive and distributed version of this method by using regenerations. |
13 | Consensus Ranking with Signed Permutations | Raman Arora, Marina Meila | This paper presents a tractable algorithm for learning consensus ranking between signed permutations under the inversion distance. |
14 | Ultrahigh Dimensional Feature Screening via RKHS Embeddings | Krishnakumar Balasubramanian, Bharath Sriperumbudur, Guy Lebanon | To overcome these issues, in this paper, we propose a novel Hilbert space embedding based approach to independence screening for ultrahigh dimensional data sets. |
15 | Meta-Transportability of Causal Effects: A Formal Approach | Elias Bareinboim, Judea Pearl | This paper considers the problem of transferring experimental findings learned from multiple heterogeneous domains to a different environment, in which only passive observations can be collected. |
16 | Convex Collective Matrix Factorization | Guillaume Bouchard, Dawei Yin, Shengbo Guo | Existing algorithms to estimate parameters of collective matrix factorization models are based on non-convex formulations of the problem; in this paper, a convex formulation of this approach is proposed. |
17 | Efficiently Sampling Probabilistic Programs via Program Analysis | Arun Chaganty, Aditya Nori, Sriram Rajamani | In this paper, we address two key challenges of this paradigm: (i) ensuring samples are well distributed in the combinatorial space of the program, and (ii) efficiently generating samples with minimal rejection. |
18 | Computing the M Most Probable Modes of a Graphical Model | Chao Chen, Vladimir Kolmogorov, Yan Zhu, Dimitris Metaxas, Christoph Lampert | We introduce the M-modes problem for graphical models: predicting the M label configurations of highest probability that are at the same time local maxima of the probability landscape, and present two algorithms for solving it. |
19 | A simple criterion for controlling selection bias | Eunice Yuh-Jie Chen, Judea Pearl | This paper presents a simple criterion for controlling selection bias in the odds ratio, a widely used measure for association between variables, that connects the nature of selection bias with the graph modeling the selection mechanism. |
20 | Evidence Estimation for Bayesian Partially Observed MRFs | Yutian Chen, Max Welling | For the first time we propose a comprehensive procedure to address one of the Bayesian estimation problems, approximating the evidence of partially observed MRFs based on the Laplace approximation. |
21 | Why Steiner-tree type algorithms work for community detection | Mung Chiang, Henry Lam, Zhenming Liu, Vincent Poor | We consider the problem of reconstructing a specific connected community S ⊂ V in a graph G = (V, E), where each node v is associated with a signal whose strength grows with the likelihood that v belongs to S. |
22 | A simple sketching algorithm for entropy estimation over streaming data | Peter Clifford, Ioana Cosma | We propose a family of asymptotically unbiased log-mean estimators of the Shannon entropy, indexed by a constant ζ > 0, that can be computed in a single-pass algorithm to provide an additive approximation. |
23 | Deep Gaussian Processes | Andreas Damianou, Neil Lawrence | In this paper we introduce deep Gaussian process (GP) models. |
24 | ODE parameter inference using adaptive gradient matching with Gaussian processes | Frank Dondelinger, Dirk Husmeier, Simon Rogers, Maurizio Filippone | The present paper discusses a method based on nonparametric Bayesian statistics with Gaussian processes due to Calderhead et al. (2008), and shows how inference in this model can be substantially improved by consistently sampling from the joint distribution of the ODE parameters and GP hyperparameters. |
25 | Uncover Topic-Sensitive Information Diffusion Networks | Nan Du, Le Song, Hyenkyun Woo, Hongyuan Zha | In this paper, we propose a continuous time model, TopicCascade, for topic-sensitive information diffusion networks, and infer the hidden diffusion networks and the topic dependent transmission rates from the observed time stamps and contents of cascades. |
26 | Stochastic blockmodeling of relational event dynamics | Christopher DuBois, Carter Butts, Padhraic Smyth | Several approaches have recently been proposed for modeling of continuous-time network data via dyadic event rates conditioned on the observed history of events and nodal or dyadic covariates. |
27 | Dynamic Copula Networks for Modeling Real-valued Time Series | Elad Eban, Gideon Rothschild, Adi Mizrahi, Israel Nelken, Gal Elidan | In this work we introduce Dynamic Copula Bayesian Networks, a generalization aimed at capturing the distribution of rich temporal sequences. |
28 | Data-driven covariate selection for nonparametric estimation of causal effects | Doris Entner, Patrik Hoyer, Peter Spirtes | In this contribution, we analyze the problem of inferring whether a given variable has a causal effect on another and, if it does, inferring an adjustment set of covariates that yields a consistent and unbiased estimator of this effect, based on the (conditional) independence and dependence relationships among the observed variables. |
29 | Learning to Top-K Search using Pairwise Comparisons | Brian Eriksson | In this paper we introduce techniques to resolve the top ranked items using significantly fewer than all the possible pairwise comparisons using both random and adaptive sampling methodologies. |
30 | Predictive Correlation Screening: Application to Two-stage Predictor Design in High Dimension | Hamed Firouzi, Bala Rajaratnam, Alfred Hero III | We introduce a new approach to variable selection, called Predictive Correlation Screening, for predictor design. |
31 | Mixed LICORS: A Nonparametric Algorithm for Predictive State Reconstruction | Georg Goerg, Cosma Shalizi | We introduce mixed LICORS, an algorithm for learning nonlinear, high-dimensional dynamics from spatio-temporal data, suitable for both prediction and simulation. |
32 | Unsupervised Link Selection in Networks | Quanquan Gu, Charu Aggarwal, Jiawei Han | In order to solve it efficiently, we propose a backward elimination algorithm using sequential optimization. |
33 | Clustered Support Vector Machines | Quanquan Gu, Jiawei Han | In this paper, we propose a Clustered Support Vector Machine (CSVM), which tackles the data in a divide and conquer manner. |
34 | DivMCuts: Faster Training of Structural SVMs with Diverse M-Best Cutting-Planes | Abner Guzman-Rivera, Pushmeet Kohli, Dhruv Batra | To find these diverse M-Best solutions, we employ a recently proposed algorithm [4]. |
35 | Recursive Karcher Expectation Estimators And Geometric Law of Large Numbers | Jeffrey Ho, Guang Cheng, Hesamoddin Salehian, Baba Vemuri | Specifically, we propose a recursive algorithm for estimating the Karcher expectation of an arbitrary distribution defined on Pn, and we show that the estimates computed by the recursive algorithm asymptotically converge in probability to the correct Karcher expectation. |
36 | DYNACARE: Dynamic Cardiac Arrest Risk Estimation | Joyce Ho, Yubin Park, Carlos Carvalho, Joydeep Ghosh | In this paper, we present two dynamic cardiac risk estimation models, focusing on different temporal signatures in a patient’s risk trajectory. |
37 | Active Learning for Interactive Visualization | Tomoharu Iwata, Neil Houlsby, Zoubin Ghahramani | We propose an active learning framework for interactive visualization which selects objects for the user to re-locate, so that they can obtain their desired visualization by re-locating as few objects as possible. |
38 | A Parallel, Block Greedy Method for Sparse Inverse Covariance Estimation for Ultra-high Dimensions | Prabhanjan Kambadur, Aurelie Lozano | In this paper, we present GINCO, a blocked greedy method for sparse inverse covariance matrix estimation. |
39 | Beyond Sentiment: The Manifold of Human Emotions | Seungyeon Kim, Fuxin Li, Guy Lebanon, Irfan Essa | In this paper we consider higher dimensional extensions of the sentiment concept, which represent a richer set of human emotions. |
40 | Exact Learning of Bounded Tree-width Bayesian Networks | Janne Korhonen, Pekka Parviainen | In this paper we aim to lay groundwork for future research on the topic by studying the exact complexity of this problem. |
41 | Structural Expectation Propagation (SEP): Bayesian structure learning for networks with latent variables | Nevena Lazic, Christopher Bishop, John Winn | Learning the structure of discrete Bayesian networks has been the subject of extensive research in machine learning, with most Bayesian approaches focusing on fully observed networks. |
42 | Structure Learning of Mixed Graphical Models | Jason Lee, Trevor Hastie | We present a new pairwise model for graphical models with both continuous and discrete variables that is amenable to structure learning. |
43 | Dynamic Scaled Sampling for Deterministic Constraints | Lei Li, Bharath Ramsundar, Stuart Russell | For the general continuous case, we propose a dynamic scaling algorithm (DYSC), and prove that it has O(k) expected running time and finite variance. |
44 | Learning Markov Networks With Arithmetic Circuits | Daniel Lowd, Amirmohammad Rooshenas | In this paper, we introduce ACMN, the first ever method for learning efficient Markov networks with arbitrary conjunctive features. |
45 | Texture Modeling with Convolutional Spike-and-Slab RBMs and Deep Extensions | Heng Luo, Pierre Luc Carrier, Aaron Courville, Yoshua Bengio | We show the resulting deep belief network (DBN) is a powerful generative model that improves on single-layer models and is capable of modeling not only single high-resolution and challenging textures but also multiple textures with fixed-size filters in the bottom layer. |
46 | Fast Near-GRID Gaussian Process Regression | Yuancheng Luo, Ramani Duraiswami | In practice, the inputs may also violate the multidimensional grid constraints so we pose and efficiently solve missing and extra data problems for both exact and sparse grid GPR. |
47 | Estimating the Partition Function of Graphical Models Using Langevin Importance Sampling | Jianzhu Ma, Jian Peng, Sheng Wang, Jinbo Xu | This paper describes a Langevin Importance Sampling (LIS) algorithm to compute the partition function of a graphical model. |
48 | Thompson Sampling in Switching Environments with Bayesian Online Change Detection | Joseph Mellor, Jonathan Shapiro | In this paper we derive and evaluate algorithms using Thompson Sampling for a Switching Multi-Armed Bandit Problem. |
49 | A Last-Step Regression Algorithm for Non-Stationary Online Learning | Edward Moroshko, Koby Crammer | We analyze the algorithm in the worst-case regret framework and show that it maintains an average loss close to that of the best slowly changing sequence of linear functions, as long as the total amount of drift is sublinear. |
50 | Competing with an Infinite Set of Models in Reinforcement Learning | Phuong Nguyen, Odalric-Ambrym Maillard, Daniil Ryabko, Ronald Ortner | The algorithm we propose avoids guessing the diameter, thus improving the regret bound. |
51 | Efficient Variational Inference for Gaussian Process Regression Networks | Trung Nguyen, Edwin Bonilla | In this paper we propose two efficient variational inference methods for GPRNs. |
52 | High-dimensional Inference via Lipschitz Sparsity-Yielding Regularizers | Zheng Pan, Changshui Zhang | In this paper, we prove that some non-convex regularizers can be such "good" regularizers. |
53 | Bayesian Structure Learning for Functional Neuroimaging | Mijung Park, Oluwasanmi Koyejo, Joydeep Ghosh, Russell Poldrack, Jonathan Pillow | Here we develop a flexible, hierarchical model designed to simultaneously capture spatial block sparsity and smoothness in neuroimaging data. |
54 | Random Projections for Support Vector Machines | Saurabh Paul, Christos Boutsidis, Malik Magdon-Ismail, Petros Drineas | We present extensive experiments with real and synthetic data to support our theory. |
55 | Distribution-Free Distribution Regression | Barnabas Poczos, Aarti Singh, Alessandro Rinaldo, Larry Wasserman | In this paper we develop theory and methods for distribution-free versions of distribution regression. |
56 | Localization and Adaptation in Online Learning | Alexander Rakhlin, Ohad Shamir, Karthik Sridharan | We introduce a formalism of localization for online learning problems, which, similarly to statistical learning theory, can be used to obtain fast rates. |
57 | A recursive estimate for the predictive likelihood in a topic model | James Scott, Jason Baldridge | We propose a fast algorithm for approximating this likelihood, one whose computational cost is linear both in document length and in the number of topics. |
58 | Detecting Activations over Graphs using Spanning Tree Wavelet Bases | James Sharpnack, Aarti Singh, Akshay Krishnamurthy | To this end, we introduce the spanning tree wavelet basis over a graph, a localized basis that reflects the topology of the graph. |
59 | Changepoint Detection over Graphs with the Spectral Scan Statistic | James Sharpnack, Aarti Singh, Alessandro Rinaldo | We consider the change-point detection problem of deciding, based on noisy measurements, whether an unknown signal over a given graph is constant or is instead piecewise constant over two induced subgraphs of relatively low cut size. |
60 | Central Limit Theorems for Conditional Markov Chains | Mathieu Sinn, Bei Chen | Central Limit Theorems for Conditional Markov Chains |
61 | Statistical Tests for Contagion in Observational Social Network Studies | Greg Ver Steeg, Aram Galstyan | We demonstrate a general method to lower bound the strength of causal effects in observational social network studies, even in the presence of arbitrary, unobserved individual traits. |
62 | Completeness Results for Lifted Variable Elimination | Nima Taghipour, Daan Fierens, Guy Van den Broeck, Jesse Davis, Hendrik Blockeel | Various methods for lifted probabilistic inference have been proposed, but our understanding of these methods and the relationships between them is still limited, compared to their propositional counterparts. |
63 | Supervised Sequential Classification Under Budget Constraints | Kirill Trapeznikov, Venkatesh Saligrama | In this paper we develop a framework for sequential decision making under budget constraints for multi-class classification. |
64 | On the Asymptotic Optimality of Maximum Margin Bayesian Networks | Sebastian Tschiatschek, Franz Pernkopf | For specific classes of MMBNs, i.e. MMBNs with fully connected graphs and discrete-valued nodes, we show Bayes consistency for binary-class problems and a sufficient condition for Bayes consistency in the multi-class case. |
65 | Collapsed Variational Bayesian Inference for Hidden Markov Models | Pengyu Wang, Phil Blunsom | In this paper we propose two collapsed variational Bayesian inference algorithms for hidden Markov models, a popular framework for representing time series data. |
66 | Block Regularized Lasso for Multivariate Multi-Response Linear Regression | Weiguang Wang, Yingbin Liang, Eric Xing | Block Regularized Lasso for Multivariate Multi-Response Linear Regression |
67 | Bethe Bounds and Approximating the Global Optimum | Adrian Weller, Tony Jebara | Applying these to discretized pseudo-marginals in the associative case, we present a polynomial time approximation scheme for global optimization of the Bethe free energy provided the maximum degree ∆=O(\log n), where n is the number of variables. |
68 | Dual Decomposition for Joint Discrete-Continuous Optimization | Christopher Zach | We analyse convex formulations for combined discrete-continuous MAP inference using the dual decomposition method. |
69 | Learning Social Infectivity in Sparse Low-rank Networks Using Multi-dimensional Hawkes Processes | Ke Zhou, Hongyuan Zha, Le Song | We propose a convex optimization approach to discover the hidden network of social influence by modeling the recurrent events at different individuals as multi-dimensional Hawkes processes. |
70 | Greedy Bilateral Sketch, Completion & Smoothing | Tianyi Zhou, Dacheng Tao | We detail how to model and solve low-rank approximation, matrix completion and robust PCA in GreB’s paradigm. |
71 | Scoring anomalies: a M-estimation formulation | Stéphan Clémençon, Jérémie Jakubowicz | It is the purpose of this paper to formulate the issue of scoring multivariate observations depending on their degree of abnormality/novelty as an unsupervised learning task. |