Paper Digest: KDD 2013 Highlights

August 1, 2013 | June 25, 2020 | admin

The ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) is one of the top data mining conferences in the world.

To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.

If you do not want to miss any interesting academic papers, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.

Paper Digest Team
team@paperdigest.org

TABLE 1: KDD 2013 Papers

No.	Title	Authors	Highlight
1	Scale-out beyond map-reduce	Raghu Ramakrishnan, Team Members CISL	Until recently, data was gathered for well-defined objectives such as auditing, forensics, reporting and line-of-business operations; now, exploratory and predictive analysis is becoming ubiquitous.
2	The online revolution: education for everyone	Andrew Ng, Daphne Koller	In this talk, I’ll report on this far-reaching experiment in education, and why we believe this model can provide both an improved classroom experience for our on-campus students, via a flipped classroom model, as well as a meaningful learning experience for the millions of students around the world who would otherwise never have access to education of this quality.
3	Optimization in learning and data analysis	Stephen J. Wright	We discuss research on several areas in this domain, including signal reconstruction, manifold learning, and regression/classification, describing in each case recent research in which optimization algorithms have been developed and applied successfully.
4	Predicting the present with search engine data	Hal Varian	We illustrate how one can use Google search data to nowcast economic metrics of interest, and discuss some of the ramifications for research and policy.
5	One theme in all views: modeling consensus topics in multiple contexts	Jian Tang, Ming Zhang, Qiaozhu Mei	In this paper we explore a different direction.
6	Representing documents through their readers	Khalid El-Arini, Min Xu, Emily B. Fox, Carlos Guestrin	By assuming that a user’s labels correspond to topics in the articles he shares, we can learn a labeled dictionary from a training corpus of articles shared on Twitter.
7	Text-based measures of document diversity	Kevin Bache, David Newman, Padhraic Smyth	In this paper we present a text-based framework for quantifying how diverse a document is in terms of its content.
8	Diversity maximization under matroid constraints	Zeinab Abbassi, Vahab S. Mirrokni, Mayur Thakur	Aggregator websites typically present documents in the form of representative clusters.
9	Connecting users across social media sites: a behavioral-modeling approach	Reza Zafarani, Huan Liu	This paper aims to address the cross-media user identification problem.
10	Automatic selection of social media responses to news	Tadej Štajner, Bart Thomee, Ana-Maria Popescu, Marco Pennacchiotti, Alejandro Jaimes	We propose a near-optimal solution to the underlying optimization problem, which leverages the submodularity property of the objective function.
11	Estimating sharer reputation via social data calibration	Jaewon Yang, Bee-Chung Chen, Deepak Agarwal	To correct for such biases, we propose to utilize an additional data source that provides unbiased goodness estimates for a small set of shared items, and calibrate biased social data through a novel multi-level hierarchical model that describes how the unbiased data and biased data are jointly generated according to sharer reputation scores.
12	Linking named entities in Tweets with knowledge base via user interest modeling	Wei Shen, Jianyong Wang, Ping Luo, Min Wang	In this paper, we propose KAURI, a graph-based framework to collectively link all the named entity mentions in all tweets posted by a user via modeling the user’s topics of interest.
13	TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC	Wook-Shin Han, Sangyeon Lee, Kyungyeol Park, Jeong-Hoon Lee, Min-Soo Kim, Jinha Kim, Hwanjo Yu	In this paper, we propose a general, disk-based graph engine called TurboGraph to process billion-scale graphs very efficiently by using modern hardware on a single PC.
14	Beyond myopic inference in big data pipelines	Karthik Raman, Adith Swaminathan, Johannes Gehrke, Thorsten Joachims	We propose a novel model for reasoning across components of Big Data Pipelines in a probabilistically well-founded manner.
15	Big data analytics with small footprint: squaring the cloud	John Canny, Huasha Zhao	This paper describes the BID Data Suite, a collection of hardware, software and design patterns that enable fast, large-scale data mining at very low cost. We present several benchmark problems to show how the above elements combine to yield multiple orders-of-magnitude improvements for each problem.
16	Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees	Charalampos Tsourakakis, Francesco Bonchi, Aristides Gionis, Francesco Gullo, Maria Tsiarli	In this paper, we define a novel density function, which gives subgraphs of much higher quality than densest subgraphs: the graphs found by our method are compact, dense, and with smaller diameter.
17	Guided learning for role discovery (GLRD): framework, algorithms, and applications	Sean Gilpin, Tina Eliassi-Rad, Ian Davidson	We provide an alternating least squares framework that allows convex constraints to be placed on the role discovery problem, which can provide useful supervision.
18	Redundancy-aware maximal cliques	Jia Wang, James Cheng, Ada Wai-Chee Fu	In this paper, we aim at providing a concise and complete summary of the set of maximal cliques, which is useful to many applications.
19	Selective sampling on graphs for classification	Quanquan Gu, Charu Aggarwal, Jialu Liu, Jiawei Han	In this paper, motivated by the ubiquity of graph representations in real-world applications, we propose to study selective sampling on graphs.
20	Density-based logistic regression	Wenlin Chen, Yixin Chen, Yi Mao, Baolong Guo	This paper introduces a nonlinear logistic regression model for classification.
21	MI2LS: multi-instance learning from multiple information sources	Dan Zhang, Jingrui He, Richard Lawrence	Out of a similar motivation, to incorporate the consistencies between different information sources into MIL, we propose a novel research framework: Multi-Instance Learning from Multiple Information Sources (MI²LS).
22	Querying discriminative and representative samples for batch mode active learning	Zheng Wang, Jieping Ye	In this paper, we generalize the empirical risk minimization principle to the active learning setting.
23	SVM_pAUC^tight: a new support vector method for optimizing partial AUC based on a tight convex upper bound	Harikrishna Narasimhan, Shivani Agarwal	In this paper, we develop a new support vector method, SVM_pAUC^tight, that optimizes a tighter convex upper bound on the partial AUC loss, which leads to both improved accuracy and reduced computational complexity.
24	Succinct interval-splitting tree for scalable similarity search of compound-protein pairs with property constraints	Yasuo Tabei, Akihiro Kishimoto, Masaaki Kotera, Yoshihiro Yamanishi	We present the succinct interval-splitting tree algorithm (SITA) that efficiently performs similarity search in databases for compound-protein pairs with respect to both binary fingerprints and real-valued properties.
25	Multi-source learning with block-wise missing data for Alzheimer’s disease prediction	Shuo Xiang, Lei Yuan, Wei Fan, Yalin Wang, Paul M. Thompson, Jieping Ye	Our major contributions are threefold: (1) the proposed models handle both feature-level and source-level analysis in a unified formulation and include several existing feature learning approaches as special cases; (2) the model for incomplete data avoids direct imputation of the missing elements and thus provides superior performances.
26	Network discovery via constrained tensor analysis of fMRI data	Ian Davidson, Sean Gilpin, Owen Carmichael, Peter Walker	We pose the problem of network discovery which involves simplifying spatio-temporal data into cohesive regions (nodes) and relationships between those regions (edges).
27	Learning to question: leveraging user preferences for shopping advice	Mahashweta Das, Gianmarco De Francisci Morales, Aristides Gionis, Ingmar Weber	In this paper we show (i) how to learn the structure of the tree, i.e., which questions to ask at each node, and (ii) how to produce a suitable ranking at each node.
28	Active learning and search on low-rank matrices	Dougal J. Sutherland, Barnabás Póczos, Jeff Schneider	This work presents a general approach for active collaborative prediction with the Probabilistic Matrix Factorization model.
29	LCARS: a location-content-aware recommender system	Hongzhi Yin, Yizhou Sun, Bin Cui, Zhiting Hu, Ling Chen	In this paper, we propose LCARS, a location-content-aware recommender system that offers a particular user a set of venues (e.g., restaurants) or events (e.g., concerts and exhibitions) by giving consideration to both personal interest and local preference.
30	Comparing apples to oranges: a scalable solution with heterogeneous hashing	Mingdong Ou, Peng Cui, Fei Wang, Jun Wang, Wenwu Zhu, Shiqiang Yang	In this paper, we address the problem of “comparing apples to oranges” under the large scale setting.
31	Fast and scalable polynomial kernels via explicit feature maps	Ninh Pham, Rasmus Pagh	Fast and scalable polynomial kernels via explicit feature maps
32	Indexed block coordinate descent for large-scale linear classification with limited memory	Ian En-Hsu Yen, Chun-Fu Chang, Ting-Wei Lin, Shan-Wei Lin, Shou-De Lin	In this paper, we show how a Block Coordinate Descent method based on Nearest-Neighbor Index can significantly reduce such cost when learning a dual-sparse model.
33	Recursive regularization for large-scale classification with hierarchical and graphical dependencies	Siddharth Gopal, Yiming Yang	In this paper we propose a regularization framework for large-scale hierarchical classification that addresses both the problems.
34	Discovering latent influence in online social activities via shared cascade poisson processes	Tomoharu Iwata, Amar Shah, Zoubin Ghahramani	In this paper, we propose a probabilistic model for discovering latent influence from sequences of item adoption events.
35	STRIP: stream learning of influence probabilities	Konstantin Kutzkov, Albert Bifet, Francesco Bonchi, Aristides Gionis	Motivated by modern microblogging platforms, such as Twitter, in this paper we study the problem of learning influence probabilities in a data-stream scenario, in which the network topology is relatively stable and the challenge of a learning algorithm is to keep up with a continuous stream of tweets using a small amount of time and memory.
36	Fast structure learning in generalized stochastic processes with latent factors	Mohammad Taha Bahadori, Yan Liu, Eric P. Xing	In this paper, we analyze a flexible stochastic process model, the generalized linear auto-regressive process (GLARP) and identify the conditions under which the impact of hidden variables appears as an additive term to the evolution matrix estimated with the maximum likelihood.
37	Robust sparse estimation of multiresponse regression and inverse covariance matrix via the L2 distance	Aurelie C. Lozano, Huijing Jiang, Xinwei Deng	We propose a robust framework to jointly perform two key modeling tasks involving high dimensional data: (i) learning a sparse functional mapping from multiple predictors to multiple responses while taking advantage of the coupling among responses, and (ii) estimating the conditional dependency structure among responses while adjusting for their predictors.
38	Exact sparse recovery with L0 projections	Ping Li, Cun-Hui Zhang	This paper focuses on the problem of recovering a K-sparse signal $x \in \mathbb{R}^{1 \times N}$, i.e., $K \ll N$ and $\sum_{i=1}^{N} 1\{x_i \neq 0\} = K$.
39	Robust principal component analysis via capped norms	Qian Sun, Shuo Xiang, Jieping Ye	In this paper, we present a novel non-convex formulation for the RPCA problem using the capped trace norm and the capped l₁-norm.
40	Flexible and robust co-regularized multi-domain graph clustering	Wei Cheng, Xiang Zhang, Zhishan Guo, Yubao Wu, Patrick F. Sullivan, Wei Wang	In this paper, we propose a flexible and robust framework, CGC (Co-regularized Graph Clustering), based on non-negative matrix factorization (NMF), to tackle these challenges.
41	Graph cluster randomization: network exposure to multiple universes	Johan Ugander, Brian Karrer, Lars Backstrom, Jon Kleinberg	In this work, we propose a novel methodology using graph clustering to analyze average treatment effects under social interference.
42	Social influence based clustering of heterogeneous information networks	Yang Zhou, Ling Liu	In this paper, we present a social influence based clustering framework for analyzing heterogeneous information networks with three unique features.
43	Confluence: conformity influence in large social networks	Jie Tang, Sen Wu, Jimeng Sun	We propose the Confluence model to formalize the effects of social conformity into a probabilistic model.
44	The role of information diffusion in the evolution of social networks	Lilian Weng, Jacob Ratkiewicz, Nicola Perra, Bruno Gonçalves, Carlos Castillo, Francesco Bonchi, Rossano Schifanella, Filippo Menczer, Alessandro Flammini	Here we present an analysis of longitudinal micro-blogging data, revealing a more nuanced view of the strategies employed by users when expanding their social circles.
45	Information cascade at group scale	Milad Eftekhar, Yashar Ganjali, Nick Koudas	In this paper, we generalize the "influential nodes" problem.
46	Extracting social events for learning better information diffusion models	Shuyang Lin, Fengjiao Wang, Qingbo Hu, Philip S. Yu	Learning of the information diffusion model is a fundamental problem in the study of information diffusion in social networks.
47	Model selection in markovian processes	Assaf Hallak, Dotan Di-Castro, Shie Mannor	In this work we address the problem of how to use time series data to choose from a finite set of candidate discrete state spaces, where these spaces are constructed by a domain expert.
48	DTW-D: time series semi-supervised learning from a single example	Yanping Chen, Bing Hu, Eamonn Keogh, Gustavo E.A.P.A. Batista	In this work we argue that the availability of this resource has isolated much of the research community from the following reality: labeled time series data is often very difficult to obtain.
49	Model-based kernel for efficient time series analysis	Huanhuan Chen, Fengzhen Tang, Peter Tino, Xin Yao	We present novel, efficient, model based kernels for time series data rooted in the reservoir computation framework.
50	Mining lines in the sand: on trajectory discovery from untrustworthy data in cyber-physical system	Lu-An Tang, Xiao Yu, Quanquan Gu, Jiawei Han, Alice Leung, Thomas La Porta	In this study, we propose a method called LiSM (Line-in-the-Sand Miner) to discover trajectories from untrustworthy sensor data.
51	A general bootstrap performance diagnostic	Ariel Kleiner, Ameet Talwalkar, Sameer Agarwal, Ion Stoica, Michael I. Jordan	Thus, we present here a general diagnostic procedure which directly and automatically evaluates the accuracy of the bootstrap’s outputs, determining whether or not the bootstrap is performing satisfactorily when applied to a given dataset and estimator.
52	Subsampling for efficient and effective unsupervised outlier detection ensembles	Arthur Zimek, Matthew Gaudet, Ricardo J.G.B. Campello, Jörg Sander	Here, we propose and study subsampling as a technique to induce diversity among individual outlier detectors.
53	A phrase mining framework for recursive construction of a topical hierarchy	Chi Wang, Marina Danilevsky, Nihit Desai, Yinan Zhang, Phuong Nguyen, Thrivikrama Taula, Jiawei Han	In this paper we propose an algorithm for recursively constructing a hierarchy of topics from a collection of content-representative documents.
54	Stochastic collapsed variational Bayesian inference for latent Dirichlet allocation	James Foulds, Levi Boyles, Christopher DuBois, Padhraic Smyth, Max Welling	We propose a stochastic algorithm for collapsed variational Bayesian inference for LDA, which is simpler and more efficient than the state of the art method.
55	WiseMarket: a new paradigm for managing wisdom of online social users	Caleb Chen Cao, Yongxin Tong, Lei Chen, H. V. Jagadish	In this paper, we present Wise Market as an effective framework for crowdsourcing on social media that motivates users to participate in a task with care and correctly aggregates their opinions on pairwise choice problems.
56	Multi-label relational neighbor classification using social context features	Xi Wang, Gita Sukthankar	In this paper, we focus on the problem of performing multi-label classification on networked data, where the instances in the network can be assigned multiple labels.
57	Scalable text and link analysis with mixed-topic link models	Yaojia Zhu, Xiaoran Yan, Lise Getoor, Cristopher Moore	In this paper, we combine classic ideas in topic modeling with a variant of the mixed-membership block model recently developed in the statistical physics community.
58	Collaborative boosting for activity classification in microblogs	Yangqiu Song, Zhengdong Lu, Cane Wing-ki Leung, Qiang Yang	In this light, we propose a novel collaborative boosting framework comprising a text-to-activity classifier for each user, and a mechanism for collaboration between classifiers of users having social connections.
59	Trace complexity of network inference	Bruno Abrahao, Flavio Chierichetti, Robert Kleinberg, Alessandro Panconesi	We give algorithms that are competitive with, while being simpler and more efficient than, existing network inference approaches.
60	Debiasing social wisdom	Abhimanyu Das, Sreenivas Gollapudi, Rina Panigrahy, Mahyar Salek	Using a natural model of opinion formation, we analyze the effect of these interactions on an individual’s opinion and estimate her propensity to conform.
61	Mining discriminative subgraphs from global-state networks	Sayan Ranu, Minh Hoang, Ambuj Singh	In this paper, we explore this problem and design a technique called MINDS to mine minimally discriminative subgraphs from large global-state networks.
62	Approximate graph mining with label costs	Pranay Anchuri, Mohammed J. Zaki, Omer Barkol, Shahar Golan, Moshe Shamy	We present novel and scalable methods to efficiently solve the approximate isomorphism problem.
63	Summarizing probabilistic frequent patterns: a fast approach	Chunyang Liu, Ling Chen, Chengqi Zhang	In this paper, we focus on the problem of mining probabilistic representative frequent patterns (P-RFP), which is the minimal set of patterns with adequately high probability to represent all frequent patterns.
64	Mining high utility episodes in complex event sequences	Cheng-Wei Wu, Yu-Feng Lin, Philip S. Yu, Vincent S. Tseng	To address these issues, in this paper, we incorporate the concept of utility into episode mining and address a new problem of mining high utility episodes from complex event sequences, which has not been explored so far.
65	Mining frequent graph patterns with differential privacy	Entong Shen, Ting Yu	In this paper we propose the first differentially private algorithm for mining frequent graph patterns.
66	Statistical quality estimation for general crowdsourcing tasks	Yukino Baba, Hisashi Kashima	In this paper, we propose an unsupervised statistical quality estimation method for such general crowdsourcing tasks.
67	Psychological advertising: exploring user psychology for click prediction in sponsored search	Taifeng Wang, Jiang Bian, Shusen Liu, Yuyu Zhang, Tie-Yan Liu	In this paper, we aim at answering this “why” question.
68	SiGMa: simple greedy matching for aligning large knowledge bases	Simon Lacoste-Julien, Konstantina Palla, Alex Davies, Gjergji Kasneci, Thore Graepel, Zoubin Ghahramani	Here, we present Simple Greedy Matching (SiGMa), a simple algorithm for aligning knowledge bases with millions of entities and facts.
69	Simple and deterministic matrix sketching	Edo Liberty	In this paper we adapt a well known streaming algorithm for approximating item frequencies to the matrix sketching setting.
70	A space efficient streaming algorithm for triangle counting using the birthday paradox	Madhav Jha, C. Seshadhri, Ali Pinar	We design a space efficient algorithm that approximates the transitivity (global clustering coefficient) and total triangle count with only a single pass through a graph given as a stream of edges.
71	Who, where, when and what: discover spatio-temporal topics for twitter users	Quan Yuan, Gao Cong, Zongyang Ma, Aixin Sun, Nadia Magnenat-Thalmann	In this paper, we propose a probabilistic model W⁴ (short for Who+Where+When+What) to exploit such data to discover individual users’ mobility behaviors from spatial, temporal and activity aspects.
72	Multi-label classification by mining label and instance correlations from heterogeneous information networks	Xiangnan Kong, Bokai Cao, Philip S. Yu	In this paper, we propose to use heterogeneous information networks to facilitate the multi-label classification process.
73	Accurate intelligible models with pairwise interactions	Yin Lou, Rich Caruana, Johannes Gehrke, Giles Hooker	In this paper, we suggest adding selected terms of interacting pairs of features to standard GAMs.
74	Spotting opinion spammers using behavioral footprints	Arjun Mukherjee, Abhinav Kumar, Bing Liu, Junhui Wang, Meichun Hsu, Malu Castellanos, Riddhiman Ghosh	This work proposes a novel angle to the problem by modeling spamicity as latent.
75	An efficient ADMM algorithm for multidimensional anisotropic total variation regularization problems	Sen Yang, Jie Wang, Wei Fan, Xiatian Zhang, Peter Wonka, Jieping Ye	In this paper, we propose an efficient alternating augmented Lagrangian method (ADMM) to solve total variation regularization problems.
76	Speeding up large-scale learning with a social prior	Deepayan Chakrabarti, Ralf Herbrich	We study this problem in a fully Bayesian setting, focusing on the problem of using Facebook user-IDs as features, with the social network giving the relationship structure.
77	FISM: factored item similarity models for top-N recommender systems	Santosh Kabbur, Xia Ning, George Karypis	To alleviate this problem, we present an item-based method for generating top-N recommendations that learns the item-item similarity matrix as the product of two low dimensional latent factor matrices.
78	Nonparametric hierarchal bayesian modeling in non-contractual heterogeneous survival data	Shouichi Nagano, Yusuke Ichikawa, Noriko Takaya, Tadasu Uchiyama, Makoto Abe	To overcome this problem, we present a new survival model using a non-parametric Bayes paradigm with MCMC.
79	Cross-task crowdsourcing	Kaixiang Mo, Erheng Zhong, Qiang Yang	In this paper, we employ transfer learning, which borrows knowledge from auxiliary historical tasks to improve the data veracity in a given target task.
80	Evaluating the crowd with confidence	Manas Joglekar, Hector Garcia-Molina, Aditya Parameswaran	In this work, we devise techniques to generate confidence intervals for worker error rate estimates, thereby enabling a better evaluation of worker quality.
81	Inferring social roles and statuses in social networks	Yuchen Zhao, Guan Wang, Philip S. Yu, Shaobo Liu, Simon Zhang	In this paper, we investigate the social roles and statuses that people assume in online social networks from the perspective of network structures, since the uniqueness of social networks lies in connecting people.
82	Adaptive collective routing using gaussian process dynamic congestion models	Siyuan Liu, Yisong Yue, Ramayya Krishnan	We consider the problem of adaptively routing a fleet of cooperative vehicles within a road network in the presence of uncertain and dynamic congestion conditions.
83	Maximizing acceptance probability for active friending in online social networks	De-Nian Yang, Hui-Ju Hung, Wang-Chien Lee, Wei Chen	In this paper, we advocate a recommendation support for active friending, where a user actively specifies a friending target.
84	Mining evolutionary multi-branch trees from text streams	Xiting Wang, Shixia Liu, Yangqiu Song, Baining Guo	In this paper, we propose an evolutionary multi-branch tree clustering method for streaming text data.
85	Active search on graphs	Xuezhi Wang, Roman Garnett, Jeff Schneider	Inspired by the success of myopic methods for active learning and bandit problems, we propose a myopic method for active search on graphs.
86	Fast rank-2 nonnegative matrix factorization for hierarchical document clustering	Da Kuang, Haesun Park	In this paper, we propose an efficient hierarchical document clustering method based on a new algorithm for rank-2 NMF.
87	A “semi-lazy” approach to probabilistic path prediction in dynamic environments	Jingbo Zhou, Anthony K.H. Tung, Wei Wu, Wee Siong Ng	We propose a "semi-lazy" approach to path prediction that builds prediction models on the fly using dynamically selected reference trajectories.
88	Optimizing parallel belief propagation in junction trees using regression	Lu Zheng, Ole Mengshoel	In this paper, we investigate a machine learning approach to minimize the execution time of parallel junction tree algorithms implemented on a GPU.
89	Multi-source deep learning for information trustworthiness estimation	Liang Ge, Jing Gao, Xiaoyi Li, Aidong Zhang	In this paper, we investigate the important problem of estimating information trustworthiness from the perspective of correlating and comparing multiple data sources.
90	Unsupervised link prediction using aggregative statistics on heterogeneous social networks	Tsung-Ting Kuo, Rui Yan, Yu-Yang Huang, Perng-Hwa Kung, Shou-De Lin	This paper devises a novel unsupervised framework to solve this problem, including two main components: (1) a three-layer factor graph model and three types of potential functions; (2) a ranked-margin learning and inference algorithm.
91	Link prediction with social vector clocks	Conrad Lee, Bobo Nick, Ulrik Brandes, Pádraig Cunningham	We here show that computationally less expensive features can achieve the same performance in the common scenario in which the data is available as a sequence of interactions.
92	Geo-spotting: mining online location-based services for optimal retail store placement	Dmytro Karamshuk, Anastasios Noulas, Salvatore Scellato, Vincenzo Nicosia, Cecilia Mascolo	In this paper we study the predictive power of various machine learning features on the popularity of retail stores in the city through the use of a dataset collected from Foursquare in New York.
93	Location-aware publish/subscribe	Guoliang Li, Yang Wang, Ting Wang, Jianhua Feng	We propose an R-tree based index structure by integrating textual descriptions into R-tree nodes.
94	Quadratic optimization to identify highly heritable quantitative traits from complex phenotypic features	Jiangwen Sun, Jinbo Bi, Henry R. Kranzler	We propose a quadratic optimization approach that directly utilizes heritability as an objective during the derivation of quantitative traits of a disease.
95	Repetition-aware content placement in navigational networks	Dora Erdos, Vatche Ishakian, Azer Bestavros, Evimaria Terzi	The key contribution of our work is the introduction of memory into the navigation process, by making user conversion dependent on the number of her exposures to that content.
96	Scalable all-pairs similarity search in metric spaces	Ye Wang, Ahmed Metwally, Srinivasan Parthasarathy	In this article, we propose a parallel framework for solving this problem in metric spaces.
97	Massively parallel expectation maximization using graphics processing units	Muzaffer Can Altinigneli, Claudia Plant, Christian Böhm	In this paper, we propose an innovative EM clustering algorithm particularly suited for the GPU platform on NVIDIA’s Fermi architecture.
98	Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms	Chris Thornton, Frank Hutter, Holger H. Hoos, Kevin Leyton-Brown	We consider the problem of simultaneously selecting a learning algorithm and setting its hyperparameters, going beyond previous work that attacks these issues separately.
99	Direct optimization of ranking measures for learning to rank models	Ming Tan, Tian Xia, Lily Guo, Shaojun Wang	We present a novel learning algorithm, DirectRank, which directly and exactly optimizes ranking measures without resorting to any upper bounds or approximations.
100	Multi-space probabilistic sequence modeling	Shuo Chen, Jiexun Xu, Thorsten Joachims	In this paper, we propose a method that trains not one monolithic model, but multiple local embeddings for a class of pairwise conditional models especially suited for sequence and co-occurrence modeling.
101	Towards never-ending learning from time series streams	Yuan Hao, Yanping Chen, Jesin Zakaria, Bing Hu, Thanawin Rakthanmanon, Eamonn Keogh	Based on this observation, we propose a never-ending learning framework for time series in which an agent examines an unbounded stream of data and occasionally asks a teacher (which may be a human or an algorithm) for a label.
102	Constrained stochastic gradient descent for large-scale least squares problem	Yang Mu, Wei Ding, Tianyi Zhou, Dacheng Tao	In this paper, we present the Constrained Stochastic Gradient Descent (CSGD) algorithm to solve the large-scale least squares problem.
103	Making recommendations from multiple domains	Wei Chen, Wynne Hsu, Mong Li Lee	In this work, we propose a generalized cross domain collaborative filtering framework that integrates social network information seamlessly with cross domain data.
104	Cascading outbreak prediction in networks: a data-driven approach	Peng Cui, Shifei Jin, Linyun Yu, Fei Wang, Wenwu Zhu, Shiqiang Yang	In this paper, we attempt to harness historical cascade data, propose a novel data-driven approach to select important nodes as sensors, and predict the outbreaks based on the cascading behaviors of these sensors.
105	Combining latent factor model with location features for event-based group recommendation	Wei Zhang, Jianyong Wang, Wei Feng	In this paper, we propose a method called Pairwise Tag enhAnced and featuRe-based Matrix factorIzation for Group recommendAtioN (PTARMIGAN), which considers location features, social features, and implicit patterns simultaneously in a unified model.
106	Cost-sensitive online active learning with application to malicious URL detection	Peilin Zhao, Steven C.H. Hoi	In particular, we propose two CSOAL algorithms and analyze their theoretical performance in terms of cost-sensitive bounds.
107	The bang for the buck: fair competitive viral marketing from the host perspective	Wei Lu, Francesco Bonchi, Amit Goyal, Laks V.S. Lakshmanan	In this paper we propose and study the novel problem of competitive viral marketing from the perspective of the host, i.e., the owner of the social network platform.
108	Modeling the dynamics of composite social networks	Erheng Zhong, Wei Fan, Yin Zhu, Qiang Yang	In this paper, we study the problem of modeling the dynamics of composite networks, where the evolution processes of different networks are jointly considered.
109	A time-dependent enhanced support vector machine for time series regression	Goce Ristanoski, Wei Liu, James Bailey	Once we identified the samples that produced the largest errors, we observed their correlation with distribution shifts that occur in the time series.
110	A new collaborative filtering approach for increasing the aggregate diversity of recommender systems	Katja Niemann, Martin Wolpers	In this paper, we propose a new collaborative filtering approach that is based on the items’ usage contexts.
111	Scalable inference in max-margin topic models	Jun Zhu, Xun Zheng, Li Zhou, Bo Zhang	In this paper, we present a highly scalable approach to building max-margin supervised topic models.
112	A data-driven method for in-game decision making in MLB: when to pull a starting pitcher	Ganeshapillai Gartheeban, John Guttag	In this paper we show how machine learning can be applied to generate a model that could lead to better on-field decisions by managers of professional baseball teams.
113	Exploiting user clicks for automatic seed set generation for entity matching	Xiao Bai, Flavio P. Junqueira, Srinivasan H. Sengamedu	In this paper, we present an approach that leverages user clicks during Web search to automatically generate training data for entity matching.
114	Silence is also evidence: interpreting dwell time for recommendation from psychological perspective	Peifeng Yin, Ping Luo, Wang-Chien Lee, Min Wang	Based on the observation that the dwell time on an item may reflect the opinion of a user, we aim to enrich the user-vote matrix by converting the dwell time on items into users’ “pseudo votes” and then help improve recommendation performance.
115	Efficient single-source shortest path and distance queries on large graphs	Andy Diwen Zhu, Xiaokui Xiao, Sibo Wang, Wenqing Lin	To address the deficiency of existing work, this paper presents Highways-on-Disk (HoD), a disk-based index that supports both SSD and SSSP queries on directed and weighted graphs.
116	On community detection in real-world networks and the importance of degree assortativity	Marek Ciglan, Michal Laclavík, Kjetil Nørvåg	In this paper, we focus on several popular community detection algorithms with low computational complexity and with decent performance on the artificial benchmarks, and we study their behaviour on real-world networks.
117	Trial and error in influential social networks	Xiaohui Bei, Ning Chen, Liyu Dou, Xiangru Huang, Ruixin Qiang	In this paper, we introduce a trial-and-error model to study information diffusion in a social network.
118	Collaborative matrix factorization with multiple similarities for predicting drug-target interactions	Xiaodong Zheng, Hao Ding, Hiroshi Mamitsuka, Shanfeng Zhu	We propose a factor model, named Multiple Similarities Collaborative Matrix Factorization (MSCMF), which projects drugs and targets into a common low-rank feature space, which is further consistent with weighted similarity matrices over drugs and those over targets.
119	FeaFiner: biomarker identification from medical data through feature generalization and selection	Jiayu Zhou, Zhaosong Lu, Jimeng Sun, Lei Yuan, Fei Wang, Jieping Ye	To address this problem, we propose FeaFiner (short for Feature Refiner), an efficient formulation that simultaneously generalizes low-level features into higher level concepts and then selects relevant concepts based on the target variable.
120	Learning geographical preferences for point-of-interest recommendation	Bin Liu, Yanjie Fu, Zijun Yao, Hui Xiong	To this end, in this paper, we propose a novel geographical probabilistic factor analysis framework which strategically takes various factors into consideration.
121	Learning mixed kronecker product graph models with simulated method of moments	Sebastian I. Moreno, Jennifer Neville, Sergey Kirshner	In this work, we present the first learning algorithm for mKPGMs.
122	Measuring spontaneous devaluations in user preferences	Komal Kapoor, Nisheeth Srivastava, Jaideep Srivastava, Paul Schrater	In this work, we study the music listening histories of Last.fm users focusing on the changes in their preferences based on their choices for different artists at different points in time.
123	Mining evidences for named entity disambiguation	Yang Li, Chi Wang, Fangqiu Han, Jiawei Han, Dan Roth, Xifeng Yan	In this work, we propose a generative model and an incremental algorithm to automatically mine useful evidences across documents.
124	Privacy-preserving data exploration in genome-wide association studies	Aaron Johnson, Vitaly Shmatikov	We present a set of practical, privacy-preserving data mining algorithms for GWAS datasets.
125	Synthetic review spamming and defense	Huan Sun, Alex Morales, Xifeng Yan	In this paper, we introduce a very simple, but powerful review spamming technique that could fail the existing feature-based detection algorithms easily.
126	Information cartography: creating zoomable, large-scale maps of information	Dafna Shahaf, Jaewon Yang, Caroline Suen, Jeff Jacobs, Heidi Wang, Jure Leskovec	In this paper, we formalize characteristics of good zoomable maps and formulate their construction as an optimization problem.
127	Restreaming graph partitioning: simple versatile algorithms for advanced balancing	Joel Nishimura, Johan Ugander	In this work we introduce restreaming graph partitioning and develop algorithms that scale similarly to streaming partitioning algorithms yet empirically perform as well as fully offline algorithms.
128	Understanding evolution of research themes: a probabilistic generative model for citations	Xiaolong Wang, Chengxiang Zhai, Dan Roth	In this paper, we propose a novel way of analyzing literature citation to explore the research topics and the theme evolution by modeling article citation relations with a probabilistic generative model.
129	On the equivalent of low-rank linear regressions and linear discriminant analysis based regressions	Xiao Cai, Chris Ding, Feiping Nie, Heng Huang	In this paper, we will prove that the low-rank regression model is equivalent to doing linear regression in the linear discriminant analysis (LDA) subspace.
130	To buy or not to buy: that is the question	Oren Etzioni	In this talk, I’ll describe how we utilize advanced data-mining and text-mining techniques at Decide.com (and earlier at Farecast) to solve these problems for on-line shoppers.
131	Mining the digital universe of data to develop personalized cancer therapies	Eric Schadt	Mining the digital universe of data to develop personalized cancer therapies
132	The business impact of deep learning	Jeremy Howard	The business impact of deep learning
133	Adaptive adversaries: building systems to fight fraud and cyber intruders	Ari Gesher	In this talk, we’ll take a look at case studies of three different systems, using a partnership of automation and human analysis on large scale data to find the clandestine human behavior that these datasets hold, including a discussion of the backend systems architecture and a demo of the interactive analysis environment.
134	Targeting and influencing at scale: from presidential elections to social good	Rayid Ghani	If you’re still recovering from the barrage of ads, news, emails, Facebook posts, and newspaper articles that were giving you the latest poll numbers, asking you to volunteer, donate money, and vote, this talk will give you a look behind the scenes on why you were seeing what you were seeing.
135	Hadoop: a view from the trenches	Milind Bhandarkar	In this talk I will reminisce about the early days of Hadoop, and will give an overview of the current state of the Hadoop ecosystem, and some real-world use cases of this open source platform.
136	Cyber security: how visual analytics unlock insight	Raffael Marty	In this talk we will have a look at what approaches have been explored, what has worked, and what has not. In the Cyber Security domain, we have been collecting ‘big data’ for almost two decades.
137	Using "big data" to solve "small data" problems	Chris Neumann	In this talk, Chris Neumann will discuss how DataHero applied the principles of user-centric design and development over a year and a half to create a product with which more than 95% of new users can get answers on their first attempt.
138	Financing lead triggers: empowering sales reps through knowledge discovery and fusion	Kareem S. Aggour, Bethany Hoogs	Here we describe a system built to automate the collection and aggregation of information on companies, which is then mined to identify actionable sales leads.
139	Query clustering based on bid landscape for sponsored search auction optimization	Ye Chen, Weiguo Liu, Jeonghee Yi, Anton Schwaighofer, Tak W. Yan	In this paper we present a formalism of clustering probability distributions, and its application to query clustering where each query is represented as a probability density of click-through rate (CTR) weighted bid and distortion is measured by KL divergence.
140	Analysis of advanced meter infrastructure data of water consumption in apartment buildings	Einat Kermany, Hanna Mazzawi, Dorit Baras, Yehuda Naveh, Hagai Michaelis	We present our experience of using machine learning techniques over data originating from advanced meter infrastructure (AMI) systems for water consumption in a medium-size city.
141	Online controlled experiments at large scale	Ron Kohavi, Alex Deng, Brian Frasca, Toby Walker, Ya Xu, Nils Pohlmann	We discuss why negative experiments, which degrade the user experience short term, should be run, given the learning value and long-term benefits.
142	iHR: an online recruiting system for Xiamen Talent Service Center	Wenxing Hong, Lei Li, Tao Li, Wenfu Pan	In this paper, we investigate and compare various online recruiting systems from a product perspective.
143	Dynamic memory allocation policies for postings in real-time Twitter search	Nima Asadi, Jimmy Lin, Michael Busch	In this paper, we focus on one aspect: dynamic postings allocation policies for index structures that are completely held in main memory.
144	A unified search federation system based on online user feedback	Luo Jie, Sudarshan Lamkhede, Rochit Sapra, Evans Hsu, Helen Song, Yi Chang	In this paper, we propose a unified framework for the search federation problem.
145	Amplifying the voice of youth in Africa via text analytics	Prem Melville, Vijil Chenthamarakshan, Richard D. Lawrence, James Powell, Moses Mugisha, Sharad Sapra, Rajesh Anandan, Solomon Assefa	This paper describes an automated message-understanding and routing system deployed by IBM at UNICEF.
146	Scalable supervised dimensionality reduction using clustering	Troy Raeder, Claudia Perlich, Brian Dalessandro, Ori Stitelman, Foster Provost	We present experimental results showing that for this task our algorithm outperforms other popular dimensionality-reduction algorithms across a wide variety of ad campaigns, as well as production results that showcase its performance in practice.
147	Ad click prediction: a view from the trenches	H. Brendan McMahan, Gary Holt, D. Sculley, Michael Young, Dietmar Ebner, Julian Grady, Lan Nie, Todd Phillips, Eugene Davydov, Daniel Golovin, Sharat Chikkerur, Dan Liu, Martin Wattenberg, Arnar Mar Hrafnkelsson, Tom Boulos, Jeremy Kubica	The goal of this paper is to highlight the close relationship between theoretical advances and practical engineering in this industrial setting, and to show the depth of challenges that appear when applying traditional machine learning methods in a complex dynamic system.
148	Modeling and probabilistic reasoning of population evacuation during large-scale disaster	Xuan Song, Quanshi Zhang, Yoshihide Sekimoto, Teerayut Horanont, Satoshi Ueyama, Ryosuke Shibasaki	In this paper, we construct a large human mobility database that stores and manages GPS records from mobile devices used by approximately 1.6 million people throughout Japan from 1 August 2010 to 31 July 2011.
149	Using co-visitation networks for detecting large scale online display advertising exchange fraud	Ori Stitelman, Claudia Perlich, Brian Dalessandro, Rod Hook, Troy Raeder, Foster Provost	In this paper, we will show examples of how non-intentional traffic that is produced by fraudulent activities adversely affects both general analytics and predictive models, and propose an approach using co-visitation networks to identify sites that have large amounts of this fraudulent traffic.
150	An integrated framework for optimizing automatic monitoring systems in large IT infrastructures	Liang Tang, Tao Li, Larisa Shwartz, Florian Pinel, Genady Ya Grabarnik	This paper describes an integrated framework for minimizing false positive tickets and maximizing the monitoring coverage for system faults.
151	Improving quality control by early prediction of manufacturing outcomes	Sholom M. Weiss, Amit Dhurandhar, Robert J. Baseman	We describe methods for continual prediction of manufactured product quality prior to final testing.
152	A data mining driven risk profiling method for road asset management	Daniel Emerson, Justin Z. Weligamage, Richi Nayak	Road surface skid resistance has been shown to have a strong relationship to road crash risk; however, applying the current method of using investigatory levels to identify crash-prone roads is problematic, as they may fail to identify risky roads outside of the norm.
153	Why people hate your app: making sense of user feedback in a mobile app store	Bin Fu, Jialiu Lin, Lei Li, Christos Faloutsos, Jason Hong, Norman Sadeh	In this paper, we propose Wiscom, a system that can analyze tens of millions of user ratings and comments in mobile app markets at three different levels of detail.
154	Towards long-lead forecasting of extreme flood events: a data mining framework for precipitation cluster precursors identification	Dawei Wang, Wei Ding, Kui Yu, Xindong Wu, Ping Chen, David L. Small, Shafiqul Islam	In this paper, we propose an integrated data mining framework for identifying the precursors to precipitation event clusters and use this information to predict extended periods of extreme precipitation and subsequent floods.
155	Predictive model performance: offline and online evaluations	Jeonghee Yi, Ye Chen, Jie Li, Swaraj Sett, Tak W. Yan	We study the accuracy of evaluation metrics used to estimate the efficacy of predictive models.
156	Uncertainty in online experiments with dependent data: an evaluation of bootstrap methods	Eytan Bakshy, Dean Eckles	We develop a framework for understanding how dependence affects uncertainty in user-item experiments and evaluate how bootstrap methods that account for differing levels of dependence perform in practice.
157	Knowledge discovery from massive healthcare claims data	Varun Chandola, Sreenivas R. Sukumar, Jack C. Schryver	Specifically, we translate the problem of analyzing healthcare data into some of the most well-known analysis problems in the data mining community (social network analysis, text mining, temporal analysis, and higher-order feature construction), and describe how advances within each of these areas can be leveraged to understand the domain of healthcare. The objective of this paper is twofold: first, we introduce the emerging domain of "big" healthcare claims data to the KDD community, and second, we describe the successes and challenges that we encountered in analyzing this data using state-of-the-art analytics for massive data.
158	Palette power: enabling visual search through colors	Anurag Bhardwaj, Atish Das Sarma, Wei Di, Raffay Hamid, Robinson Piramuthu, Neel Sundaresan	In this paper we present a simple and fast search algorithm that uses color as the main feature for building visual search.
159	Heat pump detection from coarse grained smart meter data with positive and unlabeled learning	Hongliang Fei, Younghun Kim, Sambit Sahu, Milind Naphade, Sanjay K. Mamidipalli, John Hutchinson	In this paper, we aim to detect electric heat pumps from coarse grained smart meter data for a heat pump marketing campaign.
160	Empirical bayes model to combine signals of adverse drug reactions	Rave Harpaz, William DuMouchel, Paea LePendu, Nigam H. Shah	We present a methodology based on empirical Bayes modeling to combine ADR signals mined from ~5 million adverse event reports collected by the FDA, and healthcare data corresponding to 46 million patients, the main two types of information sources currently employed for signal detection.
161	Efficiently rewriting large multimedia application execution traces with few event sequences	Christiane Kamdem Kengne, Leon Constantin Fopa, Alexandre Termier, Noha Ibrahim, Marie-Christine Rousset, Takashi Washio, Miguel Santana	In this paper, we study the problem of finding a set of sequences of events that allows a reduced-size rewriting of the original trace.
162	Discriminant malware distance learning on structural information for automated malware classification	Deguang Kong, Guanhua Yan	In this work, we explore techniques that can automatically classify malware variants into their corresponding families.
163	Assessing team strategy using spatiotemporal data	Patrick Lucey, Dean Oliver, Peter Carr, Joe Roth, Iain Matthews	By way of example, we present an approach which uses an entire season of ball tracking data from the English Premier League (2010-2011 season) to reinforce the commonly held belief that teams should aim to "win home games and draw away ones".
164	Exploratory analysis of highly heterogeneous document collections	Arun S. Maiya, John P. Thompson, Francisco Loaiza-Lemos, Robert M. Rolfe	As one of our key tagging strategies, we introduce the KERA algorithm (Keyword Extraction for Reports and Articles).
165	Experience from hosting a corporate prediction market: benefits beyond the forecasts	Thomas A. Montgomery, Paul M. Stieg, Michael J. Cavaretta, Paul E. Moraal	We describe our experience, including both the strong and weak correlations found between predictions and real world results.
166	Detecting insider threats in a real corporate database of computer usage activity	Ted E. Senator, Henry G. Goldberg, Alex Memory, William T. Young, Brad Rees, Robert Pierce, Daniel Huang, Matthew Reardon, David A. Bader, Edmond Chow, Irfan Essa, Joshua Jones, Vinay Bettadapura, Duen Horng Chau, Oded Green, Oguz Kaya, Anita Zakrzewska, Erica Briscoe, Rudolph IV L. Mappus, Robert McColl, Lora Weiss, Thomas G. Dietterich, Alan Fern, Weng-Keen Wong, Shubhomoy Das, Andrew Emmott, Jed Irvine, Jay-Yoon Lee, Danai Koutra, Christos Faloutsos, Daniel Corkill, Lisa Friedland, Amanda Gentzel, David Jensen	This paper reports on methods and results of an applied research project by a team consisting of SAIC and four universities to develop, integrate, and evaluate new approaches to detect the weak signals characteristic of insider threats on organizations’ information systems.
167	Mining for geographically disperse communities in social networks by leveraging distance modularity	Paulo Shakarian, Patrick Roos, Devon Callahan, Cory Kirk	We apply a variant of Newman-Girvan modularity to this problem known as distance modularity.
168	An integrated framework for suicide risk prediction	Truyen Tran, Dinh Phung, Wei Luo, Richard Harvey, Michael Berk, Svetha Venkatesh	We present an integrated machine learning framework to tackle this challenge.
169	Gaussian multiple instance learning approach for mapping the slums of the world using very high resolution imagery	Ranga Raju Vatsavai	In this paper, we present a computationally efficient algorithm based on multiple instance learning for mapping informal settlements (slums) using very high-resolution remote sensing imagery.
170	A privacy preserving framework for managing vehicle data in road pricing systems	Huayu Wu, Wee Siong Ng, Kian-Lee Tan, Wei Wu, Shili Xiang, Mingqiang Xue	We propose a novel framework in which privacy protection is pushed to data provider site.
171	U-Air: when urban air quality inference meets big data	Yu Zheng, Furui Liu, Hsun-Ping Hsieh	In this paper, we infer the real-time and fine-grained air quality information throughout a city, based on the (historical and real-time) air quality data reported by existing monitor stations and a variety of data sources we observed in the city, such as meteorology, traffic flow, human mobility, structure of road networks, and points of interest (POIs).
172	Panel: a data scientist’s guide to making money from start-ups	Foster Provost, Geoffrey I. Webb	Panel: a data scientist’s guide to making money from start-ups
173	LAICOS: an open source platform for personalized social web search	Mohamed Reda Bouadjenek, Hakim Hacid, Mokrane Bouzeghoub	In this paper, we introduce LAICOS, a social Web search engine as a contribution to the growing area of Social Information Retrieval (SIR).
174	JobMiner: a real-time system for mining job-related patterns from social media	Yu Cheng, Yusheng Xie, Zhengzhang Chen, Ankit Agrawal, Alok Choudhary, Songtao Guo	In this paper, we analyze the job information from the social network point of view.
175	Inferring distant-time location in low-sampling-rate trajectories	Meng-Fen Chiang, Yung-Hsiang Lin, Wen-Chih Peng, Philip S. Yu	To efficiently process queries, we proposed the index structure Sorted Interval-Tree (SOIT) to organize location records.
176	AMETHYST: a system for mining and exploring topical hierarchies of heterogeneous data	Marina Danilevsky, Chi Wang, Fangbo Tao, Son Nguyen, Gong Chen, Nihit Desai, Lidan Wang, Jiawei Han	In this demo we present AMETHYST, a system for exploring and analyzing a topical hierarchy constructed from a heterogeneous information network (HIN).
177	A tool for collecting provenance data in social media	Pritam Gundecha, Suhas Ranganath, Zhuo Feng, Huan Liu	In this paper, we present a novel web-based tool for collecting the attributes of interest associated with a particular social media user related to the received information.
178	STED: semi-supervised targeted-interest event detection in Twitter	Ting Hua, Feng Chen, Liang Zhao, Chang-Tien Lu, Naren Ramakrishnan	This paper presents STED, a semi-supervised system that helps users to automatically detect and interactively visualize events of a targeted type from Twitter, such as crimes, civil unrests, and disease outbreaks.
179	Forex-foreteller: currency trend modeling using news articles	Fang Jin, Nathan Self, Parang Saraf, Patrick Butler, Wei Wang, Naren Ramakrishnan	In this demo, we present Forex-foreteller (FF) which mines news articles and makes forecasts about the movement of foreign currency markets.
180	Real-time disease surveillance using Twitter data: demonstration on flu and cancer	Kathy Lee, Ankit Agrawal, Alok Choudhary	In this paper, we describe a novel real-time flu and cancer surveillance system that uses spatial, temporal, and text mining on Twitter data.
181	KeySee: supporting keyword search on evolving events in social streams	Pei Lee, Laks V.S. Lakshmanan, Evangelos Milios	In this demo, we provide a new solution called KeySee by grouping posts into events, and track the evolution patterns of events as new posts stream in and old posts fade out.
182	Understanding Twitter data with TweetXplorer	Fred Morstatter, Shamanth Kumar, Huan Liu, Ross Maciejewski	We present TweetXplorer, a system for analysts with little information about an event to gain knowledge through the use of effective visualization techniques.
183	An online system with end-user services: mining novelty concepts from tv broadcast subtitles	Mika Rautiainen, Jouni Sarvanko, Arto Heikkinen, Mika Ylianttila, Vassilis Kostakos	In this paper we introduce our data mining system and accompanying services for summarizing Finnish DVB broadcast streams from seven national channels.
184	When TEDDY meets GrizzLY: temporal dependency discovery for triggering road deicing operations	Céline Robardet, Vasile-Marian Scuturici, Marc Plantevit, Antoine Fraboulet	The TEDDY algorithm aims at discovering such dependencies, identifying the statistically significant time intervals with a chi-squared test.
185	EventCube: multi-dimensional search and mining of structured and text data	Fangbo Tao, Kin Hou Lei, Jiawei Han, Chengxiang Zhai, Xiao Cheng, Marina Danilevsky, Nihit Desai, Bolin Ding, Jing Ge Ge, Heng Ji, Rucha Kanade, Anne Kao, Qi Li, Yanen Li, Cindy Lin, Jialu Liu, Nikunj Oza, Ashok Srivastava, Rod Tjoelker, Chi Wang, Duo Zhang, Bo Zhao	EventCube: multi-dimensional search and mining of structured and text data
186	SEA: a system for event analysis on chinese tweets	Yaqiong Wang, Hongfu Liu, Hao Lin, Junjie Wu, Zhiang Wu, Jie Cao	In light of this, in this demo paper, we propose SEA, a System for Event Analysis on Chinese tweets.
187	SAE: social analytic engine for large networks	Yang Yang, Jianfei Wang, Yutao Zhang, Wei Chen, Jing Zhang, Honglei Zhuang, Zhilin Yang, Bo Ma, Zhanpeng Fang, Sen Wu, Xiaoxiao Li, Debing Liu, Jie Tang	In this paper, we present a novel Social Analytic Engine (SAE) for large online social networks.
188	FIU-Miner: a fast, integrated, and user-friendly system for data mining in distributed environment	Chunqiu Zeng, Yexi Jiang, Li Zheng, Jingxuan Li, Lei Li, Hongtai Li, Chao Shen, Wubai Zhou, Tao Li, Bing Duan, Ming Lei, Pengnian Wang	In this paper, we design and implement FIU-Miner, a Fast, Integrated, and User-friendly system to ease data analysis.
189	LAFT-Explorer: inferring, visualizing and predicting how your social network expands	Jun Zhang, Chaokun Wang, Yuanchi Ning, Yichi Liu, Jianmin Wang, Philip S. Yu	In this paper we demonstrate LaFT-Explorer, a general toolkit for explaining and reproducing the network growth process based on the friendship propagation.
190	A transfer learning based framework of crowd-selection on twitter	Zhou Zhao, Da Yan, Wilfred Ng, Shi Gao	This helps understand our ideas in an interactive manner.
191	Risk-O-Meter: an intelligent clinical risk calculator	Kiyana Zolfaghar, Jayshree Agarwal, Deepthi Sistla, Si-Chi Chin, Senjuti Basu Roy, Nele Verbiest	We present a system called Risk-O-Meter to predict and analyze clinical risk via data imputation, visualization, predictive modeling, and association rule exploration.
192	Algorithmic techniques for modeling and mining large graphs (AMAzING)	Alan Frieze, Aristides Gionis, Charalampos Tsourakakis	In this tutorial, we will provide an in-depth presentation of the most popular random-graph models used for modeling real-world networks.
193	Mining data from mobile devices: a survey of smart sensing and analytics	Spiros Papadimitriou, Tina Eliassi-Rad	In this tutorial, we survey the state of the art in mining data from mobile devices across different application areas such as ads, healthcare, geosocial, public policy, etc. In part two, we present cross-cutting challenges such as real-time analysis and security, and we outline cross-cutting methods for mobile data mining such as network inference, streaming algorithms, etc.
194	Big data analytics for healthcare	Jimeng Sun, Chandan K. Reddy	In this tutorial, we introduce the characteristics and related mining challenges on dealing with big medical data.
195	Entity resolution for big data	Lise Getoor, Ashwin Machanavajjhala	In this tutorial, we bring together perspectives on entity resolution from a variety of fields, including databases, information retrieval, natural language processing and machine learning, to provide, in one setting, a survey of a large body of work.
196	Network sampling	Mohammad A. Hasan, Jennifer Neville, Nesreen Ahmed	In this tutorial, we aim to cover a diverse collection of methodologies and applications of network sampling.
197	The dataminer’s guide to scalable mixed-membership and nonparametric bayesian models	Amr Ahmed, Alex Smola	We present design patterns for hierarchical nonparametric Bayesian models, efficient inference algorithms, and modeling tools to describe salient aspects of the data.