Paper Digest: KDD 2013 Highlights
The ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) is one of the top data mining conferences in the world.
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights and summaries to quickly get the main idea of each paper.
If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.
Paper Digest Team
team@paperdigest.org
TABLE 1: KDD 2013 Papers
No. | Title | Authors | Highlight |
---|---|---|---|
1 | Scale-out beyond map-reduce | Raghu Ramakrishnan, Team Members CISL | Until recently, data was gathered for well-defined objectives such as auditing, forensics, reporting and line-of-business operations; now, exploratory and predictive analysis is becoming ubiquitous. |
2 | The online revolution: education for everyone | Andrew Ng, Daphne Koller | In this talk, I’ll report on this far-reaching experiment in education, and why we believe this model can provide both an improved classroom experience for our on-campus students, via a flipped classroom model, as well as a meaningful learning experience for the millions of students around the world who would otherwise never have access to education of this quality. |
3 | Optimization in learning and data analysis | Stephen J. Wright | We discuss research on several areas in this domain, including signal reconstruction, manifold learning, and regression/classification, describing in each case recent research in which optimization algorithms have been developed and applied successfully. |
4 | Predicting the present with search engine data | Hal Varian | We illustrate how one can use Google search data to nowcast economic metrics of interest, and discuss some of the ramifications for research and policy. |
5 | One theme in all views: modeling consensus topics in multiple contexts | Jian Tang, Ming Zhang, Qiaozhu Mei | In this paper we explore a different direction. |
6 | Representing documents through their readers | Khalid El-Arini, Min Xu, Emily B. Fox, Carlos Guestrin | By assuming that a user’s labels correspond to topics in the articles he shares, we can learn a labeled dictionary from a training corpus of articles shared on Twitter. |
7 | Text-based measures of document diversity | Kevin Bache, David Newman, Padhraic Smyth | In this paper we present a text-based framework for quantifying how diverse a document is in terms of its content. |
8 | Diversity maximization under matroid constraints | Zeinab Abbassi, Vahab S. Mirrokni, Mayur Thakur | Aggregator websites typically present documents in the form of representative clusters. |
9 | Connecting users across social media sites: a behavioral-modeling approach | Reza Zafarani, Huan Liu | This paper aims to address the cross-media user identification problem. |
10 | Automatic selection of social media responses to news | Tadej Štajner, Bart Thomee, Ana-Maria Popescu, Marco Pennacchiotti, Alejandro Jaimes | We propose a near-optimal solution to the underlying optimization problem, which leverages the submodularity property of the objective function. |
11 | Estimating sharer reputation via social data calibration | Jaewon Yang, Bee-Chung Chen, Deepak Agarwal | To correct for such biases, we propose to utilize an additional data source that provides unbiased goodness estimates for a small set of shared items, and calibrate biased social data through a novel multi-level hierarchical model that describes how the unbiased data and biased data are jointly generated according to sharer reputation scores. |
12 | Linking named entities in Tweets with knowledge base via user interest modeling | Wei Shen, Jianyong Wang, Ping Luo, Min Wang | In this paper, we propose KAURI, a graph-based framework to collectively link all the named entity mentions in all tweets posted by a user via modeling the user’s topics of interest. |
13 | TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC | Wook-Shin Han, Sangyeon Lee, Kyungyeol Park, Jeong-Hoon Lee, Min-Soo Kim, Jinha Kim, Hwanjo Yu | In this paper, we propose a general, disk-based graph engine called TurboGraph to process billion-scale graphs very efficiently by using modern hardware on a single PC. |
14 | Beyond myopic inference in big data pipelines | Karthik Raman, Adith Swaminathan, Johannes Gehrke, Thorsten Joachims | We propose a novel model for reasoning across components of Big Data Pipelines in a probabilistically well-founded manner. |
15 | Big data analytics with small footprint: squaring the cloud | John Canny, Huasha Zhao | This paper describes the BID Data Suite, a collection of hardware, software and design patterns that enable fast, large-scale data mining at very low cost. We present several benchmark problems to show how the above elements combine to yield multiple orders-of-magnitude improvements for each problem. |
16 | Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees | Charalampos Tsourakakis, Francesco Bonchi, Aristides Gionis, Francesco Gullo, Maria Tsiarli | In this paper, we define a novel density function, which gives subgraphs of much higher quality than densest subgraphs: the graphs found by our method are compact, dense, and with smaller diameter. |
17 | Guided learning for role discovery (GLRD): framework, algorithms, and applications | Sean Gilpin, Tina Eliassi-Rad, Ian Davidson | We provide an alternating least squares framework that allows convex constraints to be placed on the role discovery problem, which can provide useful supervision. |
18 | Redundancy-aware maximal cliques | Jia Wang, James Cheng, Ada Wai-Chee Fu | In this paper, we aim at providing a concise and complete summary of the set of maximal cliques, which is useful to many applications. |
19 | Selective sampling on graphs for classification | Quanquan Gu, Charu Aggarwal, Jialu Liu, Jiawei Han | In this paper, motivated by the ubiquity of graph representations in real-world applications, we propose to study selective sampling on graphs. |
20 | Density-based logistic regression | Wenlin Chen, Yixin Chen, Yi Mao, Baolong Guo | This paper introduces a nonlinear logistic regression model for classification. |
21 | MI2LS: multi-instance learning from multiple information sources | Dan Zhang, Jingrui He, Richard Lawrence | Out of a similar motivation, to incorporate the consistencies between different information sources into MIL, we propose a novel research framework: Multi-Instance Learning from Multiple Information Sources (MI2LS). |
22 | Querying discriminative and representative samples for batch mode active learning | Zheng Wang, Jieping Ye | In this paper, we generalize the empirical risk minimization principle to the active learning setting. |
23 | SVMpAUCtight: a new support vector method for optimizing partial AUC based on a tight convex upper bound | Harikrishna Narasimhan, Shivani Agarwal | In this paper, we develop a new support vector method, SVMpAUCtight, that optimizes a tighter convex upper bound on the partial AUC loss, which leads to both improved accuracy and reduced computational complexity. |
24 | Succinct interval-splitting tree for scalable similarity search of compound-protein pairs with property constraints | Yasuo Tabei, Akihiro Kishimoto, Masaaki Kotera, Yoshihiro Yamanishi | We present the succinct interval-splitting tree algorithm (SITA) that efficiently performs similarity search in databases for compound-protein pairs with respect to both binary fingerprints and real-valued properties. |
25 | Multi-source learning with block-wise missing data for Alzheimer’s disease prediction | Shuo Xiang, Lei Yuan, Wei Fan, Yalin Wang, Paul M. Thompson, Jieping Ye | Our major contributions are threefold: (1) the proposed models handle both feature-level and source-level analysis in a unified formulation and include several existing feature learning approaches as special cases; (2) the model for incomplete data avoids direct imputation of the missing elements and thus provides superior performances. |
26 | Network discovery via constrained tensor analysis of fMRI data | Ian Davidson, Sean Gilpin, Owen Carmichael, Peter Walker | We pose the problem of network discovery which involves simplifying spatio-temporal data into cohesive regions (nodes) and relationships between those regions (edges). |
27 | Learning to question: leveraging user preferences for shopping advice | Mahashweta Das, Gianmarco De Francisci Morales, Aristides Gionis, Ingmar Weber | In this paper we show (i) how to learn the structure of the tree, i.e., which questions to ask at each node, and (ii) how to produce a suitable ranking at each node. |
28 | Active learning and search on low-rank matrices | Dougal J. Sutherland, Barnabás Póczos, Jeff Schneider | This work presents a general approach for active collaborative prediction with the Probabilistic Matrix Factorization model. |
29 | LCARS: a location-content-aware recommender system | Hongzhi Yin, Yizhou Sun, Bin Cui, Zhiting Hu, Ling Chen | In this paper, we propose LCARS, a location-content-aware recommender system that offers a particular user a set of venues (e.g., restaurants) or events (e.g., concerts and exhibitions) by giving consideration to both personal interest and local preference. |
30 | Comparing apples to oranges: a scalable solution with heterogeneous hashing | Mingdong Ou, Peng Cui, Fei Wang, Jun Wang, Wenwu Zhu, Shiqiang Yang | In this paper, we address the problem of “comparing apples to oranges” under the large scale setting. |
31 | Fast and scalable polynomial kernels via explicit feature maps | Ninh Pham, Rasmus Pagh | Fast and scalable polynomial kernels via explicit feature maps |
32 | Indexed block coordinate descent for large-scale linear classification with limited memory | Ian En-Hsu Yen, Chun-Fu Chang, Ting-Wei Lin, Shan-Wei Lin, Shou-De Lin | In this paper, we show how a Block Coordinate Descent method based on Nearest-Neighbor Index can significantly reduce such cost when learning a dual-sparse model. |
33 | Recursive regularization for large-scale classification with hierarchical and graphical dependencies | Siddharth Gopal, Yiming Yang | In this paper we propose a regularization framework for large-scale hierarchical classification that addresses both the problems. |
34 | Discovering latent influence in online social activities via shared cascade poisson processes | Tomoharu Iwata, Amar Shah, Zoubin Ghahramani | In this paper, we propose a probabilistic model for discovering latent influence from sequences of item adoption events. |
35 | STRIP: stream learning of influence probabilities | Konstantin Kutzkov, Albert Bifet, Francesco Bonchi, Aristides Gionis | Motivated by modern microblogging platforms, such as Twitter, in this paper we study the problem of learning influence probabilities in a data-stream scenario, in which the network topology is relatively stable and the challenge of a learning algorithm is to keep up with a continuous stream of tweets using a small amount of time and memory. |
36 | Fast structure learning in generalized stochastic processes with latent factors | Mohammad Taha Bahadori, Yan Liu, Eric P. Xing | In this paper, we analyze a flexible stochastic process model, the generalized linear auto-regressive process (GLARP) and identify the conditions under which the impact of hidden variables appears as an additive term to the evolution matrix estimated with the maximum likelihood. |
37 | Robust sparse estimation of multiresponse regression and inverse covariance matrix via the L2 distance | Aurelie C. Lozano, Huijing Jiang, Xinwei Deng | We propose a robust framework to jointly perform two key modeling tasks involving high dimensional data: (i) learning a sparse functional mapping from multiple predictors to multiple responses while taking advantage of the coupling among responses, and (ii) estimating the conditional dependency structure among responses while adjusting for their predictors. |
38 | Exact sparse recovery with L0 projections | Ping Li, Cun-Hui Zhang | This paper focuses on the problem of recovering a K-sparse signal $x \in \mathbb{R}^{1 \times N}$, i.e., $K \ll N$ and $\sum_{i=1}^{N} 1\{x_i \neq 0\} = K$. |
39 | Robust principal component analysis via capped norms | Qian Sun, Shuo Xiang, Jieping Ye | In this paper, we present a novel non-convex formulation for the RPCA problem using the capped trace norm and the capped l1-norm. |
40 | Flexible and robust co-regularized multi-domain graph clustering | Wei Cheng, Xiang Zhang, Zhishan Guo, Yubao Wu, Patrick F. Sullivan, Wei Wang | In this paper, we propose a flexible and robust framework, CGC (Co-regularized Graph Clustering), based on non-negative matrix factorization (NMF), to tackle these challenges. |
41 | Graph cluster randomization: network exposure to multiple universes | Johan Ugander, Brian Karrer, Lars Backstrom, Jon Kleinberg | In this work, we propose a novel methodology using graph clustering to analyze average treatment effects under social interference. |
42 | Social influence based clustering of heterogeneous information networks | Yang Zhou, Ling Liu | In this paper, we present a social influence based clustering framework for analyzing heterogeneous information networks with three unique features. |
43 | Confluence: conformity influence in large social networks | Jie Tang, Sen Wu, Jimeng Sun | We propose the Confluence model to formalize the effects of social conformity into a probabilistic model. |
44 | The role of information diffusion in the evolution of social networks | Lilian Weng, Jacob Ratkiewicz, Nicola Perra, Bruno Gonçalves, Carlos Castillo, Francesco Bonchi, Rossano Schifanella, Filippo Menczer, Alessandro Flammini | Here we present an analysis of longitudinal micro-blogging data, revealing a more nuanced view of the strategies employed by users when expanding their social circles. |
45 | Information cascade at group scale | Milad Eftekhar, Yashar Ganjali, Nick Koudas | In this paper, we generalize the "influential nodes" problem. |
46 | Extracting social events for learning better information diffusion models | Shuyang Lin, Fengjiao Wang, Qingbo Hu, Philip S. Yu | Learning of the information diffusion model is a fundamental problem in the study of information diffusion in social networks. |
47 | Model selection in markovian processes | Assaf Hallak, Dotan Di-Castro, Shie Mannor | In this work we address the problem of how to use time series data to choose from a finite set of candidate discrete state spaces, where these spaces are constructed by a domain expert. |
48 | DTW-D: time series semi-supervised learning from a single example | Yanping Chen, Bing Hu, Eamonn Keogh, Gustavo E.A.P.A. Batista | In this work we argue that the availability of this resource has isolated much of the research community from the following reality: labeled time series data is often very difficult to obtain. |
49 | Model-based kernel for efficient time series analysis | Huanhuan Chen, Fengzhen Tang, Peter Tino, Xin Yao | We present novel, efficient, model based kernels for time series data rooted in the reservoir computation framework. |
50 | Mining lines in the sand: on trajectory discovery from untrustworthy data in cyber-physical system | Lu-An Tang, Xiao Yu, Quanquan Gu, Jiawei Han, Alice Leung, Thomas La Porta | In this study, we propose a method called LiSM (Line-in-the-Sand Miner) to discover trajectories from untrustworthy sensor data. |
51 | A general bootstrap performance diagnostic | Ariel Kleiner, Ameet Talwalkar, Sameer Agarwal, Ion Stoica, Michael I. Jordan | Thus, we present here a general diagnostic procedure which directly and automatically evaluates the accuracy of the bootstrap’s outputs, determining whether or not the bootstrap is performing satisfactorily when applied to a given dataset and estimator. |
52 | Subsampling for efficient and effective unsupervised outlier detection ensembles | Arthur Zimek, Matthew Gaudet, Ricardo J.G.B. Campello, Jörg Sander | Here, we propose and study subsampling as a technique to induce diversity among individual outlier detectors. |
53 | A phrase mining framework for recursive construction of a topical hierarchy | Chi Wang, Marina Danilevsky, Nihit Desai, Yinan Zhang, Phuong Nguyen, Thrivikrama Taula, Jiawei Han | In this paper we propose an algorithm for recursively constructing a hierarchy of topics from a collection of content-representative documents. |
54 | Stochastic collapsed variational Bayesian inference for latent Dirichlet allocation | James Foulds, Levi Boyles, Christopher DuBois, Padhraic Smyth, Max Welling | We propose a stochastic algorithm for collapsed variational Bayesian inference for LDA, which is simpler and more efficient than the state of the art method. |
55 | WiseMarket: a new paradigm for managing wisdom of online social users | Caleb Chen Cao, Yongxin Tong, Lei Chen, H. V. Jagadish | In this paper, we present Wise Market as an effective framework for crowdsourcing on social media that motivates users to participate in a task with care and correctly aggregates their opinions on pairwise choice problems. |
56 | Multi-label relational neighbor classification using social context features | Xi Wang, Gita Sukthankar | In this paper, we focus on the problem of performing multi-label classification on networked data, where the instances in the network can be assigned multiple labels. |
57 | Scalable text and link analysis with mixed-topic link models | Yaojia Zhu, Xiaoran Yan, Lise Getoor, Cristopher Moore | In this paper, we combine classic ideas in topic modeling with a variant of the mixed-membership block model recently developed in the statistical physics community. |
58 | Collaborative boosting for activity classification in microblogs | Yangqiu Song, Zhengdong Lu, Cane Wing-ki Leung, Qiang Yang | In this light, we propose a novel collaborative boosting framework comprising a text-to-activity classifier for each user, and a mechanism for collaboration between classifiers of users having social connections. |
59 | Trace complexity of network inference | Bruno Abrahao, Flavio Chierichetti, Robert Kleinberg, Alessandro Panconesi | We give algorithms that are competitive with, while being simpler and more efficient than, existing network inference approaches. |
60 | Debiasing social wisdom | Abhimanyu Das, Sreenivas Gollapudi, Rina Panigrahy, Mahyar Salek | Using a natural model of opinion formation, we analyze the effect of these interactions on an individual’s opinion and estimate her propensity to conform. |
61 | Mining discriminative subgraphs from global-state networks | Sayan Ranu, Minh Hoang, Ambuj Singh | In this paper, we explore this problem and design a technique called MINDS to mine minimally discriminative subgraphs from large global-state networks. |
62 | Approximate graph mining with label costs | Pranay Anchuri, Mohammed J. Zaki, Omer Barkol, Shahar Golan, Moshe Shamy | We present novel and scalable methods to efficiently solve the approximate isomorphism problem. |
63 | Summarizing probabilistic frequent patterns: a fast approach | Chunyang Liu, Ling Chen, Chengqi Zhang | In this paper, we focus on the problem of mining probabilistic representative frequent patterns (P-RFP), which is the minimal set of patterns with adequately high probability to represent all frequent patterns. |
64 | Mining high utility episodes in complex event sequences | Cheng-Wei Wu, Yu-Feng Lin, Philip S. Yu, Vincent S. Tseng | To address these issues, in this paper, we incorporate the concept of utility into episode mining and address a new problem of mining high utility episodes from complex event sequences, which has not been explored so far. |
65 | Mining frequent graph patterns with differential privacy | Entong Shen, Ting Yu | In this paper we propose the first differentially private algorithm for mining frequent graph patterns. |
66 | Statistical quality estimation for general crowdsourcing tasks | Yukino Baba, Hisashi Kashima | In this paper, we propose an unsupervised statistical quality estimation method for such general crowdsourcing tasks. |
67 | Psychological advertising: exploring user psychology for click prediction in sponsored search | Taifeng Wang, Jiang Bian, Shusen Liu, Yuyu Zhang, Tie-Yan Liu | In this paper, we aim at answering this “why” question. |
68 | SiGMa: simple greedy matching for aligning large knowledge bases | Simon Lacoste-Julien, Konstantina Palla, Alex Davies, Gjergji Kasneci, Thore Graepel, Zoubin Ghahramani | Here, we present Simple Greedy Matching (SiGMa), a simple algorithm for aligning knowledge bases with millions of entities and facts. |
69 | Simple and deterministic matrix sketching | Edo Liberty | In this paper we adapt a well known streaming algorithm for approximating item frequencies to the matrix sketching setting. |
70 | A space efficient streaming algorithm for triangle counting using the birthday paradox | Madhav Jha, C. Seshadhri, Ali Pinar | We design a space efficient algorithm that approximates the transitivity (global clustering coefficient) and total triangle count with only a single pass through a graph given as a stream of edges. |
71 | Who, where, when and what: discover spatio-temporal topics for twitter users | Quan Yuan, Gao Cong, Zongyang Ma, Aixin Sun, Nadia Magnenat-Thalmann | In this paper, we propose a probabilistic model W4 (short for Who+Where+When+What) to exploit such data to discover individual users’ mobility behaviors from spatial, temporal and activity aspects. |
72 | Multi-label classification by mining label and instance correlations from heterogeneous information networks | Xiangnan Kong, Bokai Cao, Philip S. Yu | In this paper, we propose to use heterogeneous information networks to facilitate the multi-label classification process. |
73 | Accurate intelligible models with pairwise interactions | Yin Lou, Rich Caruana, Johannes Gehrke, Giles Hooker | In this paper, we suggest adding selected terms of interacting pairs of features to standard GAMs. |
74 | Spotting opinion spammers using behavioral footprints | Arjun Mukherjee, Abhinav Kumar, Bing Liu, Junhui Wang, Meichun Hsu, Malu Castellanos, Riddhiman Ghosh | This work proposes a novel angle to the problem by modeling spamicity as latent. |
75 | An efficient ADMM algorithm for multidimensional anisotropic total variation regularization problems | Sen Yang, Jie Wang, Wei Fan, Xiatian Zhang, Peter Wonka, Jieping Ye | In this paper, we propose an efficient alternating augmented Lagrangian method (ADMM) to solve total variation regularization problems. |
76 | Speeding up large-scale learning with a social prior | Deepayan Chakrabarti, Ralf Herbrich | We study this problem in a fully Bayesian setting, focusing on the problem of using Facebook user-IDs as features, with the social network giving the relationship structure. |
77 | FISM: factored item similarity models for top-N recommender systems | Santosh Kabbur, Xia Ning, George Karypis | To alleviate this problem, we present an item-based method for generating top-N recommendations that learns the item-item similarity matrix as the product of two low dimensional latent factor matrices. |
78 | Nonparametric hierarchal bayesian modeling in non-contractual heterogeneous survival data | Shouichi Nagano, Yusuke Ichikawa, Noriko Takaya, Tadasu Uchiyama, Makoto Abe | To overcome this problem, we present a new survival model using a non-parametric Bayes paradigm with MCMC. |
79 | Cross-task crowdsourcing | Kaixiang Mo, Erheng Zhong, Qiang Yang | In this paper, we employ transfer learning, which borrows knowledge from auxiliary historical tasks to improve the data veracity in a given target task. |
80 | Evaluating the crowd with confidence | Manas Joglekar, Hector Garcia-Molina, Aditya Parameswaran | In this work, we devise techniques to generate confidence intervals for worker error rate estimates, thereby enabling a better evaluation of worker quality. |
81 | Inferring social roles and statuses in social networks | Yuchen Zhao, Guan Wang, Philip S. Yu, Shaobo Liu, Simon Zhang | In this paper, we investigate the social roles and statuses that people act in online social networks in the perspective of network structures, since the uniqueness of social networks is connecting people. |
82 | Adaptive collective routing using gaussian process dynamic congestion models | Siyuan Liu, Yisong Yue, Ramayya Krishnan | We consider the problem of adaptively routing a fleet of cooperative vehicles within a road network in the presence of uncertain and dynamic congestion conditions. |
83 | Maximizing acceptance probability for active friending in online social networks | De-Nian Yang, Hui-Ju Hung, Wang-Chien Lee, Wei Chen | In this paper, we advocate a recommendation support for active friending, where a user actively specifies a friending target. |
84 | Mining evolutionary multi-branch trees from text streams | Xiting Wang, Shixia Liu, Yangqiu Song, Baining Guo | In this paper, we propose an evolutionary multi-branch tree clustering method for streaming text data. |
85 | Active search on graphs | Xuezhi Wang, Roman Garnett, Jeff Schneider | Inspired by the success of myopic methods for active learning and bandit problems, we propose a myopic method for active search on graphs. |
86 | Fast rank-2 nonnegative matrix factorization for hierarchical document clustering | Da Kuang, Haesun Park | In this paper, we propose an efficient hierarchical document clustering method based on a new algorithm for rank-2 NMF. |
87 | A “semi-lazy” approach to probabilistic path prediction in dynamic environments | Jingbo Zhou, Anthony K.H. Tung, Wei Wu, Wee Siong Ng | We propose a "semi-lazy" approach to path prediction that builds prediction models on the fly using dynamically selected reference trajectories. |
88 | Optimizing parallel belief propagation in junction trees using regression | Lu Zheng, Ole Mengshoel | In this paper, we investigate a machine learning approach to minimize the execution time of parallel junction tree algorithms implemented on a GPU. |
89 | Multi-source deep learning for information trustworthiness estimation | Liang Ge, Jing Gao, Xiaoyi Li, Aidong Zhang | In this paper, we investigate the important problem of estimating information trustworthiness from the perspective of correlating and comparing multiple data sources. |
90 | Unsupervised link prediction using aggregative statistics on heterogeneous social networks | Tsung-Ting Kuo, Rui Yan, Yu-Yang Huang, Perng-Hwa Kung, Shou-De Lin | This paper devises a novel unsupervised framework to solve this problem, including two main components: (1) a three-layer factor graph model and three types of potential functions; (2) a ranked-margin learning and inference algorithm. |
91 | Link prediction with social vector clocks | Conrad Lee, Bobo Nick, Ulrik Brandes, Pádraig Cunningham | We here show that computationally less expensive features can achieve the same performance in the common scenario in which the data is available as a sequence of interactions. |
92 | Geo-spotting: mining online location-based services for optimal retail store placement | Dmytro Karamshuk, Anastasios Noulas, Salvatore Scellato, Vincenzo Nicosia, Cecilia Mascolo | In this paper we study the predictive power of various machine learning features on the popularity of retail stores in the city through the use of a dataset collected from Foursquare in New York. |
93 | Location-aware publish/subscribe | Guoliang Li, Yang Wang, Ting Wang, Jianhua Feng | We propose an R-tree based index structure by integrating textual descriptions into R-tree nodes. |
94 | Quadratic optimization to identify highly heritable quantitative traits from complex phenotypic features | Jiangwen Sun, Jinbo Bi, Henry R. Kranzler | We propose a quadratic optimization approach that directly utilizes heritability as an objective during the derivation of quantitative traits of a disease. |
95 | Repetition-aware content placement in navigational networks | Dora Erdos, Vatche Ishakian, Azer Bestavros, Evimaria Terzi | The key contribution of our work is the introduction of memory into the navigation process, by making user conversion dependent on the number of her exposures to that content. |
96 | Scalable all-pairs similarity search in metric spaces | Ye Wang, Ahmed Metwally, Srinivasan Parthasarathy | In this article, we propose a parallel framework for solving this problem in metric spaces. |
97 | Massively parallel expectation maximization using graphics processing units | Muzaffer Can Altinigneli, Claudia Plant, Christian Böhm | In this paper, we propose an innovative EM clustering algorithm particularly suited for the GPU platform on NVIDIA’s Fermi architecture. |
98 | Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms | Chris Thornton, Frank Hutter, Holger H. Hoos, Kevin Leyton-Brown | We consider the problem of simultaneously selecting a learning algorithm and setting its hyperparameters, going beyond previous work that attacks these issues separately. |
99 | Direct optimization of ranking measures for learning to rank models | Ming Tan, Tian Xia, Lily Guo, Shaojun Wang | We present a novel learning algorithm, DirectRank, which directly and exactly optimizes ranking measures without resorting to any upper bounds or approximations. |
100 | Multi-space probabilistic sequence modeling | Shuo Chen, Jiexun Xu, Thorsten Joachims | In this paper, we propose a method that trains not one monolithic model, but multiple local embeddings for a class of pairwise conditional models especially suited for sequence and co-occurrence modeling. |
101 | Towards never-ending learning from time series streams | Yuan Hao, Yanping Chen, Jesin Zakaria, Bing Hu, Thanawin Rakthanmanon, Eamonn Keogh | Based on this observation, we propose a never-ending learning framework for time series in which an agent examines an unbounded stream of data and occasionally asks a teacher (which may be a human or an algorithm) for a label. |
102 | Constrained stochastic gradient descent for large-scale least squares problem | Yang Mu, Wei Ding, Tianyi Zhou, Dacheng Tao | In this paper, we present the Constrained Stochastic Gradient Descent (CSGD) algorithm to solve the large-scale least squares problem. |
103 | Making recommendations from multiple domains | Wei Chen, Wynne Hsu, Mong Li Lee | In this work, we propose a generalized cross domain collaborative filtering framework that integrates social network information seamlessly with cross domain data. |
104 | Cascading outbreak prediction in networks: a data-driven approach | Peng Cui, Shifei Jin, Linyun Yu, Fei Wang, Wenwu Zhu, Shiqiang Yang | In this paper, we attempt harnessing historical cascade data, propose a novel data driven approach to select important nodes as sensors, and predict the outbreaks based on the cascading behaviors of these sensors. |
105 | Combining latent factor model with location features for event-based group recommendation | Wei Zhang, Jianyong Wang, Wei Feng | In this paper, we propose a method called Pairwise Tag enhAnced and featuRe-based Matrix factorIzation for Group recommendAtioN (PTARMIGAN), which considers location features, social features, and implicit patterns simultaneously in a unified model. |
106 | Cost-sensitive online active learning with application to malicious URL detection | Peilin Zhao, Steven C.H. Hoi | In particular, we propose two CSOAL algorithms and analyze their theoretical performance in terms of cost-sensitive bounds. |
107 | The bang for the buck: fair competitive viral marketing from the host perspective | Wei Lu, Francesco Bonchi, Amit Goyal, Laks V.S. Lakshmanan | In this paper we propose and study the novel problem of competitive viral marketing from the perspective of the host, i.e., the owner of the social network platform. |
108 | Modeling the dynamics of composite social networks | Erheng Zhong, Wei Fan, Yin Zhu, Qiang Yang | In this paper, we study the problem of modeling the dynamics of composite networks, where the evolution processes of different networks are jointly considered. |
109 | A time-dependent enhanced support vector machine for time series regression | Goce Ristanoski, Wei Liu, James Bailey | Once we identified the samples that produced the largest errors, we observed their correlation with distribution shifts that occur in the time series. |
110 | A new collaborative filtering approach for increasing the aggregate diversity of recommender systems | Katja Niemann, Martin Wolpers | In this paper, we propose a new collaborative filtering approach that is based on the items’ usage contexts. |
111 | Scalable inference in max-margin topic models | Jun Zhu, Xun Zheng, Li Zhou, Bo Zhang | In this paper, we present a highly scalable approach to building max-margin supervised topic models. |
112 | A data-driven method for in-game decision making in MLB: when to pull a starting pitcher | Ganeshapillai Gartheeban, John Guttag | In this paper we show how machine learning can be applied to generate a model that could lead to better on-field decisions by managers of professional baseball teams. |
113 | Exploiting user clicks for automatic seed set generation for entity matching | Xiao Bai, Flavio P. Junqueira, Srinivasan H. Sengamedu | In this paper, we present an approach that leverages user clicks during Web search to automatically generate training data for entity matching. |
114 | Silence is also evidence: interpreting dwell time for recommendation from psychological perspective | Peifeng Yin, Ping Luo, Wang-Chien Lee, Min Wang | Based on the observation that the dwell time on an item may reflect the opinion of a user, we aim to enrich the user-vote matrix by converting the dwell time on items into users’ “pseudo votes” and then help improve recommendation performance. |
115 | Efficient single-source shortest path and distance queries on large graphs | Andy Diwen Zhu, Xiaokui Xiao, Sibo Wang, Wenqing Lin | To address the deficiency of existing work, this paper presents Highways-on-Disk (HoD), a disk-based index that supports both SSD and SSSP queries on directed and weighted graphs. |
116 | On community detection in real-world networks and the importance of degree assortativity | Marek Ciglan, Michal Laclavík, Kjetil Nørvåg | In this paper, we focus on several popular community detection algorithms with low computational complexity and with decent performance on the artificial benchmarks, and we study their behaviour on real-world networks. |
117 | Trial and error in influential social networks | Xiaohui Bei, Ning Chen, Liyu Dou, Xiangru Huang, Ruixin Qiang | In this paper, we introduce a trial-and-error model to study information diffusion in a social network. |
118 | Collaborative matrix factorization with multiple similarities for predicting drug-target interactions | Xiaodong Zheng, Hao Ding, Hiroshi Mamitsuka, Shanfeng Zhu | We propose a factor model, named Multiple Similarities Collaborative Matrix Factorization (MSCMF), which projects drugs and targets into a common low-rank feature space, which is further consistent with weighted similarity matrices over drugs and those over targets. |
119 | FeaFiner: biomarker identification from medical data through feature generalization and selection | Jiayu Zhou, Zhaosong Lu, Jimeng Sun, Lei Yuan, Fei Wang, Jieping Ye | To address this problem, we propose FeaFiner (short for Feature Refiner), an efficient formulation that simultaneously generalizes low-level features into higher level concepts and then selects relevant concepts based on the target variable. |
120 | Learning geographical preferences for point-of-interest recommendation | Bin Liu, Yanjie Fu, Zijun Yao, Hui Xiong | To this end, in this paper, we propose a novel geographical probabilistic factor analysis framework which strategically takes various factors into consideration. |
121 | Learning mixed kronecker product graph models with simulated method of moments | Sebastian I. Moreno, Jennifer Neville, Sergey Kirshner | In this work, we present the first learning algorithm for mKPGMs. |
122 | Measuring spontaneous devaluations in user preferences | Komal Kapoor, Nisheeth Srivastava, Jaideep Srivastava, Paul Schrater | In this work, we study the music listening histories of Last.fm users focusing on the changes in their preferences based on their choices for different artists at different points in time. |
123 | Mining evidences for named entity disambiguation | Yang Li, Chi Wang, Fangqiu Han, Jiawei Han, Dan Roth, Xifeng Yan | In this work, we propose a generative model and an incremental algorithm to automatically mine useful evidences across documents. |
124 | Privacy-preserving data exploration in genome-wide association studies | Aaron Johnson, Vitaly Shmatikov | We present a set of practical, privacy-preserving data mining algorithms for GWAS datasets. |
125 | Synthetic review spamming and defense | Huan Sun, Alex Morales, Xifeng Yan | In this paper, we introduce a very simple but powerful review spamming technique that could easily defeat existing feature-based detection algorithms. |
126 | Information cartography: creating zoomable, large-scale maps of information | Dafna Shahaf, Jaewon Yang, Caroline Suen, Jeff Jacobs, Heidi Wang, Jure Leskovec | In this paper, we formalize characteristics of good zoomable maps and formulate their construction as an optimization problem. |
127 | Restreaming graph partitioning: simple versatile algorithms for advanced balancing | Joel Nishimura, Johan Ugander | In this work we introduce restreaming graph partitioning and develop algorithms that scale similarly to streaming partitioning algorithms yet empirically perform as well as fully offline algorithms. |
128 | Understanding evolution of research themes: a probabilistic generative model for citations | Xiaolong Wang, Chengxiang Zhai, Dan Roth | In this paper, we propose a novel way of analyzing literature citation to explore the research topics and the theme evolution by modeling article citation relations with a probabilistic generative model. |
129 | On the equivalent of low-rank linear regressions and linear discriminant analysis based regressions | Xiao Cai, Chris Ding, Feiping Nie, Heng Huang | In this paper, we will prove that the low-rank regression model is equivalent to doing linear regression in the linear discriminant analysis (LDA) subspace. |
130 | To buy or not to buy: that is the question | Oren Etzioni | In this talk, I’ll describe how we utilize advanced data-mining and text-mining techniques at Decide.com (and earlier at Farecast) to solve these problems for on-line shoppers. |
131 | Mining the digital universe of data to develop personalized cancer therapies | Eric Schadt | Mining the digital universe of data to develop personalized cancer therapies |
132 | The business impact of deep learning | Jeremy Howard | The business impact of deep learning |
133 | Adaptive adversaries: building systems to fight fraud and cyber intruders | Ari Gesher | In this talk, we’ll take a look at case studies of three different systems, using a partnership of automation and human analysis on large scale data to find the clandestine human behavior that these datasets hold, including a discussion of the backend systems architecture and a demo of the interactive analysis environment. |
134 | Targeting and influencing at scale: from presidential elections to social good | Rayid Ghani | If you’re still recovering from the barrage of ads, news, emails, Facebook posts, and newspaper articles that were giving you the latest poll numbers, asking you to volunteer, donate money, and vote, this talk will give you a look behind the scenes on why you were seeing what you were seeing. |
135 | Hadoop: a view from the trenches | Milind Bhandarkar | In this talk I will reminisce about the early days of Hadoop, and will give an overview of the current state of the Hadoop ecosystem, and some real-world use cases of this open source platform. |
136 | Cyber security: how visual analytics unlock insight | Raffael Marty | In this talk we will have a look at what approaches have been explored, what has worked, and what has not. In the Cyber Security domain, we have been collecting ‘big data’ for almost two decades. |
137 | Using "big data" to solve "small data" problems | Chris Neumann | In this talk, Chris Neumann will discuss how DataHero applied the principles of user-centric design and development over a year and a half to create a product with which more than 95% of new users can get answers on their first attempt. |
138 | Financing lead triggers: empowering sales reps through knowledge discovery and fusion | Kareem S. Aggour, Bethany Hoogs | Here we describe a system built to automate the collection and aggregation of information on companies, which is then mined to identify actionable sales leads. |
139 | Query clustering based on bid landscape for sponsored search auction optimization | Ye Chen, Weiguo Liu, Jeonghee Yi, Anton Schwaighofer, Tak W. Yan | In this paper we present a formalism of clustering probability distributions, and its application to query clustering where each query is represented as a probability density of click-through rate (CTR) weighted bid and distortion is measured by KL divergence. |
140 | Analysis of advanced meter infrastructure data of water consumption in apartment buildings | Einat Kermany, Hanna Mazzawi, Dorit Baras, Yehuda Naveh, Hagai Michaelis | We present our experience of using machine learning techniques over data originating from advanced meter infrastructure (AMI) systems for water consumption in a medium-size city. |
141 | Online controlled experiments at large scale | Ron Kohavi, Alex Deng, Brian Frasca, Toby Walker, Ya Xu, Nils Pohlmann | We discuss why negative experiments, which degrade the user experience short term, should be run, given the learning value and long-term benefits. |
142 | iHR: an online recruiting system for Xiamen Talent Service Center | Wenxing Hong, Lei Li, Tao Li, Wenfu Pan | In this paper, we investigate and compare various online recruiting systems from a product perspective. |
143 | Dynamic memory allocation policies for postings in real-time Twitter search | Nima Asadi, Jimmy Lin, Michael Busch | In this paper, we focus on one aspect: dynamic postings allocation policies for index structures that are completely held in main memory. |
144 | A unified search federation system based on online user feedback | Luo Jie, Sudarshan Lamkhede, Rochit Sapra, Evans Hsu, Helen Song, Yi Chang | In this paper, we propose a unified framework for the search federation problem. |
145 | Amplifying the voice of youth in Africa via text analytics | Prem Melville, Vijil Chenthamarakshan, Richard D. Lawrence, James Powell, Moses Mugisha, Sharad Sapra, Rajesh Anandan, Solomon Assefa | This paper describes an automated message-understanding and routing system deployed by IBM at UNICEF. |
146 | Scalable supervised dimensionality reduction using clustering | Troy Raeder, Claudia Perlich, Brian Dalessandro, Ori Stitelman, Foster Provost | We present experimental results showing that for this task our algorithm outperforms other popular dimensionality-reduction algorithms across a wide variety of ad campaigns, as well as production results that showcase its performance in practice. |
147 | Ad click prediction: a view from the trenches | H. Brendan McMahan, Gary Holt, D. Sculley, Michael Young, Dietmar Ebner, Julian Grady, Lan Nie, Todd Phillips, Eugene Davydov, Daniel Golovin, Sharat Chikkerur, Dan Liu, Martin Wattenberg, Arnar Mar Hrafnkelsson, Tom Boulos, Jeremy Kubica | The goal of this paper is to highlight the close relationship between theoretical advances and practical engineering in this industrial setting, and to show the depth of challenges that appear when applying traditional machine learning methods in a complex dynamic system. |
148 | Modeling and probabilistic reasoning of population evacuation during large-scale disaster | Xuan Song, Quanshi Zhang, Yoshihide Sekimoto, Teerayut Horanont, Satoshi Ueyama, Ryosuke Shibasaki | In this paper, we construct a large human mobility database that stores and manages GPS records from mobile devices used by approximately 1.6 million people throughout Japan from 1 August 2010 to 31 July 2011. |
149 | Using co-visitation networks for detecting large scale online display advertising exchange fraud | Ori Stitelman, Claudia Perlich, Brian Dalessandro, Rod Hook, Troy Raeder, Foster Provost | In this paper, we will show examples of how non-intentional traffic that is produced by fraudulent activities adversely affects both general analytics and predictive models, and propose an approach using co-visitation networks to identify sites that have large amounts of this fraudulent traffic. |
150 | An integrated framework for optimizing automatic monitoring systems in large IT infrastructures | Liang Tang, Tao Li, Larisa Shwartz, Florian Pinel, Genady Ya Grabarnik | This paper describes an integrated framework for minimizing false positive tickets and maximizing the monitoring coverage for system faults. |
151 | Improving quality control by early prediction of manufacturing outcomes | Sholom M. Weiss, Amit Dhurandhar, Robert J. Baseman | We describe methods for continual prediction of manufactured product quality prior to final testing. |
152 | A data mining driven risk profiling method for road asset management | Daniel Emerson, Justin Z. Weligamage, Richi Nayak | Road surface skid resistance has been shown to have a strong relationship to road crash risk; however, the current method of using investigatory levels to identify crash-prone roads is problematic, as it may fail to identify risky roads outside of the norm. |
153 | Why people hate your app: making sense of user feedback in a mobile app store | Bin Fu, Jialiu Lin, Lei Li, Christos Faloutsos, Jason Hong, Norman Sadeh | In this paper, we propose Wiscom, a system that can analyze tens of millions of user ratings and comments in mobile app markets at three different levels of detail. |
154 | Towards long-lead forecasting of extreme flood events: a data mining framework for precipitation cluster precursors identification | Dawei Wang, Wei Ding, Kui Yu, Xindong Wu, Ping Chen, David L. Small, Shafiqul Islam | In this paper, we propose an integrated data mining framework for identifying the precursors to precipitation event clusters and use this information to predict extended periods of extreme precipitation and subsequent floods. |
155 | Predictive model performance: offline and online evaluations | Jeonghee Yi, Ye Chen, Jie Li, Swaraj Sett, Tak W. Yan | We study the accuracy of evaluation metrics used to estimate the efficacy of predictive models. |
156 | Uncertainty in online experiments with dependent data: an evaluation of bootstrap methods | Eytan Bakshy, Dean Eckles | We develop a framework for understanding how dependence affects uncertainty in user-item experiments and evaluate how bootstrap methods that account for differing levels of dependence perform in practice. |
157 | Knowledge discovery from massive healthcare claims data | Varun Chandola, Sreenivas R. Sukumar, Jack C. Schryver | The objective of this paper is twofold: first, we introduce the emerging domain of "big" healthcare claims data to the KDD community, and second, we describe the successes and challenges that we encountered in analyzing this data using state-of-the-art analytics for massive data. Specifically, we translate the problem of analyzing healthcare data into some of the most well-known analysis problems in the data mining community (social network analysis, text mining, temporal analysis, and higher-order feature construction) and describe how advances within each of these areas can be leveraged to understand the domain of healthcare. |
158 | Palette power: enabling visual search through colors | Anurag Bhardwaj, Atish Das Sarma, Wei Di, Raffay Hamid, Robinson Piramuthu, Neel Sundaresan | In this paper we present a simple and fast search algorithm that uses color as the main feature for building visual search. |
159 | Heat pump detection from coarse grained smart meter data with positive and unlabeled learning | Hongliang Fei, Younghun Kim, Sambit Sahu, Milind Naphade, Sanjay K. Mamidipalli, John Hutchinson | In this paper, we aim to detect electric heat pumps from coarse grained smart meter data for a heat pump marketing campaign. |
160 | Empirical bayes model to combine signals of adverse drug reactions | Rave Harpaz, William DuMouchel, Paea LePendu, Nigam H. Shah | We present a methodology based on empirical Bayes modeling to combine ADR signals mined from ~5 million adverse event reports collected by the FDA and healthcare data corresponding to 46 million patients, the main two types of information sources currently employed for signal detection. |
161 | Efficiently rewriting large multimedia application execution traces with few event sequences | Christiane Kamdem Kengne, Leon Constantin Fopa, Alexandre Termier, Noha Ibrahim, Marie-Christine Rousset, Takashi Washio, Miguel Santana | In this paper, we study the problem of finding a set of sequences of events that allows a reduced-size rewriting of the original trace. |
162 | Discriminant malware distance learning on structural information for automated malware classification | Deguang Kong, Guanhua Yan | In this work, we explore techniques that can automatically classify malware variants into their corresponding families. |
163 | Assessing team strategy using spatiotemporal data | Patrick Lucey, Dean Oliver, Peter Carr, Joe Roth, Iain Matthews | By way of example, we present an approach which uses an entire season of ball tracking data from the English Premier League (2010-2011 season) to reinforce the commonly held belief that teams should aim to "win home games and draw away ones". |
164 | Exploratory analysis of highly heterogeneous document collections | Arun S. Maiya, John P. Thompson, Francisco Loaiza-Lemos, Robert M. Rolfe | As one of our key tagging strategies, we introduce the KERA algorithm (Keyword Extraction for Reports and Articles). |
165 | Experience from hosting a corporate prediction market: benefits beyond the forecasts | Thomas A. Montgomery, Paul M. Stieg, Michael J. Cavaretta, Paul E. Moraal | We describe our experience, including both the strong and weak correlations found between predictions and real world results. |
166 | Detecting insider threats in a real corporate database of computer usage activity | Ted E. Senator, Henry G. Goldberg, Alex Memory, William T. Young, Brad Rees, Robert Pierce, Daniel Huang, Matthew Reardon, David A. Bader, Edmond Chow, Irfan Essa, Joshua Jones, Vinay Bettadapura, Duen Horng Chau, Oded Green, Oguz Kaya, Anita Zakrzewska, Erica Briscoe, Rudolph IV L. Mappus, Robert McColl, Lora Weiss, Thomas G. Dietterich, Alan Fern, Weng–Keen Wong, Shubhomoy Das, Andrew Emmott, Jed Irvine, Jay-Yoon Lee, Danai Koutra, Christos Faloutsos, Daniel Corkill, Lisa Friedland, Amanda Gentzel, David Jensen | This paper reports on methods and results of an applied research project by a team consisting of SAIC and four universities to develop, integrate, and evaluate new approaches to detect the weak signals characteristic of insider threats on organizations’ information systems. |
167 | Mining for geographically disperse communities in social networks by leveraging distance modularity | Paulo Shakarian, Patrick Roos, Devon Callahan, Cory Kirk | We apply a variant of Newman-Girvan modularity to this problem known as distance modularity. |
168 | An integrated framework for suicide risk prediction | Truyen Tran, Dinh Phung, Wei Luo, Richard Harvey, Michael Berk, Svetha Venkatesh | We present an integrated machine learning framework to tackle this challenge. |
169 | Gaussian multiple instance learning approach for mapping the slums of the world using very high resolution imagery | Ranga Raju Vatsavai | In this paper, we present a computationally efficient algorithm based on multiple instance learning for mapping informal settlements (slums) using very high-resolution remote sensing imagery. |
170 | A privacy preserving framework for managing vehicle data in road pricing systems | Huayu Wu, Wee Siong Ng, Kian-Lee Tan, Wei Wu, Shili Xiang, Mingqiang Xue | We propose a novel framework in which privacy protection is pushed to the data provider's site. |
171 | U-Air: when urban air quality inference meets big data | Yu Zheng, Furui Liu, Hsun-Ping Hsieh | In this paper, we infer real-time and fine-grained air quality information throughout a city, based on the (historical and real-time) air quality data reported by existing monitor stations and a variety of data sources we observed in the city, such as meteorology, traffic flow, human mobility, structure of road networks, and points of interest (POIs). |
172 | Panel: a data scientist’s guide to making money from start-ups | Foster Provost, Geoffrey I. Webb | Panel: a data scientist’s guide to making money from start-ups |
173 | LAICOS: an open source platform for personalized social web search | Mohamed Reda Bouadjenek, Hakim Hacid, Mokrane Bouzeghoub | In this paper, we introduce LAICOS, a social Web search engine as a contribution to the growing area of Social Information Retrieval (SIR). |
174 | JobMiner: a real-time system for mining job-related patterns from social media | Yu Cheng, Yusheng Xie, Zhengzhang Chen, Ankit Agrawal, Alok Choudhary, Songtao Guo | In this paper, we analyze the job information from the social network point of view. |
175 | Inferring distant-time location in low-sampling-rate trajectories | Meng-Fen Chiang, Yung-Hsiang Lin, Wen-Chih Peng, Philip S. Yu | To efficiently process queries, we proposed the index structure Sorted Interval-Tree (SOIT) to organize location records. |
176 | AMETHYST: a system for mining and exploring topical hierarchies of heterogeneous data | Marina Danilevsky, Chi Wang, Fangbo Tao, Son Nguyen, Gong Chen, Nihit Desai, Lidan Wang, Jiawei Han | In this demo we present AMETHYST, a system for exploring and analyzing a topical hierarchy constructed from a heterogeneous information network (HIN). |
177 | A tool for collecting provenance data in social media | Pritam Gundecha, Suhas Ranganath, Zhuo Feng, Huan Liu | In this paper, we present a novel web-based tool for collecting the attributes of interest associated with a particular social media user related to the received information. |
178 | STED: semi-supervised targeted-interest event detection in Twitter | Ting Hua, Feng Chen, Liang Zhao, Chang-Tien Lu, Naren Ramakrishnan | This paper presents STED, a semi-supervised system that helps users automatically detect and interactively visualize events of a targeted type from Twitter, such as crimes, civil unrest, and disease outbreaks. |
179 | Forex-foreteller: currency trend modeling using news articles | Fang Jin, Nathan Self, Parang Saraf, Patrick Butler, Wei Wang, Naren Ramakrishnan | In this demo, we present Forex-foreteller (FF) which mines news articles and makes forecasts about the movement of foreign currency markets. |
180 | Real-time disease surveillance using Twitter data: demonstration on flu and cancer | Kathy Lee, Ankit Agrawal, Alok Choudhary | In this paper, we describe a novel real-time flu and cancer surveillance system that uses spatial, temporal, and text mining on Twitter data. |
181 | KeySee: supporting keyword search on evolving events in social streams | Pei Lee, Laks V.S. Lakshmanan, Evangelos Milios | In this demo, we provide a new solution called KeySee that groups posts into events and tracks the evolution patterns of events as new posts stream in and old posts fade out. |
182 | Understanding Twitter data with TweetXplorer | Fred Morstatter, Shamanth Kumar, Huan Liu, Ross Maciejewski | We present TweetXplorer, a system for analysts with little information about an event to gain knowledge through the use of effective visualization techniques. |
183 | An online system with end-user services: mining novelty concepts from tv broadcast subtitles | Mika Rautiainen, Jouni Sarvanko, Arto Heikkinen, Mika Ylianttila, Vassilis Kostakos | In this paper we introduce our data mining system and accompanying services for summarizing Finnish DVB broadcast streams from seven national channels. |
184 | When TEDDY meets GrizzLY: temporal dependency discovery for triggering road deicing operations | Céline Robardet, Vasile-Marian Scuturici, Marc Plantevit, Antoine Fraboulet | The TEDDY algorithm aims at discovering such dependencies, identifying the statistically significant time intervals with a chi-squared test. |
185 | EventCube: multi-dimensional search and mining of structured and text data | Fangbo Tao, Kin Hou Lei, Jiawei Han, Chengxiang Zhai, Xiao Cheng, Marina Danilevsky, Nihit Desai, Bolin Ding, Jing Ge Ge, Heng Ji, Rucha Kanade, Anne Kao, Qi Li, Yanen Li, Cindy Lin, Jialu Liu, Nikunj Oza, Ashok Srivastava, Rod Tjoelker, Chi Wang, Duo Zhang, Bo Zhao | EventCube: multi-dimensional search and mining of structured and text data |
186 | SEA: a system for event analysis on chinese tweets | Yaqiong Wang, Hongfu Liu, Hao Lin, Junjie Wu, Zhiang Wu, Jie Cao | In light of this, in this demo paper, we propose SEA, a System for Event Analysis on Chinese tweets. |
187 | SAE: social analytic engine for large networks | Yang Yang, Jianfei Wang, Yutao Zhang, Wei Chen, Jing Zhang, Honglei Zhuang, Zhilin Yang, Bo Ma, Zhanpeng Fang, Sen Wu, Xiaoxiao Li, Debing Liu, Jie Tang | In this paper, we present a novel Social Analytic Engine (SAE) for large online social networks. |
188 | FIU-Miner: a fast, integrated, and user-friendly system for data mining in distributed environment | Chunqiu Zeng, Yexi Jiang, Li Zheng, Jingxuan Li, Lei Li, Hongtai Li, Chao Shen, Wubai Zhou, Tao Li, Bing Duan, Ming Lei, Pengnian Wang | In this paper, we design and implement FIU-Miner, a Fast, Integrated, and User-friendly system to ease data analysis. |
189 | LAFT-Explorer: inferring, visualizing and predicting how your social network expands | Jun Zhang, Chaokun Wang, Yuanchi Ning, Yichi Liu, Jianmin Wang, Philip S. Yu | In this paper we demonstrate LaFT-Explorer, a general toolkit for explaining and reproducing the network growth process based on the friendship propagation. |
190 | A transfer learning based framework of crowd-selection on twitter | Zhou Zhao, Da Yan, Wilfred Ng, Shi Gao | This helps understand our ideas in an interactive manner. |
191 | Risk-O-Meter: an intelligent clinical risk calculator | Kiyana Zolfaghar, Jayshree Agarwal, Deepthi Sistla, Si-Chi Chin, Senjuti Basu Roy, Nele Verbiest | We present a system called Risk-O-Meter to predict and analyze clinical risk via data imputation, visualization, predictive modeling, and association rule exploration. |
192 | Algorithmic techniques for modeling and mining large graphs (AMAzING) | Alan Frieze, Aristides Gionis, Charalampos Tsourakakis | In this tutorial, we will provide an in-depth presentation of the most popular random-graph models used for modeling real-world networks. |
193 | Mining data from mobile devices: a survey of smart sensing and analytics | Spiros Papadimitriou, Tina Eliassi-Rad | In this tutorial, we survey the state of the art in mining data from mobile devices across different application areas such as ads, healthcare, geosocial, public policy, etc. In part two, we present cross-cutting challenges such as real-time analysis and security, and we outline cross-cutting methods for mobile data mining such as network inference, streaming algorithms, etc. |
194 | Big data analytics for healthcare | Jimeng Sun, Chandan K. Reddy | In this tutorial, we introduce the characteristics and related mining challenges on dealing with big medical data. |
195 | Entity resolution for big data | Lise Getoor, Ashwin Machanavajjhala | In this tutorial, we bring together perspectives on entity resolution from a variety of fields, including databases, information retrieval, natural language processing and machine learning, to provide, in one setting, a survey of a large body of work. |
196 | Network sampling | Mohammad A. Hasan, Jennifer Neville, Nesreen Ahmed | In this tutorial, we aim to cover a diverse collection of methodologies and applications of network sampling. |
197 | The dataminer’s guide to scalable mixed-membership and nonparametric bayesian models | Amr Ahmed, Alex Smola | We present design patterns for hierarchical nonparametric Bayesian models, efficient inference algorithms, and modeling tools to describe salient aspects of the data. |