Paper Digest: CIKM 2016 Highlights

November 1, 2016 | June 26, 2020 | admin

The ACM Conference on Information and Knowledge Management (CIKM) is an annual computer science research conference dedicated to information management and knowledge management.

To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights and summaries to quickly get the main idea of each paper.

If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to get updates on new conference digests.

Paper Digest Team
team@paperdigest.org

TABLE 1: CIKM 2016 Papers

No.	Title	Authors	Highlight
1	Toward Data-Driven Education: CIKM-2016 Keynote	Rakesh Agrawal	We address three issues in this talk.
2	Social Recommendation with Strong and Weak Ties	Xin Wang, Wei Lu, Martin Ester, Can Wang, Chun Chen	In this work, we study the effects of distinguishing strong and weak ties in social recommendation.
3	Learning Graph-based POI Embedding for Location-based Recommendation	Min Xie, Hongzhi Yin, Hao Wang, Fanjiang Xu, Weitong Chen, Sen Wang	To address these challenges, we stand on recent advances in embedding learning techniques and propose a generic graph-based embedding model, called GE, in this paper.
4	Improving Personalized Trip Recommendation by Avoiding Crowds	Xiaoting Wang, Christopher Leckie, Jeffrey Chan, Kwan Hui Lim, Tharshan Vaithianathan	In this work, we propose the Personalized Crowd-aware Trip Recommendation (PersCT) algorithm to recommend personalized trips that also avoid the most crowded times of the POIs.
5	Memory-based Recommendations of Entities for Web Search Users	Ignacio Fernández-Tobías, Roi Blanco	In this paper we propose a set of domain-agnostic methods based on nearest neighbors collaborative filtering that exploit query log data to generate entity suggestions, taking into account the user’s full search session.
6	LICON: A Linear Weighting Scheme for the Contribution of Input Variables in Deep Artificial Neural Networks	Gjergji Kasneci, Thomas Gottron	We propose a generic framework as well as a concrete method for quantifying the influence of individual input signals on the output computed by a deep neural network.
7	A Deep Relevance Matching Model for Ad-hoc Retrieval	Jiafeng Guo, Yixing Fan, Qingyao Ai, W. Bruce Croft	In this paper, we propose a novel deep relevance matching model (DRMM) for ad-hoc retrieval.
8	A Neural Network Approach to Quote Recommendation in Writings	Jiwei Tan, Xiaojun Wan, Jianguo Xiao	In this paper, we propose a neural network approach based on LSTMs to the quote recommendation task.
9	Retweet Prediction with Attention-based Deep Neural Network	Qi Zhang, Yeyun Gong, Jindou Wu, Haoran Huang, Xuanjing Huang	In this work, we proposed a novel attention-based deep neural network to incorporate contextual and social information for this task. To train and evaluate the proposed methods, we also constructed a large dataset collected from Twitter.
10	Effective Document Labeling with Very Few Seed Words: A Topic Model Approach	Chenliang Li, Jian Xing, Aixin Sun, Zongyang Ma	In this paper, we propose a Seed-Guided Topic Model (named STM) for the dataless text classification task.
11	Cross-lingual Text Classification via Model Translation with Limited Dictionaries	Ruochen Xu, Yiming Yang, Hanxiao Liu, Andrew Hsi	Specifically, we propose two new approaches that combine unsupervised word embedding in different languages, supervised mapping of embedded words across languages, and probabilistic translation of classification models.
12	Semi-supervised Multi-Label Topic Models for Document Classification and Sentence Labeling	Hossein Soleimani, David J. Miller	We propose a semi-supervised multi-label topic model for jointly achieving document and sentence-level class inferences.
13	Linked Document Embedding for Classification	Suhang Wang, Jiliang Tang, Charu Aggarwal, Huan Liu	In this paper, we study the problem of linked document embedding for classification and propose a linked document embedding framework LDE, which combines link and label information with content information to learn document representations for classification. Linked documents present new challenges to traditional document embedding algorithms.
14	Detecting Promotion Campaigns in Query Auto Completion	Yuli Liu, Yiqun Liu, Ke Zhou, Min Zhang, Shaoping Ma, Yue Yin, Hengliang Luo	Query Auto Completion (QAC) aims to provide possible suggestions to Web search users from the moment they start entering a query, which is thought to reduce their physical and cognitive efforts in query formulation.
15	A Unified Index for Spatio-Temporal Keyword Queries	Tuan-Anh Hoang-Vu, Huy T. Vo, Juliana Freire	We propose a new indexing strategy that uniformly handles text, space and time in a single structure, and is thus able to efficiently evaluate queries that combine keywords with spatial and temporal constraints.
16	Privacy-Preserving Reachability Query Services for Massive Networks	Jiaxin Jiang, Peipei Yi, Byron Choi, Zhiwei Zhang, Xiaohui Yu	Specifically, we propose a scalable index construction algorithm by employing the idea of topological folding, recently proposed by Cheng et al.
17	Sequential Query Expansion using Concept Graph	Saeid Balaneshin-kordan, Alexander Kotov	In this paper, we propose a two-stage feature-based method for sequential selection of the most effective concepts for query expansion from a concept graph.
18	Learning Latent Vector Spaces for Product Search	Christophe Van Gysel, Maarten de Rijke, Evangelos Kanoulas	We introduce a novel latent vector space model that jointly learns the latent representations of words, e-commerce products and a mapping between the two without the need for explicit annotations.
19	Incorporating Clicks, Attention and Satisfaction into a Search Engine Result Page Evaluation Model	Aleksandr Chuklin, Maarten de Rijke	In this paper we propose a model of user behavior on a SERP that jointly captures click behavior, user attention and satisfaction, the CAS model, and demonstrate that it gives more accurate predictions of user actions and self-reported satisfaction than existing models based on clicks alone.
20	The Role of Relevance in Sponsored Search	Luca Aiello, Ioannis Arapakis, Ricardo Baeza-Yates, Xiao Bai, Nicola Barbieri, Amin Mantrach, Fabrizio Silvestri	Specifically, we propose a machine learning approach that solely relies on text-based features to measure the relevance between an advertisement and a query.
21	PowerWalk: Scalable Personalized PageRank via Random Walks with Vertex-Centric Decomposition	Qin Liu, Zhenguo Li, John C.S. Lui, Jiefeng Cheng	In this paper, we propose a distributed framework that strikes a better balance between offline indexing and online querying.
22	Building Industry-specific Knowledge Bases	Shivakumar Vaithyanathan	In this talk, I will describe the design of domain-specific languages (DSL) with specialized constructs that serve as target languages for learning these models and algorithms, and the generation of training data for scaling up the learning.
23	Reuters Tracer: A Large Scale System of Detecting & Verifying Real-Time News Events from Twitter	Xiaomo Liu, Quanzhi Li, Armineh Nourbakhsh, Rui Fang, Merine Thomas, Kajsa Anderson, Russ Kociuba, Mark Vedder, Steven Pomerville, Ramdev Wudali, Robert Martin, John Duprey, Arun Vachher, William Keenan, Sameena Shah	In this paper, we describe Reuters Tracer, a system for sifting through all noise to detect news events on Twitter and assessing their veracity.
24	Structural Clustering of Machine-Generated Mail	Noa Avigdor-Elgrabli, Mark Cwalinski, Dotan Di Castro, Iftah Gamzu, Irena Grabovitch-Zuyev, Liane Lewin-Eytan, Yoelle Maarek	Several recent studies have presented different approaches for clustering and classifying machine-generated mail based on email headers.
25	LambdaFM: Learning Optimal Ranking with Factorization Machines Using Lambda Surrogates	Fajie Yuan, Guibing Guo, Joemon M. Jose, Long Chen, Haitao Yu, Weinan Zhang	In this paper, we demonstrate, both theoretically and empirically, PRFM models usually lead to non-optimal item recommendation results due to such a mismatch.
26	Plackett-Luce Regression Mixture Model for Heterogeneous Rankings	Maksim Tkachenko, Hady W. Lauw	In this work, we are concerned with learning to rank for a heterogeneous population, which may consist of a number of sub-populations, each of which may rank objects differently.
27	Compression-Based Selective Sampling for Learning to Rank	Rodrigo M. Silva, Guilherme C.M. Gomes, Mário S. Alvim, Marcos A. Gonçalves	In this paper, we propose that certain characteristics of unlabeled L2R datasets allow for an unsupervised, compression-based selection process to be used to create small and yet highly informative and effective initial sets that can later be labeled and used to bootstrap a L2R system.
28	Incorporating Risk-Sensitiveness into Feature Selection for Learning to Rank	Daniel Xavier De Sousa, Sérgio Daniel Canuto, Thierson Couto Rosa, Wellington Santos Martins, Marcos André Gonçalves	In this paper we propose multi-objective FS strategies that optimize both aspects at the same time: ranking performance and risk-sensitive evaluation.
29	Answering Twitter Questions: a Model for Recommending Answerers through Social Collaboration	Laure Soulier, Lynda Tamine, Gia-Hung Nguyen	In this paper, we specifically consider the challenging task of solving a question posted on Twitter.
30	Learning to Extract Conditional Knowledge for Question Answering using Dialogue	Pengwei Wang, Lei Ji, Jun Yan, Lianwen Jin, Wei-Ying Ma	In this work, we propose to extract conditional knowledge base (CKB) from user question-answer pairs for answering user questions with different conditions through dialogue.
31	aNMM: Ranking Short Answer Texts with Attention-Based Neural Matching Model	Liu Yang, Qingyao Ai, Jiafeng Guo, W. Bruce Croft	In this paper, we propose an attention based neural matching model for ranking short answer text.
32	Medical Question Answering for Clinical Decision Support	Travis R. Goodwin, Sanda M. Harabagiu	In this paper, we present a novel framework for answering medical questions in the spirit of TREC-CDS by first discovering the answer and then selecting and ranking scientific articles that contain the answer.
33	Error Link Detection and Correction in Wikipedia	Chengyu Wang, Rong Zhang, Xiaofeng He, Aoying Zhou	In this paper, we address the error link problem, and propose algorithms to detect and correct error links.
34	Using Prerequisites to Extract Concept Maps from Textbooks	Shuting Wang, Alexander Ororbia, Zhaohui Wu, Kyle Williams, Chen Liang, Bart Pursel, C. Lee Giles	We present a framework for constructing a specific type of knowledge graph, a concept map from textbooks.
35	Vandalism Detection in Wikidata	Stefan Heindorf, Martin Potthast, Benno Stein, Gregor Engels	In this paper, we present a new machine learning-based approach to detect vandalism in Wikidata. We propose a set of 47 features that exploit both content and context information, and we report on 4 classifiers of increasing effectiveness tailored to this learning task.
36	Finding News Citations for Wikipedia	Besnik Fetahu, Katja Markert, Wolfgang Nejdl, Avishek Anand	In this work we address the problem of finding and updating news citations for statements in entity pages.
37	SemiNMF-PCA framework for Sparse Data Co-clustering	Kais Allab, Lazhar Labiod, Mohamed Nadif	In this paper, we propose a novel way to consider the co-clustering and the reduction of the dimension simultaneously.
38	Effective and Efficient Spectral Clustering on Text and Link Data	Zhiqiang Xu, Yiping Ke	In this paper, we address this limitation by explicitly modeling the domain-specific distinctions in the clustering process.
39	Robust Spectral Ensemble Clustering	Zhiqiang Tao, Hongfu Liu, Sheng Li, Yun Fu	In this paper, we propose a novel Robust Spectral Ensemble Clustering (RSEC) approach to address this challenge.
40	Hybrid Indexing for Versioned Document Search with Cluster-based Retrieval	Xin Jin, Daniel Agun, Tao Yang, Qinghao Wu, Yifan Shen, Susen Zhao	This paper proposes an alternative approach that uses cluster-based retrieval to quickly narrow the search scope guided by version representatives at Phase 1 and develops a hybrid index structure with adaptive runtime data traversal to speed up Phase 2 search.
41	Time-aware Multi-Viewpoint Summarization of Multilingual Social Text Streams	Zhaochun Ren, Oana Inel, Lora Aroyo, Maarten de Rijke	In this paper, we focus on time-aware multi-viewpoint summarization of multilingual social text streams.
42	Data Summarization with Social Contexts	Hao Zhuang, Rameez Rahman, Xia Hu, Tian Guo, Pan Hui, Karl Aberer	To tackle these challenges, in this paper, we focus on exploiting social contexts to summarize social data while preserving topics in the original dataset.
43	Understanding Sparse Topical Structure of Short Text via Stochastic Variational-Gibbs Inference	Tianyi Lin, Siyuan Zhang, Hong Cheng	In this paper, we propose a probabilistic Bayesian topic model, namely Sparse Dirichlet mixture Topic Model (SparseDTM), based on Indian Buffet Process (IBP) prior, and infer our model on the large text corpora through a novel inference procedure called stochastic variational-Gibbs inference.
44	Annotating Points of Interest with Geo-tagged Tweets	Kaiqi Zhao, Gao Cong, Aixin Sun	In this paper, we aim to associate tweets that are semantically related to real-world locations or Points of Interest (POIs).
45	Duer: Intelligent Personal Assistant	Haifeng Wang	In this talk, I describe Duer, Baidu’s intelligent personal assistant.
46	Measuring Metrics	Pavel Dmitriev, Xian Wu	In this paper we describe the metric evaluation system deployed at Bing, where we have been working on designing and improving metrics for over five years.
47	City-Scale Localization with Telco Big Data	Fangzhou Zhu, Chen Luo, Mingxuan Yuan, Yijian Zhu, Zhengqing Zhang, Tao Gu, Ke Deng, Weixiong Rao, Jia Zeng	In this paper, we find that the widely-used location based services (LBSs) have accumulated lots of over-the-top (OTT) global positioning system (GPS) data in telco networks, which can be automatically used as training labels for learning accurate MR-based positioning systems.
48	Approximating Graph Pattern Queries Using Views	Jia Li, Yang Cao, Xudong Liu	Given a pattern query Q and a set V of views, we propose to find a pair of queries Q_u and Q_l, referred to as the upper and lower approximations of Q w.r.t. V, such that (a) for any data graph G, answers to (part of) Q in G are contained in Q_u(G) and contain Q_l(G); and (b) both Q_u and Q_l can be answered by using views in V.
49	Group-Aware Weighted Bipartite B-Matching	Cheng Chen, Sean Chester, Venkatesh Srinivasan, Kui Wu, Alex Thomo	In this paper, we investigate powerful generalisations of WBM. We then propose two related problems, collectively called group-aware WBM.
50	Growing Graphs from Hyperedge Replacement Graph Grammars	Salvador Aguiñaga, Rodrigo Palacios, David Chiang, Tim Weninger	In this paper we show that a graph’s clique tree can be used to extract a hyperedge replacement grammar.
51	GiraphAsync: Supporting Online and Offline Graph Processing via Adaptive Asynchronous Message Processing	Yuqiong Liu, Chang Zhou, Jun Gao, Zhiguo Fan	In this work, we propose an adaptive asynchronous message processing (AAMP) method, which improves the efficiency of network communication while maintaining low latency, to efficiently support offline analytics and online queries in one graph processing framework.
52	Graph Topic Scan Statistic for Spatial Event Detection	Yu Liu, Baojian Zhou, Feng Chen, David W. Cheung	In this paper, we focus on the problem of spatial event detection using textual information in social media.
53	A Nonparametric Model for Event Discovery in the Geospatial-Temporal Space	Jinjin Guo, Zhiguo Gong	To break through such limitations, in this paper we propose a novel nonparametric model to identify events in the geographical and temporal space, where any recurrent patterns of events can be automatically captured.
54	A Multiple Instance Learning Framework for Identifying Key Sentences and Detecting Events	Wei Wang, Yue Ning, Huzefa Rangwala, Naren Ramakrishnan	We evaluate our model in its ability to detect news articles about civil unrest events (from Spanish text) across ten Latin American countries and identify the key sentences pertaining to these events.
55	PairFac: Event Analytics through Discriminant Tensor Factorization	Xidao Wen, Yu-Ru Lin, Konstantinos Pelechrinis	In this paper, we propose a novel approach for analyzing events called PairFac.
56	Active Content-Based Crowdsourcing Task Selection	Piyush Bansal, Carsten Eickhoff, Thomas Hofmann	In this paper, we focus on an alternate method that exploits document information instead, to infer relevance labels for unjudged documents.
57	CrowdSelect: Increasing Accuracy of Crowdsourcing Tasks through Behavior Prediction and User Selection	Chenxi Qiu, Anna C. Squicciarini, Barbara Carminati, James Caverlee, Dev Rishi Khare	In this paper, we present a dynamic and time efficient solution to the task assignment problem in crowdsourcing platforms.
58	Attribute-based Crowd Entity Resolution	Asif R. Khan, Hector Garcia-Molina	In this paper, we reduce the cost of pairwise crowd ER approaches by soliciting the crowd for attribute labels on records, and then asking for pairwise judgments only between records with similar sets of attribute labels.
59	Efficient Processing of Location-Aware Group Preference Queries	Miao Li, Lisi Chen, Gao Cong, Yu Gu, Ge Yu	We develop a novel framework for answering the LGP query, which can be used to compute both exact query result and approximate result with a proven approximation ratio.
60	Mining Shopping Patterns for Divergent Urban Regions by Incorporating Mobility Data	Tianran Hu, Ruihua Song, Yingzi Wang, Xing Xie, Jiebo Luo	In this paper, we aim to predict citywide shopping patterns.
61	Large-Scale Analysis of Viewing Behavior: Towards Measuring Satisfaction with Mobile Proactive Systems	Qi Guo, Yang Song	In this paper, we present the first large-scale analysis of viewing behavior based on the viewport (the visible fraction of a Web page) of the mobile devices, towards measuring user satisfaction with the information cards of the mobile proactive systems.
62	Where Did You Go: Personalized Annotation of Mobility Records	Fei Wu, Zhenhui Li	In this paper, we aim to answer this question by annotating the mobility records with surrounding venues that were actually visited by the user.
63	Understanding Mobile Searcher Attention with Rich Ad Formats	Dmitry Lagun, Donal McMahon, Vidhya Navalpakkam	In this paper, we study how the presence of ads and their formats impacts searcher’s gaze and satisfaction.
64	Link Prediction in Heterogeneous Social Networks	Sumit Negi, Santanu Chaudhury	In this paper we pose the problem of link prediction in heterogeneous networks as a multi-task, metric learning (MTML) problem.
65	Who are My Familiar Strangers?: Revealing Hidden Friend Relations and Common Interests from Smart Card Data	Fusang Zhang, Beihong Jin, Tingjian Ge, Qiang Ji, Yanling Cui	In this paper, we study the problem of discovering familiar strangers, specifically, public transportation trip companions, and their common interests.
66	PIN-TRUST: Fast Trust Propagation Exploiting Positive, Implicit, and Negative Information	Min-Hee Jang, Christos Faloutsos, Sang-Wook Kim, U Kang, Jiwoon Ha	In this paper, we propose PIN-TRUST, a novel method to handle all three types of interaction information: explicit trust, implicit trust, and explicit distrust.
67	Predicting Popularity of Twitter Accounts through the Discovery of Link-Propagating Early Adopters	Daichi Imamori, Keishi Tajima	In this paper, we propose a method of ranking recently created Twitter accounts according to their prospective popularity.
68	"Shall I Be Your Chat Companion?": Towards an Online Human-Computer Conversation System	Rui Yan, Yiping Song, Xiangyang Zhou, Hua Wu	In this paper, we introduce a chat companion system, which is a practical conversation system between human and computer as a real application.
69	To Click or Not To Click: Automatic Selection of Beautiful Thumbnails from Videos	Yale Song, Miriam Redi, Jordi Vallmitjana, Alejandro Jaimes	We present an automatic thumbnail selection system that exploits two important characteristics commonly associated with meaningful and attractive thumbnails: high relevance to video content and superior visual aesthetic quality.
70	Separating-Plane Factorization Models: Scalable Recommendation from One-Class Implicit Feedback	Haolan Chen, Di Niu, Kunfeng Lai, Yu Xu, Masoud Ardakani	We propose a scalable approach called separating-plane matrix factorization (SPMF) to make effective recommendations based on positive implicit feedback, with a learning complexity that is comparable to traditional matrix factorization.
71	User Response Learning for Directly Optimizing Campaign Performance in Display Advertising	Kan Ren, Weinan Zhang, Yifei Rong, Haifeng Zhang, Yong Yu, Jun Wang	In this paper, we take real-time display advertising as an example, where the predicted user’s ad click-through rate (CTR) is employed to calculate a bid for an ad impression in the second price auction.
72	Personalized Search: Potential and Pitfalls	Susan T. Dumais	In this talk I present a framework to quantify the "potential for personalization" which we use to characterize the extent to which different people have different intents for the same query.
73	Query Variations and their Effect on Comparing Information Retrieval Systems	Guido Zuccon, Joao Palotti, Allan Hanbury	We propose a framework for evaluating retrieval systems that explicitly takes into account query variations.
74	Semantic Matching by Non-Linear Word Transportation for Information Retrieval	Jiafeng Guo, Yixing Fan, Qingyao Ai, W. Bruce Croft	Based on this representation, we introduce a novel retrieval model by viewing the matching between queries and documents as a non-linear word transportation (NWT) problem.
75	Generalizing Translation Models in the Probabilistic Relevance Framework	Navid Rekabsaz, Mihai Lupu, Allan Hanbury, Guido Zuccon	In this paper, we revisit a wide spectrum of existing models (Pivoted Document Normalization, BM25, BM25 Verboseness Aware, Multi-Aspect TF, and Language Modelling) by introducing a generalisation of the idea of the translation model.
76	Axiomatic Result Re-Ranking	Matthias Hagen, Michael Völske, Steve Göring, Benno Stein	In this paper, we combine the learning-to-rank paradigm with the recent developments on axioms for information retrieval.
77	Agents, Simulated Users and Humans: An Analysis of Performance and Behaviour	David Maxwell, Leif Azzopardi	In this paper, we develop a more sophisticated model of the user that includes their cognitive state within the simulation.
78	Inspiration or Preparation?: Explaining Creativity in Scientific Enterprise	Xinyang Zhang, Dashun Wang, Ting Wang	Existing studies have made striding advances in quantifying creativity of scientific publications by investigating their citation relationships.
79	Pagination versus Scrolling in Mobile Web Search	Jaewon Kim, Paul Thomas, Ramesh Sankaranarayana, Tom Gedeon, Hwan-Jin Yoon	For touch-enabled mobile devices that are not equipped with a mouse or keyboard, we adopt other methods of controlling the viewport with the aim of investigating user interaction.
80	Studying the Dark Triad of Personality through Twitter Behavior	Daniel Preotiuc-Pietro, Jordan Carpenter, Salvatore Giorgi, Lyle Ungar	Our results show that we can map various behaviors to psychological theory and study new aspects related to social media usage.
81	Document Filtering for Long-tail Entities	Ridho Reinanda, Edgar Meij, Maarten de Rijke	In this paper we propose a document filtering method for long-tail entities that is entity-independent and thus also generalizes to unseen or rarely seen entities. We propose a set of features that capture informativeness, entity-saliency, and timeliness.
82	Estimating Time Models for News Article Excerpts	Arunav Mishra, Klaus Berberich	For this, we propose a semi-supervised distribution propagation framework that leverages redundancy in the data to improve the quality of estimated time models.
83	A Framework for Task-specific Short Document Expansion	Ramakrishna B. Bairi, Raghavendra Udupa, Ganesh Ramakrishnan	We present an expansion technique — TIDE (Task-specIfic short Document Expansion) — that can be applied on several Machine Learning, NLP and Information Retrieval tasks on short texts (such as short text classification, clustering, entity disambiguation, and the like) without using task specific heuristics and domain-specific knowledge for expansion.
84	Beyond Clustering: Sub-DAG Discovery for Categorising Documents	Ramakrishna B. Bairi, Mark J. Carman, Ganesh Ramakrishnan	We propose two different algorithms for estimating the model parameters. Unlike previous works, which focus on clustering the set of documents using the category hierarchy as features, we directly pose the problem as that of finding a DAG-structured generative model that has maximum likelihood of generating the observed "importance" scores for each document, where documents are modeled as the leaf nodes in the DAG structure.
85	On Transductive Classification in Heterogeneous Information Networks	Xiang Li, Ben Kao, Yudian Zheng, Zhipeng Huang	Studies have shown that transductive classification is an effective way to classify and to deduce labels of objects, and a number of transductive classifiers have been put forward to classify objects in an HIN.
86	Efficient Hidden Trajectory Reconstruction from Sparse Data	Ning Yang, Philip S. Yu	In this paper, we investigate the problem of reconstructing hidden trajectories from a collective of separate spatial-temporal points without ID information, given the number of hidden trajectories.
87	Quark-X: An Efficient Top-K Processing Framework for RDF Quad Stores	Jyoti Leeka, Srikanta Bedathur, Debajyoti Bera, Medha Atre	In this paper, we present Quark-X, an RDF-store and SPARQL processing system for reified RDF data represented in the form of quads.
88	Reenactment for Read-Committed Snapshot Isolation	Bahareh Sadat Arab, Dieter Gawlick, Vasudha Krishnaswamy, Venkatesh Radhakrishnan, Boris Glavic	We present non trivial extensions of the model and reenactment approach to be able to compute provenance of RC-SI transactions efficiently.
89	Influence-Aware Truth Discovery	Hengtong Zhang, Qi Li, Fenglong Ma, Houping Xiao, Yaliang Li, Jing Gao, Lu Su	To tackle these challenges in truth discovery, we propose an unsupervised probabilistic model named IATD.
90	Truth Discovery via Exploiting Implications from Multi-Source Data	Xianzhi Wang, Quan Z. Sheng, Lina Yao, Xue Li, Xiu Susie Fang, Xiaofei Xu, Boualem Benatallah	In this paper, we address this challenge by exploiting and leveraging the implications from multi-source data.
91	FacetGist: Collective Extraction of Document Facets in Large Technical Corpora	Tarique Siddiqui, Xiang Ren, Aditya Parameswaran, Jiawei Han	Towards this end, we introduce a new research problem called Facet Extraction.
92	Empowering Truth Discovery with Multi-Truth Prediction	Xianzhi Wang, Quan Z. Sheng, Lina Yao, Xue Li, Xiu Susie Fang, Xiaofei Xu, Boualem Benatallah	In this paper, we propose a multi-truth discovery approach, which addresses the above challenges by providing a generic framework for enhancing existing truth discovery methods.
93	Using Machine Learning to Improve the Email Experience	Marc Najork	In this talk, I will give three examples of machine learning improving the email experience.
94	Hashtag Recommendation for Enterprise Applications	Dhruv Mahajan, Vishwajit Kolathur, Chetan Bansal, Suresh Parthasarathy, Sundararajan Sellamanickam, Sathiya Keerthi, Johannes Gehrke	In this paper, we consider the problem of recommending hashtags for enterprise applications.
95	Survival Analysis based Framework for Early Prediction of Student Dropouts	Sattar Ameri, Mahtab J. Fard, Ratna B. Chinnam, Chandan K. Reddy	In this paper, we develop a survival analysis framework for early prediction of student dropout using Cox proportional hazards regression model (Cox).
96	Generative Feature Language Models for Mining Implicit Features from Customer Reviews	Shubhra Kanti Karmaker Santu, Parikshit Sondhi, ChengXiang Zhai	In this paper, we propose a new approach based on generative feature language models that can mine the implicit features more effectively through unsupervised statistical learning. We also created eight new data sets to facilitate evaluation of this task in English.
97	Data-Driven Contextual Valence Shifter Quantification for Multi-Theme Sentiment Analysis	Hongkun Yu, Jingbo Shang, Meichun Hsu, Malu Castellanos, Jiawei Han	To simultaneously resolve the multi-theme and sentiment shifting problems, we propose a data-driven framework to enable both capabilities: (1) polarity predictions of the same word in reviews of different themes, and (2) discovery and quantification of contextual valence shifters.
98	Sentiment Domain Adaptation with Multi-Level Contextual Sentiment Knowledge	Fangzhao Wu, Sixing Wu, Yongfeng Huang, Songfang Huang, Yong Qin	In this paper, we propose a new sentiment domain adaptation approach by adapting the sentiment knowledge in general-purpose sentiment lexicons to a specific domain.
99	Mobile App Retrieval for Social Media Users via Inference of Implicit Intent in Social Media Text	Dae Hoon Park, Yi Fang, Mengwen Liu, ChengXiang Zhai	In this paper, we study how to infer a user’s intent based on the user’s "status text" and retrieve relevant mobile apps that may satisfy the user’s needs.
100	Derivative Delay Embedding: Online Modeling of Streaming Time Series	Zhifei Zhang, Yang Song, Wei Wang, Hairong Qi	We propose a novel and more practical online modeling and classification scheme, DDE-MGM, which does not make any assumptions on the time series while maintaining high efficiency and state-of-the-art performance.
101	PISA: An Index for Aggregating Big Time Series Data	Xiangdong Huang, Jianmin Wang, Raymond Wong, Jinrui Zhang, Chen Wang	By defining two kinds of tags, namely code number and serial number, we propose an algorithm to accelerate queries by avoiding reading unnecessary data on disk.
102	Multi-View Time Series Classification: A Discriminative Bilinear Projection Approach	Sheng Li, Yaliang Li, Yun Fu	In light of this challenge, we propose a novel approach, named Multi-view Discriminative Bilinear Projections (MDBP), for extracting discriminative features from multi-view m.t.s. data.
103	Semi-Supervision Dramatically Improves Time Series Clustering under Dynamic Time Warping	Hoang Anh Dau, Nurjahan Begum, Eamonn Keogh	In this work we show that this is a naive approach which in most circumstances produces inferior clusterings.
104	Model-Based Oversampling for Imbalanced Sequence Classification	Zhichen Gong, Huanhuan Chen	To address these problems, this paper proposes a novel oversampling algorithm based on the ‘generative’ models of sequences.
105	CRISP: Consensus Regularized Selection based Prediction	Ping Wang, Karthik K. Padthe, Bhanukiran Vinzamuri, Chandan K. Reddy	To solve this problem, in this paper, we propose a method to generate a committee of non-convex regularized linear regression models, and use a consensus criterion to determine the optimal model for prediction.
106	Regularizing Structured Classifier with Conditional Probabilistic Constraints for Semi-supervised Learning	Vincent W. Zheng, Kevin Chen-Chuan Chang	Thus in this paper, we propose a new conditional probabilistic formulation for modeling both x-type and y-type constraints.
107	Scalability of Continuous Active Learning for Reliable High-Recall Text Classification	Gordon V. Cormack, Maura R. Grossman	We present a scalable version of CAL (‘S-CAL’) that requires O(log N) labeling effort and O(N log N) computational effort—where N is the number of unlabeled training examples—to construct a classifier whose effectiveness for a given labeling cost compares favorably with previously reported methods.
108	Towards the Effective Linking of Social Media Contents to Products in E-Commerce Catalogs	Henry S. Vieira, Altigran S. da Silva, Pável Calado, Marco Cristo, Edleno S. de Moura	We argue that this problem can be effectively solved using a set of evidences that can be easily extracted from social media content and product descriptions.
109	Tracking Virality and Susceptibility in Social Media	Tuan-Anh Hoang, Ee-Peng Lim	In this work, we investigate the inter-relationship among the factors and users’ multiple adoptions on items to propose both new static and temporal models for measuring the factors without requiring user-item exposure.
110	Feature Driven and Point Process Approaches for Popularity Prediction	Swapnil Mishra, Marian-Andrei Rizoiu, Lexing Xie	From these observations, we argue that future work on popularity prediction should compare across feature-driven and generative modeling approaches in both classification and regression tasks.
111	Adaptive Evolutionary Filtering in Real-Time Twitter Stream	Feifan Fan, Yansong Feng, Lili Yao, Dongyan Zhao	In this paper, we propose a novel adaptive evolutionary filtering framework to push interesting tweets to users from the real-time Twitter stream.
112	Multiple Queries as Bandit Arms	Cheng Li, Paul Resnick, Qiaozhu Mei	We consider a new paradigm of retrieval where multiple queries are kept “active” simultaneously.
113	Off the Beaten Path: Let’s Replace Term-Based Retrieval with k-NN Search	Leonid Boytsov, David Novak, Yury Malkov, Eric Nyberg	We address this issue by replacing the term-based search with a generic k-NN retrieval algorithm, where a similarity function can take into account subtle term associations.
114	Scalability and Total Recall with Fast CoveringLSH	Ninh Pham, Rasmus Pagh	Building on the recent theoretical "CoveringLSH" construction that eliminates false negatives, we propose a fast and practical covering LSH scheme for Hamming space called Fast CoveringLSH (fcLSH).
115	Query-Biased Partitioning for Selective Search	Zhuyun Dai, Chenyan Xiong, Jamie Callan	This paper presents a query-biased partitioning strategy that aligns document partitions with topics from query logs.
116	Characterizing Diseases from Unstructured Text: A Vocabulary Driven Word2vec Approach	Saurav Ghosh, Prithwish Chakraborty, Emily Cohn, John S. Brownstein, Naren Ramakrishnan	In this paper, we motivate a disease vocabulary driven word2vec model (Dis2Vec) to model diseases and constituent attributes as word embeddings from the HealthMap news corpus.
117	Network-Efficient Distributed Word2vec Training System for Large Vocabularies	Erik Ordentlich, Lee Yang, Andy Feng, Peter Cnudde, Mihajlo Grbovic, Nemanja Djuric, Vladan Radosavljevic, Gavin Owens	In this paper, we present a novel distributed, parallel training system that enables unprecedented practical training of vectors for vocabularies with several 100 million words on a shared cluster of commodity servers, using far less network traffic than the existing solutions.
118	A Personal Perspective and Retrospective on Web Search Technology	Andrei Broder	This talk is a review of some Web research and predictions that I co-authored over the last two decades: both what turned out gratifyingly right and what turned out embarrassingly wrong.
119	Scalable Spectral k-Support Norm Regularization for Robust Low Rank Subspace Learning	Yiu-ming Cheung, Jian Lou	Therefore, this paper proposes a scalable and efficient algorithm which considers the dual objective of the original problem that can take advantage of the more computational efficient linear oracle of the spectral k-support norm to be evaluated.
120	Online Adaptive Passive-Aggressive Methods for Non-Negative Matrix Factorization and Its Applications	Chenghao Liu, Steven C.H. Hoi, Peilin Zhao, Jianling Sun, Ee-Peng Lim	This paper aims to investigate efficient and scalable machine learning algorithms for resolving Non-negative Matrix Factorization (NMF), which is important for many real-world applications, particularly for collaborative filtering and recommender systems.
121	aptMTVL: Nailing Interactions in Multi-Task Multi-View Multi-Label Learning using Adaptive-basis Multilinear Factor Analyzers	Xiaoli Li, Jun Huan	We investigate a new direction of multi-task multi-view learning where we have data sets with multiple tasks, multiple views and multiple labels.
122	An Adaptive Framework for Multistream Classification	Swarup Chandra, Ahsanul Haque, Latifur Khan, Charu Aggarwal	In this paper, we present a novel stream classification problem setting involving two independent non-stationary data generating processes, relaxing the above assumptions.
123	Optimizing Update Frequencies for Decaying Information	Simon Razniewski	In this paper we present a model for describing the relationship between update frequency and income derived from data, present solutions for calculating the optimal update frequency for two common classes of functions for describing decay behaviour, and validate the benefits of our framework.
124	Cutty: Aggregate Sharing for User-Defined Windows	Paris Carbone, Jonas Traub, Asterios Katsifodimos, Seif Haridi, Volker Markl	In this paper we present a technique to perform efficient aggregate sharing for data stream windows, which are declared as user-defined functions (UDFs) and can contain arbitrary business logic.
125	Relational Database Schema Design for Uncertain Data	Sebastian Link, Henri Prade	We investigate the impact of uncertainty on relational database schema design.
126	BICP: Block-Incremental CP Decomposition with Update Sensitive Refinement	Shengyu Huang, K. Selçuk Candan, Maria Luisa Sapino	In this paper, we propose a two-phase block-incremental CP-based tensor decomposition technique, BICP, that efficiently and effectively maintains tensor decomposition results in the presence of dynamically evolving tensor data.
127	Topological Graph Sketching for Incremental and Scalable Analytics	Bortik Bandyopadhyay, David Fuhry, Aniket Chakrabarti, Srinivasan Parthasarathy	We propose a novel, scalable, and principled graph sketching technique based on minwise hashing of local neighborhood.
128	Querying Minimal Steiner Maximum-Connected Subgraphs in Large Graphs	Jiafeng Hu, Xiaowei Wu, Reynold Cheng, Siqiang Luo, Yixiang Fang	In this paper, we investigate the minimal SMCS, which is the minimal subgraph of G with the maximum connectivity containing Q.
129	Efficient Estimation of Triangles in Very Large Graphs	Roohollah Etemadi, Jianguo Lu, Yung H. Tsin	This paper proposes a new method to estimate the number of triangles based on random edge sampling.
130	Efficient Batch Processing for Multiple Keyword Queries on Graph Data	Lu Chen, Chengfei Liu, Xiaochun Yang, Bin Wang, Jianxin Li, Rui Zhou	Based on the model, we design an A* based algorithm to find the global optimal execution plan for multiple queries.
131	Supervised Robust Discrete Multimodal Hashing for Cross-Media Retrieval	Ting-Kun Yan, Xin-Shun Xu, Shanqing Guo, Zi Huang, Xiao-Lin Wang	To consider these problems, in this paper, we propose a novel supervised hashing framework for cross-modal retrieval, i.e., Supervised Robust Discrete Multimodal Hashing (SRDMH).
132	Word Vector Compositionality based Relevance Feedback using Kernel Density Estimation	Dwaipayan Roy, Debasis Ganguly, Mandar Mitra, Gareth J.F. Jones	To alleviate this limitation, we introduce a relevance feedback (RF) method which makes use of word embedded vectors.
133	Q+Tree: An Efficient Quad Tree based Data Indexing for Parallelizing Dynamic and Reverse Skylines	Md. Saiful Islam, Chengfei Liu, Wenny Rahayu, Tarique Anwar	This paper presents an efficient quad-tree based data indexing scheme, called Q+Tree, for parallelizing the computations of the dynamic and reverse skyline queries.
134	Luhn Revisited: Significant Words Language Models	Mostafa Dehghani, Hosein Azarbonyad, Jaap Kamps, Djoerd Hiemstra, Maarten Marx	Inspired by the early work of Luhn [23], we propose significant words language models of feedback documents that capture all, and only, the significant shared terms from feedback documents.
135	ESPRESSO: Explaining Relationships between Entity Sets	Stephan Seufert, Klaus Berberich, Srikanta J. Bedathur, Sarath Kumar Kondreddi, Patrick Ernst, Gerhard Weikum	This paper presents efficient approximation algorithms.
136	Geotagging Named Entities in News and Online Documents	Jiangwei Yu, Davood Rafiei	We study the problem of associating geography to named entities in online documents.
137	Discovering Entities with Just a Little Help from You	Jaspreet Singh, Johannes Hoffart, Avishek Anand	We propose novel human-in-the-loop retrieval methods for generating candidates based on gradient interleaving of diversification and textual relevance approaches.
138	Bayesian Non-Exhaustive Classification A Case Study: Online Name Disambiguation using Temporal Record Streams	Baichuan Zhang, Murat Dundar, Mohammad Al Hasan	In this work, we propose a Bayesian non-exhaustive classification framework for solving online name disambiguation task.
139	Large-scale Robust Online Matching and Its Application in E-commerce	Rong Jin	To address the first challenge, I will introduce two different techniques for robust matching.
140	A Distributed Graph Algorithm for Discovering Unique Behavioral Groups from Large-Scale Telco Data	Qirong Ho, Wenqing Lin, Eran Shaham, Shonali Krishnaswamy, The Anh Dang, Jingxuan Wang, Isabel Choo Zhongyan, Amy She-Nash	In this paper we propose a novel graph edge-clustering algorithm (DGEC) that can discover unique behavioral groups, from rich usage data sets (such as CDRs and beyond).
141	Urban Traffic Prediction through the Second Use of Inexpensive Big Data from Buildings	Zimu Zheng, Dan Wang, Jian Pei, Yi Yuan, Cheng Fan, Fu Xiao	In this paper, we report a novel and interesting case study of urban traffic prediction in Central, Hong Kong, one of the densest urban areas in the world.
142	A Probabilistic Multi-Touch Attribution Model for Online Advertising	Wendi Ji, Xiaoling Wang, Dell Zhang	In this paper, we propose a novel Probabilistic Multi-Touch Attribution (PMTA) model which takes into account not only which ads have been viewed or clicked by the user but also when each such interaction occurred.
143	Optimizing Ad Allocation in Social Advertising	Shaojie Tang, Jing Yuan	The goal of this work is to optimize the ad allocation from the platform’s perspective.
144	Joint Collaborative Ranking with Social Relationships in Top-N Recommendation	Dimitrios Rafailidis, Fabio Crestani	In this study, to account for the fact that the selections of social friends can improve the recommendation accuracy, we propose a joint CR model based on the users’ social relationships.
145	Modeling Customer Engagement from Partial Observations	Jelena Stojanovic, Djordje Gligorijevic, Zoran Obradovic	We address this problem by proposing a robust framework for structured regression on deficient data in evolving networks with a supervised representation learning based on neural features embedding.
146	On the Effectiveness of Query Weighting for Adapting Rank Learners to New Unlabelled Collections	Pengfei Li, Mark Sanderson, Mark Carman, Falk Scholer	Past work has shown that this approach can be used to significantly improve effectiveness; in this work, the approach is re-examined on a wide set of publicly available L2R test collections with more advanced learning to rank algorithms.
147	One Query, Many Clicks: Analysis of Queries with Multiple Clicks by the Same User	Elad Kravi, Ido Guy, Avihai Mejer, David Carmel, Yoelle Maarek, Dan Pelleg, Gilad Tsur	In this paper, we study multi-click queries – queries for which more than one click is performed by the same user within the same query session.
148	Precision-Oriented Query Facet Extraction	Weize Kong, James Allan	Recent work proposed an alternative solution that extracts facets for queries from their web search results, but neglected the precision-oriented perspective of the task — users are likely to care more about precision of presented facets than recall.
149	Learning to Rewrite Queries	Yunlong He, Jiliang Tang, Hua Ouyang, Changsung Kang, Dawei Yin, Yi Chang	In this paper, we propose a learning to rewrite framework that consists of a candidate generating phase and a candidate ranking phase.
150	When is the Time Ripe for Natural Language Processing for Patent Passage Retrieval?	Linda Andersson, Mihai Lupu, João Palotti, Allan Hanbury, Andreas Rauber	In this paper we explore query generation using natural language processing technologies in order to capture domain specific concepts represented as multi-word units.
151	A Probabilistic Fusion Framework	Yael Anava, Anna Shtok, Oren Kurland, Ella Rabinovich	Herein we present a probabilistic framework for the fusion task.
152	Selective Cluster-Based Document Retrieval	Or Levi, Fiana Raiber, Oren Kurland, Ido Guy	We address the long standing challenge of selective cluster-based retrieval; namely, deciding on a per-query basis whether to apply cluster-based document retrieval or standard document retrieval. To address this classification task, we propose a few sets of features based on those utilized by the cluster-based ranker, query-performance predictors, and properties of the clustering structure.
153	Pseudo-Relevance Feedback Based on Matrix Factorization	Hamed Zamani, Javid Dadashkarimi, Azadeh Shakery, W. Bruce Croft	In this paper, we look at the PRF task as a recommendation problem: the goal is to recommend a number of terms for a given query along with weights, such that the final weights of terms in the updated query model better reflect the terms’ contributions in the query.
154	Uncovering the Spatio-Temporal Dynamics of Memes in the Presence of Incomplete Information	Hancheng Ge, James Caverlee, Nan Zhang, Anna Squicciarini	Hence, in this paper, we investigate new methods for uncovering the full (underlying) distribution through a novel spatio-temporal dynamics recovery framework which models the latent relationships among locations, memes, and times.
155	From Recommendation to Profile Inference (Rec2PI): A Value-added Service to Wi-Fi Data Mining	Cheng Chen, Fang Dong, Kui Wu, Venkatesh Srinivasan, Alex Thomo	To tackle the technical challenges in profile inference, we propose novel algorithms built using copulas, a statistical tool suitable for capturing complex dependence structure beyond the scope of linear dependence.
156	On Backup Battery Data in Base Stations of Mobile Networks: Measurement, Analysis, and Optimization	Xiaoyi Fan, Feng Wang, Jiangchuan Liu	In this paper, we conduct a systematical analysis on a real world dataset collected from the battery groups installed on the base stations of China Mobile, with totally 1,550,032,984 records from July 28th, 2014 to February 17th, 2016.
157	Automatic Generation and Validation of Road Maps from GPS Trajectory Data Sets	Hengfeng Li, Lars Kulik, Kotagiri Ramamohanarao	To address these challenges, we propose a novel Spatial-Linear Clustering (SLC) technique to infer road segments from GPS traces.
158	Fully Dynamic Shortest-Path Distance Query Acceleration on Massive Networks	Takanori Hayashi, Takuya Akiba, Ken-ichi Kawarabayashi	In this paper, we present the first algorithm that can process exact distance queries on fully dynamic billion-scale networks besides trivial non-indexing algorithms, which combines an online bidirectional breadth-first search (BFS) and an offline indexing method for handling billion-scale networks in memory.
159	Hierarchical and Dynamic	Takuya Akiba, Yosuke Yano, Naoto Mizuno	To address these issues, we propose novel k-APC construction and maintenance algorithms.
160	Efficient Computation of Importance Based Communities in Web-Scale Networks Using a Single Machine	Shu Chen, Ran Wei, Diana Popova, Alex Thomo	In this paper, our goal is to scale-up the computation of top-r, k-core communities to web-scale graphs of tens of billions of edges.
161	Collective Classification via Discriminative Matrix Factorization on Sparsely Labeled Networks	Daokun Zhang, Jie Yin, Xingquan Zhu, Chengqi Zhang	In this paper, we propose a novel discriminative matrix factorization (DMF) based algorithm that effectively learns a latent network representation by exploiting topological paths between labeled and unlabeled nodes, in addition to nodes’ content information.
162	LogMine: Fast Pattern Recognition for Log Analytics	Hossein Hamooni, Biplob Debnath, Jianwu Xu, Hui Zhang, Guofei Jiang, Abdullah Mueen	We propose a method, named LogMine, that extracts high quality patterns for a given set of log messages.
163	Scaling Factorization Machines with Parameter Server	Erheng Zhong, Yue Shi, Nathan Liu, Suju Rajan	We propose a new system framework that integrates Parameter Server (PS) with the Map/Reduce (MR) framework.
164	DI-DAP: An Efficient Disaster Information Delivery and Analysis Platform in Disaster Management	Tao Li, Wubai Zhou, Chunqiu Zeng, Qing Wang, Qifeng Zhou, Dingding Wang, Jia Xu, Yue Huang, Wentao Wang, Minjing Zhang, Steve Luis, Shu-Ching Chen, Naphtali Rishe	To present an integrated solution to address the information explosion problem during the disaster period, we designed and implemented DI-DAP, an efficient and effective disaster information delivery and analysis platform.
165	Approximate Aggregates in Oracle 12C	Hong Su, Mohamed Zait, Vladimir Barrière, Joseph Torres, Andre Menck	Alternative algorithms considered in this paper are approximate aggregates that perform a lot better at the cost of reduced and tolerable accuracy.
166	Supervised Feature Selection by Preserving Class Correlation	Jun Wang, Jinmao Wei, Zhenglu Yang	In this paper, we propose effective supervised feature selection techniques to address the problems.
167	CGMOS: Certainty Guided Minority OverSampling	Xi Zhang, Di Ma, Lin Gan, Shanshan Jiang, Gady Agam	In this paper we propose a novel extension to the SMOTE algorithm with a theoretical guarantee for improved classification performance.
168	Learning Hidden Features for Contextual Bandits	Huazheng Wang, Qingyun Wu, Hongning Wang	In this paper, we propose to learn the hidden features for contextual bandit algorithms.
169	Constructing Reliable Gradient Exploration for Online Learning to Rank	Tong Zhao, Irwin King	In this paper, we propose two OLR algorithms that improve the reliability of the exploration by constructing robust exploratory directions.
170	A Model-Free Approach to Infer the Diffusion Network from Event Cascade	Yu Rong, Qiankun Zhu, Hong Cheng	Different from previous works focusing on building models, we propose to interpret the diffusion process from the cascade data directly in a non-parametric way, and design a novel and efficient algorithm named Non-Parametric Distributional Clustering (NPDC).
171	Multiple Infection Sources Identification with Provable Guarantees	Hung T. Nguyen, Preetam Ghosh, Michael L. Mayo, Thang N. Dinh	In this paper, we propose a new approach to identify infection sources by searching for a seed set S that minimizes the symmetric difference between the cascade from S and V_I, the given set of infected nodes.
172	Information Diffusion at Workplace	Jiawei Zhang, Philip S. Yu, Yuanhua Lv, Qianyi Zhan	In this paper, we want to study the information diffusion among employees at workplace via both online ESNs and online contacts.
173	Targeted Influence Maximization in Social Networks	Chonggang Song, Wynne Hsu, Mong Li Lee	In this paper, we formalize the problem of targeted influence maximization in social networks.
174	Updating an Existing Social Graph Snapshot via a Limited API	Norases Vesdapunt, Hector Garcia-Molina	In this paper, we focus on updating a social graph snapshot.
175	Making Sense of Entities and Quantities in Web Tables	Yusra Ibrahim, Mirek Riedewald, Gerhard Weikum	This paper aims to overcome this problem by automatically canonicalizing header names and cell values onto concepts, classes, entities and uniquely represented quantities registered in a knowledge base.
176	Influence Maximization for Complementary Goods: Why Parties Fail to Cooperate?	Han-Ching Ou, Chung-Kuang Chou, Ming-Syan Chen	We consider the problem where companies provide different types of products and want to promote their products through viral marketing simultaneously.
177	Effective Spelling Correction for Eye-based Typing using domain-specific Information about Error Distribution	Raiza Hanada, Maria da Graça C. Pimentel, Marco Cristo, Fernando Anglada Lores	We address these problems by combining estimates extracted from general error corpora with domain-specific knowledge about eye-based input.
178	Computing and Summarizing the Negative Skycube	Nicolas Hanusse, Patrick Kamnang Wanko, Sofian Maabout	In this paper, we consider the complementary statement, i.e., “for every tuple t, list the skylines where t does not belong to”.
179	Efficient Orthogonal Non-negative Matrix Factorization over Stiefel Manifold	Wei Emma Zhang, Mingkui Tan, Quan Z. Sheng, Lina Yao, Qingfeng Shi	In this paper, we propose a method, called Nonlinear Riemannian Conjugate Gradient ONMF (NRCG-ONMF), which updates U and V alternatively and preserves the orthogonality of U while achieving fast convergence speed.
180	Paired Restricted Boltzmann Machine for Linked Data	Suhang Wang, Jiliang Tang, Fred Morstatter, Huan Liu	In this paper, we aim to design a new type of Restricted Boltzmann Machines that takes advantage of linked data.
181	LDA Revisited: Entropy, Prior and Convergence	Jianwei Zhang, Jia Zeng, Mingxuan Yuan, Weixiong Rao, Jianfeng Yan	In this paper, we revisit these three algorithms from the entropy perspective, and show that EM can achieve the best predictive perplexity (a standard performance metric for LDA accuracy) by minimizing directly the cross entropy between the observed word distribution and LDA’s predictive distribution.
182	Cost-Effective Stream Join Algorithm on Cloud System	Junhua Fang, Rong Zhang, Xiaotong Wang, Tom Z.J. Fu, Zhenjie Zhang, Aoying Zhou	In this paper, we propose a cost-effective stream join algorithm, which ensures the adaptability of Join-Matrix but with lower resources consumption.
183	Leveraging Multiple GPUs and CPUs for Graphlet Counting in Large Networks	Ryan A. Rossi, Rong Zhou	In this paper, we propose a key class of hybrid parallel graphlet algorithms that leverages multiple CPUs and GPUs simultaneously for computing k-vertex induced subgraph statistics (called graphlets).
184	Scalable Local-Recoding Anonymization using Locality Sensitive Hashing for Big Data Privacy Preservation	Xuyun Zhang, Christopher Leckie, Wanchun Dou, Jinjun Chen, Ramamohanarao Kotagiri, Zoran Salcic	In this paper, we propose a highly scalable approach to local-recoding anonymization in cloud computing, based on Locality Sensitive Hashing (LSH).
185	Approximate Discovery of Functional Dependencies for Large Datasets	Tobias Bleifuß, Susanne Bülow, Johannes Frohnhofen, Julian Risch, Georg Wiese, Sebastian Kruse, Thorsten Papenbrock, Felix Naumann	In particular, we introduce AID-FD, an algorithm that approximately discovers FDs within runtimes up to orders of magnitude faster than state-of-the-art FD discovery algorithms.
186	On Structural Health Monitoring Using Tensor Analysis and Support Vector Machine with Artificial Negative Data	Prasad Cheema, Nguyen Lu Dang Khoa, Mehrisadat Makki Alamdari, Wei Liu, Yang Wang, Fang Chen, Peter Runcie	In our approach, we propose the use of tensor learning and support vector machines with artificial negative data generated by density estimation techniques for damage detection, localization and estimation in a one-class manner.
187	A Self-Learning and Online Algorithm for Time Series Anomaly Detection, with Application in CPU Manufacturing	Xing Wang, Jessica Lin, Nital Patel, Martin Braun	To address these limitations, we propose a self-learning online anomaly detection algorithm that automatically identifies anomalous time series, as well as the exact locations where the anomalies occur in the detected time series.
188	Deep Match between Geology Reports and Well Logs Using Spatial Information	Bin Tong, Martin Klinkigt, Makoto Iwayama, Yoshiyuki Kobayashi, Anshuman Sahu, Ravigopal Vennelakanti	We propose both linear and nonlinear (artificial neural network) models to achieve such an embedding.
189	MIST: Missing Person Intelligence Synthesis Toolkit	Elham Shaabani, Hamidreza Alvari, Paulo Shakarian, J.E. Kelly Snyder	This paper introduces the Missing Person Intelligence Synthesis Toolkit (MIST) which leverages a data-driven variant of geospatial abductive inference.
190	Skipping Word: A Character-Sequential Representation based Framework for Question Answering	Lingxun Meng, Yan Li, Mengyi Liu, Peng Shu	Compared with deep models pre-trained on word embedding (WE) strategy, our character-sequential representation (CSR) based method shows a much simpler procedure and more stable performance across different benchmarks.
191	Towards Time-Discounted Influence Maximization	Arijit Khan	The problem that we solve in this paper is to maximize the expected aggregated value of this utility function over all network users.
192	Quantifying Query Ambiguity with Topic Distributions	Yuki Yano, Yukihiro Tagami, Akira Tajima	In this paper, we propose a new approach for quantifying query ambiguity using topic distributions.
193	ASNets: A Benchmark Dataset of Aligned Social Networks for Cross-Platform User Modeling	Xuezhi Cao, Yong Yu	Therefore, in this paper we propose ASNets, a benchmark dataset with two sets of aligned social networks.
194	Data Locality in Graph Engines: Implications and Preliminary Experimental Results	Yong-Yeon Jo, Jiwon Hong, Myung-Hwan Jang, Jae-Geun Bang, Sang-Wook Kim	In this paper, we show the importance of data locality with graph algorithms by running on graph engines based on a single machine.
195	Active Zero-Shot Learning	Sihong Xie, Shaoxiong Wang, Philip S. Yu	To resolve this issue, we propose an active class selection strategy to intelligently query labeled data for a parsimonious set of informative classes.
196	Learning to Account for Good Abandonment in Search Success Metrics	Madian Khabsa, Aidan Crook, Ahmed Hassan Awadallah, Imed Zitouni, Tasos Anastasakos, Kyle Williams	In this work we describe how a search success metric can be augmented to account for good abandonment sessions using a machine learned metric that depends on user’s viewport information.
197	Modeling and Predicting Popularity Dynamics via an Influence-based Self-Excited Hawkes Process	Peng Bao	In this paper, we propose a probabilistic model using an influence-based self-excited Hawkes process (ISEHP) to characterize the process through which individual microblogs gain their popularity.
198	Incorporate Group Information to Enhance Network Embedding	Jifan Chen, Qi Zhang, Xuanjing Huang	In this paper, we investigate a novel method for learning the network embeddings with valuable group information for large-scale networks.
199	Exploiting Cluster-based Meta Paths for Link Prediction in Signed Networks	Jiangfeng Zeng, Ke Zhou, Xiao Ma, Fuhao Zou, Hua Wang	In order to solve this issue, in this paper, we introduce a novel sign prediction model by exploiting cluster-based meta paths, which can take advantage of both local and global information of the input networks.
200	Predicting Importance of Historical Persons using Wikipedia	Adam Jatowt, Daisuke Kawai, Katsumi Tanaka	In this work, we are interested in utilizing Wikipedia for judging historical person’s importance.
201	Noise-Contrastive Estimation for Answer Selection with Deep Neural Networks	Jinfeng Rao, Hua He, Jimmy Lin	Unlike previous work which treats this task as a straightforward pointwise classification problem, we model this problem as a ranking task and propose a pairwise ranking approach that can directly exploit existing pointwise neural network models as base components.
202	Global and Local Influence-based Social Recommendation	Qinzhe Zhang, Jia Wu, Hong Yang, Weixue Lu, Guodong Long, Chengqi Zhang	In this paper, we introduce a new global and local influence-based social recommendation model.
203	Tag-Aware Personalized Recommendation Using a Deep-Semantic Similarity Model with Negative Sampling	Zhenghua Xu, Cheng Chen, Thomas Lukasiewicz, Yishu Miao, Xiangwu Meng	In this paper, we propose a deep neural network approach to solve this problem by mapping both the tag-based user and item profiles to an abstract deep feature space, where the deep-semantic similarities between users and their target items (resp., irrelevant items) are maximized (resp., minimized).
204	Personalized Semantic Word Vectors	Javid Ebrahimi, Dejing Dou	In this paper, we present a word representation scheme that incorporates authorship information.
205	Query Expansion Using Word Embeddings	Saar Kuzi, Anna Shtok, Oren Kurland	We present a suite of query expansion methods that are based on word embeddings.
206	Efficient Distributed Regular Path Queries on RDF Graphs Using Partial Evaluation	Xin Wang, Junhu Wang, Xiaowang Zhang	We propose an efficient distributed method for answering regular path queries (RPQs) on large-scale RDF graphs using partial evaluation.
207	Webpage Depth-level Dwell Time Prediction	Chong Wang, Achir Kalra, Cristian Borcea, Yi Chen	This paper presents a model to predict the dwell time for a given "user, webpage, depth" triplet based on historic data collected by publishers.
208	Collaborative Social Group Influence for Event Recommendation	Li Gao, Jia Wu, Zhi Qiao, Chuan Zhou, Hong Yang, Yue Hu	To this end, we propose a new Bayesian latent factor model SogBmf that combines social group influence and individual preference for event recommendation.
209	Graph-Based Multi-Modality Learning for Clinical Decision Support	Ziwei Zheng, Xiaojun Wan	In this paper, we propose to use the paragraph vector technique to learn the latent semantic representation of texts and treat the latent semantic representations and the original bag-of-words representations as two different modalities.
210	Where are You Tweeting?: A Context and User Movement Based Approach	Zhi Liu, Yan Huang	In this paper, we propose a Hidden-Markov-based model to integrate tweet contents and user movements for geotagging.
211	Ensemble Learned Vaccination Uptake Prediction using Web Search Queries	Niels Dalum Hansen, Christina Lioma, Kåre Mølbak	We present a method that uses ensemble learning to combine clinical and web-mined time-series data in order to predict future vaccination uptake.
212	Location-aware Friend Recommendation in Event-based Social Networks: A Bayesian Latent Factor Approach	Yao Lu, Zhi Qiao, Chuan Zhou, Yue Hu, Li Guo	In this paper we study the friend recommendation problem in event-based social networks (EBSNs).
213	Extracting Skill Endorsements from Personal Communication Data	Darshan M. Shankaralingappa, Gianmarco De Francisci Morales, Aristides Gionis	In this paper, we mine personal communication data with the goal of generating skill endorsements of the type "person A endorses person B on skill X." To address privacy concerns, we consider that each person has access only to their own data (i.e., conversations with their peers).
214	A Self-Organizing Map for Identifying Influential Communities in Speech-based Networks	Sameen Mansha, Faisal Kamiran, Asim Karim, Aizaz Anwar	In this paper, we present a self-organizing map (SOM) for discovering and visualizing influential communities of users in SBNs.
215	Crowdsourcing-based Urban Anomaly Prediction System for Smart Cities	Chao Huang, Xian Wu, Dong Wang	In this paper, we develop a Crowdsourcing-based Urban Anomaly Prediction Scheme (CUAPS) to accurately predict the anomalies of a city by exploring both spatial and temporal information embedded in the crowdsourcing data.
216	Near Real-time Geolocation Prediction in Twitter Streams via Matrix Factorization Based Regression	Nghia Duong-Trung, Nicolas Schilling, Lars Schmidt-Thieme	In this work, we develop a novel generative content-based regression model via a matrix factorization technique to tackle the near real-time geolocation prediction problem.
217	Distilling Word Embeddings: An Encoding Approach	Lili Mou, Ran Jia, Yan Xu, Ge Li, Lu Zhang, Zhi Jin	We propose an encoding approach to distill task-specific knowledge from a set of high-dimensional embeddings, so that we can reduce model complexity by a large margin as well as retain high accuracy, achieving a good compromise between efficiency and performance.
218	Regularising Factorised Models for Venue Recommendation using Friends and their Comments	Jarana Manotumruksa, Craig Macdonald, Iadh Ounis	We propose a MF regularisation technique that seamlessly incorporates both social network information and textual comments, by exploiting word embeddings to estimate a semantic similarity of friends based on their explicit textual feedback, to regularise the complexity of the factorised model.
219	Improving Search Results with Prior Similar Queries	Yashar Moshfeghi, Kristiyan Velinov, Peter Triantafillou	This paper describes a novel approach to re-ranking search engine result pages (SERP): Its fundamental principle is to re-rank results to a given query, based on exploiting evidence gathered from past similar search queries. We construct a set of features from our similarity graph and build a prediction model using the Hoeffding decision tree algorithm.
220	The Solitude of Relevant Documents in the Pool	Aldo Lipani, Mihai Lupu, Evangelos Kanoulas, Allan Hanbury	Recently, methods to address this pool bias for previously created test collections have been proposed, for the evaluation measure precision at cut-off (P@n).
221	Scarce Feature Topic Mining for Video Recommendation	Wei Lu, Fu-lai Chung, Kunfeng Lai	Targeting the long tail phenomena of user behavior and sparsity of item features, we propose a personalized compound recommendation framework for online video recommendation called Dirichlet mixture probit model for information scarcity (DPIS).
222	Learning to Re-Rank Questions in Community Question Answering Using Advanced Features	Giovanni Da San Martino, Alberto Barrón Cedeño, Salvatore Romeo, Antonio Uva, Alessandro Moschitti	We study the impact of different types of features for question ranking in community Question Answering: bag-of-words models (BoW), syntactic tree kernels (TKs) and rank features.
223	Learning to Rank System Configurations	Romain Deveaud, Josiane Mothe, Jian-Yun Nie	We propose to tackle this problem by dealing with entire system configurations (i.e. a set of parameters representing an IR system) instead of single parameters, and to apply state-of-the-art Learning to Rank techniques to select the most appropriate configuration for a given query.
224	Adaptive Distributional Extensions to DFR Ranking	Casper Petersen, Jakob Grue Simonsen, Kalervo Järvelin, Christina Lioma	Adaptive Distributional Extensions to DFR Ranking
225	CyberRank: Knowledge Elicitation for Risk Assessment of Database Security	Hagit Grushka-Cohen, Oded Sofer, Ofer Biller, Bracha Shapira, Lior Rokach	In this paper, we propose CyberRank, a novel algorithm for automatic preference elicitation that is effective for situations with limited experts’ time and outperforms other algorithms for initial training of the system.
226	Online Food Recipe Title Semantics: Combining Nutrient Facts and Topics	Tomasz Kusmierczyk, Kjetil Nørvåg	To contribute to this lack of knowledge, we present a novel approach to mine and model online food content by combining text topics with related nutrient facts.
227	A Non-Parametric Topic Model for Short Texts Incorporating Word Coherence Knowledge	Yuhao Zhang, Wenji Mao, Daniel Zeng	To tackle these problems, in this paper, we propose a non-parametric topic model npCTM with the above distinction.
228	Forecasting Seasonal Time Series Using Weighted Gradient RBF Network based Autoregressive Model	Wenjie Ruan, Quan Z. Sheng, Peipei Xu, Nguyen Khoi Tran, Nickolas J.G. Falkner, Xue Li, Wei Emma Zhang	In this paper, we propose a weighted gradient Radial Basis Function Network based AutoRegressive (WGRBF-AR) model for modeling and predicting the nonlinear and non-stationary seasonal time series.
229	When Sensor Meets Tensor: Filling Missing Sensor Values Through a Tensor Approach	Wenjie Ruan, Peipei Xu, Quan Z. Sheng, Nguyen Khoi Tran, Nickolas J.G. Falkner, Xue Li, Wei Emma Zhang	In this paper, we formulate the time-series sensor data as a 3-order tensor that naturally preserves sensors’ temporal and spatial dependencies.
230	PEQ: An Explainable, Specification-based, Aspect-oriented Product Comparator for E-commerce	Abhishek Sikchi, Pawan Goyal, Samik Datta	In this paper, we extend the existing model by incorporating the feature specifications of the products, which are easily available, and learn the importance to be associated with each of them.
231	Forecasting Geo-sensor Data with Participatory Sensing Based on Dropout Neural Network	Jyun-Yu Jiang, Cheng-Te Li	In this paper, we propose a novel concept to forecast geosensor data with participatory sensing.
232	Iterative Search using Query Aspects	Manmeet Singh, W. Bruce Croft	We propose a new iterative feedback method that combines PRF with aspect generation to improve feedback effectiveness.
233	A Preference Approach to Reputation in Sponsored Search	Aritra Ghosh, Dinesh Gaurav, Rahul Agrawal	In this study, we motivate and propose a pairwise preference relation model to study the advertiser reputation problem.
234	Clustering Speed in Multi-lane Traffic Networks	Bing Zhang, Goce Trajcevski, Feiying Liu	We address the problem of efficient spatio-temporal clustering of speed data in road segments with multiple lanes.
235	Learning to Rank Non-Factoid Answers: Comment Selection in Web Forums	Kateryna Tymoshenko, Daniele Bonadiman, Alessandro Moschitti	In this paper, we design state-of-the-art models for non-factoid QA also carried out on noisy data.
236	A Theoretical Framework on the Ideal Number of Classifiers for Online Ensembles in Data Streams	Hamed R. Bonab, Fazli Can	Our theoretical framework shows that using the same number of independent component classifiers as class labels gives the highest accuracy.
237	User Modeling on Twitter with WordNet Synsets and DBpedia Concepts for Personalized Recommendations	Guangyuan Piao, John G. Breslin	In this short paper, instead of using concepts alone, we propose using synsets from WordNet and concepts from DBpedia for representing user interests.
238	Improving Entity Ranking for Keyword Queries	John Foley, Brendan O’Connor, James Allan	We propose a set of features that do not require index-time entity linking, and demonstrate competitive performance on the new dataset.
239	The Healing Power of Poison: Helpful Non-relevant Documents in Feedback	Mostafa Dehghani, Samira Abnar, Jaap Kamps	In this paper, we study the positive counterpart of this by investigating the helpfulness of nonrelevant documents in feedback.
240	Probabilistic Approaches to Controversy Detection	Myungha Jang, John Foley, Shiri Dori-Hacohen, James Allan	In this paper, we propose a probabilistic framework to detect controversy on the web, and investigate two models.
241	Evaluating Document Retrieval Methods for Resource Selection in Clustered P2P IR	Rami Suleiman Alkhawaldeh, Joemon M. Jose, Deepak P	We observe that semantic heterogeneity is mitigated in the clustered 2-tier P2P IR architecture resource selection layer by way of usage of clustering, and posit that this necessitates a re-look at the applicability of document retrieval methods for resource selection within such a framework.
242	Detecting and Ranking Conceptual Links between Texts Using a Knowledge Base	Martin Tutek, Goran Glavas, Jan Šnajder, Natasa Milić-Frayling, Bojana Dalbelo Basic	Recent research has explored the use of Knowledge Bases (KBs) to represent documents as subgraphs of a KB concept graph and define metrics to characterize semantic relatedness of documents in terms of properties of the document concept graphs.
243	DePP: A System for Detecting Pages to Protect in Wikipedia	Kelsey Suyehira, Francesca Spezzano	In this paper we consider for the first time the problem of deciding whether a page should be protected or not in a collaborative environment such as Wikipedia. We formulate the problem as a binary classification task and propose a novel set of features to decide which pages to protect based on (i) users’ page revision behavior and (ii) page categories.
244	Hashtag Recommendation Based on Topic Enhanced Embedding, Tweet Entity Data and Learning to Rank	Quanzhi Li, Sameena Shah, Armineh Nourbakhsh, Xiaomo Liu, Rui Fang	In this paper, we present a new approach of recommending hashtags for tweets.
245	An Experimental Comparison of Iterative MapReduce Frameworks	Haejoon Lee, Minseo Kang, Sun-Bum Youn, Jae-Gil Lee, YongChul Kwon	In this paper, we experimentally compare Hadoop and the aforementioned systems using various workloads and metrics.
246	A Density-Based Approach to the Retrieval of Top-K Spatial Textual Clusters	Dingming Wu, Christian S. Jensen	To compute this query, the paper proposes a basic and an advanced algorithm that rely on on-line density-based clustering.
247	Top-N Recommendation on Graphs	Zhao Kang, Chong Peng, Ming Yang, Qiang Cheng	To alleviate this problem, this paper proposes a simple recommendation algorithm that fully exploits the similarity information among users and items and intrinsic structural information of the user-item matrix.
248	KB-Enabled Query Recommendation for Long-Tail Queries	Zhipeng Huang, Bogdan Cautis, Reynold Cheng, Yudian Zheng	To handle such queries, we study a new solution, which makes use of a knowledge base (or KB), such as YAGO and Freebase.
249	RAP: Scalable RPCA for Low-rank Matrix Recovery	Chong Peng, Zhao Kang, Ming Yang, Qiang Cheng	In this paper, we propose a novel RPCA approach that eliminates the need for SVD of large matrices.
250	Query Answering Efficiency in Expert Networks Under Decentralized Search	Liang Ma, Mudhakar Srivatsa, Derya Cansever, Xifeng Yan, Sue Kase, Michelle Vanni	In this regard, we investigate decentralized search by quantifying its performance under a variety of network settings.
251	A Study of Realtime Summarization Metrics	Matthew Ekstrand-Abueg, Richard McCreadie, Virgil Pavlu, Fernando Diaz	In this paper, we present a study of TREC-TS track evaluation methodology, with the aim of documenting its design, analyzing its effectiveness, as well as identifying improvements and best practices for the evaluation of temporal summarization systems.
252	Framing Mobile Information Needs: An Investigation of Hierarchical Query Sequence Structure	Shuguang Han, Xing Yi, Zhen Yue, Zhigeng Geng, Alyssa Glass	(2) We identify several differences between mobile and desktop search patterns in terms of goal/mission length, duration and interleaving.
253	A Context-aware Collaborative Filtering Approach for Urban Black Holes Detection	Li Jin, Zhuonan Feng, Ling Feng	In this paper, we model the urban black holes in each region of New York City (NYC) at different time intervals with a 3-dimensional tensor by fusing cross-domain data sources.
254	Combining Powers of Two Predictors in Optimizing Real-Time Bidding Strategy under Constrained Budget	Chi-Chun Lin, Kun-Ta Chuang, Wush Chi-Hsuan Wu, Ming-Syan Chen	In this paper, a method combining powers of two prediction models is proposed, and experiments with real world RTB datasets from benchmarking the new algorithm with a classic CTR-only method are presented.
255	Attractiveness versus Competition: Towards an Unified Model for User Visitation	Thanh-Nam Doan, Ee-Peng Lim	Attractiveness versus Competition: Towards an Unified Model for User Visitation
256	OptMark: A Toolkit for Benchmarking Query Optimizers	Zhan Li, Olga Papaemmanouil, Mitch Cherniack	To address this challenge, this paper introduces OptMark, a toolkit for evaluating the quality of a query optimizer.
257	Multi-Dueling Bandits and Their Application to Online Ranker Evaluation	Brian Brost, Yevgeny Seldin, Ingemar J. Cox, Christina Lioma	We evaluate our algorithm on standard large-scale online ranker evaluation datasets.
258	Robust Contextual Outlier Detection: Where Context Meets Sparsity	Jiongqian Liang, Srinivasan Parthasarathy	To address these problems, here we propose a novel and robust approach alternative to the state-of-the-art called RObust Contextual Outlier Detection (ROCOD).
259	Credibility Assessment of Textual Claims on the Web	Kashyap Popat, Subhabrata Mukherjee, Jannik Strötgen, Gerhard Weikum	For inference, our method leverages the joint interaction between the language of articles about the claim and the reliability of the underlying web sources.
260	Collective Traffic Prediction with Partially Observed Traffic History using Location-Based Social Media	Xinyue Liu, Xiangnan Kong, Yanhua Li	In this paper, we propose to use location-based social media, which captures a much larger area of the road systems than deployed sensors, to predict the traffic conditions.
261	Recommendations For Streaming Data	Karthik Subbian, Charu Aggarwal, Kshiteesh Hegde	In this paper, we present a probabilistic neighborhood-based algorithm for performing recommendations in real-time.
262	PRO: Preference-Aware Recurring Query Optimization	Zhongfang Zhuang, Chuan Lei, Elke Rundensteiner, Mohamed Eltabakh	In this work, we propose PRO, a preference-aware recurring query processing system that optimizes recurring query executions complying with user preferences.
263	Discovering Temporal Purchase Patterns with Different Responses to Promotions	Ling Luo, Bin Li, Irena Koprinska, Shlomo Berkovsky, Fang Chen	Given a transaction data set collected by an Australian national supermarket chain, in this paper we conduct a case study aimed at discovering customers’ long-term purchase patterns, which may be induced by preference changes, as well as short-term purchase patterns, which may be induced by promotions.
264	ZEST: A Hybrid Model on Predicting Passenger Demand for Chauffeured Car Service	Hua Wei, Yuandong Wang, Tianyu Wo, Yaxiao Liu, Jie Xu	In this paper, we propose a Zero-Grid Ensemble Spatio Temporal model (ZEST) to predict passenger demand with four predictors: a temporal predictor and a spatial predictor to model the influences of local and spatial factors separately, an ensemble predictor to combine the results of former two predictors comprehensively and a Zero-Grid predictor to predict zero demand areas specifically since any cruising within these areas costs extra waste on energy and time of driver.
265	A Filtering-based Clustering Algorithm for Improving Spatio-temporal Kriging Interpolation Accuracy	Qiao Kang, Wei-keng Liao, Ankit Agrawal, Alok Choudhary	To address this problem, this paper presents a new filtering-based clustering algorithm that partitions data into clusters such that the interpolation error within each cluster is significantly reduced, which in turn improves the overall accuracy.
266	Reuse-based Optimization for Pig Latin	Jesús Camacho-Rodríguez, Dario Colazzo, Melanie Herschel, Ioana Manolescu, Soudip Roy Chowdhury	We present a novel optimization approach aiming at identifying and reusing repeated subexpressions in Pig Latin scripts.
267	Discriminative View Learning for Single View Co-Training	Joseph St.Amand, Jun Huan	In this paper, we investigate techniques to apply co-training to single-view data sets.
268	Learning Points and Routes to Recommend Trajectories	Dawei Chen, Cheng Soon Ong, Lexing Xie	We propose a new F₁ score on pairs of POIs that capture the order of visits.
269	Towards Representation Independent Similarity Search Over Graph Databases	Yodsawalai Chodpathumwan, Amirhossein Aleyasen, Arash Termehchy, Yizhou Sun	We propose an algorithm called R-PathSim, which is provably robust under relationship reorganizing.
270	Why Did You Cover That Song?: Modeling N-th Order Derivative Creation with Content Popularity	Kosetsu Tsukuda, Masahiro Hamasaki, Masataka Goto	In this paper, we propose a model for inferring latent factors from sequences of derivative work posting events.
271	Anomalies in the Peer-review System: A Case Study of the Journal of High Energy Physics	Sandipan Sikdar, Matteo Marsili, Niloy Ganguly, Animesh Mukherjee	Since editors and reviewers are the most important pillars of a reviewing system, in this work we attempt to address a related question: given the editing/reviewing history of the editors or reviewers, "can we identify the under-performing ones?"
272	Multi-source Hierarchical Prediction Consolidation	Chenwei Zhang, Sihong Xie, Yaliang Li, Jing Gao, Wei Fan, Philip S. Yu	We propose a novel multi-source hierarchical prediction consolidation method that effectively exploits the complicated hierarchical label structures to resolve the noisy and conflicting information that inherently originates from multiple imperfect sources.
273	Probabilistic Knowledge Graph Construction: Compositional and Incremental Approaches	Dongwoo Kim, Lexing Xie, Cheng Soon Ong	We propose a new probabilistic knowledge graph factorisation method that benefits from the path structure of existing knowledge (e.g. syllogism) and enables a common modelling approach to be used for both incremental population and knowledge completion tasks.
274	Explaining Sentiment Spikes in Twitter	Anastasia Giachanou, Ida Mele, Fabio Crestani	In this paper, we focus on the problem of tracking sentiment towards different entities, detecting sentiment spikes and on the problem of extracting and ranking the causes of a sentiment spike.
275	Qualitative Cleaning of Uncertain Data	Henning Koehler, Sebastian Link	We propose a new view on data cleaning: Not data itself but the degrees of uncertainty attributed to data are dirty.
276	APAM: Adaptive Eager-Lazy Hybrid Evaluation of Event Patterns for Low Latency	Ilyeop Yi, Jae-Gil Lee, Kyu-Young Whang	In this paper, we propose a hybrid eager-lazy evaluation method that combines the advantages of both methods.
277	OrientStream: A Framework for Dynamic Resource Allocation in Distributed Data Stream Management Systems	Chunkai Wang, Xiaofeng Meng, Qi Guo, Zujian Weng, Chen Yang	This article presents OrientStream, a framework for dynamic resource allocation in DDSMS using incremental machine learning techniques.
278	Tag2Word: Using Tags to Generate Words for Content Based Tag Recommendation	Yong Wu, Yuan Yao, Feng Xu, Hanghang Tong, Jian Lu	In this paper, we put our focus on the content based tag recommendation due to its wider applicability.
279	Digesting Multilingual Reader Comments via Latent Discussion Topics with Commonality and Specificity	Bei Shi, Wai Lam, Lidong Bing, Yinqing Xu	To tackle this task of discovering discussion topics that exhibit commonality or specificity from news reader comments written in different languages, we propose a new model called TDCS based on graphical models, which can cope with the language gap and detect language-common and language-specific latent discussion topics simultaneously.
280	Digesting News Reader Comments via Fine-Grained Associations with Event Facets and News Contents	Bei Shi, Wai Lam	We propose a framework that can digest reader comments automatically via fine-grained associations with event facets and news.
281	Efficient Algorithms for the Two Locus Problem in Genome-Wide Association Study: Algorithms for the Two Locus Problem	Sanguthevar Rajasekaran, Subrata Saha	In this paper we present an algorithm for solving the 2-locus problem that is up to two orders of magnitude faster than the previous best known algorithms.
282	FolkTrails: Interpreting Navigation Behavior in a Social Tagging System	Thomas Niebler, Martin Becker, Daniel Zoller, Stephan Doerfel, Andreas Hotho	In this work, we investigate navigation trails in the popular scholarly social tagging system BibSonomy from six years of log data.
283	Memory-Optimized Distributed Graph Processing through Novel Compression Techniques	Panagiotis Liakos, Katia Papakonstantinopoulou, Alex Delis	In this paper, we propose three space-efficient adjacency list representations that can be applied to any distributed graph processing system.
284	Tracking the Evolution of Congestion in Dynamic Urban Road Networks	Tarique Anwar, Chengfei Liu, Hai L. Vu, Md. Saiful Islam	In this paper, we propose a two-layer method to incrementally update the differently congested partitions from those at the previous time point in an efficient manner, and thus track their evolution.
285	The Rich and the Poor: A Markov Decision Process Approach to Optimizing Taxi Driver Revenue Efficiency	Huigui Rong, Xun Zhou, Chang Yang, Zubair Shafiq, Alex Liu	To address these issues, this paper investigates how to increase the revenue efficiency (revenue per unit time) of taxi drivers, and models the passenger seeking process as a Markov Decision Process (MDP).
286	Ensemble of Anchor Adapters for Transfer Learning	Fuzhen Zhuang, Ping Luo, Sinno Jialin Pan, Hui Xiong, Qing He	Aiming at more robust transfer learning models, we propose an ENsemble framework of anCHOR adapters (ENCHOR for short), in which an anchor adapter adapts the features of instances based on their similarities to a specific anchor (i.e., a selected instance).
287	Incremental Mining of High Utility Sequential Patterns in Incremental Databases	Jun-Zhe Wang, Jiun-Long Huang	In view of this, we propose the IncUSP-Miner algorithm to mine HUSPs incrementally.
288	Understanding Stability of Noisy Networks through Centrality Measures and Local Connections	Vladimir Ufimtsev, Soumya Sarkar, Animesh Mukherjee, Sanjukta Bhowmick	In this paper, we study the effect of noise in changing ranks of the high centrality vertices.
289	Online Adaptive Topic Focused Tweet Acquisition	Mehdi Sadri, Sharad Mehrotra, Yaming Yu	In this paper, we address the tweet acquisition challenge to enhance monitoring of tweets based on the client/application needs in an online adaptive manner such that the quality and quantity of the results improves over time.
290	Optimizing Nugget Annotations with Active Learning	Gaurav Baruah, Haotian Zhang, Rakesh Guttikonda, Jimmy Lin, Mark D. Smucker, Olga Vechtomova	In this paper, we present two active learning techniques that prioritize the sequence in which candidate nugget/sentence pairs are presented to an assessor, based on the likelihood that the sentence contains a nugget.
291	Uncovering Fake Likers in Online Social Networks	Prudhvi Ratna Badri Satya, Kyumin Lee, Dongwon Lee, Thanh Tran, Jason (Jiasheng) Zhang	Toward this goal, in this paper, we investigate the problem of detecting the so-called "fake likers" who frequently make fake Likes for illegitimate reasons.
292	Where to Place Your Next Restaurant?: Optimal Restaurant Placement via Leveraging User-Generated Reviews	Feng Wang, Li Chen, Weike Pan	In this paper, we particularly take advantage of user-generated reviews to construct predictive features for assessing the attractiveness of candidate locations to expand a restaurant.
293	Leveraging the Implicit Structure within Social Media for Emergent Rumor Detection	Justin Sampson, Fred Morstatter, Liang Wu, Huan Liu	In this work, we propose a method for classifying conversations within their formative stages as well as improving accuracy within mature conversations through the discovery of implicit linkages between conversation fragments.
294	Automatical Storyline Generation with Help from Twitter	Ting Hua, Xuchao Zhang, Wei Wang, Chang-Tien Lu, Naren Ramakrishnan	This paper introduces a Bayesian model to generate storylines from massive documents and infer the corresponding hidden relations and topics.
295	A Comparative Study of Query-biased and Non-redundant Snippets for Structured Search on Mobile Devices	Nikita V. Spirin, Alexander S. Kotov, Karrie G. Karahalios, Vassil Mladenov, Pavel A. Izhutov	To investigate what kind of snippets are better suited for structured search on mobile devices, we built an experimental mobile search application and conducted a task-oriented interactive user study with 36 participants.
296	Content-Agnostic Malware Detection in Heterogeneous Malicious Distribution Graph	Ibrahim Alabdulmohsin, YuFei Han, Yun Shen, XiangLiang Zhang	We propose a novel Bayesian label propagation model to unify the multi-source information, including content-agnostic features of different node types and topological information of the heterogeneous network.
297	Improving Advertisement Recommendation by Enriching User Browser Cookie Attributes	Liang Wang, Kuang-chih Lee, Quan Lu	In this paper, we try to tackle this problem by using an ‘assistant identifier’ to find the linkage between different bcookies.
298	Balanced Supervised Non-Negative Matrix Factorization for Childhood Leukaemia Patients	Ali Braytee, Daniel R. Catchpoole, Paul J. Kennedy, Wei Liu	This paper proposes a method with twofold objectives: it implements a balanced supervised non-negative matrix factorization (BSNMF) to handle the class imbalance problem in supervised non-negative matrix factorization techniques.
299	SoLSCSum: A Linked Sentence-Comment Dataset for Social Context Summarization	Minh-Tien Nguyen, Chien-Xuan Tran, Duc-Vu Tran, Minh-Le Nguyen	This paper presents a dataset named SoLSCSum for social context summarization.
300	Distributed Deep Learning for Question Answering	Minwei Feng, Bing Xiang, Bowen Zhou	Comparison studies of SGD, MSGD, ADADELTA, ADAGRAD, ADAM/ADAMAX, RMSPROP, DOWNPOUR and EASGD/EAMSGD algorithms have been presented.
301	Bus Routes Design and Optimization via Taxi Data Analytics	Seong Ping Chuah, Huayu Wu, Yu Lu, Liang Yu, Stephane Bressan	In this paper, we describe a proof of concept effort to discover this weakness and its improvement in public transportation system via mining of taxi ride dataset.
302	Routing an Autonomous Taxi with Reinforcement Learning	Miyoung Han, Pierre Senellart, Stéphane Bressan, Huayu Wu	In this paper we demonstrate that a reinforcement learning algorithm of the Q-learning family, based on a customized exploration and exploitation strategy, is able to learn optimal actions for routing autonomous taxis in a real scenario at the scale of the city of Singapore with pick-up and drop-off events for a fleet of one thousand taxis.
303	XKnowSearch!: Exploiting Knowledge Bases for Entity-based Cross-lingual Information Retrieval	Lei Zhang, Michael Färber, Achim Rettinger	In this paper, we present XKnowSearch!
304	TweetSift: Tweet Topic Classification Based on Entity Knowledge Base and Topic Enhanced Word Embedding	Quanzhi Li, Sameena Shah, Xiaomo Liu, Armineh Nourbakhsh, Rui Fang	In this demonstration, we present TweetSift, an efficient and effective real time tweet topic classifier.
305	PARC: Privacy-Aware Data Cleaning	Dejun Huang, Dhruv Gairola, Yu Huang, Zheng Zheng, Fei Chiang	In this demonstration, we present PARC, a Privacy-AwaRe data Cleaning system that corrects data inconsistencies w.r.t. a set of FDs, and limits the disclosure of sensitive values during the cleaning process.
306	Ease the Process of Machine Learning with Dataflow	Tianyou Guo, Jun Xu, Xiaohui Yan, Jianpeng Hou, Ping Li, Zhaohui Li, Jiafeng Guo, Xueqi Cheng	In this demo we present a general-purpose dataflow-based system for easing the process of applying machine learning algorithms to real world tasks.
307	FIN10K: A Web-based Information System for Financial Report Analysis and Visualization	Yu-Wen Liu, Liang-Chih Liu, Chuan-Ju Wang, Ming-Feng Tsai	In this demonstration, we present FIN10K, a web-based information system that facilitates the analysis of textual information in financial reports.
308	FeatureMiner: A Tool for Interactive Feature Selection	Kewei Cheng, Jundong Li, Huan Liu	In this demonstration, we show (1) How to conduct data preprocessing after loading a dataset; (2) How to apply feature selection algorithms; (3) How to choose a suitable algorithm by visualized performance evaluation.
309	Deola: A System for Linking Author Entities in Web Document with DBLP	Yinan Liu, Wei Shen, Xiaojie Yuan	In this paper, we present Deola, an Online system for Author Entity Linking with DBLP.
310	ConHub: A Metadata Management System for Docker Containers	Chris Xing Tian, Aditya Pan, Yong Chiang Tay	ConHub: A Metadata Management System for Docker Containers
311	BIGtensor: Mining Billion-Scale Tensor Made Easy	Namyong Park, Byungsoo Jeon, Jungwoo Lee, U Kang	In this paper, we propose BIGtensor, a large-scale tensor mining library that tackles both of the above problems.
312	eGraphSearch: Effective Keyword Search in Graphs	Mehdi Kargar, Lukasz Golab, Jaroslaw Szlichta	We demonstrate eGraphSearch, a new system for effective keyword search in graph databases.
313	EnerQuery: Energy-Aware Query Processing	Amine Roukh, Ladjel Bellatreche, Carlos Ordonez	In this paper, we propose EnerQuery, a tool built on top of a traditional DBMS to capitalize the efforts invested in building energy-aware query optimizers, which have the lion’s share in energy consumption.
314	TGraph: A Temporal Graph Data Management System	Haixing Huang, Jinghe Song, Xuelian Lin, Shuai Ma, Jinpeng Huai	To solve these issues, we design and develop TGraph, a temporal graph data management system, that assures the ACID transaction feature, and supports fast temporal graph queries.
315	Analyzing Data Relevance and Access Patterns of Live Production Database Systems	Martin Boissier, Carsten Alexander Meyer, Timo Djürken, Jan Lindemann, Kathrin Mao, Pascal Reinhardt, Tim Specht, Tim Zimmermann, Matthias Uflacker	In this paper, we present a tool set to analyze and compare synthetic and real-world database workloads, their characteristics, and access patterns.
316	Thymeflow, A Personal Knowledge Base with Spatio-temporal Data	David Montoya, Thomas Pellissier Tanon, Serge Abiteboul, Fabian M. Suchanek	We demonstrate an open-source system for integrating user’s data from different sources into a single Knowledge Base.
317	Inferring Traffic Incident Start Time with Loop Sensor Data	Mingxuan Yue, Liyue Fan, Cyrus Shahabi	We present INFIT, a system that infers the incident start time utilizing traffic data collected by loop sensors.
318	TEAMOPT: Interactive Team Optimization in Big Networks	Liangyue Li, Hanghang Tong, Nan Cao, Kate Ehrlich, Yu-Ru Lin, Norbou Buchler	An interesting research question we address in this work is how to maintain and optimize the team performance should certain changes happen to the team.
319	GStreamMiner: A GPU-accelerated Data Stream Mining Framework	Chandima HewaNadungodage, Yuni Xia, John Jaehwan Lee	In this paper, we present GStreamMiner, a GPU-accelerated data stream mining framework and demonstrate its application using outlier detection over continuous streaming data as a case study.
320	QART: A Tool for Quality Assurance in Real-Time in Contact Centers	Ragunathan Mariappan, Balaji Peddamuthu, Preethi R Raajaratnam, Sandipan Dandapat, Neeta Pande, Shourya Roy	In this paper, we describe an automatic real-time quality assurance system QA^RT (pronounced cart) for contact center chats.
321	A Fatigue Strength Predictor for Steels Using Ensemble Data Mining: Steel Fatigue Strength Predictor	Ankit Agrawal, Alok Choudhary	We have developed advanced data-driven ensemble predictive models for this purpose with an extremely high cross-validated accuracy of >98%, and have deployed these models in a user-friendly online web-tool, which can make very fast predictions of fatigue strength for a given steel represented by its composition and processing information.
322	CyberSafety 2016: The First International Workshop on Computational Methods in CyberSafety	Shivakant Mishra, Qin Lv, Richard Han, Jeremy Blackburn	The main goal of this inaugural workshop on cybersafety is to bring together the researchers and practitioners from academia, industry, government and research labs working in the area of cybersafety to discuss the unique challenges in addressing various cybersafety issues and to share experiences, solutions, tools, and techniques.
323	The Fourth International Workshop on Social Web for Disaster Management (SWDM 2016)	Carlos Castillo, Fernando Diaz, Yu-Ru Lin, Jie Yin	As massive amounts of messages posted by users are transformed into semi-structured records via information extraction and natural language processing techniques, there is a growing need for developing advanced techniques to aggregate this large-scale data to gain an understanding of the “big picture” of an emergency, and to detect and predict how a disaster could develop.
324	BigNet 2016: First Workshop on Big Network Analytics	Jie Tang, Keke Cai, Zhong Su, Hanghang Tong, Michalis Vazirgiannis, Yang Yang	The main objective of the workshop is to provide a forum for presenting the most recent advances in mining big networks to unearth rich knowledge.
325	DDTA 2016: The Workshop on Data-Driven Talent Acquisition	Yi Fang, Maarten de Rijke, Huangming Xie	The aim of this workshop is to provide a forum for industry and academia to discuss the recent progress in talent search and management, and how the use of big data and data-driven decision making can advance talent acquisition and human resource management.
326	ACM DAVA’16: 2nd International Workshop on DAta mining meets Visual Analytics at Big Data Era	Lei Shi, Hanghang Tong, Chaoli Wang, Leman Akoglu	Three keynote speakers from both data mining and visualization give invited talks in this workshop (40 minutes each).
327	DTMBIO 2016: The Tenth International Workshop on Data and Text Mining in Biomedical Informatics	Sangwoo Kim, Jake Y. Chen, Vincenzo Cutello, Doheon Lee	DTMBIO 2016: The Tenth International Workshop on Data and Text Mining in Biomedical Informatics