Paper Digest: CIKM 2016 Highlights
The ACM Conference on Information and Knowledge Management (CIKM) is an annual computer science research conference dedicated to information management and knowledge management.
To help the community quickly catch up on the work presented in this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights/summaries to quickly grasp the main idea of each paper.
If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.
Paper Digest Team
team@paperdigest.org
TABLE 1: CIKM 2016 Papers
# | Title | Authors | Highlight |
---|---|---|---|
1 | Toward Data-Driven Education: CIKM-2016 Keynote | Rakesh Agrawal | We address three issues in this talk. |
2 | Social Recommendation with Strong and Weak Ties | Xin Wang, Wei Lu, Martin Ester, Can Wang, Chun Chen | In this work, we study the effects of distinguishing strong and weak ties in social recommendation. |
3 | Learning Graph-based POI Embedding for Location-based Recommendation | Min Xie, Hongzhi Yin, Hao Wang, Fanjiang Xu, Weitong Chen, Sen Wang | To address these challenges, we build on recent advances in embedding learning techniques and propose a generic graph-based embedding model, called GE, in this paper. |
4 | Improving Personalized Trip Recommendation by Avoiding Crowds | Xiaoting Wang, Christopher Leckie, Jeffrey Chan, Kwan Hui Lim, Tharshan Vaithianathan | In this work, we propose the Personalized Crowd-aware Trip Recommendation (PersCT) algorithm to recommend personalized trips that also avoid the most crowded times of the POIs. |
5 | Memory-based Recommendations of Entities for Web Search Users | Ignacio Fernández-Tobías, Roi Blanco | In this paper we propose a set of domain-agnostic methods based on nearest neighbors collaborative filtering that exploit query log data to generate entity suggestions, taking into account the user’s full search session. |
6 | LICON: A Linear Weighting Scheme for the Contribution of Input Variables in Deep Artificial Neural Networks | Gjergji Kasneci, Thomas Gottron | We propose a generic framework as well as a concrete method for quantifying the influence of individual input signals on the output computed by a deep neural network. |
7 | A Deep Relevance Matching Model for Ad-hoc Retrieval | Jiafeng Guo, Yixing Fan, Qingyao Ai, W. Bruce Croft | In this paper, we propose a novel deep relevance matching model (DRMM) for ad-hoc retrieval. |
8 | A Neural Network Approach to Quote Recommendation in Writings | Jiwei Tan, Xiaojun Wan, Jianguo Xiao | In this paper, we propose a neural network approach based on LSTMs to the quote recommendation task. |
9 | Retweet Prediction with Attention-based Deep Neural Network | Qi Zhang, Yeyun Gong, Jindou Wu, Haoran Huang, Xuanjing Huang | In this work, we proposed a novel attention-based deep neural network to incorporate contextual and social information for this task. To train and evaluate the proposed methods, we also constructed a large dataset collected from Twitter. |
10 | Effective Document Labeling with Very Few Seed Words: A Topic Model Approach | Chenliang Li, Jian Xing, Aixin Sun, Zongyang Ma | In this paper, we propose a Seed-Guided Topic Model (named STM) for the dataless text classification task. |
11 | Cross-lingual Text Classification via Model Translation with Limited Dictionaries | Ruochen Xu, Yiming Yang, Hanxiao Liu, Andrew Hsi | Specifically, we propose two new approaches that combine unsupervised word embedding in different languages, supervised mapping of embedded words across languages, and probabilistic translation of classification models. |
12 | Semi-supervised Multi-Label Topic Models for Document Classification and Sentence Labeling | Hossein Soleimani, David J. Miller | We propose a semi-supervised multi-label topic model for jointly achieving document and sentence-level class inferences. |
13 | Linked Document Embedding for Classification | Suhang Wang, Jiliang Tang, Charu Aggarwal, Huan Liu | In this paper, we study the problem of linked document embedding for classification and propose a linked document embedding framework LDE, which combines link and label information with content information to learn document representations for classification. Linked documents present new challenges to traditional document embedding algorithms. |
14 | Detecting Promotion Campaigns in Query Auto Completion | Yuli LIU, Yiqun Liu, Ke Zhou, Min Zhang, Shaoping Ma, Yue Yin, Hengliang Luo | Query Auto Completion (QAC) aims to provide possible suggestions to Web search users from the moment they start entering a query, which is thought to reduce their physical and cognitive efforts in query formulation. |
15 | A Unified Index for Spatio-Temporal Keyword Queries | Tuan-Anh Hoang-Vu, Huy T. Vo, Juliana Freire | We propose a new indexing strategy that uniformly handles text, space and time in a single structure, and is thus able to efficiently evaluate queries that combine keywords with spatial and temporal constraints. |
16 | Privacy-Preserving Reachability Query Services for Massive Networks | Jiaxin Jiang, Peipei Yi, Byron Choi, Zhiwei Zhang, Xiaohui Yu | Specifically, we propose a scalable index construction algorithm by employing the idea of topological folding, recently proposed by Cheng et al. |
17 | Sequential Query Expansion using Concept Graph | Saeid Balaneshin-kordan, Alexander Kotov | In this paper, we propose a two-stage feature-based method for sequential selection of the most effective concepts for query expansion from a concept graph. |
18 | Learning Latent Vector Spaces for Product Search | Christophe Van Gysel, Maarten de Rijke, Evangelos Kanoulas | We introduce a novel latent vector space model that jointly learns the latent representations of words, e-commerce products and a mapping between the two without the need for explicit annotations. |
19 | Incorporating Clicks, Attention and Satisfaction into a Search Engine Result Page Evaluation Model | Aleksandr Chuklin, Maarten de Rijke | In this paper we propose a model of user behavior on a SERP that jointly captures click behavior, user attention and satisfaction, the CAS model, and demonstrate that it gives more accurate predictions of user actions and self-reported satisfaction than existing models based on clicks alone. |
20 | The Role of Relevance in Sponsored Search | Luca Aiello, Ioannis Arapakis, Ricardo Baeza-Yates, Xiao Bai, Nicola Barbieri, Amin Mantrach, Fabrizio Silvestri | Specifically, we propose a machine learning approach that solely relies on text-based features to measure the relevance between an advertisement and a query. |
21 | PowerWalk: Scalable Personalized PageRank via Random Walks with Vertex-Centric Decomposition | Qin Liu, Zhenguo Li, John C.S. Lui, Jiefeng Cheng | In this paper, we propose a distributed framework that strikes a better balance between offline indexing and online querying. |
22 | Building Industry-specific Knowledge Bases | Shivakumar Vaithyanathan | In this talk, I will describe the design of domain-specific languages (DSL) with specialized constructs that serve as target languages for learning these models and algorithms, and the generation of training data for scaling up the learning. |
23 | Reuters Tracer: A Large Scale System of Detecting & Verifying Real-Time News Events from Twitter | Xiaomo Liu, Quanzhi Li, Armineh Nourbakhsh, Rui Fang, Merine Thomas, Kajsa Anderson, Russ Kociuba, Mark Vedder, Steven Pomerville, Ramdev Wudali, Robert Martin, John Duprey, Arun Vachher, William Keenan, Sameena Shah | In this paper, we describe Reuters Tracer, a system that sifts through the noise on Twitter to detect news events and assess their veracity. |
24 | Structural Clustering of Machine-Generated Mail | Noa Avigdor-Elgrabli, Mark Cwalinski, Dotan Di Castro, Iftah Gamzu, Irena Grabovitch-Zuyev, Liane Lewin-Eytan, Yoelle Maarek | Several recent studies have presented different approaches for clustering and classifying machine-generated mail based on email headers. |
25 | LambdaFM: Learning Optimal Ranking with Factorization Machines Using Lambda Surrogates | Fajie Yuan, Guibing Guo, Joemon M. Jose, Long Chen, Haitao Yu, Weinan Zhang | In this paper, we demonstrate, both theoretically and empirically, that PRFM models usually lead to non-optimal item recommendation results due to such a mismatch. |
26 | Plackett-Luce Regression Mixture Model for Heterogeneous Rankings | Maksim Tkachenko, Hady W. Lauw | In this work, we are concerned with learning to rank for a heterogeneous population, which may consist of a number of sub-populations, each of which may rank objects differently. |
27 | Compression-Based Selective Sampling for Learning to Rank | Rodrigo M. Silva, Guilherme C.M. Gomes, Mário S. Alvim, Marcos A. Gonçalves | In this paper, we propose that certain characteristics of unlabeled L2R datasets allow for an unsupervised, compression-based selection process to be used to create small and yet highly informative and effective initial sets that can later be labeled and used to bootstrap an L2R system. |
28 | Incorporating Risk-Sensitiveness into Feature Selection for Learning to Rank | Daniel Xavier De Sousa, Sérgio Daniel Canuto, Thierson Couto Rosa, Wellington Santos Martins, Marcos André Gonçalves | In this paper we propose multi-objective feature selection (FS) strategies that simultaneously optimize both aspects: ranking performance and risk-sensitive evaluation. |
29 | Answering Twitter Questions: a Model for Recommending Answerers through Social Collaboration | Laure Soulier, Lynda Tamine, Gia-Hung Nguyen | In this paper, we specifically consider the challenging task of solving a question posted on Twitter. |
30 | Learning to Extract Conditional Knowledge for Question Answering using Dialogue | Pengwei Wang, Lei Ji, Jun Yan, Lianwen Jin, Wei-Ying Ma | In this work, we propose to extract a conditional knowledge base (CKB) from user question-answer pairs to answer user questions with different conditions through dialogue. |
31 | aNMM: Ranking Short Answer Texts with Attention-Based Neural Matching Model | Liu Yang, Qingyao Ai, Jiafeng Guo, W. Bruce Croft | In this paper, we propose an attention-based neural matching model for ranking short answer texts. |
32 | Medical Question Answering for Clinical Decision Support | Travis R. Goodwin, Sanda M. Harabagiu | In this paper, we present a novel framework for answering medical questions in the spirit of TREC-CDS by first discovering the answer and then selecting and ranking scientific articles that contain the answer. |
33 | Error Link Detection and Correction in Wikipedia | Chengyu Wang, Rong Zhang, Xiaofeng He, Aoying Zhou | In this paper, we address the error link problem, and propose algorithms to detect and correct error links. |
34 | Using Prerequisites to Extract Concept Maps from Textbooks | Shuting Wang, Alexander Ororbia, Zhaohui Wu, Kyle Williams, Chen Liang, Bart Pursel, C. Lee Giles | We present a framework for constructing a specific type of knowledge graph, a concept map, from textbooks. |
35 | Vandalism Detection in Wikidata | Stefan Heindorf, Martin Potthast, Benno Stein, Gregor Engels | In this paper, we present a new machine learning-based approach to detect vandalism in Wikidata. We propose a set of 47 features that exploit both content and context information, and we report on 4 classifiers of increasing effectiveness tailored to this learning task. |
36 | Finding News Citations for Wikipedia | Besnik Fetahu, Katja Markert, Wolfgang Nejdl, Avishek Anand | In this work we address the problem of finding and updating news citations for statements in entity pages. |
37 | SemiNMF-PCA framework for Sparse Data Co-clustering | Kais Allab, Lazhar Labiod, Mohamed Nadif | In this paper, we propose a novel way to perform co-clustering and dimensionality reduction simultaneously. |
38 | Effective and Efficient Spectral Clustering on Text and Link Data | Zhiqiang Xu, Yiping Ke | In this paper, we address this limitation by explicitly modeling the domain-specific distinctions in the clustering process. |
39 | Robust Spectral Ensemble Clustering | Zhiqiang Tao, Hongfu Liu, Sheng Li, Yun Fu | In this paper, we propose a novel Robust Spectral Ensemble Clustering (RSEC) approach to address this challenge. |
40 | Hybrid Indexing for Versioned Document Search with Cluster-based Retrieval | Xin Jin, Daniel Agun, Tao Yang, Qinghao Wu, Yifan Shen, Susen Zhao | This paper proposes an alternative approach that uses cluster-based retrieval to quickly narrow the search scope guided by version representatives at Phase 1 and develops a hybrid index structure with adaptive runtime data traversal to speed up Phase 2 search. |
41 | Time-aware Multi-Viewpoint Summarization of Multilingual Social Text Streams | Zhaochun Ren, Oana Inel, Lora Aroyo, Maarten de Rijke | In this paper, we focus on time-aware multi-viewpoint summarization of multilingual social text streams. |
42 | Data Summarization with Social Contexts | Hao Zhuang, Rameez Rahman, Xia Hu, Tian Guo, Pan Hui, Karl Aberer | To tackle these challenges, in this paper, we focus on exploiting social contexts to summarize social data while preserving topics in the original dataset. |
43 | Understanding Sparse Topical Structure of Short Text via Stochastic Variational-Gibbs Inference | Tianyi Lin, Siyuan Zhang, Hong Cheng | In this paper, we propose a probabilistic Bayesian topic model, namely Sparse Dirichlet mixture Topic Model (SparseDTM), based on the Indian Buffet Process (IBP) prior, and infer our model on large text corpora through a novel inference procedure called stochastic variational-Gibbs inference. |
44 | Annotating Points of Interest with Geo-tagged Tweets | Kaiqi Zhao, Gao Cong, Aixin Sun | In this paper, we aim to associate tweets with the real-world locations, or Points of Interest (POIs), to which they are semantically related. |
45 | Duer: Intelligent Personal Assistant | Haifeng Wang | In this talk, I describe Duer, Baidu’s intelligent personal assistant. |
46 | Measuring Metrics | Pavel Dmitriev, Xian Wu | In this paper we describe the metric evaluation system deployed at Bing, where we have been working on designing and improving metrics for over five years. |
47 | City-Scale Localization with Telco Big Data | Fangzhou Zhu, Chen Luo, Mingxuan Yuan, Yijian Zhu, Zhengqing Zhang, Tao Gu, Ke Deng, Weixiong Rao, Jia Zeng | In this paper, we find that the widely-used location based services (LBSs) have accumulated large amounts of over-the-top (OTT) global positioning system (GPS) data in telco networks, which can be automatically used as training labels for learning accurate MR-based positioning systems. |
48 | Approximating Graph Pattern Queries Using Views | Jia Li, Yang Cao, Xudong Liu | Given a pattern query Q and a set V of views, we propose to find a pair of queries Qu and Ql, referred to as the upper and lower approximations of Q w.r.t. V, such that (a) for any data graph G, answers to (part of) Q in G are contained in Qu(G) and contain Ql(G); and (b) both Qu and Ql can be answered by using views in V. |
49 | Group-Aware Weighted Bipartite B-Matching | Cheng Chen, Sean Chester, Venkatesh Srinivasan, Kui Wu, Alex Thomo | In this paper, we investigate powerful generalisations of weighted bipartite b-matching (WBM). We then propose two related problems, collectively called group-aware WBM. |
50 | Growing Graphs from Hyperedge Replacement Graph Grammars | Salvador Aguiñaga, Rodrigo Palacios, David Chiang, Tim Weninger | In this paper we show that a graph’s clique tree can be used to extract a hyperedge replacement grammar. |
51 | GiraphAsync: Supporting Online and Offline Graph Processing via Adaptive Asynchronous Message Processing | Yuqiong Liu, Chang Zhou, Jun Gao, Zhiguo Fan | In this work, we propose an adaptive asynchronous message processing (AAMP) method, which improves the efficiency of network communication while maintaining low latency, to efficiently support offline analytics and online queries in one graph processing framework. |
52 | Graph Topic Scan Statistic for Spatial Event Detection | Yu Liu, Baojian Zhou, Feng Chen, David W. Cheung | In this paper, we focus on the problem of spatial event detection using textual information in social media. |
53 | A Nonparametric Model for Event Discovery in the Geospatial-Temporal Space | Jinjin Guo, Zhiguo Gong | To break through such limitations, in this paper we propose a novel nonparametric model to identify events in the geographical and temporal space, where any recurrent patterns of events can be automatically captured. |
54 | A Multiple Instance Learning Framework for Identifying Key Sentences and Detecting Events | Wei Wang, Yue Ning, Huzefa Rangwala, Naren Ramakrishnan | We evaluate our model on its ability to detect news articles about civil unrest events (in Spanish text) across ten Latin American countries and to identify the key sentences pertaining to these events. |
55 | PairFac: Event Analytics through Discriminant Tensor Factorization | Xidao Wen, Yu-Ru Lin, Konstantinos Pelechrinis | In this paper, we propose a novel approach for analyzing events called PairFac. |
56 | Active Content-Based Crowdsourcing Task Selection | Piyush Bansal, Carsten Eickhoff, Thomas Hofmann | In this paper, we focus on an alternative method that instead exploits document information to infer relevance labels for unjudged documents. |
57 | CrowdSelect: Increasing Accuracy of Crowdsourcing Tasks through Behavior Prediction and User Selection | Chenxi Qiu, Anna C. Squicciarini, Barbara Carminati, James Caverlee, Dev Rishi Khare | In this paper, we present a dynamic and time-efficient solution to the task assignment problem in crowdsourcing platforms. |
58 | Attribute-based Crowd Entity Resolution | Asif R. Khan, Hector Garcia-Molina | In this paper, we reduce the cost of pairwise crowd ER approaches by soliciting the crowd for attribute labels on records, and then asking for pairwise judgments only between records with similar sets of attribute labels. |
59 | Efficient Processing of Location-Aware Group Preference Queries | Miao Li, Lisi Chen, Gao Cong, Yu Gu, Ge Yu | We develop a novel framework for answering the LGP query, which can be used to compute both exact query result and approximate result with a proven approximation ratio. |
60 | Mining Shopping Patterns for Divergent Urban Regions by Incorporating Mobility Data | Tianran Hu, Ruihua Song, Yingzi Wang, Xing Xie, Jiebo Luo | In this paper, we aim to predict citywide shopping patterns. |
61 | Large-Scale Analysis of Viewing Behavior: Towards Measuring Satisfaction with Mobile Proactive Systems | Qi Guo, Yang Song | In this paper, we present the first large-scale analysis of viewing behavior based on the viewport (the visible fraction of a Web page) of mobile devices, towards measuring user satisfaction with the information cards of mobile proactive systems. |
62 | Where Did You Go: Personalized Annotation of Mobility Records | Fei Wu, Zhenhui Li | In this paper, we aim to answer this question by annotating the mobility records with surrounding venues that were actually visited by the user. |
63 | Understanding Mobile Searcher Attention with Rich Ad Formats | Dmitry Lagun, Donal McMahon, Vidhya Navalpakkam | In this paper, we study how the presence of ads and their formats impacts searchers’ gaze and satisfaction. |
64 | Link Prediction in Heterogeneous Social Networks | Sumit Negi, Santanu Chaudhury | In this paper we pose the problem of link prediction in heterogeneous networks as a multi-task, metric learning (MTML) problem. |
65 | Who are My Familiar Strangers?: Revealing Hidden Friend Relations and Common Interests from Smart Card Data | Fusang Zhang, Beihong Jin, Tingjian Ge, Qiang Ji, Yanling Cui | In this paper, we study the problem of discovering familiar strangers, specifically, public transportation trip companions, and their common interests. |
66 | PIN-TRUST: Fast Trust Propagation Exploiting Positive, Implicit, and Negative Information | Min-Hee Jang, Christos Faloutsos, Sang-Wook Kim, U Kang, Jiwoon Ha | In this paper, we propose PIN-TRUST, a novel method to handle all three types of interaction information: explicit trust, implicit trust, and explicit distrust. |
67 | Predicting Popularity of Twitter Accounts through the Discovery of Link-Propagating Early Adopters | Daichi Imamori, Keishi Tajima | In this paper, we propose a method of ranking recently created Twitter accounts according to their prospective popularity. |
68 | "Shall I Be Your Chat Companion?": Towards an Online Human-Computer Conversation System | Rui Yan, Yiping Song, Xiangyang Zhou, Hua Wu | In this paper, we introduce a chat companion system, a practical human-computer conversation system built as a real-world application. |
69 | To Click or Not To Click: Automatic Selection of Beautiful Thumbnails from Videos | Yale Song, Miriam Redi, Jordi Vallmitjana, Alejandro Jaimes | We present an automatic thumbnail selection system that exploits two important characteristics commonly associated with meaningful and attractive thumbnails: high relevance to video content and superior visual aesthetic quality. |
70 | Separating-Plane Factorization Models: Scalable Recommendation from One-Class Implicit Feedback | Haolan Chen, Di Niu, Kunfeng Lai, Yu Xu, Masoud Ardakani | We propose a scalable approach called separating-plane matrix factorization (SPMF) to make effective recommendations based on positive implicit feedback, with a learning complexity that is comparable to traditional matrix factorization. |
71 | User Response Learning for Directly Optimizing Campaign Performance in Display Advertising | Kan Ren, Weinan Zhang, Yifei Rong, Haifeng Zhang, Yong Yu, Jun Wang | In this paper, we take real-time display advertising as an example, where the user’s predicted ad click-through rate (CTR) is employed to calculate a bid for an ad impression in the second price auction. |
72 | Personalized Search: Potential and Pitfalls | Susan T. Dumais | In this talk I present a framework to quantify the "potential for personalization" which we use to characterize the extent to which different people have different intents for the same query. |
73 | Query Variations and their Effect on Comparing Information Retrieval Systems | Guido Zuccon, Joao Palotti, Allan Hanbury | We propose a framework for evaluating retrieval systems that explicitly takes into account query variations. |
74 | Semantic Matching by Non-Linear Word Transportation for Information Retrieval | Jiafeng Guo, Yixing Fan, Qingyao Ai, W. Bruce Croft | Based on this representation, we introduce a novel retrieval model by viewing the matching between queries and documents as a non-linear word transportation (NWT) problem. |
75 | Generalizing Translation Models in the Probabilistic Relevance Framework | Navid Rekabsaz, Mihai Lupu, Allan Hanbury, Guido Zuccon | In this paper, we revisit a wide spectrum of existing models (Pivoted Document Normalization, BM25, BM25 Verboseness Aware, Multi-Aspect TF, and Language Modelling) by introducing a generalisation of the idea of the translation model. |
76 | Axiomatic Result Re-Ranking | Matthias Hagen, Michael Völske, Steve Göring, Benno Stein | In this paper, we combine the learning-to-rank paradigm with the recent developments on axioms for information retrieval. |
77 | Agents, Simulated Users and Humans: An Analysis of Performance and Behaviour | David Maxwell, Leif Azzopardi | In this paper, we develop a more sophisticated model of the user that includes their cognitive state within the simulation. |
78 | Inspiration or Preparation?: Explaining Creativity in Scientific Enterprise | Xinyang Zhang, Dashun Wang, Ting Wang | Existing studies have made significant advances in quantifying the creativity of scientific publications by investigating their citation relationships. |
79 | Pagination versus Scrolling in Mobile Web Search | Jaewon Kim, Paul Thomas, Ramesh Sankaranarayana, Tom Gedeon, Hwan-Jin Yoon | For touch-enabled mobile devices that are not equipped with a mouse or keyboard, we adopt other methods of controlling the viewport with the aim of investigating user interaction. |
80 | Studying the Dark Triad of Personality through Twitter Behavior | Daniel Preotiuc-Pietro, Jordan Carpenter, Salvatore Giorgi, Lyle Ungar | Our results show that we can map various behaviors to psychological theory and study new aspects related to social media usage. |
81 | Document Filtering for Long-tail Entities | Ridho Reinanda, Edgar Meij, Maarten de Rijke | In this paper we propose a document filtering method for long-tail entities that is entity-independent and thus also generalizes to unseen or rarely seen entities. We propose a set of features that capture informativeness, entity-saliency, and timeliness. |
82 | Estimating Time Models for News Article Excerpts | Arunav Mishra, Klaus Berberich | For this, we propose a semi-supervised distribution propagation framework that leverages redundancy in the data to improve the quality of estimated time models. |
83 | A Framework for Task-specific Short Document Expansion | Ramakrishna B. Bairi, Raghavendra Udupa, Ganesh Ramakrishnan | We present an expansion technique, TIDE (Task-specIfic short Document Expansion), that can be applied to several Machine Learning, NLP and Information Retrieval tasks on short texts (such as short text classification, clustering, entity disambiguation, and the like) without using task-specific heuristics or domain-specific knowledge for expansion. |
84 | Beyond Clustering: Sub-DAG Discovery for Categorising Documents | Ramakrishna B. Bairi, Mark J. Carman, Ganesh Ramakrishnan | Unlike previous works, which focus on clustering the set of documents using the category hierarchy as features, we directly pose the problem as that of finding a DAG-structured generative model that has maximum likelihood of generating the observed "importance" scores for each document, where documents are modeled as the leaf nodes in the DAG structure. We propose two different algorithms for estimating the model parameters. |
85 | On Transductive Classification in Heterogeneous Information Networks | Xiang Li, Ben Kao, Yudian Zheng, Zhipeng Huang | Studies have shown that transductive classification is an effective way to classify and to deduce labels of objects, and a number of transductive classifiers have been put forward to classify objects in a heterogeneous information network (HIN). |
86 | Efficient Hidden Trajectory Reconstruction from Sparse Data | Ning Yang, Philip S. Yu | In this paper, we investigate the problem of reconstructing hidden trajectories from a collection of separate spatio-temporal points without ID information, given the number of hidden trajectories. |
87 | Quark-X: An Efficient Top-K Processing Framework for RDF Quad Stores | Jyoti Leeka, Srikanta Bedathur, Debajyoti Bera, Medha Atre | In this paper, we present Quark-X, an RDF-store and SPARQL processing system for reified RDF data represented in the form of quads. |
88 | Reenactment for Read-Committed Snapshot Isolation | Bahareh Sadat Arab, Dieter Gawlick, Vasudha Krishnaswamy, Venkatesh Radhakrishnan, Boris Glavic | We present non-trivial extensions of the model and the reenactment approach to efficiently compute the provenance of RC-SI transactions. |
89 | Influence-Aware Truth Discovery | Hengtong Zhang, Qi Li, Fenglong Ma, Houping Xiao, Yaliang Li, Jing Gao, Lu Su | To tackle these challenges in truth discovery, we propose an unsupervised probabilistic model named IATD. |
90 | Truth Discovery via Exploiting Implications from Multi-Source Data | Xianzhi Wang, Quan Z. Sheng, Lina Yao, Xue Li, Xiu Susie Fang, Xiaofei Xu, Boualem Benatallah | In this paper, we address this challenge by exploiting and leveraging the implications from multi-source data. |
91 | FacetGist: Collective Extraction of Document Facets in Large Technical Corpora | Tarique Siddiqui, Xiang Ren, Aditya Parameswaran, Jiawei Han | Towards this end, we introduce a new research problem called Facet Extraction. |
92 | Empowering Truth Discovery with Multi-Truth Prediction | Xianzhi Wang, Quan Z. Sheng, Lina Yao, Xue Li, Xiu Susie Fang, Xiaofei Xu, Boualem Benatallah | In this paper, we propose a multi-truth discovery approach, which addresses the above challenges by providing a generic framework for enhancing existing truth discovery methods. |
93 | Using Machine Learning to Improve the Email Experience | Marc Najork | In this talk, I will give three examples of machine learning improving the email experience. |
94 | Hashtag Recommendation for Enterprise Applications | Dhruv Mahajan, Vishwajit Kolathur, Chetan Bansal, Suresh Parthasarathy, Sundararajan Sellamanickam, Sathiya Keerthi, Johannes Gehrke | In this paper, we consider the problem of recommending hashtags for enterprise applications. |
95 | Survival Analysis based Framework for Early Prediction of Student Dropouts | Sattar Ameri, Mahtab J. Fard, Ratna B. Chinnam, Chandan K. Reddy | In this paper, we develop a survival analysis framework for early prediction of student dropout using the Cox proportional hazards regression model (Cox). |
96 | Generative Feature Language Models for Mining Implicit Features from Customer Reviews | Shubhra Kanti Karmaker Santu, Parikshit Sondhi, ChengXiang Zhai | In this paper, we propose a new approach based on generative feature language models that can mine the implicit features more effectively through unsupervised statistical learning. We also created eight new data sets to facilitate evaluation of this task in English. |
97 | Data-Driven Contextual Valence Shifter Quantification for Multi-Theme Sentiment Analysis | Hongkun Yu, Jingbo Shang, Meichun Hsu, Malu Castellanos, Jiawei Han | To simultaneously resolve the multi-theme and sentiment shifting problems, we propose a data-driven framework to enable both capabilities: (1) polarity predictions of the same word in reviews of different themes, and (2) discovery and quantification of contextual valence shifters. |
98 | Sentiment Domain Adaptation with Multi-Level Contextual Sentiment Knowledge | Fangzhao Wu, Sixing Wu, Yongfeng Huang, Songfang Huang, Yong Qin | In this paper, we propose a new sentiment domain adaptation approach by adapting the sentiment knowledge in general-purpose sentiment lexicons to a specific domain. |
99 | Mobile App Retrieval for Social Media Users via Inference of Implicit Intent in Social Media Text | Dae Hoon Park, Yi Fang, Mengwen Liu, ChengXiang Zhai | In this paper, we study how to infer a user’s intent based on the user’s "status text" and retrieve relevant mobile apps that may satisfy the user’s needs. |
100 | Derivative Delay Embedding: Online Modeling of Streaming Time Series | Zhifei Zhang, Yang Song, Wei Wang, Hairong Qi | We propose a novel and more practical online modeling and classification scheme, DDE-MGM, which does not make any assumptions on the time series while maintaining high efficiency and state-of-the-art performance. |
101 | PISA: An Index for Aggregating Big Time Series Data | Xiangdong Huang, Jianmin Wang, Raymond Wong, Jinrui Zhang, Chen Wang | By defining two kinds of tags, namely code number and serial number, we propose an algorithm to accelerate queries by avoiding reading unnecessary data on disk. |
102 | Multi-View Time Series Classification: A Discriminative Bilinear Projection Approach | Sheng Li, Yaliang Li, Yun Fu | In light of this challenge, we propose a novel approach, named Multi-view Discriminative Bilinear Projections (MDBP), for extracting discriminative features from multi-view multivariate time series (m.t.s.) data. |
103 | Semi-Supervision Dramatically Improves Time Series Clustering under Dynamic Time Warping | Hoang Anh Dau, Nurjahan Begum, Eamonn Keogh | In this work we show that this is a naive approach which in most circumstances produces inferior clusterings. |
104 | Model-Based Oversampling for Imbalanced Sequence Classification | Zhichen Gong, Huanhuan Chen | To address these problems, this paper proposes a novel oversampling algorithm based on the ‘generative’ models of sequences. |
105 | CRISP: Consensus Regularized Selection based Prediction | Ping Wang, Karthik K. Padthe, Bhanukiran Vinzamuri, Chandan K. Reddy | To solve this problem, in this paper, we propose a method to generate a committee of non-convex regularized linear regression models, and use a consensus criterion to determine the optimal model for prediction. |
106 | Regularizing Structured Classifier with Conditional Probabilistic Constraints for Semi-supervised Learning | Vincent W. Zheng, Kevin Chen-Chuan Chang | Thus in this paper, we propose a new conditional probabilistic formulation for modeling both x-type and y-type constraints. |
107 | Scalability of Continuous Active Learning for Reliable High-Recall Text Classification | Gordon V. Cormack, Maura R. Grossman | We present a scalable version of CAL (‘S-CAL’) that requires O(log N) labeling effort and O(N log N) computational effort—where N is the number of unlabeled training examples—to construct a classifier whose effectiveness for a given labeling cost compares favorably with previously reported methods. |
108 | Towards the Effective Linking of Social Media Contents to Products in E-Commerce Catalogs | Henry S. Vieira, Altigran S. da Silva, Pável Calado, Marco Cristo, Edleno S. de Moura | We argue that this problem can be effectively solved using a set of evidence that can be easily extracted from social media content and product descriptions. |
109 | Tracking Virality and Susceptibility in Social Media | Tuan-Anh Hoang, Ee-Peng Lim | In this work, we investigate the inter-relationship among the factors and users’ multiple adoptions on items to propose both new static and temporal models for measuring the factors without requiring user-item exposure. |
110 | Feature Driven and Point Process Approaches for Popularity Prediction | Swapnil Mishra, Marian-Andrei Rizoiu, Lexing Xie | From these observations, we argue that future work on popularity prediction should compare across feature-driven and generative modeling approaches in both classification and regression tasks. |
111 | Adaptive Evolutionary Filtering in Real-Time Twitter Stream | Feifan Fan, Yansong Feng, Lili Yao, Dongyan Zhao | In this paper, we propose a novel adaptive evolutionary filtering framework to push interesting tweets to users from the real-time Twitter stream. |
112 | Multiple Queries as Bandit Arms | Cheng Li, Paul Resnick, Qiaozhu Mei | We consider a new paradigm of retrieval where multiple queries are kept “active” simultaneously. |
113 | Off the Beaten Path: Let’s Replace Term-Based Retrieval with k-NN Search | Leonid Boytsov, David Novak, Yury Malkov, Eric Nyberg | We address this issue by replacing the term-based search with a generic k-NN retrieval algorithm, where a similarity function can take into account subtle term associations. |
114 | Scalability and Total Recall with Fast CoveringLSH | Ninh Pham, Rasmus Pagh | Building on the recent theoretical "CoveringLSH" construction that eliminates false negatives, we propose a fast and practical covering LSH scheme for Hamming space called Fast CoveringLSH (fcLSH). |
115 | Query-Biased Partitioning for Selective Search | Zhuyun Dai, Chenyan Xiong, Jamie Callan | This paper presents a query-biased partitioning strategy that aligns document partitions with topics from query logs. |
116 | Characterizing Diseases from Unstructured Text: A Vocabulary Driven Word2vec Approach | Saurav Ghosh, Prithwish Chakraborty, Emily Cohn, John S. Brownstein, Naren Ramakrishnan | In this paper, we motivate a disease vocabulary driven word2vec model (Dis2Vec) to model diseases and constituent attributes as word embeddings from the HealthMap news corpus. |
117 | Network-Efficient Distributed Word2vec Training System for Large Vocabularies | Erik Ordentlich, Lee Yang, Andy Feng, Peter Cnudde, Mihajlo Grbovic, Nemanja Djuric, Vladan Radosavljevic, Gavin Owens | In this paper, we present a novel distributed, parallel training system that enables unprecedented practical training of vectors for vocabularies with several hundred million words on a shared cluster of commodity servers, using far less network traffic than the existing solutions. |
118 | A Personal Perspective and Retrospective on Web Search Technology | Andrei Broder | This talk is a review of some Web research and predictions that I co-authored over the last two decades: both what turned out gratifyingly right and what turned out embarrassingly wrong. |
119 | Scalable Spectral k-Support Norm Regularization for Robust Low Rank Subspace Learning | Yiu-ming Cheung, Jian Lou | Therefore, this paper proposes a scalable and efficient algorithm that works on the dual objective of the original problem, which can take advantage of the more computationally efficient linear oracle of the spectral k-support norm. |
120 | Online Adaptive Passive-Aggressive Methods for Non-Negative Matrix Factorization and Its Applications | Chenghao Liu, Steven C.H. Hoi, Peilin Zhao, Jianling Sun, Ee-Peng Lim | This paper aims to investigate efficient and scalable machine learning algorithms for resolving Non-negative Matrix Factorization (NMF), which is important for many real-world applications, particularly for collaborative filtering and recommender systems. |
121 | aptMTVL: Nailing Interactions in Multi-Task Multi-View Multi-Label Learning using Adaptive-basis Multilinear Factor Analyzers | Xiaoli Li, Jun Huan | We investigate a new direction of multi-task multi-view learning where we have data sets with multiple tasks, multiple views and multiple labels. |
122 | An Adaptive Framework for Multistream Classification | Swarup Chandra, Ahsanul Haque, Latifur Khan, Charu Aggarwal | In this paper, we present a novel stream classification problem setting involving two independent non-stationary data generating processes, relaxing the above assumptions. |
123 | Optimizing Update Frequencies for Decaying Information | Simon Razniewski | In this paper we present a model for describing the relationship between update frequency and income derived from data, present solutions for calculating the optimal update frequency for two common classes of functions for describing decay behaviour, and validate the benefits of our framework. |
124 | Cutty: Aggregate Sharing for User-Defined Windows | Paris Carbone, Jonas Traub, Asterios Katsifodimos, Seif Haridi, Volker Markl | In this paper we present a technique to perform efficient aggregate sharing for data stream windows, which are declared as user-defined functions (UDFs) and can contain arbitrary business logic. |
125 | Relational Database Schema Design for Uncertain Data | Sebastian Link, Henri Prade | We investigate the impact of uncertainty on relational database schema design. |
126 | BICP: Block-Incremental CP Decomposition with Update Sensitive Refinement | Shengyu Huang, K. Selçuk Candan, Maria Luisa Sapino | In this paper, we propose a two-phase block-incremental CP-based tensor decomposition technique, BICP, that efficiently and effectively maintains tensor decomposition results in the presence of dynamically evolving tensor data. |
127 | Topological Graph Sketching for Incremental and Scalable Analytics | Bortik Bandyopadhyay, David Fuhry, Aniket Chakrabarti, Srinivasan Parthasarathy | We propose a novel, scalable, and principled graph sketching technique based on minwise hashing of local neighborhoods. |
128 | Querying Minimal Steiner Maximum-Connected Subgraphs in Large Graphs | Jiafeng Hu, Xiaowei Wu, Reynold Cheng, Siqiang Luo, Yixiang Fang | In this paper, we investigate the minimal Steiner maximum-connected subgraph (SMCS), which is the minimal subgraph of a graph G with the maximum connectivity containing a query vertex set Q. |
129 | Efficient Estimation of Triangles in Very Large Graphs | Roohollah Etemadi, Jianguo Lu, Yung H. Tsin | This paper proposes a new method to estimate the number of triangles based on random edge sampling. |
130 | Efficient Batch Processing for Multiple Keyword Queries on Graph Data | Lu Chen, Chengfei Liu, Xiaochun Yang, Bin Wang, Jianxin Li, Rui Zhou | Based on the model, we design an A* based algorithm to find the global optimal execution plan for multiple queries. |
131 | Supervised Robust Discrete Multimodal Hashing for Cross-Media Retrieval | Ting-Kun Yan, Xin-Shun Xu, Shanqing Guo, Zi Huang, Xiao-Lin Wang | To address these problems, in this paper, we propose a novel supervised hashing framework for cross-modal retrieval, i.e., Supervised Robust Discrete Multimodal Hashing (SRDMH). |
132 | Word Vector Compositionality based Relevance Feedback using Kernel Density Estimation | Dwaipayan Roy, Debasis Ganguly, Mandar Mitra, Gareth J.F. Jones | To alleviate this limitation, we introduce a relevance feedback (RF) method that makes use of word embedding vectors. |
133 | Q+Tree: An Efficient Quad Tree based Data Indexing for Parallelizing Dynamic and Reverse Skylines | Md. Saiful Islam, Chengfei Liu, Wenny Rahayu, Tarique Anwar | This paper presents an efficient quad-tree based data indexing scheme, called Q+Tree, for parallelizing the computations of the dynamic and reverse skyline queries. |
134 | Luhn Revisited: Significant Words Language Models | Mostafa Dehghani, Hosein Azarbonyad, Jaap Kamps, Djoerd Hiemstra, Maarten Marx | Inspired by the early work of Luhn [23], we propose significant words language models of feedback documents that capture all, and only, the significant shared terms from feedback documents. |
135 | ESPRESSO: Explaining Relationships between Entity Sets | Stephan Seufert, Klaus Berberich, Srikanta J. Bedathur, Sarath Kumar Kondreddi, Patrick Ernst, Gerhard Weikum | This paper presents efficient approximation algorithms. |
136 | Geotagging Named Entities in News and Online Documents | Jiangwei Yu, Davood Rafiei | We study the problem of associating geography with named entities in online documents. |
137 | Discovering Entities with Just a Little Help from You | Jaspreet Singh, Johannes Hoffart, Avishek Anand | We propose novel human-in-the-loop retrieval methods for generating candidates based on gradient interleaving of diversification and textual relevance approaches. |
138 | Bayesian Non-Exhaustive Classification A Case Study: Online Name Disambiguation using Temporal Record Streams | Baichuan Zhang, Murat Dundar, Mohammad Al Hasan | In this work, we propose a Bayesian non-exhaustive classification framework for solving the online name disambiguation task. |
139 | Large-scale Robust Online Matching and Its Application in E-commerce | Rong Jin | To address the first challenge, I will introduce two different techniques for robust matching. |
140 | A Distributed Graph Algorithm for Discovering Unique Behavioral Groups from Large-Scale Telco Data | Qirong Ho, Wenqing Lin, Eran Shaham, Shonali Krishnaswamy, The Anh Dang, Jingxuan Wang, Isabel Choo Zhongyan, Amy She-Nash | In this paper we propose a novel graph edge-clustering algorithm (DGEC) that can discover unique behavioral groups from rich usage data sets (such as CDRs and beyond). |
141 | Urban Traffic Prediction through the Second Use of Inexpensive Big Data from Buildings | Zimu Zheng, Dan Wang, Jian Pei, Yi Yuan, Cheng Fan, Fu Xiao | In this paper, we report a novel and interesting case study of urban traffic prediction in Central, Hong Kong, one of the densest urban areas in the world. |
142 | A Probabilistic Multi-Touch Attribution Model for Online Advertising | Wendi Ji, Xiaoling Wang, Dell Zhang | In this paper, we propose a novel Probabilistic Multi-Touch Attribution (PMTA) model which takes into account not only which ads have been viewed or clicked by the user but also when each such interaction occurred. |
143 | Optimizing Ad Allocation in Social Advertising | Shaojie Tang, Jing Yuan | The goal of this work is to optimize the ad allocation from the platform’s perspective. |
144 | Joint Collaborative Ranking with Social Relationships in Top-N Recommendation | Dimitrios Rafailidis, Fabio Crestani | In this study, to account for the fact that the selections of social friends can improve recommendation accuracy, we propose a joint collaborative ranking (CR) model based on the users’ social relationships. |
145 | Modeling Customer Engagement from Partial Observations | Jelena Stojanovic, Djordje Gligorijevic, Zoran Obradovic | We address this problem by proposing a robust framework for structured regression on deficient data in evolving networks with a supervised representation learning based on neural features embedding. |
146 | On the Effectiveness of Query Weighting for Adapting Rank Learners to New Unlabelled Collections | Pengfei Li, Mark Sanderson, Mark Carman, Falk Scholer | Past work has shown that this approach can be used to significantly improve effectiveness; in this work, the approach is re-examined on a wide set of publicly available L2R test collections with more advanced learning to rank algorithms. |
147 | One Query, Many Clicks: Analysis of Queries with Multiple Clicks by the Same User | Elad Kravi, Ido Guy, Avihai Mejer, David Carmel, Yoelle Maarek, Dan Pelleg, Gilad Tsur | In this paper, we study multi-click queries – queries for which more than one click is performed by the same user within the same query session. |
148 | Precision-Oriented Query Facet Extraction | Weize Kong, James Allan | Recent work proposed an alternative solution that extracts facets for queries from their web search results, but neglected the precision-oriented perspective of the task — users are likely to care more about precision of presented facets than recall. |
149 | Learning to Rewrite Queries | Yunlong He, Jiliang Tang, Hua Ouyang, Changsung Kang, Dawei Yin, Yi Chang | In this paper, we propose a learning to rewrite framework that consists of a candidate generating phase and a candidate ranking phase. |
150 | When is the Time Ripe for Natural Language Processing for Patent Passage Retrieval? | Linda Andersson, Mihai Lupu, João Palotti, Allan Hanbury, Andreas Rauber | In this paper we explore query generation using natural language processing technologies in order to capture domain specific concepts represented as multi-word units. |
151 | A Probabilistic Fusion Framework | Yael Anava, Anna Shtok, Oren Kurland, Ella Rabinovich | Herein we present a probabilistic framework for the fusion task. |
152 | Selective Cluster-Based Document Retrieval | Or Levi, Fiana Raiber, Oren Kurland, Ido Guy | We address the long standing challenge of selective cluster-based retrieval; namely, deciding on a per-query basis whether to apply cluster-based document retrieval or standard document retrieval. To address this classification task, we propose a few sets of features based on those utilized by the cluster-based ranker, query-performance predictors, and properties of the clustering structure. |
153 | Pseudo-Relevance Feedback Based on Matrix Factorization | Hamed Zamani, Javid Dadashkarimi, Azadeh Shakery, W. Bruce Croft | In this paper, we look at the PRF task as a recommendation problem: the goal is to recommend a number of terms for a given query along with weights, such that the final weights of terms in the updated query model better reflect the terms’ contributions in the query. |
154 | Uncovering the Spatio-Temporal Dynamics of Memes in the Presence of Incomplete Information | Hancheng Ge, James Caverlee, Nan Zhang, Anna Squicciarini | Hence, in this paper, we investigate new methods for uncovering the full (underlying) distribution through a novel spatio-temporal dynamics recovery framework which models the latent relationships among locations, memes, and times. |
155 | From Recommendation to Profile Inference (Rec2PI): A Value-added Service to Wi-Fi Data Mining | Cheng Chen, Fang Dong, Kui Wu, Venkatesh Srinivasan, Alex Thomo | To tackle the technical challenges in profile inference, we propose novel algorithms built using copulas, a statistical tool suitable for capturing complex dependence structure beyond the scope of linear dependence. |
156 | On Backup Battery Data in Base Stations of Mobile Networks: Measurement, Analysis, and Optimization | Xiaoyi Fan, Feng Wang, Jiangchuan Liu | In this paper, we conduct a systematic analysis of a real-world dataset collected from the battery groups installed on the base stations of China Mobile, with a total of 1,550,032,984 records from July 28th, 2014 to February 17th, 2016. |
157 | Automatic Generation and Validation of Road Maps from GPS Trajectory Data Sets | Hengfeng Li, Lars Kulik, Kotagiri Ramamohanarao | To address these challenges, we propose a novel Spatial-Linear Clustering (SLC) technique to infer road segments from GPS traces. |
158 | Fully Dynamic Shortest-Path Distance Query Acceleration on Massive Networks | Takanori Hayashi, Takuya Akiba, Ken-ichi Kawarabayashi | In this paper, we present the first algorithm, other than trivial non-indexing algorithms, that can process exact distance queries on fully dynamic billion-scale networks; it combines an online bidirectional breadth-first search (BFS) with an offline indexing method to handle billion-scale networks in memory. |
159 | Hierarchical and Dynamic k-Path Covers | Takuya Akiba, Yosuke Yano, Naoto Mizuno | To address these issues, we propose novel k-APC construction and maintenance algorithms. |
160 | Efficient Computation of Importance Based Communities in Web-Scale Networks Using a Single Machine | Shu Chen, Ran Wei, Diana Popova, Alex Thomo | In this paper, our goal is to scale-up the computation of top-r, k-core communities to web-scale graphs of tens of billions of edges. |
161 | Collective Classification via Discriminative Matrix Factorization on Sparsely Labeled Networks | Daokun Zhang, Jie Yin, Xingquan Zhu, Chengqi Zhang | In this paper, we propose a novel discriminative matrix factorization (DMF) based algorithm that effectively learns a latent network representation by exploiting topological paths between labeled and unlabeled nodes, in addition to nodes’ content information. |
162 | LogMine: Fast Pattern Recognition for Log Analytics | Hossein Hamooni, Biplob Debnath, Jianwu Xu, Hui Zhang, Guofei Jiang, Abdullah Mueen | We propose a method, named LogMine, that extracts high quality patterns for a given set of log messages. |
163 | Scaling Factorization Machines with Parameter Server | Erheng Zhong, Yue Shi, Nathan Liu, Suju Rajan | We propose a new system framework that integrates Parameter Server (PS) with the Map/Reduce (MR) framework. |
164 | DI-DAP: An Efficient Disaster Information Delivery and Analysis Platform in Disaster Management | Tao Li, Wubai Zhou, Chunqiu Zeng, Qing Wang, Qifeng Zhou, Dingding Wang, Jia Xu, Yue Huang, Wentao Wang, Minjing Zhang, Steve Luis, Shu-Ching Chen, Naphtali Rishe | To present an integrated solution to address the information explosion problem during the disaster period, we designed and implemented DI-DAP, an efficient and effective disaster information delivery and analysis platform. |
165 | Approximate Aggregates in Oracle 12C | Hong Su, Mohamed Zait, Vladimir Barrière, Joseph Torres, Andre Menck | The alternative algorithms considered in this paper are approximate aggregates that perform much better at the cost of reduced but tolerable accuracy. |
166 | Supervised Feature Selection by Preserving Class Correlation | Jun Wang, Jinmao Wei, Zhenglu Yang | In this paper, we propose effective supervised feature selection techniques to address the problems. |
167 | CGMOS: Certainty Guided Minority OverSampling | Xi Zhang, Di Ma, Lin Gan, Shanshan Jiang, Gady Agam | In this paper we propose a novel extension to the SMOTE algorithm with a theoretical guarantee for improved classification performance. |
168 | Learning Hidden Features for Contextual Bandits | Huazheng Wang, Qingyun Wu, Hongning Wang | In this paper, we propose to learn the hidden features for contextual bandit algorithms. |
169 | Constructing Reliable Gradient Exploration for Online Learning to Rank | Tong Zhao, Irwin King | In this paper, we propose two online learning to rank (OLR) algorithms that improve the reliability of the exploration by constructing robust exploratory directions. |
170 | A Model-Free Approach to Infer the Diffusion Network from Event Cascade | Yu Rong, Qiankun Zhu, Hong Cheng | Different from previous works focusing on building models, we propose to interpret the diffusion process from the cascade data directly in a non-parametric way, and design a novel and efficient algorithm named Non-Parametric Distributional Clustering (NPDC). |
171 | Multiple Infection Sources Identification with Provable Guarantees | Hung T. Nguyen, Preetam Ghosh, Michael L. Mayo, Thang N. Dinh | In this paper, we propose a new approach to identify infection sources by searching for a seed set S that minimizes the symmetric difference between the cascade from S and V_I, the given set of infected nodes. |
172 | Information Diffusion at Workplace | Jiawei Zhang, Philip S. Yu, Yuanhua Lv, Qianyi Zhan | In this paper, we study information diffusion among employees at the workplace via both online enterprise social networks (ESNs) and online contacts. |
173 | Targeted Influence Maximization in Social Networks | Chonggang Song, Wynne Hsu, Mong Li Lee | In this paper, we formalize the problem of targeted influence maximization in social networks. |
174 | Updating an Existing Social Graph Snapshot via a Limited API | Norases Vesdapunt, Hector Garcia-Molina | In this paper, we focus on updating a social graph snapshot. |
175 | Making Sense of Entities and Quantities in Web Tables | Yusra Ibrahim, Mirek Riedewald, Gerhard Weikum | This paper aims to overcome this problem by automatically canonicalizing header names and cell values onto concepts, classes, entities and uniquely represented quantities registered in a knowledge base. |
176 | Influence Maximization for Complementary Goods: Why Parties Fail to Cooperate? | Han-Ching Ou, Chung-Kuang Chou, Ming-Syan Chen | We consider the problem where companies provide different types of products and want to promote their products through viral marketing simultaneously. |
177 | Effective Spelling Correction for Eye-based Typing using domain-specific Information about Error Distribution | Raiza Hanada, Maria da Graça C. Pimentel, Marco Cristo, Fernando Anglada Lores | We address these problems by combining estimates extracted from general error corpora with domain-specific knowledge about eye-based input. |
178 | Computing and Summarizing the Negative Skycube | Nicolas Hanusse, Patrick Kamnang Wanko, Sofian Maabout | In this paper, we consider the complementary statement, i.e., “for every tuple t, list the skylines to which t does not belong”. |
179 | Efficient Orthogonal Non-negative Matrix Factorization over Stiefel Manifold | Wei Emma Zhang, Mingkui Tan, Quan Z. Sheng, Lina Yao, Qingfeng Shi | In this paper, we propose a method, called Nonlinear Riemannian Conjugate Gradient ONMF (NRCG-ONMF), which updates U and V alternately and preserves the orthogonality of U while achieving fast convergence. |
180 | Paired Restricted Boltzmann Machine for Linked Data | Suhang Wang, Jiliang Tang, Fred Morstatter, Huan Liu | In this paper, we aim to design a new type of Restricted Boltzmann Machines that takes advantage of linked data. |
181 | LDA Revisited: Entropy, Prior and Convergence | Jianwei Zhang, Jia Zeng, Mingxuan Yuan, Weixiong Rao, Jianfeng Yan | In this paper, we revisit these three algorithms from the entropy perspective, and show that EM can achieve the best predictive perplexity (a standard performance metric for LDA accuracy) by minimizing directly the cross entropy between the observed word distribution and LDA’s predictive distribution. |
182 | Cost-Effective Stream Join Algorithm on Cloud System | Junhua Fang, Rong Zhang, Xiaotong Wang, Tom Z.J. Fu, Zhenjie Zhang, Aoying Zhou | In this paper, we propose a cost-effective stream join algorithm, which ensures the adaptability of Join-Matrix but with lower resource consumption. |
183 | Leveraging Multiple GPUs and CPUs for Graphlet Counting in Large Networks | Ryan A. Rossi, Rong Zhou | In this paper, we propose a key class of hybrid parallel graphlet algorithms that leverages multiple CPUs and GPUs simultaneously for computing k-vertex induced subgraph statistics (called graphlets). |
184 | Scalable Local-Recoding Anonymization using Locality Sensitive Hashing for Big Data Privacy Preservation | Xuyun Zhang, Christopher Leckie, Wanchun Dou, Jinjun Chen, Ramamohanarao Kotagiri, Zoran Salcic | In this paper, we propose a highly scalable approach to local-recoding anonymization in cloud computing, based on Locality Sensitive Hashing (LSH). |
185 | Approximate Discovery of Functional Dependencies for Large Datasets | Tobias Bleifuß, Susanne Bülow, Johannes Frohnhofen, Julian Risch, Georg Wiese, Sebastian Kruse, Thorsten Papenbrock, Felix Naumann | In particular, we introduce AID-FD, an algorithm that approximately discovers FDs within runtimes up to orders of magnitude faster than state-of-the-art FD discovery algorithms. |
186 | On Structural Health Monitoring Using Tensor Analysis and Support Vector Machine with Artificial Negative Data | Prasad Cheema, Nguyen Lu Dang Khoa, Mehrisadat Makki Alamdari, Wei Liu, Yang Wang, Fang Chen, Peter Runcie | In our approach, we propose the use of tensor learning and support vector machines with artificial negative data generated by density estimation techniques for damage detection, localization and estimation in a one-class manner. |
187 | A Self-Learning and Online Algorithm for Time Series Anomaly Detection, with Application in CPU Manufacturing | Xing Wang, Jessica Lin, Nital Patel, Martin Braun | To address these limitations, we propose a self-learning online anomaly detection algorithm that automatically identifies anomalous time series, as well as the exact locations where the anomalies occur in the detected time series. |
188 | Deep Match between Geology Reports and Well Logs Using Spatial Information | Bin Tong, Martin Klinkigt, Makoto Iwayama, Yoshiyuki Kobayashi, Anshuman Sahu, Ravigopal Vennelakanti | We propose both linear and nonlinear (artificial neural network) models to achieve such an embedding. |
189 | MIST: Missing Person Intelligence Synthesis Toolkit | Elham Shaabani, Hamidreza Alvari, Paulo Shakarian, J.E. Kelly Snyder | This paper introduces the Missing Person Intelligence Synthesis Toolkit (MIST) which leverages a data-driven variant of geospatial abductive inference. |
190 | Skipping Word: A Character-Sequential Representation based Framework for Question Answering | Lingxun Meng, Yan Li, Mengyi Liu, Peng Shu | Compared with deep models pre-trained on word embedding (WE) strategy, our character-sequential representation (CSR) based method shows a much simpler procedure and more stable performance across different benchmarks. |
191 | Towards Time-Discounted Influence Maximization | Arijit Khan | The problem that we solve in this paper is to maximize the expected aggregated value of this utility function over all network users. |
192 | Quantifying Query Ambiguity with Topic Distributions | Yuki Yano, Yukihiro Tagami, Akira Tajima | In this paper, we propose a new approach for quantifying query ambiguity using topic distributions. |
193 | ASNets: A Benchmark Dataset of Aligned Social Networks for Cross-Platform User Modeling | Xuezhi Cao, Yong Yu | Therefore, in this paper we propose ASNets, a benchmark dataset with two sets of aligned social networks. |
194 | Data Locality in Graph Engines: Implications and Preliminary Experimental Results | Yong-Yeon Jo, Jiwon Hong, Myung-Hwan Jang, Jae-Geun Bang, Sang-Wook Kim | In this paper, we show the importance of data locality for graph algorithms by running them on single-machine graph engines. |
195 | Active Zero-Shot Learning | Sihong Xie, Shaoxiong Wang, Philip S. Yu | To resolve this issue, we propose an active class selection strategy to intelligently query labeled data for a parsimonious set of informative classes. |
196 | Learning to Account for Good Abandonment in Search Success Metrics | Madian Khabsa, Aidan Crook, Ahmed Hassan Awadallah, Imed Zitouni, Tasos Anastasakos, Kyle Williams | In this work we describe how a search success metric can be augmented to account for good abandonment sessions using a machine-learned metric that depends on the user’s viewport information. |
197 | Modeling and Predicting Popularity Dynamics via an Influence-based Self-Excited Hawkes Process | Peng Bao | In this paper, we propose a probabilistic model using an influence-based self-excited Hawkes process (ISEHP) to characterize the process through which individual microblogs gain their popularity. |
198 | Incorporate Group Information to Enhance Network Embedding | Jifan Chen, Qi Zhang, Xuanjing Huang | In this paper, we investigate a novel method for learning the network embeddings with valuable group information for large-scale networks. |
199 | Exploiting Cluster-based Meta Paths for Link Prediction in Signed Networks | Jiangfeng Zeng, Ke Zhou, Xiao Ma, Fuhao Zou, Hua Wang | In order to solve this issue, in this paper, we introduce a novel sign prediction model by exploiting cluster-based meta paths, which can take advantage of both local and global information of the input networks. |
200 | Predicting Importance of Historical Persons using Wikipedia | Adam Jatowt, Daisuke Kawai, Katsumi Tanaka | In this work, we are interested in utilizing Wikipedia for judging the importance of historical persons. |
201 | Noise-Contrastive Estimation for Answer Selection with Deep Neural Networks | Jinfeng Rao, Hua He, Jimmy Lin | Unlike previous work which treats this task as a straightforward pointwise classification problem, we model this problem as a ranking task and propose a pairwise ranking approach that can directly exploit existing pointwise neural network models as base components. |
202 | Global and Local Influence-based Social Recommendation | Qinzhe Zhang, Jia Wu, Hong Yang, Weixue Lu, Guodong Long, Chengqi Zhang | In this paper, we introduce a new global and local influence-based social recommendation model. |
203 | Tag-Aware Personalized Recommendation Using a Deep-Semantic Similarity Model with Negative Sampling | Zhenghua Xu, Cheng Chen, Thomas Lukasiewicz, Yishu Miao, Xiangwu Meng | In this paper, we propose a deep neural network approach to solve this problem by mapping both the tag-based user and item profiles to an abstract deep feature space, where the deep-semantic similarities between users and their target items (resp., irrelevant items) are maximized (resp., minimized). |
204 | Personalized Semantic Word Vectors | Javid Ebrahimi, Dejing Dou | In this paper, we present a word representation scheme that incorporates authorship information. |
205 | Query Expansion Using Word Embeddings | Saar Kuzi, Anna Shtok, Oren Kurland | We present a suite of query expansion methods that are based on word embeddings. |
206 | Efficient Distributed Regular Path Queries on RDF Graphs Using Partial Evaluation | Xin Wang, Junhu Wang, Xiaowang Zhang | We propose an efficient distributed method for answering regular path queries (RPQs) on large-scale RDF graphs using partial evaluation. |
207 | Webpage Depth-level Dwell Time Prediction | Chong Wang, Achir Kalra, Cristian Borcea, Yi Chen | This paper presents a model to predict the dwell time for a given "user, webpage, depth" triplet based on historic data collected by publishers. |
208 | Collaborative Social Group Influence for Event Recommendation | Li Gao, Jia Wu, Zhi Qiao, Chuan Zhou, Hong Yang, Yue Hu | To this end, we propose a new Bayesian latent factor model SogBmf that combines social group influence and individual preference for event recommendation. |
209 | Graph-Based Multi-Modality Learning for Clinical Decision Support | Ziwei Zheng, Xiaojun Wan | In this paper, we propose to use the paragraph vector technique to learn the latent semantic representation of texts and treat the latent semantic representations and the original bag-of-words representations as two different modalities. |
210 | Where are You Tweeting?: A Context and User Movement Based Approach | Zhi Liu, Yan Huang | In this paper, we propose a Hidden-Markov-based model to integrate tweet contents and user movements for geotagging. |
211 | Ensemble Learned Vaccination Uptake Prediction using Web Search Queries | Niels Dalum Hansen, Christina Lioma, Kåre Mølbak | We present a method that uses ensemble learning to combine clinical and web-mined time-series data in order to predict future vaccination uptake. |
212 | Location-aware Friend Recommendation in Event-based Social Networks: A Bayesian Latent Factor Approach | Yao Lu, Zhi Qiao, Chuan Zhou, Yue Hu, Li Guo | In this paper we study the friend recommendation problem in event-based social networks (EBSNs). |
213 | Extracting Skill Endorsements from Personal Communication Data | Darshan M. Shankaralingappa, Gianmarco De Francisci Morales, Aristides Gionis | In this paper, we mine personal communication data with the goal of generating skill endorsements of the type "person A endorses person B on skill X." To address privacy concerns, we consider that each person has access only to their own data (i.e., conversations with their peers). |
214 | A Self-Organizing Map for Identifying Influential Communities in Speech-based Networks | Sameen Mansha, Faisal Kamiran, Asim Karim, Aizaz Anwar | In this paper, we present a self-organizing map (SOM) for discovering and visualizing influential communities of users in speech-based networks (SBNs). |
215 | Crowdsourcing-based Urban Anomaly Prediction System for Smart Cities | Chao Huang, Xian Wu, Dong Wang | In this paper, we develop a Crowdsourcing-based Urban Anomaly Prediction Scheme (CUAPS) to accurately predict the anomalies of a city by exploring both spatial and temporal information embedded in the crowdsourcing data. |
216 | Near Real-time Geolocation Prediction in Twitter Streams via Matrix Factorization Based Regression | Nghia Duong-Trung, Nicolas Schilling, Lars Schmidt-Thieme | In this work, we develop a novel generative content-based regression model via a matrix factorization technique to tackle the near real-time geolocation prediction problem. |
217 | Distilling Word Embeddings: An Encoding Approach | Lili Mou, Ran Jia, Yan Xu, Ge Li, Lu Zhang, Zhi Jin | We propose an encoding approach to distill task-specific knowledge from a set of high-dimensional embeddings, so that we can reduce model complexity by a large margin as well as retain high accuracy, achieving a good compromise between efficiency and performance. |
218 | Regularising Factorised Models for Venue Recommendation using Friends and their Comments | Jarana Manotumruksa, Craig Macdonald, Iadh Ounis | We propose an MF regularisation technique that seamlessly incorporates both social network information and textual comments, by exploiting word embeddings to estimate a semantic similarity of friends based on their explicit textual feedback, to regularise the complexity of the factorised model. |
219 | Improving Search Results with Prior Similar Queries | Yashar Moshfeghi, Kristiyan Velinov, Peter Triantafillou | This paper describes a novel approach to re-ranking search engine result pages (SERP): Its fundamental principle is to re-rank results to a given query, based on exploiting evidence gathered from past similar search queries. We construct a set of features from our similarity graph and build a prediction model using the Hoeffding decision tree algorithm. |
220 | The Solitude of Relevant Documents in the Pool | Aldo Lipani, Mihai Lupu, Evangelos Kanoulas, Allan Hanbury | Recently, methods to address this pool bias for previously created test collections have been proposed, for the evaluation measure precision at cut-off (P@n). |
221 | Scarce Feature Topic Mining for Video Recommendation | Wei Lu, Fu-lai Chung, Kunfeng Lai | Targeting the long tail phenomena of user behavior and sparsity of item features, we propose a personalized compound recommendation framework for online video recommendation called Dirichlet mixture probit model for information scarcity (DPIS). |
222 | Learning to Re-Rank Questions in Community Question Answering Using Advanced Features | Giovanni Da San Martino, Alberto Barrón Cedeño, Salvatore Romeo, Antonio Uva, Alessandro Moschitti | We study the impact of different types of features for question ranking in community Question Answering: bag-of-words models (BoW), syntactic tree kernels (TKs) and rank features. |
223 | Learning to Rank System Configurations | Romain Deveaud, Josiane Mothe, Jian-Yun Nie | We propose to tackle this problem by dealing with entire system configurations (i.e. a set of parameters representing an IR system) instead of single parameters, and to apply state-of-the-art Learning to Rank techniques to select the most appropriate configuration for a given query. |
224 | Adaptive Distributional Extensions to DFR Ranking | Casper Petersen, Jakob Grue Simonsen, Kalervo Järvelin, Christina Lioma | Adaptive Distributional Extensions to DFR Ranking |
225 | CyberRank: Knowledge Elicitation for Risk Assessment of Database Security | Hagit Grushka-Cohen, Oded Sofer, Ofer Biller, Bracha Shapira, Lior Rokach | In this paper, we propose CyberRank, a novel algorithm for automatic preference elicitation that is effective for situations with limited experts’ time and outperforms other algorithms for initial training of the system. |
226 | Online Food Recipe Title Semantics: Combining Nutrient Facts and Topics | Tomasz Kusmierczyk, Kjetil Nørvåg | To contribute to this lack of knowledge, we present a novel approach to mine and model online food content by combining text topics with related nutrient facts. |
227 | A Non-Parametric Topic Model for Short Texts Incorporating Word Coherence Knowledge | Yuhao Zhang, Wenji Mao, Daniel Zeng | To tackle these problems, in this paper, we propose a non-parametric topic model npCTM with the above distinction. |
228 | Forecasting Seasonal Time Series Using Weighted Gradient RBF Network based Autoregressive Model | Wenjie Ruan, Quan Z. Sheng, Peipei Xu, Nguyen Khoi Tran, Nickolas J.G. Falkner, Xue Li, Wei Emma Zhang | In this paper, we propose a weighted gradient Radial Basis Function Network based AutoRegressive (WGRBF-AR) model for modeling and predicting the nonlinear and non-stationary seasonal time series. |
229 | When Sensor Meets Tensor: Filling Missing Sensor Values Through a Tensor Approach | Wenjie Ruan, Peipei Xu, Quan Z. Sheng, Nguyen Khoi Tran, Nickolas J.G. Falkner, Xue Li, Wei Emma Zhang | In this paper, we formulate the time-series sensor data as a 3-order tensor that naturally preserves sensors’ temporal and spatial dependencies. |
230 | PEQ: An Explainable, Specification-based, Aspect-oriented Product Comparator for E-commerce | Abhishek Sikchi, Pawan Goyal, Samik Datta | In this paper, we extend the existing model by incorporating the feature specifications of the products, which are easily available, and learn the importance to be associated with each of them. |
231 | Forecasting Geo-sensor Data with Participatory Sensing Based on Dropout Neural Network | Jyun-Yu Jiang, Cheng-Te Li | In this paper, we propose a novel concept to forecast geosensor data with participatory sensing. |
232 | Iterative Search using Query Aspects | Manmeet Singh, W. Bruce Croft | We propose a new iterative feedback method that combines PRF with aspect generation to improve feedback effectiveness. |
233 | A Preference Approach to Reputation in Sponsored Search | Aritra Ghosh, Dinesh Gaurav, Rahul Agrawal | In this study, we motivate and propose a pairwise preference relation model to study the advertiser reputation problem. |
234 | Clustering Speed in Multi-lane Traffic Networks | Bing Zhang, Goce Trajcevski, Feiying Liu | We address the problem of efficient spatio-temporal clustering of speed data in road segments with multiple lanes. |
235 | Learning to Rank Non-Factoid Answers: Comment Selection in Web Forums | Kateryna Tymoshenko, Daniele Bonadiman, Alessandro Moschitti | In this paper, we design state-of-the-art models for non-factoid QA, including QA carried out on noisy data. |
236 | A Theoretical Framework on the Ideal Number of Classifiers for Online Ensembles in Data Streams | Hamed R. Bonab, Fazli Can | Our theoretical framework shows that using the same number of independent component classifiers as class labels gives the highest accuracy. |
237 | User Modeling on Twitter with WordNet Synsets and DBpedia Concepts for Personalized Recommendations | Guangyuan Piao, John G. Breslin | In this short paper, instead of using concepts alone, we propose using synsets from WordNet and concepts from DBpedia for representing user interests. |
238 | Improving Entity Ranking for Keyword Queries | John Foley, Brendan O’Connor, James Allan | We propose a set of features that do not require index-time entity linking, and demonstrate competitive performance on the new dataset. |
239 | The Healing Power of Poison: Helpful Non-relevant Documents in Feedback | Mostafa Dehghani, Samira Abnar, Jaap Kamps | In this paper, we study the positive counterpart of this by investigating the helpfulness of nonrelevant documents in feedback. |
240 | Probabilistic Approaches to Controversy Detection | Myungha Jang, John Foley, Shiri Dori-Hacohen, James Allan | In this paper, we propose a probabilistic framework to detect controversy on the web, and investigate two models. |
241 | Evaluating Document Retrieval Methods for Resource Selection in Clustered P2P IR | Rami Suleiman Alkhawaldeh, Joemon M. Jose, Deepak P | We observe that semantic heterogeneity is mitigated in the clustered 2-tier P2P IR architecture resource selection layer by way of usage of clustering, and posit that this necessitates a re-look at the applicability of document retrieval methods for resource selection within such a framework. |
242 | Detecting and Ranking Conceptual Links between Texts Using a Knowledge Base | Martin Tutek, Goran Glavas, Jan Šnajder, Natasa Milić-Frayling, Bojana Dalbelo Basic | Recent research has explored the use of Knowledge Bases (KBs) to represent documents as subgraphs of a KB concept graph and define metrics to characterize semantic relatedness of documents in terms of properties of the document concept graphs. |
243 | DePP: A System for Detecting Pages to Protect in Wikipedia | Kelsey Suyehira, Francesca Spezzano | In this paper we consider for the first time the problem of deciding whether a page should be protected or not in a collaborative environment such as Wikipedia. We formulate the problem as a binary classification task and propose a novel set of features to decide which pages to protect based on (i) users' page revision behavior and (ii) page categories. |
244 | Hashtag Recommendation Based on Topic Enhanced Embedding, Tweet Entity Data and Learning to Rank | Quanzhi Li, Sameena Shah, Armineh Nourbakhsh, Xiaomo Liu, Rui Fang | In this paper, we present a new approach for recommending hashtags for tweets. |
245 | An Experimental Comparison of Iterative MapReduce Frameworks | Haejoon Lee, Minseo Kang, Sun-Bum Youn, Jae-Gil Lee, YongChul Kwon | In this paper, we experimentally compare Hadoop and the aforementioned systems using various workloads and metrics. |
246 | A Density-Based Approach to the Retrieval of Top-K Spatial Textual Clusters | Dingming Wu, Christian S. Jensen | To compute this query, the paper proposes a basic and an advanced algorithm that rely on on-line density-based clustering. |
247 | Top-N Recommendation on Graphs | Zhao Kang, Chong Peng, Ming Yang, Qiang Cheng | To alleviate this problem, this paper proposes a simple recommendation algorithm that fully exploits the similarity information among users and items and intrinsic structural information of the user-item matrix. |
248 | KB-Enabled Query Recommendation for Long-Tail Queries | Zhipeng Huang, Bogdan Cautis, Reynold Cheng, Yudian Zheng | To handle such queries, we study a new solution, which makes use of a knowledge base (or KB), such as YAGO and Freebase. |
249 | RAP: Scalable RPCA for Low-rank Matrix Recovery | Chong Peng, Zhao Kang, Ming Yang, Qiang Cheng | In this paper, we propose a novel RPCA approach that eliminates the need for SVD of large matrices. |
250 | Query Answering Efficiency in Expert Networks Under Decentralized Search | Liang Ma, Mudhakar Srivatsa, Derya Cansever, Xifeng Yan, Sue Kase, Michelle Vanni | In this regard, we investigate decentralized search by quantifying its performance under a variety of network settings. |
251 | A Study of Realtime Summarization Metrics | Matthew Ekstrand-Abueg, Richard McCreadie, Virgil Pavlu, Fernando Diaz | In this paper, we present a study of TREC-TS track evaluation methodology, with the aim of documenting its design, analyzing its effectiveness, as well as identifying improvements and best practices for the evaluation of temporal summarization systems. |
252 | Framing Mobile Information Needs: An Investigation of Hierarchical Query Sequence Structure | Shuguang Han, Xing Yi, Zhen Yue, Zhigeng Geng, Alyssa Glass | We identify several differences between mobile and desktop search patterns in terms of goal/mission length, duration and interleaving. |
253 | A Context-aware Collaborative Filtering Approach for Urban Black Holes Detection | Li Jin, Zhuonan Feng, Ling Feng | In this paper, we model the urban black holes in each region of New York City (NYC) at different time intervals with a 3-dimensional tensor by fusing cross-domain data sources. |
254 | Combining Powers of Two Predictors in Optimizing Real-Time Bidding Strategy under Constrained Budget | Chi-Chun Lin, Kun-Ta Chuang, Wush Chi-Hsuan Wu, Ming-Syan Chen | In this paper, a method combining the powers of two prediction models is proposed, and experiments on real-world RTB datasets that benchmark the new algorithm against a classic CTR-only method are presented. |
255 | Attractiveness versus Competition: Towards an Unified Model for User Visitation | Thanh-Nam Doan, Ee-Peng Lim | Attractiveness versus Competition: Towards an Unified Model for User Visitation |
256 | OptMark: A Toolkit for Benchmarking Query Optimizers | Zhan Li, Olga Papaemmanouil, Mitch Cherniack | To address this challenge, this paper introduces OptMark, a toolkit for evaluating the quality of a query optimizer. |
257 | Multi-Dueling Bandits and Their Application to Online Ranker Evaluation | Brian Brost, Yevgeny Seldin, Ingemar J. Cox, Christina Lioma | We evaluate our algorithm on standard large-scale online ranker evaluation datasets. |
258 | Robust Contextual Outlier Detection: Where Context Meets Sparsity | Jiongqian Liang, Srinivasan Parthasarathy | To address these problems, we propose a novel and robust alternative to the state of the art, called RObust Contextual Outlier Detection (ROCOD). |
259 | Credibility Assessment of Textual Claims on the Web | Kashyap Popat, Subhabrata Mukherjee, Jannik Strötgen, Gerhard Weikum | For inference, our method leverages the joint interaction between the language of articles about the claim and the reliability of the underlying web sources. |
260 | Collective Traffic Prediction with Partially Observed Traffic History using Location-Based Social Media | Xinyue Liu, Xiangnan Kong, Yanhua Li | In this paper, we propose to use location-based social media, which captures a much larger area of the road systems than deployed sensors, to predict the traffic conditions. |
261 | Recommendations For Streaming Data | Karthik Subbian, Charu Aggarwal, Kshiteesh Hegde | In this paper, we present a probabilistic neighborhood-based algorithm for performing recommendations in real-time. |
262 | PRO: Preference-Aware Recurring Query Optimization | Zhongfang Zhuang, Chuan Lei, Elke Rundensteiner, Mohamed Eltabakh | In this work, we propose PRO, a preference-aware recurring query processing system that optimizes recurring query executions complying with user preferences. |
263 | Discovering Temporal Purchase Patterns with Different Responses to Promotions | Ling Luo, Bin Li, Irena Koprinska, Shlomo Berkovsky, Fang Chen | Given a transaction data set collected by an Australian national supermarket chain, in this paper we conduct a case study aimed at discovering customers’ long-term purchase patterns, which may be induced by preference changes, as well as short-term purchase patterns, which may be induced by promotions. |
264 | ZEST: A Hybrid Model on Predicting Passenger Demand for Chauffeured Car Service | Hua Wei, Yuandong Wang, Tianyu Wo, Yaxiao Liu, Jie Xu | In this paper, we propose a Zero-Grid Ensemble Spatio Temporal model (ZEST) to predict passenger demand with four predictors: a temporal predictor and a spatial predictor that model the influences of local and spatial factors separately, an ensemble predictor that combines the results of the former two predictors, and a Zero-Grid predictor that specifically predicts zero-demand areas, since any cruising within these areas wastes the driver's energy and time. |
265 | A Filtering-based Clustering Algorithm for Improving Spatio-temporal Kriging Interpolation Accuracy | Qiao Kang, Wei-keng Liao, Ankit Agrawal, Alok Choudhary | To address this problem, this paper presents a new filtering-based clustering algorithm that partitions data into clusters such that the interpolation error within each cluster is significantly reduced, which in turn improves the overall accuracy. |
266 | Reuse-based Optimization for Pig Latin | Jesús Camacho-Rodríguez, Dario Colazzo, Melanie Herschel, Ioana Manolescu, Soudip Roy Chowdhury | We present a novel optimization approach aiming at identifying and reusing repeated subexpressions in Pig Latin scripts. |
267 | Discriminative View Learning for Single View Co-Training | Joseph St.Amand, Jun Huan | In this paper, we investigate techniques to apply co-training to single-view data sets. |
268 | Learning Points and Routes to Recommend Trajectories | Dawei Chen, Cheng Soon Ong, Lexing Xie | We propose a new F1 score on pairs of POIs that captures the order of visits. |
269 | Towards Representation Independent Similarity Search Over Graph Databases | Yodsawalai Chodpathumwan, Amirhossein Aleyasen, Arash Termehchy, Yizhou Sun | We propose an algorithm called R-PathSim, which is provably robust under relationship reorganizing. |
270 | Why Did You Cover That Song?: Modeling N-th Order Derivative Creation with Content Popularity | Kosetsu Tsukuda, Masahiro Hamasaki, Masataka Goto | In this paper, we propose a model for inferring latent factors from sequences of derivative work posting events. |
271 | Anomalies in the Peer-review System: A Case Study of the Journal of High Energy Physics | Sandipan Sikdar, Matteo Marsili, Niloy Ganguly, Animesh Mukherjee | Since editors and reviewers are the most important pillars of a reviewing system, in this work we attempt to address a related question: given the editing/reviewing history of the editors or reviewers, "can we identify the under-performing ones?" |
272 | Multi-source Hierarchical Prediction Consolidation | Chenwei Zhang, Sihong Xie, Yaliang Li, Jing Gao, Wei Fan, Philip S. Yu | We propose a novel multi-source hierarchical prediction consolidation method that effectively exploits the complicated hierarchical label structures to resolve the noisy and conflicting information that inherently originates from multiple imperfect sources. |
273 | Probabilistic Knowledge Graph Construction: Compositional and Incremental Approaches | Dongwoo Kim, Lexing Xie, Cheng Soon Ong | We propose a new probabilistic knowledge graph factorisation method that benefits from the path structure of existing knowledge (e.g. syllogism) and enables a common modelling approach to be used for both incremental population and knowledge completion tasks. |
274 | Explaining Sentiment Spikes in Twitter | Anastasia Giachanou, Ida Mele, Fabio Crestani | In this paper, we focus on the problem of tracking sentiment towards different entities, detecting sentiment spikes and on the problem of extracting and ranking the causes of a sentiment spike. |
275 | Qualitative Cleaning of Uncertain Data | Henning Koehler, Sebastian Link | We propose a new view on data cleaning: Not data itself but the degrees of uncertainty attributed to data are dirty. |
276 | APAM: Adaptive Eager-Lazy Hybrid Evaluation of Event Patterns for Low Latency | Ilyeop Yi, Jae-Gil Lee, Kyu-Young Whang | In this paper, we propose a hybrid eager-lazy evaluation method that combines the advantages of both methods. |
277 | OrientStream: A Framework for Dynamic Resource Allocation in Distributed Data Stream Management Systems | Chunkai Wang, Xiaofeng Meng, Qi Guo, Zujian Weng, Chen Yang | This article presents OrientStream, a framework for dynamic resource allocation in DDSMS using incremental machine learning techniques. |
278 | Tag2Word: Using Tags to Generate Words for Content Based Tag Recommendation | Yong Wu, Yuan Yao, Feng Xu, Hanghang Tong, Jian Lu | In this paper, we put our focus on the content based tag recommendation due to its wider applicability. |
279 | Digesting Multilingual Reader Comments via Latent Discussion Topics with Commonality and Specificity | Bei Shi, Wai Lam, Lidong Bing, Yinqing Xu | To tackle this task of discovering discussion topics that exhibit commonality or specificity from news reader comments written in different languages, we propose a new model called TDCS based on graphical models, which can cope with the language gap and detect language-common and language-specific latent discussion topics simultaneously. |
280 | Digesting News Reader Comments via Fine-Grained Associations with Event Facets and News Contents | Bei Shi, Wai Lam | We propose a framework that can digest reader comments automatically via fine-grained associations with event facets and news. |
281 | Efficient Algorithms for the Two Locus Problem in Genome-Wide Association Study: Algorithms for the Two Locus Problem | Sanguthevar Rajasekaran, Subrata Saha | In this paper we present an algorithm for solving the 2-locus problem that is up to two orders of magnitude faster than the previous best known algorithms. |
282 | FolkTrails: Interpreting Navigation Behavior in a Social Tagging System | Thomas Niebler, Martin Becker, Daniel Zoller, Stephan Doerfel, Andreas Hotho | In this work, we investigate navigation trails in the popular scholarly social tagging system BibSonomy from six years of log data. |
283 | Memory-Optimized Distributed Graph Processing through Novel Compression Techniques | Panagiotis Liakos, Katia Papakonstantinopoulou, Alex Delis | In this paper, we propose three space-efficient adjacency list representations that can be applied to any distributed graph processing system. |
284 | Tracking the Evolution of Congestion in Dynamic Urban Road Networks | Tarique Anwar, Chengfei Liu, Hai L. Vu, Md. Saiful Islam | In this paper, we propose a two-layer method to incrementally update the differently congested partitions from those at the previous time point in an efficient manner, and thus track their evolution. |
285 | The Rich and the Poor: A Markov Decision Process Approach to Optimizing Taxi Driver Revenue Efficiency | Huigui Rong, Xun Zhou, Chang Yang, Zubair Shafiq, Alex Liu | To address these issues, this paper investigates how to increase the revenue efficiency (revenue per unit time) of taxi drivers, and models the passenger seeking process as a Markov Decision Process (MDP). |
286 | Ensemble of Anchor Adapters for Transfer Learning | Fuzhen Zhuang, Ping Luo, Sinno Jialin Pan, Hui Xiong, Qing He | Aiming at more robust transfer learning models, we propose an ENsemble framework of anCHOR adapters (ENCHOR for short), in which an anchor adapter adapts the features of instances based on their similarities to a specific anchor (i.e., a selected instance). |
287 | Incremental Mining of High Utility Sequential Patterns in Incremental Databases | Jun-Zhe Wang, Jiun-Long Huang | In view of this, we propose the IncUSP-Miner algorithm to mine HUSPs incrementally. |
288 | Understanding Stability of Noisy Networks through Centrality Measures and Local Connections | Vladimir Ufimtsev, Soumya Sarkar, Animesh Mukherjee, Sanjukta Bhowmick | In this paper, we study the effect of noise in changing ranks of the high centrality vertices. |
289 | Online Adaptive Topic Focused Tweet Acquisition | Mehdi Sadri, Sharad Mehrotra, Yaming Yu | In this paper, we address the tweet acquisition challenge to enhance monitoring of tweets based on the client/application needs in an online adaptive manner such that the quality and quantity of the results improves over time. |
290 | Optimizing Nugget Annotations with Active Learning | Gaurav Baruah, Haotian Zhang, Rakesh Guttikonda, Jimmy Lin, Mark D. Smucker, Olga Vechtomova | In this paper, we present two active learning techniques that prioritize the sequence in which candidate nugget/sentence pairs are presented to an assessor, based on the likelihood that the sentence contains a nugget. |
291 | Uncovering Fake Likers in Online Social Networks | Prudhvi Ratna Badri Satya, Kyumin Lee, Dongwon Lee, Thanh Tran, Jason (Jiasheng) Zhang | Toward this goal, in this paper, we investigate the problem of detecting the so-called "fake likers" who frequently make fake Likes for illegitimate reasons. |
292 | Where to Place Your Next Restaurant?: Optimal Restaurant Placement via Leveraging User-Generated Reviews | Feng Wang, Li Chen, Weike Pan | In this paper, we particularly take advantage of user-generated reviews to construct predictive features for assessing the attractiveness of candidate locations to expand a restaurant. |
293 | Leveraging the Implicit Structure within Social Media for Emergent Rumor Detection | Justin Sampson, Fred Morstatter, Liang Wu, Huan Liu | In this work, we propose a method for classifying conversations within their formative stages as well as improving accuracy within mature conversations through the discovery of implicit linkages between conversation fragments. |
294 | Automatical Storyline Generation with Help from Twitter | Ting Hua, Xuchao Zhang, Wei Wang, Chang-Tien Lu, Naren Ramakrishnan | This paper introduces a Bayesian model to generate storylines from massive documents and infer the corresponding hidden relations and topics. |
295 | A Comparative Study of Query-biased and Non-redundant Snippets for Structured Search on Mobile Devices | Nikita V. Spirin, Alexander S. Kotov, Karrie G. Karahalios, Vassil Mladenov, Pavel A. Izhutov | To investigate what kind of snippets are better suited for structured search on mobile devices, we built an experimental mobile search application and conducted a task-oriented interactive user study with 36 participants. |
296 | Content-Agnostic Malware Detection in Heterogeneous Malicious Distribution Graph | Ibrahim Alabdulmohsin, YuFei Han, Yun Shen, XiangLiang Zhang | We propose a novel Bayesian label propagation model to unify the multi-source information, including content-agnostic features of different node types and topological information of the heterogeneous network. |
297 | Improving Advertisement Recommendation by Enriching User Browser Cookie Attributes | Liang Wang, Kuang-chih Lee, Quan Lu | In this paper, we try to tackle this problem by using an ‘assistant identifier’ to find the linkage between different bcookies. |
298 | Balanced Supervised Non-Negative Matrix Factorization for Childhood Leukaemia Patients | Ali Braytee, Daniel R. Catchpoole, Paul J. Kennedy, Wei Liu | This paper proposes a method with twofold objectives: it implements a balanced supervised non-negative matrix factorization (BSNMF) to handle the class imbalance problem in supervised non-negative matrix factorization techniques. |
299 | SoLSCSum: A Linked Sentence-Comment Dataset for Social Context Summarization | Minh-Tien Nguyen, Chien-Xuan Tran, Duc-Vu Tran, Minh-Le Nguyen | This paper presents a dataset named SoLSCSum for social context summarization. |
300 | Distributed Deep Learning for Question Answering | Minwei Feng, Bing Xiang, Bowen Zhou | Comparison studies of SGD, MSGD, ADADELTA, ADAGRAD, ADAM/ADAMAX, RMSPROP, DOWNPOUR and EASGD/EAMSGD algorithms have been presented. |
301 | Bus Routes Design and Optimization via Taxi Data Analytics | Seong Ping Chuah, Huayu Wu, Yu Lu, Liang Yu, Stephane Bressan | In this paper, we describe a proof-of-concept effort to discover this weakness in the public transportation system, and ways to improve it, by mining a taxi ride dataset. |
302 | Routing an Autonomous Taxi with Reinforcement Learning | Miyoung Han, Pierre Senellart, Stéphane Bressan, Huayu Wu | In this paper we demonstrate that a reinforcement learning algorithm of the Q-learning family, based on a customized exploration and exploitation strategy, is able to learn optimal actions for routing autonomous taxis in a real scenario at the scale of the city of Singapore, with pick-up and drop-off events for a fleet of one thousand taxis. |
303 | XKnowSearch!: Exploiting Knowledge Bases for Entity-based Cross-lingual Information Retrieval | Lei Zhang, Michael Färber, Achim Rettinger | In this paper, we present XKnowSearch! |
304 | TweetSift: Tweet Topic Classification Based on Entity Knowledge Base and Topic Enhanced Word Embedding | Quanzhi Li, Sameena Shah, Xiaomo Liu, Armineh Nourbakhsh, Rui Fang | In this demonstration, we present TweetSift, an efficient and effective real time tweet topic classifier. |
305 | PARC: Privacy-Aware Data Cleaning | Dejun Huang, Dhruv Gairola, Yu Huang, Zheng Zheng, Fei Chiang | In this demonstration, we present PARC, a Privacy-AwaRe data Cleaning system that corrects data inconsistencies w.r.t. a set of FDs, and limits the disclosure of sensitive values during the cleaning process. |
306 | Ease the Process of Machine Learning with Dataflow | Tianyou Guo, Jun Xu, Xiaohui Yan, Jianpeng Hou, Ping Li, Zhaohui Li, Jiafeng Guo, Xueqi Cheng | In this demo we present a general-purpose dataflow-based system for easing the process of applying machine learning algorithms to real world tasks. |
307 | FIN10K: A Web-based Information System for Financial Report Analysis and Visualization | Yu-Wen Liu, Liang-Chih Liu, Chuan-Ju Wang, Ming-Feng Tsai | In this demonstration, we present FIN10K, a web-based information system that facilitates the analysis of textual information in financial reports. |
308 | FeatureMiner: A Tool for Interactive Feature Selection | Kewei Cheng, Jundong Li, Huan Liu | In this demonstration, we show (1) How to conduct data preprocessing after loading a dataset; (2) How to apply feature selection algorithms; (3) How to choose a suitable algorithm by visualized performance evaluation. |
309 | Deola: A System for Linking Author Entities in Web Document with DBLP | Yinan Liu, Wei Shen, Xiaojie Yuan | In this paper, we present Deola, an Online system for Author Entity Linking with DBLP. |
310 | ConHub: A Metadata Management System for Docker Containers | Chris Xing Tian, Aditya Pan, Yong Chiang Tay | ConHub: A Metadata Management System for Docker Containers |
311 | BIGtensor: Mining Billion-Scale Tensor Made Easy | Namyong Park, Byungsoo Jeon, Jungwoo Lee, U Kang | In this paper, we propose BIGtensor, a large-scale tensor mining library that tackles both of the above problems. |
312 | eGraphSearch: Effective Keyword Search in Graphs | Mehdi Kargar, Lukasz Golab, Jaroslaw Szlichta | We demonstrate eGraphSearch, a new system for effective keyword search in graph databases. |
313 | EnerQuery: Energy-Aware Query Processing | Amine Roukh, Ladjel Bellatreche, Carlos Ordonez | In this paper, we propose EnerQuery, a tool built on top of a traditional DBMS to capitalize on the efforts invested in building energy-aware query optimizers, which have the lion’s share in energy consumption. |
314 | TGraph: A Temporal Graph Data Management System | Haixing Huang, Jinghe Song, Xuelian Lin, Shuai Ma, Jinpeng Huai | To solve these issues, we design and develop TGraph, a temporal graph data management system that guarantees ACID transactions and supports fast temporal graph queries. |
315 | Analyzing Data Relevance and Access Patterns of Live Production Database Systems | Martin Boissier, Carsten Alexander Meyer, Timo Djürken, Jan Lindemann, Kathrin Mao, Pascal Reinhardt, Tim Specht, Tim Zimmermann, Matthias Uflacker | In this paper, we present a tool set to analyze and compare synthetic and real-world database workloads, their characteristics, and access patterns. |
316 | Thymeflow, A Personal Knowledge Base with Spatio-temporal Data | David Montoya, Thomas Pellissier Tanon, Serge Abiteboul, Fabian M. Suchanek | We demonstrate an open-source system for integrating a user’s data from different sources into a single Knowledge Base. |
317 | Inferring Traffic Incident Start Time with Loop Sensor Data | Mingxuan Yue, Liyue Fan, Cyrus Shahabi | We present INFIT, a system that infers the incident start time utilizing traffic data collected by loop sensors. |
318 | TEAMOPT: Interactive Team Optimization in Big Networks | Liangyue Li, Hanghang Tong, Nan Cao, Kate Ehrlich, Yu-Ru Lin, Norbou Buchler | An interesting research question we address in this work is how to maintain and optimize the team performance should certain changes happen to the team. |
319 | GStreamMiner: A GPU-accelerated Data Stream Mining Framework | Chandima HewaNadungodage, Yuni Xia, John Jaehwan Lee | In this paper, we present GStreamMiner, a GPU-accelerated data stream mining framework and demonstrate its application using outlier detection over continuous streaming data as a case study. |
320 | QART: A Tool for Quality Assurance in Real-Time in Contact Centers | Ragunathan Mariappan, Balaji Peddamuthu, Preethi R Raajaratnam, Sandipan Dandapat, Neeta Pande, Shourya Roy | In this paper, we describe an automatic real-time quality assurance system QART (pronounced cart) for contact center chats. |
321 | A Fatigue Strength Predictor for Steels Using Ensemble Data Mining: Steel Fatigue Strength Predictor | Ankit Agrawal, Alok Choudhary | We have developed advanced data-driven ensemble predictive models for this purpose with an extremely high cross-validated accuracy of >98%, and have deployed these models in a user-friendly online web-tool, which can make very fast predictions of fatigue strength for a given steel represented by its composition and processing information. |
322 | CyberSafety 2016: The First International Workshop on Computational Methods in CyberSafety | Shivakant Mishra, Qin Lv, Richard Han, Jeremy Blackburn | The main goal of this inaugural workshop on cybersafety is to bring together the researchers and practitioners from academia, industry, government and research labs working in the area of cybersafety to discuss the unique challenges in addressing various cybersafety issues and to share experiences, solutions, tools, and techniques. |
323 | The Fourth International Workshop on Social Web for Disaster Management (SWDM 2016) | Carlos Castillo, Fernando Diaz, Yu-Ru Lin, Jie Yin | As massive amounts of messages posted by users are transformed into semi-structured records via information extraction and natural language processing techniques, there is a growing need for developing advanced techniques to aggregate this large-scale data to gain an understanding of the “big picture” of an emergency, and to detect and predict how a disaster could develop. |
324 | BigNet 2016: First Workshop on Big Network Analytics | Jie Tang, Keke Cai, Zhong Su, Hanghang Tong, Michalis Vazirgiannis, Yang Yang | The main objective of the workshop is to provide a forum for presenting the most recent advances in mining big networks to unearth rich knowledge. |
325 | DDTA 2016: The Workshop on Data-Driven Talent Acquisition | Yi Fang, Maarten de Rijke, Huangming Xie | The aim of this workshop is to provide a forum for industry and academia to discuss the recent progress in talent search and management, and how the use of big data and data-driven decision making can advance talent acquisition and human resource management. |
326 | ACM DAVA’16: 2nd International Workshop on DAta mining meets Visual Analytics at Big Data Era | Lei Shi, Hanghang Tong, Chaoli Wang, Leman Akoglu | Three keynote speakers from both data mining and visualization give invited talks in this workshop (40 minutes each). |
327 | DTMBIO 2016: The Tenth International Workshop on Data and Text Mining in Biomedical Informatics | Sangwoo Kim, Jake Y. Chen, Vincenzo Cutello, Doheon Lee | DTMBIO 2016: The Tenth International Workshop on Data and Text Mining in Biomedical Informatics |