Paper Digest: CIKM 2015 Highlights

November 1, 2015 | June 26, 2020 | admin

The ACM Conference on Information and Knowledge Management (CIKM) is an annual computer science research conference dedicated to information management and knowledge management.

To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.

If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to get updates on new conference digests.

Paper Digest Team
team@paperdigest.org

TABLE 1: CIKM 2015 Papers

No.	Title	Authors	Highlight
1	Slow Search: Improving Information Retrieval Using Human Assistance	Jaime Teevan	We propose the concept of "slow search," where search engines use additional time to provide a higher quality search experience than is possible given conventional time constraints.
2	External Data Access And Indexing In AsterixDB	Abdullah A. Alamoudi, Raman Grover, Michael J. Carey, Vinayak Borkar	In this paper, we describe techniques to achieve the qualities offered by DBMSs when accessing external data.
3	Dynamic Resource Management In a Massively Parallel Stream Processing Engine	Kasper Grud Skat Madsen, Yongluan Zhou	In this paper, we propose an approach to integrate dynamic resource management with passive fault-tolerance mechanisms in an MPSPE so that we can harvest the checkpoints prepared for failure recovery to enhance the efficiency of dynamic load migrations.
4	A Parallel GPU-Based Approach to Clustering Very Fast Data Streams	Pengtao Huang, Xiu Li, Bo Yuan	In this paper, we present a parallel algorithm called PaStream, which is based on an advanced Graphics Processing Unit (GPU) and follows the online-offline framework of CluStream.
5	Scalable Clustering Algorithm via a Triangle Folding Processing for Complex Networks	Ying Kang, Xiaoyan Gu, Weiping Wang, Dan Meng	In this paper, we propose a scalable clustering algorithm via a triangle folding processing for complex networks (SCAFT).
6	Understanding the Impact of the Role Factor in Collaborative Information Retrieval	Lynda Tamine, Laure Soulier	In this paper, we investigate whether and how different factors, such as users’ behavior, search strategies, and effectiveness, are related to role assignment within a collaborative exploratory search.
7	Experiments with a Venue-Centric Model for Personalised and Time-Aware Venue Suggestion	Romain Deveaud, M-Dyaa Albakour, Craig Macdonald, Iadh Ounis	In contrast, in this paper, we introduce a venue-centric yet personalised probabilistic approach that suggests personalised and popular venues for users to visit in the near future.
8	Search Result Diversification Based on Hierarchical Intents	Sha Hu, Zhicheng Dou, Xiaojie Wang, Tetsuya Sakai, Ji-Rong Wen	In this paper, we introduce a new hierarchical structure to represent user intents and propose two general hierarchical diversification models to leverage hierarchical intents.
9	Category-Driven Approach for Local Related Business Recommendations	Yonathan Perez, Michael Schueppert, Matthew Lawlor, Shaunak Kishore	We address the problem of constructing a useful and diverse list of such recommendations that would include an optimal combination of substitutes and complements.
10	A Soft Computing Approach for Learning to Aggregate Rankings	Javier Alvaro Vargas Muñoz, Ricardo da Silva Torres, Marcos André Gonçalves	This paper presents an approach to combine rank aggregation techniques using a soft computing technique — Genetic Programming — in order to improve the results in Information Retrieval tasks.
11	Approximate String Matching by End-Users using Active Learning	Lutz Büch, Artur Andrzejak	To address this problem, we propose an Active Learning algorithm which selects a best performing similarity measure in a given set while optimizing a decision threshold.
12	A Unified Posterior Regularized Topic Model with Maximum Margin for Learning-to-Rank	Shoaib Jameel, Wai Lam, Steven Schockaert, Lidong Bing	In contrast, we propose a learning-to-rank framework which integrates the supervised learning of a maximum margin classifier with the discovery of a suitable probabilistic topic model.
13	Collaborating between Local and Global Learning for Distributed Online Multiple Tasks	Xin Jin, Ping Luo, Fuzhen Zhuang, Jia He, Qing He	Thus, in this paper a collaborative learning scheme is proposed for this problem.
14	Lifespan-based Partitioning of Index Structures for Time-travel Text Search	Animesh Nandi, Suriya Subramanian, Sriram Lakshminarasimhan, Prasad M. Deshpande, Sriram Raghavan	The problem we tackle is how to efficiently handle different query classes using the same index layout.
15	Contextual Text Understanding in Distributional Semantic Space	Jianpeng Cheng, Zhongyuan Wang, Ji-Rong Wen, Jun Yan, Zheng Chen	In this work, we propose a new framework for generating context-aware text representations without diving into the sense space.
16	External Knowledge and Query Strategies in Active Learning: a Study in Clinical Information Extraction	Mahnoosh Kholghi, Laurianne Sitbon, Guido Zuccon, Anthony Nguyen	This paper presents a new active learning query strategy for information extraction, called Domain Knowledge Informativeness (DKI).
17	Ranking Deep Web Text Collections for Scalable Information Extraction	Pablo Barrio, Luis Gravano, Chris Develder	In this paper, we focus on an especially valuable family of text sources, the so-called deep web collections, whose (remote) contents are only accessible via querying.
18	Forming Online Support Groups for Internet and Behavior Related Addictions	Chih-Ya Shen, Hong-Han Shuai, De-Nian Yang, Yi-Feng Lan, Wang-Chien Lee, Philip S. Yu, Ming-Syan Chen	We prove that MSSG is NP-Hard and inapproximable within any ratio, and design a 3-approximation algorithm with a guaranteed error bound.
19	Concept-Based Relevance Models for Medical and Semantic Information Retrieval	Chunye Wang, Ramakrishna Akella	Using this framework, we transform documents and queries from term space into concept space, and propose a concept-based relevance model for improved estimation of relevance.
20	PlateClick: Bootstrapping Food Preferences Through an Adaptive Visual Interface	Longqi Yang, Yin Cui, Fan Zhang, John P. Pollak, Serge Belongie, Deborah Estrin	In this paper, we propose PlateClick, a novel system that bootstraps food preference using a simple, visual quiz-based user interface.
21	Data Driven Water Pipe Failure Prediction: A Bayesian Nonparametric Approach	Peng Lin, Bang Zhang, Yi Wang, Zhidong Li, Bin Li, Yang Wang, Fang Chen	In this paper, we propose a Bayesian nonparametric approach, namely the Dirichlet process mixture of hierarchical beta process model, for water pipe failure prediction.
22	Tumblr Blog Recommendation with Boosted Inductive Matrix Completion	Donghyuk Shin, Suleyman Cetintas, Kuang-Chih Lee, Inderjit S. Dhillon	In this paper, we propose a novel boosted inductive matrix completion method (BIMC) for blog recommendation.
23	BiasWatch: A Lightweight System for Discovering and Tracking Topic-Sensitive Opinion Bias in Social Media	Haokai Lu, James Caverlee, Wei Niu	We propose a lightweight system for (i) semi-automatically discovering and tracking bias themes associated with opposing sides of a topic; (ii) identifying strong partisans who drive the online discussion; and (iii) inferring the opinion bias of "regular" participants.
24	Knowlywood: Mining Activity Knowledge From Hollywood Narratives	Niket Tandon, Gerard de Melo, Abir De, Gerhard Weikum	This paper presents a novel approach that taps into movie scripts and other narrative texts.
25	Entity and Aspect Extraction for Organizing News Comments	Radityo Eko Prasojo, Mouna Kacimi, Werner Nutt	In this work, we address the above problem by organizing comments around the entities and the aspects they discuss.
26	HDRF: Stream-Based Partitioning for Power-Law Graphs	Fabio Petroni, Leonardo Querzoni, Khuzaima Daudjee, Shahin Kamali, Giorgio Iacoboni	In this paper, we propose High-Degree (are) Replicated First (HDRF), a novel streaming vertex-cut graph partitioning algorithm that effectively exploits skewed degree distributions by explicitly taking into account vertex degree in the placement decision.
27	Towards Scale-out Capability on Social Graphs	Haichuan Shang, Xiang Zhao, Uday Kiran, Masaru Kitsuregawa	We propose a novel separator-combiner based query processing engine which provides native load-balancing and very low communication overhead, such that increasingly larger graphs can be simply addressed by adding more computing nodes to the cluster. The proposed system achieves remarkable scale-out capability in processing large social graphs with skew degree distributions, while providing many critical features for big data analytics, such as easy-to-use API, fault-tolerance and recovery.
28	Identifying Top-	Mojtaba Rezvani, Weifa Liang, Wenzheng Xu, Chengfei Liu	In this paper, we formulate the problem as the top-k structural hole spanner problem.
29	Scalable Facility Location for Massive Graphs on Pregel-like Systems	Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, Mauro Sozio	We propose a new scalable algorithm for the facility-location problem.
30	Rank by Time or by Relevance?: Revisiting Email Search	David Carmel, Guy Halawi, Liane Lewin-Eytan, Yoelle Maarek, Ariel Raviv	In this paper, we study the current search traffic of Yahoo mail, a major Web commercial mail service, and discuss the limitations of ranking search results by date.
31	On the Cost of Extracting Proximity Features for Term-Dependency Models	Xiaolu Lu, Alistair Moffat, J. Shane Culpepper	In this paper we examine the processes used to compute these statistics.
32	An Optimization Framework for Merging Multiple Result Lists	Chia-Jung Lee, Qingyao Ai, W. Bruce Croft, Daniel Sheldon	In this paper, we study in depth and extend a neural network-based approach, LambdaMerge, for merging results of ranked lists drawn from one (i.e., data fusion) or more (i.e., collection fusion) verticals.
33	Searching and Stopping: An Analysis of Stopping Rules and Strategies	David Maxwell, Leif Azzopardi, Kalervo Järvelin, Heikki Keskustalo	In this paper, we undertake the first large scale study of stopping rules, investigating how they influence overall session performance, and which rules best match actual stopping behaviour.
34	Automated News Suggestions for Populating Wikipedia Entity Pages	Besnik Fetahu, Katja Markert, Avishek Anand	In this work, we therefore look at Wikipedia through the lens of news and propose a novel news-article suggestion task to improve news coverage in Wikipedia, and reduce the lag of newsworthy references.
35	Mining Coordinated Intent Representation for Entity Search and Recommendation	Huizhong Duan, ChengXiang Zhai	We propose a novel generative model to discover coordinated intent representations from the entity search logs.
36	Sentiment Extraction by Leveraging Aspect-Opinion Association Structure	Li Zhao, Minlie Huang, Jiashen Sun, Hengliang Luo, Xiankai Yang, Xiaoyan Zhu	In this paper, we investigate the aspect-opinion association structure, and propose a "first clustering, then extracting" unsupervised model to leverage properties of the structure for sentiment extraction.
37	Leveraging Joint Interactions for Credibility Analysis in News Communities	Subhabrata Mukherjee, Gerhard Weikum	News communities such as digg, reddit, or newstrust offer recommendations, reviews, quality ratings, and further insights on journalistic works.
38	Clustering-based Active Learning on Sensor Type Classification in Buildings	Dezhi Hong, Hongning Wang, Kamin Whitehouse	We propose a clustering-based active learning algorithm to differentiate sensors in buildings by type, e.g., temperature vs. humidity.
39	gSparsify: Graph Motif Based Sparsification for Graph Clustering	Peixiang Zhao	In this paper, we propose gSparsify, a graph sparsification method, to preferentially retain a small subset of edges from a graph which are more likely to be within clusters, while eliminating others with less or no structure correlation to clusters.
40	Incomplete Multi-view Clustering via Subspace Learning	Qiyue Yin, Shu Wu, Liang Wang	In this paper, a novel incomplete multi-view clustering method is therefore developed, which learns unified latent representations and projection matrices for the incomplete multi-view data.
41	Robust Subspace Clustering via Tighter Rank Approximation	Zhao Kang, Chong Peng, Qiang Cheng	In this paper, an arctangent function is used as a tighter approximation to the rank function.
42	Interactive User Group Analysis	Behrooz Omidvar-Tehrani, Sihem Amer-Yahia, Alexandre Termier	Since user data is often sparse and noisy, we propose to produce labeled groups that describe users with common properties and develop IUGA, an interactive framework based on group discovery primitives to explore the user space.
43	Viewability Prediction for Online Display Ads	Chong Wang, Achir Kalra, Cristian Borcea, Yi Chen	We analyze a real-life dataset from a large publisher, identify a number of features that impact the scroll depth for a given user and a page, and propose a probabilistic latent class model that predicts the viewability of any given scroll depth for a user-page pair.
44	10 Bits of Surprise: Detecting Malicious Users with Minimum Information	Reza Zafarani, Huan Liu	In this study, we develop a methodology that identifies malicious users with limited information.
45	MAPer: A Multi-scale Adaptive Personalized Model for Temporal Human Behavior Prediction	Sarah Masud Preum, John A. Stankovic, Yanjun Qi	The primary objective of this research is to develop a simple and interpretable predictive framework to perform temporal modeling of individual user’s behavior traits based on each person’s past observed traits/behavior.
46	Classification with Active Learning and Meta-Paths in Heterogeneous Information Networks	Chang Wan, Xiang Li, Ben Kao, Xiao Yu, Quanquan Gu, David Cheung, Jiawei Han	We propose class-level meta-paths and study how they can be used to (1) build more accurate classifiers and (2) improve active learning in identifying objects for which training labels should be obtained.
47	Semantic Path based Personalized Recommendation on Weighted Heterogeneous Information Networks	Chuan Shi, Zhiqiang Zhang, Ping Luo, Philip S. Yu, Yading Yue, Bin Wu	In this paper, we are the first to propose the weighted HIN and weighted meta path concepts to subtly depict the path semantics through distinguishing different link attribute values.
48	A Graph-based Recommendation across Heterogeneous Domains	Deqing Yang, Jingrui He, Huazheng Qin, Yanghua Xiao, Wei Wang	To this end, in this paper, we propose a graph-based approach for recommendation across heterogeneous domains.
49	Query Relaxation across Heterogeneous Data Sources	Verena Kantere, George Orfanoudakis, Anastasios Kementsietsidis, Timos Sellis	In this paper, we propose a technique to compute query relaxations of an input query that can be rewritten and evaluated in an environment of collaborating autonomous and heterogeneous data sources.
50	Approximated Summarization of Data Provenance	Eleanor Ainy, Pierre Bourhis, Susan B. Davidson, Daniel Deutch, Tova Milo	Based on this notion, we present a novel provenance summarization algorithm which, based on the semantics of the underlying data and the intended use of provenance, outputs a summary of the input provenance.
51	An Integrated Bayesian Approach for Effective Multi-Truth Discovery	Xianzhi Wang, Quan Z. Sheng, Xiu Susie Fang, Lina Yao, Xiaofei Xu, Xue Li	Based on this insight, we propose an integrated Bayesian approach to the multi-truth-finding problem, by taking these features into account.
52	Approximate Truth Discovery via Problem Scale Reduction	Xianzhi Wang, Quan Z. Sheng, Xiu Susie Fang, Xue Li, Xiaofei Xu, Lina Yao	To address this issue, we propose an approximate truth discovery approach, which divides sources and values into groups according to a user-specified approximation criterion.
53	Organic or Organized?: Exploring URL Sharing Behavior	Cheng Cao, James Caverlee, Kyumin Lee, Hancheng Ge, Jinwook Chung	In this paper, we investigate the individual-based and group-based user behavior of URL sharing in social media toward uncovering these organic versus organized user groups.
54	Mining Brokers in Dynamic Social Networks	Chonggang Song, Wynne Hsu, Mong Li Lee	In this paper, we formally define the problem of detecting top-$k$ brokers given a social network and show that it is NP-hard.
55	Who Will You "@"?	Yeyun Gong, Qi Zhang, Xuyang Sun, Xuanjing Huang	In this paper, we present our work on building a recommendation system for the mention function in microblogging services.
56	Characterizing and Predicting Voice Query Reformulation	Ahmed Hassan Awadallah, Ranjitha Gurunath Kulkarni, Umut Ozertem, Rosie Jones	In this paper, we study the problem of voice query reformulation.
57	A Hierarchical Recurrent Encoder-Decoder for Generative Context-Aware Query Suggestion	Alessandro Sordoni, Yoshua Bengio, Hossein Vahabi, Christina Lioma, Jakob Grue Simonsen, Jian-Yun Nie	We present a novel hierarchical recurrent encoder-decoder architecture that makes it possible to account for sequences of previous queries of arbitrary lengths.
58	A Network-Aware Approach for Searching As-You-Type in Social Media	Paul Lagrée, Bogdan Cautis, Hossein Vahabi	We present in this paper a novel approach for as-you-type top-k keyword search over social media.
59	Improving Microblog Retrieval with Feedback Entity Model	Feifan Fan, Runwei Qiang, Chao Lv, Jianwu Yang	In this paper, we propose a feedback entity model and integrate it into an adaptive language modeling framework in order to improve the retrieval performance.
60	Extracting Situational Information from Microblogs during Disaster Events: a Classification-Summarization Approach	Koustav Rudra, Subham Ghosh, Niloy Ganguly, Pawan Goyal, Saptarshi Ghosh	The proposed framework takes into consideration the typicalities pertaining to disaster events where (i) the same tweet often contains a mixture of situational and non-situational information, and (ii) certain numerical information, such as number of casualties, varies rapidly with time, and thus achieves superior performance compared to state-of-the-art tweet summarization approaches.
61	Profession-Based Person Search in Microblogs: Using Seed Sets to Find Journalists	Mossaab Bagdouri, Douglas W. Oard	We introduce the problem of searching for professionals in microblogging platforms.
62	Learning Entity Types from Query Logs via Graph-Based Modeling	Jingyuan Zhang, Luo Jie, Altaf Rahman, Sihong Xie, Yi Chang, Philip S. Yu	In this paper, we study the problem of learning entity types from search query logs and address the following challenges: (1) queries are short texts, and information related to entities is usually very sparse; (2) large amounts of irrelevant information exist in search logs, bringing noise in detecting entity types.
63	Collaborative Prediction for Multi-entity Interaction With Hierarchical Representation	Qiang Liu, Shu Wu, Liang Wang	In this work, we propose a Hierarchical Interaction Representation (HIR) model, which models the mutual action among different entities as a joint representation.
64	Learning to Represent Knowledge Graphs with Gaussian Embedding	Shizhu He, Kang Liu, Guoliang Ji, Jun Zhao	Therefore, this paper switches to density-based embedding and proposes KG2E for explicitly modeling the certainty of entities and relations, which learns the representations of KGs in the space of multi-dimensional Gaussian distributions.
65	Associative Classification with Statistically Significant Positive and Negative Rules	Jundong Li, Osmar Zaiane	To solve the above mentioned problems, we propose a novel associative classifier which is built upon both positive and negative classification association rules that show statistically significant dependencies.
66	A Min-Max Optimization Framework For Online Graph Classification	Peng Yang, Peilin Zhao	To solve this issue, we propose a more general min-max optimization framework for online graph node classification.
67	An Inference Approach to Basic Level of Categorization	Zhongyuan Wang, Haixun Wang, Ji-Rong Wen, Yanghua Xiao	In this paper, we introduce a method based on typicality and PMI for BLC.
68	Making Sense of Spatial Trajectories	Xiaofang Zhou, Kai Zheng, Hoyoung Jueng, Jiajie Xu, Shazia Sadiq	In this paper we will present a review of the extensive work in spatiotemporal data management and trajectory mining, and discuss new challenges and new opportunities in the context of new applications, focusing on recent advances in trajectory data management and trajectory mining from their foundations to high performance processing with modern computing infrastructure.
69	ReverseCloak: Protecting Multi-level Location Privacy over Road Networks	Chao Li, Balaji Palanisamy	This paper presents ReverseCloak, a new class of reversible location cloaking mechanisms that effectively support multi-level location privacy, allowing selective de-anonymization of the cloaking region to reduce the granularity of the perturbed location when suitable access credentials are provided.
70	GLUE: a Parameter-Tuning-Free Map Updating System	Hao Wu, Chuanchuan Tu, Weiwei Sun, Baihua Zheng, Hao Su, Wei Wang	Besides, we propose theoretical models behind all the important parameters to enable self-adaptive parameter setting.
71	A Cost-based Method for Location-Aware Publish/Subscribe Services	Minghe Yu, Guoliang Li, Jianhua Feng	To this end, in this paper we propose two novel indexing structures, mbrtrie and PKQ.
72	Probabilistic Forecasts of Bike-Sharing Systems for Journey Planning	Nicolas Gast, Guillaume Massonnet, Daniel Reijsbergen, Mirco Tribastone	Instead we introduce a new metric based on scoring rules.
73	Efficient Computation of Polynomial Explanations of Why-Not Questions	Nicole Bidoit, Melanie Herschel, Aikaterini Tzompanaki	Our first contribution is a general definition of a Why-Not explanation by means of a polynomial.
74	Interruption-Sensitive Empty Result Feedback: Rethinking the Visual Query Feedback Paradigm for Semistructured Data	Sourav S Bhowmick, Curtis Dyreson, Byron Choi, Min-Hwee Ang	In this paper, we rethink the traditional way of providing feedback.
75	Implementing Query Completeness Reasoning	Werner Nutt, Sergey Paramonov, Ognjen Savkovic	With this paper we make two main contributions: (i) we develop techniques to reason about the completeness of a query answer over a partially complete database, taking into account constraints that hold over the database, and (ii) we implement them by an encoding into logic programming paradigms.
76	Towards Scalable and Complete Query Explanation with OWL 2 EL Ontologies	Zhe Wang, Mahsa Chitsaz, Kewen Wang, Jianfeng Du	In this paper, we present a hybrid approach to achieve this.
77	Crowdsourcing Pareto-Optimal Object Finding By Pairwise Comparisons	Abolfazl Asudeh, Gensheng Zhang, Naeemul Hassan, Chengkai Li, Gergely V. Zaruba	It employs an iterative question-selection framework.
78	Practical Aspects of Sensitivity in Online Experimentation with User Engagement Metrics	Alexey Drutsa, Anna Ufliand, Gleb Gusev	We introduce the notion of Overall Acceptance Criterion (OAC) that includes both the components of an OEC and a statistical significance test.
79	Generalized Team Draft Interleaving	Eugene Kharitonov, Craig Macdonald, Pavel Serdyukov, Iadh Ounis	In this paper, we propose an interleaving framework that generalizes the previously studied interleaving methods in two aspects.
80	Exploiting Document Content for Efficient Aggregation of Crowdsourcing Votes	Martin Davtyan, Carsten Eickhoff, Thomas Hofmann	In this paper, we propose an alternative approach by relying on document information.
81	L2Knng: Fast Exact K-Nearest Neighbor Graph Construction with L2-Norm Pruning	David C. Anastasiu, George Karypis	We present L2Knng, an efficient algorithm that finds the exact cosine similarity k-nearest neighbor graph for a set of sparse high-dimensional objects.
82	Lingo: Linearized Grassmannian Optimization for Nuclear Norm Minimization	Qian Li, Wenjia Niu, Gang Li, Yanan Cao, Jianlong Tan, Li Guo	This paper proposes an efficient and accurate Linearized Grassmannian Optimization (Lingo) algorithm, which adopts matrix factorization and Grassmann manifold structure to alternatively minimize the subproblems.
83	Deep Collaborative Filtering via Marginalized Denoising Auto-encoder	Sheng Li, Jaya Kawale, Yun Fu	In particular, we propose a general deep architecture for CF by integrating matrix factorization with deep feature learning.
84	Improving Latent Factor Models via Personalized Feature Projection for One Class Recommendation	Tong Zhao, Julian McAuley, Irwin King	Therefore, in this paper we propose a novel personalized feature projection method to model users’ preferences over items.
85	Node Immunization over Infectious Period	Chonggang Song, Wynne Hsu, Mong Li Lee	We propose a NIIP algorithm to select $k$ nodes to immunize over a time period.
86	Enterprise Social Link Recommendation	Jiawei Zhang, Yuanhua Lv, Philip Yu	In this paper, we study this novel problem.
87	Exploiting Game Theoretic Analysis for Link Recommendation in Social Networks	Tong Zhao, H. Vicky Zhao, Irwin King	Therefore, in this paper, we study the problem of Exploiting Game Theoretic Analysis for Link Recommendation in Social Networks.
88	Extracting Interest Tags for Non-famous Users in Social Network	Wei He, Hongyan Liu, Jun He, Shu Tang, Xiaoyong Du	In this paper, we propose a modified topic model, Bi-Labeled LDA with a term weighting scheme, to extract interest tags for users in social network.
89	Robust Capped Norm Nonnegative Matrix Factorization: Capped Norm NMF	Hongchang Gao, Feiping Nie, Weidong Cai, Heng Huang	In this paper, we present a novel robust capped norm orthogonal Nonnegative Matrix Factorization model, which utilizes the capped norm for the objective to handle these extreme outliers.
90	MF-Tree: Matrix Factorization Tree for Large Multi-Class Learning	Lei Liu, Pang-Ning Tan, Xi Liu	To overcome these challenges, we propose a novel hierarchical learning method known as MF-Tree to efficiently classify data sets with a large number of classes while simultaneously inducing a taxonomy structure that captures relationships among the classes.
91	GraRep: Learning Graph Representations with Global Structural Information	Shaosheng Cao, Wei Lu, Qiongkai Xu	In this paper, we present GraRep, a novel model for learning vertex representations of weighted graphs.
92	Context-Adaptive Matrix Factorization for Multi-Context Recommendation	Tong Man, Huawei Shen, Junming Huang, Xueqi Cheng	In this paper, we propose a context-adaptive matrix factorization method for multi-context recommendation by simultaneously modeling context-specific factors and entity-intrinsic factors in a unified model.
93	Personalized Trip Recommendation with POI Availability and Uncertain Traveling Time	Chenyi Zhang, Hongwei Liang, Ke Wang, Jianling Sun	This work presents efficient solutions to personalized trip recommendation by incorporating these constraints to prune the search space.
94	Range Search on Uncertain Trajectories	Liming Zhan, Ying Zhang, Wenjie Zhang, Xiaoyang Wang, Xuemin Lin	In particular, we propose a general framework for range search on uncertain trajectories following the filtering-and-refinement paradigm where summaries of uncertain trajectories are constructed to facilitate the filtering process.
95	Efficient Computation of Trips with Friends and Families	Tanzima Hashem, Sukarna Barua, Mohammed Eunus Ali, Lars Kulik, Egemen Tanin	In this paper, we develop both optimal and approximation algorithms for GTP queries for both Euclidean space and road networks.
96	Sampling Big Trajectory Data	Yanhua Li, Chi-Yin Chow, Ke Deng, Mingxuan Yuan, Jia Zeng, Jia-Dong Zhang, Qiang Yang, Zhi-Li Zhang	In this paper, we study the problem of approximate query processing for trajectory aggregate queries.
97	EsdRank: Connecting Query and Documents through External Semi-Structured Data	Chenyan Xiong, Jamie Callan	This paper presents EsdRank, a new technique for improving ranking using external semi-structured data such as controlled vocabularies and knowledge bases.
98	A Probabilistic Framework for Temporal User Modeling on Microblogs	Jitao Sang, Dongyuan Lu, Changsheng Xu	In this work, in the context of microblogs, we propose a unified probabilistic framework to simultaneously model the process of transient event detection and temporal user tweeting.
99	Deriving Intensional Descriptions for Web Services	Maria Koutraki, Dan Vodislav, Nicoleta Preda	In this paper, we model an API method as a view with binding patterns over a global RDF schema.
100	An Optimization Framework for Propagation of Query-Document Features by Query Similarity Functions	Maxim Zhukovskiy, Tsimafei Khatkevich, Gleb Gusev, Pavel Serdyukov	In this paper, we propose new algorithms that facilitate and increase the effectiveness of this propagation.
101	Rank Consistency based Multi-View Learning: A Privacy-Preserving Approach	Han-Jia Ye, De-Chuan Zhan, Yuan Miao, Yuan Jiang, Zhi-Hua Zhou	In this paper, we propose a novel multi-view learning framework which works in a hybrid fusion manner.
102	Differentially Private Histogram Publication for Dynamic Datasets: an Adaptive Sampling Approach	Haoran Li, Li Xiong, Xiaoqian Jiang, Jinfei Liu	In this paper, we address the problem of releasing series of dynamic datasets in real time with differential privacy, using a novel adaptive distance-based sampling approach.
103	WaveCluster with Differential Privacy	Ling Chen, Ting Yu, Rada Chirkova	In this paper, we investigate techniques to perform WaveCluster while ensuring differential privacy. Our goal is to develop a general technique for achieving differential privacy on WaveCluster that accommodates different wavelet transforms.
104	Process-Driven Data Privacy	Weiyi Xia, Murat Kantarcioglu, Zhiyu Wan, Raymond Heatherly, Yevgeniy Vorobeychik, Bradley Malin	We introduce a principled approach to explicitly model the attack process as a series of steps.
105	Unsupervised Feature Selection on Data Streams	Hao Huang, Shinjae Yoo, Shiva Prasad Kasiviswanathan	In this paper, we introduce a novel unsupervised feature selection approach on data streams that selects important features by making only one pass over the data while utilizing limited storage.
106	Unsupervised Streaming Feature Selection in Social Media	Jundong Li, Xia Hu, Jiliang Tang, Huan Liu	In this paper, we study a novel problem to conduct unsupervised streaming feature selection for social media data.
107	Weighted Similarity Estimation in Data Streams	Konstantin Kutzkov, Mohamed Ahmed, Sofia Nikitaki	Motivated by applications such as collaborative filtering in large-scale recommender systems, and influence probabilities learning in social networks, we present new randomized algorithms for the estimation of weighted similarity in data streams.
108	Private Analysis of Infinite Data Streams via Retroactive Grouping	Rui Chen, Yilin Shen, Hongxia Jin	In this paper, we consider the problem of private analysis of infinite data streams under differential privacy.
109	Parallel Lazy Semi-Naive Bayes Strategies for Effective and Efficient Document Classification	Felipe Viegas, Marcos André Gonçalves, Wellington Martins, Leonardo Rocha	In this paper, we investigate whether the relaxation of the NB feature independence assumption (aka, Semi-NB approaches) can improve its effectiveness in large text collections.
110	A Novel Class Noise Estimation Method and Application in Classification	Lin Gui, Qin Lu, Ruifeng Xu, Minglei Li, Qikang Wei	In this paper, we propose a method to estimate class noise rate at the level of individual samples in real data. We first present the problem of binary classification in the presence of random noise on the class labels, which we call class noise.
111	Learning Task Grouping using Supervised Task Space Partitioning in Lifelong Multitask Learning	Meenakshi Mishra, Jun Huan	In this paper, we propose learning functions to model the task relationships as it is computationally cheaper in an online setting.
112	KSGM: Keynode-driven Scalable Graph Matching	Xilun Chen, K. Selçuk Candan, Maria Luisa Sapino, Paulo Shakarian	In this paper we note that the expensive refinement phase of graph matching algorithms is not practical in any application where scalability is critical.
113	Protecting Your Children from Inappropriate Content in Mobile Apps: An Automatic Maturity Rating Framework	Bing Hu, Bin Liu, Neil Zhenqiang Gong, Deguang Kong, Hongxia Jin	In this work, we aim to design and build a machine learning framework to automatically predict maturity levels for mobile Apps and the associated reasons with a high accuracy and a low cost.
114	The Role of Query Sessions in Interpreting Compound Noun Phrases	Marius Pasca	The Role of Query Sessions in Interpreting Compound Noun Phrases
115	Deep Semantic Frame-Based Deceptive Opinion Spam Analysis	Seongsoon Kim, Hyeokyoon Chang, Seongwoon Lee, Minhwan Yu, Jaewoo Kang	In this paper, we propose a frame-based deep semantic analysis method for understanding rich characteristics of deceptive and truthful opinions written by various types of individuals including crowdsourcing workers, employees who have expert-level domain knowledge about local businesses, and online users who post on Yelp and TripAdvisor.
116	Topic Modeling in Semantic Space with Keywords	Xiaojia Pu, Rong Jin, Gangshan Wu, Dingyi Han, Gui-Rong Xue	In this paper, for the information need about a topic or category, we propose a novel method called TDCS (Topic Distilling with Compressive Sensing) for explicitly and accurately modeling the topic implied by several keywords.
117	F1: Accelerating the Optimization of Aggregate Continuous Queries	Anatoli U. Shein, Panos K. Chrysanthis, Alexandros Labrinidis	In this paper we propose a novel closed formula, F1, that accelerates Weavability calculations, and thus allows WeaveShare to achieve exceptional scalability in systems with heavy workloads.
118	Fast Distributed Correlation Discovery Over Streaming Time-Series Data	Tian Guo, Saket Sathe, Karl Aberer	To tackle the challenge, we propose a framework called AEGIS.
119	Time Series Analysis of Nursing Notes for Mortality Prediction via a State Transition Topic Model	Yohan Jo, Natasha Loghmanpour, Carolyn Penstein Rosé	We propose a time series model that uncovers the temporal dynamics of patients’ underlying states from nursing notes.
120	Learning Relative Similarity from Data Streams: Active Online Learning Approaches	Shuji Hao, Peilin Zhao, Steven C.H. Hoi, Chunyan Miao	To overcome the limitation, we propose a novel framework of active online similarity learning.
121	Ad Hoc Monitoring of Vocabulary Shifts over Time	Tom Kenter, Melvin Wevers, Pim Huijnen, Maarten de Rijke	In this paper, we propose an algorithm for monitoring shifts in vocabulary over time, given a small set of seed terms. As the task of monitoring shifting vocabularies over time for an ad hoc set of seed words is, to the best of our knowledge, a new one, we construct our own evaluation set.
122	Balancing Novelty and Salience: Adaptive Learning to Rank Entities for Timeline Summarization of High-impact Events	Tuan A. Tran, Claudia Niederee, Nattiya Kanhabua, Ujwal Gadiraju, Avishek Anand	In this work, we present a novel approach for timeline summarization of high-impact events, which uses entities instead of sentences for summarizing the event at each individual point in time.
123	Location-Based Influence Maximization in Social Networks	Tao Zhou, Jiuxin Cao, Bo Liu, Shuai Xu, Ziqing Zhu, Junzhou Luo	In this paper, we target product promotion in the O2O model and study location-based influence maximization on LBSN platforms.
124	Location and Time Aware Social Collaborative Retrieval for New Successive Point-of-Interest Recommendation	Wei Zhang, Jianyong Wang	In order to solve this problem, we propose a new model called location and time aware social collaborative retrieval model (LTSCR), which has two distinct advantages: (1) it models the location, time, and social information simultaneously for the successive POI recommendation task; (2) it efficiently utilizes the merits of the collaborative retrieval model which leverages weighted approximately ranked pairwise (WARP) loss for achieving better top-n ranking results, just as the new successive POI recommendation task needs.
125	Where you Instagram?: Associating Your Instagram Photos with Points of Interest	Xutao Li, Tuan-Anh Nguyen Pham, Gao Cong, Quan Yuan, Xiao-Li Li, Shonali Krishnaswamy	In this paper, we propose to study the problem of mapping Instagram photos to points of interest.
126	Gradient-based Signatures for Efficient Similarity Search in Large-scale Multimedia Databases	Christian Beecks, Merih Seran Uysal, Judith Hermanns, Thomas Seidl	In this paper, we propose the concept of gradient-based signatures in order to aggregate content-based features of multimedia objects by means of generative models.
127	Cross-Modal Similarity Learning: A Low Rank Bilinear Formulation	Cuicui Kang, Shengcai Liao, Yonghao He, Jian Wang, Wenjia Niu, Shiming Xiang, Chunhong Pan	In this research, there are two critical issues: how to get rid of the heterogeneity between different modalities and how to match the cross-modal features of different dimensions.
128	Efficient Sparse Matrix Multiplication on GPU for Large Social Network Analysis	Yong-Yeon Jo, Sang-Wook Kim, Duck-Ho Bae	In this paper, we propose a GPU-based method for efficient sparse matrix multiplication through the parallel computing paradigm.
129	The Role Of Citation Context In Predicting Long-Term Citation Profiles: An Experimental Study Based On A Massive Bibliographic Text Dataset	Mayank Singh, Vikas Patidar, Suhansanu Kumar, Tanmoy Chakraborty, Animesh Mukherjee, Pawan Goyal	In this paper, we argue that features gathered from the citation contexts of the research papers can be very relevant for citation prediction.
130	Discovering Canonical Correlations between Topical and Topological Information in Document Networks	Yuan He, Cheng Wang, Changjun Jiang	In this paper, we simultaneously incorporate community detection and topic modeling in a unified framework, and appeal to Canonical Correlation Analysis (CCA) to capture the latent semantic correlations between the two heterogeneous latent factors, community and topic.
131	Chronological Citation Recommendation with Information-Need Shifting	Zhuoren Jiang, Xiaozhong Liu, Liangcai Gao	In this study, we propose a novel method called "Chronological Citation Recommendation" which assumes initial user information needs could shift while users are searching for papers in different time slices.
132	Answering Questions with Complex Semantic Constraints on Open Knowledge Bases	Pengcheng Yin, Nan Duan, Ben Kao, Junwei Bao, Ming Zhou	We propose using n-tuple assertions, which are assertions with an arbitrary number of arguments, and n-tuple open KB (nOKB), which is an open knowledge base of n-tuple assertions.
133	Inducing Space Dirichlet Process Mixture Large-Margin Entity Relationship Inference in Knowledge Bases	Sotirios P. Chatzis	In this paper, we focus on the problem of extending a given knowledge base by accurately predicting additional true facts based on the facts included in it.
134	Semi-Automated Exploration of Data Warehouses	Thibault Sellam, Emmanuel Müller, Martin Kersten	In this paper, we introduce Claude, a hypothesis generator for data warehouses.
135	Large-scale Knowledge Base Completion: Inferring via Grounding Network Sampling over Selected Instances	Zhuoyu Wei, Jun Zhao, Kang Liu, Zhenyu Qi, Zhengya Sun, Guanhua Tian	To resolve the limitations of the above two types of methods, we propose an approach through Inferring via Grounding Network Sampling over Selected Instances.
136	Large-Scale Analysis of Dynamics of Choice Among Discrete Alternatives	Andrew Tomkins	The work described in this talk is partly due to other researchers, and partly joint with various colleagues including Ashton Anderson, Ravi Kumar, Mohammad Mahdian, Bo Pang, Sergei Vassilvitskii and Erik Vee.
137	On Gapped Set Intersection Size Estimation	Chen Chen, Jianbin Qin, Wei Wang	In this paper, we consider a generalized problem for integer sets where, given a gap parameter δ, two elements are deemed as matches if their numeric difference equals δ or is within δ.
138	Inclusion Dependencies Reloaded	Henning Köhler, Sebastian Link	Resolving this conundrum we establish an optimal solution by identifying the desirable class of not-null inclusion dependencies (NNINDs) that subsumes simple and partial semantics as special cases, and whose associated implication problem has the same computational properties as inclusion dependencies in the relational model.
139	Comprehensible Models for Reconfiguring Enterprise Relational Databases to Avoid Incidents	Ioana Giurgiu, Mirela Botezatu, Dorothea Wiesmann	We propose using machine learning to understand how configuring a DBMS can lead to such high risk incidents. We collect historical data from three IT environments that run both IBM DB2 and Oracle DBMS.
140	An Optimal Online Algorithm For Retrieving Heavily Perturbed Statistical Databases In The Low-Dimensional Querying Model	Krzysztof Marcin Choromanski, Afshin Rostamizadeh, Umar Syed	We assume the distribution D is defined on the neighborhood of a low-dimensional manifold.
141	Aggregation of Crowdsourced Ordinal Assessments and Integration with Learning to Rank: A Latent Trait Model	Pavel Metrikov, Virgil Pavlu, Javed A. Aslam	To use such assessments for either evaluation or learning, we propose a new framework for the inference of true document relevance from crowdsourced data—one simpler than previous approaches and achieving better performance.
142	Weakly Supervised Natural Language Processing Framework for Abstractive Multi-Document Summarization: Weakly Supervised Abstractive Multi-Document Summarization	Peng Li, Weidong Cai, Heng Huang	In this paper, we propose a new weakly supervised abstractive news summarization framework using pattern based approaches.
143	Short Text Similarity with Word Embeddings	Tom Kenter, Maarten de Rijke	We propose to go from word-level to text-level semantics by combining insights from methods based on external sources of semantic knowledge with word embeddings.
144	Building Representative Composite Items	Vincent Leroy, Sihem Amer-Yahia, Eric Gaussier, Hamid Mirisaee	We formalize building representative CIs as an optimization problem and propose KFC, an extended fuzzy clustering algorithm to solve it.
145	More Accurate Question Answering on Freebase	Hannah Bast, Elmar Haussmann	We evaluate our system, called Aqqu, on two standard benchmarks, Free917 and WebQuestions, improving the previous best result for each benchmark considerably.
146	Improving Ranking Consistency for Web Search by Leveraging a Knowledge Base and Search Logs	Jyun-Yu Jiang, Jing Liu, Chin-Yew Lin, Pu-Jen Cheng	In this paper, we propose a new idea called ranking consistency in web search.
147	Assessing the Impact of Syntactic and Semantic Structures for Answer Passages Reranking	Kateryna Tymoshenko, Alessandro Moschitti	In this paper, we extensively study the use of syntactic and semantic structures obtained with shallow and deeper syntactic parsers in the answer passage reranking task.
148	Ranking Entities for Web Queries Through Text and Knowledge	Michael Schuhmacher, Laura Dietz, Simone Paolo Ponzetto	In this paper, we aim at automating this process by retrieving and ranking entities that are relevant to understand free-text web-style queries like Argentine British relations, which typically demand a set of heterogeneous entities with no specific target type like, for instance, Falklands_War or Margaret_Thatcher, as answer.
149	What Is a Network Community?: A Novel Quality Function and Detection Algorithms	Atsushi Miyauchi, Yasushi Kawase	In this study, we introduce a novel quality function for a network community, which we refer to as the communitude.
150	DifRec: A Social-Diffusion-Aware Recommender System	Hossein Vahabi, Iordanis Koutsopoulos, Francesco Gullo, Maria Halkidi	In this work we take a step towards rethinking recommender systems by exploiting the anticipated social-network information diffusion and withholding recommendation of items that are expected to reach a user through sharing/re-posting.
151	Who With Whom And How?: Extracting Large Social Networks Using Search Engines	Stefan Siersdorfer, Philipp Kemkes, Hanno Ackermann, Sergej Zerr	In this paper, we introduce novel methodologies for query-based search engine mining, enabling efficient extraction of social networks from large amounts of Web data.
152	Modeling Individual-Level Infection Dynamics Using Social Network Information	Suppawong Tuarob, Conrad S. Tucker, Marcel Salathe, Nilam Ram	In this paper, we demonstrate how social media information can be incorporated into and improve upon traditional techniques used to model the dynamics of infectious diseases.
153	Finding Probabilistic k-Skyline Sets on Uncertain Data	Jinfei Liu, Haoyu Zhang, Li Xiong, Haoran Li, Jun Luo	We present an efficient algorithm for computing probabilistic k-skyline sets.
154	Ordering Selection Operators Under Partial Ignorance	Khaled H. Alyoubi, Sven Helmer, Peter T. Wood	The selectivities are modelled as intervals rather than exact values and we apply a concept from decision theory, the minimisation of the maximum regret, as a measure of optimality.
155	Querying Temporal Drifts at Multiple Granularities	Sofia Kleisarchaki, Sihem Amer-Yahia, Ahlame Douzal-Chouakria, Vassilis Christophides	In this paper, we adopt a query-based approach to drift detection.
156	Efficient Incremental Evaluation of Succinct Regular Expressions	Henrik Björklund, Wim Martens, Thomas Timm	In this paper we study the usage and effectiveness of the counting operator (or: limited repetition) in regular expressions.
157	Struggling and Success in Web Search	Daan Odijk, Ryen W. White, Ahmed Hassan Awadallah, Susan T. Dumais	We address this important issue using a mixed methods study using large-scale logs, crowd-sourced labeling, and predictive modeling.
158	Behavioral Dynamics from the SERP’s Perspective: What are Failed SERPs and How to Fix Them?	Julia Kiseleva, Jaap Kamps, Vadim Nikulin, Nikita Makarov	Our analysis of behavioral dynamics at the SERP level gives new insight in one of the primary causes of search failure due to temporal query intent drifts.
159	What Users Ask a Search Engine: Analyzing One Billion Russian Question Queries	Michael Völske, Pavel Braslavski, Matthias Hagen, Galina Lezina, Benno Stein	As an alternative, we propose a robust question query classification method that uses the labeled questions from a large community question answering platform (CQA) as a training set.
160	Does Vertical Bring more Satisfaction?: Predicting Search Satisfaction in a Heterogeneous Environment	Ye Chen, Yiqun Liu, Ke Zhou, Meng Wang, Min Zhang, Shaoping Ma	In this paper, we carry out a lab-based user study with specifically designed SERPs to determine how verticals with different qualities and presentation styles affect search satisfaction.
161	Characterizing and Predicting Viral-and-Popular Video Content	David Vallet, Shlomo Berkovsky, Sebastien Ardon, Anirban Mahanti, Mohamed Ali Kaafar	In this paper, we focus on the observable dependencies between the virality of video content on a micro-blogging social network (in this case, Twitter) and the popularity of such content on a video distribution service (YouTube).
162	Social Spammer and Spam Message Co-Detection in Microblogging with Social Context Regularization	Fangzhao Wu, Jinyun Shu, Yongfeng Huang, Zhigang Yuan	In this paper, we propose a unified framework for social spammer and spam message co-detection in microblogging.
163	Central Topic Model for Event-oriented Topics Mining in Microblog Stream	Min Peng, Jiahui Zhu, Xuhui Li, Jiajia Huang, Hua Wang, Yanchun Zhang	In this paper, we propose a central topic model (CenTM), where a Multi-view Clustering algorithm with Two-phase Random Walk (MC-TRW) is devised to aggregate the LDA’s latent topics into central topics.
164	Video Popularity Prediction by Sentiment Propagation via Implicit Network	Wanying Ding, Yue Shang, Lifan Guo, Xiaohua Hu, Rui Yan, Tingting He	Here, we propose a Dual Sentimental Hawkes Process (DSHP) to cope with all the problems above.
165	Joint Modeling of User Check-in Behaviors for Point-of-Interest Recommendation	Hongzhi Yin, Xiaofang Zhou, Yingxia Shao, Hao Wang, Shazia Sadiq	In light of the above, we propose a joint probabilistic generative model to mimic user check-in behaviors in a process of decision making, which strategically integrates the above factors to effectively overcome the data sparsity, especially for out-of-town users.
166	ORec: An Opinion-Based Point-of-Interest Recommendation Framework	Jia-Dong Zhang, Chi-Yin Chow, Yu Zheng	In this paper, we propose an opinion-based POI recommendation framework called ORec to take full advantage of the user opinions on POIs expressed as tips.
167	Toward Dual Roles of Users in Recommender Systems	Suhang Wang, Jiliang Tang, Huan Liu	In this paper, we investigate how to exploit dual roles of users in recommender systems.
168	TriRank: Review-aware Explainable Recommendation by Modeling Aspects	Xiangnan He, Tao Chen, Min-Yen Kan, Xiao Chen	Aside from users’ ratings, their affiliated reviews often provide the rationale for their ratings and identify what aspects of the item they cared most about.
169	RoadRank: Traffic Diffusion and Influence Estimation in Dynamic Urban Road Networks	Tarique Anwar, Chengfei Liu, Hai L. Vu, Md. Saiful Islam	In this work, we propose RoadRank, an algorithm to compute the influence scores of each road segment in an urban road network, and rank them based on their overall influence.
170	On Query-Update Independence for SPARQL	Nicola Guido, Pierre Genevès, Nabil Layaïda, Cécile Roisin	This paper investigates techniques for detecting independence of SPARQL queries from updates.
171	A Structured Query Model for the Deep Relational Web	Hasan M. Jamil, Hosagrahar V. Jagadish	In this paper, we describe ongoing research on a generic structured query model that can be used against the deep web.
172	A Flash-aware Buffering Scheme using On-the-fly Redo	Kyosung Jeong, Sang-Wook Kim, Sungchae Lim	In this paper, we address how to reduce the amount of page updates in flash-based DBMS equipped with SSD (Solid State Drive).
173	Defragging Subgraph Features for Graph Classification	Haishuai Wang, Peng Zhang, Ivor Tsang, Ling Chen, Chengqi Zhang	In this paper, we propose a new Subgraph Join Feature Selection (SJFS) algorithm.
174	Structural Constraints for Multipartite Entity Resolution with Markov Logic Network	Tengyuan Ye, Hady W. Lauw	We propose a principled solution to the multipartite entity resolution problem, building on the foundation of Markov Logic Network (MLN) that combines probabilistic graphical model and first-order logic.
175	Know Your Onions: Understanding the User Experience with the Knowledge Module in Web Search	Ioannis Arapakis, Luis A. Leiva, B. Barla Cambazoglu	Our work is an early attempt to bridge this gap.
176	Personalized Federated Search at LinkedIn	Dhruv Arya, Viet Ha-Thuc, Shakti Sinha	To tackle this problem, we exploit a data-driven approach that extracts searcher intents from their profile data and recent activities at a large scale.
177	Balancing Exploration and Exploitation: Empirical Parameterization of Exploratory Search Systems	Kumaripaba Ahukorala, Alan Medlar, Kalle Ilves, Dorota Glowacka	We present a user study to analyze how different exploration rates affect search performance, user satisfaction, and the number of documents selected.
178	On Predicting Deletions of Microblog Posts	Mossaab Bagdouri, Douglas W. Oard	This paper addresses the problem of deletion prediction by analyzing the distribution of deleted tweets, presenting a new evaluation framework, exploring tweet-based and user-based features, and reporting prediction scores.
179	Semi-Automated Text Classification for Sensitivity Identification	Giacomo Berardi, Andrea Esuli, Craig Macdonald, Iadh Ounis, Fabrizio Sebastiani	We use a recently proposed utility-theoretic approach to SATC that explicitly optimizes the chosen effectiveness function when ranking the documents by sensitivity; this is especially useful in our case, since sensitivity identification is a recall-oriented task, thus requiring the use of a recall-oriented evaluation measure such as F₂.
180	Identification of Microblogs Prominent Users during Events by Learning Temporal Sequences of Features	Imen Bizid, Nibal Nayef, Patrice Boursier, Sami Faiz, Antoine Doucet	This work proposes a probabilistic model for the identification of prominent users in microblogs during specific events.
181	A Real-Time Eye Tracking Based Query Expansion Approach via Latent Topic Modeling	Yongqiang Chen, Peng Zhang, Dawei Song, Benyou Wang	In this paper, we propose a real-time eye tracking based query expansion method, which is able to: (1) automatically capture the terms that the user is viewing by utilizing eye tracking techniques; (2) derive the user’s latent intent based on the eye tracking terms and by using the Latent Dirichlet Allocation (LDA) approach.
182	Clustered Semi-Supervised Relevance Feedback	Kripabandhu Ghosh, Swapan Kumar Parui	In this paper, we consider an intermediate, semi-supervised scheme, in which only a subset of results is selected for annotation, and then their labels are propagated to their nearest neighbours.
183	On the Effect of "Stupid" Search Components on User Interaction with Search Engines	Lidia Grauer, Aleksandra Lomakina	Using eye-tracking, we investigate how searchers interact with Web search engines which get affected by nonsensical results.
184	Social-Relational Topic Model for Social Networks	Weiyu Guo, Shu Wu, Liang Wang, Tieniu Tan	To address the above limitations, we propose a novel Social-Relational Topic Model (SRTM), which can alleviate the effect of topic-irrelevant links by analyzing relational users’ topics of each link.
185	Building Effective Query Classifiers: A Case Study in Self-harm Intent Detection	Ashiqur R. KhudaBukhsh, Paul N. Bennett, Ryen W. White	We address a common scenario in designing such triggers for real-world settings where positives are rare and search providers possess only a small seed set of positive examples to learn query classification models.
186	Modelling the Usefulness of Document Collections for Query Expansion in Patient Search	Nut Limsopatham, Craig Macdonald, Iadh Ounis	In this work, we investigate two automatic approaches that measure and leverage the usefulness of document collections when exploiting multiple document collections to improve query representation.
187	A Convolutional Click Prediction Model	Qiang Liu, Feng Yu, Shu Wu, Liang Wang	In this work, we propose a novel model, Convolutional Click Prediction Model (CCPM), based on convolution neural network.
188	A Study of Query Length Heuristics in Information Retrieval	Yuanhua Lv	In this paper, we reveal that query length actually interacts with term frequency (TF) normalization, a key component of all effective retrieval models.
189	Detect Rumors Using Time Series of Social Context Information on Microblogging Websites	Jing Ma, Wei Gao, Zhongyu Wei, Yueming Lu, Kam-Fai Wong	In this study, we propose a novel approach to capture the temporal characteristics of these features based on the time series of rumor’s lifecycle, for which time series modeling technique is applied to incorporate various social context information.
190	Query Auto-Completion for Rare Prefixes	Bhaskar Mitra, Nick Craswell	In particular, we describe a candidate generation approach using frequently observed query suffixes mined from historical search logs.
191	Pooled Evaluation Over Query Variations: Users are as Diverse as Systems	Alistair Moffat, Falk Scholer, Paul Thomas, Peter Bailey	Therefore an approach called pooling is typically used where, for example, the documents to be judged can be determined by taking the union of all documents returned in the top positions of the answer lists returned by a range of systems.
192	The Influence of Pre-processing on the Estimation of Readability of Web Documents	João Rafael de Moura Palotti, Guido Zuccon, Allan Hanbury	This paper investigates the effect that text pre-processing approaches have on the estimation of the readability of web pages.
193	Atypical Queries in eCommerce	Neeraj Pradhan, Vinay Deolalikar, Kang Li	In this paper, we use query-click log data to address the problem of identifying "atypical queries": these are queries that are extremal in terms of specificity, ambiguity, or breadth of intent.
194	Bottom-up Faceted Search: Creating Search Neighbourhoods with Datacube Cells	Mark Sifer	This paper extends this approach to curated corpora that contain items or documents that have been classified in multiple dimensions (facets), where each dimension classification may be a hierarchy.
195	Personalized Recommendation Meets Your Next Favorite	Qiang Song, Jian Cheng, Ting Yuan, Hanqing Lu	In this paper, we propose a unified model, namely States Transition pAir-wise Ranking Model (STAR), to address users’ favorites mining for sequential-set recommendation.
196	Recommending Short-lived Dynamic Packages for Golf Booking Services	Robin Swezey, Young-joo Chung	We introduce an approach to recommending short-lived dynamic packages for golf booking services.
197	Large-Scale Question Answering with Joint Embedding and Proof Tree Decoding	Zhenghao Wang, Shengquan Yan, Huaming Wang, Xuedong Huang	We frame the problem from a proof-theoretic perspective, and formulate it as a proof tree search problem that seamlessly unifies semantic parsing, logic reasoning, and answer ranking.
198	Query Length, Retrievability Bias and Performance	Colin Wilkie, Leif Azzopardi	In this paper, we examine whether there are benefits of longer queries beyond performance.
199	Gauging Correct Relative Rankings For Similarity Search	Weiren Yu, Julie McCann	In this paper, we propose efficient ranking criteria that can secure correct relative orders of node-pairs with respect to SimRank scores when they are computed in an iterative fashion.
200	Learning User Preferences for Topically Similar Documents	Mustafa Zengin, Ben Carterette	In this study, we collect user preference judgements of web document similarity in order to investigate: (1) the correlation between similarity measures and users’ perception of similarity, (2) the correlation between the web document features plus document-query features and users’ similarity judgements.
201	Modeling Parameter Interactions in Ranking SVM	Yaogong Zhang, Jun Xu, Yanyan Lan, Jiafeng Guo, Maoqiang Xie, Yalou Huang, Xueqi Cheng	This paper aims to answer the question.
202	Best First Over-Sampling for Multilabel Classification	Xusheng Ai, Jian Wu, Victor S. Sheng, Yufeng Yao, Pengpeng Zhao, Zhiming Cui	In this paper we propose a MultiLabel Best First Over-sampling (ML-BFO) to improve the performance of multilabel classification algorithms, based on imbalance minimization and Wilson’s ENN rule.
203	Co-clustering Document-term Matrices by Direct Maximization of Graph Modularity	Melissa Ailem, François Role, Mohamed Nadif	We present Coclus, a novel diagonal co-clustering algorithm which is able to effectively co-cluster binary or contingency matrices by directly maximizing an adapted version of the modularity measure traditionally used for networks.
204	A Data-Driven Approach to Distinguish Cyber-Attacks from Physical Faults in a Smart Grid	Adnan Anwar, Abdun Naser Mahmood, Zubair Shah	In this paper, we utilize a data-driven approach to accurately differentiate the physical faults from cyber-attacks. First, we create a realistic dataset by generating different types of faults and cyber-attacks on the IEEE 30 bus benchmark test system.
205	Improving Event Detection by Automatically Assessing Validity of Event Occurrence in Text	Andrea Ceroni, Ujwal Kumar Gadiraju, Marco Fisichella	In this paper, we automatize event validation, defined as the task of determining whether a given event occurs in a given document or corpus.
206	DAAV: Dynamic API Authority Vectors for Detecting Software Theft	Dong-Kyu Chae, Sang-Wook Kim, Seong-Je Cho, Yesol Kim	This paper proposes a novel birthmark, a dynamic API authority vector (DAAV), for detecting software theft.
207	Towards Multi-level Provenance Reconstruction of Information Diffusion on Social Media	Tom De Nies, Io Taxidou, Anastasia Dimou, Ruben Verborgh, Peter M. Fischer, Erik Mannens, Rik Van de Walle	Therefore in this paper, we propose an approach to reconstruct the provenance of messages on social media on multiple levels.
208	Profiling Pedestrian Distribution and Anomaly Detection in a Dynamic Environment	Minh Tuan Doan, Sutharshan Rajasegarar, Mahsa Salehi, Masud Moshtaghi, Christopher Leckie	In this paper we model the normal behaviours of pedestrian flows and detect anomalous events from pedestrian counting data of the City of Melbourne.
209	A Clustering-based Approach to Detect Probable Outcomes of Lawsuits	Daniel Lemes Gribel, Maira Gatti de Bayser, Leonardo Guerreiro Azevedo	This work proposes an approach to identify possible judgment outcomes that considers the use of similarity calculations and clustering mechanisms based on lawsuits patterns.
210	Detecting Check-worthy Factual Claims in Presidential Debates	Naeemul Hassan, Chengkai Li, Mark Tremayne	Specifically, we prepared a U.S. presidential debate dataset and built classification models to distinguish check-worthy factual claims from non-factual claims and unimportant factual claims.
211	Where You Go Reveals Who You Know: Analyzing Social Ties from Millions of Footprints	Hsun-Ping Hsieh, Rui Yan, Cheng-Te Li	This paper aims to investigate how the geographical footprints of users correlate to their social ties.
212	Message Clustering based Matrix Factorization Model for Retweeting Behavior Prediction	Bo Jiang, Jiguang Liang, Ying Sha, Lihong Wang	In this paper, we propose two message clustering based matrix factorization models for retweeting prediction.
213	Heterogeneous Multi-task Semantic Feature Learning for Classification	Xin Jin, Fuzhen Zhuang, Sinno Jialin Pan, Changying Du, Ping Luo, Qing He	In this paper, we study the problem of MTL with heterogeneous features for each task.
214	Top-k Reliable Edge Colors in Uncertain Graphs	Arijit Khan, Francesco Gullo, Thomas Wohler, Francesco Bonchi	To this end, we aim at designing effective and scalable solutions for the top-k reliable color set problem.
215	Probabilistic Non-negative Inconsistent-resolution Matrices Factorization	Masahiro Kohjima, Tatsushi Matsubayashi, Hiroshi Sawada	In this paper, we tackle the problem of analyzing datasets with different resolutions, such as a pair of a user's individual data and a user group's data, for example "userA visited shopA 5 times" and "users whose attributes are men purchased itemA 80 times in total".
216	Identifying Attractive News Headlines for Social Media	Sawa Kourogi, Hiroyuki Fujishiro, Akisato Kimura, Hitoshi Nishikawa	This paper provides a novel solution to this problem by identifying attractive headlines as a gateway to news articles.
217	A Probabilistic Rating Auto-encoder for Personalized Recommender Systems	Huizhi Liang, Timothy Baldwin	In this paper, we propose a probabilistic rating auto-encoder to perform unsupervised feature learning and generate latent user feature profiles from large-scale user rating data.
218	Real-time Rumor Debunking on Twitter	Xiaomo Liu, Armineh Nourbakhsh, Quanzhi Li, Rui Fang, Sameena Shah	In this paper, we propose the first real time rumor debunking algorithm for Twitter.
219	Fraud Transaction Recognition: A Money Flow Network Approach	Renxin Mao, Zhao Li, Jinhua Fu	In this paper, we provide some insights into analysis of fraud transaction recognition on Alipay’s Money Flow Network.
220	Identifying Top-k Consistent News-Casters on Twitter	Sahisnu Mazumder, Sameep Mehta, Dhaval Patel	In this paper, we present a framework, NCFinder, to discover top-k consistent news-casters directly from Twitter.
221	Mining the Minds of Customers from Online Chat Logs	Kunwoo Park, Jaewoo Kim, Jaram Park, Meeyoung Cha, Jiin Nam, Seunghyun Yoon, Eunhee Rhim	This study investigates factors that may determine satisfaction in customer service operations.
222	A Fast k-Nearest Neighbor Search Using Query-Specific Signature Selection	Youngki Park, Heasoo Hwang, Sang-goo Lee	In this paper, we aim to improve the performance of k-NN search and to achieve a consistent k-NN search that performs well across various datasets.
223	Core-Sets For Canonical Correlation Analysis	Saurabh Paul	In this work, we consider the over-constrained case where the number of rows is greater than the number of columns (m > max(n,l)).
224	DeepCamera: A Unified Framework for Recognizing Places-of-Interest based on Deep ConvNets	Pai Peng, Hongxiang Chen, Lidan Shou, Ke Chen, Gang Chen, Chang Xu	In this work, we present a novel project called DeepCamera (DC) for recognizing places-of-interest (POI) with smartphones.
225	Structured Sparse Regression for Recommender Systems	Mingjie Qian, Liangjie Hong, Yue Shi, Suju Rajan	In this paper we employ rich features from both user and item sides to enhance latent factors learnt from interaction data, uncovering hidden structures from features’ relationships and learning sparse pairwise and tree structural connections among features.
226	Analyzing Document Intensive Business Processes using Ontology	Suman Roychoudhury, Vinay Kulkarni, Nikhil Bellarykar	In particular, this paper presents a real life example of a document intensive business process (International Trade) and attempts to model and analyze the process in a formal way.
227	Transductive Domain Adaptation with Affinity Learning	Le Shu, Longin Jan Latecki	We propose a novel method to solve the domain adaptation task in a transductive setting.
228	Update Summarization using Semi-Supervised Learning Based on Hellinger Distance	Dingding Wang, Sahar Sohangir, Tao Li	In this paper, we propose a new method to generate the sentence similarity graph using a novel similarity measure based on Hellinger distance and apply semi-supervised learning on the sentence graph to select the sentences with maximum consistency and minimum redundancy to form the summaries.
229	Multi-view Clustering via Structured Low-rank Representation	Dong Wang, Qiyue Yin, Ran He, Liang Wang, Tieniu Tan	In this paper, we present a novel solution to multi-view clustering through a structured low-rank representation.
230	Partially Labeled Data Tuple Can Optimize Multivariate Performance Measures	Jim Jing-Yan Wang, Xin Gao	In this paper, we show that multivariate performance measures can also be optimized by learning from a partially labeled data tuple, where the label tuple is incomplete.
231	Modeling Infinite Topics on Social Behavior Data with Spatio-temporal Dependence	Peng Wang, Peng Zhang, Chuan Zhou, Zhao Li, Guo Li	In this paper, we present a new nonparametric Bayesian model, Time and Space Dependent Chinese Restaurant Processes (TSD-CRP for short).
232	ASEM: Mining Aspects and Sentiment of Events from Microblog	Ruhui Wang, Weijing Huang, Wei Chen, Tengjiao Wang, Kai Lei	In this paper we propose a novel probabilistic generative model (ASEM) to simultaneously discover aspects and the specified opinions.
233	Enhanced Word Embeddings from a Hierarchical Neural Language Model	Xun Wang, Katsuhito Sudoh, Masaaki Nagata	This paper proposes a neural language model to capture the interaction of text units of different levels.
234	Improving Label Quality in Crowdsourcing Using Noise Correction	Jing Zhang, Victor S. Sheng, Jian Wu, Xiaoqin Fu, Xindong Wu	This paper proposes a novel framework that introduces noise correction techniques to further improve label quality after ground truth inference in crowdsourcing.
235	Improving Collaborative Filtering via Hidden Structured Constraint	Qing Zhang, Houfeng Wang	To solve this problem, we propose a novel matrix factorization model with adaptive graph regularization framework, which can automatically discover latent user communities jointly with learning latent user representations, to enhance the discriminative power for recommendation.
236	DOLAP 2015 Workshop Summary	Carlos Garcia-Alvarado, Carlos Ordonez, Il-Yeol Song	The ACM DOLAP workshop presents research that bridges data warehousing, On-Line Analytical Processing (OLAP), and other large-scale data processing platforms.
237	DTMBIO 2015: International Workshop on Data and Text Mining in Biomedical Informatics	Min Song, Doheon Lee, Karin Verspoor	DTMBIO 2015: International Workshop on Data and Text Mining in Biomedical Informatics
238	ECol 2015: First international workshop on the Evaluation on Collaborative Information Seeking and Retrieval	Leif Azzopardi, Jeremy Pickens, Tetsuya Sakai, Laure Soulier, Lynda Tamine	The goal of this workshop is to investigate the evaluation challenges in CIS/CIR with the hope of building standardized evaluation frameworks, methodologies, and task specifications that would foster and grow the research area (in a collaborative fashion).
239	Eighth Workshop on Exploiting Semantic Annotations in Information Retrieval (ESAIR’15)	Krisztian Balog, Jeffrey Dalton, Antoine Doucet, Yusra Ibrahim	We dedicate a special "annotations in action" track to demonstrations that showcase innovative prototype systems, in addition to the regular research and position paper contributions.
240	LSDS-IR’15: 2015 Workshop on Large-Scale and Distributed Systems for Information Retrieval	Ismail Sengor Altingovde, B. Barla Cambazoglu, Nicola Tonellotto	The LSDS-IR’15 workshop will provide space for researchers to discuss the existing performance problems in the context of large-scale and distributed information retrieval systems and define new research directions in the modern Big Data era.
241	NWSearch 2015: International Workshop on Novel Web Search Interfaces and Systems	Davood Rafiei, Katsumi Tanaka	In particular, the workshop seeks to identify some of the problems and challenges facing the development of such tools and interfaces and to foster new ideas and findings that can shape or influence future research directions and developments.
242	PIKM 2015: The 8th ACM Workshop for Ph.D. Students in Information and Knowledge Management	Mouna Kacimi, Nicoleta Preda, Maya Ramanath	Similar to the CIKM, the PIKM workshop covers a wide range of topics in the areas of databases, information retrieval and knowledge management.
243	TM 2015 — Topic Models: Post-Processing and Applications Workshop	Nikolaos Aletras, Jey Han Lau, Timothy Baldwin, Mark Stevenson	The main objective of the workshop is to bring together researchers who are interested in applications of topic models and improving their output.
244	UCUI’15: The 1st International Workshop on Understanding the City with Urban Informatics	Yashar Moshfeghi, Iadh Ounis, Craig Macdonald, Joemon M. Jose, Peter Triantafillou, Mark Livingston, Piyushimita Thakuriah	The goal of the workshop is to provide a multidisciplinary forum which brings together researchers in Big Data (BD), Information Retrieval (IR), Data Mining, and Urban Studies, to explore novel solutions to the numerous theoretical, practical and ethical challenges arising in this context.