Paper Digest: CIKM 2013 Highlights

November 1, 2013 | June 26, 2020 | admin

The ACM Conference on Information and Knowledge Management (CIKM) is an annual computer science research conference dedicated to information management and knowledge management.

To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.

If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.

Paper Digest Team
team@paperdigest.org

TABLE 1: CIKM 2013 Papers

No.	Title	Authors	Highlight
1	Scholarly big data: information extraction and data mining	C. Lee Giles	We discuss scholarly big data challenges, insights, methodologies and applications.
2	Applying theory to practice	Ronald Fagin	We present the remarkably simple Threshold Algorithm, which is optimal in an extremely strong sense: optimal not just in the worst case, or in the average case, but in every case!
3	Usability in machine learning at scale with graphlab	Carlos Guestrin	In this talk, we will focus on: Examining common algorithmic patterns in distributed ML methods.
4	Structured data in web search	Alon Halevy	I will describe some of the efforts we are conducting at Google to collect structured data, filter the high-quality content, and serve it to our users.
5	One size does not fit all: multi-granularity search of web forums	Gayatree Ganu, Amélie Marian	In this paper, we address the problem of presenting textual search results in a concise manner to answer user needs.
6	Spatial search for K diverse-near neighbors	Gregory Ference, Wang-Chien Lee, Hui-Ju Jung, De-Nian Yang	In this paper, we investigate the problem of searching for the k Diverse-Near Neighbors (kDNNs) in spatial space that is based upon the spatial diversity and proximity of candidate locations to the query point.
7	Mining a search engine’s corpus without a query pool	Mingyang Zhang, Nan Zhang, Gautam Das	In this paper, we study how to enable third-party data analytics over a search engine’s corpus without the cooperation of its owner – specifically, by issuing a small number of search queries through the web interface.
8	G-tree: an efficient index for KNN search on road networks	Ruicheng Zhong, Guoliang Li, Kian-Lee Tan, Lizhu Zhou	In this paper we study the problem of kNN search on road networks.
9	Efficient parsing-based search over structured data	Aditya Parameswaran, Raghav Kaushik, Arvind Arasu	In this paper, we present a suite of efficient algorithms and auxiliary indexes for this problem.
10	Graph-of-word and TW-IDF: new approach to ad hoc IR	François Rousseau, Michalis Vazirgiannis	In this paper, we introduce novel document representation (graph-of-word) and retrieval model (TW-IDF) for ad hoc IR.
11	Map search via a factor graph model	Qi Zhang, Jihua Kang, Yeyun Gong, Huan Chen, Yaqian Zhou, Xuanjing Huang	In this paper, we propose to connect this task to the semi-structured retrieval problem.
12	A phased ranking model for question answering	Rui Liu, Eric Nyberg	We propose an approach that allows each phase in a system to leverage information propagated from preceding phases to inform the ranking decision.
13	CRF framework for supervised preference aggregation	Maksims N. Volkovs, Richard S. Zemel	We describe procedures for learning in this model and demonstrate that inference can be done much more efficiently than in analogous models.
14	CQArank: jointly model topics and expertise in community question answering	Liu Yang, Minghui Qiu, Swapna Gottipati, Feida Zhu, Jing Jiang, Huiping Sun, Zhong Chen	To tackle this cluster of closely related problems in a principled approach, we proposed Topic Expertise Model (TEM), a novel probabilistic generative model with GMM hybrid, to jointly model topics and expertise by integrating textual content model and link structure analysis.
15	Penguins in sweaters, or serendipitous entity search on user-generated content	Ilaria Bordino, Yelena Mejova, Mounia Lalmas	In this work, the content of each data source is represented as an entity network, which is further enriched with metadata about sentiment, writing quality, and topical category.
16	Entity-centric document filtering: boosting feature mapping through meta-features	Mianwei Zhou, Kevin Chen-Chuan Chang	Based on the insight that keywords sharing some similar "properties" should have similar importance for their respective entities, we propose a novel concept of meta-feature to map keywords from different entities.
17	Structured positional entity language model for enterprise entity retrieval	Chunliang Lu, Lidong Bing, Wai Lam	We investigate the problem of general entity retrieval for enterprise websites.
18	Learning relatedness measures for entity linking	Diego Ceccarelli, Claudio Lucchese, Salvatore Orlando, Raffaele Perego, Salvatore Trani	In this paper we address the problem of learning high quality entity relatedness functions.
19	Gem-based entity-knowledge maintenance	Bilyana Taneva, Gerhard Weikum	To overcome this limitation and accelerate the maintenance of knowledge bases, we propose an approach that automatically extracts, from the Web, key contents for given input entities.
20	Predicting user activity level in social networks	Yin Zhu, Erheng Zhong, Sinno Jialin Pan, Xiao Wang, Minzhe Zhou, Qiang Yang	In this paper, we focus on a fundamental task: to predict a user’s future activity levels in a social network, e.g. weekly activeness, active or inactive.
21	On popularity prediction of videos shared in online social networks	Haitao Li, Xiaoqiang Ma, Feng Wang, Jiangchuan Liu, Ke Xu	In this paper, we present an initial study on the popularity prediction of videos propagated in OSNs along friendship links.
22	Inferring anchor links across multiple heterogeneous social networks	Xiangnan Kong, Jiawei Zhang, Philip S. Yu	In this paper, we study the problem of anchor link prediction across multiple heterogeneous social networks, i.e., discovering the correspondence among different accounts of the same user.
23	Community-based user recommendation in uni-directional social networks	Gang Zhao, Mong Li Lee, Wynne Hsu, Wei Chen, Haoji Hu	In this work, we propose a community-based approach to user recommendation in Twitter-style social networks.
24	Personalized influence maximization on social networks	Jing Guo, Peng Zhang, Chuan Zhou, Yanan Cao, Li Guo	In this paper, we study a new problem on social network influence maximization.
25	Discovering coherent topics using general knowledge	Zhiyuan Chen, Arjun Mukherjee, Bing Liu, Meichun Hsu, Malu Castellanos, Riddhiman Ghosh	In this paper, we propose a framework to leverage the general knowledge in topic models.
26	Spatio-temporal and events based analysis of topic popularity in twitter	Sebastien Ardon, Amitabha Bagchi, Anirban Mahanti, Amit Ruhela, Aaditeshwar Seth, Rudra Mohan Tripathy, Sipat Triukose	We present the first comprehensive characterization of the diffusion of ideas on Twitter, studying more than 5.96 million topics that include both popular and less popular topics.
27	Domain-dependent/independent topic switching model for online reviews with numerical ratings	Yasutoshi Ida, Takuma Nakamura, Takashi Matsumoto	We propose a domain-dependent/independent topic switching model based on Bayesian probabilistic modeling for modeling online product reviews that are accompanied with numerical ratings provided by users.
28	A partially supervised cross-collection topic model for cross-domain text classification	Yang Bao, Nigel Collier, Anindya Datta	In this paper, we propose a model called Partially Supervised Cross-Collection LDA topic model (PSCCLDA) for cross-domain learning with the purpose of addressing these two issues in a unified way.
29	Content coverage maximization on word networks for hierarchical topic summarization	Chi Wang, Xiao Yu, Yanen Li, Chengxiang Zhai, Jiawei Han	We propose a new approach of text modeling via network analysis.
30	Mining frequent neighborhood patterns in a large labeled graph	Jialong Han, Ji-Rong Wen	We propose mining a new class of patterns called frequent neighborhood patterns, which is free from the "DCP-intuitiveness" dilemma of mining frequent subgraphs in a single graph.
31	A two-phase algorithm for mining sequential patterns with differential privacy	Luca Bonomi, Li Xiong	In this paper, we study the sequential pattern mining problem under the differential privacy framework which provides formal and provable guarantees of privacy.
32	Mining diabetes complication and treatment patterns for clinical decision support	Lu Liu, Jie Tang, Yu Cheng, Ankit Agrawal, Wei-keng Liao, Alok Choudhary	In this paper, we investigate how to utilize the heterogeneous medical records to aid the clinical treatments of diabetes mellitus.
33	Mining-based compression approach of propositional formulae	Said Jabbour, Lakhdar Sais, Yakoub Salhi, Takeaki Uno	In this paper, we propose a first application of data mining techniques to propositional satisfiability.
34	Correlating medical-dependent query features with image retrieval models using association rules	Hajer Ayadi, Mouna Torjmen, Mariam Daoud, Maher Ben Jemaa, Jimmy Xiangji Huang	In this paper, we propose a novel approach for finding correlations between medical query features and retrieval models based on association rule mining.
35	Local correlation detection with linearity enhancement in streaming data	Qing Xie, Shuo Shang, Bo Yuan, Chaoyi Pang, Xiangliang Zhang	This paper proposes effective methods to continuously detect the correlation between data streams.
36	Efficient processing of streaming graphs for evolution-aware clustering	Mindi Yuan, Kun-Lung Wu, Gabriela Jacques-Silva, Yi Lu	In this paper, we present an efficient approach to processing streaming graphs for evolution-aware clustering (EAC) of vertices.
37	Searching similar segments over textual event sequences	Liang Tang, Tao Li, Shu-Ching Chen, Shunzhi Zhu	In this paper, we propose a method, suffix matrix, for efficiently searching similar segments over textual event sequences.
38	RWS-Diff: flexible and efficient change detection in hierarchical data	Jan P. Finis, Martin Raiber, Nikolaus Augsten, Robert Brunel, Alfons Kemper, Franz Färber	We propose the random walks similarity (RWS) measure which can be used to find similar subtrees rapidly.
39	Causality and responsibility: probabilistic queries revisited in uncertain databases	Xiang Lian, Lei Chen	To efficiently process CR-PNN queries, we propose effective pruning strategies to quickly filter out false alarms, and design efficient algorithms to obtain CR-PNN answers.
40	Locality sensitive hashing for scalable structural classification and clustering of web documents	Christian Hachenberg, Thomas Gottron	We introduce a novel technique to support these two tasks: template fingerprints.
41	An index for efficient semantic full-text search	Hannah Bast, Björn Buchhold	In this paper we present a novel index data structure tailored towards semantic full-text search.
42	Load-sensitive selective pruning for distributed search	Daniele Broccolo, Craig Macdonald, Salvatore Orlando, Iadh Ounis, Raffaele Perego, Fabrizio Silvestri, Nicola Tonellotto	In this paper, we propose and evaluate a different approach, where, given a set of different query processing strategies with differing efficiency, each query is considered by a framework that sets a maximum query processing time and selects which processing strategy is the best for that query, such that the processing time for all queries is kept below the threshold.
43	Rank-energy selective query forwarding for distributed search systems	Amin Teymorian, Ophir Frieder, Marcus A. Maloof	We present a hybrid rank-energy query forwarding model termed "RESQ."
44	Augmenting web search surrogates with images	Robert Capra, Jaime Arguello, Falk Scholer	In this paper, we present results of two large-scale user studies to examine the effects of augmenting text-based surrogates with images extracted from the underlying webpage.
45	Building a large-scale corpus for evaluating event detection on twitter	Andrew J. McMinn, Yashar Moshfeghi, Joemon M. Jose	In this paper, we propose a methodology for the creation of an event detection corpus. Specifically, we first create a new corpus that covers a period of 4 weeks and contains over 120 million tweets, which we make available for research.
46	On sparsity and drift for effective real-time filtering in microblogs	M-Dyaa Albakour, Craig Macdonald, Iadh Ounis	In this paper, we approach the problem of real-time filtering in the Twitter Microblogging platform.
47	Probabilistic solutions of influence propagation on social networks	Miao Zhang, Chunni Dai, Chris Ding, Enhong Chen	In this paper, we emphasize the probabilistic nature of influence propagation.
48	Improving pseudo-relevance feedback via tweet selection	Taiki Miyanishi, Kazuhiro Seki, Kuniaki Uehara	To overcome the limitation of pseudo-relevance feedback for microblog search, we propose a novel query expansion method based on two-stage relevance feedback that models search interests by manual tweet selection and integration of lexical and temporal evidence into its relevance model.
49	Supporting exploratory people search: a study of factor transparency and user control	Shuguang Han, Daqing He, Jiepu Jiang, Zhen Yue	In this project, we developed PeopleExplorer, an interactive people search system to support exploratory search tasks when looking for people.
50	Location prediction in social media based on tie strength	Jeffrey McGee, James Caverlee, Zhiyuan Cheng	We propose a novel network-based approach for location estimation in social media that integrates evidence of the social tie strength between users for improved location estimation.
51	To stay or not to stay: modeling engagement dynamics in social graphs	Fragkiskos D. Malliaros, Michalis Vazirgiannis	In this paper, we build upon recent work in the field of game theory, where the behavior of individuals (nodes) is modeled by a technology adoption game.
52	UNIK: unsupervised social network spam detection	Enhua Tan, Lei Guo, Songqing Chen, Xiaodong Zhang, Yihong Zhao	UNIK: unsupervised social network spam detection
53	Modeling dynamics of meta-populations with a probabilistic approach: global diffusion in social media	Minkyoung Kim, David Newth, Peter Christen	In this paper, we propose a macro-level diffusion model with a probabilistic approach by combining both heterogeneity and structural connectivity of social networks.
54	Diffusion of innovations revisited: from social network to innovation network	Xin Rong, Qiaozhu Mei	In this paper, we take a formal quantitative approach to address how different pieces of innovations socialize with each other and how the interrelationships among innovations affect users’ adoption behavior, which provides a novel perspective of understanding the diffusion of innovations.
55	StaticGreedy: solving the scalability-accuracy dilemma in influence maximization	Suqi Cheng, Huawei Shen, Junming Huang, Guoqing Zhang, Xueqi Cheng	Motivated by this critical finding, we propose a static greedy algorithm, named StaticGreedy, to strictly guarantee the submodularity of influence spread function during the seed selection process.
56	Online multitasking and user engagement	Janette Lehmann, Mounia Lalmas, Georges Dupret, Ricardo Baeza-Yates	In this paper, we study the effect of online multitasking on two widely used engagement metrics designed to capture users’ browsing behavior with a site.
57	PATRIC: a parallel algorithm for counting triangles in massive networks	Shaikh Arifuzzaman, Maleq Khan, Madhav Marathe	In this paper, we present an efficient MPI-based distributed memory parallel algorithm, called PATRIC, for counting triangles in massive networks.
58	An efficient MapReduce algorithm for counting triangles in a very large graph	Ha-Myung Park, Chin-Wan Chung	In this paper, we propose a new algorithm based on graph partitioning with a novel idea of triangle classification to count the number of triangles in a graph.
59	Parallel motif extraction from very long sequences	Majed Sahli, Essam Mansour, Panos Kalnis	This paper presents ACME, a combinatorial method for extracting motifs from a single very long sequence.
60	The logical diversity of explanations in OWL ontologies	Samantha Bail, Bijan Parsia, Ulrike Sattler	In this paper, we introduce and explore several equivalence relations over justifications for entailments of OWL ontologies which partition a set of justifications into structurally similar subsets.
61	Ontology authoring with FORZA	C. Maria Keet, Muhammad Tahir Khan, Chiara Ghidini	We solve this with a generic approach and realize it with the Foundational Ontology and Reasoner-enhanced axiomatiZAtion (FORZA) method, containing DOLCE, a decision diagram for DOLCE categories, part-whole relations, and an automated reasoner that is used during the authoring process to propose feasible axioms.
62	Aligning freebase with the YAGO ontology	Elena Demidova, Iryna Oelze, Wolfgang Nejdl	In this paper we analyze the structure of YAGO in more depth and show how to match YAGO and Freebase categories.
63	PIDGIN: ontology alignment using web text as interlingua	Derry Wijaya, Partha Pratim Talukdar, Tom Mitchell	We present a novel approach to this ontology alignment problem that employs a very large natural language text corpus as an interlingua to relate different knowledge bases (KBs).
64	Mapping adaptation actions for the automatic reconciliation of dynamic ontologies	Julio Cesar Dos Reis, Duy Dinh, Cédric Pruski, Marcos Da Silveira, Chantal Reynaud-Delaître	In this article, we propose a set of mapping adaptation actions and present how they are used to maintain mappings up-to-date based on ontology change operations of different nature.
65	On mining mobile apps usage behavior for predicting apps usage in smartphones	Zhung-Xun Liao, Yi-Chin Pan, Wen-Chih Peng, Po-Ruey Lei	In this paper, we propose two selection algorithms, MaxProb and MinEntropy.
66	Ranking fraud detection for mobile apps: a holistic view	Hengshu Zhu, Hui Xiong, Yong Ge, Enhong Chen	To this end, in this paper, we provide a holistic view of ranking fraud and propose a ranking fraud detection system for mobile Apps.
67	AnchorMF: towards effective event context identification	Hansu Gu, Mike Gartrell, Liang Zhang, Qin Lv, Dirk Grunwald	In this work, we have developed AnchorMF, a matrix factorization based technique that aims to identify event context by leveraging a prevalent feature in OSNs, the anchor information.
68	How the live web feels about events	George Valkanas, Dimitrios Gunopulos	In this paper, we focus on the problem of automatically identifying events as they occur, in such a user-driven, fast paced and voluminous setting.
69	Boolean satisfiability for sequence mining	Said Jabbour, Lakhdar Sais, Yakoub Salhi	In this paper, we propose a SAT-based encoding for the problem of discovering frequent, closed and maximal patterns in a sequence of items and a sequence of itemsets.
70	Users versus models: what observation tells us about effectiveness metrics	Alistair Moffat, Paul Thomas, Falk Scholer	This work explores that link, by analyzing the assumptions and implications of a number of effectiveness metrics, and exploring how these relate to observable user behaviors.
71	Evaluating aggregated search using interleaving	Aleksandr Chuklin, Anne Schuth, Katja Hofmann, Pavel Serdyukov, Maarten de Rijke	We propose an interleaving algorithm that allows comparisons of search engine result pages containing grouped vertical documents.
72	Using historical click data to increase interleaving sensitivity	Eugene Kharitonov, Craig Macdonald, Pavel Serdyukov, Iadh Ounis	In this paper we propose a novel approach to further improve interleaving sensitivity by using pre-experimental user behaviour data.
73	On the reliability and intuitiveness of aggregated search metrics	Ke Zhou, Mounia Lalmas, Tetsuya Sakai, Ronan Cummins, Joemon M. Jose	In this paper, we compare the properties of existing AS metrics under the assumptions that (1) queries may have multiple preferred verticals; (2) the likelihood of each vertical preference is available; and (3) the topical relevance assessments of results returned from each vertical are available.
74	User intent and assessor disagreement in web search evaluation	Gabriella Kazai, Emine Yilmaz, Nick Craswell, S.M.M. Tahaghoghi	In this paper, we examine the relationship between assessor disagreement and various click based measures, such as click preference strength and user intent similarity, for judgments collected from editorial judges and crowd workers using single absolute, pairwise absolute and pairwise preference based judging methods.
75	The water filling model and the cube test: multi-dimensional evaluation for professional search	Jiyun Luo, Christopher Wing, Hui Yang, Marti Hearst	This paper proposes a 3D water filling model to describe this search process, and derives a new evaluation metric, the Cube Test, to encompass the complex nature of professional search.
76	Disinformation techniques for entity resolution	Steven Euijong Whang, Hector Garcia-Molina	We formalize the problem of finding the disinformation with the highest benefit given a limited budget for creating the disinformation and propose efficient algorithms for solving the problem.
77	Location recommendation for out-of-town users in location-based social networks	Gregory Ference, Mao Ye, Wang-Chien Lee	In this paper, we study the issues in making location recommendations for out-of-town users by taking into account user preference, social influence and geographical proximity.
78	Short text classification by detecting information path	Shitao Zhang, Xiaoming Jin, Dou Shen, Bin Cao, Xuetao Ding, Xiaochen Zhang	We propose a method to detect the information path and employ it in short text classification.
79	Personalized point-of-interest recommendation by mining users’ preference transition	Xin Liu, Yong Liu, Karl Aberer, Chunyan Miao	In this work, we propose a novel category-aware POI recommendation model, which exploits the transition patterns of users’ preference over location categories to improve location recommendation accuracy.
80	Proximity	Jannik Strötgen, Michael Gertz	In this paper, we present a new model to rank documents according to combined textual, temporal, and geographic queries.
81	Timely crawling of high-quality ephemeral new content	Damien Lefortier, Liudmila Ostroumova, Egor Samosvat, Pavel Serdyukov	We thus propose a new metric, well thought out for this task, which takes into account the decrease of user interest for ephemeral pages over time.
82	LearNext: learning to predict tourists movements	Ranieri Baraglia, Cristina Ioana Muntean, Franco Maria Nardini, Fabrizio Silvestri	In this paper, we tackle the problem of predicting the "next" geographical position of a tourist given her history (i.e., the prediction is done according to the tourist’s current trail) by means of supervised learning techniques, namely Gradient Boosted Regression Trees and Ranking SVM.
83	Where shall we go today?: planning touristic tours with tripbuilder	Igo Brilhante, Jose Antonio Macedo, Franco Maria Nardini, Raffaele Perego, Chiara Renso	In this paper we propose TripBuilder, a new framework for personalized touristic tour planning.
84	Efficient filtering and ranking schemes for finding inclusion dependencies on the web	Atsuyuki Morishima, Erika Yumiya, Masami Takahashi, Shigeo Sugimoto, Hiroyuki Kitagawa	In this paper, we address the problem of finding inclusion dependencies on the Web.
85	A generic front-stage for semi-stream processing	M. Asif Naeem, Gerald Weber, Gillian Dobbie, Christof Lutteroth	We propose a caching approach that can be used as a front-stage for different semi-stream join algorithms, resulting in significant performance gains for common applications.
86	Scalable diversification of multiple search results	Hina A. Khan, Marina Drosou, Mohamed A. Sharaf	In this paper, we address the concurrent diversification of multiple search results using various approximation techniques that provide orders of magnitude reductions in processing cost, while maintaining comparable quality of diversification as compared to sequential methods.
87	Parallel triangle counting in massive streaming graphs	Kanat Tangwongsan, A. Pavan, Srikanta Tirthapura	This paper presents the design and implementation of a fast parallel algorithm for estimating the number of triangles in a massive undirected graph whose edges arrive as a stream.
88	Cache refreshing for online social news feeds	Xiao Bai, Flavio P. Junqueira, Adam Silberstein	We propose a novel cache scheme called SOCR (Social Online Cache Refreshing) for identifying and refreshing cache entries.
89	A new operator for efficient stream-relation join processing in data streaming engines	Roozbeh Derakhshan, Abdul Sattar, Bela Stantic	In this paper, we propose a new SRJ operator to facilitate SRJ processing regardless of the cache performance using two techniques: batching and out-of-order processing.
90	SCISSOR: scalable and efficient reachability query processing in time-evolving hierarchies	Phani Rohit Mullangi, Lakshmish Ramaswamy	In this paper, we propose SCISSOR (selective snapshot indexing with progressive solution refinement), which, to the best of our knowledge is the first time and space efficient framework for answering reachability queries in TEHs.
91	Towards metric fusion on multi-view data: a cross-view based graph random walk approach	Yang Wang, Xuemin Lin, Qing Zhang	In this paper, we propose a novel Metric Fusion technique via cross-view graph Random Walk, named MFRW, regarding multi-view based similarity graphs (with each similarity graph constructed under each view).
92	Discovering latent blockmodels in sparse and noisy graphs using non-negative matrix factorisation	Jeffrey Chan, Wei Liu, Andrey Kan, Christopher Leckie, James Bailey, Kotagiri Ramamohanarao	In this paper, we propose a new non-negative matrix factorisation approach that can discover blockmodels in sparse and noisy graphs.
93	Understanding the roles of sub-graph features for graph classification: an empirical study perspective	Ting Guo, Xingquan Zhu	One of the most common graph classification approaches is to use sub-graph features to convert graphs into instance-feature representations, so generic learning algorithms can be applied to derive learning models.
94	PAGE: a partition aware graph computation engine	Yingxia Shao, Junjie Yao, Bin Cui, Lin Ma	In this paper, we analyse the cost of parallel graph computing systems as well as the relationship between the cost and underlying graph partitioning.
95	Active exploration: simultaneous sampling and labeling for large graphs	Meng Fang, Jie Yin, Xingquan Zhu	In this paper, we propose an Active Exploration framework for large graphs where the goal is to carry out network sampling and node labeling at the same time.
96	Local clustering in provenance graphs	Peter Macko, Daniel Margo, Margo Seltzer	Local clustering in these graphs, in which we start with a seed vertex and grow a cluster around it, is of paramount importance because it supports critical provenance applications such as identifying semantically meaningful tasks in an object’s history.
97	Content-centric flow mining for influence analysis in social streams	Karthik Subbian, Charu Aggarwal, Jaideep Srivastava	In this paper, we propose a fully content-centered model of flow analysis in social network streams, in which the analysis is based on actual content transmissions in the network, rather than a static model of transmission on the edges. First, we introduce the problem of information flow mining in social streams, and then propose a novel algorithm InFlowMine to discover the information flow patterns in the network.
98	Labels or attributes?: rethinking the neighbors for collective classification in sparsely-labeled networks	Luke K. McDowell, David W. Aha	We show that these effects are consistent across a range of datasets, learning choices, and inference algorithms, and that using both neighbor attributes and labels often produces the best accuracy.
99	Fast parameterless density-based clustering via random projections	Johannes Schneider, Michail Vlachos	We present two fast density-based clustering algorithms based on random projections.
100	Mining entity attribute synonyms via compact clustering	Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai, Kuansan Wang	In this work, we propose a novel compact clustering framework to jointly identify synonyms for a set of attribute values.
101	Modeling interaction features for debate side clustering	Minghui Qiu, Liu Yang, Jing Jiang	This paper proposes a two-stage solution based on latent variable models: an interaction feature identification stage to mine interaction features from structured debate posts with known sides and reply intentions; and a clustering stage to incorporate interaction features and model the interplay between interactions and sides for debate side clustering.
102	Dynamic multi-faceted topic discovery in twitter	Jan Vosecky, Di Jiang, Kenneth Wai-Ting Leung, Wilfred Ng	In this paper, we therefore propose a method for mining multifaceted topics from Twitter streams.
103	Mining causal topics in text data: iterative topic modeling with time series feedback	Hyun Duk Kim, Malu Castellanos, Meichun Hsu, ChengXiang Zhai, Thomas Rietz, Daniel Diermeier	We develop a novel general text mining framework for discovering such causal topics from text.
104	Navigating the topical structure of academic search results via the Wikipedia category network	Daniil Mirylenka, Andrea Passerini	We propose a novel method of organizing the search results into concise and informative topic hierarchies.
105	A multimodal framework for unsupervised feature fusion	Xiaoyi Li, Jing Gao, Hui Li, Le Yang, Rohini K. Srihari	In this paper, we propose a multimodal feature fusion framework which can model any given image-description pair using semantically meaningful features.
106	Probabilistic semantic similarity measurements for noisy short texts using Wikipedia entities	Masumi Shirakawa, Kotaro Nakayama, Takahiro Hara, Shojiro Nishio	This paper describes a novel probabilistic method of measuring semantic similarity for real-world noisy short texts like microblog posts.
107	Linear-time enumeration of maximal K-edge-connected subgraphs in large networks by random contraction	Takuya Akiba, Yoichi Iwata, Yuichi Yoshida	In this paper, we propose a new method to decompose a graph into maximal k-edge-connected components, based on random contraction of edges.
108	External memory K-bisimulation reduction of big graphs	Yongming Luo, George H.L. Fletcher, Jan Hidders, Yuqing Wu, Paul De Bra	In this paper, we present, to our knowledge, the first known I/O efficient solutions for computing the k-bisimulation partition of a massive directed graph, and performing maintenance of such a partition upon updates to the underlying graph.
109	Querying graphs with preferences	Valeria Fionda, Giuseppe Pirrò	This paper presents GuLP, a graph query language that enables preferences to be expressed declaratively.
110	Network-aware search in social tagging applications: instance optimality versus efficiency	Silviu Maniu, Bogdan Cautis	We propose algorithms that have the potential to scale to current applications.
111	A comparison of two physical data designs for interactive social networking actions	Sumita Barahmand, Shahram Ghandeharizadeh, Jason Yap	This paper compares the performance of an SQL solution that implements a relational data model with a document store named MongoDB.
112	Community question topic categorization via hierarchical kernelized classification	Wen Chan, Weidong Yang, Jinhui Tang, Jintao Du, Xiangdong Zhou, Wei Wang	We present a hierarchical kernelized classification model for the automatic classification of general questions into their corresponding topic categories in community Question Answering services (cQAs).
113	Building structures from classifiers for passage reranking	Aliaksei Severyn, Massimo Nicosia, Alessandro Moschitti	This paper shows that learning to rank models can be applied to automatically learn complex patterns, such as relational semantic structures occurring in questions and their answer passages.
114	Uncovering collusive spammers in Chinese review websites	Chang Xu, Jie Zhang, Kuiyu Chang, Chong Long	Empirical analysis, on recently crawled product reviews from a popular Chinese e-commerce website, reveals the failure of many state-of-the-art spam indicators on detecting collusive spammers.
115	Towards minimizing the annotation cost of certified text classification	Mossaab Bagdouri, William Webber, David D. Lewis, Douglas W. Oard	Drawing on ideas from statistical power analysis, we present a framework for joint minimization of training and test annotation that maintains the statistical validity of effectiveness estimates, and yields a natural definition of an optimal allocation of annotations to training and test data.
116	A heterogenous automatic feedback semi-supervised method for image reranking	Xin-Chao Xu, Xin-Shun Xu, Yafang Wang, Xiaolin Wang	Motivated by this, in this paper, we propose the HAFSRerank (Heterogenous Automatic Feedback Semi-supervised Reranking) method, which makes use of both visual and textual features simultaneously during reranking.
117	Accurate and scalable nearest neighbors in large networks based on effective importance	Petko Bogdanov, Ambuj Singh	We propose a novel proximity measure for weighted graphs called Effective Importance which incorporates multiple paths between nodes and captures the inherent structural clusters within a network.
118	Spatial-temporal query homogeneity for KNN object search on road networks	Ying-Ju Chen, Kun-Ta Chuang, Ming-Syan Chen	In this paper, we explore a new research paradigm, called query homogeneity, to process KNN queries on road networks for online LBS applications.
119	Discovering influential authors in heterogeneous academic networks by a co-ranking method	Qinxue Meng, Paul J. Kennedy	Faced with this problem, we propose a co-ranking method to evaluate scientific publications and authors.
120	Entity disambiguation in anonymized graphs using graph kernels	Linus Hermansson, Tommi Kerola, Fredrik Johansson, Vinay Jethava, Devdatt Dubhashi	This paper presents a novel method for entity disambiguation in anonymized graphs using local neighborhood structure.
121	Estimating the relative utility of networks for predicting user activities	Nina Mishra, Daniel M. Romero, Panayiotis Tsaparas	In this paper, we introduce a new related problem: given a collection of networks, how can we determine the relative importance of each network for predicting user activities?
122	Exploring weakly supervised latent sentiment explanations for aspect-level review analysis	Lei Fang, Minlie Huang, Xiaoyan Zhu	In this paper, we explore a new concept for aspect-level review analysis, latent sentiment explanations, which are defined as a set of informative aspect-specific sentences whose polarities are consistent with that of the review.
123	Using micro-reviews to select an efficient set of reviews	Thanh-Son Nguyen, Hady W. Lauw, Panayiotis Tsaparas	We propose a novel methodology that brings together these two diverse types of review content, to obtain something that is more than the sum of its parts.
124	Automatic construction of domain and aspect specific sentiment lexicons for customer review mining	Juergen Bross, Heiko Ehrig	We propose a novel method that automatically adapts and extends existing lexicons to a specific product domain.
125	Wikification via link co-occurrence	Zhiyuan Cai, Kaiqi Zhao, Kenny Q. Zhu, Haixun Wang	In this paper, we present a simple but powerful framework of sense disambiguation using co-occurrences of Wikipedia links in the Wikipedia corpus.
126	Manipulation among the arbiters of collective intelligence: how Wikipedia administrators mold public opinion	Sanmay Das, Allen Lavoie, Malik Magdon-Ismail	We find a surprisingly large number of editors who change their behavior and begin focusing more on a particular controversial topic once they are promoted to administrator status.
127	Robust question answering over the web of linked data	Mohamed Yahya, Klaus Berberich, Shady Elbassuoni, Gerhard Weikum	This paper advocates a new approach that allows questions to be partially translated into relaxed queries, covering the essential but not necessarily all aspects of the user’s input.
128	Expertise retrieval in bibliographic network: a topic dominance learning approach	Seyyed Hadi Hashemi, Mahmood Neshati, Hamid Beigy	Motivated by the observation that rarely do all coauthors contribute to a paper equally, in this paper, we propose a discriminative method to identify the leading authors contributing to a scientific publication.
129	Instant foodie: predicting expert ratings from grassroots	Chenhao Tan, Ed H. Chi, David Huffaker, Gueorgi Kossinets, Alexander J. Smola	In this paper, we examine the two different approaches to collecting user ratings of restaurants and explore the question of whether it is possible to reconcile them.
130	On segmentation of eCommerce queries	Nish Parikh, Prasad Sriram, Mohammad Al Hasan	In this paper, we present QSEGMENT, a real-life query segmentation system for eCommerce queries.
131	Scientific articles recommendation	Yingming Li, Ming Yang, Zhongfei (Mark) Zhang	We study the problem of recommending scientific articles to users in an online community and present a novel matrix factorization model, the topic regression Matrix Factorization (tr-MF), to solve the problem.
132	MRPacker: an SQL to mapreduce optimizer	Xuelian Lin, Yue Ye, Shuai Ma	In this paper, we propose MRPacker, a novel SQL-to-MapReduce optimizer by (a) using a set of transformation rules to reduce the number of MapReduce jobs, and (b) merging MapReduce jobs in a more reasonable way.
133	A hybrid approach for privacy-preserving processing of knn queries in mobile database systems	Shixin Tian, Ying Cai, Qinghua Zheng	In this paper, we present a hybrid approach that mitigates the above dilemma.
134	Flexible and extensible generation and corruption of personal data	Peter Christen, Dinusha Vatsalan	We present a sophisticated data generation and corruption tool that allows the creation of various types of data, ranging from names and addresses, dates, social security and credit card numbers, to numerical values such as salary or blood pressure.
135	An efficient and robust privacy protection technique for massive streaming choice-based information	Ji Zhang, Xuemei Liu, Yonglong Luo	In this paper, we focus on the streaming choice-based information and propose a novel anonymization technique for providing a strong privacy protection to safeguard against privacy disclosure and information tampering.
136	RCached-tree: an index structure for efficiently answering popular queries	Manash Pal, Arnab Bhattacharya, Debjyoti Paul	In this paper, we propose RCached-tree, belonging to the family of R-trees, that aims to solve this problem.
137	Label constrained shortest path estimation	Ankita Likhyani, Srikanta Bedathur	In this paper, we develop SkIt index structure, which supports a wide range of label constraints on paths, and returns an accurate estimation of the shortest path that satisfies the constraints.
138	Feature-based models for improving the quality of noisy training data for relation extraction	Benjamin Roth, Dietrich Klakow	We propose and evaluate two feature-based models for increasing the quality of distant supervision extraction patterns.
139	Weighted hashing for fast large scale similarity search	Qifan Wang, Dan Zhang, Luo Si	This paper proposes a novel method, named Weighted Hashing (WeiHash), to assign different weights to different hashing bits.
140	Term associations in query expansion: a structural linguistic perspective	Michael Symonds, Guido Zuccon, Bevan Koopman, Peter Bruza, Laurianne Sitbon	Given the reliance on word meanings when a user formulates their query, our approach takes the novel step of modelling both syntagmatic and paradigmatic associations within the query expansion process based on the (pseudo) relevant documents returned in web search.
141	Predicting event-relatedness of popular queries	Seyyedeh Newsha Ghoreishi, Aixin Sun	In this paper, we identify 20 features including both contextual and temporal features from a small set of search results of a query and predict its event-relatedness.
142	Modeling latent topic interactions using quantum interference for information retrieval	Alessandro Sordoni, Jing He, Jian-Yun Nie	In this paper, we investigate the use of the well-known wave-like phenomenon of Quantum Interference for topic models such as Latent Dirichlet Allocation (LDA).
143	Generalizing diversity detection in blog feed retrieval	Mostafa Keikha, Fabio Crestani, Bruce Croft	In this paper, we propose a blog-level diversity measure where there is no assumption made about the underlying blog-ranking technique.
144	Dynamic query intent mining from a search log stream	Yanan Qian, Tetsuya Sakai, Junting Ye, Qinghua Zheng, Cong Li	We propose a method for mining dynamic query intents from search query logs.
145	Latency-aware strategy for static list caching in flash-based web search engines	Jiancong Tong, Gang Wang, Xiaoguang Liu	Based on the observation that the speed gap between random access and sequential access is much less apparent on a flash-based solid state drive than on a magnetic hard disk drive, we introduce a new static list caching algorithm which takes the block-level access latency into consideration.
146	Bootstrapping active name disambiguation with crowdsourcing	Yu Cheng, Zhengzhang Chen, Jiang Wang, Ankit Agrawal, Alok Choudhary	To efficiently acquire labeled data, we propose a bootstrapping algorithm for the name disambiguation task based on active learning and crowdsourced labeling.
147	Modeling clicks beyond the first result page	Aleksandr Chuklin, Pavel Serdyukov, Maarten de Rijke	We propose a modification of the Dynamic Bayesian Network (DBN) click model by explicitly including into the model the probability of transition between result pages.
148	Maintaining discriminatory power in quantized indexes	Matt Crane, Andrew Trotman, Richard O’Keefe	We observe a relationship between the collection size and ideal quantization size, and provide a way to determine the number of bits to use from the collection size.
149	Retrieving opinions from discussion forums	Laura Dietz, Ziqi Wang, Samuel Huston, W. Bruce Croft	In this short paper, we test a range of existing techniques for forum retrieval and develop new retrieval models to differentiate between opinionated and factual forum posts.
150	Retrieval of trending keywords in a peer-to-peer micro-blogging OSN	H. Asthana, Ingemar Cox	We propose a two step solution.
151	Trustable aggregation of online ratings	Hyun-Kyo Oh, Sang-Wook Kim, Sunju Park, Ming Zhou	In this paper, we define false reputation as the problem of the reputation to be manipulated by unfair ratings, and design a general framework that provides trustable reputation.
152	Exploiting proximity feature in statistical translation models for information retrieval	Xinhui Tu, Jing Luo, Bo Li, Tingting He, Maofu Liu	In this paper, we study how to explicitly incorporate proximity information into the existing translation language model, and propose a proximity-based translation language model, called TM-P, with three variants.
153	Position-based contextualization for passage retrieval	David Carmel, Anna Shtok, Oren Kurland	We present a novel contextualization approach for passage retrieval.
154	High throughput filtering using FPGA-acceleration	Wim Vanderbauwhede, Anton Frolov, Leif Azzopardi, Sai Rahul Chalamalasetti, Martin Margala	In this paper, we develop an energy-efficient high performance information filtering system that is capable of classifying a stream of incoming documents at high speed.
155	On challenges with mobile e-health: lessons from a game-theoretic perspective	Ann-Marie Eklund	This paper highlights some possibilities and benefits of a theoretic framework, based on existing works on game-theoretic treatments of information retrieval and communication, to allow for both descriptive and predictive analysis of internet-based health communication.
156	Improving entity search over linked data by modeling latent semantics	Nikita Zhiltsov, Eugene Agichtein	In this paper, we propose a principled and scalable approach for integrating latent semantic information into a learning-to-rank model, by combining a compact representation of semantic similarity, achieved by using a modified algorithm for tensor factorization, with explicit entity information.
157	Challenges in commerce search	Hugh Williams	In this talk, we discuss what makes commerce search hard, how eBay has solved some of these problems, and what challenges eBay faces in the next generation of its search technologies.
158	Clustering: probably approximately useless?	Rich Caruana	How do we fix this and make clustering more useful in practice?
159	Is top-k sufficient for ranking?	Yanyan Lan, Shuzi Niu, Jiafeng Guo, Xueqi Cheng	In this paper, we propose to study this problem from both empirical and theoretical aspects.
160	How fresh do you want your search results?	Shiwen Cheng, Anastasios Arvanitis, Vagelis Hristidis	In this work, we focus on this class of queries that we refer to as "timely queries".
161	TellMyRelevance!: predicting the relevance of web search results from cursor interactions	Maximilian Speicher, Andreas Both, Martin Gaedke	We introduce TellMyRelevance!
162	Selection fusion in semi-structured retrieval	Muhammad Ali Norozi, Paavo Arvola	Hence we propose a novel type of fusion, the selection fusion: a fusion methodology which fuses an all-purpose and comprehensive ranking of elements with a specific selection scheme, and also enables evaluation of the ranking in many selection perspectives.
163	Incorporating user preferences into click models	Qianli Xing, Yiqun Liu, Jian-Yun Nie, Min Zhang, Shaoping Ma, Kuo Zhang	As a uniform click model for all users can hardly capture the diverse click behavior, in this paper we incorporate user preferences into both a variety of existing click models and a novel click model.
164	Feedback-driven multiclass active learning for data streams	Yu Cheng, Zhengzhang Chen, Lu Liu, Jiang Wang, Ankit Agrawal, Alok Choudhary	In this paper, we present a systematic framework for stream-based multi-class active learning.
165	Discriminative feature selection for multi-view cross-domain learning	Zheng Fang, Zhongfei (Mark) Zhang	In this paper, we address this problem and propose a new framework, called DISMUTE, taking advantage of the typically available multiple views of the data in domains.
166	Functional dirichlet process	Lijing Qin, Xiaoyan Zhu	We present a general method for constructing dependent Dirichlet processes (DP) on arbitrary covariate space.
167	Spatio-temporal meme prediction: learning what hashtags will be popular where	Krishna Y. Kamath, James Caverlee	In this paper, we tackle the problem of predicting what online memes will be popular in what locations.
168	Cost-sensitive learning for large-scale hierarchical classification	Jianfu Chen, David Warren	We propose a loss normalization approach to appropriately calibrating the scaling of loss functions, which is applicable to general classification and structured prediction tasks whenever using structured SVM with margin re-scaling.
169	Effective measures for inter-document similarity	John S. Whissell, Charles L.A. Clarke	In this work, we extend that result, presenting and evaluating novel inter-document similarity measures based on BM25, language modeling, and divergence from randomness.
170	Efficient hierarchical clustering of large high dimensional datasets	Sean Gilpin, Buyue Qian, Ian Davidson	In this paper we explore using angular hashing to hash objects with similar angular distance to the same hash bucket.
171	Flexible and adaptive subspace search for outlier analysis	Fabian Keller, Emmanuel Müller, Andreas Wixler, Klemens Böhm	In this work we propose such a flexible and adaptive subspace selection scheme.
172	Query matching for report recommendation	Veronika Thost, Konrad Voigt, Daniel Schuster	Targeting large-scale, real-world reporting scenarios, we propose a scalable, index-based query matching approach.
173	Computing term similarity by large probabilistic isA knowledge	Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xindong Wu	Therefore, we propose a lightweight and effective approach for semantic similarity using a large scale semantic network automatically acquired from billions of web documents.
174	Interactive collaborative filtering	Xiaoxue Zhao, Weinan Zhang, Jun Wang	In this paper, we study collaborative filtering (CF) in an interactive setting, in which a recommender system continuously recommends items to individual users and receives interactive feedback.
175	Building optimal information systems automatically: configuration space exploration for biomedical information systems	Zi Yang, Elmer Garduno, Yan Fang, Avner Maiberg, Collin McCormack, Eric Nyberg	We introduce the CSE framework, an extension to the UIMA framework which provides a general distributed solution for building and exploring configuration spaces for information systems.
176	Learning to handle negated language in medical records search	Nut Limsopatham, Craig Macdonald, Iadh Ounis	In this paper, we propose a novel learning framework that effectively handles negated language.
177	A pattern-based selective recrawling approach for object-level vertical search	Yaqian Zhou, Qi Zhang, Xuanjing Huang, Lide Wu	To deal with this problem, we propose a new hypertext resource discovery method, called “selective recrawling” for object-level vertical search applications.
178	Robust models of mouse movement on dynamic web search results pages	Fernando Diaz, Ryen White, Georg Buscher, Dan Liebling	In this work, we develop robust, log-based mouse movement models capable of estimating searcher attention on novel SERP arrangements.
179	Cross-domain sparse coding	Jim Jing-Yan Wang, Halima Bensmail	In this paper, we extend sparse coding to the cross-domain learning problem, which tries to learn from a source domain to a target domain with a significantly different distribution.
180	Motif discovery in spatial trajectories using grammar inference	Tim Oates, Arnold P. Boedihardjo, Jessica Lin, Crystal Chen, Susan Frankenstein, Sunil Gandhi	In this work, we study the problem of discovering motifs in trajectories based on symbolically transformed representations and context free grammars.
181	LCMKL: latent-community and multi-kernel learning based image annotation	Qing Li, Yun Gu, Xueming Qian	In this paper, we propose a novel approach called latent-community and multi-kernel learning (LCMKL).
182	Random walk-based graphical sampling in unbalanced heterogeneous bipartite social graphs	Yusheng Xie, Zhengzhang Chen, Ankit Agrawal, Alok Choudhary, Lu Liu	We propose random walk-based link sampling and stratified sampling for UHBGs and show that they have advantages over generic random walk samplers.
183	Modeling information diffusion over social networks for temporal dynamic prediction	Dong Li, Zhiming Xu, Yishu Luo, Sheng Li, Anika Gupta, Katia Sycara, Shengmei Luo, Lei Hu, Hong Chen	To address this problem, we propose a novel information diffusion model (GT model), which considers the users in network as intelligent agents.
184	Predicting retweet count using visual cues	Ethem F. Can, Hüseyin Oktay, R. Manmatha	In this study, we focus on predicting the expected retweet count of a tweet by using visual cues of an image linked in that tweet in addition to content and structure-based features.
185	Identifying multilingual Wikipedia articles based on cross language similarity and activity	Khoi-Nguyen Tran, Peter Christen	In this poster, we propose similarity and activity measures of Wikipedia articles across two languages: English and German.
186	An efficient algorithm for approximate betweenness centrality computation	Mostafa Haghir Chehreghani	In this paper, we propose a generic randomized framework for unbiased approximation of betweenness centrality.
187	Exploiting collaborative filtering techniques for automatic assessment of student free-text responses	Tao Ge, Zhifang Sui, Baobao Chang	Unlike some conventional methods which assess student responses based only on information about their corresponding questions, this paper exploits the idea of collaborative filtering to analyze student responses and uses an effective collaborative filtering model, a feature-based matrix factorization model, to deal with this challenge.
188	Automated probabilistic modeling for relational data	Sameer Singh, Thore Graepel	Instead of requiring a domain expert to specify the probabilistic dependencies of the data, we present an approach that uses the relational DB schema to automatically construct a Bayesian graphical model for a database.
189	Semantic discovery from web comparison queries	Tingting Zhong, Wensheng Wu	We present a novel snowballing algorithm that "crawls" comparison queries from search engines via their query autocompletion services.
190	Joint learning on sentiment and emotion classification	Wei Gao, Shoushan Li, Sophia Yat Mei Lee, Guodong Zhou, Chu-Ren Huang	In this paper, we address joint learning on sentiment and emotion classification where both the labeled data for sentiment and emotion classification are available.
191	A unified graph model for personalized query-oriented reference paper recommendation	Fanqi Meng, Dehong Gao, Wenjie Li, Xu Sun, Yuexian Hou	In this paper, we propose a unified graph model that can easily incorporate various types of useful information (e.g., content, authorship, citation and collaboration networks etc.) for efficient recommendation.
192	Probabilistic latent class models for predicting student performance	Suleyman Cetintas, Luo Si, Yan Ping Xin, Ron Tzur	This paper proposes a set of novel probabilistic latent class models for the task.
193	Timeline adaptation for text classification	Fumiyo Fukumoto, Yoshimi Suzuki, Atsuhiro Takasu	In this paper, we address the text classification problem in which the test data was created in a different time period from the training data, and present a method for text classification based on temporal adaptation.
194	Recommendation via user’s personality and social contextual	He Feng, Xueming Qian	In this paper, three social factors, personal interest, interpersonal interest similarity and interpersonal influence, fuse into a unified personalized recommendation model based on probabilistic matrix factorization.
195	A fast convergence clustering algorithm merging MCMC and EM methods	David Sergio Matusevich, Carlos Ordonez, Veerabhadran Baladandayuthapani	In this article, we tackle two fundamental conflicting goals: Finding higher quality solutions and achieving faster convergence.
196	Discrimination aware classification for imbalanced datasets	Goce Ristanoski, Wei Liu, James Bailey	Once the discrimination sensitive attribute is identified, the methods aim to develop a strategy that will include the useful information from that attribute without causing any additional discrimination.
197	Incremental shared nearest neighbor density-based clustering	Sumeet Singh, Amit Awekar	We propose an incremental extension to this algorithm, IncSNN-DBSCAN, capable of finding clusters on a dataset to which frequent inserts are made.
198	The essence of knowledge (bases) through entity rankings	Evica Ilieva, Sebastian Michel, Aleksandar Stupar	We consider the task of automatically phrasing and computing top-k rankings over the information contained in common knowledge bases (KBs), such as YAGO or DBPedia.
199	Chinese syntactic parsing based on linguistic entity-relationship model	Dechun Yin	In this paper, we present a new parsing method for Chinese based on a newly proposed linguistic entity relationship model.
200	Clustering-based anomaly detection in multi-view data	Alejandro Marcos Alvarez, Makoto Yamada, Akisato Kimura, Tomoharu Iwata	This paper proposes a simple yet effective anomaly detection method for multi-view data.
201	Discovering relations using matrix factorization methods	Ervina Cergani, Pauli Miettinen	In this work we propose the use of matrix factorization methods instead of clustering.
202	On exploiting content and citations together to compute similarity of scientific papers	Masoud Reyhani Hamedani, Sang-Wook Kim, Sang-Chul Lee, Dong-Jin Kim	In this paper, we propose a novel approach called SimCC that effectively combines the content and citation information to accurately compute the similarity of scientific papers.
203	Taxonomy-based regression model for cross-domain sentiment classification	Cong-Kai Lin, Yang-Yin Lee, Chi-Hsin Yu, Hsin-Hsi Chen	To select an appropriate source node for training in the domain taxonomy, we propose a Taxonomy-Based Regression Model (TBRM) which predicts the accuracy loss from multiple source nodes to a target node using the tree-structured domain representation combined with domain similarity and domain complexity.
204	Reconciliation of categorical opinions from multiple sources	Adway Mitra, Srujana Merugu	To address this, we propose a generic Bayesian framework for opinion reconciliation that can readily incorporate latent and observed attributes of sources and subjects.
205	An unsupervised transfer learning approach to discover topics for online reputation management	Tamara Martín-Wanton, Julio Gonzalo, Enrique Amigó	In this paper, we present a new technique to cluster a collection of tweets emitted within a short time span about a specific entity.
206	Discovering facts with boolean tensor tucker decomposition	Dora Erdos, Pauli Miettinen	We consider the presentation of the problem as a Boolean tensor decomposition as one of this paper’s main contributions.
207	Intelligent SSD: a turbo for big data mining	Duck-Ho Bae, Jin-Hyung Kim, Sang-Wook Kim, Hyunok Oh, Chanik Park	This paper introduces the notion of intelligent SSDs.
208	Software plagiarism detection: a graph-based approach	Dong-Kyu Chae, Jiwoon Ha, Sang-Wook Kim, BooJoong Kang, Eul Gyu Im	In this paper, we propose a software plagiarism detection system using an API-labeled control flow graph (A-CFG) that abstracts the functionalities of a program.
209	Objectionable content filtering by click-through data	Lung-Hao Lee, Yen-Cheng Juan, Hsin-Hsi Chen, Yuen-Hsien Tseng	This paper explores users’ browsing intents to predict the category of a user’s next access during web surfing, and applies the results to objectionable content filtering.
210	Computational advertising: the LinkedIn way	Deepak Agarwal	In this talk, I will give an overview of machine learning and optimization components that power LinkedIn self-serve display advertising systems.
211	Automatic ad format selection via contextual bandits	Liang Tang, Romer Rosales, Ajit Singh, Deepak Agarwal	To balance exploration with exploitation, we pose automatic layout selection as a contextual bandit problem.
212	Graph similarity search with edit distance constraint in large graph databases	Weiguo Zheng, Lei Zou, Xiang Lian, Dong Wang, Dongyan Zhao	In this paper, we study the problem of graph similarity search, which retrieves graphs that are similar to a given query graph under the constraint of the minimum edit distance.
213	Fast and scalable reachability queries on graphs by pruned labeling with landmarks and paths	Yosuke Yano, Takuya Akiba, Yoichi Iwata, Yuichi Yoshida	In this paper, we propose new labeling-based methods for reachability queries, referred to as pruned landmark labeling and pruned path labeling.
214	Graph hashing and factorization for fast graph stream classification	Ting Guo, Lianhua Chi, Xingquan Zhu	In this paper, we propose a fine-grained graph factorization approach for Fast Graph Stream Classification (FGSC).
215	Efficiently anonymizing social networks with reachability preservation	Xiangyu Liu, Bin Wang, Xiaochun Yang	In this paper, we solve this problem by designing a reachability preserving anonymization (RPA for short) algorithm.
216	ImG-complex: graph data model for topology of unstructured meshes	Alireza Rezaei Mahdiraji, Peter Baumann, Guntram Berti	In this paper, we propose the Incidence multi-Graph Complex (ImG-Complex) data model for storing topological aspects of meshes in a database.
217	ROU: advanced keyword search on graph	Yifan Pan, Yuqing Wu	In this paper, we formally define a new type of keyword search query, ROU-query, which takes as input keywords in three categories: required, optional and unwanted, and returns as output sets of nodes in the data graph whose neighborhood satisfies the keyword requirements.
218	Hotness-aware buffer management for flash-based hybrid storage systems	Yanfei Lv, Bin Cui, Xuexuan Chen, Jing Li	In this paper, we propose a novel approach to manage the buffer in flash-based hybrid storage systems, named Hotness Aware Hit (HAT).
219	Expedited rating of data stores using agile data loading techniques	Sumita Barahmand, Shahram Ghandeharizadeh	This paper presents several agile data loading techniques to expedite the rating process.
220	Social recommendation incorporating topic mining and social trust analysis	Tong Zhao, Chunping Li, Mengya Li, Qiang Ding, Li Li	We propose a probabilistic matrix factorization (TTMF) algorithm and try to enhance the recommendation accuracy by utilizing the estimated topic-specific social trust relations.
221	Originator or propagator?: incorporating social role theory into topic models for Twitter content analysis	Xin Wayne Zhao, Jinpeng Wang, Yulan He, Jian-Yun Nie, Xiaoming Li	In this paper, we propose a method inspired by Social Role Theory (SRT), which assumes that a user behaves differently with different roles in the generation process of Twitter content.
222	An effective latent networks fusion based model for event recommendation in offline ephemeral social networks	Guoqiong Liao, Yuchen Zhao, Sihong Xie, Philip S. Yu	An effective latent networks fusion based model for event recommendation in offline ephemeral social networks
223	Predicting trends in social networks via dynamic activeness model	Shuyang Lin, Xiangnan Kong, Philip S. Yu	In this paper, we study the problem of predicting dynamic trends in social networks.
224	Dyadic event attribution in social networks with mixtures of hawkes processes	Liangda Li, Hongyuan Zha	In this paper we focus on the problem of dyadic event attribution, an important missing data problem in dyadic event modeling where one needs to infer the missing actor-pairs of a subset of dyadic events based on their observed timestamps.
225	Modeling temporal effects of human mobile behavior on location-based social networks	Huiji Gao, Jiliang Tang, Xia Hu, Huan Liu	We propose a general framework to exploit and model temporal cyclic patterns and their relationships with spatial and social data.
226	Social media news communities: gatekeeping, coverage, and statement bias	Diego Saez-Trumper, Carlos Castillo, Mounia Lalmas	To that end, we introduce unsupervised methods considering three types of biases: selection or “gatekeeping” bias, coverage bias, and statement bias, characterizing each one through a series of metrics.
227	Discovering health-related knowledge in social media using ensembles of heterogeneous features	Suppawong Tuarob, Conrad S. Tucker, Marcel Salathe, Nilam Ram	These systems mostly employ traditional document classification techniques that represent a document with a bag of N-grams.
228	Seeking provenance of information using social media	Pritam Gundecha, Zhuo Feng, Huan Liu	In this paper, we are studying a novel research problem that facilitates the seeking of the provenance of information for a few known recipients (less than 1% of the total recipients) by recovering the paths it has taken from its originators.
229	Compact explanatory opinion summarization	Hyun Duk Kim, Malu Castellanos, Meichun Hsu, ChengXiang Zhai, Umeshwar Dayal, Riddhiman Ghosh	In this paper, we propose a novel opinion summarization problem called compact explanatory opinion summarization (CEOS) which aims to extract within-sentence explanatory text segments from input opinionated texts to help users better understand the detailed reasons of sentiments. We create new data sets and use a new evaluation measure to evaluate CEOS.
230	Towards an enhanced and adaptable ontology by distilling and assembling online encyclopedias	Shan Jiang, Lidong Bing, Yan Zhang	In this paper, we investigate the problem of making better use of semantic knowledge obtained from different encyclopedia sources.
231	Assessing sparse information extraction using semantic contexts	Peipei Li, Haixun Wang, Hongsong Li, Xindong Wu	In this work, we introduce a lightweight, explicit semantic approach for sparse information extraction.
232	Studying from electronic textbooks	Rakesh Agrawal, Sreenivas Gollapudi, Anitha Kannan, Krishnaram Kenthapadi	We propose a novel reader model for textbooks and an algorithm for generating the study navigator based on this model.
233	Generating informative snippet to maximize item visibility	Mahashweta Das, Habibur Rahman, Gautam Das, Vagelis Hristidis	We investigate the problem of finding the top-k best snippets for an item that are likely to maximize the probability that the user preference (available in the form of search query) is satisfied.
234	Assessing quality score of Wikipedia article using mutual evaluation of editors and texts	Yu Suzuki, Masatoshi Yoshikawa	In this paper, we propose a method for assessing quality scores of Wikipedia articles by mutually evaluating editors and texts.
235	Concept-based analysis of scientific literature	Chen-Tse Tsai, Gourab Kundu, Dan Roth	To reach this goal, we propose an unsupervised bootstrapping algorithm for identifying and categorizing mentions of concepts.
236	On sampling the wisdom of crowds: random vs. expert sampling of the twitter stream	Saptarshi Ghosh, Muhammad Bilal Zafar, Parantapa Bhattacharya, Naveen Sharma, Niloy Ganguly, Krishna Gummadi	In this paper, we investigate the crucial question of how to sample the data generated by users in social networks.
237	Can back-of-the-book indexes be automatically created?	Zhaohui Wu, Zhenhui Li, Prasenjit Mitra, C. Lee Giles	Inspired by how human indexers work on back-of-the-book indexes creation, we present a new domain-independent, corpus-free and training-free automation approach.
238	Directing exploratory search with interactive intent modeling	Tuukka Ruotsalo, Jaakko Peltonen, Manuel Eugster, Dorota Głowacka, Ksenia Konyushkova, Kumaripaba Athukorala, Ilkka Kosunen, Aki Reijonen, Petri Myllymäki, Giulio Jacucci, Samuel Kaski	We introduce interactive intent modeling, where the user directs exploratory search by providing feedback for estimates of search intents.
239	FRec: a novel framework of recommending users and communities in social media	Lei Li, Wei Peng, Saurabh Kataria, Tong Sun, Tao Li	In this paper, we propose a framework of recommending users and communities in social media.
240	Permutation indexing: fast approximate retrieval from large corpora	Maxim Gurevich, Tamás Sarlós	In this work we propose an alternative technique, permutation indexing, where retrieval cost is strictly bounded and has only logarithmic dependence on the corpus size.
241	Clustering-based transduction for learning a ranking model with limited human labels	Xin Zhang, Ben He, Tiejian Luo, Dongxing Li, Jungang Xu	To this end, we propose to incorporate a two-step k-means clustering algorithm to select the high quality training queries for generating the pseudo labels.
242	Exploiting ranking factorization machines for microblog retrieval	Runwei Qiang, Feng Liang, Jianwu Yang	In this paper, we propose a Ranking Factorization Machine (Ranking FM) model, which applies Factorization Machine model to microblog ranking on basis of pairwise classification.
243	Learning compact hashing codes for efficient tag completion and prediction	Qifan Wang, Lingyun Ruan, Zhiwei Zhang, Luo Si	This paper proposes a novel efficient Hashing approach for Tag Completion and Prediction (HashTCP).
244	How do users grow up along with search engines?: a study of long-term users’ behavior	Jian Liu, Yiqun Liu, Min Zhang, Shaoping Ma	In this paper we look into the interaction logs of these two user groups to analyze differences between these two user groups and to better understand how users grow up along with Web search engines.
245	LR-PPR: locality-sensitive, re-use promoting, approximate personalized pagerank computation	Jung Hyun Kim, K. Selçuk Candan, Maria Luisa Sapino	In this paper, we propose a Locality-sensitive, Re-use promoting, approximate personalized PageRank (LR-PPR) algorithm for efficiently computing the PPR values relying on the localities of the given seed nodes on the graph: (a) The LR-PPR algorithm is locality sensitive in the sense that it reduces the computational cost of the PPR computation process by focusing on the local neighborhoods of the seed nodes.
246	Multimedia summarization for trending topics in microblogs	Jingwen Bian, Yang Yang, Tat-Seng Chua	In this paper, we propose a multimedia microblog summarization framework to automatically generate visualized summaries for trending topics.
247	Semi-supervised discriminative preference elicitation for cold-start recommendation	Xi Zhang, Jian Cheng, Ting Yuan, Biao Niu, Hanqing Lu	In this paper, we propose a novel framework to mine the most valuable items to construct query set using a semi-supervised discriminative selection (SSDS) model.
248	Exploiting query term correlation for list caching in web search engines	Jiancong Tong, Gang Wang, Douglas S. Stones, Shizhao Sun, Xiaoguang Liu, Fan Zhang	We propose an inverted list caching policy, based on the Least Recently Used method, in which the co-occurring correlation between terms in the query stream is accounted for when deciding on which terms to keep in the cache.
249	Speller performance prediction for query autocorrection	Alexey Baytin, Irina Galinskaya, Marina Panina, Pavel Serdyukov	In this paper we define the problem of speller performance prediction and apply it to the task of query spelling autocorrection.
250	Predicting the impact of expansion terms using semantic and user interaction features	Anton Bakhtin, Yury Ustinovskiy, Pavel Serdyukov	Predicting the impact of expansion terms using semantic and user interaction features
251	QBEES: query by entity examples	Steffen Metzger, Ralf Schenkel, Marcin Sydow	We present QBEES, a novel framework for defining entity similarity based only on structural features, so-called aspects, of the entities, that includes query-dependent and query-independent entity ranking components.
252	Learning to selectively rank patients’ medical history	Nut Limsopatham, Craig Macdonald, Iadh Ounis	In this work, we propose a novel supervised approach that can effectively identify when to use either of the two aforementioned patient ranking approaches to attain effective retrieval performance.
253	A belief propagation approach for detecting shilling attacks in collaborative filtering	Jun Zou, Faramarz Fekri	In this paper, we develop a probabilistic inference framework that further exploits the target items for attack detection.
254	Automated snippet generation for online advertising	Stamatina Thomaidou, Ismini Lourentzou, Panagiotis Katsivelis-Perakis, Michalis Vazirgiannis	In this paper, we propose a method that produces in an automated manner compact text ads (promotional text snippets), given as input a product description webpage (landing page).
255	Detecting controversy on the web	Shiri Dori-Hacohen, James Allan	We explore the feasibility of solving the problem by treating it as supervised k-nearest-neighbor classification.
256	Mining user interest from search tasks and annotations	Sampath Jayarathna, Atish Patra, Frank Shipman	In this paper, we introduce UIMaP: User Interest Modeling and Personalization, a search task based personal user interest model to support users’ information gathering tasks.
257	Generating comparative summaries from reviews	Ruben Sipos, Thorsten Joachims	To facilitate direct comparisons between different products, we present an approach to constructing short and comparative summaries based on product reviews.
258	Zero-shot video retrieval using content and concepts	Jeffrey Dalton, James Allan, Pranav Mirajkar	In this work we introduce a new method for automatically identifying relevant concepts given a text query using the Markov Random Field (MRF) retrieval framework.
259	Diversified query expansion using conceptnet	Arbi Bouchoucha, Jing He, Jian-Yun Nie	For this purpose, we investigate a new approach to SRD by diversifying the query.
260	An empirical study of top-n recommendation for venture finance	Thomas Stone, Weinan Zhang, Xiaoxue Zhao	This paper concerns the task of top-N investment opportunity recommendation in the domain of venture finance.
261	Interest mining from user tweets	Thuy Vu, Victor Perez	We build a system to extract user interests from Twitter messages.
262	An analysis of crowd workers mistakes for specific and complex relevance assessment task	Jesse Anderton, Maryam Bashir, Virgil Pavlu, Javed A. Aslam	Since most crowdsourcing approaches submitted to the TREC 2012 track produced assessment sets nowhere close to the expert judgements, we decided to analyze crowdsourcing mistakes made on this task using data we collected via Amazon’s Mechanical Turk service.
263	Combining prestige and relevance ranking for personalized recommendation	Xiao Yang, Zhaoxin Zhang	In this paper, we present an adaptive graph-based personalized recommendation method based on combining prestige and relevance ranking.
264	Strategies for setting time-to-live values in result caches	Fethi Burak Sazoglu, B. Barla Cambazoglu, Rifat Ozcan, Ismail Sengor Altingovde, Özgür Ulusoy	In this work, we evaluate the performance of three alternative TTL mechanisms: time-based TTL, frequency-based TTL, and click-based TTL.
265	Learning to detect task boundaries of query session	Zhenzhong Zhang, Le Sun, Xianpei Han	In this paper we learn hidden topics from query log and leverage them to resolve the vocabulary gap problem.
266	Early prediction on imbalanced multivariate time series	Guoliang He, Yong Duan, Tieyun Qian, Xu Chen	To deal with this issue, we adopt a multiple under-sampling and dynamical subspace generation method to obtain initial training data, and each training data is used to learn a base learner.
267	Exploiting trustors as well as trustees in trust-based recommendation	Won-Seok Hwang, Shaoyu Li, Sang-Wook Kim, Ho Jin Choi	In this paper, we investigate this possibility by identifying and adding these users to the existing methods when predicting ratings for the target user.
268	Through-the-looking glass: utilizing rich post-search trail statistics for web search	Alexey Tolstikov, Mikhail Shakhray, Gleb Gusev, Pavel Serdyukov	We conduct a large-scale study and evaluation of a rich set of search trail features in realistic settings and conclude that a deeper investigation of a user’s experience far beyond her click on the result page has the potential to improve the existing ranking models.
269	Topical authority propagation on microblogs	Juan Hu, Yi Fang, Archana Godavarthy	We propose a novel Topical Authority Propagation (TAP) model by utilizing the fact that topical authority can be propagated through retweeting, i.e., if a user’s tweet on a given topic is retweeted by a topical authority, that user is likely to be an authority on the topic as well.
270	The importance of being socially-savvy: quantifying the influence of social networks on microblog retrieval	Alexander Kotov, Eugene Agichtein	In this work, we quantitatively evaluate the influence of social networks on social media content providers.
271	Flexible and dynamic compromises for effective recommendations	Saurabh Gupta, Sutanu Chakraborti	In this paper, we propose a way to realize the notion of compromise in a conversational setting.
272	The online revolution: education for everyone	Andrew Ng	In this talk, I’ll report on this far-reaching experiment in education, and why we believe this model can provide both an improved classroom experience for our on-campus students, via a flipped classroom model, as well as a meaningful learning experience for the millions of students around the world who would otherwise never have access to education of this quality.
273	Online learning from streaming data	Jeff Hawkins	In this talk I will describe recent advances in brain theory and how we have applied those advances to machine-generated streaming data.
274	From big data to big knowledge	Kevin Murphy	In this talk, I will survey some of the efforts we are engaged in to try to "grow" KG automatically using machine learning methods.
275	"All roads lead to Rome": optimistic recovery for distributed iterative data processing	Sebastian Schelter, Stephan Ewen, Kostas Tzoumas, Volker Markl	We propose an optimistic recovery mechanism using algorithmic compensations.
276	Optimizing plurality for human intelligence tasks	Luyi Mo, Reynold Cheng, Ben Kao, Xuan S. Yang, Chenghui Ren, Siyu Lei, David W. Cheung, Eric Lo	We propose a dynamic programming (DP) algorithm for solving the plurality assignment problem (PAP).
277	Entropy-based histograms for selectivity estimation	Hien To, Kuorong Chiang, Cyrus Shahabi	Therefore, we propose effective models to quantitatively measure bias and selectivity based on information entropy.
278	Efficient two-party private blocking based on sorted nearest neighborhood clustering	Dinusha Vatsalan, Peter Christen, Vassilios S. Verykios	We introduce a novel two-party private blocking technique for PPRL based on sorted nearest neighborhood clustering.
279	Context-aware top-K processing using views	Silviu Maniu, Bogdan Cautis	We present algorithms that address these two problems, and illustrate their practical use in two important application scenarios: location-aware search and social-aware search.
280	Locality sensitive hashing revisited: filling the gap between theory and algorithm analysis	Hongya Wang, Jiao Cao, LihChyun Shu, Davood Rafiei	In this paper, we show that a surprising gap exists between the LSH theory and widely practiced algorithm analysis techniques.
281	Personalization of web-search using short-term browsing context	Yury Ustinovskiy, Pavel Serdyukov	In this paper we study the problem of short-term personalization.
282	Factors affecting aggregated search coherence and search behavior	Jaime Arguello, Robert Capra, Wan-Ching Wu	We build upon this work and address three outstanding research questions about aggregated search coherence: (1) Does the same "spill-over" effect generalize to other verticals besides images?
283	Improving passage ranking with user behavior information	Weize Kong, Elif Aktolga, James Allan	In this paper, we study how user behavior information implies section relevance, and use this information to improve section ranking.
284	Personalized models of search satisfaction	Ahmed Hassan, Ryen W. White	In this paper we verify that searcher behavior when satisfied and dissatisfied is indeed different among individual searchers along a number of dimensions.
285	Beyond clicks: query reformulation as a predictor of search satisfaction	Ahmed Hassan, Xiaolin Shi, Nick Craswell, Bill Ramsey	Using a large unlabeled dataset, a labeled dataset of queries and a labeled dataset of user tasks, we analyze the relationship between these signals.
286	Unsupervised identification of synonymous query intent templates for attribute intents	Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai	In this work we address the problem of identifying synonymous query intent templates for the attribute intent.
287	Toward advice mining: conditional random fields for extracting advice-revealing text units	Alfan Farizki Wicaksono, Sung-Hyon Myaeng	In this paper, we address the problem of advice-revealing text unit (ATU) extraction from online forums due to its usefulness in travel domain.
288	Information extraction as a filtering task	Henning Wachsmuth, Benno Stein, Gregor Engels	In this paper, we hence propose and evaluate a model and a formal approach that consistently put the filtering view in the focus: Before spending annotation effort, filter those portions of the input texts that may contain relevant information for filling a template and discard the others.
289	Web news extraction via path ratios	Gongqing Wu, Li Li, Xuegang Hu, Xindong Wu	In this paper, we present Content Extraction via Path Ratios (CEPR) – a fast, accurate and general on-line method for distinguishing news content from non-news content by the TPR/ETPR histogram effectively.
290	Lead-lag analysis via sparse co-projection in correlated text streams	Fangzhao Wu, Yangqiu Song, Shixia Liu, Yongfeng Huang, Zhenyu Liu	In this paper, we propose an algorithm that can both detect the correlation and discover the corresponding keywords that trigger the correlation.
291	Adaptive co-training SVM for sentiment classification on tweets	Shenghua Liu, Fuxin Li, Fangtao Li, Xueqi Cheng, Huawei Shen	Therefore we formally propose an adaptive multiclass SVM model which transfers an initial common sentiment classifier to a topic-adaptive one.
292	On handling textual errors in latent document modeling	Tao Yang, Dongwon Lee	On handling textual errors in latent document modeling
293	Overlapping community detection using seed set expansion	Joyce Jiyoung Whang, David F. Gleich, Inderjit S. Dhillon	In this paper, we propose an efficient overlapping community detection algorithm using a seed set expansion approach.
294	TODMIS: mining communities from trajectories	Siyuan Liu, Shuhui Wang, Kasthuri Jayarajah, Archan Misra, Ramayya Krishnan	To address this limitation, we propose TODMIS: a general framework for Trajectory cOmmunity Discovery using Multiple Information Sources.
295	Archiving the relaxed consistency web	Zhiwu Xie, Herbert Van de Sompel, Jinyang Liu, Johann van Reenen, Ramiro Jordan	We discuss the nature of such quality degradation and propose a few possible remedies.
296	Programming with personalized pagerank: a locally groundable first-order probabilistic logic	William Yang Wang, Kathryn Mazaitis, William W. Cohen	Here we present a first-order probabilistic language which is well-suited to approximate "local" grounding: in particular, every query $Q$ can be approximately grounded with a small graph.
297	Towards faster and better retrieval models for question search	Guangyou Zhou, Yubo Chen, Daojian Zeng, Jun Zhao	In this paper, we propose a faster and better retrieval model for question search by leveraging user chosen category.
298	Nonparametric bayesian multitask collaborative filtering	Sotirios Chatzis	To alleviate these issues, in this paper we propose a novel multitask collaborative filtering approach.
299	Local-to-global semi-supervised feature selection	Mohammed Hindawi, Khalid Benabdeslem	Global and local feature selection have different objectives, nevertheless, in this paper we propose a novel embedded approach which locally weights the variables towards a global feature selection.
300	Intelligently querying incomplete instances for improving classification performance	Karthik Sankaranarayanan, Amit Dhurandhar	In this paper, we propose a novel active feature acquisition technique to tackle this problem of instance completion prevalent in these domains.
301	A probabilistic mixture model for mining and analyzing product search log	Huizhong Duan, ChengXiang Zhai, Jinxing Cheng, Abhishek Gattani	In this paper, we propose a novel probabilistic mixture model for attribute-level analysis of product search logs.
302	Eigenvalues perturbation of integral operator for kernel selection	Yong Liu, Shali Jiang, Shizhong Liao	In this paper, we introduce new kernel selection criteria based on the eigenvalues perturbation of the integral operator.
303	Beyond data: from user information to business value through personalized recommendations and consumer science	Xavier Amatriain	In this invited talk I will discuss the different approaches we follow to deal with these large streams of user data in order to extract information for personalizing our service.
304	Beyond data: from user information to business value through personalized recommendations and consumer science	Xavier Amatriain	In this paper I will discuss the different approaches we follow to deal with these large streams of user data in order to extract information for personalizing our service.
305	Leveraging data to change industry paradigms	Chris Farmer	In this talk, I will discuss how we analyze these trends as venture capitalists and will look at a few case studies of specific companies leveraging data to innovate in their industries.
306	Large-scale deep learning at Baidu	Kai Yu	In this talk, I will walk through some of the latest technology advances of deep learning within Baidu, and discuss the main challenges, e.g., developing effective models for various applications, and scaling up the model training using many GPUs.
307	Wondering why data are missing from query results?: ask conseil why-not	Melanie Herschel	This solution goes beyond simply forming the union of explanations produced by different algorithms and is shown to be able to explain a larger set of missing-answers.
308	Fast evaluation of iceberg pattern-based aggregate queries	Zhian He, Petrie Wong, Ben Kao, Eric Lo, Reynold Cheng	This paper proposes an efficient approach to identify and evaluate iceberg cells of s-cuboids.
309	Top-down keyword query processing on XML data	Junfeng Zhou, Xingmin Zhao, Wei Wang, Ziyang Chen, Jeffrey Xu Yu	In this paper, we propose a generic top-down processing strategy to answer a given keyword query w.r.t. LCA/SLCA/ELCA semantics.
310	Efficient pruning algorithm for top-K ranking on dataset with value uncertainty	Jianwen Chen, Ling Feng	We present the mathematics of deriving the pruning techniques and the corresponding algorithms.
311	Query execution timing: taming real-time anytime queries on multicore processors	Chunyao Song, Zheng Li, Tingjian Ge, Jie Wang	Specifically, we propose two query optimization modes: offline periodic optimization and online optimization.
312	Merged aggregate nearest neighbor query processing in road networks	Weiwei Sun, Chong Chen, Baihua Zheng, Chunan Chen, Liang Zhu, Weimo Liu, Yan Huang	This paper proposes an effective algorithm to process MANN query in road networks based on our pruning strategies.
313	SkyView: a user evaluation of the skyline operator	Matteo Magnani, Ira Assent, Kasper Hornbæk, Mikkel R. Jakobsen, Ken Friis Larsen	Our study investigates the degree to which users understand skyline queries, how they specify query parameters and how they interact with skyline results made available in listings or map-based interfaces.
314	UMicS: from anonymized data to usable microdata	Graham Cormode, Entong Shen, Xi Gong, Ting Yu, Cecilia M. Procopiuc, Divesh Srivastava	In this paper, instead of proposing new privacy mechanisms for data publishing, we consider the whole data release process, from the data owner to the data user.
315	GAPfm: optimal top-n recommendations for graded relevance domains	Yue Shi, Alexandros Karatzoglou, Linas Baltrunas, Martha Larson, Alan Hanjalic	We address the shortcomings of existing approaches by proposing GAPfm, the Graded Average Precision factor model, which is a latent factor model for top-N recommendation in domains with graded relevance data.
316	URL tree: efficient unsupervised content extraction from streams of web documents	Borut Sluban, Miha Grčar	In this work, we focus on content extraction from streams of HTML documents.
317	Estimating document focus time	Adam Jatowt, Ching-Man Au Yeung, Katsumi Tanaka	In this paper, we introduce the problem of estimating focus time of documents.
318	Faceted models of blog feeds	Lifeng Jia, Clement Yu, Weiyi Meng	In this paper we consider personal and official facets.
319	SRbench–a benchmark for soundtrack recommendation systems	Aleksandar Stupar, Sebastian Michel	In this work, a benchmark to evaluate the retrieval performance of soundtrack recommendation systems is proposed.
320	CV-PCR: a context-guided value-driven framework for patent citation recommendation	Sooyoung Oh, Zhen Lei, Wang-Chien Lee, Prasenjit Mitra, John Yen	Based on the insight that patent citations are important information reflecting the value of cited patents to the citing patent, we propose a heterogeneous patent citation-bibliographic network that combines patent citations (reflecting value relation) and bibliographic information (reflecting similarity relation) together.
321	Modeling behavioral factors in interactive information retrieval	Feza Baskaya, Heikki Keskustalo, Kalervo Järvelin	In the present study we aim at assessing the effects of the behavioral factors on retrieval effectiveness.
322	Intent models for contextualising and diversifying query suggestions	Eugene Kharitonov, Craig Macdonald, Pavel Serdyukov, Iadh Ounis	We introduce a contextualisation framework that utilises a short-term context using the user’s behaviour within the current search session, such as the previous query, the documents examined, and the candidate query suggestions that the user has discarded.
323	Building user profiles from topic models for personalised search	Morgan Harvey, Fabio Crestani, Mark J. Carman	In this work we use query logs to build personalised ranking models in which user profiles are constructed based on the representation of clicked documents over a topic space.
324	Transferring knowledge with source selection to learn IR functions on unlabeled collections	Parantapa Goswami, Massih R. Amini, Eric Gaussier	For the transfer step, the relevance information in the source collection is summarized as a grid that provides, for each term frequency and document frequency values of a word in a document, an empirical estimate of the relevance of the document.
325	Understanding how people interact with web search results that change in real-time using implicit feedback	Jin Young Kim, Mark Cramer, Jaime Teevan, Dmitry Lagun	In this paper we compare a traditional search interface with one that dynamically re-ranks and recommends search results as the user interacts with it in order to build a picture of how and when users should be offered dynamically identified relevant content.
326	Facet selection algorithms for web product search	Damir Vandic, Flavius Frasincar, Uzay Kaymak	In this paper, we focus on automatic facet selection, with the goal of minimizing the number of steps needed to find the desired product.
327	Learning deep structured semantic models for web search using clickthrough data	Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, Larry Heck	In this study we strive to develop a series of new latent semantic models with a deep structure that project queries and documents into a common low-dimensional space where the relevance of a document given a query is readily computed as the distance between them.
328	Learning open-domain comparable entity graphs from user search queries	Ziheng Jiang, Lei Ji, Jianwen Zhang, Jun Yan, Ping Guo, Ning Liu	In this paper, we propose a novel solution, which is known as Comparable Entity Graph Mining (CEGM), to learn an open-domain comparable entity graph from the user search queries.
329	RAProp: ranking tweets by exploiting the tweet/user/web ecosystem and inter-tweet agreement	Srijith Ravikumar, Kartik Talamadupula, Raju Balakrishnan, Subbarao Kambhampati	We present a novel ranking method called RAProp, which combines two orthogonal measures of relevance and trustworthiness of a tweet.
330	Incorporating the surfing behavior of web users into pagerank	Shatlyk Ashyralyyev, B. Barla Cambazoglu, Cevdet Aykanat	In this work, we combine these two types of feedback under a hybrid page ranking model in order to alleviate the above-mentioned drawbacks.
331	Question routing to user communities	Aditya Pal, Fei Wang, Michelle X. Zhou, Jeffrey Nichols, Barton A. Smith	In this paper, we consider the novel problem of routing questions to the right community and propose a framework to select the right set of communities for a question.
332	Learning to rank for question routing in community question answering	Zongcheng Ji, Bin Wang	This paper proposes a general framework based on the learning to rank concepts for QR.
333	Re-ranking for joint named-entity recognition and linking	Avirup Sil, Alexander Yates	We present a joint model for NER and EL, called NEREL, that takes a large set of candidate mentions from typical NER systems and a large set of candidate entity links from EL systems, and ranks the candidate mention-entity pairs together to make joint predictions.
334	Identifying salient entities in web pages	Michael Gamon, Tae Yano, Xinying Song, Johnson Apacible, Patrick Pantel	We propose a system that determines the salience of entities within web documents.
335	Recommending tags with a model of human categorization	Paul Seitlinger, Dominik Kowald, Christoph Trattner, Tobias Ley	In this paper we present a recommender approach for social tags derived from ALCOVE, a model of human category learning.
336	Automatically generating descriptions for resources by tag modeling	Bin Bi, Junghoo Cho	In this paper, we present a general framework of selecting a set of k tags as the description for a given resource.
337	Mining characteristic multi-scale motifs in sensor-based time series	Ugo Vespier, Siegfried Nijssen, Arno Knobbe	We propose a method to discover characteristic and potentially overlapping motifs at multiple time scales, taking into account systemic deformations and temporal warping.
338	Efficient forecasting for hierarchical time series	Lars Dannecker, Robert Lorenz, Philipp Rösch, Wolfgang Lehner, Gregor Hackenbroich	To increase the forecasting efficiency for hierarchically organized time series, we introduce a novel forecasting approach that takes advantage of the hierarchical organization.
339	Extraction and integration of web data by end-users	Sudhir Agarwal, Michael Genesereth	In this paper, we present a novel approach that enables end users to easily extract data from web pages while they browse, store it locally in their browser as well as structure, integrate and search such data.
340	pEDM: online-forecasting for smart energy analytics	Lars Dannecker, Philipp Rösch, Ulrike Fischer, Gordon Gaumnitz, Wolfgang Lehner, Gregor Hackenbroich	To solve this issue, we introduce our novel online forecasting process as part of our EDM system called pEDM.
341	An efficient probabilistic framework for multi-dimensional classification	Iyad Batal, Charmgil Hong, Milos Hauskrecht	In this paper, we propose a new probabilistic approach that represents class conditional dependencies in an effective yet computationally efficient way.
342	OMS-TL: a framework of online multiple source transfer learning	Liang Ge, Jing Gao, Aidong Zhang	To achieve this end, in this paper, we propose a new framework of Online Multiple Source Transfer Learning (OMS-TL).
343	Discovering and managing quantitative association rules	Chunyao Song, Tingjian Ge	In this paper, we propose a novel divide and conquer two-phase algorithm, which is guaranteed to find all good rules efficiently.
344	Combining one-class classifiers via meta learning	Eitan Menahem, Lior Rokach, Yuval Elovici	In this work we examine the notion of combining one-class classifiers as an alternative for selecting the best classifier.
345	Scalable bootstrapping for python	Peter Birsinger, Richard Xia, Armando Fox	In this work, we create a new DSEL compiler which instead emits code to run on Spark [16], a distributed processing framework.
346	FIRE: interactive visual support for parameter space-driven rule mining	Abhishek Mukherji, Xika Lin, Jason Whitehouse, Christopher R. Botaish, Elke A. Rundensteiner, Matthew O. Ward	Our user study with 22 subjects establishes the usability and effectiveness of the proposed features and interactions of FIRE using benchmark datasets.
347	Consumer-centric SLA manager for cloud-hosted databases	Liang Zhao, Sherif Sakr, Anna Liu	We present an end-to-end framework for consumer-centric SLA management of virtualized database servers.
348	TerraFly GeoCloud: online spatial data analysis system	Yun Lu, Mingjin Zhang, Tao Li, Chang Liu, Erik Edrosa, Naphtali Rishe	This paper develops an online Spatial Data Analysis System, TerraFly GeoCloud, which enables end users to visualize and analyze spatial data, and to share the analysis results.
349	MetKB: enriching RDF knowledge bases with web entity-attribute tables	Haoqiong Bian, Yueguo Chen, Xiaoyong Du, Xiaolu Zhang	In this paper, we propose a feasible solution that is able to automatically search and rank entity-attribute tables from the Web, and effectively map the extracted tables to the RDF knowledge base with very little manual effort.
350	READFAST: high-relevance search-engine for big text	Michael Gubanov, Anna Pyayt	Here we demonstrate one of the first big-text search engines that leverages the hidden structure of natural language sentences to process user queries and return more relevant search results than standard keyword search.
351	FusionDB: conflict management system for small-science databases	Karim Ibrahim, Nathaniel Selvo, Mohamad El-Rifai, Mohamed Eltabakh	In this paper, we demonstrate the FusionDB system, an extended relational database engine for managing conflicts in small-science databases.
352	GeCo: an online personal data generator and corruptor	Khoi-Nguyen Tran, Dinusha Vatsalan, Peter Christen	We demonstrate GeCo, an online personal data GEnerator and COrruptor that facilitates the creation of realistic personal data ranging from names, addresses, and dates, to social security and credit card numbers, as well as numerical values such as salary or blood pressure.
353	DeExcelerator: a framework for extracting relational data from partially structured documents	Julian Eberius, Christoper Werner, Maik Thiele, Katrin Braunschweig, Lars Dannecker, Wolfgang Lehner	Studying data.gov as an example source for partially structured documents, we present a classification of typical normalization problems.
354	Demonstrating intelligent crawling and archiving of web applications	Muhammad Faheem, Pierre Senellart	We demonstrate here a new approach to Web archival crawling, based on an application-aware helper that drives crawls of Web applications according to their types (especially, according to their content management systems).
355	iNewsBox: modeling and exploiting implicit feedback for building personalized news radio	Yanan Xie, Liang Chen, Kunyang Jia, Lichuan Ji, Jian Wu	This paper presents a mobile application iNewsBox enabling users to listen to news collected from the Internet.
356	SportSense: using motion queries to find scenes in sports videos	Ihab Al Kabary, Heiko Schuldt	We present SportSense, a system for interactive sports video retrieval using sketch-based motion queries.
357	PredictionIO: a distributed machine learning server for practical software development	Simon Chan, Thomas Stone, Kit Pang Szeto, Ka Hou Chan	We present PredictionIO, an open source machine learning server that comes with a step-by-step graphical user interface for developers to (i) evaluate, compare and deploy scalable learning algorithms, (ii) tune hyperparameters of algorithms manually or automatically and (iii) evaluate model training status.
358	Exploring XML data is as easy as using maps	Yong Zeng, Zhifeng Bao, Guoliang Li, Tok Wang Ling	We equip the traditional XML keyword search engine with our new exploration model XMAP, providing users with a novel, interactive way to explore the results and a better user experience.
359	Inside the world’s playlist	Wouter Weerkamp, Manos Tsagkias, Maarten de Rijke	We describe Streamwatchr, a real-time system for analyzing the music listening behavior of people around the world.
360	Detecting and exploring clusters in attributed graphs: a plugin for the gephi platform	Brigitte Boden, Roman Haag, Thomas Seidl	In this paper, we introduce the GC-Viz system, which is implemented as a plugin for the Gephi platform.
361	Cloud Armor: a platform for credibility-based trust management of cloud services	Talal H. Noor, Quan Z. Sheng, Anne H.H. Ngu, Abdullah Alfazi, Jeriel Law	This paper describes Cloud Armor, a platform for credibility-based trust management of cloud services.
362	Human computing games for knowledge acquisition	Sarath Kumar Kondreddi, Peter Triantafillou, Gerhard Weikum	We provide a combined approach that tightly integrates automated extraction techniques with human computing for effective gathering of facts.
363	A tool for assisting provenance search in social media	Suhas Ranganath, Pritam Gundecha, Huan Liu	This paper presents a tool for capturing the propagation network of a given tweet or URL (Uniform Resource Locator) in the Twitter network.
364	SPHINX: rich insights into evidence-hypotheses relationships via parameter space-based exploration	Abhishek Mukherji, Jason Whitehouse, Christopher R. Botaish, Elke A. Rundensteiner, Matthew O. Ward	The computational contributions cover (a.) flexible computational model selection; and (b.) real-time incremental strength computations.
365	Search excavator: the knowledge discovery tool	Dmitri Danilov, Eero Vainikko	We present a knowledge discovery tool Search Excavator (SE) developed for detecting similar words in web documents ranked by overall usage frequency in American English.
366	ESTHETE: a news browsing system to visualize the context and evolution of news stories	Rahul Goyal, Ravee Malla, Amitabha Bagchi, Sameep Mehta, Maya Ramanath	In this paper, we introduce ESTHETE, a system that provides rich context(s) (through what we call personalized flexible context extraction), by preprocessing and storing articles in a structured representation (directed graphs) that makes it easy for the user to explore different contexts.
367	WordSeer: a knowledge synthesis environment for textual data	Aditi Muralidharan, Marti A. Hearst, Christopher Fan	We describe WordSeer, a tool whose goal is to help scholars and analysts discover patterns and formulate and test hypotheses about the contents of text collections, midway between what humanities scholars call a traditional "close read" and the new "distant read" or "culturomics" approach.
368	Channeling the deluge: research challenges for big data and information systems	Paul Bennett, Lee Giles, Alon Halevy, Jiawei Han, Marti Hearst, Jure Leskovec	As a group of experienced researchers in academia and industry, we will present at this panel our visions on what should be the challenging research issues in this promising research frontier and hope to attract heated discussions and debates from the audience.
369	AKBC 2013: third workshop on automated knowledge base construction	Fabian M. Suchanek, Sebastian Riedel, Sameer Singh, Partha P. Talukdar	The AKBC 2013 workshop aims to be a venue of excellence and vision in the area of knowledge base construction.
370	DOLAP 2013 workshop summary	Ladjel Bellatreche, Alfredo Cuzzocrea, Il-Yeol Song	The ACM DOLAP workshop presents research on data warehousing and On-Line Analytical Processing (OLAP).
371	Sixth workshop on exploiting semantic annotations in information retrieval (ESAIR’13)	Paul N. Bennett, Evgeniy Gabrilovich, Jaap Kamps, Jussi Karlgren	Sixth workshop on exploiting semantic annotations in information retrieval (ESAIR’13)
372	2013 international workshop on computational scientometrics: theory and applications	Cornelia Caragea, C. Lee Giles, Lior Rokach, Xiaozhong Liu	2013 international workshop on computational scientometrics: theory and applications
373	Workshop summary for the 2013 international workshop on mining unstructured big data using natural language processing	Xiaozhong Liu, Miao Chen, Ying Ding, Min Song	Workshop summary for the 2013 international workshop on mining unstructured big data using natural language processing
374	CloudDB 2013: fifth international workshop on cloud data management	Feifei Li, Xiaofeng Meng, Fusheng Wang, Cong Yu	The main objective of the workshop is to address the challenges of large scale data management based on the cloud computing infrastructure.
375	DUBMOD13: international workshop on data-driven user behavioral modelling and mining from social media	Jalal Mahmud, Jeffrey Nichols, Michelle X. Zhou, James Caverlee, John O’Donovan	Since mining and understanding user behavior from social media often requires interdisciplinary effort, including machine learning, text mining, human-computer interaction, and social science, our workshop aims to bring together researchers and practitioners from multiple fields to discuss the creation of deeper models of individual users by mining the content that they publish and the social networking behavior that they exhibit.
376	PLEAD 2013: politics, elections and data	Ingmar Weber, Ana-Maria Popescu, Marco Pennacchiotti	The goal of this workshop is to bring together researchers working at the intersection of social network analysis, computational social science and political science, to share and discuss their ideas in a common forum; and to inspire further developments in this growing, fascinating field.
377	DTMBIO 2013: international workshop on data and text mining in biomedical informatics	Atul Butte, Doheon Lee, Hua Xu, Min Song	DTMBIO 13 will be a forum for discussing and exchanging informatics-related techniques and problems in the context of biomedical research.
378	CIKM 2013 workshop on living labs for information retrieval evaluation	Krisztian Balog, David Elsweiler, Evangelos Kanoulas, Liadh Kelly, Mark D. Smucker	CIKM 2013 workshop on living labs for information retrieval evaluation
379	The first workshop on user engagement optimization	Liangjie Hong, Shuang-Hong Yang	Here, we organize the first workshop on the topic of online user engagement optimization, explicitly targeting the topic as a whole and bringing researchers and practitioners together to foster the field.
380	PIKM 2013: the 6th ACM workshop for ph.d. students in information and knowledge management	Fabian M. Suchanek, Anisoara Nica	Like CIKM, the PIKM workshop covers a wide range of topics in the areas of databases, information retrieval and knowledge management.
381	Web-KR 2013: the 4th international workshop on web-scale knowledge representation, retrieval and reasoning	Yi Zeng, Spyros Kotoulas, Zhisheng Huang	This summary introduces the major contributions of accepted papers in the Web-KR 2013 workshop.
382	Data management & analytics for healthcare (DARE 2013)	Ullas Nambiar, Niranjan Thirumale	This workshop is focused on identifying challenges to be overcome for effectively delivering efficient healthcare to the masses.