Paper Digest: CIKM 2013 Highlights
The ACM Conference on Information and Knowledge Management (CIKM) is an annual computer science research conference dedicated to information management and knowledge management.
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights and summaries to quickly get the main idea of each paper.
If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.
Paper Digest Team
team@paperdigest.org
TABLE 1: CIKM 2013 Papers
No. | Title | Authors | Highlight |
---|---|---|---|
1 | Scholarly big data: information extraction and data mining | C. Lee Giles | We discuss scholarly big data challenges, insights, methodologies and applications. |
2 | Applying theory to practice | Ronald Fagin | We present the remarkably simple Threshold Algorithm, which is optimal in an extremely strong sense: optimal not just in the worst case, or in the average case, but in every case! |
3 | Usability in machine learning at scale with graphlab | Carlos Guestrin | In this talk, we will focus on: |
4 | Structured data in web search | Alon Halevy | I will describe some of the efforts we are conducting at Google to collect structured data, filter the high-quality content, and serve it to our users. |
5 | One size does not fit all: multi-granularity search of web forums | Gayatree Ganu, Amélie Marian | In this paper, we address the problem of presenting textual search results in a concise manner to answer user needs. |
6 | Spatial search for K diverse-near neighbors | Gregory Ference, Wang-Chien Lee, Hui-Ju Jung, De-Nian Yang | In this paper, we investigate the problem of searching for the k Diverse-Near Neighbors (kDNNs) in spatial space that is based upon the spatial diversity and proximity of candidate locations to the query point. |
7 | Mining a search engine’s corpus without a query pool | Mingyang Zhang, Nan Zhang, Gautam Das | In this paper, we study how to enable third-party data analytics over a search engine’s corpus without the cooperation of its owner – specifically, by issuing a small number of search queries through the web interface. |
8 | G-tree: an efficient index for KNN search on road networks | Ruicheng Zhong, Guoliang Li, Kian-Lee Tan, Lizhu Zhou | In this paper we study the problem of kNN search on road networks. |
9 | Efficient parsing-based search over structured data | Aditya Parameswaran, Raghav Kaushik, Arvind Arasu | In this paper, we present a suite of efficient algorithms and auxiliary indexes for this problem. |
10 | Graph-of-word and TW-IDF: new approach to ad hoc IR | François Rousseau, Michalis Vazirgiannis | In this paper, we introduce novel document representation (graph-of-word) and retrieval model (TW-IDF) for ad hoc IR. |
11 | Map search via a factor graph model | Qi Zhang, Jihua Kang, Yeyun Gong, Huan Chen, Yaqian Zhou, Xuanjing Huang | In this paper, we propose to connect this task to the semi-structured retrieval problem. |
12 | A phased ranking model for question answering | Rui Liu, Eric Nyberg | We propose an approach that allows each phase in a system to leverage information propagated from preceding phases to inform the ranking decision. |
13 | CRF framework for supervised preference aggregation | Maksims N. Volkovs, Richard S. Zemel | We describe procedures for learning in this model and demonstrate that inference can be done much more efficiently than in analogous models. |
14 | CQArank: jointly model topics and expertise in community question answering | Liu Yang, Minghui Qiu, Swapna Gottipati, Feida Zhu, Jing Jiang, Huiping Sun, Zhong Chen | To tackle this cluster of closely related problems in a principled way, we propose the Topic Expertise Model (TEM), a novel probabilistic generative model with GMM hybrid, to jointly model topics and expertise by integrating a textual content model with link structure analysis. |
15 | Penguins in sweaters, or serendipitous entity search on user-generated content | Ilaria Bordino, Yelena Mejova, Mounia Lalmas | In this work, the content of each data source is represented as an entity network, which is further enriched with metadata about sentiment, writing quality, and topical category. |
16 | Entity-centric document filtering: boosting feature mapping through meta-features | Mianwei Zhou, Kevin Chen-Chuan Chang | Based on the insight that keywords sharing some similar "properties" should have similar importance for their respective entities, we propose a novel concept of meta-feature to map keywords from different entities. |
17 | Structured positional entity language model for enterprise entity retrieval | Chunliang Lu, Lidong Bing, Wai Lam | We investigate the problem of general entity retrieval for enterprise websites. |
18 | Learning relatedness measures for entity linking | Diego Ceccarelli, Claudio Lucchese, Salvatore Orlando, Raffaele Perego, Salvatore Trani | In this paper we address the problem of learning high quality entity relatedness functions. |
19 | Gem-based entity-knowledge maintenance | Bilyana Taneva, Gerhard Weikum | To overcome this limitation and accelerate the maintenance of knowledge bases, we propose an approach that automatically extracts, from the Web, key contents for given input entities. |
20 | Predicting user activity level in social networks | Yin Zhu, Erheng Zhong, Sinno Jialin Pan, Xiao Wang, Minzhe Zhou, Qiang Yang | In this paper, we focus on a fundamental task: to predict a user’s future activity levels in a social network, e.g. weekly activeness, active or inactive. |
21 | On popularity prediction of videos shared in online social networks | Haitao Li, Xiaoqiang Ma, Feng Wang, Jiangchuan Liu, Ke Xu | In this paper, we present an initial study on the popularity prediction of videos propagated in OSNs along friendship links. |
22 | Inferring anchor links across multiple heterogeneous social networks | Xiangnan Kong, Jiawei Zhang, Philip S. Yu | In this paper, we study the problem of anchor link prediction across multiple heterogeneous social networks, i.e., discovering the correspondence among different accounts of the same user. |
23 | Community-based user recommendation in uni-directional social networks | Gang Zhao, Mong Li Lee, Wynne Hsu, Wei Chen, Haoji Hu | In this work, we propose a community-based approach to user recommendation in Twitter-style social networks. |
24 | Personalized influence maximization on social networks | Jing Guo, Peng Zhang, Chuan Zhou, Yanan Cao, Li Guo | In this paper, we study a new problem on social network influence maximization. |
25 | Discovering coherent topics using general knowledge | Zhiyuan Chen, Arjun Mukherjee, Bing Liu, Meichun Hsu, Malu Castellanos, Riddhiman Ghosh | In this paper, we propose a framework to leverage the general knowledge in topic models. |
26 | Spatio-temporal and events based analysis of topic popularity in twitter | Sebastien Ardon, Amitabha Bagchi, Anirban Mahanti, Amit Ruhela, Aaditeshwar Seth, Rudra Mohan Tripathy, Sipat Triukose | We present the first comprehensive characterization of the diffusion of ideas on Twitter, studying more than 5.96 million topics that include both popular and less popular topics. |
27 | Domain-dependent/independent topic switching model for online reviews with numerical ratings | Yasutoshi Ida, Takuma Nakamura, Takashi Matsumoto | We propose a domain-dependent/independent topic switching model based on Bayesian probabilistic modeling for modeling online product reviews that are accompanied with numerical ratings provided by users. |
28 | A partially supervised cross-collection topic model for cross-domain text classification | Yang Bao, Nigel Collier, Anindya Datta | In this paper, we propose a model called Partially Supervised Cross-Collection LDA topic model (PSCCLDA) for cross-domain learning with the purpose of addressing these two issues in a unified way. |
29 | Content coverage maximization on word networks for hierarchical topic summarization | Chi Wang, Xiao Yu, Yanen Li, Chengxiang Zhai, Jiawei Han | We propose a new approach of text modeling via network analysis. |
30 | Mining frequent neighborhood patterns in a large labeled graph | Jialong Han, Ji-Rong Wen | We propose mining a new class of patterns called frequent neighborhood patterns, which is free from the "DCP-intuitiveness" dilemma of mining frequent subgraphs in a single graph. |
31 | A two-phase algorithm for mining sequential patterns with differential privacy | Luca Bonomi, Li Xiong | In this paper, we study the sequential pattern mining problem under the differential privacy framework which provides formal and provable guarantees of privacy. |
32 | Mining diabetes complication and treatment patterns for clinical decision support | Lu Liu, Jie Tang, Yu Cheng, Ankit Agrawal, Wei-keng Liao, Alok Choudhary | In this paper, we investigate how to utilize the heterogeneous medical records to aid the clinical treatments of diabetes mellitus. |
33 | Mining-based compression approach of propositional formulae | Said Jabbour, Lakhdar Sais, Yakoub Salhi, Takeaki Uno | In this paper, we propose a first application of data mining techniques to propositional satisfiability. |
34 | Correlating medical-dependent query features with image retrieval models using association rules | Hajer Ayadi, Mouna Torjmen, Mariam Daoud, Maher Ben Jemaa, Jimmy Xiangji Huang | In this paper, we propose a novel approach for finding correlations between medical query features and retrieval models based on association rule mining. |
35 | Local correlation detection with linearity enhancement in streaming data | Qing Xie, Shuo Shang, Bo Yuan, Chaoyi Pang, Xiangliang Zhang | This paper proposes effective methods to continuously detect the correlation between data streams. |
36 | Efficient processing of streaming graphs for evolution-aware clustering | Mindi Yuan, Kun-Lung Wu, Gabriela Jacques-Silva, Yi Lu | In this paper, we present an efficient approach to processing streaming graphs for evolution-aware clustering (EAC) of vertices. |
37 | Searching similar segments over textual event sequences | Liang Tang, Tao Li, Shu-Ching Chen, Shunzhi Zhu | In this paper, we propose a method, suffix matrix, for efficiently searching similar segments over textual event sequences. |
38 | RWS-Diff: flexible and efficient change detection in hierarchical data | Jan P. Finis, Martin Raiber, Nikolaus Augsten, Robert Brunel, Alfons Kemper, Franz Färber | We propose the random walks similarity (RWS) measure which can be used to find similar subtrees rapidly. |
39 | Causality and responsibility: probabilistic queries revisited in uncertain databases | Xiang Lian, Lei Chen | To efficiently process CR-PNN queries, we propose effective pruning strategies to quickly filter out false alarms, and design efficient algorithms to obtain CR-PNN answers. |
40 | Locality sensitive hashing for scalable structural classification and clustering of web documents | Christian Hachenberg, Thomas Gottron | We introduce a novel technique to support these two tasks: template fingerprints. |
41 | An index for efficient semantic full-text search | Hannah Bast, Björn Buchhold | In this paper we present a novel index data structure tailored towards semantic full-text search. |
42 | Load-sensitive selective pruning for distributed search | Daniele Broccolo, Craig Macdonald, Salvatore Orlando, Iadh Ounis, Raffaele Perego, Fabrizio Silvestri, Nicola Tonellotto | In this paper, we propose and evaluate a different approach, where, given a set of different query processing strategies with differing efficiency, each query is considered by a framework that sets a maximum query processing time and selects which processing strategy is the best for that query, such that the processing time for all queries is kept below the threshold. |
43 | Rank-energy selective query forwarding for distributed search systems | Amin Teymorian, Ophir Frieder, Marcus A. Maloof | We present a hybrid rank-energy query forwarding model termed "RESQ." |
44 | Augmenting web search surrogates with images | Robert Capra, Jaime Arguello, Falk Scholer | In this paper, we present results of two large-scale user studies to examine the effects of augmenting text-based surrogates with images extracted from the underlying webpage. |
45 | Building a large-scale corpus for evaluating event detection on twitter | Andrew J. McMinn, Yashar Moshfeghi, Joemon M. Jose | In this paper, we propose a methodology for the creation of an event detection corpus. Specifically, we first create a new corpus that covers a period of 4 weeks and contains over 120 million tweets, which we make available for research. |
46 | On sparsity and drift for effective real-time filtering in microblogs | M-Dyaa Albakour, Craig Macdonald, Iadh Ounis | In this paper, we approach the problem of real-time filtering in the Twitter Microblogging platform. |
47 | Probabilistic solutions of influence propagation on social networks | Miao Zhang, Chunni Dai, Chris Ding, Enhong Chen | In this paper, we emphasize the probabilistic nature of influence propagation. |
48 | Improving pseudo-relevance feedback via tweet selection | Taiki Miyanishi, Kazuhiro Seki, Kuniaki Uehara | To overcome the limitation of pseudo-relevance feedback for microblog search, we propose a novel query expansion method based on two-stage relevance feedback that models search interests by manual tweet selection and integration of lexical and temporal evidence into its relevance model. |
49 | Supporting exploratory people search: a study of factor transparency and user control | Shuguang Han, Daqing He, Jiepu Jiang, Zhen Yue | In this project, we developed PeopleExplorer, an interactive people search system to support exploratory search tasks when looking for people. |
50 | Location prediction in social media based on tie strength | Jeffrey McGee, James Caverlee, Zhiyuan Cheng | We propose a novel network-based approach for location estimation in social media that integrates evidence of the social tie strength between users for improved location estimation. |
51 | To stay or not to stay: modeling engagement dynamics in social graphs | Fragkiskos D. Malliaros, Michalis Vazirgiannis | In this paper, we build upon recent work in the field of game theory, where the behavior of individuals (nodes) is modeled by a technology adoption game. |
52 | UNIK: unsupervised social network spam detection | Enhua Tan, Lei Guo, Songqing Chen, Xiaodong Zhang, Yihong Zhao | UNIK: unsupervised social network spam detection |
53 | Modeling dynamics of meta-populations with a probabilistic approach: global diffusion in social media | Minkyoung Kim, David Newth, Peter Christen | In this paper, we propose a macro-level diffusion model with a probabilistic approach by combining both heterogeneity and structural connectivity of social networks. |
54 | Diffusion of innovations revisited: from social network to innovation network | Xin Rong, Qiaozhu Mei | In this paper, we take a formal quantitative approach to address how different pieces of innovations socialize with each other and how the interrelationships among innovations affect users’ adoption behavior, which provides a novel perspective of understanding the diffusion of innovations. |
55 | StaticGreedy: solving the scalability-accuracy dilemma in influence maximization | Suqi Cheng, Huawei Shen, Junming Huang, Guoqing Zhang, Xueqi Cheng | Motivated by this critical finding, we propose a static greedy algorithm, named StaticGreedy, to strictly guarantee the submodularity of influence spread function during the seed selection process. |
56 | Online multitasking and user engagement | Janette Lehmann, Mounia Lalmas, Georges Dupret, Ricardo Baeza-Yates | In this paper, we study the effect of online multitasking on two widely used engagement metrics designed to capture users’ browsing behavior with a site. |
57 | PATRIC: a parallel algorithm for counting triangles in massive networks | Shaikh Arifuzzaman, Maleq Khan, Madhav Marathe | In this paper, we present an efficient MPI-based distributed memory parallel algorithm, called PATRIC, for counting triangles in massive networks. |
58 | An efficient MapReduce algorithm for counting triangles in a very large graph | Ha-Myung Park, Chin-Wan Chung | In this paper, we propose a new algorithm based on graph partitioning with a novel idea of triangle classification to count the number of triangles in a graph. |
59 | Parallel motif extraction from very long sequences | Majed Sahli, Essam Mansour, Panos Kalnis | This paper presents ACME, a combinatorial method for extracting motifs from a single very long sequence. |
60 | The logical diversity of explanations in OWL ontologies | Samantha Bail, Bijan Parsia, Ulrike Sattler | In this paper, we introduce and explore several equivalence relations over justifications for entailments of OWL ontologies which partition a set of justifications into structurally similar subsets. |
61 | Ontology authoring with FORZA | C. Maria Keet, Muhammad Tahir Khan, Chiara Ghidini | We solve this with a generic approach and realize it with the Foundational Ontology and Reasoner-enhanced axiomatiZAtion (FORZA) method, containing DOLCE, a decision diagram for DOLCE categories, part-whole relations, and an automated reasoner that is used during the authoring process to propose feasible axioms. |
62 | Aligning freebase with the YAGO ontology | Elena Demidova, Iryna Oelze, Wolfgang Nejdl | In this paper we analyze the structure of YAGO in more depth and show how to match YAGO and Freebase categories. |
63 | PIDGIN: ontology alignment using web text as interlingua | Derry Wijaya, Partha Pratim Talukdar, Tom Mitchell | We present a novel approach to this ontology alignment problem that employs a very large natural language text corpus as an interlingua to relate different knowledge bases (KBs). |
64 | Mapping adaptation actions for the automatic reconciliation of dynamic ontologies | Julio Cesar Dos Reis, Duy Dinh, Cédric Pruski, Marcos Da Silveira, Chantal Reynaud-Delaître | In this article, we propose a set of mapping adaptation actions and present how they are used to maintain mappings up-to-date based on ontology change operations of different nature. |
65 | On mining mobile apps usage behavior for predicting apps usage in smartphones | Zhung-Xun Liao, Yi-Chin Pan, Wen-Chih Peng, Po-Ruey Lei | In this paper, we propose two selection algorithms, MaxProb and MinEntropy. |
66 | Ranking fraud detection for mobile apps: a holistic view | Hengshu Zhu, Hui Xiong, Yong Ge, Enhong Chen | To this end, in this paper, we provide a holistic view of ranking fraud and propose a ranking fraud detection system for mobile Apps. |
67 | AnchorMF: towards effective event context identification | Hansu Gu, Mike Gartrell, Liang Zhang, Qin Lv, Dirk Grunwald | In this work, we have developed AnchorMF, a matrix factorization based technique that aims to identify event context by leveraging a prevalent feature in OSNs, the anchor information. |
68 | How the live web feels about events | George Valkanas, Dimitrios Gunopulos | In this paper, we focus on the problem of automatically identifying events as they occur, in such a user-driven, fast paced and voluminous setting. |
69 | Boolean satisfiability for sequence mining | Said Jabbour, Lakhdar Sais, Yakoub Salhi | In this paper, we propose a SAT-based encoding for the problem of discovering frequent, closed and maximal patterns in a sequence of items and a sequence of itemsets. |
70 | Users versus models: what observation tells us about effectiveness metrics | Alistair Moffat, Paul Thomas, Falk Scholer | This work explores that link, by analyzing the assumptions and implications of a number of effectiveness metrics, and exploring how these relate to observable user behaviors. |
71 | Evaluating aggregated search using interleaving | Aleksandr Chuklin, Anne Schuth, Katja Hofmann, Pavel Serdyukov, Maarten de Rijke | We propose an interleaving algorithm that allows comparisons of search engine result pages containing grouped vertical documents. |
72 | Using historical click data to increase interleaving sensitivity | Eugene Kharitonov, Craig Macdonald, Pavel Serdyukov, Iadh Ounis | In this paper we propose a novel approach to further improve interleaving sensitivity by using pre-experimental user behaviour data. |
73 | On the reliability and intuitiveness of aggregated search metrics | Ke Zhou, Mounia Lalmas, Tetsuya Sakai, Ronan Cummins, Joemon M. Jose | In this paper, we compare the properties of existing AS metrics under the assumptions that (1) queries may have multiple preferred verticals; (2) the likelihood of each vertical preference is available; and (3) the topical relevance assessments of results returned from each vertical is available. |
74 | User intent and assessor disagreement in web search evaluation | Gabriella Kazai, Emine Yilmaz, Nick Craswell, S.M.M. Tahaghoghi | In this paper, we examine the relationship between assessor disagreement and various click based measures, such as click preference strength and user intent similarity, for judgments collected from editorial judges and crowd workers using single absolute, pairwise absolute and pairwise preference based judging methods. |
75 | The water filling model and the cube test: multi-dimensional evaluation for professional search | Jiyun Luo, Christopher Wing, Hui Yang, Marti Hearst | This paper proposes a 3D water filling model to describe this search process, and derives a new evaluation metric, the Cube Test, to encompass the complex nature of professional search. |
76 | Disinformation techniques for entity resolution | Steven Euijong Whang, Hector Garcia-Molina | We formalize the problem of finding the disinformation with the highest benefit given a limited budget for creating the disinformation and propose efficient algorithms for solving the problem. |
77 | Location recommendation for out-of-town users in location-based social networks | Gregory Ference, Mao Ye, Wang-Chien Lee | In this paper, we study the issues in making location recommendations for out-of-town users by taking into account user preference, social influence and geographical proximity. |
78 | Short text classification by detecting information path | Shitao Zhang, Xiaoming Jin, Dou Shen, Bin Cao, Xuetao Ding, Xiaochen Zhang | We propose a method to detect the information path and employ it in short text classification. |
79 | Personalized point-of-interest recommendation by mining users’ preference transition | Xin Liu, Yong Liu, Karl Aberer, Chunyan Miao | In this work, we propose a novel category-aware POI recommendation model, which exploits the transition patterns of users’ preference over location categories to improve location recommendation accuracy. |
80 | Proximity | Jannik Strötgen, Michael Gertz | In this paper, we present a new model to rank documents according to combined textual, temporal, and geographic queries. |
81 | Timely crawling of high-quality ephemeral new content | Damien Lefortier, Liudmila Ostroumova, Egor Samosvat, Pavel Serdyukov | We thus propose a new metric, well thought out for this task, which takes into account the decrease of user interest for ephemeral pages over time. |
82 | LearNext: learning to predict tourists movements | Ranieri Baraglia, Cristina Ioana Muntean, Franco Maria Nardini, Fabrizio Silvestri | In this paper, we tackle the problem of predicting the "next" geographical position of a tourist given her history (i.e., the prediction is done accordingly to the tourist’s current trail) by means of supervised learning techniques, namely Gradient Boosted Regression Trees and Ranking SVM. |
83 | Where shall we go today?: planning touristic tours with tripbuilder | Igo Brilhante, Jose Antonio Macedo, Franco Maria Nardini, Raffaele Perego, Chiara Renso | In this paper we propose TripBuilder, a new framework for personalized touristic tour planning. |
84 | Efficient filtering and ranking schemes for finding inclusion dependencies on the web | Atsuyuki Morishima, Erika Yumiya, Masami Takahashi, Shigeo Sugimoto, Hiroyuki Kitagawa | In this paper, we address the problem of finding inclusion dependencies on the Web. |
85 | A generic front-stage for semi-stream processing | M. Asif Naeem, Gerald Weber, Gillian Dobbie, Christof Lutteroth | We propose a caching approach that can be used as a front-stage for different semi-stream join algorithms, resulting in significant performance gains for common applications. |
86 | Scalable diversification of multiple search results | Hina A. Khan, Marina Drosou, Mohamed A. Sharaf | In this paper, we address the concurrent diversification of multiple search results using various approximation techniques that provide orders of magnitude reductions in processing cost, while maintaining comparable quality of diversification as compared to sequential methods. |
87 | Parallel triangle counting in massive streaming graphs | Kanat Tangwongsan, A. Pavan, Srikanta Tirthapura | This paper presents the design and implementation of a fast parallel algorithm for estimating the number of triangles in a massive undirected graph whose edges arrive as a stream. |
88 | Cache refreshing for online social news feeds | Xiao Bai, Flavio P. Junqueira, Adam Silberstein | We propose a novel cache scheme called SOCR (Social Online Cache Refreshing) for identifying and refreshing cache entries. |
89 | A new operator for efficient stream-relation join processing in data streaming engines | Roozbeh Derakhshan, Abdul Sattar, Bela Stantic | In this paper, we propose a new SRJ operator to facilitate SRJ processing regardless of the cache performance using two techniques: batching and out-of-order processing. |
90 | SCISSOR: scalable and efficient reachability query processing in time-evolving hierarchies | Phani Rohit Mullangi, Lakshmish Ramaswamy | In this paper, we propose SCISSOR (selective snapshot indexing with progressive solution refinement), which, to the best of our knowledge, is the first time- and space-efficient framework for answering reachability queries in time-evolving hierarchies (TEHs). |
91 | Towards metric fusion on multi-view data: a cross-view based graph random walk approach | Yang Wang, Xuemin Lin, Qing Zhang | In this paper, we propose a novel Metric Fusion technique via cross-view graph Random Walk, named MFRW, regarding a multi-view based similarity graphs (with each similarity graph constructed under each view). |
92 | Discovering latent blockmodels in sparse and noisy graphs using non-negative matrix factorisation | Jeffrey Chan, Wei Liu, Andrey Kan, Christopher Leckie, James Bailey, Kotagiri Ramamohanarao | In this paper, we propose a new non-negative matrix factorisation approach that can discover blockmodels in sparse and noisy graphs. |
93 | Understanding the roles of sub-graph features for graph classification: an empirical study perspective | Ting Guo, Xingquan Zhu | One of the most common graph classification approaches is to use sub-graph features to convert graphs into instance-feature representations, so generic learning algorithms can be applied to derive learning models. |
94 | PAGE: a partition aware graph computation engine | Yingxia Shao, Junjie Yao, Bin Cui, Lin Ma | In this paper, we analyse the cost of parallel graph computing systems as well as the relationship between the cost and underlying graph partitioning. |
95 | Active exploration: simultaneous sampling and labeling for large graphs | Meng Fang, Jie Yin, Xingquan Zhu | In this paper, we propose an Active Exploration framework for large graphs where the goal is to carry out network sampling and node labeling at the same time. |
96 | Local clustering in provenance graphs | Peter Macko, Daniel Margo, Margo Seltzer | Local clustering in these graphs, in which we start with a seed vertex and grow a cluster around it, is of paramount importance because it supports critical provenance applications such as identifying semantically meaningful tasks in an object’s history. |
97 | Content-centric flow mining for influence analysis in social streams | Karthik Subbian, Charu Aggarwal, Jaideep Srivastava | In this paper, we propose a fully content-centered model of flow analysis in social network streams, in which the analysis is based on actual content transmissions in the network, rather than a static model of transmission on the edges. First, we introduce the problem of information flow mining in social streams, and then propose a novel algorithm InFlowMine to discover the information flow patterns in the network. |
98 | Labels or attributes?: rethinking the neighbors for collective classification in sparsely-labeled networks | Luke K. McDowell, David W. Aha | We show that these effects are consistent across a range of datasets, learning choices, and inference algorithms, and that using both neighbor attributes and labels often produces the best accuracy. |
99 | Fast parameterless density-based clustering via random projections | Johannes Schneider, Michail Vlachos | We present two fast density-based clustering algorithms based on random projections. |
100 | Mining entity attribute synonyms via compact clustering | Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai, Kuansan Wang | In this work, we propose a novel compact clustering framework to jointly identify synonyms for a set of attribute values. |
101 | Modeling interaction features for debate side clustering | Minghui Qiu, Liu Yang, Jing Jiang | This paper proposes a two-stage solution based on latent variable models: an interaction feature identification stage to mine interaction features from structured debate posts with known sides and reply intentions; and a clustering stage to incorporate interaction features and model the interplay between interactions and sides for debate side clustering. |
102 | Dynamic multi-faceted topic discovery in twitter | Jan Vosecky, Di Jiang, Kenneth Wai-Ting Leung, Wilfred Ng | In this paper, we therefore propose a method for mining multifaceted topics from Twitter streams. |
103 | Mining causal topics in text data: iterative topic modeling with time series feedback | Hyun Duk Kim, Malu Castellanos, Meichun Hsu, ChengXiang Zhai, Thomas Rietz, Daniel Diermeier | We develop a novel general text mining framework for discovering such causal topics from text. |
104 | Navigating the topical structure of academic search results via the Wikipedia category network | Daniil Mirylenka, Andrea Passerini | We propose a novel method of organizing the search results into concise and informative topic hierarchies. |
105 | A multimodal framework for unsupervised feature fusion | Xiaoyi Li, Jing Gao, Hui Li, Le Yang, Rohini K. Srihari | In this paper, we propose a multimodal feature fusion framework which can model any given image-description pair using semantically meaningful features. |
106 | Probabilistic semantic similarity measurements for noisy short texts using Wikipedia entities | Masumi Shirakawa, Kotaro Nakayama, Takahiro Hara, Shojiro Nishio | This paper describes a novel probabilistic method of measuring semantic similarity for real-world noisy short texts like microblog posts. |
107 | Linear-time enumeration of maximal K-edge-connected subgraphs in large networks by random contraction | Takuya Akiba, Yoichi Iwata, Yuichi Yoshida | In this paper, we propose a new method to decompose a graph into maximal k-edge-connected components, based on random contraction of edges. |
108 | External memory K-bisimulation reduction of big graphs | Yongming Luo, George H.L. Fletcher, Jan Hidders, Yuqing Wu, Paul De Bra | In this paper, we present, to our knowledge, the first known I/O efficient solutions for computing the k-bisimulation partition of a massive directed graph, and performing maintenance of such a partition upon updates to the underlying graph. |
109 | Querying graphs with preferences | Valeria Fionda, Giuseppe Pirrò | This paper presents GuLP, a graph query language that enables preferences to be expressed declaratively. |
110 | Network-aware search in social tagging applications: instance optimality versus efficiency | Silviu Maniu, Bogdan Cautis | We propose algorithms that have the potential to scale to current applications. |
111 | A comparison of two physical data designs for interactive social networking actions | Sumita Barahmand, Shahram Ghandeharizadeh, Jason Yap | This paper compares the performance of an SQL solution that implements a relational data model with a document store named MongoDB. |
112 | Community question topic categorization via hierarchical kernelized classification | Wen Chan, Weidong Yang, Jinhui Tang, Jintao Du, Xiangdong Zhou, Wei Wang | We present a hierarchical kernelized classification model for the automatic classification of general questions into their corresponding topic categories in community Question Answering service (cQAs). |
113 | Building structures from classifiers for passage reranking | Aliaksei Severyn, Massimo Nicosia, Alessandro Moschitti | This paper shows that learning to rank models can be applied to automatically learn complex patterns, such as relational semantic structures occurring in questions and their answer passages. |
114 | Uncovering collusive spammers in Chinese review websites | Chang Xu, Jie Zhang, Kuiyu Chang, Chong Long | Empirical analysis, on recently crawled product reviews from a popular Chinese e-commerce website, reveals the failure of many state-of-the-art spam indicators on detecting collusive spammers. |
115 | Towards minimizing the annotation cost of certified text classification | Mossaab Bagdouri, William Webber, David D. Lewis, Douglas W. Oard | Drawing on ideas from statistical power analysis, we present a framework for joint minimization of training and test annotation that maintains the statistical validity of effectiveness estimates, and yields a natural definition of an optimal allocation of annotations to training and test data. |
116 | A heterogenous automatic feedback semi-supervised method for image reranking | Xin-Chao Xu, Xin-Shun Xu, Yafang Wang, Xiaolin Wang | Motivated by this, in this paper, we propose HAFSRerank (Heterogenous Automatic Feedback Semi-supervised Reranking), a method which makes use of both visual and textual features simultaneously during reranking. |
117 | Accurate and scalable nearest neighbors in large networks based on effective importance | Petko Bogdanov, Ambuj Singh | We propose a novel proximity measure for weighted graphs called Effective Importance which incorporates multiple paths between nodes and captures the inherent structural clusters within a network. |
118 | Spatial-temporal query homogeneity for KNN object search on road networks | Ying-Ju Chen, Kun-Ta Chuang, Ming-Syan Chen | In this paper, we explore a new research paradigm, called query homogeneity, to process KNN queries on road networks for online LBS applications. |
119 | Discovering influential authors in heterogeneous academic networks by a co-ranking method | Qinxue Meng, Paul J. Kennedy | Faced with this problem, we propose a co-ranking method to evaluate scientific publications and authors. |
120 | Entity disambiguation in anonymized graphs using graph kernels | Linus Hermansson, Tommi Kerola, Fredrik Johansson, Vinay Jethava, Devdatt Dubhashi | This paper presents a novel method for entity disambiguation in anonymized graphs using local neighborhood structure. |
121 | Estimating the relative utility of networks for predicting user activities | Nina Mishra, Daniel M. Romero, Panayiotis Tsaparas | In this paper, we introduce a new related problem: given a collection of networks, how can we determine the relative importance of each network for predicting user activities? |
122 | Exploring weakly supervised latent sentiment explanations for aspect-level review analysis | Lei Fang, Minlie Huang, Xiaoyan Zhu | In this paper, we explore a new concept for aspect-level review analysis, latent sentiment explanations, which are defined as a set of informative aspect-specific sentences whose polarities are consistent with that of the review. |
123 | Using micro-reviews to select an efficient set of reviews | Thanh-Son Nguyen, Hady W. Lauw, Panayiotis Tsaparas | We propose a novel methodology that brings together these two diverse types of review content, to obtain something that is more than the sum of its parts. |
124 | Automatic construction of domain and aspect specific sentiment lexicons for customer review mining | Juergen Bross, Heiko Ehrig | We propose a novel method that automatically adapts and extends existing lexicons to a specific product domain. |
125 | Wikification via link co-occurrence | Zhiyuan Cai, Kaiqi Zhao, Kenny Q. Zhu, Haixun Wang | In this paper, we present a simple but powerful framework of sense disambiguation using co-occurrences of Wikipedia links in the Wikipedia corpus. |
126 | Manipulation among the arbiters of collective intelligence: how wikipedia administrators mold public opinion | Sanmay Das, Allen Lavoie, Malik Magdon-Ismail | We find a surprisingly large number of editors who change their behavior and begin focusing more on a particular controversial topic once they are promoted to administrator status. |
127 | Robust question answering over the web of linked data | Mohamed Yahya, Klaus Berberich, Shady Elbassuoni, Gerhard Weikum | This paper advocates a new approach that allows questions to be partially translated into relaxed queries, covering the essential but not necessarily all aspects of the user’s input. |
128 | Expertise retrieval in bibliographic network: a topic dominance learning approach | Seyyed Hadi Hashemi, Mahmood Neshati, Hamid Beigy | Motivated by the observation that coauthors rarely contribute to a paper equally, in this paper, we propose a discriminative method to identify the leading authors contributing to a scientific publication. |
129 | Instant foodie: predicting expert ratings from grassroots | Chenhao Tan, Ed H. Chi, David Huffaker, Gueorgi Kossinets, Alexander J. Smola | In this paper, we examine the two different approaches to collecting user ratings of restaurants and explore the question of whether it is possible to reconcile them. |
130 | On segmentation of eCommerce queries | Nish Parikh, Prasad Sriram, Mohammad Al Hasan | In this paper, we present QSEGMENT, a real-life query segmentation system for eCommerce queries. |
131 | Scientific articles recommendation | Yingming Li, Ming Yang, Zhongfei (Mark) Zhang | We study the problem of recommending scientific articles to users in an online community and present a novel matrix factorization model, the topic regression Matrix Factorization (tr-MF), to solve the problem. |
132 | MRPacker: an SQL to mapreduce optimizer | Xuelian Lin, Yue Ye, Shuai Ma | In this paper, we propose MRPacker, a novel SQL-to-MapReduce optimizer by (a) using a set of transformation rules to reduce the number of MapReduce jobs, and (b) merging MapReduce jobs in a more reasonable way. |
133 | A hybrid approach for privacy-preserving processing of knn queries in mobile database systems | Shixin Tian, Ying Cai, Qinghua Zheng | In this paper, we present a hybrid approach that mitigates the above dilemma. |
134 | Flexible and extensible generation and corruption of personal data | Peter Christen, Dinusha Vatsalan | We present a sophisticated data generation and corruption tool that allows the creation of various types of data, ranging from names and addresses, dates, social security and credit card numbers, to numerical values such as salary or blood pressure. |
135 | An efficient and robust privacy protection technique for massive streaming choice-based information | Ji Zhang, Xuemei Liu, Yonglong Luo | In this paper, we focus on the streaming choice-based information and propose a novel anonymization technique for providing a strong privacy protection to safeguard against privacy disclosure and information tampering. |
136 | RCached-tree: an index structure for efficiently answering popular queries | Manash Pal, Arnab Bhattacharya, Debjyoti Paul | In this paper, we propose RCached-tree, belonging to the family of R-trees, that aims to solve this problem. |
137 | Label constrained shortest path estimation | Ankita Likhyani, Srikanta Bedathur | In this paper, we develop SkIt index structure, which supports a wide range of label constraints on paths, and returns an accurate estimation of the shortest path that satisfies the constraints. |
138 | Feature-based models for improving the quality of noisy training data for relation extraction | Benjamin Roth, Dietrich Klakow | We propose and evaluate two feature-based models for increasing the quality of distant supervision extraction patterns. |
139 | Weighted hashing for fast large scale similarity search | Qifan Wang, Dan Zhang, Luo Si | This paper proposes a novel method, named Weighted Hashing (WeiHash), to assign different weights to different hashing bits. |
140 | Term associations in query expansion: a structural linguistic perspective | Michael Symonds, Guido Zuccon, Bevan Koopman, Peter Bruza, Laurianne Sitbon | Given the reliance on word meanings when a user formulates their query, our approach takes the novel step of modelling both syntagmatic and paradigmatic associations within the query expansion process based on the (pseudo) relevant documents returned in web search. |
141 | Predicting event-relatedness of popular queries | Seyyedeh Newsha Ghoreishi, Aixin Sun | In this paper, we identify 20 features including both contextual and temporal features from a small set of search results of a query and predict its event-relatedness. |
142 | Modeling latent topic interactions using quantum interference for information retrieval | Alessandro Sordoni, Jing He, Jian-Yun Nie | In this paper, we investigate the use of the well-known wave-like phenomenon of Quantum Interference for topic models such as Latent Dirichlet Allocation (LDA). |
143 | Generalizing diversity detection in blog feed retrieval | Mostafa Keikha, Fabio Crestani, Bruce Croft | In this paper, we propose a blog-level diversity measure where there is no assumption made about the underlying blog-ranking technique. |
144 | Dynamic query intent mining from a search log stream | Yanan Qian, Tetsuya Sakai, Junting Ye, Qinghua Zheng, Cong Li | We propose a method for mining dynamic query intents from search query logs. |
145 | Latency-aware strategy for static list caching in flash-based web search engines | Jiancong Tong, Gang Wang, Xiaoguang Liu | Based on the observation that the speed gap between random access and sequential access is much less pronounced on a flash-based solid state drive than on a magnetic hard disk drive, we introduce a new static list caching algorithm which takes the block-level access latency into consideration. |
146 | Bootstrapping active name disambiguation with crowdsourcing | Yu Cheng, Zhengzhang Chen, Jiang Wang, Ankit Agrawal, Alok Choudhary | To efficiently acquire labeled data, we propose a bootstrapping algorithm for the name disambiguation task based on active learning and crowdsourced labeling. |
147 | Modeling clicks beyond the first result page | Aleksandr Chuklin, Pavel Serdyukov, Maarten de Rijke | We propose a modification of the Dynamic Bayesian Network (DBN) click model by explicitly including into the model the probability of transition between result pages. |
148 | Maintaining discriminatory power in quantized indexes | Matt Crane, Andrew Trotman, Richard O’Keefe | We observe a relationship between the collection size and ideal quantization size, and provide a way to determine the number of bits to use from the collection size. |
149 | Retrieving opinions from discussion forums | Laura Dietz, Ziqi Wang, Samuel Huston, W. Bruce Croft | In this short paper, we test a range of existing techniques for forum retrieval and develop new retrieval models to differentiate between opinionated and factual forum posts. |
150 | Retrieval of trending keywords in a peer-to-peer micro-blogging OSN | H. Asthana, Ingemar Cox | We propose a two step solution. |
151 | Trustable aggregation of online ratings | Hyun-Kyo Oh, Sang-Wook Kim, Sunju Park, Ming Zhou | In this paper, we define false reputation as the problem of a reputation being manipulated by unfair ratings, and design a general framework that provides trustable reputation. |
152 | Exploiting proximity feature in statistical translation models for information retrieval | Xinhui Tu, Jing Luo, Bo Li, Tingting He, Maofu Liu | In this paper, we study how to explicitly incorporate proximity information into the existing translation language model, and propose a proximity-based translation language model, called TM-P, with three variants. |
153 | Position-based contextualization for passage retrieval | David Carmel, Anna Shtok, Oren Kurland | We present a novel contextualization approach for passage retrieval. |
154 | High throughput filtering using FPGA-acceleration | Wim Vanderbauwhede, Anton Frolov, Leif Azzopardi, Sai Rahul Chalamalasetti, Martin Margala | In this paper, we develop an energy-efficient, high-performance information filtering system that is capable of classifying a stream of incoming documents at high speed. |
155 | On challenges with mobile e-health: lessons from a game-theoretic perspective | Ann-Marie Eklund | This paper highlights some possibilities and benefits of a theoretic framework, based on existing works on game-theoretic treatments of information retrieval and communication, to allow for both descriptive and predictive analysis of internet-based health communication. |
156 | Improving entity search over linked data by modeling latent semantics | Nikita Zhiltsov, Eugene Agichtein | In this paper, we propose a principled and scalable approach for integrating latent semantic information into a learning-to-rank model, by combining a compact representation of semantic similarity, achieved by using a modified algorithm for tensor factorization, with explicit entity information. |
157 | Challenges in commerce search | Hugh Williams | In this talk, we discuss what makes commerce search hard, how eBay has solved some of these problems, and what challenges eBay faces in the next generation of its search technologies. |
158 | Clustering: probably approximately useless? | Rich Caruana | How do we fix this and make clustering more useful in practice? |
159 | Is top-k sufficient for ranking? | Yanyan Lan, Shuzi Niu, Jiafeng Guo, Xueqi Cheng | In this paper, we propose to study this problem from both empirical and theoretical aspects. |
160 | How fresh do you want your search results? | Shiwen Cheng, Anastasios Arvanitis, Vagelis Hristidis | In this work, we focus on this class of queries that we refer to as "timely queries". |
161 | TellMyRelevance!: predicting the relevance of web search results from cursor interactions | Maximilian Speicher, Andreas Both, Martin Gaedke | We introduce TellMyRelevance! |
162 | Selection fusion in semi-structured retrieval | Muhammad Ali Norozi, Paavo Arvola | Hence we propose a novel type of fusion, selection fusion: a fusion methodology which fuses an all-purpose and comprehensive ranking of elements with a specific selection scheme, and also enables evaluation of the ranking from many selection perspectives. |
163 | Incorporating user preferences into click models | Qianli Xing, Yiqun Liu, Jian-Yun Nie, Min Zhang, Shaoping Ma, Kuo Zhang | As a uniform click model for all users can hardly capture the diverse click behavior, in this paper we incorporate user preferences into both a variety of existing click models and a novel click model. |
164 | Feedback-driven multiclass active learning for data streams | Yu Cheng, Zhengzhang Chen, Lu Liu, Jiang Wang, Ankit Agrawal, Alok Choudhary | In this paper, we present a systematic framework for stream-based multi-class active learning. |
165 | Discriminative feature selection for multi-view cross-domain learning | Zheng Fang, Zhongfei (Mark) Zhang | In this paper, we address this problem and propose a new framework, called DISMUTE, taking advantage of the typically available multiple views of the data in domains. |
166 | Functional dirichlet process | Lijing Qin, Xiaoyan Zhu | We present a general method for constructing dependent Dirichlet processes (DP) on arbitrary covariate space. |
167 | Spatio-temporal meme prediction: learning what hashtags will be popular where | Krishna Y. Kamath, James Caverlee | In this paper, we tackle the problem of predicting what online memes will be popular in what locations. |
168 | Cost-sensitive learning for large-scale hierarchical classification | Jianfu Chen, David Warren | We propose a loss normalization approach to appropriately calibrating the scaling of loss functions, which is applicable to general classification and structured prediction tasks whenever using structured SVM with margin re-scaling. |
169 | Effective measures for inter-document similarity | John S. Whissell, Charles L.A. Clarke | In this work, we extend that result, presenting and evaluating novel inter-document similarity measures based on BM25, language modeling, and divergence from randomness. |
170 | Efficient hierarchical clustering of large high dimensional datasets | Sean Gilpin, Buyue Qian, Ian Davidson | In this paper we explore using angular hashing to hash objects with similar angular distance to the same hash bucket. |
171 | Flexible and adaptive subspace search for outlier analysis | Fabian Keller, Emmanuel Müller, Andreas Wixler, Klemens Böhm | In this work we propose such a flexible and adaptive subspace selection scheme. |
172 | Query matching for report recommendation | Veronika Thost, Konrad Voigt, Daniel Schuster | Targeting large-scale, real-world reporting scenarios, we propose a scalable, index-based query matching approach. |
173 | Computing term similarity by large probabilistic isA knowledge | Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xindong Wu | Therefore, we propose a lightweight and effective approach for semantic similarity using a large scale semantic network automatically acquired from billions of web documents. |
174 | Interactive collaborative filtering | Xiaoxue Zhao, Weinan Zhang, Jun Wang | In this paper, we study collaborative filtering (CF) in an interactive setting, in which a recommender system continuously recommends items to individual users and receives interactive feedback. |
175 | Building optimal information systems automatically: configuration space exploration for biomedical information systems | Zi Yang, Elmer Garduno, Yan Fang, Avner Maiberg, Collin McCormack, Eric Nyberg | We introduce the CSE framework, an extension to the UIMA framework which provides a general distributed solution for building and exploring configuration spaces for information systems. |
176 | Learning to handle negated language in medical records search | Nut Limsopatham, Craig Macdonald, Iadh Ounis | In this paper, we propose a novel learning framework that effectively handles negated language. |
177 | A pattern-based selective recrawling approach for object-level vertical search | Yaqian Zhou, Qi Zhang, Xuanjing Huang, Lide Wu | To deal with this problem, we propose a new hypertext resource discovery method, called “selective recrawling” for object-level vertical search applications. |
178 | Robust models of mouse movement on dynamic web search results pages | Fernando Diaz, Ryen White, Georg Buscher, Dan Liebling | In this work, we develop robust, log-based mouse movement models capable of estimating searcher attention on novel SERP arrangements. |
179 | Cross-domain sparse coding | Jim Jing-Yan Wang, Halima Bensmail | In this paper, we extend sparse coding to the cross-domain learning problem, which tries to transfer what is learned from a source domain to a target domain with a significantly different distribution. |
180 | Motif discovery in spatial trajectories using grammar inference | Tim Oates, Arnold P. Boedihardjo, Jessica Lin, Crystal Chen, Susan Frankenstein, Sunil Gandhi | In this work, we study the problem of discovering motifs in trajectories based on symbolically transformed representations and context free grammars. |
181 | LCMKL: latent-community and multi-kernel learning based image annotation | Qing Li, Yun Gu, Xueming Qian | In this paper, we propose a novel approach called latent-community and multi-kernel learning (LCMKL). |
182 | Random walk-based graphical sampling in unbalanced heterogeneous bipartite social graphs | Yusheng Xie, Zhengzhang Chen, Ankit Agrawal, Alok Choudhary, Lu Liu | We propose random walk-based link sampling and stratified sampling for UHBGs and show that they have advantages over generic random walk samplers. |
183 | Modeling information diffusion over social networks for temporal dynamic prediction | Dong Li, Zhiming Xu, Yishu Luo, Sheng Li, Anika Gupta, Katia Sycara, Shengmei Luo, Lei Hu, Hong Chen | To address this problem, we propose a novel information diffusion model (GT model), which considers the users in network as intelligent agents. |
184 | Predicting retweet count using visual cues | Ethem F. Can, Hüseyin Oktay, R. Manmatha | In this study, we focus on predicting the expected retweet count of a tweet by using visual cues of an image linked in that tweet in addition to content and structure-based features. |
185 | Identifying multilingual Wikipedia articles based on cross language similarity and activity | Khoi-Nguyen Tran, Peter Christen | In this poster, we propose similarity and activity measures of Wikipedia articles across two languages: English and German. |
186 | An efficient algorithm for approximate betweenness centrality computation | Mostafa Haghir Chehreghani | In this paper, we propose a generic randomized framework for unbiased approximation of betweenness centrality. |
187 | Exploiting collaborative filtering techniques for automatic assessment of student free-text responses | Tao Ge, Zhifang Sui, Baobao Chang | Unlike some conventional methods, which assess student responses based only on information about their corresponding questions, this paper exploits the idea of collaborative filtering to analyze student responses and uses an effective collaborative filtering model, the feature-based matrix factorization model, to deal with this challenge. |
188 | Automated probabilistic modeling for relational data | Sameer Singh, Thore Graepel | Instead of requiring a domain expert to specify the probabilistic dependencies of the data, we present an approach that uses the relational DB schema to automatically construct a Bayesian graphical model for a database. |
189 | Semantic discovery from web comparison queries | Tingting Zhong, Wensheng Wu | We present a novel snowballing algorithm that "crawls" comparison queries from search engines via their query autocompletion services. |
190 | Joint learning on sentiment and emotion classification | Wei Gao, Shoushan Li, Sophia Yat Mei Lee, Guodong Zhou, Chu-Ren Huang | In this paper, we address joint learning on sentiment and emotion classification where both the labeled data for sentiment and emotion classification are available. |
191 | A unified graph model for personalized query-oriented reference paper recommendation | Fanqi Meng, Dehong Gao, Wenjie Li, Xu Sun, Yuexian Hou | In this paper, we propose a unified graph model that can easily incorporate various types of useful information (e.g., content, authorship, citation and collaboration networks) for efficient recommendation. |
192 | Probabilistic latent class models for predicting student performance | Suleyman Cetintas, Luo Si, Yan Ping Xin, Ron Tzur | This paper proposes a set of novel probabilistic latent class models for the task. |
193 | Timeline adaptation for text classification | Fumiyo Fukumoto, Yoshimi Suzuki, Atsuhiro Takasu | In this paper, we address the text classification problem in which the test data was created in a different time period from the training data, and present a method for text classification based on temporal adaptation. |
194 | Recommendation via user’s personality and social contextual | He Feng, Xueming Qian | In this paper, three social factors, personal interest, interpersonal interest similarity and interpersonal influence, fuse into a unified personalized recommendation model based on probabilistic matrix factorization. |
195 | A fast convergence clustering algorithm merging MCMC and EM methods | David Sergio Matusevich, Carlos Ordonez, Veerabhadran Baladandayuthapani | In this article, we tackle two fundamental conflicting goals: Finding higher quality solutions and achieving faster convergence. |
196 | Discrimination aware classification for imbalanced datasets | Goce Ristanoski, Wei Liu, James Bailey | Once the discrimination sensitive attribute is identified, the methods aim to develop a strategy that will include the useful information from that attribute without causing any additional discrimination. |
197 | Incremental shared nearest neighbor density-based clustering | Sumeet Singh, Amit Awekar | We propose an incremental extension to this algorithm IncSNN-DBSCAN, capable of finding clusters on a dataset to which frequent inserts are made. |
198 | The essence of knowledge (bases) through entity rankings | Evica Ilieva, Sebastian Michel, Aleksandar Stupar | We consider the task of automatically phrasing and computing top-k rankings over the information contained in common knowledge bases (KBs), such as YAGO or DBPedia. |
199 | Chinese syntactic parsing based on linguistic entity-relationship model | Dechun Yin | In this paper, we present a new parsing method for Chinese based on a newly proposed linguistic entity relationship model. |
200 | Clustering-based anomaly detection in multi-view data | Alejandro Marcos Alvarez, Makoto Yamada, Akisato Kimura, Tomoharu Iwata | This paper proposes a simple yet effective anomaly detection method for multi-view data. |
201 | Discovering relations using matrix factorization methods | Ervina Cergani, Pauli Miettinen | In this work we propose the use of matrix factorization methods instead of clustering. |
202 | On exploiting content and citations together to compute similarity of scientific papers | Masoud Reyhani Hamedani, Sang-Wook Kim, Sang-Chul Lee, Dong-Jin Kim | In this paper, we propose a novel approach called SimCC that effectively combines the content and citation information to accurately compute the similarity of scientific papers. |
203 | Taxonomy-based regression model for cross-domain sentiment classification | Cong-Kai Lin, Yang-Yin Lee, Chi-Hsin Yu, Hsin-Hsi Chen | To select an appropriate source node for training in the domain taxonomy, we propose a Taxonomy-Based Regression Model (TBRM) which predicts the accuracy loss from multiple source nodes to a target node using the tree-structured domain representation combined with domain similarity and domain complexity. |
204 | Reconciliation of categorical opinions from multiple sources | Adway Mitra, Srujana Merugu | To address this, we propose a generic Bayesian framework for opinion reconciliation that can readily incorporate latent and observed attributes of sources and subjects. |
205 | An unsupervised transfer learning approach to discover topics for online reputation management | Tamara Martín-Wanton, Julio Gonzalo, Enrique Amigó | In this paper, we present a new technique to cluster a collection of tweets emitted within a short time span about a specific entity. |
206 | Discovering facts with boolean tensor tucker decomposition | Dora Erdos, Pauli Miettinen | We consider the presentation of the problem as a Boolean tensor decomposition as one of this paper’s main contributions. |
207 | Intelligent SSD: a turbo for big data mining | Duck-Ho Bae, Jin-Hyung Kim, Sang-Wook Kim, Hyunok Oh, Chanik Park | This paper introduces the notion of intelligent SSDs. |
208 | Software plagiarism detection: a graph-based approach | Dong-Kyu Chae, Jiwoon Ha, Sang-Wook Kim, BooJoong Kang, Eul Gyu Im | In this paper, we propose a software plagiarism detection system using an API-labeled control flow graph (A-CFG) that abstracts the functionalities of a program. |
209 | Objectionable content filtering by click-through data | Lung-Hao Lee, Yen-Cheng Juan, Hsin-Hsi Chen, Yuen-Hsien Tseng | This paper explores users’ browsing intents to predict the category of a user’s next access during web surfing, and applies the results to objectionable content filtering. |
210 | Computational advertising: the linkedin way | Deepak Agarwal | In this talk, I will give an overview of machine learning and optimization components that power LinkedIn self-serve display advertising systems. |
211 | Automatic ad format selection via contextual bandits | Liang Tang, Romer Rosales, Ajit Singh, Deepak Agarwal | To balance exploration with exploitation, we pose automatic layout selection as a contextual bandit problem. |
212 | Graph similarity search with edit distance constraint in large graph databases | Weiguo Zheng, Lei Zou, Xiang Lian, Dong Wang, Dongyan Zhao | In this paper, we study the problem of graph similarity search, which retrieves graphs that are similar to a given query graph under the constraint of the minimum edit distance. |
213 | Fast and scalable reachability queries on graphs by pruned labeling with landmarks and paths | Yosuke Yano, Takuya Akiba, Yoichi Iwata, Yuichi Yoshida | In this paper, we propose new labeling-based methods for reachability queries, referred to as pruned landmark labeling and pruned path labeling. |
214 | Graph hashing and factorization for fast graph stream classification | Ting Guo, Lianhua Chi, Xingquan Zhu | In this paper, we propose a fine-grained graph factorization approach for Fast Graph Stream Classification (FGSC). |
215 | Efficiently anonymizing social networks with reachability preservation | Xiangyu Liu, Bin Wang, Xiaochun Yang | In this paper, we solve this problem by designing a reachability preserving anonymization (RPA for short) algorithm. |
216 | ImG-complex: graph data model for topology of unstructured meshes | Alireza Rezaei Mahdiraji, Peter Baumann, Guntram Berti | In this paper, we propose the Incidence multi-Graph Complex (ImG-Complex) data model for storing topological aspects of meshes in a database. |
217 | ROU: advanced keyword search on graph | Yifan Pan, Yuqing Wu | In this paper, we formally define a new type of keyword search query, ROU-query, which takes as input keywords in three categories: required, optional and unwanted, and returns as output sets of nodes in the data graph whose neighborhood satisfies the keyword requirements. |
218 | Hotness-aware buffer management for flash-based hybrid storage systems | Yanfei Lv, Bin Cui, Xuexuan Chen, Jing Li | In this paper, we propose a novel approach to manage the buffer in flash-based hybrid storage systems, named Hotness Aware Hit (HAT). |
219 | Expedited rating of data stores using agile data loading techniques | Sumita Barahmand, Shahram Ghandeharizadeh | This paper presents several agile data loading techniques to expedite the rating process. |
220 | Social recommendation incorporating topic mining and social trust analysis | Tong Zhao, Chunping Li, Mengya Li, Qiang Ding, Li Li | We propose a probabilistic matrix factorization (TTMF) algorithm and try to enhance the recommendation accuracy by utilizing the estimated topic-specific social trust relations. |
221 | Originator or propagator?: incorporating social role theory into topic models for twitter content analysis | Xin Wayne Zhao, Jinpeng Wang, Yulan He, Jian-Yun Nie, Xiaoming Li | In this paper, we propose a method inspired by Social Role Theory (SRT), which assumes that a user behaves differently in different roles during the generation process of Twitter content. |
222 | An effective latent networks fusion based model for event recommendation in offline ephemeral social networks | Guoqiong Liao, Yuchen Zhao, Sihong Xie, Philip S. Yu | An effective latent networks fusion based model for event recommendation in offline ephemeral social networks |
223 | Predicting trends in social networks via dynamic activeness model | Shuyang Lin, Xiangnan Kong, Philip S. Yu | In this paper, we study the problem of predicting dynamic trends in social networks. |
224 | Dyadic event attribution in social networks with mixtures of hawkes processes | Liangda Li, Hongyuan Zha | In this paper we focus on the problem of dyadic event attribution, an important missing data problem in dyadic event modeling where one needs to infer the missing actor-pairs of a subset of dyadic events based on their observed timestamps. |
225 | Modeling temporal effects of human mobile behavior on location-based social networks | Huiji Gao, Jiliang Tang, Xia Hu, Huan Liu | We propose a general framework to exploit and model temporal cyclic patterns and their relationships with spatial and social data. |
226 | Social media news communities: gatekeeping, coverage, and statement bias | Diego Saez-Trumper, Carlos Castillo, Mounia Lalmas | To that end, we introduce unsupervised methods considering three types of biases: selection or “gatekeeping” bias, coverage bias, and statement bias, characterizing each one through a series of metrics. |
227 | Discovering health-related knowledge in social media using ensembles of heterogeneous features | Suppawong Tuarob, Conrad S. Tucker, Marcel Salathe, Nilam Ram | These systems mostly employ traditional document classification techniques that represent a document with a bag of N-grams. |
228 | Seeking provenance of information using social media | Pritam Gundecha, Zhuo Feng, Huan Liu | In this paper, we are studying a novel research problem that facilitates the seeking of the provenance of information for a few known recipients (less than 1% of the total recipients) by recovering the paths it has taken from its originators. |
229 | Compact explanatory opinion summarization | Hyun Duk Kim, Malu Castellanos, Meichun Hsu, ChengXiang Zhai, Umeshwar Dayal, Riddhiman Ghosh | In this paper, we propose a novel opinion summarization problem called compact explanatory opinion summarization (CEOS) which aims to extract within-sentence explanatory text segments from input opinionated texts to help users better understand the detailed reasons of sentiments. We create new data sets and use a new evaluation measure to evaluate CEOS. |
230 | Towards an enhanced and adaptable ontology by distilling and assembling online encyclopedias | Shan Jiang, Lidong Bing, Yan Zhang | In this paper, we investigate the problem of making better use of semantic knowledge obtained from different encyclopedia sources. |
231 | Assessing sparse information extraction using semantic contexts | Peipei Li, Haixun Wang, Hongsong Li, Xindong Wu | In this work, we introduce a lightweight, explicit semantic approach for sparse information extraction. |
232 | Studying from electronic textbooks | Rakesh Agrawal, Sreenivas Gollapudi, Anitha Kannan, Krishnaram Kenthapadi | We propose a novel reader model for textbooks and an algorithm for generating the study navigator based on this model. |
233 | Generating informative snippet to maximize item visibility | Mahashweta Das, Habibur Rahman, Gautam Das, Vagelis Hristidis | We investigate the problem of finding the top-k best snippets for an item that are likely to maximize the probability that the user preference (available in the form of search query) is satisfied. |
234 | Assessing quality score of Wikipedia article using mutual evaluation of editors and texts | Yu Suzuki, Masatoshi Yoshikawa | In this paper, we propose a method for assessing quality scores of Wikipedia articles by mutually evaluating editors and texts. |
235 | Concept-based analysis of scientific literature | Chen-Tse Tsai, Gourab Kundu, Dan Roth | To reach this goal, we propose an unsupervised bootstrapping algorithm for identifying and categorizing mentions of concepts. |
236 | On sampling the wisdom of crowds: random vs. expert sampling of the twitter stream | Saptarshi Ghosh, Muhammad Bilal Zafar, Parantapa Bhattacharya, Naveen Sharma, Niloy Ganguly, Krishna Gummadi | In this paper, we investigate the crucial question of how to sample the data generated by users in social networks. |
237 | Can back-of-the-book indexes be automatically created? | Zhaohui Wu, Zhenhui Li, Prasenjit Mitra, C. Lee Giles | Inspired by how human indexers work on back-of-the-book index creation, we present a new domain-independent, corpus-free and training-free automation approach. |
238 | Directing exploratory search with interactive intent modeling | Tuukka Ruotsalo, Jaakko Peltonen, Manuel Eugster, Dorota Głowacka, Ksenia Konyushkova, Kumaripaba Athukorala, Ilkka Kosunen, Aki Reijonen, Petri Myllymäki, Giulio Jacucci, Samuel Kaski | We introduce interactive intent modeling, where the user directs exploratory search by providing feedback for estimates of search intents. |
239 | FRec: a novel framework of recommending users and communities in social media | Lei Li, Wei Peng, Saurabh Kataria, Tong Sun, Tao Li | In this paper, we propose a framework of recommending users and communities in social media. |
240 | Permutation indexing: fast approximate retrieval from large corpora | Maxim Gurevich, Tamás Sarlós | In this work we propose an alternative technique, permutation indexing, where retrieval cost is strictly bounded and has only logarithmic dependence on the corpus size. |
241 | Clustering-based transduction for learning a ranking model with limited human labels | Xin Zhang, Ben He, Tiejian Luo, Dongxing Li, Jungang Xu | To this end, we propose to incorporate a two-step k-means clustering algorithm to select the high quality training queries for generating the pseudo labels. |
242 | Exploiting ranking factorization machines for microblog retrieval | Runwei Qiang, Feng Liang, Jianwu Yang | In this paper, we propose a Ranking Factorization Machine (Ranking FM) model, which applies the Factorization Machine model to microblog ranking on the basis of pairwise classification. |
243 | Learning compact hashing codes for efficient tag completion and prediction | Qifan Wang, Lingyun Ruan, Zhiwei Zhang, Luo Si | This paper proposes a novel efficient Hashing approach for Tag Completion and Prediction (HashTCP). |
244 | How do users grow up along with search engines?: a study of long-term users’ behavior | Jian Liu, Yiqun Liu, Min Zhang, Shaoping Ma | In this paper we look into the interaction logs of these two user groups to analyze differences between these two user groups and to better understand how users grow up along with Web search engines. |
245 | LR-PPR: locality-sensitive, re-use promoting, approximate personalized pagerank computation | Jung Hyun Kim, K. Selçuk Candan, Maria Luisa Sapino | In this paper, we propose a Locality-sensitive, Re-use promoting, approximate personalized PageRank (LR-PPR) algorithm for efficiently computing the PPR values relying on the localities of the given seed nodes on the graph: (a) The LR-PPR algorithm is locality sensitive in the sense that it reduces the computational cost of the PPR computation process by focusing on the local neighborhoods of the seed nodes. |
246 | Multimedia summarization for trending topics in microblogs | Jingwen Bian, Yang Yang, Tat-Seng Chua | In this paper, we propose a multimedia microblog summarization framework to automatically generate visualized summaries for trending topics. |
247 | Semi-supervised discriminative preference elicitation for cold-start recommendation | Xi Zhang, Jian Cheng, Ting Yuan, Biao Niu, Hanqing Lu | In this paper, we propose a novel framework to mine the most valuable items to construct query set using a semi-supervised discriminative selection (SSDS) model. |
248 | Exploiting query term correlation for list caching in web search engines | Jiancong Tong, Gang Wang, Douglas S. Stones, Shizhao Sun, Xiaoguang Liu, Fan Zhang | We propose an inverted list caching policy, based on the Least Recently Used method, in which the co-occurring correlation between terms in the query stream is accounted for when deciding on which terms to keep in the cache. |
249 | Speller performance prediction for query autocorrection | Alexey Baytin, Irina Galinskaya, Marina Panina, Pavel Serdyukov | In this paper we define the problem of speller performance prediction and apply it to the task of query spelling autocorrection. |
250 | Predicting the impact of expansion terms using semantic and user interaction features | Anton Bakhtin, Yury Ustinovskiy, Pavel Serdyukov | Predicting the impact of expansion terms using semantic and user interaction features |
251 | QBEES: query by entity examples | Steffen Metzger, Ralf Schenkel, Marcin Sydow | We present QBEES, a novel framework for defining entity similarity based only on structural features, so-called aspects, of the entities, that includes query-dependent and query-independent entity ranking components. |
252 | Learning to selectively rank patients’ medical history | Nut Limsopatham, Craig Macdonald, Iadh Ounis | In this work, we propose a novel supervised approach that can effectively identify when to use either of the two aforementioned patient ranking approaches to attain effective retrieval performance. |
253 | A belief propagation approach for detecting shilling attacks in collaborative filtering | Jun Zou, Faramarz Fekri | In this paper, we develop a probabilistic inference framework that further exploits the target items for attack detection. |
254 | Automated snippet generation for online advertising | Stamatina Thomaidou, Ismini Lourentzou, Panagiotis Katsivelis-Perakis, Michalis Vazirgiannis | In this paper, we propose a method that produces in an automated manner compact text ads (promotional text snippets), given as input a product description webpage (landing page). |
255 | Detecting controversy on the web | Shiri Dori-Hacohen, James Allan | We explore the feasibility of solving the problem by treating it as supervised k-nearest-neighbor classification. |
256 | Mining user interest from search tasks and annotations | Sampath Jayarathna, Atish Patra, Frank Shipman | In this paper, we introduce UIMaP: User Interest Modeling and Personalization, a search task based personal user interest model to support users’ information gathering tasks. |
257 | Generating comparative summaries from reviews | Ruben Sipos, Thorsten Joachims | To facilitate direct comparisons between different products, we present an approach to constructing short and comparative summaries based on product reviews. |
258 | Zero-shot video retrieval using content and concepts | Jeffrey Dalton, James Allan, Pranav Mirajkar | In this work we introduce a new method for automatically identifying relevant concepts given a text query using the Markov Random Field (MRF) retrieval framework. |
259 | Diversified query expansion using conceptnet | Arbi Bouchoucha, Jing He, Jian-Yun Nie | For this purpose, we investigate a new approach to SRD by diversifying the query. |
260 | An empirical study of top-n recommendation for venture finance | Thomas Stone, Weinan Zhang, Xiaoxue Zhao | This paper concerns the task of top-N investment opportunity recommendation in the domain of venture finance. |
261 | Interest mining from user tweets | Thuy Vu, Victor Perez | We build a system to extract user interests from Twitter messages. |
262 | An analysis of crowd workers mistakes for specific and complex relevance assessment task | Jesse Anderton, Maryam Bashir, Virgil Pavlu, Javed A. Aslam | Since most crowdsourcing approaches submitted to the TREC 2012 track produced assessment sets nowhere close to the expert judgements, we decided to analyze crowdsourcing mistakes made on this task using data we collected via Amazon’s Mechanical Turk service. |
263 | Combining prestige and relevance ranking for personalized recommendation | Xiao Yang, Zhaoxin Zhang | In this paper, we present an adaptive graph-based personalized recommendation method based on combining prestige and relevance ranking. |
264 | Strategies for setting time-to-live values in result caches | Fethi Burak Sazoglu, B. Barla Cambazoglu, Rifat Ozcan, Ismail Sengor Altingovde, Özgür Ulusoy | In this work, we evaluate the performance of three alternative TTL mechanisms: time-based TTL, frequency-based TTL, and click-based TTL. |
265 | Learning to detect task boundaries of query session | Zhenzhong Zhang, Le Sun, Xianpei Han | In this paper we learn hidden topics from query log and leverage them to resolve the vocabulary gap problem. |
266 | Early prediction on imbalanced multivariate time series | Guoliang He, Yong Duan, Tieyun Qian, Xu Chen | To deal with this issue, we adopt a multiple under-sampling and dynamical subspace generation method to obtain initial training data, and each training set is used to learn a base learner. |
267 | Exploiting trustors as well as trustees in trust-based recommendation | Won-Seok Hwang, Shaoyu Li, Sang-Wook Kim, Ho Jin Choi | In this paper, we investigate this possibility by identifying and adding these users to the existing methods when predicting ratings for the target user. |
268 | Through-the-looking glass: utilizing rich post-search trail statistics for web search | Alexey Tolstikov, Mikhail Shakhray, Gleb Gusev, Pavel Serdyukov | We conduct a large-scale study and evaluation of a rich set of search trail features in realistic settings and conclude that a deeper investigation of a user’s experience far beyond her click on the result page has the potential to improve the existing ranking models. |
269 | Topical authority propagation on microblogs | Juan Hu, Yi Fang, Archana Godavarthy | We propose a novel Topical Authority Propagation (TAP) model by utilizing the fact that topical authority can be propagated through retweeting, i.e., if a user’s tweet on a given topic is retweeted by a topical authority, that user is likely to be an authority on the topic as well. |
270 | The importance of being socially-savvy: quantifying the influence of social networks on microblog retrieval | Alexander Kotov, Eugene Agichtein | In this work, we quantitatively evaluate the influence of social networks on social media content providers. |
271 | Flexible and dynamic compromises for effective recommendations | Saurabh Gupta, Sutanu Chakraborti | In this paper, we propose a way to realize the notion of compromise in a conversational setting. |
272 | The online revolution: education for everyone | Andrew Ng | In this talk, I’ll report on this far-reaching experiment in education, and why we believe this model can provide both an improved classroom experience for our on-campus students, via a flipped classroom model, as well as a meaningful learning experience for the millions of students around the world who would otherwise never have access to education of this quality. |
273 | Online learning from streaming data | Jeff Hawkins | In this talk I will describe recent advances in brain theory and how we have applied those advances to machine-generated streaming data. |
274 | From big data to big knowledge | Kevin Murphy | In this talk, I will survey some of the efforts we are engaged in to try to "grow" KG automatically using machine learning methods. |
275 | "All roads lead to Rome": optimistic recovery for distributed iterative data processing | Sebastian Schelter, Stephan Ewen, Kostas Tzoumas, Volker Markl | We propose an optimistic recovery mechanism using algorithmic compensations. |
276 | Optimizing plurality for human intelligence tasks | Luyi Mo, Reynold Cheng, Ben Kao, Xuan S. Yang, Chenghui Ren, Siyu Lei, David W. Cheung, Eric Lo | We propose a dynamic programming (DP) algorithm for solving the plurality assignment problem (PAP). |
277 | Entropy-based histograms for selectivity estimation | Hien To, Kuorong Chiang, Cyrus Shahabi | Therefore, we propose effective models to quantitatively measure bias and selectivity based on information entropy. |
278 | Efficient two-party private blocking based on sorted nearest neighborhood clustering | Dinusha Vatsalan, Peter Christen, Vassilios S. Verykios | We introduce a novel two-party private blocking technique for PPRL based on sorted nearest neighborhood clustering. |
279 | Context-aware top-K processing using views | Silviu Maniu, Bogdan Cautis | We present algorithms that address these two problems, and illustrate their practical use in two important application scenarios: location-aware search and social-aware search. |
280 | Locality sensitive hashing revisited: filling the gap between theory and algorithm analysis | Hongya Wang, Jiao Cao, LihChyun Shu, Davood Rafiei | In this paper, we show that a surprising gap exists between the LSH theory and widely practiced algorithm analysis techniques. |
281 | Personalization of web-search using short-term browsing context | Yury Ustinovskiy, Pavel Serdyukov | In this paper we study the problem of short-term personalization. |
282 | Factors affecting aggregated search coherence and search behavior | Jaime Arguello, Robert Capra, Wan-Ching Wu | We build upon this work and address three outstanding research questions about aggregated search coherence: (1) Does the same "spill-over" effect generalize to other verticals besides images? |
283 | Improving passage ranking with user behavior information | Weize Kong, Elif Aktolga, James Allan | In this paper, we study how user behavior information implies section relevance, and use this information to improve section ranking. |
284 | Personalized models of search satisfaction | Ahmed Hassan, Ryen W. White | In this paper we verify that searcher behavior when satisfied and dissatisfied is indeed different among individual searchers along a number of dimensions. |
285 | Beyond clicks: query reformulation as a predictor of search satisfaction | Ahmed Hassan, Xiaolin Shi, Nick Craswell, Bill Ramsey | Using a large unlabeled dataset, a labeled dataset of queries and a labeled dataset of user tasks, we analyze the relationship between these signals. |
286 | Unsupervised identification of synonymous query intent templates for attribute intents | Yanen Li, Bo-June Paul Hsu, ChengXiang Zhai | In this work we address the problem of identifying synonymous query intent templates for the attribute intent. |
287 | Toward advice mining: conditional random fields for extracting advice-revealing text units | Alfan Farizki Wicaksono, Sung-Hyon Myaeng | In this paper, we address the problem of advice-revealing text unit (ATU) extraction from online forums due to its usefulness in the travel domain. |
288 | Information extraction as a filtering task | Henning Wachsmuth, Benno Stein, Gregor Engels | In this paper, we hence propose and evaluate a model and a formal approach that consistently put the filtering view in the focus: Before spending annotation effort, filter those portions of the input texts that may contain relevant information for filling a template and discard the others. |
289 | Web news extraction via path ratios | Gongqing Wu, Li Li, Xuegang Hu, Xindong Wu | In this paper, we present Content Extraction via Path Ratios (CEPR) – a fast, accurate and general on-line method for distinguishing news content from non-news content by the TPR/ETPR histogram effectively. |
290 | Lead-lag analysis via sparse co-projection in correlated text streams | Fangzhao Wu, Yangqiu Song, Shixia Liu, Yongfeng Huang, Zhenyu Liu | In this paper, we propose an algorithm that can both detect the correlation and discover the corresponding keywords that trigger the correlation. |
291 | Adaptive co-training SVM for sentiment classification on tweets | Shenghua Liu, Fuxin Li, Fangtao Li, Xueqi Cheng, Huawei Shen | Therefore we formally propose an adaptive multiclass SVM model which transfers an initial common sentiment classifier to a topic-adaptive one. |
292 | On handling textual errors in latent document modeling | Tao Yang, Dongwon Lee | On handling textual errors in latent document modeling |
293 | Overlapping community detection using seed set expansion | Joyce Jiyoung Whang, David F. Gleich, Inderjit S. Dhillon | In this paper, we propose an efficient overlapping community detection algorithm using a seed set expansion approach. |
294 | TODMIS: mining communities from trajectories | Siyuan Liu, Shuhui Wang, Kasthuri Jayarajah, Archan Misra, Ramayya Krishnan | To address this limitation, we propose TODMIS: a general framework for Trajectory cOmmunity Discovery using Multiple Information Sources. |
295 | Archiving the relaxed consistency web | Zhiwu Xie, Herbert Van de Sompel, Jinyang Liu, Johann van Reenen, Ramiro Jordan | We discuss the nature of such quality degradation and propose a few possible remedies. |
296 | Programming with personalized pagerank: a locally groundable first-order probabilistic logic | William Yang Wang, Kathryn Mazaitis, William W. Cohen | Here we present a first-order probabilistic language which is well-suited to approximate "local" grounding: in particular, every query $Q$ can be approximately grounded with a small graph. |
297 | Towards faster and better retrieval models for question search | Guangyou Zhou, Yubo Chen, Daojian Zeng, Jun Zhao | In this paper, we propose a faster and better retrieval model for question search by leveraging user chosen category. |
298 | Nonparametric bayesian multitask collaborative filtering | Sotirios Chatzis | To alleviate these issues, in this paper we propose a novel multitask collaborative filtering approach. |
299 | Local-to-global semi-supervised feature selection | Mohammed Hindawi, Khalid Benabdeslem | Global and local feature selection have different objectives, nevertheless, in this paper we propose a novel embedded approach which locally weights the variables towards a global feature selection. |
300 | Intelligently querying incomplete instances for improving classification performance | Karthik Sankaranarayanan, Amit Dhurandhar | In this paper, we propose a novel active feature acquisition technique to tackle this problem of instance completion prevalent in these domains. |
301 | A probabilistic mixture model for mining and analyzing product search log | Huizhong Duan, ChengXiang Zhai, Jinxing Cheng, Abhishek Gattani | In this paper, we propose a novel probabilistic mixture model for attribute-level analysis of product search logs. |
302 | Eigenvalues perturbation of integral operator for kernel selection | Yong Liu, Shali Jiang, Shizhong Liao | In this paper, we introduce new kernel selection criteria based on the eigenvalues perturbation of the integral operator. |
303 | Beyond data: from user information to business value through personalized recommendations and consumer science | Xavier Amatriain | In this invited talk I will discuss the different approaches we follow to deal with these large streams of user data in order to extract information for personalizing our service. |
304 | Beyond data: from user information to business value through personalized recommendations and consumer science | Xavier Amatriain | In this paper I will discuss the different approaches we follow to deal with these large streams of user data in order to extract information for personalizing our service. |
305 | Leveraging data to change industry paradigms | Chris Farmer | In this talk, I will discuss how we analyze these trends as venture capitalists and will look at a few case studies of specific companies leveraging data to innovate in their industries. |
306 | Large-scale deep learning at Baidu | Kai Yu | In this talk, I will walk through some of the latest technology advances of deep learning within Baidu, and discuss the main challenges, e.g., developing effective models for various applications, and scaling up the model training using many GPUs. |
307 | Wondering why data are missing from query results?: ask conseil why-not | Melanie Herschel | This solution goes beyond simply forming the union of explanations produced by different algorithms and is shown to be able to explain a larger set of missing-answers. |
308 | Fast evaluation of iceberg pattern-based aggregate queries | Zhian He, Petrie Wong, Ben Kao, Eric Lo, Reynold Cheng | This paper proposes an efficient approach to identify and evaluate iceberg cells of s-cuboids. |
309 | Top-down keyword query processing on XML data | Junfeng Zhou, Xingmin Zhao, Wei Wang, Ziyang Chen, Jeffrey Xu Yu | In this paper, we propose a generic top-down processing strategy to answer a given keyword query w.r.t. LCA/SLCA/ELCA semantics. |
310 | Efficient pruning algorithm for top-K ranking on dataset with value uncertainty | Jianwen Chen, Ling Feng | We present the mathematics of deriving the pruning techniques and the corresponding algorithms. |
311 | Query execution timing: taming real-time anytime queries on multicore processors | Chunyao Song, Zheng Li, Tingjian Ge, Jie Wang | Specifically, we propose two query optimization modes: offline periodic optimization and online optimization. |
312 | Merged aggregate nearest neighbor query processing in road networks | Weiwei Sun, Chong Chen, Baihua Zheng, Chunan Chen, Liang Zhu, Weimo Liu, Yan Huang | This paper proposes an effective algorithm to process MANN query in road networks based on our pruning strategies. |
313 | SkyView: a user evaluation of the skyline operator | Matteo Magnani, Ira Assent, Kasper Hornbæk, Mikkel R. Jakobsen, Ken Friis Larsen | Our study investigates the degree to which users understand skyline queries, how they specify query parameters and how they interact with skyline results made available in listings or map-based interfaces. |
314 | UMicS: from anonymized data to usable microdata | Graham Cormode, Entong Shen, Xi Gong, Ting Yu, Cecilia M. Procopiuc, Divesh Srivastava | In this paper, instead of proposing new privacy mechanisms for data publishing, we consider the whole data release process, from the data owner to the data user. |
315 | GAPfm: optimal top-n recommendations for graded relevance domains | Yue Shi, Alexandros Karatzoglou, Linas Baltrunas, Martha Larson, Alan Hanjalic | We address the shortcomings of existing approaches by proposing GAPfm, the Graded Average Precision factor model, which is a latent factor model for top-N recommendation in domains with graded relevance data. |
316 | URL tree: efficient unsupervised content extraction from streams of web documents | Borut Sluban, Miha Grčar | In this work, we focus on content extraction from streams of HTML documents. |
317 | Estimating document focus time | Adam Jatowt, Ching-Man Au Yeung, Katsumi Tanaka | In this paper, we introduce the problem of estimating focus time of documents. |
318 | Faceted models of blog feeds | Lifeng Jia, Clement Yu, Weiyi Meng | In this paper we consider personal and official facets. |
319 | SRbench–a benchmark for soundtrack recommendation systems | Aleksandar Stupar, Sebastian Michel | In this work, a benchmark to evaluate the retrieval performance of soundtrack recommendation systems is proposed. |
320 | CV-PCR: a context-guided value-driven framework for patent citation recommendation | Sooyoung Oh, Zhen Lei, Wang-Chien Lee, Prasenjit Mitra, John Yen | Based on the insight that patent citations are important information reflecting the value of cited patents to the citing patent, we propose a heterogeneous patent citation-bibliographic network that combines patent citations (reflecting value relation) and bibliographic information (reflecting similarity relation) together. |
321 | Modeling behavioral factors in interactive information retrieval | Feza Baskaya, Heikki Keskustalo, Kalervo Järvelin | In the present study we aim at assessing the effects of the behavioral factors on retrieval effectiveness. |
322 | Intent models for contextualising and diversifying query suggestions | Eugene Kharitonov, Craig Macdonald, Pavel Serdyukov, Iadh Ounis | We introduce a contextualisation framework that utilises a short-term context using the user’s behaviour within the current search session, such as the previous query, the documents examined, and the candidate query suggestions that the user has discarded. |
323 | Building user profiles from topic models for personalised search | Morgan Harvey, Fabio Crestani, Mark J. Carman | In this work we use query logs to build personalised ranking models in which user profiles are constructed based on the representation of clicked documents over a topic space. |
324 | Transferring knowledge with source selection to learn IR functions on unlabeled collections | Parantapa Goswami, Massih R. Amini, Eric Gaussier | For the transfer step, the relevance information in the source collection is summarized as a grid that provides, for each term frequency and document frequency values of a word in a document, an empirical estimate of the relevance of the document. |
325 | Understanding how people interact with web search results that change in real-time using implicit feedback | Jin Young Kim, Mark Cramer, Jaime Teevan, Dmitry Lagun | In this paper we compare a traditional search interface with one that dynamically re-ranks and recommends search results as the user interacts with it in order to build a picture of how and when users should be offered dynamically identified relevant content. |
326 | Facet selection algorithms for web product search | Damir Vandic, Flavius Frasincar, Uzay Kaymak | In this paper, we focus on automatic facet selection, with the goal of minimizing the number of steps needed to find the desired product. |
327 | Learning deep structured semantic models for web search using clickthrough data | Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, Larry Heck | In this study we strive to develop a series of new latent semantic models with a deep structure that project queries and documents into a common low-dimensional space where the relevance of a document given a query is readily computed as the distance between them. |
328 | Learning open-domain comparable entity graphs from user search queries | Ziheng Jiang, Lei Ji, Jianwen Zhang, Jun Yan, Ping Guo, Ning Liu | In this paper, we propose a novel solution, which is known as Comparable Entity Graph Mining (CEGM), to learn an open-domain comparable entity graph from the user search queries. |
329 | RAProp: ranking tweets by exploiting the tweet/user/web ecosystem and inter-tweet agreement | Srijith Ravikumar, Kartik Talamadupula, Raju Balakrishnan, Subbarao Kambhampati | We present a novel ranking method called RAProp, which combines two orthogonal measures of relevance and trustworthiness of a tweet. |
330 | Incorporating the surfing behavior of web users into pagerank | Shatlyk Ashyralyyev, B. Barla Cambazoglu, Cevdet Aykanat | In this work, we combine these two types of feedback under a hybrid page ranking model in order to alleviate the above-mentioned drawbacks. |
331 | Question routing to user communities | Aditya Pal, Fei Wang, Michelle X. Zhou, Jeffrey Nichols, Barton A. Smith | In this paper, we consider the novel problem of routing questions to the right community and propose a framework to select the right set of communities for a question. |
332 | Learning to rank for question routing in community question answering | Zongcheng Ji, Bin Wang | This paper proposes a general framework based on the learning to rank concepts for QR. |
333 | Re-ranking for joint named-entity recognition and linking | Avirup Sil, Alexander Yates | We present a joint model for NER and EL, called NEREL, that takes a large set of candidate mentions from typical NER systems and a large set of candidate entity links from EL systems, and ranks the candidate mention-entity pairs together to make joint predictions. |
334 | Identifying salient entities in web pages | Michael Gamon, Tae Yano, Xinying Song, Johnson Apacible, Patrick Pantel | We propose a system that determines the salience of entities within web documents. |
335 | Recommending tags with a model of human categorization | Paul Seitlinger, Dominik Kowald, Christoph Trattner, Tobias Ley | In this paper we present a recommender approach for social tags derived from ALCOVE, a model of human category learning. |
336 | Automatically generating descriptions for resources by tag modeling | Bin Bi, Junghoo Cho | In this paper, we present a general framework of selecting a set of k tags as the description for a given resource. |
337 | Mining characteristic multi-scale motifs in sensor-based time series | Ugo Vespier, Siegfried Nijssen, Arno Knobbe | We propose a method to discover characteristic and potentially overlapping motifs at multiple time scales, taking into account systemic deformations and temporal warping. |
338 | Efficient forecasting for hierarchical time series | Lars Dannecker, Robert Lorenz, Philipp Rösch, Wolfgang Lehner, Gregor Hackenbroich | To increase the forecasting efficiency for hierarchically organized time series, we introduce a novel forecasting approach that takes advantage of the hierarchical organization. |
339 | Extraction and integration of web data by end-users | Sudhir Agarwal, Michael Genesereth | In this paper, we present a novel approach that enables end users to easily extract data from web pages while they browse, store it locally in their browser as well as structure, integrate and search such data. |
340 | pEDM: online-forecasting for smart energy analytics | Lars Dannecker, Philipp Rösch, Ulrike Fischer, Gordon Gaumnitz, Wolfgang Lehner, Gregor Hackenbroich | To solve this issue, we introduce our novel online forecasting process as part of our EDM system called pEDM. |
341 | An efficient probabilistic framework for multi-dimensional classification | Iyad Batal, Charmgil Hong, Milos Hauskrecht | In this paper, we propose a new probabilistic approach that represents class conditional dependencies in an effective yet computationally efficient way. |
342 | OMS-TL: a framework of online multiple source transfer learning | Liang Ge, Jing Gao, Aidong Zhang | To achieve this end, in this paper, we propose a new framework of Online Multiple Source Transfer Learning (OMS-TL). |
343 | Discovering and managing quantitative association rules | Chunyao Song, Tingjian Ge | In this paper, we propose a novel divide and conquer two-phase algorithm, which is guaranteed to find all good rules efficiently. |
344 | Combining one-class classifiers via meta learning | Eitan Menahem, Lior Rokach, Yuval Elovici | In this work we examine the notion of combining one-class classifiers as an alternative for selecting the best classifier. |
345 | Scalable bootstrapping for python | Peter Birsinger, Richard Xia, Armando Fox | In this work, we create a new DSEL compiler which instead emits code to run on Spark [16], a distributed processing framework. |
346 | FIRE: interactive visual support for parameter space-driven rule mining | Abhishek Mukherji, Xika Lin, Jason Whitehouse, Christopher R. Botaish, Elke A. Rundensteiner, Matthew O. Ward | Our user study with 22 subjects establishes the usability and effectiveness of the proposed features and interactions of FIRE using benchmark datasets. |
347 | Consumer-centric SLA manager for cloud-hosted databases | Liang Zhao, Sherif Sakr, Anna Liu | We present an end-to-end framework for consumer-centric SLA management of virtualized database servers. |
348 | TerraFly GeoCloud: online spatial data analysis system | Yun Lu, Mingjin Zhang, Tao Li, Chang Liu, Erik Edrosa, Naphtali Rishe | This paper develops an online Spatial Data Analysis System, TerraFly GeoCloud, which enables end users to visualize and analyze spatial data, and to share the analysis results. |
349 | MetKB: enriching RDF knowledge bases with web entity-attribute tables | Haoqiong Bian, Yueguo Chen, Xiaoyong Du, Xiaolu Zhang | In this paper, we propose a feasible solution that is able to automatically search and rank entity-attribute tables from the Web, and effectively map the extracted tables to the RDF knowledge base with very little manual effort. |
350 | READFAST: high-relevance search-engine for big text | Michael Gubanov, Anna Pyayt | Here we demonstrate one of the first Big text search engines that leverages the hidden structure of natural language sentences in order to process user queries and return more relevant search results than a standard keyword search. |
351 | FusionDB: conflict management system for small-science databases | Karim Ibrahim, Nathaniel Selvo, Mohamad El-Rifai, Mohamed Eltabakh | In this paper, we demonstrate the FusionDB system, an extended relational database engine for managing conflicts in small-science databases. |
352 | GeCo: an online personal data generator and corruptor | Khoi-Nguyen Tran, Dinusha Vatsalan, Peter Christen | We demonstrate GeCo, an online personal data GEnerator and COrruptor that facilitates the creation of realistic personal data ranging from names, addresses, and dates, to social security and credit card numbers, as well as numerical values such as salary or blood pressure. |
353 | DeExcelerator: a framework for extracting relational data from partially structured documents | Julian Eberius, Christoper Werner, Maik Thiele, Katrin Braunschweig, Lars Dannecker, Wolfgang Lehner | Studying data.gov as an example source for partially structured documents, we present a classification of typical normalization problems. |
354 | Demonstrating intelligent crawling and archiving of web applications | Muhammad Faheem, Pierre Senellart | We demonstrate here a new approach to Web archival crawling, based on an application-aware helper that drives crawls of Web applications according to their types (especially, according to their content management systems). |
355 | iNewsBox: modeling and exploiting implicit feedback for building personalized news radio | Yanan Xie, Liang Chen, Kunyang Jia, Lichuan Ji, Jian Wu | This paper presents a mobile application iNewsBox enabling users to listen to news collected from the Internet. |
356 | SportSense: using motion queries to find scenes in sports videos | Ihab Al Kabary, Heiko Schuldt | We present SportSense, a system for interactive sports video retrieval using sketch-based motion queries. |
357 | PredictionIO: a distributed machine learning server for practical software development | Simon Chan, Thomas Stone, Kit Pang Szeto, Ka Hou Chan | We present PredictionIO, an open source machine learning server that comes with a step-by-step graphical user interface for developers to (i) evaluate, compare and deploy scalable learning algorithms, (ii) tune hyperparameters of algorithms manually or automatically and (iii) evaluate model training status. |
358 | Exploring XML data is as easy as using maps | Yong Zeng, Zhifeng Bao, Guoliang Li, Tok Wang Ling | Therefore, we try to equip the traditional XML keyword search engine with our new exploration model XMAP, providing users with a novel, interactive way to explore the results and a better user experience. |
359 | Inside the world’s playlist | Wouter Weerkamp, Manos Tsagkias, Maarten de Rijke | We describe Streamwatchr, a real-time system for analyzing the music listening behavior of people around the world. |
360 | Detecting and exploring clusters in attributed graphs: a plugin for the gephi platform | Brigitte Boden, Roman Haag, Thomas Seidl | In this paper, we introduce the GC-Viz system, which is implemented as a plugin for the Gephi platform. |
361 | Cloud Armor: a platform for credibility-based trust management of cloud services | Talal H. Noor, Quan Z. Sheng, Anne H.H. Ngu, Abdullah Alfazi, Jeriel Law | This paper describes Cloud Armor, a platform for credibility-based trust management of cloud services. |
362 | Human computing games for knowledge acquisition | Sarath Kumar Kondreddi, Peter Triantafillou, Gerhard Weikum | We provide a combined approach that tightly integrates automated extraction techniques with human computing for effective gathering of facts. |
363 | A tool for assisting provenance search in social media | Suhas Ranganath, Pritam Gundecha, Huan Liu | This paper presents a tool for capturing the propagation network of a given tweet or URL (Uniform Resource Locator) in the Twitter network. |
364 | SPHINX: rich insights into evidence-hypotheses relationships via parameter space-based exploration | Abhishek Mukherji, Jason Whitehouse, Christopher R. Botaish, Elke A. Rundensteiner, Matthew O. Ward | The computational contributions cover (a.) flexible computational model selection; and (b.) real-time incremental strength computations. |
365 | Search excavator: the knowledge discovery tool | Dmitri Danilov, Eero Vainikko | We present a knowledge discovery tool Search Excavator (SE) developed for detecting similar words in web documents ranked by overall usage frequency in American English. |
366 | ESTHETE: a news browsing system to visualize the context and evolution of news stories | Rahul Goyal, Ravee Malla, Amitabha Bagchi, Sameep Mehta, Maya Ramanath | In this paper, we introduce ESTHETE, a system that provides rich context(s) (through what we call personalized flexible context extraction), by preprocessing and storing articles in a structured representation (directed graphs) that makes it easy for the user to explore different contexts. |
367 | WordSeer: a knowledge synthesis environment for textual data | Aditi Muralidharan, Marti A. Hearst, Christopher Fan | We describe WordSeer, a tool whose goal is to help scholars and analysts discover patterns and formulate and test hypotheses about the contents of text collections, midway between what humanities scholars call a traditional "close read" and the new "distant read" or "culturomics" approach. |
368 | Channeling the deluge: research challenges for big data and information systems | Paul Bennett, Lee Giles, Alon Halevy, Jiawei Han, Marti Hearst, Jure Leskovec | As a group of experienced researchers in academia and industry, we will present at this panel our visions on what should be the challenging research issues in this promising research frontier and hope to attract heated discussions and debates from the audience. |
369 | AKBC 2013: third workshop on automated knowledge base construction | Fabian M. Suchanek, Sebastian Riedel, Sameer Singh, Partha P. Talukdar | The AKBC 2013 workshop aims to be a venue of excellence and vision in the area of knowledge base construction. |
370 | DOLAP 2013 workshop summary | Ladjel Bellatreche, Alfredo Cuzzocrea, Il-Yeol Song | The ACM DOLAP workshop presents research on data warehousing and On-Line Analytical Processing (OLAP). |
371 | Sixth workshop on exploiting semantic annotations in information retrieval (ESAIR’13) | Paul N. Bennett, Evgeniy Gabrilovich, Jaap Kamps, Jussi Karlgren | Sixth workshop on exploiting semantic annotations in information retrieval (ESAIR’13) |
372 | 2013 international workshop on computational scientometrics: theory and applications | Cornelia Caragea, C. Lee Giles, Lior Rokach, Xiaozhong Liu | 2013 international workshop on computational scientometrics: theory and applications |
373 | Workshop summary for the 2013 international workshop on mining unstructured big data using natural language processing | Xiaozhong Liu, Miao Chen, Ying Ding, Min Song | Workshop summary for the 2013 international workshop on mining unstructured big data using natural language processing |
374 | CloudDB 2013: fifth international workshop on cloud data management | Feifei Li, Xiaofeng Meng, Fusheng Wang, Cong Yu | The main objective of the workshop is to address the challenges of large scale data management based on the cloud computing infrastructure. |
375 | DUBMOD13: international workshop on data-driven user behavioral modelling and mining from social media | Jalal Mahmud, Jeffrey Nichols, Michelle X. Zhou, James Caverlee, John O’Donovan | Since mining and understanding user behavior from social media often requires interdisciplinary effort, including machine learning, text mining, human-computer interaction, and social science, our workshop aims to bring together researchers and practitioners from multiple fields to discuss the creation of deeper models of individual users by mining the content that they publish and the social networking behavior that they exhibit. |
376 | PLEAD 2013: politics, elections and data | Ingmar Weber, Ana-Maria Popescu, Marco Pennacchiotti | The goal of this workshop is to bring together researchers working at the intersection of social network analysis, computational social science and political science, to share and discuss their ideas in a common forum; and to inspire further developments in this growing, fascinating field. |
377 | DTMBIO 2013: international workshop on data and text mining in biomedical informatics | Atul Butte, Doheon Lee, Hua Xu, Min Song | DTMBIO 13 will be a forum for discussing and exchanging informatics-related techniques and problems in the context of biomedical research. |
378 | CIKM 2013 workshop on living labs for information retrieval evaluation | Krisztian Balog, David Elsweiler, Evangelos Kanoulas, Liadh Kelly, Mark D. Smucker | CIKM 2013 workshop on living labs for information retrieval evaluation |
379 | The first workshop on user engagement optimization | Liangjie Hong, Shuang-Hong Yang | Here, we organize the first workshop on the topic of online user engagement optimization, explicitly targeting the topic as a whole and bringing researchers and practitioners together to foster the field. |
380 | PIKM 2013: the 6th ACM workshop for ph.d. students in information and knowledge management | Fabian M. Suchanek, Anisoara Nica | Like CIKM, the PIKM workshop covers a wide range of topics in the areas of databases, information retrieval and knowledge management. |
381 | Web-KR 2013: the 4th international workshop on web-scale knowledge representation, retrieval and reasoning | Yi Zeng, Spyros Kotoulas, Zhisheng Huang | This summary introduces the major contributions of accepted papers in the Web-KR 2013 workshop. |
382 | Data management & analytics for healthcare (DARE 2013) | Ullas Nambiar, Niranjan Thirumale | This workshop is focused on identifying challenges to be overcome for effectively delivering efficient healthcare to the masses. |