Paper Digest: SIGMOD 2013 Highlights

June 16, 2013 (updated June 26, 2020), admin

The annual conference of the ACM Special Interest Group on Management of Data (SIGMOD) is one of the top conferences on database management systems and data management technology.

To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights/summaries to quickly grasp the main idea of each paper.

If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.

Paper Digest Team
team@paperdigest.org

TABLE 1: SIGMOD 2013 Papers

#	Title	Authors	Highlight
1	Cumulon: optimizing statistical data analysis in the cloud	Botong Huang, Shivnath Babu, Jun Yang	We present Cumulon, a system designed to help users rapidly develop and intelligently deploy matrix-based big-data analysis programs in the cloud.
2	Shark: SQL and rich analytics at scale	Reynold S. Xin, Josh Rosen, Matei Zaharia, Michael J. Franklin, Scott Shenker, Ion Stoica	Shark: SQL and rich analytics at scale
3	Parallel analytics as a service	Petrie Wong, Zhian He, Eric Lo	This paper presents Thrifty, a prototype implementation of MPPDB-as-a-service.
4	MESSIAH: missing element-conscious SLCA nodes search in XML data	Ba Quan Truong, Sourav S Bhowmick, Curtis Dyreson, Aixin Sun	In this paper, we generalize the SLCA search paradigm to support queries involving missing elements.
5	Indexing for subtree similarity-search using edit distance	Sara Cohen	This paper proposes the first index structure for subtree similarity-search, provided that the unit cost function is used.
6	Discovering XSD keys from XML data	Marcelo Arenas, Jonny Daenen, Frank Neven, Martin Ugarte, Jan Van den Bussche, Stijn Vansummeren	The algorithm leverages known discovery algorithms for functional dependencies in the relational model, but incorporates the above-mentioned properties to assess and refine the quality of derived keys.
7	A scalable lock manager for multicores	Hyungsoo Jung, Hyuck Han, Alan D. Fekete, Gernot Heiser, Heon Y. Yeom	Our analysis of MySQL identifies latch contention within the lock manager as the bottleneck responsible for this collapse.
8	Controlled lock violation	Goetz Graefe, Mark Lillibridge, Harumi Kuno, Joseph Tucek, Alistair Veitch	Thus, we set out to achieve the same goals as early lock release but with a different, simpler, and more robust approach.
9	X-FTL: transactional FTL for SQLite databases	Woon-Hak Kang, Sang-Won Lee, Bongki Moon, Gi-Hwan Oh, Changwoo Min	In this paper, we propose X-FTL, a transactional flash translation layer (FTL) for SQLite databases.
10	Optimal splitters for temporal and multi-version databases	Wangchao Le, Feifei Li, Yufei Tao, Robert Christensen	We introduce the concept of optimal splitters for temporal and multi-version databases, which induce a partition of the input data set, and guarantee that the size of the maximum bucket be minimized among all possible configurations, given a budget for the desired number of buckets.
11	Building an efficient RDF store over a relational database	Mihaela A. Bornea, Julian Dolby, Anastasios Kementsietsidis, Kavitha Srinivas, Patrick Dantressangle, Octavian Udrea, Bishwaranjan Bhattacharjee	In this paper, we describe a novel storage and query mechanism for RDF which works on top of existing relational representations.
12	Automatic synthesis of out-of-core algorithms	Yannis Klonatos, Andres Nötzli, Andrej Spielmann, Christoph Koch, Victor Kuncak	We present a system for the automatic synthesis of efficient algorithms specialized for a particular memory hierarchy and a set of storage devices.
13	InfoGather+: semantic matching and annotation of numeric and time-varying attributes in web tables	Meihui Zhang, Kaushik Chakrabarti	Our key insight is to leverage the wealth of tables on the web and infer label information from semantically matching columns of other web tables; this complements "local" extraction from column headers.
14	Value invention in data exchange	Patricia C. Arocena, Boris Glavic, Renee J. Miller	In this paper, we present two techniques for understanding when the Skolem functions needed to represent the correct semantics of incomplete information are computationally well-behaved.
15	Indexing methods for moving object databases: games and other applications	Hanan Samet, Jagan Sankaranarayanan, Michael Auerbach	Indexing methods for moving object databases: games and other applications
16	I/O efficient: computing SCCs in massive graphs	Zhiwei Zhang, Jeffrey Xu Yu, Lu Qin, Lijun Chang, Xuemin Lin	We propose a new two-phase algorithm consisting of tree construction and tree search.
17	TF-Label: a topological-folding labeling scheme for reachability querying in a large graph	James Cheng, Silu Huang, Huanhuan Wu, Ada Wai-Chee Fu	We propose TF-label, an efficient and scalable labeling scheme for processing reachability queries.
18	Efficiently computing k-edge connected components via graph decomposition	Lijun Chang, Jeffrey Xu Yu, Lu Qin, Xuemin Lin, Chengfei Liu, Weifa Liang	As a result, our algorithm for computing k-edge connected components significantly improves the time complexity of an existing state-of-the-art technique from O(|V|²|E| + |V|³ log |V|) to O(h × l × |E|).
19	An online cost sensitive decision-making method in crowdsourcing systems	Jinyang Gao, Xuan Liu, Beng Chin Ooi, Haixun Wang, Gang Chen	In this paper, we design and implement a cost-sensitive method for crowdsourcing.
20	Leveraging transitive relations for crowdsourced joins	Jiannan Wang, Guoliang Li, Tim Kraska, Michael J. Franklin, Jianhua Feng	In this paper, we focus on the crowdsourced join query which aims to utilize humans to find all pairs of matching objects from two collections.
21	Crowd mining	Yael Amsterdamer, Yael Grossman, Tova Milo, Pierre Senellart	Based on these, we design a framework of generic components, used for choosing the best questions to ask the crowd and mining significant patterns from the answers.
22	Efficient sentiment correlation for large-scale demographics	Mikalai Tsytsarau, Sihem Amer-Yahia, Themis Palpanas	We propose a scalable approach for sentiment indexing and aggregation that works on multiple time granularities and uses incrementally updateable data structures for online operation.
23	EBM: an entropy-based model to infer social strength from spatiotemporal data	Huy Pham, Cyrus Shahabi, Yan Liu	In this paper, we are interested in inferring these social connections by analyzing people’s location information, which is useful in a variety of application domains from sales and marketing to intelligence analysis.
24	Online search of overlapping communities	Wanyun Cui, Yanghua Xiao, Haixun Wang, Yiqi Lu, Wei Wang	In this paper, we focus on online search of overlapping communities, that is, given a query vertex, we find meaningful overlapping communities the vertex belongs to in an online manner.
25	BitWeaving: fast scans for main memory data processing	Yinan Li, Jignesh M. Patel	In this paper, we propose a technique called BitWeaving that exploits the parallelism available at the bit level in modern processors.
26	Performance and resource modeling in highly-concurrent OLTP workloads	Barzan Mozafari, Carlo Curino, Alekh Jindal, Samuel Madden	In this paper, we introduce our framework, called DBSeer, that addresses this problem by employing statistical models that provide resource and performance analysis and prediction for highly concurrent OLTP workloads.
27	ODYS: an approach to building a massively-parallel search engine using a DB-IR tightly-integrated parallel DBMS for higher-level functionality	Kyu-Young Whang, Tae-Seob Yun, Yeon-Mi Yeo, Il-Yeol Song, Hyuk-Yoon Kwon, In-Joong Kim	In this paper, we propose a new approach to building a massively-parallel search engine using a DB-IR tightly-integrated parallel DBMS.
28	Massive graph triangulation	Xiaocheng Hu, Yufei Tao, Chin-Wan Chung	Motivated by this, we develop a new algorithm that is provably I/O and CPU efficient at the same time, without making any assumption on the input graph G at all.
29	Turbo	Wook-Shin Han, Jinsoo Lee, Jeong-Hoon Lee	In this paper, we present an efficient and robust subgraph search solution, called Turbo_ISO, which is turbo-charged with two novel concepts, candidate region exploration and the combine and permute strategy (in short, Comb/Perm).
30	Fast exact shortest-path distance queries on large networks by pruned landmark labeling	Takuya Akiba, Yoichi Iwata, Yuichi Yoshida	We propose a new exact method for shortest-path distance queries on large-scale networks.
31	Improving regular-expression matching on strings using negative factors	Xiaochun Yang, Bin Wang, Tao Qiu, Yaoshu Wang, Chen Li	In this paper we propose a novel technique that prunes false negatives by utilizing negative factors, which are substrings that cannot appear in an answer.
32	String similarity measures and joins with synonyms	Jiaheng Lu, Chunbin Lin, Wei Wang, Chen Li, Haiyong Wang	Because using synonyms in similarity measures, while expressive, is computationally expensive (NP-hard), we propose an efficient algorithm, called selective-expansion, which guarantees optimality in many real scenarios.
33	Efficient top-k algorithms for approximate substring matching	Younghoon Kim, Kyuseok Shim	To reduce the number of expensive distance computations, the proposed algorithms utilize our novel filtering techniques, which take advantage of q-grams and available inverted q-gram indexes.
34	Towards high-throughput Gibbs sampling at scale: a study across storage managers	Ce Zhang, Christopher Ré	We find that both new theoretical and new algorithmic techniques are required to understand the tradeoff space for each choice.
35	Latch-free data structures for DBMS: design, implementation, and evaluation	Takashi Horikawa	This paper investigates latch-free (LF) data structures, with a particular focus on their applicability and effectiveness.
36	DBMS metrology: measuring query time	Sabah Currim, Richard T. Snodgrass, Young-Kyoon Suh, Rui Zhang, Matthew Wong Johnson, Cheng Yi	We review relevant process and overall measures obtainable from the Linux kernel and introduce a structural causal model relating these measures.
37	Quality and efficiency for kernel density estimates in large data	Yan Zheng, Jeffrey Jestes, Jeff M. Phillips, Feifei Li	We propose randomized and deterministic algorithms with quality guarantees which are orders of magnitude more efficient than previous algorithms.
38	Efficient ad-hoc search for personalized PageRank	Yasuhiro Fujiwara, Makoto Nakatsuji, Hiroaki Shiokawa, Takeshi Mishima, Makoto Onizuka	The goal of this paper is to efficiently find the top-k nodes with exact node ranking so as to effectively support interactive similarity search based on personalized PageRank (PPR).
39	Provenance-based dictionary refinement in information extraction	Sudeepa Roy, Laura Chiticariu, Vitaly Feldman, Frederick R. Reiss, Huaiyu Zhu	In this paper, we study the dictionary refinement problem and address the above challenges.
40	CS2: a new database synopsis for query estimation	Feng Yu, Wen-Chi Hou, Cheng Luo, Dunren Che, Mengxia Zhu	We introduce a statistical technique, called reverse sample, and design a powerful estimator, called reverse estimator, to fully utilize correlated sample tuples for query estimation.
41	Branch-and-bound algorithm for reverse top-k queries	Akrivi Vlachou, Christos Doulkeridis, Kjetil Nørvåg, Yannis Kotidis	We propose an intuitive branch-and-bound algorithm for processing reverse top-k queries efficiently and discuss novel optimizations to boost its performance.
42	On the correct and complete enumeration of the core search space	Guido Moerkotte, Pit Fender, Marius Eich	We present three conflict detectors.
43	Trinity: a distributed graph engine on a memory cloud	Bin Shao, Haixun Wang, Yatao Li	In this paper, we introduce Trinity, a general purpose graph engine over a distributed memory cloud.
44	Characterizing tenant behavior for placement and crisis mitigation in multitenant DBMSs	Aaron J. Elmore, Sudipto Das, Alexander Pucher, Divyakant Agrawal, Amr El Abbadi, Xifeng Yan	We present Delphi, a self-managing system controller for a multitenant DBMS, and Pythia, a technique to learn behavior through observation and supervision using DBMS-agnostic database level performance measures.
45	Minimal MapReduce algorithms	Yufei Tao, Wenqing Lin, Xiaokui Xiao	This paper presents the notion of minimal algorithm, that is, an algorithm that guarantees the best parallelization in multiple aspects at the same time, up to a small constant factor.
46	NADEEF: a commodity data cleaning system	Michele Dallachiesa, Amr Ebaid, Ahmed Eldawy, Ahmed Elmagarmid, Ihab F. Ilyas, Mourad Ouzzani, Nan Tang	In this paper, we present NADEEF, an extensible, generalized and easy-to-deploy data cleaning platform.
47	Don’t be SCAREd: use SCalable Automatic REpairing with maximal likelihood and bounded changes	Mohamed Yakout, Laure Berti-Équille, Ahmed K. Elmagarmid	In this paper, we propose a new data repairing approach that is based on maximizing the likelihood of replacement data given the data distribution, which can be modeled using statistical machine learning techniques.
48	Determining the relative accuracy of attributes	Yang Cao, Wenfei Fan, Wenyuan Yu	This paper proposes a model for determining relative accuracy.
49	Photon: fault-tolerant and scalable joining of continuous data streams	Rajagopal Ananthanarayanan, Venkatesh Basker, Sumit Das, Ashish Gupta, Haifeng Jiang, Tianhao Qiu, Alexey Reznichenko, Deomid Ryabkov, Manpreet Singh, Shivakumar Venkataraman	In this paper, we describe the architecture of Photon, a geographically distributed system for joining multiple continuously flowing streams of data in real time with high scalability and low latency, where the streams may be unordered or delayed. We also present challenges and solutions in maintaining large persistent state across geographically distant locations, and highlight the design principles that emerged from our experience.
50	Utility-maximizing event stream suppression	Di Wang, Yeye He, Elke Rundensteiner, Jeffrey F. Naughton	In this paper we consider how to suppress events in a stream to reduce the disclosure of sensitive patterns while maximizing the detection of nonsensitive patterns.
51	ε-Matching: event processing over noisy sequences in real time	Zheng Li, Tingjian Ge, Cindy X. Chen	Instead of the traditional approach of learning a distribution of the stream first and then processing queries, we propose a new approach that efficiently does the matching based on an error model.
52	Toward practical query pricing with QueryMarket	Paraschos Koutris, Prasang Upadhyaya, Magdalena Balazinska, Bill Howe, Dan Suciu	We develop a new pricing system, QueryMarket, for flexible query pricing in a data market based on an earlier theoretical framework (Koutris et al., PODS 2012).
53	Generalized scale independence through incremental precomputation	Michael Armbrust, Eric Liang, Tim Kraska, Armando Fox, Michael J. Franklin, David A. Patterson	In this paper, we describe a scale-independent view selection and maintenance system, which uses novel static analysis techniques that ensure that created views do not themselves become scaling bottlenecks.
54	Simulation of database-valued Markov chains using SimSQL	Zhuhua Cai, Zografoula Vagena, Luis Perez, Subramanian Arumugam, Peter J. Haas, Christopher Jermaine	This paper describes the SimSQL system, which allows for SQL-based specification, simulation, and querying of database-valued Markov chains, i.e., chains whose value at any time step comprises the contents of an entire database.
55	Recursive mechanism: towards node differential privacy and unrestricted joins	Shixi Chen, Shuigeng Zhou	In this paper, we propose a novel differentially private mechanism that supports unrestricted joins, to release an approximation of a linear statistic of the result of some positive relational algebra calculation over a sensitive database.
56	PrivGene: differentially private model fitting using genetic algorithms	Jun Zhang, Xiaokui Xiao, Yin Yang, Zhenjie Zhang, Marianne Winslett	Motivated by this, we propose PrivGene, a general-purpose differentially private model fitting solution based on genetic algorithms (GA).
57	Information preservation in statistical privacy and Bayesian estimation of unattributed histograms	Bing-Rong Lin, Daniel Kifer	In statistical privacy, utility refers to two concepts: information preservation, meaning how much statistical information is retained by a sanitizing algorithm, and usability, meaning how (and with how much difficulty) one extracts this information to build statistical models, answer queries, etc.
58	Collective spatial keyword queries: a distance owner-driven approach	Cheng Long, Raymond Chi-Wing Wong, Ke Wang, Ada Wai-Chee Fu	In this paper, we study the CoSKQ problem and address the above issues.
59	TOUCH: in-memory spatial join by hierarchical data-oriented partitioning	Sadegh Nobari, Farhan Tauheed, Thomas Heinis, Panagiotis Karras, Stéphane Bressan, Anastasia Ailamaki	In this paper we develop TOUCH, a novel in-memory spatial join algorithm that uses hierarchical data-oriented space partitioning, thereby keeping both its memory footprint and the number of comparisons low.
60	Finding time period-based most frequent path in big trajectory data	Wuman Luo, Haoyu Tan, Lei Chen, Lionel M. Ni	In this paper, we study a new path finding query which finds the most frequent path (MFP) during user-specified time periods in large-scale historical trajectory data.
61	Integrating scale out and fault tolerance in stream processing using operator state management	Raul Castro Fernandez, Matteo Migliavacca, Evangelia Kalyvianaki, Peter Pietzuch	Based on them, we describe an integrated approach for dynamic scale out and recovery of stateful operators.
62	Quantiles over data streams: an experimental study	Lu Wang, Ge Luo, Ke Yi, Graham Cormode	In this paper, we remedy this deficit by providing a taxonomy of different methods and describing efficient implementations.
63	An efficient query indexing mechanism for filtering geo-textual data	Lisi Chen, Gao Cong, Xin Cao	In particular, we propose a hybrid index, called IQ-tree, and novel cost models for managing a stream of incoming Boolean Range Continuous queries.
64	Bolt-on causal consistency	Peter Bailis, Ali Ghodsi, Joseph M. Hellerstein, Ion Stoica	We consider the problem of separating consistency-related safety properties from availability and durability in distributed data stores via the application of a "bolt-on" shim layer that upgrades the safety of an underlying general-purpose data store.
65	RTP: robust tenant placement for elastic in-memory database clusters	Jan Schaffner, Tim Januschowski, Megan Kercher, Tim Kraska, Hasso Plattner, Michael J. Franklin, Dean Jacobs	In this paper, we consider algorithms that elastically contract and expand a cluster of in-memory databases depending on tenants’ behavior over time while maintaining response time guarantees.
66	Inter-media hashing for large-scale retrieval from heterogeneous data sources	Jingkuan Song, Yang Yang, Yi Yang, Zi Huang, Heng Tao Shen	In this paper, we present a new multimedia retrieval paradigm to innovate large-scale search of heterogeneous multimedia data.
67	Mind the gap: large-scale frequent sequence mining	Iris Miliaraki, Klaus Berberich, Rainer Gemulla, Spyros Zoupanos	In this paper, we propose MG-FSM, a scalable algorithm for frequent sequence mining on MapReduce.
68	Reverse engineering complex join queries	Meihui Zhang, Hazem Elmeleegy, Cecilia M. Procopiuc, Divesh Srivastava	In this paper, we propose an efficient algorithm that discovers queries with arbitrary join graphs.
69	A direct mining approach to efficient constrained graph pattern discovery	Feida Zhu, Zequn Zhang, Qiang Qu	In this paper, we propose a direct mining framework to solve the problem and illustrate our ideas in the context of a particular type of constrained frequent patterns — the "skinny" patterns, which are graph patterns with a long backbone from which short twigs branch out.
70	Calibrating trajectory data for similarity-based analysis	Han Su, Kai Zheng, Haozhou Wang, Jiamin Huang, Xiaofang Zhou	In this paper, we pioneer a systematic approach to trajectory calibration, a process that transforms a heterogeneous trajectory dataset into one with (almost) unified sampling strategies.
71	On optimal worst-case matching	Cheng Long, Raymond Chi-Wing Wong, Philip S. Yu, Minhao Jiang	In this paper, we propose a new problem called Spatial Matching for Minimizing Maximum matching distance (SPM-MM).
72	Shortest path and distance queries on road networks: towards bridging theory and practice	Andy Diwen Zhu, Hui Ma, Xiaokui Xiao, Siqiang Luo, Youze Tang, Shuigeng Zhou	This paper presents Arterial Hierarchy (AH), an index structure that narrows the gap between theory and practice in answering shortest path and distance queries on road networks.
73	Fine-grained disclosure control for app ecosystems	Gabriel M. Bender, Lucja Kot, Johannes Gehrke, Christoph Koch	In this paper, we develop a new model of disclosure in app ecosystems.
74	Lightweight authentication of linear algebraic queries on data streams	Stavros Papadopoulos, Graham Cormode, Antonios Deligiannakis, Minos Garofalakis	We consider a stream outsourcing setting, where a data owner delegates the management of a set of disjoint data streams to an untrusted server.
75	Column imprints: a secondary index structure	Lefteris Sidirourgos, Martin Kersten	In this paper, we introduce column imprint, a simple but efficient cache conscious secondary index.
76	DeltaNI: an efficient labeling scheme for versioned hierarchical data	Jan Finis, Robert Brunel, Alfons Kemper, Thomas Neumann, Franz Färber, Norman May	We propose the DeltaNI index as a versioned counterpart of the nested intervals (NI) labeling scheme.
77	Big data in capital markets	Alex Nazaruk, Michael Rauchman	In this talk we present and analyze forces behind the wide proliferation of electronic securities trading in US stocks and options markets.
78	Managing database technology at enterprise scale	Paul Yaron	This talk will discuss the challenges and strategies of managing the evolving ecosystem of "all data", from information security to internal virtualization strategies.
79	We are drowning in a sea of least publishable units (LPUs)	David J. DeWitt, Ihab F. Ilyas, Jeffrey Naughton, Michael Stonebraker	In order to improve the quality of the papers being published we must reduce the number being submitted.
80	Rethinking eventual consistency	Philip A. Bernstein, Sudipto Das	We present a framework for comparing these criteria and mechanisms, to help architects navigate through this complex design space.
81	Workload management for big data analytics	Ashraf Aboulnaga, Shivnath Babu	Workload management for big data analytics
82	Knowledge harvesting in the big-data era	Fabian Suchanek, Gerhard Weikum	This tutorial presents state-of-the-art methods, recent advances, research opportunities, and open challenges along this avenue of knowledge harvesting and its applications.
83	Machine learning for big data	Tyson Condie, Paul Mineiro, Neoklis Polyzotis, Markus Weimer	This tutorial introduces current applications, techniques and systems with the aim of cross-fertilizing research between the database and machine learning communities. The tutorial closes by introducing two sets of open research questions: better systems support for the already established use cases of machine learning, and support for recent advances in machine learning research.
84	Data management perspectives on business process management: tutorial overview	Richard Hull, Jianwen Su, Roman Vaculin	A recent approach to BPM, based on "business artifacts", is centered on a modeling framework that places data and process on an equal footing.
85	Data stream warehousing	Lukasz Golab, Theodore Johnson	Data stream warehousing
86	Data-driven neuroscience: enabling breakthroughs via innovative data management	Alexandros Stougiannis, Mirjana Pavlovic, Farhan Tauheed, Thomas Heinis, Anastasia Ailamaki	The level of detail of their models is unprecedented, as they model details at the subcellular level (e.g., the neurotransmitter).
87	TsingNUS: a location-based service system towards live city	Guoliang Li, Nan Zhang, Ruicheng Zhong, Sitong Liu, Weihuang Huang, Ju Fan, Kian-Lee Tan, Lizhu Zhou, Jianhua Feng	We present TsingNUS, our system towards a live city, which aims to provide users with more user-friendly location-aware search experiences.
88	WOW: what the world of (data) warehousing can learn from the World of Warcraft	Rene Mueller, Tim Kaldewey, Guy M. Lohman, John McPherson	WOW: what the world of (data) warehousing can learn from the World of Warcraft
89	Rule-based application development using Webdamlog	Serge Abiteboul, Émilien Antoine, Gerome Miklau, Julia Stoyanovich, Jules Testard	We present the WebdamLog system for managing distributed data on the Web in a peer-to-peer manner.
90	Speeding up database applications with Pyxis	Alvin Cheung, Owen Arden, Samuel Madden, Andrew C. Myers	We propose to demonstrate Pyxis, a system that optimizes database applications by pushing computation to the database server.
91	Peckalytics: analyzing experts and interests on Twitter	Alex Cheng, Nilesh Bansal, Nick Koudas	Our aim is to facilitate targeting and optimization of advertising campaigns on the Twitter platform.
92	Packing experiments for sharing and publication	Fernando Chirigati, Dennis Shasha, Juliana Freire	As a step towards simplifying the process of creating reproducible experiments, we have developed ReproZip, a tool that automatically captures the provenance of experiments and packs all the necessary files, library dependencies and variables to reproduce the results.
93	CHIC: a combination-based recommendation system	Manasi Vartak, Samuel Madden	In this demonstration, we present CHIC, a first-of-its-kind, combination-based recommendation system for clothing.
94	Noah: a dynamic ridesharing system	Charles Tian, Yan Huang, Zhi Liu, Favyen Bastani, Ruoming Jin	Noah: a dynamic ridesharing system
95	A query answering system for data with evolution relationships	Siarhei Bykau, Flavio Rizzolo, Yannis Velegrakis	We present TrenDS, a system for exploiting the evolution relationships between the structures in the database.
96	GeoDeepDive: statistical inference using familiar data-processing languages	Ce Zhang, Vidhya Govindaraju, Jackson Borchardt, Tim Foltz, Christopher Ré, Shanan Peters	We describe our proposed demonstration of GeoDeepDive, a system that helps geoscientists discover information and knowledge buried in the text, tables, and figures of geology journal articles.
97	Fact checking and analyzing the web	François Goasdoué, Konstantinos Karanasos, Yannis Katsis, Julien Leblay, Ioana Manolescu, Stamatis Zampetakis	We propose to demonstrate FactMinder, a fact checking and analysis assistance application.
98	Data mining algorithms as a service in the cloud exploiting relational database systems	Carlos Ordonez, Javier García-García, Carlos Garcia-Alvarado, Wellington Cabrera, Veerabhadran Baladandayuthapani, Mohammed S. Quraishi	We present a novel cloud system based on DBMS technology, where data mining algorithms are offered as a service.
99	SONDY: an open source platform for social dynamics mining and analysis	Adrien Guille, Cécile Favre, Hakim Hacid, Djamel A. Zighed	This paper describes SONDY, a tool for analysis of trends and dynamics in online social network data.
100	Interactive data mining with 3D-parallel-coordinate-trees	Elke Achtert, Hans-Peter Kriegel, Erich Schubert, Arthur Zimek	Here, we provide a tool to explore complex data sets using 3D-parallel-coordinate-trees, along with a number of approaches to arrange the axes.
101	Stat!: an interactive analytics environment for big data	Mike Barnett, Badrish Chandramouli, Robert DeLine, Steven Drucker, Danyel Fisher, Jonathan Goldstein, Patrick Morrison, John Platt	We demonstrate Stat!
102	PARAS: interactive parameter space exploration for association rule mining	Abhishek Mukherji, Xika Lin, Christopher R. Botaish, Jason Whitehouse, Elke A. Rundensteiner, Matthew O. Ward, Carolina Ruiz	We demonstrate our PARAS technology for supporting interactive association mining at near real-time speeds.
103	STEM: a spatio-temporal miner for bursty activity	Theodoros Lappas, Marcos R. Vieira, Dimitrios Gunopulos, Vassilis J. Tsotras	In this paper we describe STEM (Spatio-TEmporal Miner), a system for finding spatiotemporal burstiness patterns in a collection of spatially distributed frequency streams.
104	The farm: where pig scripts are bred and raised	Craig P. Sayers, Alkis Simitsis, Georgia Koutrika, Alejandro Guerrero Gonzalez, David Tamez Cantu, Meichun Hsu	To further assist developers, and support novice users, we offer "The Farm", a catalog of scriptable services supporting creation, discovery, composition, and optimized execution.
105	LinkIT: privacy preserving record linkage and integration via transformations	Luca Bonomi, Li Xiong, James J. Lu	We propose to demonstrate an open-source tool, LinkIT, for privacy preserving record Linkage and Integration via data Transformations.
106	Secure database-as-a-service with Cipherbase	Arvind Arasu, Spyros Blanas, Ken Eguro, Manas Joglekar, Raghav Kaushik, Donald Kossmann, Ravi Ramamurthy, Prasang Upadhyaya, Ramarathnam Venkatesan	In this demonstration we outline the functionality of Cipherbase, a full-fledged SQL database system that supports the full generality of a database system while providing high data confidentiality.
107	DBalancer: distributed load balancing for NoSQL data-stores	Ioannis Konstantinou, Dimitrios Tsoumakos, Ioannis Mytilinis, Nectarios Koziris	In this demonstration, we present DBalancer, a generic distributed module that can be installed on top of a typical NoSQL data-store to provide an efficient and highly configurable load balancing mechanism.
108	COCCUS: self-configured cost-based query services in the cloud	Ioannis Konstantinou, Verena Kantere, Dimitrios Tsoumakos, Nectarios Koziris	We demonstrate COCCUS, a modular system for cost-aware query execution, adaptive query charge and optimization of cloud data services.
109	Workload optimization using SharedDB	Georgios Giannikis, Darko Makreshanski, Gustavo Alonso, Donald Kossmann	Workload optimization using SharedDB
110	SciQL: array data processing inside an RDBMS	Ying Zhang, Martin Kersten, Stefan Manegold	To bridge the gap between the needs of the Data-Intensive Research fields and the current DBMS technologies, we have introduced SciQL (pronounced as ‘cycle’).
111	Iterative parallel data processing with Stratosphere: an inside look	Stephan Ewen, Sebastian Schelter, Kostas Tzoumas, Daniel Warneke, Volker Markl	In prior work, we have shown how to extend and use a parallel data flow system to efficiently run iterative algorithms in a shared-nothing environment.
112	CARTILAGE: adding flexibility to the Hadoop skeleton	Alekh Jindal, Jorge Quiané-Ruiz, Samuel Madden	In this paper, we present CARTILAGE, a comprehensive data storage framework built on top of HDFS.
113	Continuous outlier detection in data streams: an extensible framework and state-of-the-art algorithms	Dimitrios Georgiadis, Maria Kontaki, Anastasios Gounaris, Apostolos N. Papadopoulos, Kostas Tsichlas, Yannis Manolopoulos	In this demo, we show both an extensible framework for outlier detection algorithms and specific outlier detection algorithms for the demanding case where outlier detection is continuously performed over a data stream.
114	FAST: differentially private real-time aggregate monitor with filtering and adaptive sampling	Liyue Fan, Li Xiong, Vaidy Sunderam	We propose FAST, a real-time system that allows differentially private aggregate sharing and time-series analytics.
115	Execution and optimization of continuous queries with cyclops	Harold Lim, Shivnath Babu	Cyclops employs a cost-based approach for picking the most suitable engine and plan for executing a given query.
116	Less watts, more performance: an intelligent storage engine for data appliances	Louis Woods, Jens Teubner, Gustavo Alonso	In this demonstration, we present Ibex, a novel storage engine featuring hybrid, FPGA-accelerated query processing.
117	A demonstration of SQLVM: performance isolation in multi-tenant relational database-as-a-service	Vivek Narasayya, Sudipto Das, Manoj Syamala, Surajit Chaudhuri, Feng Li, Hyunjung Park	A demonstration of SQLVM: performance isolation in multi-tenant relational database-as-a-service
118	SQUIN: a traversal based query execution system for the web of linked data	Olaf Hartig	We demonstrate the query execution system SQUIN which implements a novel query execution approach.
119	GRDB: a system for declarative and interactive analysis of noisy information networks	Walaa Eldin Moustafa, Hui Miao, Amol Deshpande, Lise Getoor	GRDB: a system for declarative and interactive analysis of noisy information networks
120	HiNGE: enabling temporal network analytics at scale	Udayan Khurana, Amol Deshpande	In this demonstration proposal, we present HiNGE (Historical Network/Graph Explorer), a system that enables interactive exploration and analytics over large evolving networks through visualization and node-centric metric computations.
121	Research-insight: providing insight on research by publication network analysis	Fangbo Tao, Xiao Yu, Kin Hou Lei, George Brova, Xiao Cheng, Jiawei Han, Rucha Kanade, Yizhou Sun, Chi Wang, Lidan Wang, Tim Weninger	Much knowledge can be derived from such information networks if we systematically develop an effective and scalable database-oriented information network analysis technology.
122	QUBLE: blending visual subgraph query formulation with query processing on large networks	Ho Hoang Hung, Sourav S Bhowmick, Ba Quan Truong, Byron Choi, Shuigeng Zhou	In this demonstration, we present a novel system called QUBLE (QUery Blender for Large nEtworks) that blends visual subgraph query formulation with query processing on large networks.
123	StreamWorks: a system for dynamic graph search	Sutanay Choudhury, Lawrence Holder, George Chin, Abhik Ray, Sherman Beus, John Feo	The goal of our work is to enable real-time search capabilities for graph databases.
124	Query processing on prefix trees live	Thomas Kissinger, Benjamin Schlegel, Dirk Habich, Wolfgang Lehner	In this demonstration proposal, we present DexterDB, which implements our novel prefix tree-based processing model that makes indexes the first-class citizen of the database system.
125	xPAD: a platform for analytic data flows	Alkis Simitsis, Kevin Wilkinson, Petar Jovanovic	We present xPAD, our platform to manage analytic data flows.
126	PBS at work: advancing data management with consistency metrics	Peter Bailis, Shivaram Venkataraman, Michael J. Franklin, Joseph M. Hellerstein, Ion Stoica	A large body of recent work has proposed analytical and empirical techniques for quantifying the data consistency properties of distributed data stores.
127	The power of data use management in action	Prasang Upadhyaya, Nick Anderson, Magdalena Balazinska, Bill Howe, Raghav Kaushik, Ravi Ramamurthy, Dan Suciu	The power of data use management in action
128	CTrace: semantic comparison of multi-granularity process traces	Qing Liu, Kerry Taylor, Xiang Zhao, Geoffrey Squire, Xuemin Lin, Corne Kloppers, Richard Miller	We present CTrace, a system that (1) lets users explore the conceptual abstraction of large process traces with different levels of granularity, and (2) provides semantic comparison among traces in which both the structural and the semantic similarity are considered.
129	The big data ecosystem at LinkedIn	Roshan Sumbaly, Jay Kreps, Sam Shah	In particular, we present our solutions to the “last mile” issues in providing a rich developer ecosystem.
130	On brewing fresh espresso: LinkedIn’s distributed data serving platform	Lin Qiao, Kapil Surlaker, Shirshanka Das, Tom Quiggle, Bob Schulman, Bhaskar Ghosh, Antony Curtis, Oliver Seeliger, Zhen Zhang, Aditya Auradar, Chris Beaver, Gregory Brandt, Mihir Gandhi, Kishore Gopalakrishna, Wai Ip, Swaroop Jagadish, Shi Lu, Alexander Pachev, Aditya Ramesh, Abraham Sebastian, Rupa Shanbhag, Subbu Subramaniam, Yun Sun, Sajid Topiwala, Cuong Tran, Jemiah Westerman, David Zhang	This paper describes the motivation and design principles behind Espresso, the data model and capabilities it exposes to clients, and the details of its replication and secondary indexing implementation, and presents experimental results that characterize the system's performance along various dimensions.
131	Fast data in the era of big data: Twitter’s real-time related query suggestion architecture	Gilad Mishne, Jeff Dalton, Zhenghua Li, Aneesh Sharma, Jimmy Lin	We present the architecture behind Twitter’s real-time related query suggestion and spelling correction service.
132	Enhancements to SQL server column stores	Per-Ake Larson, Cipri Clinciu, Campbell Fraser, Eric N. Hanson, Mostafa Mokhtar, Michal Nowakiewicz, Vassilis Papadimos, Susan L. Price, Srikumar Rangarajan, Remus Rusanu, Mayukh Saubhasik	This paper gives an overview of SQL Server’s column stores and batch processing, in particular the enhancements introduced in the upcoming release.
133	Query containment in entity SQL	Guillem Rull, Philip A. Bernstein, Ivo Garcia dos Santos, Yannis Katsis, Sergey Melnik, Ernest Teniente	We describe a software architecture we have developed for a constructive containment checker of Entity SQL queries defined over extended ER schemas expressed in Microsoft’s Entity Data Model.
134	Timeline index: a unified data structure for processing queries on temporal data in SAP HANA	Martin Kaufmann, Amin Amiri Manjili, Panagiotis Vagenas, Peter Michael Fischer, Donald Kossmann, Franz Färber, Norman May	In this paper, we develop the Timeline Index as a novel, unified data structure that efficiently supports temporal operators such as temporal aggregation, time travel, and temporal joins.
135	LinkBench: a database benchmark based on the Facebook social graph	Timothy G. Armstrong, Vamsi Ponnekanti, Dhruba Borthakur, Mark Callaghan	In this paper we present a new synthetic benchmark called LinkBench.
136	BigBench: towards an industry standard benchmark for big data analytics	Ahmad Ghazal, Tilmann Rabl, Minqing Hu, Francois Raab, Meikel Poess, Alain Crolotte, Hans-Arno Jacobsen	In this paper, we present BigBench, an end-to-end big data benchmark proposal.
137	Building, maintaining, and using knowledge bases: a report from the trenches	Omkar Deshpande, Digvijay S. Lamba, Michel Tourn, Sanjib Das, Sri Subramaniam, Anand Rajaraman, Venky Harinarayan, AnHai Doan	In this paper we describe the process of building, maintaining, and using a knowledge base in practice.
138	Query processing on smart SSDs: opportunities and challenges	Jaeyoung Do, Yang-Suk Kee, Jignesh M. Patel, Chanik Park, Kwanghyun Park, David J. DeWitt	We have implemented an initial prototype of Microsoft SQL Server running on a Samsung Smart SSD.
139	Micro adaptivity in Vectorwise	Bogdan Răducanu, Peter Boncz, Marcin Zukowski	In this paper, we (i) characterize a number of factors that cause performance diversity between primitive flavors, (ii) describe an ε-greedy learning algorithm that casts flavor selection as a multi-armed bandit problem, and (iii) describe the software framework for Micro Adaptivity that we implemented in the Vectorwise system.
140	Hekaton: SQL server’s memory-optimized OLTP engine	Cristian Diaconu, Craig Freedman, Erik Ismert, Per-Ake Larson, Pravin Mittal, Ryan Stonecipher, Nitin Verma, Mike Zwilling	To achieve high concurrency, Hekaton uses only latch-free data structures and a new optimistic, multiversion concurrency control technique.
141	Split query processing in polybase	David J. DeWitt, Alan Halverson, Rimma Nehme, Srinath Shankar, Josep Aguilar-Saborit, Artin Avanes, Miro Flasza, Jim Gramling	This paper presents Polybase, a feature of SQL Server PDW V2 that allows users to manage and query data stored in a Hadoop cluster using the standard SQL query language.
142	Petabyte scale databases and storage systems at Facebook	Dhruba Borthakur	We describe the techniques we have used to map the Facebook Graph Database into a set of relational tables. We present an alternate set of benchmark techniques that measure capacity of a database, the value/byte in that database and the efficiency of inbuilt crowd-sourcing techniques to reduce administration costs of that database.
143	Incremental mapping compilation in an object-to-relational mapping system	Philip A. Bernstein, Marie Jacob, Jorge Pérez, Guillem Rull, James F. Terwilliger	We formally define the problem of incremental mapping compilation, present algorithms to solve it for Microsoft’s Entity Framework, and report on an implementation.
144	FriendRouter: real-time path finder in social networks	Wladston Viana, Mirella M. Moro	Here, we introduce a tool for finding paths between social network users in real-time, a task that classical solutions are not tailored for.
145	Adaptive log compression for massive log data	Robert Christensen, Feifei Li	We present a novel adaptive log compression scheme.
146	BUZZARD: a NUMA-aware in-memory indexing system	Lukas M. Maas, Thomas Kissinger, Dirk Habich, Wolfgang Lehner	Using adaptive data partitioning techniques, BUZZARD distributes a prefix-tree-based index across the NUMA system and hands off incoming requests to worker threads located on each partition’s respective NUMA node.
147	Resa: realtime elastic streaming analytics in the cloud	Tian Tan, Richard T.B. Ma, Marianne Winslett, Yin Yang, Yong Yu, Zhenjie Zhang	We propose Resa, a novel framework for robust, elastic and realtime stream processing in the cloud.
148	Natural language question answering over RDF data	Ruizhe Huang, Lei Zou	In this work, we propose a methodology to translate natural language questions into SPARQL queries, which can then be answered by existing RDF engines.
149	Mobile interaction and query optimization in a protein-ligand data analysis system	Marvin Lapeine, Katherine G. Herbert, Emily Hill, Nina M. Goodey	Our approach both applies standards and uses novel mechanisms to help improve performance.
150	SIGMOD 2013 new researcher symposium	Anish Das Sarma, Xin Luna Dong	SIGMOD 2013 new researcher symposium