Paper Digest: SIGMOD 2014 Highlights

June 16, 2014 / June 26, 2020, admin

The ACM SIGMOD conference, organized by the ACM Special Interest Group on Management of Data (SIGMOD), is one of the top conferences on database management systems and data management technology.

To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights and summaries to quickly get the main idea of each paper.

If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.

Paper Digest Team
team@paperdigest.org

TABLE 1: SIGMOD 2014 Papers

No.	Title	Authors	Highlight
1	Edgar F. Codd Innovations Award Talk	Martin Kersten	Edgar F. Codd Innovations Award Talk
2	SIGMOD Jim Gray Doctoral Dissertation Award Talk	Aditya Parameswaran	SIGMOD Jim Gray Doctoral Dissertation Award Talk
3	SIGMOD Jim Gray Doctoral Dissertation Award Talk	Andy Pavlo	SIGMOD Jim Gray Doctoral Dissertation Award Talk
4	How I learned to stop worrying and love compilers	Eric Sedlar	The modern platforms that we want to use to manage our data are far more complex to program efficiently than the machines we used in the past.
5	PLANET: making progress with commit processing in unpredictable environments	Gene Pang, Tim Kraska, Michael J. Franklin, Alan Fekete	We propose Predictive Latency-Aware NEtworked Transactions (PLANET), a new transaction programming model and underlying system support to address this issue.
6	Lazy evaluation of transactions in database systems	Jose M. Faleiro, Alexander Thomson, Daniel J. Abadi	We introduce a lazy transaction execution engine, in which a transaction may be considered durably completed after only partial execution, while the bulk of its operations (notably all reads from the database and all execution of transaction logic) may be deferred until an arbitrary future time, such as when a user attempts to read some element of the transaction’s write-set, all without modifying the semantics of the transaction or sacrificing ACID guarantees.
7	Scalable atomic visibility with RAMP transactions	Peter Bailis, Alan Fekete, Joseph M. Hellerstein, Ali Ghodsi, Ion Stoica	In this work, we identify a new isolation model—Read Atomic (RA) isolation—that matches the requirements of these use cases by ensuring atomic visibility: either all or none of each transaction’s updates are observed by other transactions.
8	JECB: a join-extension, code-based approach to OLTP data partitioning	Khai Q. Tran, Jeffrey F. Naughton, Bruhathi Sundarmurthy, Dimitris Tsirogiannis	In this paper, we present a low-overhead data partitioning approach, termed JECB, that can reduce the number of distributed transactions in complex database workloads such as TPC-E.
9	HYDRA: large-scale social identity linkage via heterogeneous behavior modeling	Siyuan Liu, Shuhui Wang, Feida Zhu, Jinbo Zhang, Ramayya Krishnan	This paper proposes HYDRA, a solution framework which consists of three key steps: (I) modeling heterogeneous behavior by long-term behavior distribution analysis and multi-resolution temporal information matching; (II) constructing structural consistency graph to measure the high-order structure consistency on users’ core social structures across different platforms; and (III) learning the mapping function by multi-objective optimization composed of both the supervised learning on pair-wise ID linkage information and the cross-platform structure consistency maximization.
10	In search of influential event organizers in online social networks	Kaiyu Feng, Gao Cong, Sourav S. Bhowmick, Shuai Ma	Hence, we propose three algorithms to find approximate solutions to the problem.
11	Influence maximization: near-optimal time complexity meets practical efficiency	Youze Tang, Xiaokui Xiao, Yanchen Shi	This paper presents TIM, an algorithm that aims to bridge the theory and practice in influence maximization.
12	Efficient location-aware influence maximization	Guoliang Li, Shuo Chen, Jianhua Feng, Kian-Lee Tan, Wen-Syan Li	In this paper we study the location-aware influence maximization problem.
13	Density-based place clustering in geo-social networks	Jieming Shi, Nikos Mamoulis, Dingming Wu, David W. Cheung	In this paper, we show how the density-based clustering paradigm can be extended to apply on places which are visited by users of a geo-social network.
14	Hypersphere dominance: an optimal approach	Cheng Long, Raymond Chi-Wing Wong, Bin Zhang, Min Xie	In this paper, we propose an approach called Hyperbola which is optimal in the sense that it gives neither false positives nor false negatives and runs in time linear in the dimensionality.
15	Efficient algorithms for optimal location queries in road networks	Zitong Chen, Yubao Liu, Raymond Chi-Wing Wong, Jiamin Xiong, Ganglin Mai, Cheng Long	In this paper, we study the optimal location query problem based on road networks.
16	Robust set reconciliation	Di Chen, Christian Konrad, Ke Yi, Wei Yu, Qin Zhang	In this paper, we propose the robust set reconciliation problem, and take a principled approach to address this issue via the earth mover’s distance.
17	Storm@twitter	Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M. Patel, Sanjeev Kulkarni, Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, Nikunj Bhagat, Sailesh Mittal, Dmitriy Ryaboy	This paper describes the use of Storm at Twitter.
18	Druid: a real-time analytical data store	Fangjin Yang, Eric Tschetter, Xavier Léauté, Nelson Ray, Gian Merlino, Deep Ganguli	In this paper, we describe Druid’s architecture, and detail how it supports fast aggregations, flexible filters, and low latency data ingestion.
19	The next generation operational data historian for IoT based on informix	Sheng Huang, Yaoliang Chen, Xiaoyan Chen, Kai Liu, Xiaomin Xu, Chen Wang, Kevin Brown, Inge Halilovic	In this paper, we present the next-generation Operational Data Historian (ODH) system that is based on the IBM© Informix© system architecture. In addition, we present the first benchmark, IoT-X, to evaluate technologies on operational data management for IoT.
20	GenBase: a complex analytics genomics benchmark	Rebecca Taft, Manasi Vartak, Nadathur Rajagopalan Satish, Narayanan Sundaram, Samuel Madden, Michael Stonebraker	This paper introduces a new benchmark designed to test database management system (DBMS) performance on a mix of data management tasks (joins, filters, etc.) and complex analytics (regression, singular value decomposition, etc.). Such mixed workloads are prevalent in a number of application areas including most science workloads and web analytics.
21	How to stop under-utilization and love multicores	Anastasia Ailamaki, Erietta Liarou, Pinar Tözün, Danica Porobic, Iraklis Psaroudakis	In this tutorial, we shed light on the above three challenges and survey recent proposals to alleviate them.
22	AutoPlait: automatic mining of co-evolving time sequences	Yasuko Matsubara, Yasushi Sakurai, Christos Faloutsos	In this paper we present AutoPlait, a fully automatic mining algorithm for co-evolving time sequences.
23	Resource-oriented approximation for frequent itemset mining from bursty data streams	Yoshitaka Yamamoto, Koji Iwanuma, Shoshi Fukuda	Thus, we present resource-oriented approximation algorithms that fix an upper bound for memory consumption to tolerate bursty transactions.
24	On complexity and optimization of expensive queries in complex event processing	Haopeng Zhang, Yanlei Diao, Neil Immerman	This analysis allows us to identify performance bottlenecks in processing those expensive queries, and provides key insights for us to develop a series of optimizations to mitigate those bottlenecks.
25	Complex event analytics: online aggregation of stream sequence patterns	Yingmei Qi, Lei Cao, Medhabi Ray, Elke A. Rundensteiner	In this paper, we demonstrate that CEP aggregation can be pushed into the sequence construction process.
26	Towards indexing functions: answering scalar product queries	Arijit Khan, Pouya Yanki, Bojana Dimcheva, Donald Kossmann	We present a lightweight, yet scalable, dynamic, and generalized indexing scheme, called the planar index, for answering scalar product queries in an accurate manner, which is based on the idea of indexing function f(x) for each data point x using multiple sets of parallel hyperplanes.
27	LINVIEW: incremental view maintenance for complex analytical queries	Milos Nikolic, Mohammed ElSeidy, Christoph Koch	In this paper, we study the incremental view maintenance problem for such complex analytical queries.
28	Materialization optimizations for feature selection workloads	Ce Zhang, Arun Kumar, Christopher Ré	Analytics is one of the biggest topics in data management, and feature selection is widely regarded as the most critical step of analytics; thus, we argue that managing the feature selection process is a pressing data management challenge.
29	The analytical bootstrap: a new method for fast error estimation in approximate query processing	Kai Zeng, Shi Gao, Barzan Mozafari, Carlo Zaniolo	In this paper, we introduce a probabilistic relational model for the bootstrap process, along with rigorous semantics and a unified error model, which bridges the gap between these two traditional approaches.
30	TriAD: a distributed shared-nothing RDF engine based on asynchronous message passing	Sairam Gurajada, Stephan Seufert, Iris Miliaraki, Martin Theobald	We investigate a new approach to the design of distributed, shared-nothing RDF engines.
31	Querying big graphs within bounded resources	Wenfei Fan, Xin Wang, Yinghui Wu	We propose resource-bounded query answering via a dynamic scheme that reduces big G to G_Q.
32	Natural language question answering over RDF: a graph data driven approach	Lei Zou, Ruizhe Huang, Haixun Wang, Jeffrey Xu Yu, Wenqiang He, Dongyan Zhao	In this paper, we propose a systematic framework to answer natural language questions over RDF repository (RDF Q/A) from a graph data-driven perspective.
33	Scalable similarity search for SimRank	Mitsuru Kusumoto, Takanori Maehara, Ken-ichi Kawarabayashi	We propose a very fast and scalable algorithm for this similarity search problem.
34	Orca: a modular query optimizer architecture for big data	Mohamed A. Soliman, Lyublena Antova, Venkatesh Raghavan, Amr El-Helw, Zhongxian Gu, Entong Shen, George C. Caragea, Carlos Garcia-Alvarado, Foyzur Rahman, Michalis Petropoulos, Florian Waas, Sivaramakrishnan Narayanan, Konstantinos Krikellas, Rhonda Baldwin	In this paper we present the architecture of Orca, the new query optimizer for all Pivotal data management products, including Pivotal Greenplum Database and Pivotal HAWQ.
35	Parallel I/O aware query optimization	Pedram Ghodsnia, Ivan T. Bowman, Anisoara Nica	We characterize the benefit of exploiting I/O parallelism in database scan operators in SAP SQL Anywhere and propose a novel general I/O cost model that considers the impact of device I/O queue depth in I/O cost estimation.
36	Exploiting ordered dictionaries to efficiently construct histograms with q-error guarantees in SAP HANA	Guido Moerkotte, David DeHaan, Norman May, Anisoara Nica, Alexander Boehm	In this paper we extend this concept with a threshold, i.e., an estimate or true cardinality θ, below which we do not care about the q-error because we still expect optimal plans.
37	Optimizing queries over partitioned tables in MPP systems	Lyublena Antova, Amr El-Helw, Mohamed A. Soliman, Zhongxian Gu, Michalis Petropoulos, Florian Waas	In this paper, we present optimization techniques for queries over partitioned tables as implemented in Pivotal Greenplum Database.
38	Parallel data analysis directly on scientific file formats	Spyros Blanas, Kesheng Wu, Surendra Byna, Bin Dong, Arie Shoshani	In this paper, we present the design of a new scientific data analysis system that efficiently processes queries directly over data stored in the HDF5 file format.
39	The PH-tree: a space-efficient storage structure and multi-dimensional index	Tilmann Zäschke, Christoph Zimmerli, Moira C. Norrie	We propose the PATRICIA-hypercube-tree, or PH-tree, a multi-dimensional data storage and indexing structure.
40	Incremental elasticity for array databases	Jennie Duggan, Michael Stonebraker	In both steps we propose incremental approaches, affecting a minimum set of data and nodes, while maintaining high performance.
41	Efficient summarization framework for multi-attribute uncertain data	Jie Xu, Dmitri V. Kalashnikov, Sharad Mehrotra	We propose a framework that models objects as a set of the corresponding information units and reduces the summarization problem to that of optimizing probabilistic coverage.
42	Fusing data with correlations	Ravali Pochampally, Anish Das Sarma, Xin Luna Dong, Alexandra Meliou, Divesh Srivastava	In this paper we present novel techniques for modeling correlations between sources and applying them in truth finding. We provide a comprehensive evaluation of our approach on three real-world datasets with different characteristics, as well as on synthetic data, showing that our algorithms outperform the existing state-of-the-art techniques.
43	Descriptive and prescriptive data cleaning	Anup Chalamalla, Ihab F. Ilyas, Mourad Ouzzani, Paolo Papotti	In this paper, we propose a system to address this decoupling.
44	Towards dependable data repairing with fixing rules	Jiannan Wang, Nan Tang	Towards dependable data repairing with fixing rules
45	A sample-and-clean framework for fast and accurate query processing on dirty data	Jiannan Wang, Sanjay Krishnan, Michael J. Franklin, Ken Goldberg, Tim Kraska, Tova Milo	In this paper, we explore an intriguing opportunity.
46	Knowing when you’re wrong: building fast and reliable approximate query processing systems	Sameer Agarwal, Henry Milner, Ariel Kleiner, Ameet Talwalkar, Michael Jordan, Samuel Madden, Barzan Mozafari, Ion Stoica	In this paper, we show that it is possible to implement a query approximation pipeline that produces approximate answers and reliable error bars at interactive speeds.
47	Discovering queries based on example tuples	Yanyan Shen, Kaushik Chakrabarti, Surajit Chaudhuri, Bolin Ding, Lev Novik	We propose novel algorithms to solve this problem.
48	Interactive data exploration using semantic windows	Alexander Kalinin, Ugur Cetintemel, Stan Zdonik	We present a new interactive data exploration approach, called Semantic Windows (SW), in which users query for multidimensional "windows" of interest via standard DBMS-style queries enhanced with exploration constructs.
49	Explore-by-example: an automatic query steering framework for interactive data exploration	Kyriaki Dimitriadou, Olga Papaemmanouil, Yanlei Diao	In this paper, we introduce AIDE, an Automatic Interactive Data Exploration framework, that iteratively steers the user towards interesting data areas and predicts a query that retrieves his objects of interest.
50	Durable write cache in flash memory SSD for relational and NoSQL databases	Woon-Hak Kang, Sang-Won Lee, Bongki Moon, Yang-Suk Kee, Moonwook Oh	This paper presents a new SSD prototype called DuraSSD equipped with tantalum capacitors.
51	Fast database restarts at facebook	Aakash Goel, Bhuwan Chopra, Ciprian Gerea, Dhruv Mátáni, Josh Metzler, Fahim Ul Haq, Janet Wiener	In this paper, we show that using shared memory provides a simple, effective, fast solution to upgrading servers.
52	SpongeFiles: mitigating data skew in mapreduce using distributed memory	Khaled Elmeleegy, Christopher Olston, Benjamin Reed	We introduce SpongeFiles, a novel distributed-memory abstraction tailored to data processing environments like MapReduce.
53	Leveraging compression in the tableau data engine	Richard Michael Grantham Wesley, Pawel Terlecki	In this paper, we describe how the Tableau Data Engine (an internally developed column store) leverages a number of compression techniques to improve query performance.
54	Fun with hardware transactional memory	Maurice Herlihy	This talk will argue that HTM is not just a faster way of doing the same old latches and monitors.
55	CrowdFill: collecting structured data from the crowd	Hyunjung Park, Jennifer Widom	We present CrowdFill, a system for collecting structured data from the crowd.
56	OASSIS: query driven crowd mining	Yael Amsterdamer, Susan B. Davidson, Tova Milo, Slava Novgorodov, Amit Somech	In this paper, we explore a novel approach that broadens crowd data sourcing by enabling users to pose general questions, to mine the crowd for potentially relevant data, and to receive concise, relevant answers that represent frequent, significant data patterns.
57	Corleone: hands-off crowdsourcing for entity matching	Chaitanya Gokhale, Sanjib Das, AnHai Doan, Jeffrey F. Naughton, Narasimhan Rampalli, Jude Shavlik, Xiaojin Zhu	We describe Corleone, a HOC solution for EM, which uses the crowd in all major steps of the EM process.
58	Efficient cohesive subgraphs detection in parallel	Yingxia Shao, Lei Chen, Bin Cui	In this paper, we propose a novel parallel and efficient truss detection algorithm, called PeTa.
59	Parallel subgraph listing in a large-scale graph	Yingxia Shao, Bin Cui, Lei Chen, Lin Ma, Junjie Yao, Ning Xu	In this paper, we design a novel parallel subgraph listing framework, named PSgL.
60	OPT: a new framework for overlapped and parallel triangulation in large-scale graphs	Jinha Kim, Wook-Shin Han, Sangyeon Lee, Kyungyeol Park, Hwanjo Yu	In this paper, we propose an overlapped and parallel disk-based triangulation framework for billion-scale graphs, OPT, which achieves the ideal cost by (1) full overlap of the CPU and I/O operations and (2) full parallelism of multi-core CPU and FlashSSD I/O.
61	Knowledge expansion over probabilistic knowledge bases	Yang Chen, Daisy Zhe Wang	In this paper, we present ProbKB, a probabilistic knowledge base designed to infer missing facts in a scalable, probabilistic, and principled manner using a relational DBMS.
62	InsightNotes: summary-based annotation management in relational databases	Dongqing Xiao, Mohamed Y. Eltabakh	In this paper, we address the challenges that arise from the growing scale of annotations in scientific databases.
63	A pivotal prefix based filtering algorithm for string similarity search	Dong Deng, Guoliang Li, Jianhua Feng	To address this problem, we propose a novel pivotal prefix filter which significantly reduces the number of signatures.
64	Versatile optimization of UDF-heavy data flows with sofa	Astrid Rheinländer, Martin Beckmann, Anja Kunkel, Arvid Heise, Thomas Stoltmann, Ulf Leser	In this demonstration, we present Meteor, a declarative data flow language, and Sofa, a logical optimizer for UDF-heavy data flows, which are both part of the Stratosphere system.
65	ERIS live: a NUMA-aware in-memory storage engine for tera-scale multiprocessor systems	Tim Kiefer, Thomas Kissinger, Benjamin Schlegel, Dirk Habich, Daniel Molka, Wolfgang Lehner	In this demonstration, we present ERIS, our NUMA-aware in-memory storage engine.
66	Demonstrating efficient query processing in heterogeneous environments	Tomas Karnagel, Matthias Hille, Mario Ludwig, Dirk Habich, Wolfgang Lehner, Max Heimel, Volker Markl	In prior work, we presented a generic hardware-oblivious database system, where the operators can be executed on the main processor as well as on a large number of accelerator architectures.
67	One DBMS for all: the brawny few and the wimpy crowd	Tobias Mühlbauer, Wolf Rödiger, Robert Seilbeck, Angelika Reiser, Alfons Kemper, Thomas Neumann	One DBMS for all: the brawny few and the wimpy crowd
68	VQA: vertica query analyzer	Alkis Simitsis, Kevin Wilkinson, Jason Blais, Joe Walsh	We demonstrate VQA using TPC-DS queries which have a wide range of query duration and complexity.
69	Palette: enabling scalable analytics for big-memory, multicore machines	Fei Chen, Tere Gonzalez, Jun Li, Manish Marwah, Jim Pruyne, Krishnamurthy Viswanathan, Mijung Kim	In this demo, we present Palette, an analytics framework that exploits large memory to trade space for time while also addressing the challenges of multi-threaded, NUMA-aware programming.
70	NaLIR: an interactive natural language interface for querying relational databases	Fei Li, Hosagrahar V Jagadish	In this demo, we present NaLIR, a generic interactive natural language interface for querying relational databases.
71	BabbleFlow: a translator for analytic data flow programs	Petar Jovanovic, Alkis Simitsis, Kevin Wilkinson	To address this problem, we present BabbleFlow, a system for enabling flow design at a logical level and automatic translation to physical flows.
72	Indexing on modern hardware: hekaton and beyond	Justin Levandoski, David Lomet, Sudipta Sengupta, Adrian Birka, Cristian Diaconu	Recent OLTP support exploits new techniques, running on modern hardware, to achieve unprecedented performance compared with prior approaches.
73	CrowdMatcher: crowd-assisted schema matching	Chen Jason Zhang, Ziyuan Zhao, Lei Chen, H. V. Jagadish, Chen Caleb Cao	Thus in this demo, we will show how to utilize the crowd to find the right matching.
74	Cloud-based RDF data management	Zoi Kaoudi, Ioana Manolescu	This tutorial presents the challenges faced in order to efficiently handle massive amounts of RDF data in a cloud environment.
75	Patience is a virtue: revisiting merge and sort on modern processors	Badrish Chandramouli, Jonathan Goldstein	We revisit the problem of sorting and merging data in main memory, and show that a long-forgotten technique called Patience Sort can, with some key modifications, be made competitive with today’s best comparison-based sorting techniques for both random and almost sorted data.
76	Morsel-driven parallelism: a NUMA-aware query evaluation framework for the many-core age	Viktor Leis, Peter Boncz, Alfons Kemper, Thomas Neumann	In response, we present the morsel-driven query execution framework, where scheduling becomes a fine-grained run-time task that is NUMA-aware.
77	A comprehensive study of main-memory partitioning and its application to large-scale comparison- and radix-sort	Orestis Polychroniou, Kenneth A. Ross	We revisit the pitfalls of in-cache partitioning, and utilizing the crucial performance factors, we introduce new variants for partitioning out-of-cache.
78	An application-specific instruction set for accelerating set-oriented database primitives	Oliver Arnold, Sebastian Haas, Gerhard Fettweis, Benjamin Schlegel, Thomas Kissinger, Wolfgang Lehner	In this paper, we show that the development of a database processor is much more feasible nowadays through the availability of customizable processors.
79	Which concepts are worth extracting?	Arash Termehchy, Ali Vakilian, Yodsawalai Chodpathumwan, Marianne Winslett	In this paper, we introduce the problem of cost effective conceptual design, where given a collection, a set of relevant concepts, and a fixed budget, one likes to find a conceptual design that improves the effectiveness of answering queries over the collection the most.
80	Querying virtual hierarchies using virtual prefix-based numbers	Curtis E. Dyreson, Sourav S. Bhowmick, Ryan Grapp	In this paper we present a novel strategy to virtually transform the data without instantiating and renumbering.
81	NLyze: interactive programming by natural language for spreadsheet data analysis and manipulation	Sumit Gulwani, Mark Marron	This paper describes the design and implementation of a robust natural language based interface to spreadsheet programming.
82	Sinew: a SQL system for multi-structured data	Daniel Tahara, Thaddeus Diamond, Daniel J. Abadi	In this paper, we discuss the design of a system that enables developers to continue to represent their data using self-describing formats without moving away from SQL and traditional relational database systems.
83	Scalable big graph processing in MapReduce	Lu Qin, Jeffrey Xu Yu, Lijun Chang, Hong Cheng, Chengqi Zhang, Xuemin Lin	In this paper, we study scalable big graph processing in MapReduce.
84	Anti-combining for MapReduce	Alper Okcan, Mirek Riedewald	We propose Anti-Combining, a novel optimization for MapReduce programs to decrease the amount of data transferred from mappers to reducers.
85	Opportunistic physical design for big data analytics	Jeff LeFevre, Jagan Sankaranarayanan, Hakan Hacigumus, Junichi Tatemura, Neoklis Polyzotis, Michael J. Carey	We present a semantic model for UDFs that enables effective reuse of views containing UDFs along with a rewrite algorithm that provably finds the minimum-cost rewrite under certain assumptions.
86	Stratified-sampling over social networks using mapreduce	Roy Levin, Yaron Kanza	In this paper we consider sampling of large-scale, distributed online social networks, and we show how to deal with cases where several surveys are conducted in parallel—in some surveys it may be desired to share individuals to reduce costs, while in other surveys, sharing should be minimized, e.g., to prevent survey fatigue.
87	Demonstration of the Myria big data management service	Daniel Halperin, Victor Teixeira de Almeida, Lee Lee Choo, Shumo Chu, Paraschos Koutris, Dominik Moritz, Jennifer Ortiz, Vaspol Ruamviboonsuk, Jingjing Wang, Andrew Whitaker, Shengliang Xu, Magdalena Balazinska, Bill Howe, Dan Suciu	Myria queries are executed on a scalable, parallel cluster that uses both state-of-the-art and novel methods for distributed query processing.
88	DataSift: a crowd-powered search toolkit	Aditya Parameswaran, Ming Han Teh, Hector Garcia-Molina, Jennifer Widom	We demonstrate DataSift, a crowd-powered search toolkit that can be instrumented over any corpus supporting a keyword search API, and supports efficient and accurate querying for a rich general class of queries, including those described previously.
89	Reactive and proactive sharing across concurrent analytical queries	Iraklis Psaroudakis, Manos Athanassoulis, Matthaios Olma, Anastasia Ailamaki	We show that pull-based sharing for SP eliminates the serialization point imposed by the original push-based approach.
90	SLQ: a user-friendly graph querying system	Shengqi Yang, Yanan Xie, Yinghui Wu, Tianyu Wu, Huan Sun, Jian Wu, Xifeng Yan	In this demo, we present SLQ, a user-friendly graph querying system enabling schemaless and structureless graph querying, where a user need not describe queries precisely as required by most databases.
91	TAREEG: a MapReduce-based web service for extracting spatial data from OpenStreetMap	Louai Alarabi, Ahmed Eldawy, Rami Alghamdi, Mohamed F. Mokbel	TAREEG employs MapReduce-based techniques to make it efficient and easy to extract OpenStreetMap data in a standard form with minimal effort.
92	Searching with XQ: the exemplar query search engine	Davide Mottin, Matteo Lissandrini, Yannis Velegrakis, Themis Palpanas	At the same time, we highlight the technical challenges for this type of query answering and illustrate the implementation approach we have materialized.
93	MeanKS: meaningful keyword search in relational databases with complex schema	Mehdi Kargar, Aijun An, Nick Cercone, Parke Godfrey, Jaroslaw Szlichta, Xiaohui Yu	We demonstrate MeanKS, a new system for meaningful keyword search over relational databases.
94	H	Nikolaos Papailiou, Dimitrios Tsoumakos, Ioannis Konstantinou, Panagiotis Karras, Nectarios Koziris	In this paper, we present its key scientific contributions and allow participants to interact with an H2RDF+ deployment over a Cloud infrastructure.
95	DoomDB: kill the query	Carsten Binnig, Abdallah Salama, Erfan Zamanian	For the demonstration, we present a computer game called DoomDB.
96	Should we all be teaching "intro to data science" instead of "intro to databases"?	Bill Howe, Michael J. Franklin, Juliana Freire, James Frew, Tim Kraska, Raghu Ramakrishnan	We consider how to bring these concepts front and center into the emerging wave of Data Science courses, degree programs and even departments.
97	Characterizing and selecting fresh data sources	Theodoros Rekatsinas, Xin Luna Dong, Divesh Srivastava	In this paper, we study the problem of source selection considering dynamic data sources whose content changes over time.
98	Sloth: being lazy is a virtue (when issuing database queries)	Alvin Cheung, Samuel Madden, Armando Solar-Lezama	In this paper, we present Sloth, a new system that extends traditional lazy evaluation to expose query batching opportunities during application execution, even across loops, branches, and method boundaries.
99	Dynamically optimizing queries over large scale data platforms	Konstantinos Karanasos, Andrey Balmin, Marcel Kutsch, Fatma Ozcan, Vuk Ercegovac, Chunyang Xia, Jesse Jackson	In this paper, we propose new techniques that take into account UDFs and correlations between relations for optimizing queries running on large scale clusters.
100	A software-defined networking based approach for performance management of analytical queries on distributed data stores	Pengcheng Xiong, Hakan Hacigumus, Jeffrey F. Naughton	More specifically, we present a group of methods to leverage SDN’s visibility into and control of the network’s state that enable distributed query processors to achieve performance improvements and differentiation for analytical queries.
101	The pursuit of a good possible world: extracting representative instances of uncertain graphs	Panos Parchas, Francesco Gullo, Dimitris Papadias, Francesco Bonchi	To overcome these problems, we propose algorithms for creating deterministic representative instances of uncertain graphs that maintain the underlying graph properties.
102	Navigating the maze of graph analytics frameworks using massive graph datasets	Nadathur Satish, Narayanan Sundaram, Md. Mostofa Ali Patwary, Jiwon Seo, Jongsoo Park, M. Amber Hassaan, Shubho Sengupta, Zhaoming Yin, Pradeep Dubey	In this work, we offer a quantitative roadmap for improving the performance of all these frameworks and bridging the "ninja gap".
103	Local search of communities in large graphs	Wanyun Cui, Yanghua Xiao, Haixun Wang, Wei Wang	In this paper, we propose a local search strategy, which searches in the neighborhood of a vertex to find the best community for the vertex.
104	Mining statistically significant connected subgraphs in vertex labeled graphs	Akhil Arora, Mayank Sachan, Arnab Bhattacharya	In this paper, we address the problem of finding statistically significant connected subgraphs where the nodes of the graph are labeled.
105	Complete yet practical search for minimal query reformulations under constraints	Ioana Ileana, Bogdan Cautis, Alin Deutsch, Yannis Katsis	We revisit the Chase&Backchase (C&B) algorithm for query reformulation under constraints, which provides a uniform solution to such particular-case problems as view-based rewriting under constraints, semantic query optimization, and physical access path selection in query optimization.
106	Query shredding: efficient relational evaluation of queries over nested multisets	James Cheney, Sam Lindley, Philip Wadler	We present a new approach to query shredding, which converts a query returning nested data to a fixed number of SQL queries.
107	Plan bouquets: query processing without selectivity estimation	Anshuman Dutt, Jayant R. Haritsa	We propose here a conceptually new approach to address this problem, wherein the compile-time estimation process is completely eschewed for error-prone selectivities.
108	Schema-free SQL	Fei Li, Tianyin Pan, Hosagrahar V. Jagadish	In this paper, we propose a query language, Schema-free SQL, which enables its users to query a relational database using whatever partial schema they know.
109	iCheck: computationally combating "lies, d–ned lies, and statistics"	You Wu, Brett Walenz, Peggy Li, Andrew Shim, Emre Sonmez, Pankaj K. Agarwal, Chengkai Li, Jun Yang, Cong Yu	For claims based on structured data, we present a system to automatically assess the quality of claims (beyond their correctness) and counter misleading claims that cherry-pick data to advance their conclusions.
110	ABS: a system for scalable approximate queries with accuracy guarantees	Kai Zeng, Shi Gao, Jiaqi Gu, Barzan Mozafari, Carlo Zaniolo	Our recently introduced Analytical Bootstrap method combines the strengths of both approaches and provides the basis for our ABS system, which will be demonstrated at the conference.
111	NADEEF/ER: generic and interactive entity resolution	Ahmed Elmagarmid, Ihab F. Ilyas, Mourad Ouzzani, Jorge-Arnulfo Quiané-Ruiz, Nan Tang, Si Yin	NADEEF/ER: generic and interactive entity resolution
112	SerpentTI: flexible analytics of users, boards and domains for pinterest	Alex Cheng, Mary Malit, Chuanxi Zhang, Nick Koudas	We provide a description of SerpentTI, a system that currently crawls, indexes and aggregates more than 31 million users, 96 million boards and 3.1 billion pins from Pinterest to enable flexible and deep analytics.
113	Interactive redescription mining	Esther Galbrun, Pauli Miettinen	We present Siren, a tool for interactive redescription mining.
114	ONTOCUBO: cube-based ontology construction and exploration	Carlos Garcia-Alvarado, Carlos Ordonez	In this paper, we present ONTOCUBO, a novel system based on our research for text summarization using ontologies and automatic extraction of concepts for building ontologies using Online Analytical Processing (OLAP) cubes.
115	An extendable framework for managing uncertain spatio-temporal data	Tobias Emrich, Maximilian Franzke, Hans-Peter Kriegel, Johannes Niedermayer, Matthias Renz, Andreas Züfle	This demonstration presents our Uncertain-Spatio-Temporal (UST) framework that we have developed in recent years.
116	NewsNetExplorer: automatic construction and exploration of news information networks	Fangbo Tao, George Brova, Jiawei Han, Heng Ji, Chi Wang, Brandon Norick, Ahmed El-Kishky, Jialu Liu, Xiang Ren, Yizhou Sun	Much knowledge can be derived and explored with such an information network if we systematically develop effective and scalable data-intensive information network analysis technologies. Further, we develop a set of news information network exploration and mining mechanisms that explore news in multi-dimensional space, which include (i) OLAP-based operations on the hierarchical dimensional and topical structures and rich-text, such as cell summary, single dimension analysis, and promotion analysis, (ii) a set of network-based operations, such as similarity search and ranking-based clustering, and (iii) a set of hybrid operations or network-OLAP operations, such as entity ranking at different granularity levels.
117	IQR: an interactive query relaxation system for the empty-answer problem	Davide Mottin, Alice Marascu, Senjuti Basu Roy, Gautam Das, Themis Palpanas, Yannis Velegrakis	We present IQR, a system that demonstrates optimization based interactive relaxations for queries that return an empty answer.
118	OceanRT: real-time analytics over large temporal data	Shiming Zhang, Yin Yang, Wei Fan, Liang Lan, Mingxuan Yuan	We demonstrate OceanRT, a novel cloud-based infrastructure that performs online analytics in real time, over large-scale temporal data such as call logs from a telecommunication company.
119	H2O: a hands-free adaptive store	Ioannis Alagiannis, Stratos Idreos, Anastasia Ailamaki	In this paper, we present the H2O system which introduces two novel concepts.
120	Fine-grained partitioning for aggressive data skipping	Liwen Sun, Michael J. Franklin, Sanjay Krishnan, Reynold S. Xin	In this paper, we propose a fine-grained blocking technique that reorganizes the data tuples into blocks with a goal of enabling queries to skip blocks aggressively.
121	DSH: data sensitive hashing for high-dimensional k-NN search	Jinyang Gao, Hosagrahar Visvesvaraya Jagadish, Wei Lu, Beng Chin Ooi	In this paper, we propose a new and efficient method called Data Sensitive Hashing (DSH) to address this drawback.
122	Fast and unified local search for random walk based k-nearest-neighbor query in large graphs	Yubao Wu, Ruoming Jin, Xiang Zhang	In this paper, we present FLoS (Fast Local Search), a unified local search method for efficient and exact top-k proximity query in large graphs.
123	Global immutable region computation	Jilian Zhang, Kyriakos Mouratidis, HweeHwa Pang	In this paper we propose an auxiliary feature to standard top-k query processing.
124	Answering top-k representative queries on graph databases	Sayan Ranu, Minh Hoang, Ambuj Singh	In this paper, we solve the problem of top-k representative queries on graph databases.
125	Modeling entity evolution for temporal record matching	Yueh-Hsuan Chiang, AnHai Doan, Jeffrey F. Naughton	In our work, we propose and evaluate a more detailed model that focuses on the probability that a given attribute value reappears over time.
126	Resolving conflicts in heterogeneous data by truth discovery and source reliability estimation	Qi Li, Yaliang Li, Jing Gao, Bo Zhao, Wei Fan, Jiawei Han	In this paper, we propose to resolve conflicts among multiple sources of heterogeneous data types.
127	A probabilistic model for linking named entities in web text with heterogeneous information networks	Wei Shen, Jiawei Han, Jianyong Wang	We propose an effective iterative approach to automatically learning the weights for each meta-path based on the expectation-maximization (EM) algorithm without requiring any training data.
128	Matching heterogeneous event data	Xiaochen Zhu, Shaoxu Song, Xiang Lian, Jianmin Wang, Lei Zou	We prove the convergence of iterative similarity computation, and propose several pruning and estimation methods.
129	HAWQ: a massively parallel processing SQL engine in hadoop	Lei Chang, Zhanwei Wang, Tao Ma, Lirong Jian, Lili Ma, Alon Goldshuv, Luke Lonergan, Jeffrey Cohen, Caleb Welton, Gavin Sherry, Milind Bhandarkar	This paper presents the novel design of HAWQ, including query processing, the scalable software interconnect based on UDP protocol, transaction management, fault tolerance, read optimized storage, the extensible framework for supporting various popular Hadoop based data stores and formats, and various optimization choices we considered to enhance the query performance.
130	Major technical advancements in apache hive	Yin Huai, Ashutosh Chauhan, Alan Gates, Gunther Hagleitner, Eric N. Hanson, Owen O’Malley, Jitendra Pandey, Yuan Yuan, Rubao Lee, Xiaodong Zhang	In this paper, we present a community-based effort on technical advancements in Hive.
131	JSON data management: supporting schema-less development in RDBMS	Zhen Hua Liu, Beda Hammerschmidt, Doug McMahon	In this paper, we analyze the way in which requirements differ between management of relational data and management of JSON data.
132	Querying encrypted data	Arvind Arasu, Ken Eguro, Raghav Kaushik, Ravishankar Ramamurthy	We cover approaches based on both classic client-server and involving the use of a trusted hardware module where data can be securely decrypted.
133	Towards unified ad-hoc data processing	Xiaogang Shi, Bin Cui, Gillian Dobbie, Beng Chin Ooi	In this paper, we present UniAD, a system designed to simplify the programming of data processing tasks and provide efficient execution for user programs.
134	Partial results in database systems	Willis Lang, Rimma V. Nehme, Eric Robinson, Jeffrey F. Naughton	We explore ways to characterize and classify these partial results, and describe an analytical framework that allows the system to perform coarse to fine-grained analysis to determine the semantics of a partial result.
135	Parallel in-situ data processing with speculative loading	Yu Cheng, Florin Rusu	In this paper, we propose SCANRAW, a novel database physical operator for in-situ processing over raw files that integrates data loading and external tables seamlessly while preserving their advantages: optimal performance across a query workload and zero time-to-query.
136	Approximation schemes for many-objective query optimization	Immanuel Trummer, Christoph Koch	This is why we propose several approximation schemes for MOQO that generate guaranteed near-optimal plans in seconds where exhaustive optimization takes hours.
137	Querying k-truss community in large and dynamic graphs	Xin Huang, Hong Cheng, Lu Qin, Wentao Tian, Jeffrey Xu Yu	We propose a novel community model based on the k-truss concept, which brings nice structural and computational properties.
138	Reachability queries on large dynamic graphs: a total order approach	Andy Diwen Zhu, Wenqing Lin, Sibo Wang, Xiaokui Xiao	To address this deficiency, this paper presents a novel study on reachability indices for large dynamic graphs.
139	EAGr: supporting continuous ego-centric aggregate queries over large dynamic graphs	Jayanta Mondal, Amol Deshpande	In this paper, we present EAGr, a system for supporting large numbers of continuous neighborhood-based ("ego-centric") aggregate queries over large, highly dynamic, rapidly evolving graphs.
140	Localizing anomalous changes in time-evolving graphs	Kumar Sricharan, Kamalika Das	In this paper, we use the term ‘localization’ to refer to the problem of identifying abnormal changes in node relationships (edges) that cause anomalous changes in graph structure.
141	Online optimization and fair costing for dynamic data sharing in a cloud data market	Ziyang Liu, Hakan Hacigümüs	In this paper, we study a data market framework that enables the sale or sharing of dynamic data, where each sharing is specified by an ad-hoc query. We propose an intuitive online algorithm for sharing plan selection, as well as a set of fair costing criteria and an algorithm that maximizes the fairness.
142	A comparison of platforms for implementing and running very large scale machine learning algorithms	Zhuhua Cai, Zekai J. Gao, Shangyu Luo, Luis L. Perez, Zografoula Vagena, Christopher Jermaine	We describe an extensive benchmark of platforms available to a user who wants to run a machine learning (ML) inference algorithm over a very large data set, but cannot find an existing implementation and thus must "roll her own" ML code.
143	Re-evaluating designs for multi-tenant OLTP workloads on SSD-based I/O subsystems	Ning Zhang, Junichi Tatemura, Jignesh Patel, Hakan Hacigumus	In this paper, we compare three designs using both open-source and proprietary DBMSs on SSD-based I/O subsystems.
144	Secure query processing with data interoperability in a cloud database environment	Wai Kit Wong, Ben Kao, David Wai Lok Cheung, Rongbin Li, Siu Ming Yiu	We propose and analyze a secure query processing system (SDB) on relational tables and a set of elementary operators on encrypted data that allow data interoperability, which allows a wide range of SQL queries to be processed by the SP on encrypted information.
145	Are we experiencing a big data bubble?	Fatma Özcan, Nesime Tatbul, Daniel J. Abadi, Marcel Kornacker, C. Mohan, Karthik Ramasamy, Janet Wiener	Are we experiencing a big data bubble?
146	Mining latent entity structures from massive unstructured and interconnected data	Jiawei Han, Chi Wang	In this tutorial, we summarize the closely related literature in database systems, data mining, Web, information extraction, information retrieval, and natural language processing, overview a spectrum of data-driven methods that extract and infer such latent structures, from an interdisciplinary point of view, and demonstrate how these structures support entity discovery and management, data understanding, and some new database applications.
147	Explainable security for relational databases	Gabriel Bender, Lucja Kot, Johannes Gehrke	To encourage developers and administrators to use security mechanisms more effectively, we propose a novel security model in which all security decisions are formally explainable.
148	PrivBayes: private data release via bayesian networks	Jun Zhang, Graham Cormode, Cecilia M. Procopiuc, Divesh Srivastava, Xiaokui Xiao	To address the deficiency of the existing methods, this paper presents PrivBayes, a differentially private method for releasing high-dimensional data.
149	PriView: practical differentially private release of marginal contingency tables	Wahbeh Qardaji, Weining Yang, Ninghui Li	We consider the problem of publishing a differentially private synopsis of a d-dimensional dataset so that one can reconstruct any k-way marginal contingency tables from the synopsis.
150	Blowfish privacy: tuning privacy-utility trade-offs using policies	Xi He, Ashwin Machanavajjhala, Bolin Ding	In this paper, we present Blowfish, a class of privacy definitions inspired by the Pufferfish framework, that provides a rich interface for this trade-off.
151	Overlap interval partition join	Anton Dignös, Michael H. Böhlen, Johann Gamper	We propose Overlap Interval Partitioning (OIP), a new partitioning approach for data with an interval.
152	Similarity joins for uncertain strings	Manish Patil, Rahul Shah	We propose various filtering techniques that give upper and (or) lower bound on Pr(ed(R,S) ≤ k) without instantiating possible worlds for either of the strings.
153	Track join: distributed joins with minimal network traffic	Orestis Polychroniou, Rajkumar Sen, Kenneth A. Ross	We introduce track join, a novel distributed join algorithm that minimizes network traffic by generating an optimal transfer schedule for each distinct join key.
154	On-the-fly token similarity joins in relational databases	Nikolaus Augsten, Armando Miraglia, Thomas Neumann, Alfons Kemper	Our goal is to efficiently compute token similarity joins on-the-fly, i.e., without any precomputed tokens or indexes.
155	Tracking set correlations at large scale	Foteini Alvanaki, Sebastian Michel	In this work, we consider the continuous computation of correlations between co-occurring tags that appear in messages published in social media streams.
156	Aggregate estimation over a microblog platform	Saravanan Thirumuruganathan, Nan Zhang, Vagelis Hristidis, Gautam Das	In this paper, we consider a novel problem of estimating aggregate queries over microblogs, e.g., "how many users mentioned the word ‘privacy’ in 2013?"
157	Tripartite graph clustering for dynamic sentiment analysis on social media	Linhong Zhu, Aram Galstyan, James Cheng, Kristina Lerman	In this work, we propose an unsupervised tri-clustering framework, which analyzes both user-level and tweet-level sentiments through co-clustering of a tripartite graph.
158	A temporal context-aware model for user behavior modeling in social media systems	Hongzhi Yin, Bin Cui, Ling Chen, Zhiting Hu, Zi Huang	This paper focuses on analyzing user behaviors in social media systems and designing a latent class statistical mixture model, named temporal context-aware mixture model (TCAM), to account for the intentions and preferences behind user behaviors.
159	Indexing for interactive exploration of big data series	Kostas Zoumpatianos, Stratos Idreos, Themis Palpanas	In this paper, we present the first adaptive indexing mechanism, specifically tailored to solve the problem of indexing and querying very large data series collections.
160	Histograms as a side effect of data movement for big data	Zsolt Istvan, Louis Woods, Gustavo Alonso	In this paper, we show how to calculate statistics as a side effect of data movement within a DBMS using a hardware accelerator in the data path.
161	A formal approach to finding explanations for database queries	Sudeepa Roy, Dan Suciu	In this paper we introduce a principled approach to provide explanations for answers to SQL queries based on intervention: removal of tuples from the database that significantly affect the query answers.
162	MISO: souping up big data query processing with a multistore system	Jeff LeFevre, Jagan Sankaranarayanan, Hakan Hacigumus, Junichi Tatemura, Neoklis Polyzotis, Michael J. Carey	In this work, we provide what we believe to be the first method to tune the physical design of a multistore system, by focusing on which store to place data.
163	Efficient top-K SimRank-based similarity join	Wenbo Tao, Guoliang Li	In this paper we study the problem of top-k SimRank-based similarity join, which finds k pairs of nodes with the largest SimRank values.
164	Multi-dimensional data statistics for columnar in-memory databases	Curtis Kroetsch	The research presented here studies the multi-dimensional data statistics in the context of columnar in-memory database systems.
165	A user interaction based community detection algorithm for online social networks	Himel Dev	To alleviate the limitations of existing approaches, we propose a novel solution of community detection in OSNs.
166	EDS: a segment-based distance measure for sub-trajectory similarity search	Min Xie	In this paper, we study a sub-trajectory similarity search problem which returns for a query trajectory some trajectories from the trajectory database each of which contains a sub-trajectory similar to the query trajectory.
167	Spatio-temporal visual analysis for event-specific tweets	Mashaal Musleh	In this poster, we present our on-going work on this module and discuss three of its use cases.
168	PackageBuilder: querying for packages of tuples	Kevin Fernandes, Matteo Brucato, Rahul Ramakrishna, Azza Abouzied, Alexandra Meliou	PackageBuilder introduces simple extensions to the SQL language to support package-level predicates, and includes a simple interface that allows users to load datasets and interactively specify package queries.
169	Privacy preserving social graphs for high precision community detection	Himel Dev	To resolve this issue, we address the problem of privacy preserving community detection in social networks.