Paper Digest: SIGMOD 2017 Highlights

June 16, 2017 | June 26, 2020 | admin

The ACM SIGMOD International Conference on Management of Data (SIGMOD), run by the ACM Special Interest Group on Management of Data, is one of the top conferences on database management systems and data management technology.

To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights to quickly get the main idea of each paper.

If you do not want to miss any interesting academic papers, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.

Paper Digest Team
team@paperdigest.org

TABLE 1: SIGMOD 2017 Papers

No.	Title	Authors	Highlight
1	The Next 700 Transaction Processing Engines	Anastasia Ailamaki	In this talk, we discuss the implications of these trends on the design of next-generation transaction processing engines.
2	What Are We Doing With Our Lives?: Nobody Cares About Our Concurrency Control Research	Andrew Pavlo	In this talk/denouncement, I will descend from my ivory tower and argue that we need to rethink our agenda for concurrency control research.
3	ACIDRain: Concurrency-Related Attacks on Database-Backed Web Applications	Todd Warszawski, Peter Bailis	In this paper, we formalize a new kind of attack on database-backed applications called an ACIDRain attack, in which an adversary systematically exploits concurrency-related vulnerabilities via programmatically accessible APIs.
4	Cicada: Dependably Fast Multi-Core In-Memory Transactions	Hyeontaek Lim, Michael Kaminsky, David G. Andersen	Cicada: Dependably Fast Multi-Core In-Memory Transactions
5	BatchDB: Efficient Isolated Execution of Hybrid OLTP+OLAP Workloads for Interactive Applications	Darko Makreshanski, Jana Giceva, Claude Barthels, Gustavo Alonso	In this paper we present BatchDB, an in-memory database engine designed for hybrid OLTP and OLAP workloads.
6	Azure Data Lake Store: A Hyperscale Distributed File Service for Big Data Analytics	Raghu Ramakrishnan, Baskar Sridharan, John R. Douceur, Pavan Kasturi, Balaji Krishnamachari-Sampath, Karthick Krishnamoorthy, Peng Li, Mitica Manu, Spiro Michaylov, Rogério Ramos, Neil Sharman, Zee Xu, Youssef Barakat, Chris Douglas, Richard Draves, Shrikant S. Naidu, Shankar Shastry, Atul Sikaria, Simon Sun, Ramarathnam Venkatesan	We present an overview of ADLS architecture, design points, and performance.
7	OctopusFS: A Distributed File System with Tiered Storage Management	Elena Kakoulli, Herodotos Herodotou	We present OctopusFS, a novel distributed file system that is aware of heterogeneous storage media (e.g., memory, SSDs, HDDs, NAS) with different capacities and performance characteristics.
8	Monkey: Optimal Navigable Key-Value Store	Niv Dayan, Manos Athanassoulis, Stratos Idreos	In this paper, we show that key-value stores backed by an LSM-tree exhibit an intrinsic trade-off between lookup cost, update cost, and main memory footprint, yet all existing designs expose a suboptimal and difficult to tune trade-off among these metrics.
9	Enabling Signal Processing over Data Streams	Milos Nikolic, Badrish Chandramouli, Jonathan Goldstein	In this paper, we advocate a deep integration of signal processing operations and general-purpose query processors.
10	Complete Event Trend Detection in High-Rate Event Streams	Olga Poppe, Chuan Lei, Salah Ahmed, Elke A. Rundensteiner	To overcome these limitations, we define the CET graph to compactly encode all CETs matched by a query.
11	LittleTable: A Time-Series Database and Its Uses	Sean Rhea, Eric Wang, Edmund Wong, Ethan Atkins, Nat Storer	We present LittleTable, a relational database that Cisco Meraki has used since 2008 to store usage statistics, event logs, and other time-series data from our customers’ devices.
12	Incremental View Maintenance over Array Data	Weijie Zhao, Florin Rusu, Bin Dong, Kesheng Wu, Peter Nugent	In this paper, we introduce materialized array views as a database construct for scientific data products.
13	Incremental Graph Computations: Doable and Undoable	Wenfei Fan, Chunming Hu, Chao Tian	In light of the negative results, we propose two characterizations for the effectiveness of incremental computation: (a) localizable, if its cost is decided by small neighbors of nodes in Δ G instead of the entire G; and (b) bounded relative to a batch algorithm T, if the cost is determined by the sizes of Δ G and changes to the affected area that is necessarily checked by T.
14	DEX: Query Execution in a Delta-based Storage System	Amit Chavan, Amol Deshpande	In this paper, we initiate a systematic study of this problem, and present DEX, a novel stand-alone delta-oriented execution engine, whose goal is to take advantage of the already computed deltas between the datasets for efficient query processing.
15	Massively Parallel Processing of Whole Genome Sequence Data: An In-Depth Performance Study	Abhishek Roy, Yanlei Diao, Uday Evani, Avinash Abhyankar, Clinton Howarth, Rémi Le Priol, Toby Bloom	The key goals of this study are to develop a thorough understanding of the strengths and limitations of big data technology for genomic data analysis, and to identify the key questions that the research community could address to realize the vision of personalized genomic medicine.
16	Distributed Provenance Compression	Chen Chen, Harshal Tushar Lehri, Lay Kuan Loh, Anupam Alur, Limin Jia, Boon Thau Loo, Wenchao Zhou	In this paper, we explore techniques to dynamically compress distributed provenance stored at scale.
17	ROBUS: Fair Cache Allocation for Data-parallel Workloads	Mayuresh Kunjir, Brandon Fain, Kamesh Munagala, Shivnath Babu	In this paper, we develop cache allocation strategies that speed up the overall workload while being fair to each tenant.
18	Transaction Repair for Multi-Version Concurrency Control	Mohammad Dashti, Sachin Basil John, Amir Shaikhha, Christoph Koch	In this paper, we propose a novel approach for conflict resolution in MVCC for in-memory databases.
19	Concerto: A High Concurrency Key-Value Store with Integrity	Arvind Arasu, Ken Eguro, Raghav Kaushik, Donald Kossmann, Pingfan Meng, Vineet Pandey, Ravi Ramamurthy	In this paper, we investigate the potential advantages of deferred and batched verification rather than the per-operation verification used in prior work.
20	Fast Failure Recovery for Main-Memory DBMSs on Multicores	Yingjun Wu, Wentian Guo, Chee-Yong Chan, Kian-Lee Tan	In this paper, we show that, by exploiting application semantics, it is possible to achieve speedy failure recovery without introducing any costly logging overhead to the execution of concurrent transactions.
21	Bringing Modular Concurrency Control to the Next Level	Chunzhi Su, Natacha Crooks, Cong Ding, Lorenzo Alvisi, Chao Xie	This paper presents Tebaldi, a distributed key-value store that explores new ways to harness the performance opportunity of combining different specialized concurrency control mechanisms (CCs) within the same database.
22	Wide Table Layout Optimization based on Column Ordering and Duplication	Haoqiong Bian, Ying Yan, Wenbo Tao, Liang Jeff Chen, Yueguo Chen, Xiaoyong Du, Thomas Moscibroda	In this paper, we aim to find such an optimal column layout to maximize I/O performance.
23	Query Centric Partitioning and Allocation for Partially Replicated Database Systems	Tilmann Rabl, Hans-Arno Jacobsen	To address this problem, we present an approach for efficient data allocation that features good scalability while keeping the data distribution transparent.
24	Spanner: Becoming a SQL System	David F. Bacon, Nathan Bales, Nico Bruno, Brian F. Cooper, Adam Dickinson, Andrew Fikes, Campbell Fraser, Andrey Gubarev, Milind Joshi, Eugene Kogan, Alexander Lloyd, Sergey Melnik, Rajesh Rao, David Shue, Christopher Taylor, Marcel van der Holst, Dale Woodford	We describe distributed query execution in the presence of resharding, query restarts upon transient failures, range extraction that drives query routing and index seeks, and the improved blockwise-columnar storage format.
25	Landmark Indexing for Evaluation of Label-Constrained Reachability Queries	Lucien D.J. Valstar, George H.L. Fletcher, Yuichi Yoshida	In this paper we present the first practical solution for efficient LCR evaluation, leveraging landmark-based indexes for large graphs.
26	Efficient Ad-Hoc Graph Inference and Matching in Biological Databases	Xiang Lian, Dongchul Kim	Motivated by this, in this paper, we formalize the problem of ad-hoc inference and matching over gene regulatory networks (IM-GRN), which deciphers ad-hoc GRN graph structures online from gene feature databases (without full GRN materializations), and retrieves the inferred GRNs that are subgraph-isomorphic to a query GRN graph with high confidences.
27	DAG Reduction: Fast Answering Reachability Queries	Junfeng Zhou, Shijie Zhou, Jeffrey Xu Yu, Hao Wei, Ziyang Chen, Xian Tang	In this paper, we study DAG reduction to accelerate reachability query processing, which reduces the size of G by computing transitive reduction (TR) followed by computing equivalence reduction (ER).
28	Flexible and Feasible Support Measures for Mining Frequent Patterns in Large Labeled Graphs	Jinghan Meng, Yi-cheng Tu	In this paper, we propose a novel framework for constructing support measures that brings together existing minimum-image-based and overlap-graph-based support measures.
29	Accelerating Pattern Matching Queries in Hybrid CPU-FPGA Architectures	David Sidler, Zsolt István, Muhsen Owaida, Gustavo Alonso	Taking advantage of recently released hybrid multicore architectures, such as the Intel’s Xeon+FPGA machine, where the FPGA has coherent access to the main memory through the QPI bus, we explore the benefits of specializing operators to hardware.
30	A Memory Bandwidth-Efficient Hybrid Radix Sort on GPUs	Elias Stehle, Hans-Arno Jacobsen	Our work proposes a novel approach that almost halves the amount of memory transfers and, therefore, considerably lifts the memory bandwidth limitation.
31	FPGA-based Data Partitioning	Kaan Kara, Jana Giceva, Gustavo Alonso	In this paper we explore the use of an FPGA to accelerate data partitioning.
32	Template Skycube Algorithms for Heterogeneous Parallelism on Multicore and GPU Architectures	Kenneth S. Bøgh, Sean Chester, Darius Šidlauskas, Ira Assent	We define three parallel templates, two that leverage insights from previous skycube research and a third that exploits a novel point-based paradigm to expose more data parallelism.
33	Heterogeneity-aware Distributed Parameter Servers	Jiawei Jiang, Bin Cui, Ce Zhang, Lele Yu	We study distributed machine learning in heterogeneous environments in this work.
34	Distributed Algorithms on Exact Personalized PageRank	Tao Guo, Xin Cao, Gao Cong, Jiaheng Lu, Xuemin Lin	In this paper, we propose novel and efficient distributed algorithms that compute PPV exactly based on graph partitioning on a general coordinator-based share-nothing distributed computing platform.
35	Parallelizing Sequential Graph Computations	Wenfei Fan, Jingbo Xu, Yinghui Wu, Wenyuan Yu, Jiaxin Jiang, Zeyu Zheng, Bohan Zhang, Yang Cao, Chao Tian	This paper presents GRAPE, a parallel system for graph computations.
36	Approximate Query Processing: No Silver Bullet	Surajit Chaudhuri, Bolin Ding, Srikanth Kandula	In this paper, we reflect on the state of the art of Approximate Query Processing.
37	Approximate Query Engines: Commercial Challenges and Research Opportunities	Barzan Mozafari	Our goal in this talk is to suggest some of the exciting research directions in this field that are worth pursuing.
38	Approximate Query Processing for Interactive Data Science	Tim Kraska	In this talk, I will present some of our recent results from building a third-generation AQP system, called IDEA.
39	Controlling False Discoveries During Interactive Data Exploration	Zheguang Zhao, Lorenzo De Stefani, Emanuel Zgraggen, Carsten Binnig, Eli Upfal, Tim Kraska	In this work, we propose a solution to integrate the control of multiple hypothesis testing into interactive data exploration systems.
40	MacroBase: Prioritizing Attention in Fast Data	Peter Bailis, Edward Gan, Samuel Madden, Deepak Narayanan, Kexin Rong, Sahaana Suri	In response, we present MacroBase, a data analytics engine that prioritizes end-user attention in high-volume fast data streams.
41	Data Canopy: Accelerating Exploratory Statistical Analysis	Abdul Wasay, Xinding Wei, Niv Dayan, Stratos Idreos	We address this challenge in Data Canopy, where descriptive and dependence statistics are synthesized from a library of basic aggregates.
42	Beta Probabilistic Databases: A Scalable Approach to Belief Updating and Parameter Learning	Niccolò Meneghetti, Oliver Kennedy, Wolfgang Gatterbauer	We introduce Beta Probabilistic Databases (B-PDBs), a generalization of TI-PDBs designed to support both (i) belief updating and (ii) parameter learning in a principled and scalable way. We use this model to provide the following key contributions: (i) we show how to scalably compute the posterior densities of the parameters given new evidence; (ii) we study the complexity of performing Bayesian belief updates, devising efficient algorithms for tractable classes of queries; (iii) we propose a soft-EM algorithm for computing maximum-likelihood estimates of the parameters; (iv) we show how to embed the proposed algorithms into a standard relational engine; (v) we support our conclusions with extensive experimental results.
43	Database Learning: Toward a Database that Becomes Smarter Every Time	Yongjoo Park, Ahmad Shahab Tajik, Michael Cafarella, Barzan Mozafari	We exploit the principle of maximum entropy to produce answers, which are in expectation guaranteed to be more accurate than existing sample-based approximations.
44	Staging User Feedback toward Rapid Conflict Resolution in Data Fusion	Romila Pradhan, Siarhei Bykau, Sunil Prabhakar	In this paper, we propose to leverage user feedback for validating data conflicts and rapidly improving the performance of fusion.
45	Discovering Your Selling Points: Personalized Social Influential Tags Exploration	Yuchen Li, Ju Fan, Dongxiang Zhang, Kian-Lee Tan	In this paper, we study a new social influence problem, called personalized social tags exploration (PITEX), to help any user in the SN explore how she influences the network.
46	Coarsening Massive Influence Networks for Scalable Diffusion Analysis	Naoto Ohsaka, Tomohiro Sonobe, Sumio Fujita, Ken-ichi Kawarabayashi	In this paper, we propose a new algorithm for reducing influence graphs.
47	Debunking the Myths of Influence Maximization: An In-Depth Benchmarking Study	Akhil Arora, Sainyam Galhotra, Sayan Ranu	In this paper, we perform an in-depth benchmarking study of IM techniques on social networks.
48	Interactive Mapping Specification with Exemplar Tuples	Angela Bonifati, Ugo Comignani, Emmanuel Coquery, Romuald Thion	In this paper, we present an interactive framework for schema mapping specification suited for non-expert users.
49	Foofah: Transforming Data By Example	Zhongjun Jin, Michael R. Anderson, Michael Cafarella, H. V. Jagadish	In this paper, we develop a technique to synthesize data transformation programs by example, reducing this burden by allowing the analyst to describe the transformation with a small input-output example pair, without being concerned with the transformation steps required to get there.
50	QIRANA: A Framework for Scalable Query Pricing	Shaleen Deep, Paraschos Koutris	In this work, we present a novel pricing system, called QIRANA, that performs query-based data pricing for a large class of SQL queries (including aggregation) in real time.
51	Access Path Selection in Main-Memory Optimized Data Systems: Should I Scan or Should I Probe?	Michael S. Kester, Manos Athanassoulis, Stratos Idreos	In this paper, we compare modern sequential scans and secondary index scans.
52	Optimization of Disjunctive Predicates for Main Memory Column Stores	Fisnik Kastrati, Guido Moerkotte	In this work, we focus on the complex problem of optimizing disjunctive predicates by means of the bypass processing technique.
53	A Top-Down Approach to Achieving Performance Predictability in Database Systems	Jiamin Huang, Barzan Mozafari, Grant Schoenebeck, Thomas F. Wenisch	In this paper, we focus on understanding and mitigating the sources of performance unpredictability in today’s transactional databases.
54	Two-Level Sampling for Join Size Estimation	Yu Chen, Ke Yi	In this paper, we propose a new sampling algorithm for join size estimation, called two-level sampling, which combines the advantages of three previous sampling methods while making further improvements.
55	A General-Purpose Counting Filter: Making Every Bit Count	Prashant Pandey, Michael A. Bender, Rob Johnson, Rob Patro	This paper proposes a new general-purpose AMQ, the counting quotient filter (CQF).
56	BePI: Fast and Memory-Efficient Method for Billion-Scale Random Walk with Restart	Jinhong Jung, Namyong Park, Sael Lee, U Kang	In this paper, we propose BePI, a fast, memory-efficient, and scalable method for computing RWR on billion-scale graphs.
57	Determining the Impact Regions of Competing Options in Preference Space	Bo Tang, Kyriakos Mouratidis, Man Lung Yiu	In this paper we study the problem of determining in which regions of the preference space the weight vector should lie so that a given option (focal record) is among the top-k score-wise.
58	Efficient Computation of Regret-ratio Minimizing Set: A Compact Maxima Representative	Abolfazl Asudeh, Azade Nazi, Nan Zhang, Gautam Das	Finding the maxima of a database based on a user preference, especially when the ranking function is a linear combination of the attributes, has been the subject of recent research.
59	FEXIPRO: Fast and Exact Inner Product Retrieval in Recommender Systems	Hui Li, Tsz Nam Chan, Man Lung Yiu, Nikos Mamoulis	Matrix Factorization (MF) is one of the most popular recommendation approaches; the original user-product rating matrix R with millions of rows and columns is decomposed into a user matrix Q and an item matrix P, such that the product Q^T P approximates R. Each column q (p) of Q (P) holds the latent factors of the corresponding user (item), and q^T p is a prediction of the rating to item p by user q. Recommender systems based on MF suggest to a user in q the items with the top-k scores in q^T P. For this problem, we propose a Fast and EXact Inner PROduct retrieval (FEXIPRO) framework, based on sequential scan, which includes three elements.
60	Feedback-Aware Social Event-Participant Arrangement	Jieying She, Yongxin Tong, Lei Chen, Tianshu Song	In this work, we study a new event-participant arrangement strategy for online scenarios, the Feedback-Aware Social Event-participant Arrangement (FASEA) problem, where satisfaction scores of an arrangement are learned adaptively and users can choose to accept or reject the arranged events.
61	Exploiting Common Patterns for Tree-Structured Data	Zhiyi Wang, Shimin Chen	In this paper, we aim to better understand tree-structured data types in real uses and optimize for the common patterns.
62	Extracting and Analyzing Hidden Graphs from Relational Databases	Konstantinos Xirogiannopoulos, Amol Deshpande	We present a general algorithm for creating such a condensed representation for a large class of graph extraction queries against arbitrary schemas.
63	TrillionG: A Trillion-scale Synthetic Graph Generator using a Recursive Vector Model	Himchan Park, Min-Soo Kim	Here, we propose an efficient and scalable disk-based graph generator, TrillionG that can generate massive graphs in a short time only using a small amount of memory.
64	Schema Independent Relational Learning	Jose Picado, Arash Termehchy, Alan Fern, Parisa Ataei	We propose Castor, a relational learning algorithm that achieves schema independence by leveraging data dependencies.
65	Scalable Kernel Density Classification via Threshold-Based Pruning	Edward Gan, Peter Bailis	In this paper, we introduce a simple technique for improving the performance of using a KDE to classify points by their density (density classification).
66	The BUDS Language for Distributed Bayesian Machine Learning	Zekai J. Gao, Shangyu Luo, Luis L. Perez, Chris Jermaine	We describe BUDS, a declarative language for succinctly and simply specifying the implementation of large-scale machine learning algorithms on a distributed computing platform.
67	A Cost-based Optimizer for Gradient Descent Optimization	Zoi Kaoudi, Jorge-Arnulfo Quiane-Ruiz, Saravanan Thirumuruganathan, Sanjay Chawla, Divy Agrawal	To build our optimizer, we introduce a set of abstract operators for expressing GD algorithms and propose a novel approach to estimate the number of iterations a GD algorithm requires to converge.
68	An Experimental Study of Bitmap Compression vs. Inverted List Compression	Jianguo Wang, Chunbin Lin, Yannis Papakonstantinou, Steven Swanson	To answer the question, we present the first comprehensive experimental study to compare a series of 9 bitmap compression methods and 12 inverted list compression methods.
69	Automatic Database Management System Tuning Through Large-scale Machine Learning	Dana Van Aken, Andrew Pavlo, Geoffrey J. Gordon, Bohan Zhang	To overcome these challenges, we present an automated approach that leverages past experience and collects new information to tune DBMS configurations: we use a combination of supervised and unsupervised machine learning methods to (1) select the most impactful knobs, (2) map unseen database workloads to previous workloads from which we can transfer experience, and (3) recommend knob settings.
70	Solving the Join Ordering Problem via Mixed Integer Linear Programming	Immanuel Trummer, Christoph Koch	We present a MILP formulation for searching left-deep query plans.
71	Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases	Alexandre Verbitski, Anurag Gupta, Debanjan Saha, Murali Brahmadesam, Kamal Gupta, Raman Mittal, Sailesh Krishnamurthy, Sandor Maurice, Tengiz Kharatishvili, Xiaofeng Bao	In this paper, we describe the architecture of Aurora and the design considerations leading to that architecture.
72	Fast Searchable Encryption With Tunable Locality	Ioannis Demertzis, Charalampos Papamanthou	In this work, we design, formally prove secure, and evaluate the first SE scheme with tunable locality and linear space.
73	Cryptanalysis of Comparable Encryption in SIGMOD’16	Caleb Horst, Ryo Kikuchi, Keita Xagawa	Comparable Encryption proposed by Furukawa (ESORICS 2013, CANS 2014) is a variant of order-preserving encryption (OPE) and order-revealing encryption (ORE); we cannot compare a ciphertext of v and another ciphertext of v’, but we can compare a ciphertext of v and a token of b and compare a token of b and another token of b’.
74	BLOCKBENCH: A Framework for Analyzing Private Blockchains	Tien Tuan Anh Dinh, Ji Wang, Gang Chen, Rui Liu, Beng Chin Ooi, Kian-Lee Tan	This paper concerns recent private blockchain systems designed with stronger security (trust) assumption and performance requirement.
75	Living in Parallel Realities: Co-Existing Schema Versions with a Bidirectional Database Evolution Language	Kai Herrmann, Hannes Voigt, Andreas Behrend, Jonas Rausch, Wolfgang Lehner	In this paper, we present InVerDa: developers use the simple bidirectional database evolution language BiDEL, which carries enough information to generate all delta code automatically.
76	Synthesizing Mapping Relationships Using Table Corpus	Yue Wang, Yeye He	Motivated by their broad applicability, we study the problem of synthesizing mapping relationships using a large table corpus.
77	Waldo: An Adaptive Human Interface for Crowd Entity Resolution	Vasilis Verroios, Hector Garcia-Molina, Yannis Papakonstantinou	We study a hybrid approach that combines two common interfaces for human tasks in Crowd Entity Resolution, taking into account key observations about the advantages and disadvantages of the two interfaces.
78	ZipG: A Memory-efficient Graph Store for Interactive Queries	Anurag Khandelwal, Zongheng Yang, Evan Ye, Rachit Agarwal, Ion Stoica	We present ZipG, a distributed memory-efficient graph store for serving interactive graph queries.
79	All-in-One: Graph Processing in RDBMSs Revisited	Kangfei Zhao, Jeffrey Xu Yu	In this paper, we focus on RDBMS, which has been well studied over decades to manage large datasets, and we revisit the issue of how RDBMS can support graph processing at the SQL level.
80	Computing A Near-Maximum Independent Set in Linear Time by Reducing-Peeling	Lijun Chang, Wei Li, Wenjie Zhang	Observing that the existing techniques have various limits, in this paper, we aim to develop efficient algorithms (with linear or near-linear time complexity) that can generate a high-quality (large-size) independent set from a graph in practice.
81	Utility-Aware Ridesharing on Road Networks	Peng Cheng, Hao Xin, Lei Chen	To assign a new rider to a given vehicle, we propose an efficient algorithm with a minimum increase in travel cost without reordering the existing schedule of the vehicle.
82	Distance Oracle on Terrain Surface	Victor Junqiu Wei, Raymond Chi-Wing Wong, Cheng Long, David M. Mount	In this paper, we study the shortest distance query which is to find the shortest distance between a point-of-interest and another point-of-interest on the surface of the terrain due to a variety of applications.
83	Efficient Computation of Top-k Frequent Terms over Spatio-temporal Ranges	Pritom Ahmed, Mahbub Hasan, Abhijith Kashyap, Vagelis Hristidis, Vassilis J. Tsotras	In this paper we study a basic analytics query on geotagged data, namely: given a spatiotemporal region, find the most frequent terms among the social posts in that region.
84	Optimizing Iceberg Queries with Complex Joins	Brett Walenz, Sudeepa Roy, Jun Yang	This paper proposes a framework for combining a number of techniques—a-priori, memoization, and pruning—to optimize iceberg queries with complex joins.
85	The Dynamic Yannakakis Algorithm: Compact and Efficient Query Processing Under Updates	Muhammad Idris, Martin Ugarte, Stijn Vansummeren	In this paper, we show that the full materialization of results is a barrier for more general optimization strategies.
86	Revisiting Reuse in Main Memory Database Systems	Kayhan Dursun, Carsten Binnig, Ugur Cetintemel, Tim Kraska	We focus on hash tables, the most commonly used internal data structure in main memory databases to perform join and aggregation operations.
87	Pufferfish Privacy Mechanisms for Correlated Data	Shuang Song, Yizhen Wang, Kamalika Chaudhuri	Since this mechanism may be computationally inefficient, we provide an additional mechanism that applies to some practical cases such as physical activity measurements across time, and is computationally efficient.
88	Bolt-on Differential Privacy for Scalable Stochastic Gradient Descent-based Analytics	Xi Wu, Fengan Li, Arun Kumar, Kamalika Chaudhuri, Somesh Jha, Jeffrey Naughton	We address this challenge by providing a novel analysis of the L₂-sensitivity of SGD, which allows, under the same privacy guarantees, better convergence of SGD when only a constant number of passes can be made over the data.
89	Pythia: Data Dependent Differentially Private Algorithm Selection	Ios Kotsogiannis, Ashwin Machanavajjhala, Michael Hay, Gerome Miklau	We address this challenge by proposing a novel meta-algorithm designed to relieve the data curator of the burden of algorithm selection.
90	Utility Cost of Formal Privacy for Releasing National Employer-Employee Statistics	Samuel Haney, Ashwin Machanavajjhala, John M. Abowd, Matthew Graham, Mark Kutzbach, Lars Vilhuber	In this work, we present novel algorithms for releasing tabular summaries of linked ER-EE data with formal, provable guarantees of privacy.
91	Online Deduplication for Databases	Lianghong Xu, Andrew Pavlo, Sudipta Sengupta, Gregory R. Ganger	dbDedup’s single-pass encoding method can be integrated into the storage and logging components of a DBMS to provide two benefits: (1) reduced size of data stored on disk beyond what traditional compression schemes provide, and (2) reduced amount of data transmitted over the network for replication services.
92	QFix: Diagnosing Errors through Query Histories	Xiaolan Wang, Alexandra Meliou, Eugene Wu	In this paper, we propose QFix, a framework that derives explanations and repairs for discrepancies in relational data, by analyzing the effect of queries that operated on the data and identifying potential mistakes in those queries.
93	UGuide: User-Guided Discovery of FD-Detectable Errors	Saravanan Thirumuruganathan, Laure Berti-Equille, Mourad Ouzzani, Jorge-Arnulfo Quiane-Ruiz, Nan Tang	In this paper, we propose an end-to-end solution to detect FD-detectable errors from dirty data.
94	SLiMFast: Guaranteed Results for Data Fusion and Source Reliability	Theodoros Rekatsinas, Manas Joglekar, Hector Garcia-Molina, Aditya Parameswaran, Christopher Ré	We propose SLiMFast, a framework that expresses data fusion as a statistical learning problem over discriminative probabilistic models, which in many cases correspond to logistic regression.
95	Crowdsourced Top-k Queries by Confidence-Aware Pairwise Judgments	Ngai Meng Kou, Yan Li, Hao Wang, Leong Hou U., Zhiguo Gong	In this work, we attempt to revisit the crowdsourced processing of the top-k queries, aiming at (1) securing the quality of crowdsourced comparisons by a certain confidence level and (2) minimizing the total monetary cost.
96	Falcon: Scaling Up Hands-Off Crowdsourced Entity Matching to Build Cloud Services	Sanjib Das, Paul Suganthan G.C., AnHai Doan, Jeffrey F. Naughton, Ganesh Krishnan, Rohit Deep, Esteban Arcaute, Vijay Raghavendra, Youngchoon Park	We propose Falcon, a solution that scales up the hands-off crowdsourced EM approach of Corleone, using RDBMS-style query execution and optimization over a Hadoop cluster.
97	CrowdDQS: Dynamic Question Selection in Crowdsourcing Systems	Asif R. Khan, Hector Garcia-Molina	In this paper, we present CrowdDQS, a system that uses the most recent set of crowdsourced voting evidence to dynamically issue questions to workers on Amazon Mechanical Turk (AMT).
98	CDB: Optimizing Queries with Crowd-Based Selections and Joins	Guoliang Li, Chengliang Chai, Ju Fan, Xueping Weng, Jian Li, Yudian Zheng, Yuanbing Li, Xiang Yu, Xiaohang Zhang, Haitao Yuan	To address the limitations, we develop a crowd-powered database system CDB that supports crowd-based query optimizations, with focus on join and selection. We have also created a benchmark for evaluating crowd-powered databases.
99	Scaling Locally Linear Embedding	Yasuhiro Fujiwara, Naoki Marumo, Mathieu Blondel, Koh Takeuchi, Hideaki Kim, Tomoharu Iwata, Naonori Ueda	Our approach, Ripple, is based on two ideas: (1) it incrementally updates the edge weights by exploiting the Woodbury formula and (2) it efficiently computes eigenvectors of the LLE kernel by exploiting the LU decomposition-based inverse power method.
100	Dynamic Density Based Clustering	Junhao Gan, Yufei Tao	Motivated by the above, we investigate the algorithmic principles for dynamic clustering by DBSCAN, a successful representative of density-based clustering, and ρ-approximate DBSCAN, proposed to bring down the computational hardness of the former on static data.
101	Extracting Top-K Insights from Multi-dimensional Data	Bo Tang, Shi Han, Man Lung Yiu, Rui Ding, Dongmei Zhang	We propose a meaningful scoring function for insights to address (i).
102	QUILTS: Multidimensional Data Partitioning Framework Based on Query-Aware and Skew-Tolerant Space-Filling Curves	Shoji Nishimura, Haruo Yokota	We propose a framework that involves a multidimensional indexing technique based on a space-filling curve.
103	Leveraging Re-costing for Online Optimization of Parameterized Queries with Guarantees	Anshuman Dutt, Vivek Narasayya, Surajit Chaudhuri	We propose a plan re-costing based approach that enables us to perform well on all three metrics.
104	Handling Environments in a Nested Relational Algebra with Combinators and an Implementation in a Verified Query Compiler	Joshua S. Auerbach, Martin Hirzel, Louis Mandel, Avraham Shinnar, Jérôme Siméon	This paper proposes NRA^e, an extension of a combinators-based nested relational algebra (NRA) with built-in support for environments.
105	From In-Place Updates to In-Place Appends: Revisiting Out-of-Place Updates on Flash	Sergey Hardock, Ilia Petrov, Robert Gottstein, Alejandro Buchmann	In this paper we propose an approach that transforms those small in-place updates into small update deltas that are appended to the original page.
106	Visual Graph Query Construction and Refinement	Robert Pienta, Fred Hohman, Acar Tamersoy, Alex Endert, Shamkant Navathe, Hanghang Tong, Duen Horng Chau	We will present the first demonstration of VISAGE, an interactive visual graph querying approach that empowers analysts to construct expressive queries, without writing complex code (see our video: https://youtu.be/l2L7Y5mCh1s).
107	Demonstration of the Cosette Automated SQL Prover	Shumo Chu, Daniel Li, Chenglong Wang, Alvin Cheung, Dan Suciu	Demonstration of the Cosette Automated SQL Prover
108	Interactive Time Series Analytics Powered by ONEX	Rodica Neamtu, Ramoza Ahsan, Charles Lovering, Cuong Nguyen, Elke Rundensteiner, Gabor Sarkozy	The ONEX (Online Exploration of Time Series) system supports effective exploratory analysis of time series collections composed of heterogeneous, variable-length and misaligned time series using robust alignment dynamic time warping (DTW) methods.
109	The VADA Architecture for Cost-Effective Data Wrangling	Nikolaos Konstantinou, Martin Koehler, Edward Abel, Cristina Civili, Bernd Neumayr, Emanuel Sallinger, Alvaro A.A. Fernandes, Georg Gottlob, John A. Keane, Leonid Libkin, Norman W. Paton	In this paper, we present an architecture that supports a complete data wrangling lifecycle, orchestrates components dynamically, builds on automation wherever possible, is informed by whatever data is available, refines automatically produced results in the light of feedback, takes into account the user’s priorities, and supports data scientists with diverse skill sets.
110	A Demonstration of Lusail: Querying Linked Data at Scale	Essam Mansour, Ibrahim Abdelaziz, Mourad Ouzzani, Ashraf Aboulnaga, Panos Kalnis	We will demonstrate Lusail, a system that supports the need of emerging applications to access tens to hundreds of geo-distributed datasets.
111	Foofah: A Programming-By-Example System for Synthesizing Data Transformation Programs	Zhongjun Jin, Michael R. Anderson, Michael Cafarella, H. V. Jagadish	We built a system called FOOFAH for helping the user easily synthesize a desired data transformation program.
112	Virtualized Network Service Topology Exploration Using Nepal	Pramod Jamkhedkar, Theodore Johnson, Yaron Kanza, Aman Shaikh, N.K. Shankarnarayanan, Vladislav Shkapenyuk, Gordon Woodhull	In this demonstration we present Nepal — a network path query language which is designed to effectively retrieve desired paths from a network graph.
113	VisualCloud Demonstration: A DBMS for Virtual Reality	Brandon Haynes, Artem Minyaylov, Magdalena Balazinska, Luis Ceze, Alvin Cheung	We demonstrate VisualCloud, a database management system designed to efficiently ingest, store, and deliver virtual reality (VR) content at scale.
114	The Best of Both Worlds: Big Data Programming with Both Productivity and Performance	Fan Yang, Yuzhen Huang, Yunjian Zhao, Jinfeng Li, Guanxian Jiang, James Cheng	In our prior work [7], we proposed Husky, which provides a highly expressive API to solve the above dilemma.
115	In-Browser Interactive SQL Analytics with Afterburner	Kareem El Gebaly, Jimmy Lin	On the TPC-H benchmark, we show that Afterburner achieves comparable performance to MonetDB running natively on the same machine.
116	Debugging Big Data Analytics in Spark with BigDebug	Muhammad Ali Gulzar, Matteo Interlandi, Tyson Condie, Miryung Kim	Debugging Big Data Analytics in Spark with BigDebug
117	Interactive Query Synthesis from Input-Output Examples	Chenglong Wang, Alvin Cheung, Rastislav Bodik	Interactive Query Synthesis from Input-Output Examples
118	Generating Concise Entity Matching Rules	Rohit Singh, Vamsi Meduri, Ahmed Elmagarmid, Samuel Madden, Paolo Papotti, Jorge-Arnulfo Quiané-Ruiz, Armando Solar-Lezama, Nan Tang	We model EM rules in the form of General Boolean Formulas (GBFs) that allow arbitrary attribute matching combined by conjunctions (∧), disjunctions (∨), and negations.
119	A Demo of the Data Civilizer System	Raul Castro Fernandez, Dong Deng, Essam Mansour, Abdulhakim A. Qahtan, Wenbo Tao, Ziawasch Abedjan, Ahmed Elmagarmid, Ihab F. Ilyas, Samuel Madden, Mourad Ouzzani, Michael Stonebraker, Nan Tang	We propose to demonstrate DATA CIVILIZER to ease the pain faced in analyzing data "in the wild".
120	Querying and Exploring Polygamous Relationships in Urban Spatio-Temporal Data Sets	Yeuk-Yin Chan, Fernando Chirigati, Harish Doraiswamy, Cláudio T. Silva, Juliana Freire	In this demo, we show how visualization can help in the discovery of relationships that are potentially interesting by allowing users to query and explore the relationship set in an intuitive way.
121	Graph Data Mining with Arabesque	Eslam Hussein, Abdurrahman Ghanem, Vinicius Vitor dos Santos Dias, Carlos H.C. Teixeira, Ghadeer AbuOda, Marco Serafini, Georgos Siganos, Gianmarco De Francisci Morales, Ashraf Aboulnaga, Mohammed Zaki	These problems differ from other graph processing problems such as PageRank or shortest path in that graph data mining requires searching through an exponential number of subgraphs.
122	Alpine: Efficient In-Situ Data Exploration in the Presence of Updates	Antonios Anagnostou, Matthaios Olma, Anastasia Ailamaki	We present Alpine, our prototype implementation, which combines the tuner with a query executor incorporating in situ query techniques to provide efficient raw data access.
123	OrpheusDB: A Lightweight Approach to Relational Dataset Versioning	Liqi Xu, Silu Huang, Sili Hui, Aaron J. Elmore, Aditya Parameswaran	We demonstrate OrpheusDB, a lightweight approach to versioning of relational datasets.
124	doppioDB: A Hardware Accelerated Database	David Sidler, Zsolt Istvan, Muhsen Owaida, Kaan Kara, Gustavo Alonso	We present doppioDB which consists of MonetDB, a main-memory column store, extended with Hardware User Defined Functions (HUDFs).
125	DBridge: Translating Imperative Code to SQL	K. Venkatesh Emani, Tejas Deshpande, Karthik Ramachandra, S. Sudarshan	We show the performance gains achieved by employing our system on real world applications that use JDBC or Hibernate.
126	BEAS: Bounded Evaluation of SQL Queries	Yang Cao, Wenfei Fan, Yanghao Wang, Tengfei Yuan, Yanchao Li, Laura Yu Chen	We demonstrate BEAS, a prototype system for querying relations with bounded resources.
127	Safe Visual Data Exploration	Zheguang Zhao, Emanuel Zgraggen, Lorenzo De Stefani, Carsten Binnig, Eli Upfal, Tim Kraska	Thus without proper statistical control, the risk of false discovery renders visual data exploration unsafe and makes users susceptible to questionable inference. To address these problems, we present QUDE, a visual data exploration system that interacts with users to formulate hypotheses based on visualizations and provides interactive control of false discoveries.
128	Optimizing Data-Intensive Applications Automatically By Leveraging Parallel Data Processing Frameworks	Maaz Bin Safeer Ahmad, Alvin Cheung	In our interactive presentation, we will use CASPER to optimize sequential implementations of data visualization programs as well as image processing kernels.
129	DIAS: Differentially Private Interactive Algorithm Selection using Pythia	Ios Kotsogiannis, Michael Hay, Ashwin Machanavajjhala, Gerome Miklau, Margaret Orr	In this demonstration we present DIAS (Differentially-private Interactive Algorithm Selection), an educational privacy game.
130	Snorkel: Fast Training Set Generation for Information Extraction	Alexander J. Ratner, Stephen H. Bach, Henry R. Ehrenberg, Chris Ré	State-of-the-art machine learning methods such as deep learning rely on large sets of hand-labeled training data.
131	Synthesizing Extraction Rules from User Examples with SEER	Maeda F. Hanafi, Azza Abouzied, Laura Chiticariu, Yunyao Li	SEER’s design principles and learning algorithm are motivated by how rule developers naturally construct data extraction rules.
132	Scout: A GPU-Aware System for Interactive Spatio-temporal Data Visualization	Harshada Chavan, Mohamed F. Mokbel	We use real data sets to demonstrate scalability and important features of Scout.
133	Graphflow: An Active Graph Database	Chathura Kankanamge, Siddhartha Sahu, Amine Mhedbhi, Jeremy Chen, Semih Salihoglu	At the core of Graphflow’s query processor are two worst-case optimal join algorithms called Generic Join and our new Delta Generic Join algorithm for one-time and continuous subgraph queries, respectively.
134	Demonstration: MacroBase, A Fast Data Analysis Engine	Peter Bailis, Edward Gan, Kexin Rong, Sahaana Suri	To address this gap, we have developed MacroBase, a fast data analytics engine that acts as a search engine over fast data streams.
135	Q*cert: A Platform for Implementing and Verifying Query Compilers	Joshua S. Auerbach, Martin Hirzel, Louis Mandel, Avraham Shinnar, Jérôme Siméon	We present Q*cert, a platform for the specification, verification, and implementation of query compilers written using the Coq proof assistant.
136	A Demonstration of Interactive Analysis of Performance Measurements with Viska	Helga Gudmundsdottir, Babak Salimi, Magdalena Balazinska, Dan R.K. Ports, Dan Suciu	We make this goal easier to achieve with Viska, a new tool for generating and interpreting performance measurement results.
137	Crowdsourced Data Management: Overview and Challenges	Guoliang Li, Yudian Zheng, Ju Fan, Jiannan Wang, Reynold Cheng	In this tutorial, we will survey and synthesize a wide spectrum of existing studies on crowdsourced data management. Finally, we discuss the emerging challenges.
138	Data Management in Machine Learning: Challenges, Techniques, and Systems	Arun Kumar, Matthias Boehm, Jun Yang	This tutorial provides a comprehensive review of such systems and analyzes key data management challenges and techniques.
139	Data Management Challenges in Production Machine Learning	Neoklis Polyzotis, Sudip Roy, Steven Euijong Whang, Martin Zinkevich	The goal of the tutorial is to bring forth these issues, draw connections to prior work in the database literature, and outline the open research questions that are not addressed by prior art.
140	Differential Privacy in the Wild: A Tutorial on Current Practices & Open Challenges	Ashwin Machanavajjhala, Xi He, Michael Hay	In the second half of the tutorial we will highlight real world applications on complex data types, and identify research challenges in applying differential privacy to real world applications.
141	Graph Querying Meets HCI: State of the Art and Future Directions	Sourav S. Bhowmick, Byron Choi, Chengkai Li	In this tutorial, we survey recent developments in the emerging visual graph querying paradigm, which bridges traditional graph querying with human-computer interaction (HCI).
142	Graph Exploration: From Users to Large Graphs	Davide Mottin, Emmanuel Müller	In this tutorial, we will discuss a set of techniques, which have been developed in the last few years for independent purposes, within a unified graph exploration taxonomy.
143	Building Structured Databases of Factual Knowledge from Massive Text Corpora	Xiang Ren, Meng Jiang, Jingbo Shang, Jiawei Han	In this tutorial, we introduce data-driven methods on mining structured facts (i.e., entities and their relations/attributes for types of interest) from massive text corpora, to construct structured databases of factual knowledge (called StructDBs).
144	Data Profiling: A Tutorial	Ziawasch Abedjan, Lukasz Golab, Felix Naumann	In this tutorial, we highlight the importance of data profiling as part of any data-related use-case, and we discuss the area of data profiling by classifying data profiling tasks and reviewing the state-of-the-art data profiling systems and techniques.
145	How to Build a Non-Volatile Memory Database Management System	Joy Arulraj, Andrew Pavlo	In this tutorial, we provide an outline on how to build a new DBMS given the changes to the hardware landscape due to NVM.
146	Data Structure Engineering For Byte-Addressable Non-Volatile Memory	Ismail Oukid, Wolfgang Lehner	In this tutorial we will dissect SCM challenges and provide an in-depth view of existing programming models that circumvent them, as well as novel data structures that stem from these models.
147	Natural Language Data Management and Interfaces: Recent Development and Open Challenges	Yunyao Li, Davood Rafiei	The tutorial presents state-of-the-art methods, related systems, research opportunities and challenges covering both areas.
148	Hybrid Transactional/Analytical Processing: A Survey	Fatma Özcan, Yuanyuan Tian, Pinar Tözün	The goal of this tutorial is to (1) quickly review the historical progression of OLTP and OLAP systems, (2) discuss the driving factors for HTAP, and finally (3) provide a deep technical analysis of existing and emerging HTAP solutions, detailing their key architectural differences and trade-offs.
149	Query Processing Techniques for Big Spatial-Keyword Data	Ahmed Mahmood, Walid G. Aref	We describe the main models for big spatial-keyword processing, and list the popular spatial-keyword queries.