Paper Digest: SIGMOD 2017 Highlights
The ACM SIGMOD Conference, organized by the ACM Special Interest Group on Management of Data (SIGMOD), is one of the top conferences on database management systems and data management technology.
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.
If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to get updates on new conference digests.
Paper Digest Team
team@paperdigest.org
TABLE 1: SIGMOD 2017 Papers
No. | Title | Authors | Highlight |
---|---|---|---|
1 | The Next 700 Transaction Processing Engines | Anastasia Ailamaki | In this talk, we discuss the implications of these trends on the design of next-generation transaction processing engines. |
2 | What Are We Doing With Our Lives?: Nobody Cares About Our Concurrency Control Research | Andrew Pavlo | In this talk/denouncement, I will descend from my ivory tower and argue that we need to rethink our agenda for concurrency control research. |
3 | ACIDRain: Concurrency-Related Attacks on Database-Backed Web Applications | Todd Warszawski, Peter Bailis | In this paper, we formalize a new kind of attack on database-backed applications called an ACIDRain attack, in which an adversary systematically exploits concurrency-related vulnerabilities via programmatically accessible APIs. |
4 | Cicada: Dependably Fast Multi-Core In-Memory Transactions | Hyeontaek Lim, Michael Kaminsky, David G. Andersen | Cicada: Dependably Fast Multi-Core In-Memory Transactions |
5 | BatchDB: Efficient Isolated Execution of Hybrid OLTP+OLAP Workloads for Interactive Applications | Darko Makreshanski, Jana Giceva, Claude Barthels, Gustavo Alonso | In this paper we present BatchDB, an in-memory database engine designed for hybrid OLTP and OLAP workloads. |
6 | Azure Data Lake Store: A Hyperscale Distributed File Service for Big Data Analytics | Raghu Ramakrishnan, Baskar Sridharan, John R. Douceur, Pavan Kasturi, Balaji Krishnamachari-Sampath, Karthick Krishnamoorthy, Peng Li, Mitica Manu, Spiro Michaylov, Rogério Ramos, Neil Sharman, Zee Xu, Youssef Barakat, Chris Douglas, Richard Draves, Shrikant S. Naidu, Shankar Shastry, Atul Sikaria, Simon Sun, Ramarathnam Venkatesan | We present an overview of ADLS architecture, design points, and performance. |
7 | OctopusFS: A Distributed File System with Tiered Storage Management | Elena Kakoulli, Herodotos Herodotou | We present OctopusFS, a novel distributed file system that is aware of heterogeneous storage media (e.g., memory, SSDs, HDDs, NAS) with different capacities and performance characteristics. |
8 | Monkey: Optimal Navigable Key-Value Store | Niv Dayan, Manos Athanassoulis, Stratos Idreos | In this paper, we show that key-value stores backed by an LSM-tree exhibit an intrinsic trade-off between lookup cost, update cost, and main memory footprint, yet all existing designs expose a suboptimal and difficult to tune trade-off among these metrics. |
9 | Enabling Signal Processing over Data Streams | Milos Nikolic, Badrish Chandramouli, Jonathan Goldstein | In this paper, we advocate a deep integration of signal processing operations and general-purpose query processors. |
10 | Complete Event Trend Detection in High-Rate Event Streams | Olga Poppe, Chuan Lei, Salah Ahmed, Elke A. Rundensteiner | To overcome these limitations, we define the CET graph to compactly encode all CETs matched by a query. |
11 | LittleTable: A Time-Series Database and Its Uses | Sean Rhea, Eric Wang, Edmund Wong, Ethan Atkins, Nat Storer | We present LittleTable, a relational database that Cisco Meraki has used since 2008 to store usage statistics, event logs, and other time-series data from our customers’ devices. |
12 | Incremental View Maintenance over Array Data | Weijie Zhao, Florin Rusu, Bin Dong, Kesheng Wu, Peter Nugent | In this paper, we introduce materialized array views as a database construct for scientific data products. |
13 | Incremental Graph Computations: Doable and Undoable | Wenfei Fan, Chunming Hu, Chao Tian | In light of the negative results, we propose two characterizations for the effectiveness of incremental computation: (a) localizable, if its cost is decided by small neighbors of nodes in ΔG instead of the entire G; and (b) bounded relative to a batch algorithm T, if the cost is determined by the sizes of ΔG and changes to the affected area that is necessarily checked by T. |
14 | DEX: Query Execution in a Delta-based Storage System | Amit Chavan, Amol Deshpande | In this paper, we initiate a systematic study of this problem, and present DEX, a novel stand-alone delta-oriented execution engine, whose goal is to take advantage of the already computed deltas between the datasets for efficient query processing. |
15 | Massively Parallel Processing of Whole Genome Sequence Data: An In-Depth Performance Study | Abhishek Roy, Yanlei Diao, Uday Evani, Avinash Abhyankar, Clinton Howarth, Rémi Le Priol, Toby Bloom | The key goals of this study are to develop a thorough understanding of the strengths and limitations of big data technology for genomic data analysis, and to identify the key questions that the research community could address to realize the vision of personalized genomic medicine. |
16 | Distributed Provenance Compression | Chen Chen, Harshal Tushar Lehri, Lay Kuan Loh, Anupam Alur, Limin Jia, Boon Thau Loo, Wenchao Zhou | In this paper, we explore techniques to dynamically compress distributed provenance stored at scale. |
17 | ROBUS: Fair Cache Allocation for Data-parallel Workloads | Mayuresh Kunjir, Brandon Fain, Kamesh Munagala, Shivnath Babu | In this paper, we develop cache allocation strategies that speed up the overall workload while being fair to each tenant. |
18 | Transaction Repair for Multi-Version Concurrency Control | Mohammad Dashti, Sachin Basil John, Amir Shaikhha, Christoph Koch | In this paper, we propose a novel approach for conflict resolution in MVCC for in-memory databases. |
19 | Concerto: A High Concurrency Key-Value Store with Integrity | Arvind Arasu, Ken Eguro, Raghav Kaushik, Donald Kossmann, Pingfan Meng, Vineet Pandey, Ravi Ramamurthy | In this paper, we investigate the potential advantages of deferred and batched verification rather than the per-operation verification used in prior work. |
20 | Fast Failure Recovery for Main-Memory DBMSs on Multicores | Yingjun Wu, Wentian Guo, Chee-Yong Chan, Kian-Lee Tan | In this paper, we show that, by exploiting application semantics, it is possible to achieve speedy failure recovery without introducing any costly logging overhead to the execution of concurrent transactions. |
21 | Bringing Modular Concurrency Control to the Next Level | Chunzhi Su, Natacha Crooks, Cong Ding, Lorenzo Alvisi, Chao Xie | This paper presents Tebaldi, a distributed key-value store that explores new ways to harness the performance opportunity of combining different specialized concurrency control mechanisms (CCs) within the same database. |
22 | Wide Table Layout Optimization based on Column Ordering and Duplication | Haoqiong Bian, Ying Yan, Wenbo Tao, Liang Jeff Chen, Yueguo Chen, Xiaoyong Du, Thomas Moscibroda | In this paper, we aim to find such an optimal column layout to maximize I/O performance. |
23 | Query Centric Partitioning and Allocation for Partially Replicated Database Systems | Tilmann Rabl, Hans-Arno Jacobsen | To address this problem, we present an approach for efficient data allocation that features good scalability while keeping the data distribution transparent. |
24 | Spanner: Becoming a SQL System | David F. Bacon, Nathan Bales, Nico Bruno, Brian F. Cooper, Adam Dickinson, Andrew Fikes, Campbell Fraser, Andrey Gubarev, Milind Joshi, Eugene Kogan, Alexander Lloyd, Sergey Melnik, Rajesh Rao, David Shue, Christopher Taylor, Marcel van der Holst, Dale Woodford | We describe distributed query execution in the presence of resharding, query restarts upon transient failures, range extraction that drives query routing and index seeks, and the improved blockwise-columnar storage format. |
25 | Landmark Indexing for Evaluation of Label-Constrained Reachability Queries | Lucien D.J. Valstar, George H.L. Fletcher, Yuichi Yoshida | In this paper we present the first practical solution for efficient LCR evaluation, leveraging landmark-based indexes for large graphs. |
26 | Efficient Ad-Hoc Graph Inference and Matching in Biological Databases | Xiang Lian, Dongchul Kim | Motivated by this, in this paper, we formalize the problem of ad-hoc inference and matching over gene regulatory networks (IM-GRN), which deciphers ad-hoc GRN graph structures online from gene feature databases (without full GRN materializations), and retrieves the inferred GRNs that are subgraph-isomorphic to a query GRN graph with high confidences. |
27 | DAG Reduction: Fast Answering Reachability Queries | Junfeng Zhou, Shijie Zhou, Jeffrey Xu Yu, Hao Wei, Ziyang Chen, Xian Tang | In this paper, we study DAG reduction to accelerate reachability query processing, which reduces the size of G by computing transitive reduction (TR) followed by computing equivalence reduction (ER). |
28 | Flexible and Feasible Support Measures for Mining Frequent Patterns in Large Labeled Graphs | Jinghan Meng, Yi-cheng Tu | In this paper, we propose a novel framework for constructing support measures that brings together existing minimum-image-based and overlap-graph-based support measures. |
29 | Accelerating Pattern Matching Queries in Hybrid CPU-FPGA Architectures | David Sidler, Zsolt István, Muhsen Owaida, Gustavo Alonso | Taking advantage of recently released hybrid multicore architectures, such as Intel’s Xeon+FPGA machine, where the FPGA has coherent access to the main memory through the QPI bus, we explore the benefits of specializing operators to hardware. |
30 | A Memory Bandwidth-Efficient Hybrid Radix Sort on GPUs | Elias Stehle, Hans-Arno Jacobsen | Our work proposes a novel approach that almost halves the amount of memory transfers and, therefore, considerably lifts the memory bandwidth limitation. |
31 | FPGA-based Data Partitioning | Kaan Kara, Jana Giceva, Gustavo Alonso | In this paper we explore the use of an FPGA to accelerate data partitioning. |
32 | Template Skycube Algorithms for Heterogeneous Parallelism on Multicore and GPU Architectures | Kenneth S. Bøgh, Sean Chester, Darius Šidlauskas, Ira Assent | We define three parallel templates, two that leverage insights from previous skycube research and a third that exploits a novel point-based paradigm to expose more data parallelism. |
33 | Heterogeneity-aware Distributed Parameter Servers | Jiawei Jiang, Bin Cui, Ce Zhang, Lele Yu | We study distributed machine learning in heterogeneous environments in this work. |
34 | Distributed Algorithms on Exact Personalized PageRank | Tao Guo, Xin Cao, Gao Cong, Jiaheng Lu, Xuemin Lin | In this paper, we propose novel and efficient distributed algorithms that compute PPV exactly based on graph partitioning on a general coordinator-based share-nothing distributed computing platform. |
35 | Parallelizing Sequential Graph Computations | Wenfei Fan, Jingbo Xu, Yinghui Wu, Wenyuan Yu, Jiaxin Jiang, Zeyu Zheng, Bohan Zhang, Yang Cao, Chao Tian | This paper presents GRAPE, a parallel system for graph computations. |
36 | Approximate Query Processing: No Silver Bullet | Surajit Chaudhuri, Bolin Ding, Srikanth Kandula | In this paper, we reflect on the state of the art of Approximate Query Processing. |
37 | Approximate Query Engines: Commercial Challenges and Research Opportunities | Barzan Mozafari | Our goal in this talk is to suggest some of the exciting research directions in this field that are worth pursuing. |
38 | Approximate Query Processing for Interactive Data Science | Tim Kraska | In this talk, I will present some of our recent results from building a third-generation AQP system, called IDEA. |
39 | Controlling False Discoveries During Interactive Data Exploration | Zheguang Zhao, Lorenzo De Stefani, Emanuel Zgraggen, Carsten Binnig, Eli Upfal, Tim Kraska | In this work, we propose a solution to integrate the control of multiple hypothesis testing into interactive data exploration systems. |
40 | MacroBase: Prioritizing Attention in Fast Data | Peter Bailis, Edward Gan, Samuel Madden, Deepak Narayanan, Kexin Rong, Sahaana Suri | In response, we present MacroBase, a data analytics engine that prioritizes end-user attention in high-volume fast data streams. |
41 | Data Canopy: Accelerating Exploratory Statistical Analysis | Abdul Wasay, Xinding Wei, Niv Dayan, Stratos Idreos | We address this challenge in Data Canopy, where descriptive and dependence statistics are synthesized from a library of basic aggregates. |
42 | Beta Probabilistic Databases: A Scalable Approach to Belief Updating and Parameter Learning | Niccolò Meneghetti, Oliver Kennedy, Wolfgang Gatterbauer | We introduce Beta Probabilistic Databases (B-PDBs), a generalization of TI-PDBs designed to support both (i) belief updating and (ii) parameter learning in a principled and scalable way. We use this model to provide the following key contributions: (i) we show how to scalably compute the posterior densities of the parameters given new evidence; (ii) we study the complexity of performing Bayesian belief updates, devising efficient algorithms for tractable classes of queries; (iii) we propose a soft-EM algorithm for computing maximum-likelihood estimates of the parameters; (iv) we show how to embed the proposed algorithms into a standard relational engine; (v) we support our conclusions with extensive experimental results. |
43 | Database Learning: Toward a Database that Becomes Smarter Every Time | Yongjoo Park, Ahmad Shahab Tajik, Michael Cafarella, Barzan Mozafari | We exploit the principle of maximum entropy to produce answers, which are in expectation guaranteed to be more accurate than existing sample-based approximations. |
44 | Staging User Feedback toward Rapid Conflict Resolution in Data Fusion | Romila Pradhan, Siarhei Bykau, Sunil Prabhakar | In this paper, we propose to leverage user feedback for validating data conflicts and rapidly improving the performance of fusion. |
45 | Discovering Your Selling Points: Personalized Social Influential Tags Exploration | Yuchen Li, Ju Fan, Dongxiang Zhang, Kian-Lee Tan | In this paper, we study a new social influence problem, called personalized social tags exploration (PITEX), to help any user in the SN explore how she influences the network. |
46 | Coarsening Massive Influence Networks for Scalable Diffusion Analysis | Naoto Ohsaka, Tomohiro Sonobe, Sumio Fujita, Ken-ichi Kawarabayashi | In this paper, we propose a new algorithm for reducing influence graphs. |
47 | Debunking the Myths of Influence Maximization: An In-Depth Benchmarking Study | Akhil Arora, Sainyam Galhotra, Sayan Ranu | In this paper, we perform an in-depth benchmarking study of IM techniques on social networks. |
48 | Interactive Mapping Specification with Exemplar Tuples | Angela Bonifati, Ugo Comignani, Emmanuel Coquery, Romuald Thion | In this paper, we present an interactive framework for schema mapping specification suited for non-expert users. |
49 | Foofah: Transforming Data By Example | Zhongjun Jin, Michael R. Anderson, Michael Cafarella, H. V. Jagadish | In this paper, we develop a technique to synthesize data transformation programs by example, reducing this burden by allowing the analyst to describe the transformation with a small input-output example pair, without being concerned with the transformation steps required to get there. |
50 | QIRANA: A Framework for Scalable Query Pricing | Shaleen Deep, Paraschos Koutris | In this work, we present a novel pricing system, called QIRANA, that performs query-based data pricing for a large class of SQL queries (including aggregation) in real time. |
51 | Access Path Selection in Main-Memory Optimized Data Systems: Should I Scan or Should I Probe? | Michael S. Kester, Manos Athanassoulis, Stratos Idreos | In this paper, we compare modern sequential scans and secondary index scans. |
52 | Optimization of Disjunctive Predicates for Main Memory Column Stores | Fisnik Kastrati, Guido Moerkotte | In this work, we focus on the complex problem of optimizing disjunctive predicates by means of the bypass processing technique. |
53 | A Top-Down Approach to Achieving Performance Predictability in Database Systems | Jiamin Huang, Barzan Mozafari, Grant Schoenebeck, Thomas F. Wenisch | In this paper, we focus on understanding and mitigating the sources of performance unpredictability in today’s transactional databases. |
54 | Two-Level Sampling for Join Size Estimation | Yu Chen, Ke Yi | In this paper, we propose a new sampling algorithm for join size estimation, called two-level sampling, which combines the advantages of three previous sampling methods while making further improvements. |
55 | A General-Purpose Counting Filter: Making Every Bit Count | Prashant Pandey, Michael A. Bender, Rob Johnson, Rob Patro | This paper proposes a new general-purpose AMQ, the counting quotient filter (CQF). |
56 | BePI: Fast and Memory-Efficient Method for Billion-Scale Random Walk with Restart | Jinhong Jung, Namyong Park, Sael Lee, U Kang | In this paper, we propose BePI, a fast, memory-efficient, and scalable method for computing RWR on billion-scale graphs. |
57 | Determining the Impact Regions of Competing Options in Preference Space | Bo Tang, Kyriakos Mouratidis, Man Lung Yiu | In this paper we study the problem of determining in which regions of the preference space the weight vector should lie so that a given option (focal record) is among the top-k score-wise. |
58 | Efficient Computation of Regret-ratio Minimizing Set: A Compact Maxima Representative | Abolfazl Asudeh, Azade Nazi, Nan Zhang, Gautam Das | Finding the maxima of a database based on a user preference, especially when the ranking function is a linear combination of the attributes, has been the subject of recent research. |
59 | FEXIPRO: Fast and Exact Inner Product Retrieval in Recommender Systems | Hui Li, Tsz Nam Chan, Man Lung Yiu, Nikos Mamoulis | Matrix Factorization (MF) is one of the most popular recommendation approaches; the original user-product rating matrix R with millions of rows and columns is decomposed into a user matrix Q and an item matrix P, such that the product Q^T P approximates R. Each column q (p) of Q (P) holds the latent factors of the corresponding user (item), and q^T p is a prediction of the rating to item p by user q. Recommender systems based on MF suggest to a user in q the items with the top-k scores in q^T P. For this problem, we propose a Fast and EXact Inner PROduct retrieval (FEXIPRO) framework, based on sequential scan, which includes three elements. |
60 | Feedback-Aware Social Event-Participant Arrangement | Jieying She, Yongxin Tong, Lei Chen, Tianshu Song | In this work, we study a new event-participant arrangement strategy for online scenarios, the Feedback-Aware Social Event-participant Arrangement (FASEA) problem, where satisfaction scores of an arrangement are learned adaptively and users can choose to accept or reject the arranged events. |
61 | Exploiting Common Patterns for Tree-Structured Data | Zhiyi Wang, Shimin Chen | In this paper, we aim to better understand tree-structured data types in real uses and optimize for the common patterns. |
62 | Extracting and Analyzing Hidden Graphs from Relational Databases | Konstantinos Xirogiannopoulos, Amol Deshpande | We present a general algorithm for creating such a condensed representation for a large class of graph extraction queries against arbitrary schemas. |
63 | TrillionG: A Trillion-scale Synthetic Graph Generator using a Recursive Vector Model | Himchan Park, Min-Soo Kim | Here, we propose an efficient and scalable disk-based graph generator, TrillionG, which can generate massive graphs in a short time using only a small amount of memory. |
64 | Schema Independent Relational Learning | Jose Picado, Arash Termehchy, Alan Fern, Parisa Ataei | We propose Castor, a relational learning algorithm that achieves schema independence by leveraging data dependencies. |
65 | Scalable Kernel Density Classification via Threshold-Based Pruning | Edward Gan, Peter Bailis | In this paper, we introduce a simple technique for improving the performance of using a KDE to classify points by their density (density classification). |
66 | The BUDS Language for Distributed Bayesian Machine Learning | Zekai J. Gao, Shangyu Luo, Luis L. Perez, Chris Jermaine | We describe BUDS, a declarative language for succinctly and simply specifying the implementation of large-scale machine learning algorithms on a distributed computing platform. |
67 | A Cost-based Optimizer for Gradient Descent Optimization | Zoi Kaoudi, Jorge-Arnulfo Quiane-Ruiz, Saravanan Thirumuruganathan, Sanjay Chawla, Divy Agrawal | To build our optimizer, we introduce a set of abstract operators for expressing GD algorithms and propose a novel approach to estimate the number of iterations a GD algorithm requires to converge. |
68 | An Experimental Study of Bitmap Compression vs. Inverted List Compression | Jianguo Wang, Chunbin Lin, Yannis Papakonstantinou, Steven Swanson | To answer the question, we present the first comprehensive experimental study to compare a series of 9 bitmap compression methods and 12 inverted list compression methods. |
69 | Automatic Database Management System Tuning Through Large-scale Machine Learning | Dana Van Aken, Andrew Pavlo, Geoffrey J. Gordon, Bohan Zhang | To overcome these challenges, we present an automated approach that leverages past experience and collects new information to tune DBMS configurations: we use a combination of supervised and unsupervised machine learning methods to (1) select the most impactful knobs, (2) map unseen database workloads to previous workloads from which we can transfer experience, and (3) recommend knob settings. |
70 | Solving the Join Ordering Problem via Mixed Integer Linear Programming | Immanuel Trummer, Christoph Koch | We present a MILP formulation for searching left-deep query plans. |
71 | Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases | Alexandre Verbitski, Anurag Gupta, Debanjan Saha, Murali Brahmadesam, Kamal Gupta, Raman Mittal, Sailesh Krishnamurthy, Sandor Maurice, Tengiz Kharatishvili, Xiaofeng Bao | In this paper, we describe the architecture of Aurora and the design considerations leading to that architecture. |
72 | Fast Searchable Encryption With Tunable Locality | Ioannis Demertzis, Charalampos Papamanthou | In this work, we design, formally prove secure, and evaluate the first SE scheme with tunable locality and linear space. |
73 | Cryptanalysis of Comparable Encryption in SIGMOD’16 | Caleb Horst, Ryo Kikuchi, Keita Xagawa | Comparable Encryption proposed by Furukawa (ESORICS 2013, CANS 2014) is a variant of order-preserving encryption (OPE) and order-revealing encryption (ORE); we cannot compare a ciphertext of v and another ciphertext of v’, but we can compare a ciphertext of v and a token of b and compare a token of b and another token of b’. |
74 | BLOCKBENCH: A Framework for Analyzing Private Blockchains | Tien Tuan Anh Dinh, Ji Wang, Gang Chen, Rui Liu, Beng Chin Ooi, Kian-Lee Tan | This paper concerns recent private blockchain systems designed with stronger security (trust) assumption and performance requirement. |
75 | Living in Parallel Realities: Co-Existing Schema Versions with a Bidirectional Database Evolution Language | Kai Herrmann, Hannes Voigt, Andreas Behrend, Jonas Rausch, Wolfgang Lehner | In this paper, we present InVerDa: developers use the simple bidirectional database evolution language BiDEL, which carries enough information to generate all delta code automatically. |
76 | Synthesizing Mapping Relationships Using Table Corpus | Yue Wang, Yeye He | Motivated by their broad applicability, we study the problem of synthesizing mapping relationships using a large table corpus. |
77 | Waldo: An Adaptive Human Interface for Crowd Entity Resolution | Vasilis Verroios, Hector Garcia-Molina, Yannis Papakonstantinou | We study a hybrid approach that combines two common interfaces for human tasks in Crowd Entity Resolution, taking into account key observations about the advantages and disadvantages of the two interfaces. |
78 | ZipG: A Memory-efficient Graph Store for Interactive Queries | Anurag Khandelwal, Zongheng Yang, Evan Ye, Rachit Agarwal, Ion Stoica | We present ZipG, a distributed memory-efficient graph store for serving interactive graph queries. |
79 | All-in-One: Graph Processing in RDBMSs Revisited | Kangfei Zhao, Jeffrey Xu Yu | In this paper, we focus on RDBMS, which has been well studied over decades to manage large datasets, and we revisit the issue of how RDBMS can support graph processing at the SQL level. |
80 | Computing A Near-Maximum Independent Set in Linear Time by Reducing-Peeling | Lijun Chang, Wei Li, Wenjie Zhang | Observing that the existing techniques have various limits, in this paper, we aim to develop efficient algorithms (with linear or near-linear time complexity) that can generate a high-quality (large-size) independent set from a graph in practice. |
81 | Utility-Aware Ridesharing on Road Networks | Peng Cheng, Hao Xin, Lei Chen | To assign a new rider to a given vehicle, we propose an efficient algorithm with a minimum increase in travel cost without reordering the existing schedule of the vehicle. |
82 | Distance Oracle on Terrain Surface | Victor Junqiu Wei, Raymond Chi-Wing Wong, Cheng Long, David M. Mount | In this paper, we study the shortest distance query which is to find the shortest distance between a point-of-interest and another point-of-interest on the surface of the terrain due to a variety of applications. |
83 | Efficient Computation of Top-k Frequent Terms over Spatio-temporal Ranges | Pritom Ahmed, Mahbub Hasan, Abhijith Kashyap, Vagelis Hristidis, Vassilis J. Tsotras | In this paper we study a basic analytics query on geotagged data, namely: given a spatiotemporal region, find the most frequent terms among the social posts in that region. |
84 | Optimizing Iceberg Queries with Complex Joins | Brett Walenz, Sudeepa Roy, Jun Yang | This paper proposes a framework for combining a number of techniques—a-priori, memoization, and pruning—to optimize iceberg queries with complex joins. |
85 | The Dynamic Yannakakis Algorithm: Compact and Efficient Query Processing Under Updates | Muhammad Idris, Martin Ugarte, Stijn Vansummeren | In this paper, we show that the full materialization of results is a barrier for more general optimization strategies. |
86 | Revisiting Reuse in Main Memory Database Systems | Kayhan Dursun, Carsten Binnig, Ugur Cetintemel, Tim Kraska | We focus on hash tables, the most commonly used internal data structure in main memory databases to perform join and aggregation operations. |
87 | Pufferfish Privacy Mechanisms for Correlated Data | Shuang Song, Yizhen Wang, Kamalika Chaudhuri | Since this mechanism may be computationally inefficient, we provide an additional mechanism that applies to some practical cases such as physical activity measurements across time, and is computationally efficient. |
88 | Bolt-on Differential Privacy for Scalable Stochastic Gradient Descent-based Analytics | Xi Wu, Fengan Li, Arun Kumar, Kamalika Chaudhuri, Somesh Jha, Jeffrey Naughton | We address this challenge by providing a novel analysis of the L2-sensitivity of SGD, which allows, under the same privacy guarantees, better convergence of SGD when only a constant number of passes can be made over the data. |
89 | Pythia: Data Dependent Differentially Private Algorithm Selection | Ios Kotsogiannis, Ashwin Machanavajjhala, Michael Hay, Gerome Miklau | We address this challenge by proposing a novel meta-algorithm designed to relieve the data curator of the burden of algorithm selection. |
90 | Utility Cost of Formal Privacy for Releasing National Employer-Employee Statistics | Samuel Haney, Ashwin Machanavajjhala, John M. Abowd, Matthew Graham, Mark Kutzbach, Lars Vilhuber | In this work, we present novel algorithms for releasing tabular summaries of linked ER-EE data with formal, provable guarantees of privacy. |
91 | Online Deduplication for Databases | Lianghong Xu, Andrew Pavlo, Sudipta Sengupta, Gregory R. Ganger | dbDedup’s single-pass encoding method can be integrated into the storage and logging components of a DBMS to provide two benefits: (1) reduced size of data stored on disk beyond what traditional compression schemes provide, and (2) reduced amount of data transmitted over the network for replication services. |
92 | QFix: Diagnosing Errors through Query Histories | Xiaolan Wang, Alexandra Meliou, Eugene Wu | In this paper, we propose QFix, a framework that derives explanations and repairs for discrepancies in relational data, by analyzing the effect of queries that operated on the data and identifying potential mistakes in those queries. |
93 | UGuide: User-Guided Discovery of FD-Detectable Errors | Saravanan Thirumuruganathan, Laure Berti-Equille, Mourad Ouzzani, Jorge-Arnulfo Quiane-Ruiz, Nan Tang | In this paper, we propose an end-to-end solution to detect FD-detectable errors from dirty data. |
94 | SLiMFast: Guaranteed Results for Data Fusion and Source Reliability | Theodoros Rekatsinas, Manas Joglekar, Hector Garcia-Molina, Aditya Parameswaran, Christopher Ré | We propose SLiMFast, a framework that expresses data fusion as a statistical learning problem over discriminative probabilistic models, which in many cases correspond to logistic regression. |
95 | Crowdsourced Top-k Queries by Confidence-Aware Pairwise Judgments | Ngai Meng Kou, Yan Li, Hao Wang, Leong Hou U., Zhiguo Gong | In this work, we attempt to revisit the crowdsourced processing of the top-k queries, aiming at (1) securing the quality of crowdsourced comparisons by a certain confidence level and (2) minimizing the total monetary cost. |
96 | Falcon: Scaling Up Hands-Off Crowdsourced Entity Matching to Build Cloud Services | Sanjib Das, Paul Suganthan G.C., AnHai Doan, Jeffrey F. Naughton, Ganesh Krishnan, Rohit Deep, Esteban Arcaute, Vijay Raghavendra, Youngchoon Park | We propose Falcon, a solution that scales up the hands-off crowdsourced EM approach of Corleone, using RDBMS-style query execution and optimization over a Hadoop cluster. |
97 | CrowdDQS: Dynamic Question Selection in Crowdsourcing Systems | Asif R. Khan, Hector Garcia-Molina | In this paper, we present CrowdDQS, a system that uses the most recent set of crowdsourced voting evidence to dynamically issue questions to workers on Amazon Mechanical Turk (AMT). |
98 | CDB: Optimizing Queries with Crowd-Based Selections and Joins | Guoliang Li, Chengliang Chai, Ju Fan, Xueping Weng, Jian Li, Yudian Zheng, Yuanbing Li, Xiang Yu, Xiaohang Zhang, Haitao Yuan | To address the limitations, we develop a crowd-powered database system CDB that supports crowd-based query optimizations, with focus on join and selection. We have also created a benchmark for evaluating crowd-powered databases. |
99 | Scaling Locally Linear Embedding | Yasuhiro Fujiwara, Naoki Marumo, Mathieu Blondel, Koh Takeuchi, Hideaki Kim, Tomoharu Iwata, Naonori Ueda | Our approach, Ripple, is based on two ideas: (1) it incrementally updates the edge weights by exploiting the Woodbury formula and (2) it efficiently computes eigenvectors of the LLE kernel by exploiting the LU decomposition-based inverse power method. |
100 | Dynamic Density Based Clustering | Junhao Gan, Yufei Tao | Motivated by the above, we investigate the algorithmic principles for dynamic clustering by DBSCAN, a successful representative of density-based clustering, and ρ-approximate DBSCAN, proposed to bring down the computational hardness of the former on static data. |
101 | Extracting Top-K Insights from Multi-dimensional Data | Bo Tang, Shi Han, Man Lung Yiu, Rui Ding, Dongmei Zhang | We propose a meaningful scoring function for insights to address (i). |
102 | QUILTS: Multidimensional Data Partitioning Framework Based on Query-Aware and Skew-Tolerant Space-Filling Curves | Shoji Nishimura, Haruo Yokota | We propose a framework that involves a multidimensional indexing technique based on a space-filling curve. |
103 | Leveraging Re-costing for Online Optimization of Parameterized Queries with Guarantees | Anshuman Dutt, Vivek Narasayya, Surajit Chaudhuri | We propose a plan re-costing based approach that enables us to perform well on all three metrics. |
104 | Handling Environments in a Nested Relational Algebra with Combinators and an Implementation in a Verified Query Compiler | Joshua S. Auerbach, Martin Hirzel, Louis Mandel, Avraham Shinnar, Jérôme Siméon | This paper proposes NRAe, an extension of a combinators-based nested relational algebra (NRA) with built-in support for environments. |
105 | From In-Place Updates to In-Place Appends: Revisiting Out-of-Place Updates on Flash | Sergey Hardock, Ilia Petrov, Robert Gottstein, Alejandro Buchmann | In this paper we propose an approach that transforms those small in-place updates into small update deltas that are appended to the original page. |
106 | Visual Graph Query Construction and Refinement | Robert Pienta, Fred Hohman, Acar Tamersoy, Alex Endert, Shamkant Navathe, Hanghang Tong, Duen Horng Chau | We will present the first demonstration of VISAGE, an interactive visual graph querying approach that empowers analysts to construct expressive queries, without writing complex code (see our video: https://youtu.be/l2L7Y5mCh1s). |
107 | Demonstration of the Cosette Automated SQL Prover | Shumo Chu, Daniel Li, Chenglong Wang, Alvin Cheung, Dan Suciu | Demonstration of the Cosette Automated SQL Prover |
108 | Interactive Time Series Analytics Powered by ONEX | Rodica Neamtu, Ramoza Ahsan, Charles Lovering, Cuong Nguyen, Elke Rundensteiner, Gabor Sarkozy | The ONEX (Online Exploration of Time Series) system supports effective exploratory analysis of time series collections composed of heterogeneous, variable-length and misaligned time series using robust alignment dynamic time warping (DTW) methods. |
109 | The VADA Architecture for Cost-Effective Data Wrangling | Nikolaos Konstantinou, Martin Koehler, Edward Abel, Cristina Civili, Bernd Neumayr, Emanuel Sallinger, Alvaro A.A. Fernandes, Georg Gottlob, John A. Keane, Leonid Libkin, Norman W. Paton | In this paper, we present an architecture that supports a complete data wrangling lifecycle, orchestrates components dynamically, builds on automation wherever possible, is informed by whatever data is available, refines automatically produced results in the light of feedback, takes into account the user’s priorities, and supports data scientists with diverse skill sets. |
110 | A Demonstration of Lusail: Querying Linked Data at Scale | Essam Mansour, Ibrahim Abdelaziz, Mourad Ouzzani, Ashraf Aboulnaga, Panos Kalnis | We will demonstrate Lusail, a system that supports the need of emerging applications to access tens to hundreds of geo-distributed datasets. |
111 | Foofah: A Programming-By-Example System for Synthesizing Data Transformation Programs | Zhongjun Jin, Michael R. Anderson, Michael Cafarella, H. V. Jagadish | We built a system called FOOFAH for helping the user easily synthesize a desired data transformation program. |
112 | Virtualized Network Service Topology Exploration Using Nepal | Pramod Jamkhedkar, Theodore Johnson, Yaron Kanza, Aman Shaikh, N.K. Shankarnarayanan, Vladislav Shkapenyuk, Gordon Woodhull | In this demonstration we present Nepal — a network path query language which is designed to effectively retrieve desired paths from a network graph. |
113 | VisualCloud Demonstration: A DBMS for Virtual Reality | Brandon Haynes, Artem Minyaylov, Magdalena Balazinska, Luis Ceze, Alvin Cheung | We demonstrate VisualCloud, a database management system designed to efficiently ingest, store, and deliver virtual reality (VR) content at scale. |
114 | The Best of Both Worlds: Big Data Programming with Both Productivity and Performance | Fan Yang, Yuzhen Huang, Yunjian Zhao, Jinfeng Li, Guanxian Jiang, James Cheng | In our prior work [7], we proposed Husky, which provides a highly expressive API to solve the above dilemma. |
115 | In-Browser Interactive SQL Analytics with Afterburner | Kareem El Gebaly, Jimmy Lin | On the TPC-H benchmark, we show that Afterburner achieves comparable performance to MonetDB running natively on the same machine. |
116 | Debugging Big Data Analytics in Spark with BigDebug | Muhammad Ali Gulzar, Matteo Interlandi, Tyson Condie, Miryung Kim | Debugging Big Data Analytics in Spark with BigDebug |
117 | Interactive Query Synthesis from Input-Output Examples | Chenglong Wang, Alvin Cheung, Rastislav Bodik | Interactive Query Synthesis from Input-Output Examples |
118 | Generating Concise Entity Matching Rules | Rohit Singh, Vamsi Meduri, Ahmed Elmagarmid, Samuel Madden, Paolo Papotti, Jorge-Arnulfo Quiané-Ruiz, Armando Solar-Lezama, Nan Tang | We model EM rules in the form of General Boolean Formulas (GBFs) that allow arbitrary attribute matching combined by conjunctions (∧), disjunctions (∨), and negations. |
119 | A Demo of the Data Civilizer System | Raul Castro Fernandez, Dong Deng, Essam Mansour, Abdulhakim A. Qahtan, Wenbo Tao, Ziawasch Abedjan, Ahmed Elmagarmid, Ihab F. Ilyas, Samuel Madden, Mourad Ouzzani, Michael Stonebraker, Nan Tang | We propose to demonstrate DATA CIVILIZER to ease the pain faced in analyzing data "in the wild". |
120 | Querying and Exploring Polygamous Relationships in Urban Spatio-Temporal Data Sets | Yeuk-Yin Chan, Fernando Chirigati, Harish Doraiswamy, Cláudio T. Silva, Juliana Freire | In this demo, we show how visualization can help in the discovery of relationships that are potentially interesting by allowing users to query and explore the relationship set in an intuitive way. |
121 | Graph Data Mining with Arabesque | Eslam Hussein, Abdurrahman Ghanem, Vinicius Vitor dos Santos Dias, Carlos H.C. Teixeira, Ghadeer AbuOda, Marco Serafini, Georgos Siganos, Gianmarco De Francisci Morales, Ashraf Aboulnaga, Mohammed Zaki | These problems differ from other graph processing problems such as PageRank or shortest path in that graph data mining requires searching through an exponential number of subgraphs. |
122 | Alpine: Efficient In-Situ Data Exploration in the Presence of Updates | Antonios Anagnostou, Matthaios Olma, Anastasia Ailamaki | We present Alpine, our prototype implementation, which combines the tuner with a query executor incorporating in situ query techniques to provide efficient raw data access. |
123 | OrpheusDB: A Lightweight Approach to Relational Dataset Versioning | Liqi Xu, Silu Huang, Sili Hui, Aaron J. Elmore, Aditya Parameswaran | We demonstrate OrpheusDB, a lightweight approach to versioning of relational datasets. |
124 | doppioDB: A Hardware Accelerated Database | David Sidler, Zsolt Istvan, Muhsen Owaida, Kaan Kara, Gustavo Alonso | We present doppioDB which consists of MonetDB, a main-memory column store, extended with Hardware User Defined Functions (HUDFs). |
125 | DBridge: Translating Imperative Code to SQL | K. Venkatesh Emani, Tejas Deshpande, Karthik Ramachandra, S. Sudarshan | We show the performance gains achieved by employing our system on real world applications that use JDBC or Hibernate. |
126 | BEAS: Bounded Evaluation of SQL Queries | Yang Cao, Wenfei Fan, Yanghao Wang, Tengfei Yuan, Yanchao Li, Laura Yu Chen | We demonstrate BEAS, a prototype system for querying relations with bounded resources. |
127 | Safe Visual Data Exploration | Zheguang Zhao, Emanuel Zgraggen, Lorenzo De Stefani, Carsten Binnig, Eli Upfal, Tim Kraska | Thus without proper statistical control, the risk of false discovery renders visual data exploration unsafe and makes users susceptible to questionable inference. To address these problems, we present QUDE, a visual data exploration system that interacts with users to formulate hypotheses based on visualizations and provides interactive control of false discoveries. |
128 | Optimizing Data-Intensive Applications Automatically By Leveraging Parallel Data Processing Frameworks | Maaz Bin Safeer Ahmad, Alvin Cheung | In our interactive presentation, we will use CASPER to optimize sequential implementations of data visualization programs as well as image processing kernels. |
129 | DIAS: Differentially Private Interactive Algorithm Selection using Pythia | Ios Kotsogiannis, Michael Hay, Ashwin Machanavajjhala, Gerome Miklau, Margaret Orr | In this demonstration we present DIAS (Differentially-private Interactive Algorithm Selection), an educational privacy game. |
130 | Snorkel: Fast Training Set Generation for Information Extraction | Alexander J. Ratner, Stephen H. Bach, Henry R. Ehrenberg, Chris Ré | State-of-the-art machine learning methods such as deep learning rely on large sets of hand-labeled training data. |
131 | Synthesizing Extraction Rules from User Examples with SEER | Maeda F. Hanafi, Azza Abouzied, Laura Chiticariu, Yunyao Li | SEER’s design principles and learning algorithm are motivated by how rule developers naturally construct data extraction rules. |
132 | Scout: A GPU-Aware System for Interactive Spatio-temporal Data Visualization | Harshada Chavan, Mohamed F. Mokbel | We use real data sets to demonstrate scalability and important features of Scout. |
133 | Graphflow: An Active Graph Database | Chathura Kankanamge, Siddhartha Sahu, Amine Mhedbhi, Jeremy Chen, Semih Salihoglu | At the core of Graphflow’s query processor are two worst-case optimal join algorithms called Generic Join and our new Delta Generic Join algorithm for one-time and continuous subgraph queries, respectively. |
134 | Demonstration: MacroBase, A Fast Data Analysis Engine | Peter Bailis, Edward Gan, Kexin Rong, Sahaana Suri | To address this gap, we have developed MacroBase, a fast data analytics engine that acts as a search engine over fast data streams. |
135 | Q*cert: A Platform for Implementing and Verifying Query Compilers | Joshua S. Auerbach, Martin Hirzel, Louis Mandel, Avraham Shinnar, Jérôme Siméon | We present Q*cert, a platform for the specification, verification, and implementation of query compilers written using the Coq proof assistant. |
136 | A Demonstration of Interactive Analysis of Performance Measurements with Viska | Helga Gudmundsdottir, Babak Salimi, Magdalena Balazinska, Dan R.K. Ports, Dan Suciu | We make this goal easier to achieve with Viska, a new tool for generating and interpreting performance measurement results. |
137 | Crowdsourced Data Management: Overview and Challenges | Guoliang Li, Yudian Zheng, Ju Fan, Jiannan Wang, Reynold Cheng | In this tutorial, we will survey and synthesize a wide spectrum of existing studies on crowdsourced data management. Finally, we provide the emerging challenges. |
138 | Data Management in Machine Learning: Challenges, Techniques, and Systems | Arun Kumar, Matthias Boehm, Jun Yang | This tutorial provides a comprehensive review of such systems and analyzes key data management challenges and techniques. |
139 | Data Management Challenges in Production Machine Learning | Neoklis Polyzotis, Sudip Roy, Steven Euijong Whang, Martin Zinkevich | The goal of the tutorial is to bring forth these issues, draw connections to prior work in the database literature, and outline the open research questions that are not addressed by prior art. |
140 | Differential Privacy in the Wild: A Tutorial on Current Practices & Open Challenges | Ashwin Machanavajjhala, Xi He, Michael Hay | In the second half of the tutorial we will highlight real world applications on complex data types, and identify research challenges in applying differential privacy to real world applications. |
141 | Graph Querying Meets HCI: State of the Art and Future Directions | Sourav S. Bhowmick, Byron Choi, Chengkai Li | In this tutorial, we survey recent developments in the emerging area of visual graph querying paradigm that bridges traditional graph querying with human computer interaction (HCI). |
142 | Graph Exploration: From Users to Large Graphs | Davide Mottin, Emmanuel Müller | In this tutorial, we will discuss a set of techniques, which have been developed in the last few years for independent purposes, within a unified graph exploration taxonomy. |
143 | Building Structured Databases of Factual Knowledge from Massive Text Corpora | Xiang Ren, Meng Jiang, Jingbo Shang, Jiawei Han | In this tutorial, we introduce data-driven methods on mining structured facts (i.e., entities and their relations/attributes for types of interest) from massive text corpora, to construct structured databases of factual knowledge (called StructDBs). |
144 | Data Profiling: A Tutorial | Ziawasch Abedjan, Lukasz Golab, Felix Naumann | In this tutorial, we highlight the importance of data profiling as part of any data-related use-case, and we discuss the area of data profiling by classifying data profiling tasks and reviewing the state-of-the-art data profiling systems and techniques. |
145 | How to Build a Non-Volatile Memory Database Management System | Joy Arulraj, Andrew Pavlo | In this tutorial, we provide an outline on how to build a new DBMS given the changes to hardware landscape due to NVM. |
146 | Data Structure Engineering For Byte-Addressable Non-Volatile Memory | Ismail Oukid, Wolfgang Lehner | In this tutorial we will dissect SCM challenges and provide an in-depth view of existing programming models that circumvent them, as well as novel data structures that stem from these models. |
147 | Natural Language Data Management and Interfaces: Recent Development and Open Challenges | Yunyao Li, Davood Rafiei | The tutorial presents state-of-the-art methods, related systems, research opportunities and challenges covering both areas. |
148 | Hybrid Transactional/Analytical Processing: A Survey | Fatma Özcan, Yuanyuan Tian, Pinar Tözün | The goal of this tutorial is to (1) quickly review the historical progression of OLTP and OLAP systems, (2) discuss the driving factors for HTAP, and finally (3) provide a deep technical analysis of existing and emerging HTAP solutions, detailing their key architectural differences and trade-offs. |
149 | Query Processing Techniques for Big Spatial-Keyword Data | Ahmed Mahmood, Walid G. Aref | We describe the main models for big spatial-keyword processing, and list the popular spatial-keyword queries. |