Paper Digest: SIGMOD 2018 Highlights

June 16, 2018 (updated June 26, 2020), admin

The ACM Special Interest Group on Management of Data (SIGMOD) is one of the top conferences on database management systems and data management technology.

To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights/summaries to quickly get the main idea of each paper.

If you do not want to miss any interesting academic papers, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.

Paper Digest Team
team@paperdigest.org

TABLE 1: SIGMOD 2018 Papers

#	Title	Authors	Highlight
1	Kubernetes and the New Cloud	Eric Brewer	Kubernetes and the New Cloud
2	Robust Entity Resolution using Random Graphs	Sainyam Galhotra, Donatella Firmani, Barna Saha, Divesh Srivastava	Our contribution is a general error correction tool that can be leveraged by a variety of hybrid human-machine ER algorithms, based on a formal way of selecting indirect "control queries".
3	Deep Learning for Entity Matching: A Design Space Exploration	Sidharth Mudgal, Han Li, Theodoros Rekatsinas, AnHai Doan, Youngchoon Park, Ganesh Krishnan, Rohit Deep, Esteban Arcaute, Vijay Raghavendra	In this paper we examine applying deep learning (DL) to EM, to understand DL’s benefits and limitations.
4	Synthesizing Type-Detection Logic for Rich Semantic Data Types using Open-source Code	Cong Yan, Yeye He	We develop AUTOTYPE, which synthesizes type-detection logic for rich semantic data types from open-source code repositories such as GitHub.
5	Fine-grained Concept Linking using Neural Networks in Healthcare	Jian Dai, Meihui Zhang, Gang Chen, Ju Fan, Kee Yuan Ngiam, Beng Chin Ooi	To address these challenges, we propose a Neural Concept Linking (NCL) approach for accurate concept linking using systematically integrated neural networks.
6	Big Data Linkage for Product Specification Pages	Disheng Qiu, Luciano Barbosa, Valter Crescenzi, Paolo Merialdo, Divesh Srivastava	In this paper, we take advantage of the opportunity that sources publish product identifiers to perform big data linkage across sources at the beginning of the data integration pipeline, before schema alignment.
7	The Data Interaction Game	Ben McCamish, Vahid Ghadakchi, Arash Termehchy, Behrouz Touri, Liang Huang	We propose a reinforcement learning method that learns and answers the information needs behind queries, adapts to changes in users’ strategies, and provably improves the effectiveness of query answering in a stochastic sense.
8	Data Citation: Giving Credit Where Credit is Due	Yinjun Wu, Abdussalam Alawini, Susan B. Davidson, Gianmaria Silvello	We present three approaches to implementing citation views and describe alternative policies for the joint, alternate and aggregated use of citation views.
9	EKTELO: A Framework for Defining Differentially-Private Computations	Dan Zhang, Ryan McKenna, Ios Kotsogiannis, Michael Hay, Ashwin Machanavajjhala, Gerome Miklau	We propose a novel programming framework and system, Ektelo, for implementing both existing and new privacy algorithms.
10	Marginal Release Under Local Differential Privacy	Graham Cormode, Tejas Kulkarni, Divesh Srivastava	In this paper, we provide a set of algorithms for materializing marginal statistics under the strong model of local differential privacy.
11	When Query Authentication Meets Fine-Grained Access Control: A Zero-Knowledge Approach	Cheng Xu, Jianliang Xu, Haibo Hu, Man Ho Au	In this paper, we take the first step toward studying the problem of authenticating relational queries with fine-grained access control.
12	Practical and Secure Substring Search	Florian Hahn, Nicolas Loza, Florian Kerschbaum	In this paper we address the problem of outsourcing sensitive strings while still providing the functionality of substring searches.
13	Columnstore and B+ tree – Are Hybrid Physical Designs Important?	Adam Dziedzic, Jingjing Wang, Sudipto Das, Bolin Ding, Vivek R. Narasayya, Manoj Syamala	We extend the Database Engine Tuning Advisor for Microsoft SQL Server to recommend a suitable combination of B+ tree and columnstore indexes for a given workload.
14	Computation Reuse in Analytics Job Service at Microsoft	Alekh Jindal, Shi Qiao, Hiren Patel, Zhicheng Yin, Jieming Di, Malay Bag, Marc Friedman, Yifung Lin, Konstantinos Karanasos, Sriram Rao	In this paper, we describe a computation reuse framework, called CLOUDVIEWS, which we built to address the computation overlap problem in Microsoft’s SCOPE job service.
15	P-Store: An Elastic Database System with Predictive Provisioning	Rebecca Taft, Nosayba El-Sayed, Marco Serafini, Yu Lu, Ashraf Aboulnaga, Michael Stonebraker, Ricardo Mayerhofer, Francisco Andrade	We present P-Store, the first elastic OLTP DBMS to use prediction, and apply it to the workload of B2W Digital (B2W), a large online retailer.
16	Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources	Edmon Begoli, Jesús Camacho-Rodríguez, Julian Hyde, Michael J. Mior, Daniel Lemire	The goal of this paper is to formally introduce Calcite to the broader research community, briefly present its history, and describe its architecture, features, functionality, and patterns for adoption.
17	Carousel: Low-Latency Transaction Processing for Globally-Distributed Data	Xinan Yan, Linguan Yang, Hongbo Zhang, Xiayue Charles Lin, Bernard Wong, Kenneth Salem, Tim Brecht	This paper introduces Carousel, a distributed database system that provides low-latency transaction processing for multi-partition globally-distributed transactions.
18	Accelerating Analytical Processing in MVCC using Fine-Granular High-Frequency Virtual Snapshotting	Ankur Sharma, Felix Martin Schuhknecht, Jens Dittrich	Inside both execution engines, we still apply MVCC.
19	Reactors: A Case for Predictable, Virtualized Actor Database Systems	Vivek Shah, Marcos Antonio Vaz Salles	In this paper, we propose a relational actor programming model for in-memory databases as a novel, holistic approach towards fulfilling these challenging requirements.
20	FASTER: A Concurrent Key-Value Store with In-Place Updates	Badrish Chandramouli, Guna Prasaad, Donald Kossmann, Justin Levandoski, James Hunter, Mike Barnett	This paper presents FASTER, a new key-value store for point read, blind update, and read-modify-write operations.
21	Workload-Aware CPU Performance Scaling for Transactional Database Systems	Mustafa Korkmaz, Martin Karsten, Kenneth Salem, Semih Salihoglu	In this paper, we show that transactional database systems can manage dynamic voltage and frequency scaling (DVFS) more effectively than the underlying operating system.
22	How to Architect a Query Compiler, Revisited	Ruby Y. Tahboub, Grégory M. Essertel, Tiark Rompf	We aim to contribute to this discussion by drawing attention to an old but underappreciated idea known as Futamura projections, which fundamentally link interpreters and compilers.
23	SuRF: Practical Range Query Filtering with Fast Succinct Tries	Huanchen Zhang, Hyeontaek Lim, Viktor Leis, David G. Andersen, Michael Kaminsky, Kimberly Keeton, Andrew Pavlo	We present the Succinct Range Filter (SuRF), a fast and compact data structure for approximate membership tests.
24	FastQRE: Fast Query Reverse Engineering	Dmitri V. Kalashnikov, Laks V.S. Lakshmanan, Divesh Srivastava	In this work, we propose a novel approach for solving the query reverse engineering (QRE) problem efficiently.
25	Adaptive Energy-Control for In-Memory Database Systems	Thomas Kissinger, Dirk Habich, Wolfgang Lehner	We propose the Energy-Control Loop (ECL), a DBMS-integrated approach for adaptive energy control on scale-up in-memory database systems that obeys a query latency limit as a soft constraint and actively optimizes the energy efficiency and performance of the DBMS.
26	Incremental View Maintenance with Triple Lock Factorization Benefits	Milos Nikolic, Dan Olteanu	We introduce F-IVM, a unified incremental view maintenance (IVM) approach for a variety of tasks, including gradient computation for learning linear regression models over joins, matrix chain multiplication, and factorized evaluation of conjunctive queries.
27	Catching Numeric Inconsistencies in Graphs	Wenfei Fan, Xueli Liu, Ping Lu, Chao Tian	To catch such errors, we propose to extend graph functional dependencies with linear arithmetic expressions and comparison predicates, referred to as NGDs.
28	TurboGraph++: A Scalable and Fast Graph Analytics System	Seongyun Ko, Wook-Shin Han	We present TurboGraph++, a scalable and fast graph analytics system which efficiently processes large graphs by exploiting external memory for scale-up without compromising efficiency.
29	TurboFlux: A Fast Continuous Subgraph Matching System for Streaming Graph Data	Kyoungmin Kim, In Seo, Wook-Shin Han, Jeong-Hoon Lee, Sungpack Hong, Hassan Chafi, Hyungyu Shin, Geonhwa Jeong	We present a fast continuous subgraph matching system called TurboFlux which provides high throughput over a fast graph update stream.
30	Discovering Graph Functional Dependencies	Wenfei Fan, Chunming Hu, Xueli Liu, Ping Lu	We introduce notions of reduced GFDs and their topological support, and formalize the discovery problem for GFDs.
31	TopPPR: Top-k Personalized PageRank Queries with Precision Guarantees on Large Graphs	Zhewei Wei, Xiaodong He, Xiaokui Xiao, Sibo Wang, Shuo Shang, Ji-Rong Wen	To address the deficiencies of existing solutions, we propose TopPPR, an algorithm for top-k PPR queries that ensures at least ρ precision (i.e., at least a ρ fraction of the actual top-k results are returned) with probability at least 1 - 1/n, where ρ ∈ (0, 1] is a user-specified parameter and n is the number of nodes in the graph G.
32	Skyline Community Search in Multi-valued Networks	Rong-Hua Li, Lu Qin, Fanghua Ye, Jeffrey Xu Yu, Xiaokui Xiao, Nong Xiao, Zibin Zheng	To capture d numerical attributes, we propose a novel community model, called skyline community, based on the concepts of k-core and skyline.
33	Building a Bw-Tree Takes More Than Just Buzz Words	Ziqi Wang, Andrew Pavlo, Hyeontaek Lim, Viktor Leis, Huanchen Zhang, Michael Kaminsky, David G. Andersen	In 2013, Microsoft Research proposed the Bw-Tree (humorously termed the "Buzz Word Tree"), a lock-free index that provides high throughput for transactional database workloads in SQL Server’s Hekaton engine.
34	The Case for Learned Index Structures	Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, Neoklis Polyzotis	In this exploratory research paper, we start from this premise and posit that all existing index structures can be replaced with other types of models, including deep-learning models, which we term learned indexes.
35	Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging	Niv Dayan, Stratos Idreos	In this paper, we show that all mainstream LSM-tree based key-value stores in the literature and in industry are suboptimal with respect to how they trade off among the I/O costs of updates, point lookups, range lookups, as well as the cost of storage, measured as space-amplification.
36	HOT: A Height Optimized Trie Index for Main-Memory Database Systems	Robert Binna, Eva Zangerle, Martin Pichl, Günther Specht, Viktor Leis	We present the Height Optimized Trie (HOT), a fast and space-efficient in-memory index structure.
37	The Data Calculator: Data Structure Design and Cost Synthesis from First Principles and Learned Cost Models	Stratos Idreos, Kostas Zoumpatianos, Brian Hentschel, Michael S. Kester, Demi Guo	We present a design engine, the Data Calculator, which enables interactive and semi-automated design of data structures.
38	A Comparative Study of Secondary Indexing Techniques in LSM-based NoSQL Databases	Mohiuddin Abdul Qader, Shiwen Cheng, Vagelis Hristidis	In this paper, we present a taxonomy of NoSQL secondary indexes, broadly split into two classes: Embedded Indexes (i.e. lightweight filters embedded inside the primary table) and Stand-Alone Indexes (i.e. separate data structures).
39	Efficient Selection of Geospatial Data on Maps for Interactive and Visualized Exploration	Tao Guo, Kaiyu Feng, Gao Cong, Zhifeng Bao	We propose that such systems should support the following desirable features: representativeness, visibility constraint, zooming consistency, and panning consistency.
40	Pinot: Realtime OLAP for 530 Million Users	Jean-François Im, Kishore Gopalakrishna, Subbu Subramaniam, Mayank Shrivastava, Adwait Tumbde, Xiaotian Jiang, Jennifer Dai, Seunghyun Lee, Neha Pawar, Jialiang Li, Ravi Aringunram	We present Pinot, a single system used in production at LinkedIn that can serve tens of thousands of analytical queries per second, offers near-realtime data ingestion from streaming data sources, and handles the operational requirements of large web properties.
41	Robust, Scalable, Real-Time Event Time Series Aggregation at Twitter	Peilin Yang, Srikanth Thiagarajan, Jimmy Lin	In this paper, we describe TSAR (TimeSeries AggregatoR), a robust, scalable, real-time event time series aggregation framework built primarily for engagement monitoring: aggregating interactions with Tweets, segmented along a multitude of dimensions such as device, engagement type, etc.
42	Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark	Michael Armbrust, Tathagata Das, Joseph Torres, Burak Yavuz, Shixiong Zhu, Reynold Xin, Ali Ghodsi, Ion Stoica, Matei Zaharia	We describe the system’s design and use cases from several hundred production deployments on Databricks, the largest of which process over 1 PB of data per month.
43	TcpRT: Instrument and Diagnostic Analysis System for Service Quality of Cloud Databases at Massive Scale in Real-time	Wei Cao, Yusong Gao, Bingchen Lin, Xiaojie Feng, Yu Xie, Xiao Lou, Peng Wang	This paper presents TcpRT, the instrument and diagnosis infrastructure in Alibaba Cloud RDS that achieves real-time anomaly detection.
44	Machine Learning for Data Management: Problems and Solutions	Pedro Domingos	Learning structure (in the case of Markov logic, a set of formulas in first-order logic) is intractable, as it is in more traditional representations, but can be done effectively using inductive logic programming techniques.
45	Query-based Workload Forecasting for Self-Driving Database Management Systems	Lin Ma, Dana Van Aken, Ahmed Hefny, Gustavo Mezerhane, Andrew Pavlo, Geoffrey J. Gordon	We present a robust forecasting framework called QueryBot 5000 that allows a DBMS to predict the expected arrival rate of queries in the future based on historical data.
46	RushMon: Real-time Isolation Anomalies Monitoring	Zechao Shang, Jeffrey Xu Yu, Aaron J. Elmore	In this paper, we tackle the problem of monitoring isolation anomalies in real time.
47	On the Calculation of Optimality Ranges for Relational Query Execution Plans	Florian Wolf, Norman May, Paul R. Willems, Kai-Uwe Sattler	In this paper, we analyze deviations from cardinality estimates and define the optimality range of an intermediate result as the range of its cardinality within which the optimal plan remains optimal.
48	Adaptive Optimization of Very Large Join Queries	Thomas Neumann, Bernhard Radke	This paper introduces an adaptive optimization framework that is able to solve most common join queries exactly, while simultaneously scaling to queries with thousands of joins.
49	Improving Join Reorderability with Compensation Operators	TaiNing Wang, Chee-Yong Chan	In this paper, we present a novel approach to improving join reorderability for the class of queries involving inner joins, single-sided outerjoins, and/or antijoins.
50	When Hierarchy Meets 2-Hop-Labeling: Efficient Shortest Distance Queries on Road Networks	Dian Ouyang, Lu Qin, Lijun Chang, Xuemin Lin, Ying Zhang, Qing Zhu	To overcome the drawbacks of both solutions, in this paper, we propose a novel hierarchical 2-hop index (H2H-Index) which assigns a label for each vertex and at the same time preserves a hierarchy among all vertices.
51	DITA: Distributed In-Memory Trajectory Analytics	Zeyuan Shang, Guoliang Li, Zhifeng Bao	We propose an effective partitioning method, together with a global index and local indexes, to address the data locality problem.
52	Cold Filter: A Meta-Framework for Faster and More Accurate Stream Processing	Yang Zhou, Tong Yang, Jie Jiang, Bin Cui, Minlan Yu, Xiaoming Li, Steve Uhlig	To enhance these algorithms, we propose a meta-framework, called Cold Filter (CF), that enables faster and more accurate stream processing.
53	Sketching Linear Classifiers over Data Streams	Kai Sheng Tai, Vatsal Sharan, Peter Bailis, Gregory Valiant	We introduce a new sub-linear space sketch—the Weight-Median Sketch—for learning compressed linear classifiers over data streams while supporting the efficient recovery of large-magnitude weights in the model.
54	Dynamic Pricing in Spatial Crowdsourcing: A Matching-Based Approach	Yongxin Tong, Libin Wang, Zimu Zhou, Lei Chen, Bowen Du, Jieping Ye	In this paper, we formally define the Global Dynamic Pricing (GDP) problem in spatial crowdsourcing.
55	Amazon Aurora: On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes	Alexandre Verbitski, Anurag Gupta, Debanjan Saha, James Corey, Kamal Gupta, Murali Brahmadesam, Raman Mittal, Sailesh Krishnamurthy, Sandor Maurice, Tengiz Kharatishvilli, Xiaofeng Bao	In this paper, we describe how Aurora avoids distributed consensus under most circumstances by establishing invariants and leveraging local transient state.
56	Eon Mode: Bringing the Vertica Columnar Database to the Cloud	Ben Vandiver, Shreya Prasad, Pratibha Rana, Eden Zik, Amin Saeidi, Pratyush Parimal, Styliani Pantela, Jaimin Dave	Eon Mode: Bringing the Vertica Columnar Database to the Cloud
57	Survivability of Cloud Databases – Factors and Prediction	Jose Picado, Willis Lang, Edward C. Thayer	Given the large and diverse relational database population that Azure SQL DB has, we present a large-scale survivability study of our service and identify some factors that can demonstrably help predict the lifespan of cloud databases.
58	Rapid Adoption of Cloud Data Warehouse Technology Using Datometry Hyper-Q	Lyublena Antova, Derrick Bryant, Tuan Cao, Michael Duller, Mohamed A. Soliman, Florian M. Waas	In this paper, we present a next-generation virtualization technology that lets existing applications run natively on cloud-based database systems.
59	Lightweight Cardinality Estimation in LSM-based Systems	Ildar Absalyamov, Michael J. Carey, Vassilis J. Tsotras	In this paper we address the problem of computing data statistics for workloads with rapid data ingestion and propose a lightweight statistics-collection framework that exploits the properties of LSM storage.
60	Column Sketches: A Scan Accelerator for Rapid and Robust Predicate Evaluation	Brian Hentschel, Michael S. Kester, Stratos Idreos	We present a new class of indexing scheme, termed a Column Sketch, which improves the performance of predicate evaluation independently of workload properties.
61	ZigZag: Supporting Similarity Queries on Vector Space Models	Wenhai Li, Lingfeng Deng, Yang Li, Chen Li	In this paper we study the problem of supporting similarity queries on a large number of records using a vector space model, where each record is a bag of tokens.
62	Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search	Yiqiu Wang, Anshumali Shrivastava, Jonathan Wang, Junghee Ryu	We present FLASH (Fast LSH Algorithm for Similarity search accelerated with HPC), a similarity search system for ultra-high dimensional datasets on a single machine that does not require similarity computations and is tailored for high-performance computing platforms.
63	Overlap Set Similarity Joins with Theoretical Guarantees	Dong Deng, Yufei Tao, Guoliang Li	As the size boundary between the small sets and the large sets is crucial to the efficiency, we propose an effective size boundary selection algorithm to judiciously choose an appropriate size boundary, which works very well in practice.
64	BOOMER: Blending Visual Formulation and Processing of P-Homomorphic Queries on Large Networks	Yinglong Song, Huey Eng Chua, Sourav S. Bhowmick, Byron Choi, Shuigeng Zhou	In this paper, we explore how this vision can be realized for more generic but complex 1-1 p-homomorphic (p-hom) queries introduced by Fan et al.
65	Navigating the Data Lake with DATAMARAN: Automatically Extracting Structure from Log Datasets	Yihan Gao, Silu Huang, Aditya Parameswaran	We present DATAMARAN, a tool that extracts structure from semi-structured log datasets with no human supervision.
66	Efficient k-Regret Query Algorithm with Restriction-free Bound for any Dimensionality	Min Xie, Raymond Chi-Wing Wong, Jian Li, Cheng Long, Ashwin Lall	In this paper, we present the lower bound of the maximum regret ratio for the k-regret query.
67	A Rating-Ranking Method for Crowdsourced Top-k Computation	Kaiyu Li, Xiaohang Zhang, Guoliang Li	To address this problem, we propose a rating-ranking-based approach, which contains two types of questions to ask the crowd.
68	Online Processing Algorithms for Influence Maximization	Jing Tang, Xueyan Tang, Xiaokui Xiao, Junsong Yuan	Motivated by this, we propose a new algorithm for online processing of influence maximization (OPIM) with both superior empirical effectiveness and strong theoretical guarantees, and we show that it can also be extended to handle conventional influence maximization.
69	Top-k Sorting Under Partial Order Information	Eyal Dushkin, Tova Milo	In light of this, we present a dedicated algorithm for top-k sorting that aims to minimize the number of comparisons by thoroughly leveraging the partial order information.
70	Bias in OLAP Queries: Detection, Explanation, and Removal	Babak Salimi, Johannes Gehrke, Dan Suciu	In this paper, we propose a system to detect, explain, and resolve bias in decision-support queries.
71	Persistent Bloom Filter: Membership Testing for the Entire History	Yanqing Peng, Jinwei Guo, Feifei Li, Weining Qian, Aoying Zhou	It is possible to support such "temporal membership testing" using a Bloom filter (BF), but we show that this is fairly expensive.
72	Matrix Profile X: VALMOD – Scalable Discovery of Variable-Length Motifs in Data Series	Michele Linardi, Yan Zhu, Themis Palpanas, Eamonn Keogh	In this work, we introduce VALMOD, an exact and scalable motif discovery algorithm that efficiently finds all motifs in a given range of lengths.
73	Fast Euclidean OPTICS with Bounded Precision in Low Dimensional Space	Junhao Gan, Yufei Tao	As a side product, our algorithm gives an index structure that occupies linear space, and supports the cluster group-by query with near-optimal cost.
74	The Cascading Analysts Algorithm	Matthias Ruhl, Mukund Sundararajan, Qiqi Yan	Given a change in such a metric, our goal is to identify a small set of non-overlapping data segments that account for a majority of the change.
75	Finding Seeds and Relevant Tags Jointly: For Targeted Influence Maximization in Social Networks	Xiangyu Ke, Arijit Khan, Gao Cong	In this work, we model such practical constraints and investigate the novel problem of jointly finding the top-k seed nodes and the top-r relevant tags that maximize the influence inside a target set of users.
76	Efficient Algorithms for Finding Approximate Heavy Hitters in Personalized PageRanks	Sibo Wang, Yufei Tao	In this paper, we propose BLOG, an efficient framework for three types of heavy hitter queries: the pairwise approximate heavy hitter (AHH), the reverse AHH, and the multi-source reverse AHH queries.
77	Data Sketches for Disaggregated Subset Sum and Frequent Item Estimation	Daniel Ting	We introduce and study a new data sketch for processing massive datasets.
78	Adaptive Asynchronous Parallelization of Graph Algorithms	Wenfei Fan, Ping Lu, Xiaojian Luo, Jingbo Xu, Qiang Yin, Wenyuan Yu, Ruiqi Xu	This paper proposes an Adaptive Asynchronous Parallel (AAP) model for graph computations.
79	Meta-Dataflows: Efficient Exploratory Dataflow Jobs	Raul Castro Fernandez, William Culhane, Pijika Watcharapichat, Matthias Weidlich, Victoria Lopez Morales, Peter Pietzuch	We describe meta-dataflows (MDFs), a new model to effectively express exploratory workflows and efficiently execute them on compute clusters.
80	RP-DBSCAN: A Superfast Parallel DBSCAN Algorithm Based on Random Partitioning	Hwanjun Song, Jae-Gil Lee	To remedy these problems, we propose a cell-based data partitioning scheme, pseudo random partitioning, that randomly distributes small cells rather than the points themselves.
81	PlinyCompute: A Platform for High-Performance, Distributed, Data-Intensive Tool Development	Jia Zou, R. Matthew Barnett, Tania Lorido-Botran, Shangyu Luo, Carlos Monroy, Sourav Sikdar, Kia Teymourian, Binhang Yuan, Chris Jermaine	This paper describes PlinyCompute, a system for development of high-performance, data-intensive, distributed computing tools and libraries.
82	Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications	Maaz Bin Safeer Ahmad, Alvin Cheung	We present Casper, a new tool that automatically translates sequential Java programs into the MapReduce paradigm.
83	DPaxos: Managing Data Closer to Users for Low-Latency and Mobile Applications	Faisal Nawab, Divyakant Agrawal, Amr El Abbadi	In this paper, we propose Dynamic Paxos (DPaxos), a Paxos-based consensus protocol to manage access to partitioned data across globally-distributed datacenters and edge nodes.
84	Submodularity of Distributed Join Computation	Rundong Li, Mirek Riedewald, Xinyan Deng	We show that minimizing load variance subject to a constraint on expectation is a monotone submodular maximization problem with Knapsack constraints, hence admitting provably near-optimal greedy solutions.
85	NashDB: An End-to-End Economic Method for Elastic Database Fragmentation, Replication, and Provisioning	Ryan Marcus, Olga Papaemmanouil, Sofiya Semenova, Solomon Garber	This paper introduces NashDB, an adaptive data distribution framework that relies on an economic model to automatically balance the supply and demand of data fragments, replicas, and cluster nodes.
86	SketchML: Accelerating Distributed Machine Learning with Data Sketches	Jiawei Jiang, Fangcheng Fu, Tong Yang, Bin Cui	In this paper, we study whether there is a compression method that can efficiently handle sparse and nonuniform gradients consisting of key-value pairs.
87	MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis	Manasi Vartak, Joana M. F. da Trindade, Samuel Madden, Matei Zaharia	For intermediates that are stored in MISTIQUE, we propose a range of optimizations to reduce storage footprint including quantization, summarization, and data de-duplication.
88	Fonduer: Knowledge Base Construction from Richly Formatted Data	Sen Wu, Luke Hsiao, Xiao Cheng, Braden Hancock, Theodoros Rekatsinas, Philip Levis, Christopher Ré	We introduce Fonduer, a machine-learning-based KBC system for richly formatted data.
89	Maverick: Discovering Exceptional Facts from Knowledge Graphs	Gensheng Zhang, Damian Jimenez, Chengkai Li	We present Maverick, a general, extensible framework that discovers exceptional facts about entities in knowledge graphs.
90	A General and Efficient Querying Method for Learning to Hash	Jinfeng Li, Xiao Yan, Jian Zhang, An Xu, James Cheng, Jie Liu, Kelvin K. W. Ng, Ti-chung Cheng	We propose a new fine-grained similarity indicator, quantization distance (QD), which provides more information about the similarity between a query and the items in a bucket.
91	Subjective Knowledge Base Construction Powered By Crowdsourcing and Knowledge Base	Hao Xin, Rui Meng, Lei Chen	In our work, we propose a KBC framework for subjective knowledge base construction taking advantage of the knowledge from the crowd and existing KBs.
92	DimBoost: Boosting Gradient Boosting Decision Tree to Higher Dimensions	Jiawei Jiang, Bin Cui, Ce Zhang, Fangcheng Fu	In this paper, we ask "Can we build a scalable GBDT training system whose performance scales better with respect to dimensionality of the data?"
93	Auto-Detect: Data-Driven Error Detection in Tables	Zhipeng Huang, Yeye He	We propose Auto-Detect, a statistics-based technique that leverages co-occurrence statistics from large corpora for error detection, a significant departure from existing rule-based methods.
94	A Graph Database for a Virtualized Network Infrastructure	Pramod Jamkhedkar, Theodore Johnson, Yaron Kanza, Aman Shaikh, N. K. Shankaranarayanan, Vladislav Shkapenyuk	In this paper, we explore the database requirements for the management and troubleshooting of network services using VNF and SDN technologies.
95	RAPID: In-Memory Analytical Query Processing Engine with Extreme Performance per Watt	Cagri Balkesen, Nitin Kunal, Georgios Giannikis, Pit Fender, Seema Sundara, Felix Schmidt, Jarod Wen, Sandeep Agrawal, Arun Raghavan, Venkatanathan Varadarajan, Anand Viswanathan, Balakrishnan Chandrasekaran, Sam Idicula, Nipun Agarwal, Eric Sedlar	In this paper, we demonstrate through a carefully designed modern data processing system called RAPID and a simple, low-power processor specially tailored for data processing that at least an order of magnitude performance/power improvement in SQL processing can be achieved over a modern system running on today’s complex processors.
96	G-CORE: A Core for Future Graph Query Languages	Renzo Angles, Marcelo Arenas, Pablo Barcelo, Peter Boncz, George Fletcher, Claudio Gutierrez, Tobias Lindaaker, Marcus Paradies, Stefan Plantikow, Juan Sequeda, Oskar van Rest, Hannes Voigt	We report on a community effort between industry and academia to shape the future of graph query languages.
97	Cypher: An Evolving Query Language for Property Graphs	Nadime Francis, Alastair Green, Paolo Guagliardo, Leonid Libkin, Tobias Lindaaker, Victor Marsault, Stefan Plantikow, Mats Rydberg, Petra Selmer, Andrés Taylor	We compare the features of Cypher to other property graph query languages, and describe extensions, at an advanced stage of development, which will form part of Cypher 10, turning the language into a compositional language which supports graph projections and multiple named graphs.
98	BIPie: Fast Selection and Aggregation on Encoded Data using Operator Specialization	Michal Nowakiewicz, Eric Boutin, Eric Hanson, Robert Walzer, Akash Katipally	In this paper, we demonstrate that there is tremendous room for improvement in the processing of analytical queries on modern commodity hardware.
99	VerdictDB: Universalizing Approximate Query Processing	Yongjoo Park, Barzan Mozafari, Joseph Sorenson, Junhao Wang	Therefore, we argue that a universal solution is needed: a database-agnostic approximation engine that will widen the reach of this emerging technology across various platforms.
100	AQP++: Connecting Approximate Query Processing With Aggregate Precomputation for Interactive Analytics	Jinglin Peng, Dongxiang Zhang, Jiannan Wang, Jian Pei	In this paper, we argue for the need to connect these two separate ideas for interactive analytics.
101	Accelerating Machine Learning Inference with Probabilistic Predicates	Yao Lu, Aakanksha Chowdhery, Srikanth Kandula, Surajit Chaudhuri	In this work, we demonstrate constructing and applying probabilistic predicates to filter data blobs that do not satisfy the query predicate; such filtering is parametrized to different target accuracies.
102	A Query Engine for Probabilistic Preferences	Uzi Cohen, Batya Kenig, Haoyue Ping, Benny Kimelfeld, Julia Stoyanovich	In this paper, we embark on the challenge of a practical realization of this framework.
103	Random Sampling over Joins Revisited	Zhuoyue Zhao, Robert Christensen, Feifei Li, Xiao Hu, Ke Yi	We analyze the properties of different instantiations and evaluate them against the baseline methods; the results clearly demonstrate the superiority of our new techniques.
104	Managing Non-Volatile Memory in Database Systems	Alexander van Renen, Viktor Leis, Alfons Kemper, Thomas Neumann, Takushi Hashida, Kazuichi Oe, Yoshiyasu Doi, Lilian Harada, Mitsuru Sato	In this work, we evaluate these two approaches and compare them with in-memory databases as well as more traditional buffer managers that use main memory as a cache in front of SSDs.
105	Efficient Top-K Query Processing on Massively Parallel Hardware	Anil Shanbhag, Holger Pirk, Samuel Madden	In this work, we present several top-k algorithms for GPUs, including a new algorithm based on bitonic sort called bitonic top-k.
106	Distributed Lock Management with RDMA: Decentralization without Starvation	Dong Young Yoon, Mosharaf Chowdhury, Barzan Mozafari	In this paper, we show that it is possible for a lock manager to be fully decentralized and yet exchange the partial knowledge necessary for preventing starvation and thereby reducing tail latencies.
107	Speeding Up Set Intersections in Graph Algorithms using SIMD Instructions	Shuo Han, Lei Zou, Jeffrey Xu Yu	In this paper, we focus on accelerating a widely employed computing pattern, set intersection, to boost a group of graph algorithms.
108	Pipelined Query Processing in Coprocessor Environments	Henning Funke, Sebastian Breß, Stefan Noll, Volker Markl, Jens Teubner	In this paper, we show how query compilation and GPU-style parallelism can be made to play in unison nevertheless.
109	AHEAD: Adaptable Data Hardening for On-the-Fly Hardware Error Detection during Database Query Processing	Till Kolditz, Dirk Habich, Wolfgang Lehner, Matthias Werner, Stefan T.J. de Bruijn	Thus, we propose a novel adaptable and on-the-fly hardware error detection approach called AHEAD for database systems in this paper.
110	Special Session: A Technical Research Agenda in Data Ethics and Responsible Data Management	Julia Stoyanovich, Bill Howe, HV Jagadish	Special Session: A Technical Research Agenda in Data Ethics and Responsible Data Management
111	Evaluating Interactive Data Systems: Workloads, Metrics, and Guidelines	Lilong Jiang, Protiva Rahman, Arnab Nandi	In this tutorial, we will describe unique characteristics of interactive workloads for a variety of user input devices and query interfaces.
112	Data Integration and Machine Learning: A Natural Synergy	Xin Luna Dong, Theodoros Rekatsinas	This tutorial focuses on three aspects of the synergistic relationship between data integration and machine learning: (1) we survey how state-of-the-art data integration solutions rely on machine learning-based approaches for accurate results and effective human-in-the-loop pipelines, (2) we review how end-to-end machine learning applications rely on data integration to identify accurate, clean, and relevant data for their analytics exercises, and (3) we discuss open research challenges and opportunities that span across data integration and machine learning.
113	Modern Recommender Systems: from Computing Matrices to Thinking with Neurons	Georgia Koutrika	The ultimate goal of the tutorial is to encourage the application of novel recommendation approaches to solve problems that go beyond user consumption and to further promote research in the intersection of recommender systems and databases.
114	Privacy at Scale: Local Differential Privacy in Practice	Graham Cormode, Somesh Jha, Tejas Kulkarni, Ninghui Li, Divesh Srivastava, Tianhao Wang	This tutorial aims to introduce the key technical underpinnings of these deployed systems, to survey current research that addresses related problems within the LDP model, and to identify relevant open problems and research directions for the community.
115	Algorithmic Aspects of Parallel Query Processing	Paris Koutris, Semih Salihoglu, Dan Suciu	Based on the MPC model, we study and analyze several algorithms for three core data processing tasks: multiway join queries, sorting and matrix multiplication.
116	Demonstration of VerdictDB, the Platform-Independent AQP System	Wen He, Yongjoo Park, Idris Hanafi, Jacob Yatvitskiy, Barzan Mozafari	We demonstrate VerdictDB, the first platform-independent approximate query processing (AQP) system.
117	Interactive Demonstration of Probabilistic Predicates	Yao Lu, Srikanth Kandula, Surajit Chaudhuri	We will demonstrate a prototype query processing engine that uses probabilistic predicates (PPs) to speed up machine learning inference jobs.
118	Trip Planning by an Integrated Search Paradigm	Sheng Wang, Mingzhao Li, Yipeng Zhang, Zhifeng Bao, David Alexander Tedjopurnomo, Xiaolin Qin	In this paper, we build a trip planning system called TISP, which enables users to interactively explore POIs and trajectories during incremental trip planning.
119	POIsam: a System for Efficient Selection of Large-scale Geospatial Data on Maps	Tao Guo, Mingzhao Li, Peishan Li, Zhifeng Bao, Gao Cong	In this demonstration we present POIsam, a visualization system supporting the following desirable features: representativeness, visibility constraint, zooming consistency, and panning consistency.
120	DITA: A Distributed In-Memory Trajectory Analytics System	Zeyuan Shang, Guoliang Li, Zhifeng Bao	In this paper, we demonstrate a distributed in-memory trajectory analytics system DITA to support large-scale trajectory data analytics.
121	ACID: A System for Computing Approximate Certain Query Answers over Incomplete Databases	Nicola Fiorentino, Sergio Greco, Cristian Molinaro, Irina Trubitsyna	Since their computation is a coNP-hard problem, recent research has focused on developing polynomial time approximation algorithms computing a sound (but possibly incomplete) set of certain answers.
122	A Demonstration of Sya: A Spatial Probabilistic Knowledge Base Construction System	Ibrahim Sabek, Mashaal Musleh, Mohamed F. Mokbel	Sya employs a simple spatial high-level language, a rule-based spatial SQL query engine, a spatially-indexed probabilistic graphical model, and an adapted spatial statistical inference technique to infer the factual scores of relations.
123	Interactive Visual Exploration of Spatio-Temporal Urban Data Sets using Urbane	Harish Doraiswamy, Eleni Tzirita Zacharatou, Fabio Miranda, Marcos Lage, Anastasia Ailamaki, Cláudio T. Silva, Juliana Freire	To address this limitation, we have recently proposed Raster Join, an approach that converts a spatial aggregation query into a set of drawing operations on a canvas and leverages the rendering pipeline of the graphics hardware (GPU).
124	DBLOC: Density Based Clustering over LOCation Based Services	Yeshwanth D. Gunasekaran, Md Farhadur Rahman, Sona Hasani, Nan Zhang, Gautam Das	Due to the query rate limit constraint (i.e., the maximum number of kNN queries a user/IP address can issue over a specific period of time), it is often impossible to access all the tuples in the backend database of an LBS.
125	Crowdsourcing Analytics With CrowdCur	Mohammadreza Esfandiari, Kavan Bharat Patel, Sihem Amer-Yahia, Senjuti Basu Roy	We propose to demonstrate CrowdCur, a system that allows platform administrators, requesters, and workers to conduct various analytics of interest.
126	Joins over UNION ALL Queries in Teradata®: Demonstration of Optimized Execution	Mohammed Al-Kateb, Paul Sinclair, Grace Au, Sanjay Nair, Mark Sirek, Lu Ma, Mohamed Y. Eltabakh	In this project, we demonstrate novel cost-based optimization techniques implemented in Teradata Database for join queries involving UNION ALL views and derived tables.
127	QAGView: Interactively Summarizing High-Valued Aggregate Query Answers	Yuhao Wen, Xiaodan Zhu, Sudeepa Roy, Jun Yang	We present QAGView (Quick AGgregate View), which provides a holistic overview of high-valued aggregate query answers to the user in the form of summaries (showing high-level properties that emerge from subsets of answers) with coverage guarantee (for a user-specified number of top-valued answers) that is both diverse (avoiding overlapping or similar summaries) and relevant (focusing on high-valued aggregate answers).
128	PISTIS: A Conflict of Interest Declaration and Detection System for Peer Review Management	Siyuan Wu, Leong Hou U., Sourav S. Bhowmick, Wolfgang Gatterbauer	To address this problem, we demonstrate a novel interactive system called PISTIS that assists the declaration process in a semi-automatic manner.
129	Energy-Utility Function-Based Resource Control for In-Memory Database Systems LIVE	Thomas Kissinger, Marcus Hähnel, Till Smejkal, Dirk Habich, Hermann Härtig, Wolfgang Lehner	In this demo, we present energy-utility functions as an approach for enabling the operating system to improve the energy efficiency of scalable in-memory database systems.
130	iQCAR: A Demonstration of an Inter-Query Contention Analyzer for Cluster Computing Frameworks	Prajakta Kalmegh, Harrison Lundberg, Frederick Xu, Shivnath Babu, Sudeepa Roy	iQCAR (inter-Query Contention AnalyzeR) is a system that formally models these interferences between concurrent queries and provides a framework to attribute blame for contentions.
131	IoT-Detective: Analyzing IoT Data Under Differential Privacy	Sameera Ghayyur, Yan Chen, Roberto Yus, Ashwin Machanavajjhala, Michael Hay, Gerome Miklau, Sharad Mehrotra	IoT-Detective: Analyzing IoT Data Under Differential Privacy
132	GExp: Cost-aware Graph Exploration with Keywords	Mohammad Hossein Namaki, Yinghui Wu, Xin Zhang	We demonstrate GExp, an interactive graph exploration tool that uses keywords to support effective access and exploration of large graphs.
133	DeepEye: Creating Good Data Visualizations by Keyword Search	Yuyu Luo, Xuedi Qin, Nan Tang, Guoliang Li, Xinran Wang	DeepEye: Creating Good Data Visualizations by Keyword Search
134	Cohort Analysis with Ease	Zhongle Xie, Qingchao Cai, Fei He, Gene Yan Ooi, Weilong Huang, Beng Chin Ooi	To make COHANA easy to use, we present a comprehensive and powerful tool in this demo, covering the major use cases in cohort analysis with intuitive and accessible operations.
135	Qetch: Time Series Querying with Expressive Sketches	Miro Mannino, Azza Abouzied	By studying how humans sketch time series patterns we develop a matching algorithm that accounts for human sketching errors.
136	SQuID: Semantic Similarity-Aware Query Intent Discovery	Anna Fariha, Sheikh Muhammad Sarwar, Alexandra Meliou	In this demo, we present SQuID, a system for Semantic similarity-aware Query Intent Discovery.
137	An Ontology based Dialog Interface to Database	Ashish Mittal, Jaydeep Sen, Diptikalyan Saha, Karthik Sankaranarayanan	In this paper, we extend the state-of-the-art NLIDB system and present a dialog interface to relational databases.
138	IMPROVE-QA: An Interactive Mechanism for RDF Question/Answering Systems	Xinbo Zhang, Lei Zou	In this demo, we design an Interactive Mechanism aiming for PROmotion Via feedback to Q/A systems (IMPROVE-QA), a whole platform to make existing Q/A systems return more precise answers (denoted as Q′(D)) to users.
139	VALMOD: A Suite for Easy and Exact Detection of Variable Length Motifs in Data Series	Michele Linardi, Yan Zhu, Themis Palpanas, Eamonn Keogh	We demonstrate VALMOD, our scalable motif discovery algorithm that efficiently finds all motifs in a given range of lengths, and outputs a length-invariant ranking of motifs.
140	EPUI: Experimental Platform for Urban Informatics	Xiaoyu Ge, Panos K. Chrysanthis, Konstantinos Pelechrinis, Demetrios Zeinalipour-Yazti	In this paper, we present a prototype system, namely, EPUI (an Experimental Platform for Urban Informatics), which provides a testbed for exploring and evaluating venue and route recommendation solutions that balance different objectives (i.e., demands), including newly discovered ones.
141	DBPal: A Learned NL-Interface for Databases	Fuat Basik, Benjamin Hättasch, Amir Ilkhechi, Arif Usta, Shekar Ramaswamy, Prasetya Utama, Nathaniel Weir, Carsten Binnig, Ugur Cetintemel	In this demo, we present DBPal, a novel data exploration tool with a natural language interface.
142	DataDiff: User-Interpretable Data Transformation Summaries for Collaborative Data Analysis	Gunce Su Yilmaz, Tana Wattanawaroon, Liqi Xu, Abhishek Nigam, Aaron J. Elmore, Aditya Parameswaran	We demonstrate DataDiff, a practical and concise data-diff tool that provides human-interpretable explanations of changes between datasets without reliance on the operations that led to the changes.
143	A Nutritional Label for Rankings	Ke Yang, Julia Stoyanovich, Abolfazl Asudeh, Bill Howe, HV Jagadish, Gerome Miklau	In this demonstration we present Ranking Facts, a Web-based application that generates a "nutritional label" for rankings.
144	Precision Interfaces for Different Modalities	Haoci Zhang, Viraj Raj, Thibault Sellam, Eugene Wu	To address this problem, we present Precision Interfaces, a semi-automatic system to generate task-specific data analytics interfaces.
145	Demonstration of Smoke: A Deep Breath of Data-Intensive Lineage Applications	Fotis Psallidas, Eugene Wu	Recently, we introduced a set of implementation design principles and associated techniques to optimize lineage-enabled database engines and realized them in our prototype database engine, namely, Smoke.
146	Transform-Data-by-Example (TDE): Extensible Data Transformation in Excel	Yeye He, Kris Ganjam, Kukjin Lee, Yue Wang, Vivek Narasayya, Surajit Chaudhuri, Xu Chu, Yudian Zheng	We thus develop an extensible data transformation system called Transform-Data-by-Example (TDE) that can leverage rich transformation logic in source code, DLLs, web services and mapping tables, so that end-users only need to provide a few (typically 3) input/output examples, and TDE can synthesize desired programs using relevant transformation logic from these sources.
147	GRFusion: Graphs as First-Class Citizens in Main-Memory Relational Database Systems	Mohamed S. Hassan, Tatiana Kuznetsova, Hyun Chai Jeong, Walid G. Aref, Mohammad Sadoghi	In this demonstration, we present GRFusion, an in-memory relational database system, where graphs are managed as first-class citizens.
148	DataProf: Semantic Profiling for Iterative Data Cleansing and Business Rule Acquisition	Ziheng Wei, Sebastian Link	DataProf: Semantic Profiling for Iterative Data Cleansing and Business Rule Acquisition
149	GeoFlux: Hands-Off Data Integration Leveraging Join Key Knowledge	Jie Song, Danai Koutra, Murali Mani, H. V. Jagadish	To address this barrier, we study how much we can do with no user guidance.
150	Deeper: A Data Enrichment System Powered by Deep Web	Pei Wang, Yongjun He, Ryan Shea, Jiannan Wang, Eugene Wu	In this work, we explore a more targeted alternative that uses resources (in terms of web API calls) proportional to the size of the local database of interest.
151	Tighter Upper Bounds for Join Cardinality Estimates	Walter Cai	Tighter Upper Bounds for Join Cardinality Estimates
152	Approximate Triangle Count and Clustering Coefficient	Siddharth Bhatia	In this paper, we present methods to approximate these metrics for graphs.
153	(Artificial) Mind over Matter: Integrating Humans and Algorithms in Solving Matching Problems	Roee Shraga	(Artificial) Mind over Matter: Integrating Humans and Algorithms in Solving Matching Problems
154	FREDDY: Fast Word Embeddings in Database Systems	Michael Günther	FREDDY: Fast Word Embeddings in Database Systems
155	RDSQ: Reliable Queue Protocol over Shared Logs	Haolin Yu	RDSQ: Reliable Queue Protocol over Shared Logs
156	Efficient Exploration of Linked Data	Oren Kalinsky	Towards that end, we develop ELinda, an explorer for linked data.
157	SSD as SQLite Engine	Soyee Choi	SSD as SQLite Engine
158	Worst Case Optimal Joins on Relational and XML data	Yuxing Chen	To overcome this limitation, we propose a multi-model processing framework for relational and semi-structured data (i.e., XML), and design a worst-case optimal join algorithm.
159	MonetDBLite: An Embedded Analytical Database	Mark Raasveldt	MonetDBLite: An Embedded Analytical Database
160	Splaying Log-Structured Merge-Trees	Thomas Lively, Luca Schroeder, Carlos Mendizábal	To address this shortcoming, we propose and analyze a simple decision scheme that can be added to any LSM-based key-value store and dramatically reduce the number of disk I/Os for these classes of workloads.
161	Incremental View Maintenance for Property Graph Queries	Gábor Szárnyas	Due to the novelty of the field, graph databases and frameworks typically provide their own query language, such as Cypher for Neo4j, Gremlin for TinkerPop and GraphScript for SAP HANA.