Paper Digest: SIGMOD 2018 Highlights
The ACM SIGMOD conference, organized by the ACM Special Interest Group on Management of Data (SIGMOD), is one of the top conferences on database management systems and data management technology.
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.
If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.
Paper Digest Team
team@paperdigest.org
TABLE 1: SIGMOD 2018 Papers
No. | Title | Authors | Highlight |
---|---|---|---|
1 | Kubernetes and the New Cloud | Eric Brewer | Kubernetes and the New Cloud |
2 | Robust Entity Resolution using Random Graphs | Sainyam Galhotra, Donatella Firmani, Barna Saha, Divesh Srivastava | Our contribution is a general error correction tool that can be leveraged by a variety of hybrid human-machine ER algorithms, based on a formal way for selecting indirect "control queries". |
3 | Deep Learning for Entity Matching: A Design Space Exploration | Sidharth Mudgal, Han Li, Theodoros Rekatsinas, AnHai Doan, Youngchoon Park, Ganesh Krishnan, Rohit Deep, Esteban Arcaute, Vijay Raghavendra | In this paper we examine applying deep learning (DL) to EM, to understand DL’s benefits and limitations. |
4 | Synthesizing Type-Detection Logic for Rich Semantic Data Types using Open-source Code | Cong Yan, Yeye He | We developed AUTOTYPE from open-source repositories like GitHub. |
5 | Fine-grained Concept Linking using Neural Networks in Healthcare | Jian Dai, Meihui Zhang, Gang Chen, Ju Fan, Kee Yuan Ngiam, Beng Chin Ooi | To address these challenges, we propose a Neural Concept Linking (NCL) approach for accurate concept linking using systematically integrated neural networks. |
6 | Big Data Linkage for Product Specification Pages | Disheng Qiu, Luciano Barbosa, Valter Crescenzi, Paolo Merialdo, Divesh Srivastava | In this paper, we take advantage of the opportunity that sources publish product identifiers to perform big data linkage across sources at the beginning of the data integration pipeline, before schema alignment. |
7 | The Data Interaction Game | Ben McCamish, Vahid Ghadakchi, Arash Termehchy, Behrouz Touri, Liang Huang | We propose a reinforcement learning method that learns and answers the information needs behind queries and adapts to the changes in users’ strategies and prove that it improves the effectiveness of answering queries stochastically speaking. |
8 | Data Citation: Giving Credit Where Credit is Due | Yinjun Wu, Abdussalam Alawini, Susan B. Davidson, Gianmaria Silvello | We present three approaches to implementing citation views and describe alternative policies for the joint, alternate and aggregated use of citation views. |
9 | EKTELO: A Framework for Defining Differentially-Private Computations | Dan Zhang, Ryan McKenna, Ios Kotsogiannis, Michael Hay, Ashwin Machanavajjhala, Gerome Miklau | We propose a novel programming framework and system, Ektelo, for implementing both existing and new privacy algorithms. |
10 | Marginal Release Under Local Differential Privacy | Graham Cormode, Tejas Kulkarni, Divesh Srivastava | In this paper, we provide a set of algorithms for materializing marginal statistics under the strong model of local differential privacy. |
11 | When Query Authentication Meets Fine-Grained Access Control: A Zero-Knowledge Approach | Cheng Xu, Jianliang Xu, Haibo Hu, Man Ho Au | In this paper, we take the first step toward studying the problem of authenticating relational queries with fine-grained access control. |
12 | Practical and Secure Substring Search | Florian Hahn, Nicolas Loza, Florian Kerschbaum | In this paper we address the problem of outsourcing sensitive strings while still providing the functionality of substring searches. |
13 | Columnstore and B+ tree – Are Hybrid Physical Designs Important? | Adam Dziedzic, Jingjing Wang, Sudipto Das, Bolin Ding, Vivek R. Narasayya, Manoj Syamala | We extend the Database Engine Tuning Advisor for Microsoft SQL Server to recommend a suitable combination of B+ tree and columnstore indexes for a given workload. |
14 | Computation Reuse in Analytics Job Service at Microsoft | Alekh Jindal, Shi Qiao, Hiren Patel, Zhicheng Yin, Jieming Di, Malay Bag, Marc Friedman, Yifung Lin, Konstantinos Karanasos, Sriram Rao | In this paper, we describe a computation reuse framework, coined CLOUDVIEWS, which we built to address the computation overlap problem in Microsoft’s SCOPE job service. |
15 | P-Store: An Elastic Database System with Predictive Provisioning | Rebecca Taft, Nosayba El-Sayed, Marco Serafini, Yu Lu, Ashraf Aboulnaga, Michael Stonebraker, Ricardo Mayerhofer, Francisco Andrade | We present P-Store, the first elastic OLTP DBMS to use prediction, and apply it to the workload of B2W Digital (B2W), a large online retailer. |
16 | Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources | Edmon Begoli, Jesús Camacho-Rodríguez, Julian Hyde, Michael J. Mior, Daniel Lemire | The goal of this paper is to formally introduce Calcite to the broader research community, briefly present its history, and describe its architecture, features, functionality, and patterns for adoption. |
17 | Carousel: Low-Latency Transaction Processing for Globally-Distributed Data | Xinan Yan, Linguan Yang, Hongbo Zhang, Xiayue Charles Lin, Bernard Wong, Kenneth Salem, Tim Brecht | This paper introduces Carousel, a distributed database system that provides low-latency transaction processing for multi-partition globally-distributed transactions. |
18 | Accelerating Analytical Processing in MVCC using Fine-Granular High-Frequency Virtual Snapshotting | Ankur Sharma, Felix Martin Schuhknecht, Jens Dittrich | Inside both execution engines, we still apply MVCC. |
19 | Reactors: A Case for Predictable, Virtualized Actor Database Systems | Vivek Shah, Marcos Antonio Vaz Salles | In this paper, we propose a relational actor programming model for in-memory databases as a novel, holistic approach towards fulfilling these challenging requirements. |
20 | FASTER: A Concurrent Key-Value Store with In-Place Updates | Badrish Chandramouli, Guna Prasaad, Donald Kossmann, Justin Levandoski, James Hunter, Mike Barnett | This paper presents FASTER, a new key-value store for point read, blind update, and read-modify-write operations. |
21 | Workload-Aware CPU Performance Scaling for Transactional Database Systems | Mustafa Korkmaz, Martin Karsten, Kenneth Salem, Semih Salihoglu | In this paper, we show that transactional database systems can manage DVFS more effectively than the underlying operating system. |
22 | How to Architect a Query Compiler, Revisited | Ruby Y. Tahboub, Grégory M. Essertel, Tiark Rompf | We aim to contribute to this discussion by drawing attention to an old but underappreciated idea known as Futamura projections, which fundamentally link interpreters and compilers. |
23 | SuRF: Practical Range Query Filtering with Fast Succinct Tries | Huanchen Zhang, Hyeontaek Lim, Viktor Leis, David G. Andersen, Michael Kaminsky, Kimberly Keeton, Andrew Pavlo | We present the Succinct Range Filter (SuRF), a fast and compact data structure for approximate membership tests. |
24 | FastQRE: Fast Query Reverse Engineering | Dmitri V. Kalashnikov, Laks V.S. Lakshmanan, Divesh Srivastava | In this work we propose a novel approach for solving the QRE problem efficiently. |
25 | Adaptive Energy-Control for In-Memory Database Systems | Thomas Kissinger, Dirk Habich, Wolfgang Lehner | Thus, we propose the Energy-Control Loop (ECL) as a DBMS-integrated approach for adaptive energy-control on scale-up in-memory database systems that obeys a query latency limit as a soft constraint and actively optimizes energy efficiency and performance of the DBMS. |
26 | Incremental View Maintenance with Triple Lock Factorization Benefits | Milos Nikolic, Dan Olteanu | We introduce F-IVM, a unified incremental view maintenance (IVM) approach for a variety of tasks, including gradient computation for learning linear regression models over joins, matrix chain multiplication, and factorized evaluation of conjunctive queries. |
27 | Catching Numeric Inconsistencies in Graphs | Wenfei Fan, Xueli Liu, Ping Lu, Chao Tian | To catch such errors, we propose to extend graph functional dependencies with linear arithmetic expressions and comparison predicates, referred to as NGDs. |
28 | TurboGraph++: A Scalable and Fast Graph Analytics System | Seongyun Ko, Wook-Shin Han | We present TurboGraph++, a scalable and fast graph analytics system which efficiently processes large graphs by exploiting external memory for scale-up without compromising efficiency. |
29 | TurboFlux: A Fast Continuous Subgraph Matching System for Streaming Graph Data | Kyoungmin Kim, In Seo, Wook-Shin Han, Jeong-Hoon Lee, Sungpack Hong, Hassan Chafi, Hyungyu Shin, Geonhwa Jeong | We present a fast continuous subgraph matching system called TurboFlux which provides high throughput over a fast graph update stream. |
30 | Discovering Graph Functional Dependencies | Wenfei Fan, Chunming Hu, Xueli Liu, Ping Lu | We introduce notions of reduced GFDs and their topological support, and formalize the discovery problem for GFDs. |
31 | TopPPR: Top-k Personalized PageRank Queries with Precision Guarantees on Large Graphs | Zhewei Wei, Xiaodong He, Xiaokui Xiao, Sibo Wang, Shuo Shang, Ji-Rong Wen | To address the deficiencies of existing solutions, we propose TopPPR, an algorithm for top-k PPR queries that ensures at least ρ precision (i.e., at least ρ fraction of the actual top-k results are returned) with at least 1 – 1/n probability, where ρ ∈ (0, 1] is a user-specified parameter and n is the number of nodes in G. |
32 | Skyline Community Search in Multi-valued Networks | Rong-Hua Li, Lu Qin, Fanghua Ye, Jeffrey Xu Yu, Xiaokui Xiao, Nong Xiao, Zibin Zheng | To capture d numerical attributes, we propose a novel community model, called skyline community, based on the concepts of k-core and skyline. |
33 | Building a Bw-Tree Takes More Than Just Buzz Words | Ziqi Wang, Andrew Pavlo, Hyeontaek Lim, Viktor Leis, Huanchen Zhang, Michael Kaminsky, David G. Andersen | In 2013, Microsoft Research proposed the Bw-Tree (humorously termed the "Buzz Word Tree"), a lock-free index that provides high throughput for transactional database workloads in SQL Server’s Hekaton engine. |
34 | The Case for Learned Index Structures | Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, Neoklis Polyzotis | In this exploratory research paper, we start from this premise and posit that all existing index structures can be replaced with other types of models, including deep-learning models, which we term learned indexes. |
35 | Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging | Niv Dayan, Stratos Idreos | In this paper, we show that all mainstream LSM-tree based key-value stores in the literature and in industry are suboptimal with respect to how they trade off among the I/O costs of updates, point lookups, range lookups, as well as the cost of storage, measured as space-amplification. |
36 | HOT: A Height Optimized Trie Index for Main-Memory Database Systems | Robert Binna, Eva Zangerle, Martin Pichl, Günther Specht, Viktor Leis | We present the Height Optimized Trie (HOT), a fast and space-efficient in-memory index structure. |
37 | The Data Calculator: Data Structure Design and Cost Synthesis from First Principles and Learned Cost Models | Stratos Idreos, Kostas Zoumpatianos, Brian Hentschel, Michael S. Kester, Demi Guo | We present a design engine, the Data Calculator, which enables interactive and semi-automated design of data structures. |
38 | A Comparative Study of Secondary Indexing Techniques in LSM-based NoSQL Databases | Mohiuddin Abdul Qader, Shiwen Cheng, Vagelis Hristidis | In this paper, we present a taxonomy of NoSQL secondary indexes, broadly split into two classes: Embedded Indexes (i.e. lightweight filters embedded inside the primary table) and Stand-Alone Indexes (i.e. separate data structures). |
39 | Efficient Selection of Geospatial Data on Maps for Interactive and Visualized Exploration | Tao Guo, Kaiyu Feng, Gao Cong, Zhifeng Bao | We propose that such systems should support the following desirable features: representativeness, visibility constraint, zooming consistency, and panning consistency. |
40 | Pinot: Realtime OLAP for 530 Million Users | Jean-François Im, Kishore Gopalakrishna, Subbu Subramaniam, Mayank Shrivastava, Adwait Tumbde, Xiaotian Jiang, Jennifer Dai, Seunghyun Lee, Neha Pawar, Jialiang Li, Ravi Aringunram | We present Pinot, a single system used in production at LinkedIn that can serve tens of thousands of analytical queries per second, offers near-realtime data ingestion from streaming data sources, and handles the operational requirements of large web properties. |
41 | Robust, Scalable, Real-Time Event Time Series Aggregation at Twitter | Peilin Yang, Srikanth Thiagarajan, Jimmy Lin | In this paper, we describe TSAR (TimeSeries AggregatoR), a robust, scalable, real-time event time series aggregation framework built primarily for engagement monitoring: aggregating interactions with Tweets, segmented along a multitude of dimensions such as device, engagement type, etc. |
42 | Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark | Michael Armbrust, Tathagata Das, Joseph Torres, Burak Yavuz, Shixiong Zhu, Reynold Xin, Ali Ghodsi, Ion Stoica, Matei Zaharia | We describe the system’s design and use cases from several hundred production deployments on Databricks, the largest of which process over 1 PB of data per month. |
43 | TcpRT: Instrument and Diagnostic Analysis System for Service Quality of Cloud Databases at Massive Scale in Real-time | Wei Cao, Yusong Gao, Bingchen Lin, Xiaojie Feng, Yu Xie, Xiao Lou, Peng Wang | This paper presents TcpRT, the instrument and diagnosis infrastructure in Alibaba Cloud RDS that achieves real-time anomaly detection. |
44 | Machine Learning for Data Management: Problems and Solutions | Pedro Domingos | Learning structure (in the case of Markov logic, a set of formulas in first-order logic) is intractable, as in more traditional representations, but can be done effectively using inductive logic programming techniques. |
45 | Query-based Workload Forecasting for Self-Driving Database Management Systems | Lin Ma, Dana Van Aken, Ahmed Hefny, Gustavo Mezerhane, Andrew Pavlo, Geoffrey J. Gordon | We present a robust forecasting framework called QueryBot 5000 that allows a DBMS to predict the expected arrival rate of queries in the future based on historical data. |
46 | RushMon: Real-time Isolation Anomalies Monitoring | Zechao Shang, Jeffrey Xu Yu, Aaron J. Elmore | In this paper, we tackle this problem. |
47 | On the Calculation of Optimality Ranges for Relational Query Execution Plans | Florian Wolf, Norman May, Paul R. Willems, Kai-Uwe Sattler | In this paper we analyze the deviation from the estimate, and denote the cardinality range of an intermediate result, where the optimal plan remains optimal as the optimality range. |
48 | Adaptive Optimization of Very Large Join Queries | Thomas Neumann, Bernhard Radke | This paper introduces an adaptive optimization framework that is able to solve most common join queries exactly, while simultaneously scaling to queries with thousands of joins. |
49 | Improving Join Reorderability with Compensation Operators | TaiNing Wang, Chee-Yong Chan | In this paper, we present a novel approach for this problem for the class of queries involving inner-joins, single-sided outerjoins, and/or antijoins. |
50 | When Hierarchy Meets 2-Hop-Labeling: Efficient Shortest Distance Queries on Road Networks | Dian Ouyang, Lu Qin, Lijun Chang, Xuemin Lin, Ying Zhang, Qing Zhu | To overcome the drawbacks of both solutions, in this paper, we propose a novel hierarchical 2-hop index (H2H-Index) which assigns a label for each vertex and at the same time preserves a hierarchy among all vertices. |
51 | DITA: Distributed In-Memory Trajectory Analytics | Zeyuan Shang, Guoliang Li, Zhifeng Bao | We propose an effective partitioning method, global index and local index, to address the data locality problem. |
52 | Cold Filter: A Meta-Framework for Faster and More Accurate Stream Processing | Yang Zhou, Tong Yang, Jie Jiang, Bin Cui, Minlan Yu, Xiaoming Li, Steve Uhlig | To enhance these algorithms, we propose a meta-framework, called Cold Filter (CF), that enables faster and more accurate stream processing. |
53 | Sketching Linear Classifiers over Data Streams | Kai Sheng Tai, Vatsal Sharan, Peter Bailis, Gregory Valiant | We introduce a new sub-linear space sketch—the Weight-Median Sketch—for learning compressed linear classifiers over data streams while supporting the efficient recovery of large-magnitude weights in the model. |
54 | Dynamic Pricing in Spatial Crowdsourcing: A Matching-Based Approach | Yongxin Tong, Libin Wang, Zimu Zhou, Lei Chen, Bowen Du, Jieping Ye | In the paper, we formally define this Global Dynamic Pricing (GDP) problem in spatial crowdsourcing. |
55 | Amazon Aurora: On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes | Alexandre Verbitski, Anurag Gupta, Debanjan Saha, James Corey, Kamal Gupta, Murali Brahmadesam, Raman Mittal, Sailesh Krishnamurthy, Sandor Maurice, Tengiz Kharatishvilli, Xiaofeng Bao | In this paper, we describe how Aurora avoids distributed consensus under most circumstances by establishing invariants and leveraging local transient state. |
56 | Eon Mode: Bringing the Vertica Columnar Database to the Cloud | Ben Vandiver, Shreya Prasad, Pratibha Rana, Eden Zik, Amin Saeidi, Pratyush Parimal, Styliani Pantela, Jaimin Dave | Eon Mode: Bringing the Vertica Columnar Database to the Cloud |
57 | Survivability of Cloud Databases – Factors and Prediction | Jose Picado, Willis Lang, Edward C. Thayer | Given the large and diverse relational database population that Azure SQL DB has, we present a large-scale survivability study of our service and identify some factors that can demonstrably help predict the lifespan of cloud databases. |
58 | Rapid Adoption of Cloud Data Warehouse Technology Using Datometry Hyper-Q | Lyublena Antova, Derrick Bryant, Tuan Cao, Michael Duller, Mohamed A. Soliman, Florian M. Waas | In this paper, we present a next-generation virtualization technology that lets existing applications run natively on cloud-based database systems. |
59 | Lightweight Cardinality Estimation in LSM-based Systems | Ildar Absalyamov, Michael J. Carey, Vassilis J. Tsotras | In this paper we address the problem of computing data statistics for workloads with rapid data ingestion and propose a lightweight statistics-collection framework that exploits the properties of LSM storage. |
60 | Column Sketches: A Scan Accelerator for Rapid and Robust Predicate Evaluation | Brian Hentschel, Michael S. Kester, Stratos Idreos | We present a new class of indexing scheme, termed a Column Sketch, which improves the performance of predicate evaluation independently of workload properties. |
61 | ZigZag: Supporting Similarity Queries on Vector Space Models | Wenhai Li, Lingfeng Deng, Yang Li, Chen Li | In this paper we study the problem of supporting similarity queries on a large number of records using a vector space model, where each record is a bag of tokens. |
62 | Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search | Yiqiu Wang, Anshumali Shrivastava, Jonathan Wang, Junghee Ryu | We present FLASH (Fast LSH Algorithm for Similarity search accelerated with HPC), a similarity search system for ultra-high dimensional datasets on a single machine, that does not require similarity computations and is tailored for high-performance computing platforms. |
63 | Overlap Set Similarity Joins with Theoretical Guarantees | Dong Deng, Yufei Tao, Guoliang Li | As the size boundary between the small sets and the large sets is crucial to the efficiency, we propose an effective size boundary selection algorithm to judiciously choose an appropriate size boundary, which works very well in practice. |
64 | BOOMER: Blending Visual Formulation and Processing of P-Homomorphic Queries on Large Networks | Yinglong Song, Huey Eng Chua, Sourav S. Bhowmick, Byron Choi, Shuigeng Zhou | In this paper, we explore how this vision can be realized for more generic but complex 1-1 p-homomorphic (p-hom) queries introduced by Fan et al. |
65 | Navigating the Data Lake with DATAMARAN: Automatically Extracting Structure from Log Datasets | Yihan Gao, Silu Huang, Aditya Parameswaran | We present DATAMARAN, a tool that extracts structure from semi-structured log datasets with no human supervision. |
66 | Efficient k-Regret Query Algorithm with Restriction-free Bound for any Dimensionality | Min Xie, Raymond Chi-Wing Wong, Jian Li, Cheng Long, Ashwin Lall | In this paper, we present the lower bound of the maximum regret ratio for the k-regret query. |
67 | A Rating-Ranking Method for Crowdsourced Top-k Computation | Kaiyu Li, Xiaohang Zhang, Guoliang Li | To address this problem, we propose a rating-ranking-based approach, which contains two types of questions to ask the crowd. |
68 | Online Processing Algorithms for Influence Maximization | Jing Tang, Xueyan Tang, Xiaokui Xiao, Junsong Yuan | Motivated by this, we propose a new algorithm for OPIM with both superior empirical effectiveness and strong theoretical guarantees, and we show that it can also be extended to handle conventional influence maximization. |
69 | Top-k Sorting Under Partial Order Information | Eyal Dushkin, Tova Milo | In light of this, we present a dedicated algorithm for top-k sorting that aims to minimize the number of comparisons by thoroughly leveraging the partial order information. |
70 | Bias in OLAP Queries: Detection, Explanation, and Removal | Babak Salimi, Johannes Gehrke, Dan Suciu | In this paper, we propose a system to detect, explain, and resolve bias in decision-support queries. |
71 | Persistent Bloom Filter: Membership Testing for the Entire History | Yanqing Peng, Jinwei Guo, Feifei Li, Weining Qian, Aoying Zhou | It is possible to support such "temporal membership testing" using a BF, but we will show that this is fairly expensive. |
72 | Matrix Profile X: VALMOD – Scalable Discovery of Variable-Length Motifs in Data Series | Michele Linardi, Yan Zhu, Themis Palpanas, Eamonn Keogh | In this work, we introduce VALMOD, an exact and scalable motif discovery algorithm that efficiently finds all motifs in a given range of lengths. |
73 | Fast Euclidean OPTICS with Bounded Precision in Low Dimensional Space | Junhao Gan, Yufei Tao | As a side product, our algorithm gives an index structure that occupies linear space, and supports the cluster group-by query with near-optimal cost. |
74 | The Cascading Analysts Algorithm | Matthias Ruhl, Mukund Sundararajan, Qiqi Yan | Given a change in such a metric, our goal is to identify a small set of non-overlapping data segments that account for a majority of the change. |
75 | Finding Seeds and Relevant Tags Jointly: For Targeted Influence Maximization in Social Networks | Xiangyu Ke, Arijit Khan, Gao Cong | In this work, we model such practical constraints, and investigate the novel problem of jointly finding the top-k seed nodes and the top-r relevant tags that maximize the influence inside a target set of users. |
76 | Efficient Algorithms for Finding Approximate Heavy Hitters in Personalized PageRanks | Sibo Wang, Yufei Tao | In this paper, we propose BLOG, an efficient framework for three types of heavy hitter queries: the pairwise approximate heavy hitter (AHH), the reverse AHH, and the multi-source reverse AHH queries. |
77 | Data Sketches for Disaggregated Subset Sum and Frequent Item Estimation | Daniel Ting | We introduce and study a new data sketch for processing massive datasets. |
78 | Adaptive Asynchronous Parallelization of Graph Algorithms | Wenfei Fan, Ping Lu, Xiaojian Luo, Jingbo Xu, Qiang Yin, Wenyuan Yu, Ruiqi Xu | This paper proposes an Adaptive Asynchronous Parallel (AAP) model for graph computations. |
79 | Meta-Dataflows: Efficient Exploratory Dataflow Jobs | Raul Castro Fernandez, William Culhane, Pijika Watcharapichat, Matthias Weidlich, Victoria Lopez Morales, Peter Pietzuch | We describe meta-dataflows (MDFs), a new model to effectively express exploratory workflows and efficiently execute them on compute clusters. |
80 | RP-DBSCAN: A Superfast Parallel DBSCAN Algorithm Based on Random Partitioning | Hwanjun Song, Jae-Gil Lee | To remedy these problems, we propose a cell-based data partitioning scheme, pseudo random partitioning, that randomly distributes small cells rather than the points themselves. |
81 | PlinyCompute: A Platform for High-Performance, Distributed, Data-Intensive Tool Development | Jia Zou, R. Matthew Barnett, Tania Lorido-Botran, Shangyu Luo, Carlos Monroy, Sourav Sikdar, Kia Teymourian, Binhang Yuan, Chris Jermaine | This paper describes PlinyCompute, a system for development of high-performance, data-intensive, distributed computing tools and libraries. |
82 | Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications | Maaz Bin Safeer Ahmad, Alvin Cheung | We present Casper, a new tool that automatically translates sequential Java programs into the MapReduce paradigm. |
83 | DPaxos: Managing Data Closer to Users for Low-Latency and Mobile Applications | Faisal Nawab, Divyakant Agrawal, Amr El Abbadi | In this paper, we propose Dynamic Paxos (DPaxos), a Paxos-based consensus protocol to manage access to partitioned data across globally-distributed datacenters and edge nodes. |
84 | Submodularity of Distributed Join Computation | Rundong Li, Mirek Riedewald, Xinyan Deng | We show that minimizing load variance subject to a constraint on expectation is a monotone submodular maximization problem with Knapsack constraints, hence admitting provably near-optimal greedy solutions. |
85 | NashDB: An End-to-End Economic Method for Elastic Database Fragmentation, Replication, and Provisioning | Ryan Marcus, Olga Papaemmanouil, Sofiya Semenova, Solomon Garber | This paper introduces NashDB, an adaptive data distribution framework that relies on an economic model to automatically balance the supply and demand of data fragments, replicas, and cluster nodes. |
86 | SketchML: Accelerating Distributed Machine Learning with Data Sketches | Jiawei Jiang, Fangcheng Fu, Tong Yang, Bin Cui | In this paper, we study the question: is there a compression method that can efficiently handle a sparse and nonuniform gradient consisting of key-value pairs? |
87 | MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis | Manasi Vartak, Joana M. F. da Trindade, Samuel Madden, Matei Zaharia | For intermediates that are stored in MISTIQUE, we propose a range of optimizations to reduce storage footprint including quantization, summarization, and data de-duplication. |
88 | Fonduer: Knowledge Base Construction from Richly Formatted Data | Sen Wu, Luke Hsiao, Xiao Cheng, Braden Hancock, Theodoros Rekatsinas, Philip Levis, Christopher Ré | We introduce Fonduer, a machine-learning-based KBC system for richly formatted data. |
89 | Maverick: Discovering Exceptional Facts from Knowledge Graphs | Gensheng Zhang, Damian Jimenez, Chengkai Li | We present Maverick, a general, extensible framework that discovers exceptional facts about entities in knowledge graphs. |
90 | A General and Efficient Querying Method for Learning to Hash | Jinfeng Li, Xiao Yan, Jian Zhang, An Xu, James Cheng, Jie Liu, Kelvin K. W. Ng, Ti-chung Cheng | We propose a new fine-grained similarity indicator, quantization distance (QD), which provides more information about the similarity between a query and the items in a bucket. |
91 | Subjective Knowledge Base Construction Powered By Crowdsourcing and Knowledge Base | Hao Xin, Rui Meng, Lei Chen | In our work, we propose a KBC framework for subjective knowledge base construction taking advantage of the knowledge from the crowd and existing KBs. |
92 | DimBoost: Boosting Gradient Boosting Decision Tree to Higher Dimensions | Jiawei Jiang, Bin Cui, Ce Zhang, Fangcheng Fu | In this paper, we ask "Can we build a scalable GBDT training system whose performance scales better with respect to dimensionality of the data?" |
93 | Auto-Detect: Data-Driven Error Detection in Tables | Zhipeng Huang, Yeye He | We propose Auto-Detect, a statistics-based technique that leverages co-occurrence statistics from large corpora for error detection, which is a significant departure from existing rule-based methods. |
94 | A Graph Database for a Virtualized Network Infrastructure | Pramod Jamkhedkar, Theodore Johnson, Yaron Kanza, Aman Shaikh, N. K. Shankaranarayanan, Vladislav Shkapenyuk | In this paper, we explore the database requirements for the management and troubleshooting of network services using VNF and SDN technologies. |
95 | RAPID: In-Memory Analytical Query Processing Engine with Extreme Performance per Watt | Cagri Balkesen, Nitin Kunal, Georgios Giannikis, Pit Fender, Seema Sundara, Felix Schmidt, Jarod Wen, Sandeep Agrawal, Arun Raghavan, Venkatanathan Varadarajan, Anand Viswanathan, Balakrishnan Chandrasekaran, Sam Idicula, Nipun Agarwal, Eric Sedlar | In this paper, we demonstrate through a carefully designed modern data processing system called RAPID and a simple, low-power processor specially tailored for data processing that at least an order of magnitude performance/power improvement in SQL processing can be achieved over a modern system running on today’s complex processors. |
96 | G-CORE: A Core for Future Graph Query Languages | Renzo Angles, Marcelo Arenas, Pablo Barcelo, Peter Boncz, George Fletcher, Claudio Gutierrez, Tobias Lindaaker, Marcus Paradies, Stefan Plantikow, Juan Sequeda, Oskar van Rest, Hannes Voigt | We report on a community effort between industry and academia to shape the future of graph query languages. |
97 | Cypher: An Evolving Query Language for Property Graphs | Nadime Francis, Alastair Green, Paolo Guagliardo, Leonid Libkin, Tobias Lindaaker, Victor Marsault, Stefan Plantikow, Mats Rydberg, Petra Selmer, Andrés Taylor | We compare the features of Cypher to other property graph query languages, and describe extensions, at an advanced stage of development, which will form part of Cypher 10, turning the language into a compositional language which supports graph projections and multiple named graphs. |
98 | BIPie: Fast Selection and Aggregation on Encoded Data using Operator Specialization | Michal Nowakiewicz, Eric Boutin, Eric Hanson, Robert Walzer, Akash Katipally | In this paper, we demonstrate that there is tremendous room for improvement in the processing of analytical queries on modern commodity hardware. |
99 | VerdictDB: Universalizing Approximate Query Processing | Yongjoo Park, Barzan Mozafari, Joseph Sorenson, Junhao Wang | Therefore, we argue that a universal solution is needed: a database-agnostic approximation engine that will widen the reach of this emerging technology across various platforms. |
100 | AQP++: Connecting Approximate Query Processing With Aggregate Precomputation for Interactive Analytics | Jinglin Peng, Dongxiang Zhang, Jiannan Wang, Jian Pei | In this paper, we argue for the need to connect these two separate ideas for interactive analytics. |
101 | Accelerating Machine Learning Inference with Probabilistic Predicates | Yao Lu, Aakanksha Chowdhery, Srikanth Kandula, Surajit Chaudhuri | In this work, we demonstrate constructing and applying probabilistic predicates to filter data blobs that do not satisfy the query predicate; such filtering is parametrized to different target accuracies. |
102 | A Query Engine for Probabilistic Preferences | Uzi Cohen, Batya Kenig, Haoyue Ping, Benny Kimelfeld, Julia Stoyanovich | In this paper, we embark on the challenge of a practical realization of this framework. |
103 | Random Sampling over Joins Revisited | Zhuoyue Zhao, Robert Christensen, Feifei Li, Xiao Hu, Ke Yi | We analyze the properties of different instantiations and evaluate them against the baseline methods; the results clearly demonstrate the superiority of our new techniques. |
104 | Managing Non-Volatile Memory in Database Systems | Alexander van Renen, Viktor Leis, Alfons Kemper, Thomas Neumann, Takushi Hashida, Kazuichi Oe, Yoshiyasu Doi, Lilian Harada, Mitsuru Sato | In this work, we evaluate these two approaches and compare them with in-memory databases as well as more traditional buffer managers that use main memory as a cache in front of SSDs. |
105 | Efficient Top-K Query Processing on Massively Parallel Hardware | Anil Shanbhag, Holger Pirk, Samuel Madden | In this work, we present several top-k algorithms for GPUs, including a new algorithm based on bitonic sort called bitonic top-k. |
106 | Distributed Lock Management with RDMA: Decentralization without Starvation | Dong Young Yoon, Mosharaf Chowdhury, Barzan Mozafari | In this paper, we show that it is possible for a lock manager to be fully decentralized and yet exchange the partial knowledge necessary for preventing starvation and thereby reducing tail latencies. |
107 | Speeding Up Set Intersections in Graph Algorithms using SIMD Instructions | Shuo Han, Lei Zou, Jeffrey Xu Yu | In this paper, we focus on accelerating a widely employed computing pattern, set intersection, to boost a group of graph algorithms. |
108 | Pipelined Query Processing in Coprocessor Environments | Henning Funke, Sebastian Breß, Stefan Noll, Volker Markl, Jens Teubner | In this paper, we show how query compilation and GPU-style parallelism can be made to play in unison nevertheless. |
109 | AHEAD: Adaptable Data Hardening for On-the-Fly Hardware Error Detection during Database Query Processing | Till Kolditz, Dirk Habich, Wolfgang Lehner, Matthias Werner, Stefan T.J. de Bruijn | Thus, we propose a novel adaptable and on-the-fly hardware error detection approach called AHEAD for database systems in this paper. |
110 | Special Session: A Technical Research Agenda in Data Ethics and Responsible Data Management | Julia Stoyanovich, Bill Howe, HV Jagadish | Special Session: A Technical Research Agenda in Data Ethics and Responsible Data Management |
111 | Evaluating Interactive Data Systems: Workloads, Metrics, and Guidelines | Lilong Jiang, Protiva Rahman, Arnab Nandi | In this tutorial, we will describe unique characteristics of interactive workloads for a variety of user input devices and query interfaces. |
112 | Data Integration and Machine Learning: A Natural Synergy | Xin Luna Dong, Theodoros Rekatsinas | This tutorial focuses on three aspects of the synergistic relationship between data integration and machine learning: (1) we survey how state-of-the-art data integration solutions rely on machine learning-based approaches for accurate results and effective human-in-the-loop pipelines, (2) we review how end-to-end machine learning applications rely on data integration to identify accurate, clean, and relevant data for their analytics exercises, and (3) we discuss open research challenges and opportunities that span across data integration and machine learning. |
113 | Modern Recommender Systems: from Computing Matrices to Thinking with Neurons | Georgia Koutrika | The ultimate goal of the tutorial is to encourage the application of novel recommendation approaches to solve problems that go beyond user consumption and to further promote research in the intersection of recommender systems and databases. |
114 | Privacy at Scale: Local Differential Privacy in Practice | Graham Cormode, Somesh Jha, Tejas Kulkarni, Ninghui Li, Divesh Srivastava, Tianhao Wang | This tutorial aims to introduce the key technical underpinnings of these deployed systems, to survey current research that addresses related problems within the LDP model, and to identify relevant open problems and research directions for the community. |
115 | Algorithmic Aspects of Parallel Query Processing | Paris Koutris, Semih Salihoglu, Dan Suciu | Based on the MPC model, we study and analyze several algorithms for three core data processing tasks: multiway join queries, sorting and matrix multiplication. |
116 | Demonstration of VerdictDB, the Platform-Independent AQP System | Wen He, Yongjoo Park, Idris Hanafi, Jacob Yatvitskiy, Barzan Mozafari | We demonstrate VerdictDB, the first platform-independent approximate query processing (AQP) system. |
117 | Interactive Demonstration of Probabilistic Predicates | Yao Lu, Srikanth Kandula, Surajit Chaudhuri | We will demonstrate a prototype query processing engine that uses probabilistic predicates (PPs) to speed up machine learning inference jobs. |
118 | Trip Planning by an Integrated Search Paradigm | Sheng Wang, Mingzhao Li, Yipeng Zhang, Zhifeng Bao, David Alexander Tedjopurnomo, Xiaolin Qin | In this paper, we build a trip planning system called TISP, which enables users' interactive exploration of POIs and trajectories in their incremental trip planning. |
119 | POIsam: a System for Efficient Selection of Large-scale Geospatial Data on Maps | Tao Guo, Mingzhao Li, Peishan Li, Zhifeng Bao, Gao Cong | In this demonstration we present POIsam, a visualization system supporting the following desirable features: representativeness, visibility constraint, zooming consistency, and panning consistency. |
120 | DITA: A Distributed In-Memory Trajectory Analytics System | Zeyuan Shang, Guoliang Li, Zhifeng Bao | In this paper, we demonstrate a distributed in-memory trajectory analytics system DITA to support large-scale trajectory data analytics. |
121 | ACID: A System for Computing Approximate Certain Query Answers over Incomplete Databases | Nicola Fiorentino, Sergio Greco, Cristian Molinaro, Irina Trubitsyna | Since their computation is a coNP-hard problem, recent research has focused on developing polynomial time approximation algorithms computing a sound (but possibly incomplete) set of certain answers. |
122 | A Demonstration of Sya: A Spatial Probabilistic Knowledge Base Construction System | Ibrahim Sabek, Mashaal Musleh, Mohamed F. Mokbel | Sya employs a simple spatial high-level language, a rule-based spatial SQL query engine, a spatially-indexed probabilistic graphical model, and an adapted spatial statistical inference technique to infer the factual scores of relations. |
123 | Interactive Visual Exploration of Spatio-Temporal Urban Data Sets using Urbane | Harish Doraiswamy, Eleni Tzirita Zacharatou, Fabio Miranda, Marcos Lage, Anastasia Ailamaki, Cláudio T. Silva, Juliana Freire | To address this limitation, we have recently proposed Raster Join, an approach that converts a spatial aggregation query into a set of drawing operations on a canvas and leverages the rendering pipeline of the graphics hardware (GPU). |
124 | DBLOC: Density Based Clustering over LOCation Based Services | Yeshwanth D. Gunasekaran, Md Farhadur Rahman, Sona Hasani, Nan Zhang, Gautam Das | Due to the query rate limit constraint (i.e., the maximum number of kNN queries a user/IP address can issue over a specific period of time), it is often impossible to access all the tuples in the backend database of an LBS. |
125 | Crowdsourcing Analytics With CrowdCur | Mohammadreza Esfandiari, Kavan Bharat Patel, Sihem Amer-Yahia, Senjuti Basu Roy | We propose to demonstrate CrowdCur, a system that allows platform administrators, requesters, and workers to conduct various analytics of interest. |
126 | Joins over UNION ALL Queries in Teradata®: Demonstration of Optimized Execution | Mohammed Al-Kateb, Paul Sinclair, Grace Au, Sanjay Nair, Mark Sirek, Lu Ma, Mohamed Y. Eltabakh | In this project, we demonstrate novel cost-based optimization techniques implemented in Teradata Database for join queries involving UNION ALL views and derived tables. |
127 | QAGView: Interactively Summarizing High-Valued Aggregate Query Answers | Yuhao Wen, Xiaodan Zhu, Sudeepa Roy, Jun Yang | We present QAGView (Quick AGgregate View), which provides a holistic overview of high-valued aggregate query answers to the user in the form of summaries (showing high-level properties that emerge from subsets of answers) with coverage guarantee (for a user-specified number of top-valued answers) that is both diverse (avoiding overlapping or similar summaries) and relevant (focusing on high-valued aggregate answers). |
128 | PISTIS: A Conflict of Interest Declaration and Detection System for Peer Review Management | Siyuan Wu, Leong Hou U., Sourav S. Bhowmick, Wolfgang Gatterbauer | To address this problem, we demonstrate a novel interactive system called PISTIS that assists the declaration process in a semi-automatic manner. |
129 | Energy-Utility Function-Based Resource Control for In-Memory Database Systems LIVE | Thomas Kissinger, Marcus Hähnel, Till Smejkal, Dirk Habich, Hermann Härtig, Wolfgang Lehner | In this demo, we present energy-utility functions as an approach for enabling the operating system to improve the energy efficiency of scalable in-memory database systems. |
130 | iQCAR: A Demonstration of an Inter-Query Contention Analyzer for Cluster Computing Frameworks | Prajakta Kalmegh, Harrison Lundberg, Frederick Xu, Shivnath Babu, Sudeepa Roy | iQCAR (inter Query Contention AnalyzeR) is a system that formally models these interferences between concurrent queries and provides a framework to attribute blame for contentions. |
131 | IoT-Detective: Analyzing IoT Data Under Differential Privacy | Sameera Ghayyur, Yan Chen, Roberto Yus, Ashwin Machanavajjhala, Michael Hay, Gerome Miklau, Sharad Mehrotra | IoT-Detective: Analyzing IoT Data Under Differential Privacy |
132 | GExp: Cost-aware Graph Exploration with Keywords | Mohammad Hossein Namaki, Yinghui Wu, Xin Zhang | We demonstrate GExp, an interactive graph exploration tool that uses keywords to support effective access and exploration of large graphs. |
133 | DeepEye: Creating Good Data Visualizations by Keyword Search | Yuyu Luo, Xuedi Qin, Nan Tang, Guoliang Li, Xinran Wang | DeepEye: Creating Good Data Visualizations by Keyword Search |
134 | Cohort Analysis with Ease | Zhongle Xie, Qingchao Cai, Fei He, Gene Yan Ooi, Weilong Huang, Beng Chin Ooi | In order to make COHANA easy-to-use, we present a comprehensive and powerful tool in this demo, covering the major use cases in cohort analysis with intuitive and accessible operations. |
135 | Qetch: Time Series Querying with Expressive Sketches | Miro Mannino, Azza Abouzied | By studying how humans sketch time series patterns we develop a matching algorithm that accounts for human sketching errors. |
136 | SQuID: Semantic Similarity-Aware Query Intent Discovery | Anna Fariha, Sheikh Muhammad Sarwar, Alexandra Meliou | In this demo, we present SQuID, a system for Semantic similarity-aware Query Intent Discovery. |
137 | An Ontology based Dialog Interface to Database | Ashish Mittal, Jaydeep Sen, Diptikalyan Saha, Karthik Sankaranarayanan | In this paper, we extend the state-of-the-art NLIDB system and present a dialog interface to relational databases. |
138 | IMPROVE-QA: An Interactive Mechanism for RDF Question/Answering Systems | Xinbo Zhang, Lei Zou | In this demo, we design an Interactive Mechanism aiming for PROmotion Via feedback to Q/A systems (IMPROVE-QA), a whole platform to make existing Q/A systems return more precise answers (denoted as $\mathcal{Q}^\prime(D)$) to users. |
139 | VALMOD: A Suite for Easy and Exact Detection of Variable Length Motifs in Data Series | Michele Linardi, Yan Zhu, Themis Palpanas, Eamonn Keogh | We demonstrate VALMOD, our scalable motif discovery algorithm that efficiently finds all motifs in a given range of lengths, and outputs a length-invariant ranking of motifs. |
140 | EPUI: Experimental Platform for Urban Informatics | Xiaoyu Ge, Panos K. Chrysanthis, Konstantinos Pelechrinis, Demetrios Zeinalipour-Yazti | In this paper, we present a prototype system, namely, EPUI (an Experimental Platform of Urban Informatics), which provides a testbed for exploring and evaluating venues and route recommendation solutions that balance between different objectives (i.e., demands) including the newly discovered ones. |
141 | DBPal: A Learned NL-Interface for Databases | Fuat Basik, Benjamin Hättasch, Amir Ilkhechi, Arif Usta, Shekar Ramaswamy, Prasetya Utama, Nathaniel Weir, Carsten Binnig, Ugur Cetintemel | In this demo, we present DBPal, a novel data exploration tool with a natural language interface. |
142 | DataDiff: User-Interpretable Data Transformation Summaries for Collaborative Data Analysis | Gunce Su Yilmaz, Tana Wattanawaroon, Liqi Xu, Abhishek Nigam, Aaron J. Elmore, Aditya Parameswaran | We demonstrate DataDiff, a practical and concise data-diff tool that provides human-interpretable explanations of changes between datasets without reliance on the operations that led to the changes. |
143 | A Nutritional Label for Rankings | Ke Yang, Julia Stoyanovich, Abolfazl Asudeh, Bill Howe, HV Jagadish, Gerome Miklau | In this demonstration we present Ranking Facts, a Web-based application that generates a "nutritional label" for rankings. |
144 | Precision Interfaces for Different Modalities | Haoci Zhang, Viraj Raj, Thibault Sellam, Eugene Wu | To address this problem, we present Precision Interfaces, a semi-automatic system to generate task-specific data analytics interfaces. |
145 | Demonstration of Smoke: A Deep Breath of Data-Intensive Lineage Applications | Fotis Psallidas, Eugene Wu | Recently, we introduced a set of implementation design principles and associated techniques to optimize lineage-enabled database engines and realized them in our prototype database engine, namely, Smoke. |
146 | Transform-Data-by-Example (TDE): Extensible Data Transformation in Excel | Yeye He, Kris Ganjam, Kukjin Lee, Yue Wang, Vivek Narasayya, Surajit Chaudhuri, Xu Chu, Yudian Zheng | We thus develop an extensible data transformation system called Transform-Data-by-Example (TDE) that can leverage rich transformation logic in source code, DLLs, web services and mapping tables, so that end-users only need to provide a few (typically 3) input/output examples, and TDE can synthesize desired programs using relevant transformation logic from these sources. |
147 | GRFusion: Graphs as First-Class Citizens in Main-Memory Relational Database Systems | Mohamed S. Hassan, Tatiana Kuznetsova, Hyun Chai Jeong, Walid G. Aref, Mohammad Sadoghi | In this demonstration, we present GRFusion, an in-memory relational database system, where graphs are managed as first-class citizens. |
148 | DataProf: Semantic Profiling for Iterative Data Cleansing and Business Rule Acquisition | Ziheng Wei, Sebastian Link | DataProf: Semantic Profiling for Iterative Data Cleansing and Business Rule Acquisition |
149 | GeoFlux: Hands-Off Data Integration Leveraging Join Key Knowledge | Jie Song, Danai Koutra, Murali Mani, H. V. Jagadish | To address this barrier, we study how much we can do with no user guidance. |
150 | Deeper: A Data Enrichment System Powered by Deep Web | Pei Wang, Yongjun He, Ryan Shea, Jiannan Wang, Eugene Wu | In this work, we explore a more targeted alternative that uses resources (in terms of web API calls) proportional to the size of the local database of interest. |
151 | Tighter Upper Bounds for Join Cardinality Estimates | Walter Cai | Tighter Upper Bounds for Join Cardinality Estimates |
152 | Approximate Triangle Count and Clustering Coefficient | Siddharth Bhatia | In this paper, we present methods to approximate these metrics for graphs. |
153 | (Artificial) Mind over Matter: Integrating Humans and Algorithms in Solving Matching Problems | Roee Shraga | (Artificial) Mind over Matter: Integrating Humans and Algorithms in Solving Matching Problems |
154 | FREDDY: Fast Word Embeddings in Database Systems | Michael Günther | FREDDY: Fast Word Embeddings in Database Systems |
155 | RDSQ: Reliable Queue Protocol over Shared Logs | Haolin Yu | RDSQ: Reliable Queue Protocol over Shared Logs |
156 | Efficient Exploration of Linked Data | Oren Kalinsky | Towards that, we develop ELinda, an explorer for linked data. |
157 | SSD as SQLite Engine | Soyee Choi | SSD as SQLite Engine |
158 | Worst Case Optimal Joins on Relational and XML data | Yuxing Chen | To overcome this limitation, we propose a multi-model processing framework for relational and semi-structured data (i.e., XML), and design a worst-case optimal join algorithm. |
159 | MonetDBLite: An Embedded Analytical Database | Mark Raasveldt | MonetDBLite: An Embedded Analytical Database |
160 | Splaying Log-Structured Merge-Trees | Thomas Lively, Luca Schroeder, Carlos Mendizábal | To address this shortcoming, we propose and analyze a simple decision scheme that can be added to any LSM-based key-value store and dramatically reduce the number of disk I/Os for these classes of workloads. |
161 | Incremental View Maintenance for Property Graph Queries | Gábor Szárnyas | Due to the novelty of the field, graph databases and frameworks typically provide their own query language, such as Cypher for Neo4j, Gremlin for TinkerPop and GraphScript for SAP HANA. |