Paper Digest: SIGMOD 2016 Highlights
The ACM Special Interest Group on Management of Data (SIGMOD) is one of the top conferences on database management systems and data management technology.
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights/summaries to quickly grasp the main idea of each paper.
If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.
Paper Digest Team
team@paperdigest.org
TABLE 1: SIGMOD 2016 Papers
No. | Title | Authors | Highlight |
---|---|---|---|
1 | Building Machine Learning Systems that Understand | Jeff Dean | In this talk, I will highlight some of the advances that have been made in deep learning and suggest some interesting directions for future research. |
2 | Learning Linear Regression Models over Factorized Joins | Maximilian Schleich, Dan Olteanu, Radu Ciucanu | We propose a new paradigm for computing batch gradient descent that exploits the factorized computation and representation of the training datasets, a rewriting of the regression objective function that decouples the computation of cofactors of model parameters from their convergence, and the commutativity of cofactor computation with relational union and projection. |
3 | To Join or Not to Join?: Thinking Twice about Joins before Feature Selection | Arun Kumar, Jeffrey Naughton, Jignesh M. Patel, Xiaojin Zhu | In this work, we show that the features brought in by key-foreign key joins can often be ignored without significantly affecting ML accuracy, i.e., we can "avoid joins safely." |
4 | Real-time Video Recommendation Exploration | Yanxiang Huang, Bin Cui, Jie Jiang, Kunqian Hong, Wenyu Zhang, Yiran Xie | To address the deficiencies of current recommendation systems, we introduce new techniques that provide real-time and accurate recommendations to users in the video recommendation system of Tencent Inc. |
5 | Towards Globally Optimal Crowdsourcing Quality Management: The Uniform Worker Setting | Akash Das Sarma, Aditya Parameswaran, Jennifer Widom | In this paper, we focus on filtering, where tasks require the evaluation of a yes/no predicate, and rating, where tasks elicit integer scores from a finite domain. |
6 | Building the Enterprise Fabric for Big Data with Vertica and Spark Integration | Jeff LeFevre, Rui Liu, Cornelio Inigo, Lupita Paz, Edward Ma, Malu Castellanos, Meichun Hsu | In this paper, we present our initial efforts toward building an enterprise fabric for big data by integrating the HPE Vertica enterprise database with Apache Spark’s open-source big data computation engine. |
7 | Truss Decomposition of Probabilistic Graphs: Semantics and Algorithms | Xin Huang, Wei Lu, Laks V.S. Lakshmanan | In this paper, given a probabilistic graph G, a number k, and a parameter γ ∈ (0,1], we define a (k,γ)-truss as a maximal connected subgraph H ⊆ G in which, for each edge, the probability that the edge is contained in at least (k-2) triangles is at least γ. |
8 | Efficient and Progressive Group Steiner Tree Search | Rong-Hua Li, Lu Qin, Jeffrey Xu Yu, Rui Mao | To overcome the limitations of existing group Steiner tree (GST) algorithms, we propose PrunedDP, an efficient and progressive GST algorithm. |
9 | Publishing Attributed Social Graphs with Formal Privacy Guarantees | Zach Jorgensen, Ting Yu, Graham Cormode | We introduce an approach to release attributed social graphs under the strong guarantee of differential privacy. |
10 | Publishing Graph Degree Distribution with Node Differential Privacy | Wei-Yen Day, Ninghui Li, Min Lyu | In this paper, we investigate the problem of publishing the degree distribution of a graph under node-DP by exploring the projection approach to reduce the sensitivity. |
11 | Principled Evaluation of Differentially Private Algorithms using DPBench | Michael Hay, Ashwin Machanavajjhala, Gerome Miklau, Yan Chen, Dan Zhang | In this paper we propose a set of evaluation principles which we argue are essential for sound evaluation. |
12 | PrivTree: A Differentially Private Algorithm for Hierarchical Decompositions | Jun Zhang, Xiaokui Xiao, Xing Xie | To remedy the deficiency of existing solutions, we present PrivTree, a histogram construction algorithm that adopts hierarchical decomposition but completely eliminates the dependency on a pre-defined maximum tree height h. |
13 | Adaptive Indexing over Encrypted Numeric Data | Panagiotis Karras, Artyom Nikitin, Muhammad Saad, Rudrika Bhatt, Denis Antyukhov, Stratos Idreos | In this paper, we propose and analyze a scheme for lightweight and indexable encryption, based on linear-algebra operations. |
14 | Practical Private Range Search Revisited | Ioannis Demertzis, Stavros Papadopoulos, Odysseas Papapetrou, Antonios Deligiannakis, Minos Garofalakis | In this paper, we take an interdisciplinary approach that combines the rigor of security formulations and proofs with efficient data management techniques. We construct a wide set of novel schemes with realistic security/performance trade-offs, adopting the notion of Searchable Symmetric Encryption (SSE), originally proposed primarily for keyword search. |
15 | Privacy Preserving Subgraph Matching on Large Graphs in Cloud | Zhao Chang, Lei Zou, Feifei Li | To reduce the search space for a subgraph matching query, we propose a cost model to select the more effective label combinations. |
16 | The Snowflake Elastic Data Warehouse | Benoit Dageville, Thierry Cruanes, Marcin Zukowski, Vadim Antonov, Artin Avanes, Jon Bock, Jonathan Claybaugh, Daniel Engovatov, Martin Hentschel, Jiansheng Huang, Allison W. Lee, Ashish Motivala, Abdul Q. Munir, Steven Pelley, Peter Povinec, Greg Rahn, Spyridon Triantafyllis, Philipp Unterbrunner | In this paper, we describe the design of Snowflake and its novel multi-cluster, shared-data architecture. |
17 | Closing the Functional and Performance Gap between SQL and NoSQL | Zhen Hua Liu, Beda Hammerschmidt, Doug McMahon, Ying Liu, Hui Joe Chang | In this paper, we present enhancements to Oracle’s JSON data management in the upcoming 12cR2 release. |
18 | Have Your Data and Query It Too: From Key-Value Caching to Big Data Management | Dipti Borkar, Ravi Mayuram, Gerald Sangudi, Michael Carey | This paper describes the architectural changes needed to address the requirements posed by next-generation database applications. |
19 | Ambry: LinkedIn’s Scalable Geo-Distributed Object Store | Shadi A. Noghabi, Sriram Subramanian, Priyesh Narayanan, Sivabalan Narayanan, Gopalakrishna Holla, Mammad Zadeh, Tianwei Li, Indranil Gupta, Roy H. Campbell | We present Ambry, a production-quality system for storing large immutable data (called blobs). |
20 | SQL Schema Design: Foundations, Normal Forms, and Normalization | Henning Köhler, Sebastian Link | Relational normalization only works for idealized database instances in which duplicates and null markers are not present; this paper develops foundations, normal forms, and normalization for SQL schema design. |
21 | SQLShare: Results from a Multi-Year SQL-as-a-Service Experiment | Shrainik Jain, Dominik Moritz, Daniel Halperin, Bill Howe, Ed Lazowska | Our contributions include a system design for delivering databases into these contexts, a description of a public research query workload dataset released to advance research in analytic data systems, and an initial analysis of the workload that provides evidence of new use cases under-supported in existing systems. |
22 | Automatic Generation of Normalized Relational Schemas from Nested Key-Value Data | Michael DiScala, Daniel J. Abadi | In this paper we present an algorithm that automatically transforms the denormalized, nested data commonly found in NoSQL systems into traditional relational data that can be stored in a standard RDBMS. |
23 | Data Blocks: Hybrid OLTP and OLAP on Compressed Storage using both Vectorization and Compilation | Harald Lang, Tobias Mühlbauer, Florian Funke, Peter A. Boncz, Thomas Neumann, Alfons Kemper | This work aims at reducing the main-memory footprint in high performance hybrid OLTP & OLAP databases, while retaining high query performance and transactional throughput. |
24 | GeckoFTL: Scalable Flash Translation Techniques For Very Large Flash Devices | Niv Dayan, Philippe Bonnet, Stratos Idreos | In this paper, we identify a key component of the flash translation layer's metadata, called the Page Validity Bitmap (PVB), as the bottleneck. |
25 | SHARE Interface in Flash Storage for Relational and NoSQL Databases | Gihwan Oh, Chiyoung Seo, Ravi Mayuram, Yang-Suk Kee, Sang-Won Lee | In this paper, we propose a flash storage interface, SHARE. |
26 | Accelerating Relational Databases by Leveraging Remote Memory and RDMA | Feng Li, Sudipto Das, Manoj Syamala, Vivek R. Narasayya | We implemented several remote-memory scenarios in the Microsoft SQL Server engine and present the first end-to-end study demonstrating the benefits of remote memory for a variety of micro-benchmarks and industry-standard benchmarks. |
27 | FPTree: A Hybrid SCM-DRAM Persistent and Concurrent B-Tree for Storage Class Memory | Ismail Oukid, Johan Lasperas, Anisoara Nica, Thomas Willhalm, Wolfgang Lehner | In this paper we propose a novel hybrid SCM-DRAM persistent and concurrent B-Tree, named Fingerprinting Persistent Tree (FPTree) that achieves similar performance to DRAM-based counterparts. |
28 | Micro-architectural Analysis of In-memory OLTP | Utku Sirin, Pinar Tözün, Danica Porobic, Anastasia Ailamaki | This paper sheds light on the micro-architectural behavior of in-memory database systems by analyzing it and contrasting it with the behavior of disk-based systems when running OLTP workloads. |
29 | iBFS: Concurrent Breadth-First Search on GPUs | Hang Liu, H. Howie Huang, Yang Hu | In this work, we focus on a special class of graph traversal algorithm – concurrent BFS – where multiple breadth-first traversals are performed simultaneously on the same graph. |
30 | Tornado: A System For Real-Time Iterative Analysis Over Evolving Data | Xiaogang Shi, Bin Cui, Yingxia Shao, Yunhai Tong | In this paper, we propose a novel execution model to obtain timely results at given instants. |
31 | EmptyHeaded: A Relational Engine for Graph Processing | Christopher R. Aberger, Susan Tu, Kunle Olukotun, Christopher Ré | We present EmptyHeaded, a high-level engine that supports a rich datalog-like query language and achieves performance comparable to that of low-level engines. |
32 | GTS: A Fast and Scalable Graph Processing Method based on Streaming Topology to GPUs | Min-Soo Kim, Kyuhyeon An, Himchan Park, Hyunseok Seo, Jinwook Kim | Here, we propose GTS, a fast and scalable graph processing method that handles even RMAT32 (64 billion edges) very efficiently using only a single machine. |
33 | Graph Analytics Through Fine-Grained Parallelism | Zechao Shang, Feifei Li, Jeffrey Xu Yu, Zhiwei Zhang, Hong Cheng | Efficient graph analytics has thus become an important subject of study; this paper approaches it through fine-grained parallelism. |
34 | Hybrid Pulling/Pushing for I/O-Efficient Distributed and Iterative Graph Computing | Zhigang Wang, Yu Gu, Yubin Bao, Ge Yu, Jeffrey Xu Yu | This paper proposes a hybrid solution to support switching between push and pull adaptively, to obtain optimal performance for distributed systems with disk-resident data in different scenarios. |
35 | Scalable Pattern Sharing on Event Streams | Medhabi Ray, Chuan Lei, Elke A. Rundensteiner | In this work, we design the SPASS framework, which successfully tackles demanding complex event processing (CEP) workloads. |
36 | How to Win a Hot Dog Eating Contest: Distributed Incremental View Maintenance with Batch Updates | Milos Nikolic, Mohammad Dashti, Christoph Koch | In this paper, we study low-latency incremental computation of complex SQL queries in both local and distributed streaming environments. |
37 | Sharing-Aware Outlier Analytics over High-Volume Data Streams | Lei Cao, Jiayuan Wang, Elke A. Rundensteiner | In this work we propose a sharing-aware multi-query execution strategy for outlier detection on data streams called SOP. |
38 | THEMIS: Fairness in Federated Stream Processing under Overload | Evangelia Kalyvianaki, Marco Fiscato, Theodoros Salonidis, Peter Pietzuch | We describe THEMIS, a federated stream processing system for resource-starved, multi-site deployments. |
39 | SABER: Window-Based Hybrid Stream Processing for Heterogeneous Architectures | Alexandros Koliousis, Matthias Weidlich, Raul Castro Fernandez, Alexander L. Wolf, Paolo Costa, Peter Pietzuch | We describe Saber, a hybrid high-performance relational stream processing engine for CPUs and GPGPUs. |
40 | Range Thresholding on Streams | Miao Qiao, Junhao Gan, Yufei Tao | We propose the first algorithm that breaks the quadratic barrier, by reducing the computation cost dramatically to O(n + m), subject only to a polylogarithmic factor. |
41 | Bridging the Archipelago between Row-Stores and Column-Stores for Hybrid Workloads | Joy Arulraj, Andrew Pavlo, Prashanth Menon | To overcome this barrier, we present a hybrid DBMS architecture that efficiently supports varied workloads on the same database. |
42 | An Effective Syntax for Bounded Relational Queries | Yang Cao, Wenfei Fan | We provide quadratic-time algorithms to check whether a query Q is covered, and to generate a bounded query plan for a covered Q. |
43 | Wander Join: Online Aggregation via Random Walks | Feifei Li, Bin Wu, Ke Yi, Zhuoyue Zhao | This paper proposes a new approach, the wander join algorithm, to the online aggregation problem by performing random walks over the underlying join graph. |
44 | Quickr: Lazily Approximating Complex AdHoc Queries in BigData Clusters | Srikanth Kandula, Anil Shanbhag, Aleksandar Vitorovic, Matthaios Olma, Robert Grandl, Surajit Chaudhuri, Bolin Ding | We present a system that approximates the answer to complex ad-hoc queries in big-data clusters by injecting samplers on-the-fly and without requiring pre-existing samples. |
45 | A Study of Sorting Algorithms on Approximate Memory | Shuang Chen, Shunning Jiang, Bingsheng He, Xueyan Tang | In this paper, we study sorting, one of the most basic database operations, on a hybrid storage system with both precise storage and approximate storage. |
46 | Distributed Wavelet Thresholding for Maximum Error Metrics | Ioannis Mytilinis, Dimitrios Tsoumakos, Nectarios Koziris | To that end, we present i) a general framework for the parallelization of existing dynamic programming algorithms, ii) a parallel version of one such DP-based algorithm and iii) a new parallel greedy algorithm for the problem. |
47 | Sample + Seek: Approximating Aggregates with Distribution Precision Guarantee | Bolin Ding, Silu Huang, Surajit Chaudhuri, Kaushik Chakrabarti, Chi Wang | We propose a novel sampling scheme, called measure-biased sampling, for approximating aggregates with a distribution precision guarantee. |
48 | Stop-and-Stare: Optimal Sampling Algorithms for Viral Marketing in Billion-scale Networks | Hung T. Nguyen, My T. Thai, Thang N. Dinh | In this paper, we propose SSA and D-SSA, two novel sampling frameworks for IM-based viral marketing problems. |
49 | Spheres of Influence for More Effective Viral Marketing | Yasir Mehmood, Francesco Bonchi, David García-Soriano | We thus formalize the Typical Cascade problem which requires, for a given source node s, to find the set of nodes minimizing the expected Jaccard distance to all the possible cascades from s. |
50 | Continuous Influence Maximization: What Discounts Should We Offer to Social Network Users? | Yu Yang, Xiangbo Mao, Jian Pei, Xiaofei He | In this paper, we systematically tackle the problem of deciding what discounts to offer social network users in order to maximize influence. |
51 | Holistic Influence Maximization: Combining Scalability and Efficiency with Opinion-Aware Models | Sainyam Galhotra, Akhil Arora, Shourya Roy | In this paper, we propose a holistic solution to the influence maximization (IM) problem. Under the opinion-cum-interaction (OI) model, we introduce a novel problem of Maximizing the Effective Opinion (MEO) of influenced users. |
52 | Potential and Pitfalls of Domain-Specific Information Extraction at Web Scale | Astrid Rheinländer, Mario Lehmann, Anja Kunkel, Jörg Meier, Ulf Leser | In this paper, we report our experiences from building a domain-specific, web-scale information extraction system for comparing the "web view" on health-related topics with that derived from a controlled scientific corpus, namely Medline. |
53 | Robust and Noise Resistant Wrapper Induction | Tim Furche, Jinsong Guo, Sebastian Maneth, Christian Schallhart | We introduce a wrapper language as a subset of XPath and show that, even for such a restricted language, inducing optimal queries according to a suitable scoring is infeasible. |
54 | Goods: Organizing Google’s Datasets | Alon Halevy, Flip Korn, Natalya F. Noy, Christopher Olston, Neoklis Polyzotis, Sudip Roy, Steven Euijong Whang | In this paper, we present GOODS, a project to rethink how we organize structured datasets at scale, in a setting where teams use diverse and often idiosyncratic ways to produce the datasets and where there is no centralized system for storing and querying them. |
55 | Multi-Source Uncertain Entity Resolution at Yad Vashem: Transforming Holocaust Victim Reports into People | Tomer Sagi, Avigdor Gal, Omer Barkol, Ruth Bergman, Alexander Avram | In this work we describe an entity resolution project performed at Yad Vashem, the central repository of Holocaust-era information. |
56 | A Hybrid Approach to Functional Dependency Discovery | Thorsten Papenbrock, Felix Naumann | Database research has proposed various algorithms for functional dependency discovery; this paper presents a hybrid approach to the problem. |
57 | Ontological Pathfinding | Yang Chen, Sean Goldberg, Daisy Zhe Wang, Soumitra Siddharth Johri | We propose the Ontological Pathfinding algorithm (OP) that scales to web-scale knowledge bases via a series of parallelization and optimization techniques: a relational knowledge base model to apply inference rules in batches, a new rule mining algorithm that parallelizes the join queries, a novel partitioning algorithm to break the mining tasks into smaller independent sub-tasks, and a pruning strategy to eliminate unsound and resource-consuming rules before applying them. |
58 | Extracting Databases from Dark Data with DeepDive | Ce Zhang, Jaeho Shin, Christopher Ré, Michael Cafarella, Feng Niu | DeepDive is distinctive when compared to previous information extraction systems in its ability to obtain very high precision and recall at reasonable engineering cost; in a number of applications, we have used DeepDive to create databases with accuracy that meets that of human annotators. |
59 | Estimating the Impact of Unknown Unknowns on Aggregate Query Results | Yeounoh Chung, Michael Lind Mortensen, Carsten Binnig, Tim Kraska | In this work, we develop and analyze techniques to estimate the impact of unknown data (a.k.a. unknown unknowns) on simple aggregate queries. |
60 | Constraint-Variance Tolerant Data Repairing | Shaoxu Song, Han Zhu, Jianmin Wang | To address the oversimplified and overrefined constraint inaccuracies, in this paper, we propose to repair data by allowing a small variation (with both predicate insertion and deletion) on the constraints. |
61 | Interactive and Deterministic Data Cleaning | Jian He, Enzo Veltri, Donatello Santoro, Guoliang Li, Giansalvatore Mecca, Paolo Papotti, Nan Tang | We present Falcon, an interactive, deterministic, and declarative data cleaning system, which uses SQL update queries as the language to repair data. |
62 | Sequential Data Cleaning: A Statistical Approach | Aoqian Zhang, Shaoxu Song, Jianmin Wang | We formalize the likelihood-based cleaning problem, show its NP-hardness, devise exact algorithms, and propose several approximate/heuristic methods to trade off effectiveness for efficiency. |
63 | Learning-Based Cleansing for Indoor RFID Data | Asif Iqbal Baba, Manfred Jaeger, Hua Lu, Torben Bach Pedersen, Wei-Shinn Ku, Xike Xie | We propose the Indoor RFID Multi-variate Hidden Markov Model (IR-MHMM) to capture the uncertainties of indoor RFID data as well as the correlation of moving object locations and object RFID readings. |
64 | PrivateClean: Data Cleaning and Differential Privacy | Sanjay Krishnan, Jiannan Wang, Michael J. Franklin, Ken Goldberg, Tim Kraska | This paper explores the link between data cleaning and differential privacy in a framework we call PrivateClean. |
65 | RDFind: Scalable Conditional Inclusion Dependency Discovery in RDF Datasets | Sebastian Kruse, Anja Jentzsch, Thorsten Papenbrock, Zoi Kaoudi, Jorge-Arnulfo Quiané-Ruiz, Felix Naumann | In our experimental evaluation, we show that RDFind is up to 419 times faster than the state of the art, while considering a more general class of conditional inclusion dependencies (CINDs). |
66 | Cost-Effective Crowdsourced Entity Resolution: A Partial-Order Approach | Chengliang Chai, Guoliang Li, Jian Li, Dong Deng, Jianhua Feng | To address these problems, we propose a cost-effective crowdsourced entity resolution framework, which significantly reduces the monetary cost while keeping high quality. |
67 | Topic Exploration in Spatio-Temporal Document Collections | Kaiqi Zhao, Lisi Chen, Gao Cong | In this paper, we study the problem of efficiently mining topics from spatio-temporal documents within a user specified bounded region and timespan, to provide users with insights about events, trends, and public concerns within the specified region and time period. |
68 | ParTime: Parallel Temporal Aggregation | Markus Pilman, Martin Kaufmann, Florian Köhl, Donald Kossmann, Damien Profeta | This paper presents ParTime, a parallel algorithm for temporal aggregation. |
69 | Data Polygamy: The Many-Many Relationships among Urban Spatio-Temporal Data Sets | Fernando Chirigati, Harish Doraiswamy, Theodoros Damoulas, Juliana Freire | To address these challenges, we propose Data Polygamy, a scalable topology-based framework that allows users to query for statistically significant relationships between spatio-temporal data sets. |
70 | Distributed Evaluation of Top-k Temporal Joins | Julien Pilourdault, Vincent Leroy, Sihem Amer-Yahia | We show how to exploit the nature of temporal predicates and the properties of their associated scoring semantics to design TKIJ, an efficient query evaluation approach on a distributed Map-Reduce architecture. |
71 | AT-GIS: Highly Parallel Spatial Query Processing with Associative Transducers | Peter Ogden, David Thomas, Peter Pietzuch | Our goal is to fully exploit the parallelism offered by modern multi-core CPUs for parsing and query execution, thus providing the performance of a cluster with the resources of a single machine. |
72 | Towards Best Region Search for Data Exploration | Kaiyu Feng, Gao Cong, Sourav S. Bhowmick, Wen-Chih Peng, Chunyan Miao | This paper introduces a novel problem called the best region search (BRS) problem and proposes SliceBRS, an efficient algorithm that finds the exact answer to it. |
73 | Simba: Efficient In-Memory Spatial Analytics | Dong Xie, Feifei Li, Bin Yao, Gefei Li, Liang Zhou, Minyi Guo | We present the Simba (Spatial In-Memory Big data Analytics) system that offers scalable and efficient in-memory spatial query processing and analytics for big spatial data. |
74 | Realtime Data Processing at Facebook | Guoqiang Jerry Chen, Janet L. Wiener, Shridhar Iyer, Anshul Jaiswal, Ran Lei, Nikhil Simha, Wei Wang, Kevin Wilfong, Tim Williamson, Serhat Yilmaz | In this paper, we identify five important design decisions that affect the ease of use, performance, fault tolerance, scalability, and correctness of realtime data processing systems. |
75 | SparkR: Scaling R Programs with Spark | Shivaram Venkataraman, Zongheng Yang, Davies Liu, Eric Liang, Hossein Falaki, Xiangrui Meng, Reynold Xin, Ali Ghodsi, Michael Franklin, Ion Stoica, Matei Zaharia | We present SparkR, an R package that provides a frontend to Apache Spark and uses Spark’s distributed computation engine to enable large scale data analysis from the R shell. |
76 | VectorH: Taking SQL-on-Hadoop to the Next Level | Andrei Costea, Adrian Ionescu, Bogdan Răducanu, Michał Switakowski, Cristian Bârca, Juliusz Sompolski, Alicja Łuszczak, Michał Szafrański, Giel de Nijs, Peter Boncz | We describe the changes made to single-server Vectorwise to turn it into a Hadoop-based MPP system, encompassing workload management, parallel query optimization and execution, HDFS storage, transaction processing and Spark integration. |
77 | Adaptive Logging: Optimizing Logging and Recovery Costs in Distributed In-memory Databases | Chang Yao, Divyakant Agrawal, Gang Chen, Beng Chin Ooi, Sai Wu | The percentage of data logging versus command logging becomes a tuning knob between the performance of transaction processing and recovery to meet different OLTP requirements, and a model is proposed to guide such tuning. |
78 | Big Data Analytics with Datalog Queries on Spark | Alexander Shkapsky, Mohan Yang, Matteo Interlandi, Hsuan Chiu, Tyson Condie, Carlo Zaniolo | Towards this goal, we propose compilation and optimization techniques that tackle the important problem of efficiently supporting recursion in Spark. |
79 | An Efficient MapReduce Cube Algorithm for Varied Data Distributions | Tova Milo, Eyal Altshuler | To address this problem, we consider cube computation in MapReduce, the popular paradigm for distributed big data processing, and present an efficient algorithm for computing cubes over large data sets. |
80 | Diversified Top-k Subgraph Querying in a Large Graph | Zhengwei Yang, Ada Wai-Chee Fu, Ruifeng Liu | In this work, we study the problem of top-k diversified subgraph querying, which asks for a set of up to k subgraphs that are isomorphic to a given query graph and that together cover the largest number of vertices. |
81 | Graph Indexing for Shortest-Path Finding over Dynamic Sub-Graphs | Mohamed S. Hassan, Walid G. Aref, Ahmed M. Aly | This paper introduces Edge-Disjoint Partitioning (EDP, for short), a new technique for efficiently answering edge-constrained shortest path (ECSP) queries over dynamic graphs. |
82 | Efficient Subgraph Matching by Postponing Cartesian Products | Fei Bi, Lijun Chang, Xuemin Lin, Lu Qin, Wenjie Zhang | In this paper, we study the problem of subgraph matching that extracts all subgraph isomorphic embeddings of a query graph q in a large data graph G. |
83 | Adding Counting Quantifiers to Graph Patterns | Wenfei Fan, Yinghui Wu, Jingbo Xu | This paper proposes quantified graph patterns (QGPs), which extend graph patterns with support for simple counting quantifiers on edges. |
84 | DUALSIM: Parallel Subgraph Enumeration in a Massive Graph on a Single Machine | Hyeonji Kim, Juneyoung Lee, Sourav S. Bhowmick, Wook-Shin Han, JeongHoon Lee, Seongyun Ko, Moath H.A. Jarrah | In this paper, we design and implement a disk-based, single machine parallel subgraph enumeration solution called DualSim that can handle massive graphs without maintaining exponential numbers of partial results. |
85 | Distributed Set Reachability | Sairam Gurajada, Martin Theobald | In this paper, we focus on the efficient and scalable processing of set-reachability queries over a distributed, directed data graph. |
86 | Fast Multi-Column Sorting in Main-Memory Column-Stores | Wenjian Xu, Ziqiang Feng, Eric Lo | In this paper, we propose a new technique called "code massaging", which manipulates the bits across the columns so that the overall sorting time can be reduced by eliminating some rounds of sorting and/or by improving the degree of SIMD data level parallelism. |
87 | Elastic Pipelining in an In-Memory Database Cluster | Li Wang, Minqi Zhou, Zhenjie Zhang, Yin Yang, Aoying Zhou, Dina Bitton | To tackle this problem, we propose elastic pipelining, which makes it possible to optimize intra-node parallelism assignments in the pipelines based on the actual workload at runtime. |
88 | Page As You Go: Piecewise Columnar Access In SAP HANA | Reza Sherkat, Colin Florendo, Mihnea Andrei, Anil K. Goel, Anisoara Nica, Peter Bumbulis, Ivan Schreter, Günter Radestock, Christian Bensberg, Daniel Booss, Heiko Gerwens | As an alternative approach, we propose to reduce the unit of load and eviction from column to a contiguous portion of the in-memory columnar representation, which we call a page. |
89 | Hybrid Garbage Collection for Multi-Version Concurrency Control in SAP HANA | Juchang Lee, Hyungyu Shin, Chang Gyoo Park, Seongyun Ko, Jaeyun Noh, Yongjae Chuh, Wolfgang Stephan, Wook-Shin Han | In this paper, we present an efficient and effective garbage collector called HybridGC in SAP HANA. |
90 | UpBit: Scalable In-Memory Updatable Bitmap Indexing | Manos Athanassoulis, Zheng Yan, Stratos Idreos | In this paper, we propose scalable in-memory Updatable Bitmap indexing (UpBit), which offers efficient updates, without hurting read performance. |
91 | FluxQuery: An Execution Framework for Highly Interactive Query Workloads | Roee Ebenstein, Niranjan Kamat, Arnab Nandi | We propose a novel model to interpret the variability of likely queries in a workload. |
92 | iOLAP: Managing Uncertainty for Efficient Incremental OLAP | Kai Zeng, Sameer Agarwal, Ion Stoica | In this paper, we propose iOLAP, an incremental OLAP query engine that provides a smooth trade-off between query accuracy and latency, and fulfills a full spectrum of user requirements from approximate but timely query execution to a more traditional accurate query execution. |
93 | Dynamic Prefetching of Data Tiles for Interactive Visualization | Leilani Battle, Remco Chang, Michael Stonebraker | In this paper, we present ForeCache, a general-purpose tool for exploratory browsing of large datasets. |
94 | Expressive Query Construction through Direct Manipulation of Nested Relational Results | Eirik Bakke, David R. Karger | This paper presents the first visual query system to meet all three requirements in a single design. |
95 | Shasta: Interactive Reporting At Scale | Gokul Nath Babu Manoharan, Stephan Ellner, Karl Schnaitter, Sridatta Chegu, Alejandro Estrella-Balderrama, Stephan Gudmundson, Apurv Gupta, Ben Handy, Bart Samwel, Chad Whipkey, Larysa Aharkava, Himani Apte, Nitin Gangahar, Jun Xu, Shivakumar Venkataraman, Divyakant Agrawal, Jeffrey D. Ullman | We describe Shasta, a middleware system built at Google to support interactive reporting in complex user-facing applications related to Google’s Internet advertising business. |
96 | Datometry Hyper-Q: Bridging the Gap Between Real-Time and Historical Analytics | Lyublena Antova, Rhonda Baldwin, Derrick Bryant, Tuan Cao, Michael Duller, John Eshleman, Zhongxian Gu, Entong Shen, Mohamed A. Soliman, F. Michael Waas | In this paper, we present Hyper-Q, a data virtualization platform that bridges the gap between real-time and historical analytics. |
97 | Time Adaptive Sketches (Ada-Sketches) for Summarizing Data Streams | Anshumali Shrivastava, Arnd Christian König, Mikhail Bilenko | In this work, we describe a new method, Time-adaptive Sketches (Ada-sketch), that overcomes these limitations while extending and providing a strict generalization of several popular sketching algorithms. |
98 | Streaming Algorithms for Robust Distinct Elements | Di Chen, Qin Zhang | In this paper, we formalize the problem of robust distinct elements, and develop space and time-efficient streaming algorithms for datasets in the Euclidean space, using a novel technique we call bucket sampling. |
99 | Augmented Sketch: Faster and More Accurate Stream Processing | Pratanu Roy, Arijit Khan, Gustavo Alonso | Approximation algorithms are often used to estimate the frequency of items on high-volume, fast data streams. |
100 | Matrix Sketching Over Sliding Windows | Zhewei Wei, Xuancheng Liu, Feifei Li, Shuo Shang, Xiaoyong Du, Ji-Rong Wen | With this observation, we present three general frameworks for matrix sketching on sliding windows. |
101 | Graph Stream Summarization: From Big Bang to Big Crunch | Nan Tang, Qing Chen, Prasenjit Mitra | We present TCM, a novel generalized graph stream summary. |
102 | Scalable Approximate Query Tracking over Highly Distributed Data Streams | Nikos Giatrakos, Antonios Deligiannakis, Minos Garofalakis | In this paper, we propose novel techniques that address the scalability problems of query tracking over highly distributed data streams by exploiting a carefully designed sample of the remote sites for efficient approximate tracking. |
103 | A Hybrid B+-tree as Solution for In-Memory Indexing on CPU-GPU Heterogeneous Computing Platforms | Amirhesam Shahvarani, Hans-Arno Jacobsen | In this paper, we propose a novel design for a B+-tree based on the heterogeneous computing platform and the hybrid memory architecture found in GPUs. |
104 | Low-Overhead Asynchronous Checkpointing in Main-Memory Database Systems | Kun Ren, Thaddeus Diamond, Daniel J. Abadi, Alexander Thomson | This paper presents Checkpointing Asynchronously using Logical Consistency (CALC), a lightweight, asynchronous technique for capturing database snapshots that does not require a physical point of consistency to create a checkpoint, and avoids conspicuous latency spikes incurred by other database snapshotting schemes. |
105 | T-Part: Partitioning of Transactions for Forward-Pushing in Deterministic Database Systems | Shan-Hung Wu, Tsai-Yu Feng, Meng-Kai Liao, Shao-Kan Pi, Yu-Shan Lin | In this paper, we present T-Part, a transaction execution engine that partitions transactions in a deterministic database system to deal with unforeseeable workloads or workloads whose data are hard to partition. |
106 | Reducing the Storage Overhead of Main-Memory OLTP Databases with Hybrid Indexes | Huanchen Zhang, David G. Andersen, Andrew Pavlo, Michael Kaminsky, Lin Ma, Rui Shen | To reduce the memory overhead of indexes in main-memory OLTP databases, we propose a two-stage (hybrid) index: the first stage ingests all incoming entries and is kept small for fast read and write operations. |
107 | Design Principles for Scaling Multi-core OLTP Under High Contention | Kun Ren, Jose M. Faleiro, Daniel J. Abadi | In this paper we identify two prevalent design principles that limit the multi-core scalability of many (but not all) transactional database systems on contended workloads: the multi-purpose nature of execution threads in these systems, and the lack of advanced planning of data access. |
108 | DBSherlock: A Performance Diagnostic Tool for Transactional Databases | Dong Young Yoon, Ning Niu, Barzan Mozafari | This paper presents a practical tool for assisting DBAs in quickly and reliably diagnosing performance problems in an OLTP database. |
109 | TARDiS: A Branch-and-Merge Approach To Weak Consistency | Natacha Crooks, Youer Pu, Nancy Estrada, Trinabh Gupta, Lorenzo Alvisi, Allen Clement | This paper presents the design, implementation, and evaluation of TARDiS (Transactional Asynchronously Replicated Divergent Store), a transactional key-value store explicitly designed for weakly-consistent systems. |
110 | TicToc: Time Traveling Optimistic Concurrency Control | Xiangyao Yu, Andrew Pavlo, Daniel Sanchez, Srinivas Devadas | In this paper we present TicToc, a new optimistic concurrency control algorithm that avoids the scalability and concurrency bottlenecks of prior timestamp-ordering (T/O) schemes. |
111 | Scaling Multicore Databases via Constrained Parallel Execution | Zhaoguo Wang, Shuai Mu, Yang Cui, Han Yi, Haibo Chen, Jinyang Li | In this paper, we describe a new concurrency control scheme, interleaving constrained concurrency control (IC3), which provides serializability while allowing for parallel execution of certain conflicting transactions. |
112 | Towards a Non-2PC Transaction Management in Distributed Database Systems | Qian Lin, Pengfei Chang, Gang Chen, Beng Chin Ooi, Kian-Lee Tan, Zhengkui Wang | In this paper, we propose a transaction management scheme called LEAP that avoids the two-phase commit (2PC) protocol in distributed transaction processing. |
113 | ERMIA: Fast Memory-Optimized Database System for Heterogeneous Workloads | Kangnyeon Kim, Tianzheng Wang, Ryan Johnson, Ippokratis Pandis | In this paper, we present ERMIA, a memory-optimized database system built from scratch to handle heterogeneous workloads. |
114 | Transaction Healing: Scaling Optimistic Concurrency Control on Multicores | Yingjun Wu, Chee-Yong Chan, Kian-Lee Tan | In this paper, we propose a new concurrency-control mechanism, called transaction healing, that exploits program semantics to scale conventional optimistic concurrency control (OCC) to dozens of cores even under highly contended workloads. |
115 | Enabling Incremental Query Re-Optimization | Mengmeng Liu, Zachary G. Ives, Boon Thau Loo | We investigate how to build a cost-based optimizer that recomputes the optimal plan incrementally given new cost information, much as a stream engine constantly updates its outputs given new data. |
116 | Sampling-Based Query Re-Optimization | Wentao Wu, Jeffrey F. Naughton, Harneet Singh | In this paper, we propose a low-cost post-processing step that can take a plan produced by the optimizer, detect when the optimizer is likely to have made a mistake due to bad cardinality estimates, and take steps to fix it. |
117 | A Fast Randomized Algorithm for Multi-Objective Query Optimization | Immanuel Trummer, Christoph Koch | In this work, we present the first algorithm for multi-objective query optimization with polynomial complexity in the query size. |
118 | Operator and Query Progress Estimation in Microsoft SQL Server Live Query Statistics | Kukjin Lee, Arnd Christian König, Vivek Narasayya, Bolin Ding, Surajit Chaudhuri, Brent Ellwein, Alexey Eksarevskiy, Manbeen Kohli, Jacob Wyant, Praneeta Prakash, Rimma Nehme, Jiexing Li, Jeff Naughton | We describe the design and implementation of the new Live Query Statistics (LQS) feature in Microsoft SQL Server 2016. |
119 | Optimization of Nested Queries using the NF2 Algebra | Jürgen Hölsch, Michael Grossniklaus, Marc H. Scholl | In this paper, we argue that the NF2 (non-first normal form) algebra, which was originally designed to process nested tables, is a better approach to nested query optimization as it fulfills two key requirements. |
120 | Extracting Equivalent SQL from Imperative Code in Database Applications | K. Venkatesh Emani, Karthik Ramachandra, Subhro Bhattacharya, S. Sudarshan | In this paper we address the problem of extracting equivalent SQL by deriving a concise algebraic representation of (parts of) an application, which may include imperative code as well as SQL queries. |
121 | Generating Preview Tables for Entity Graphs | Ning Yan, Sona Hasani, Abolfazl Asudeh, Chengkai Li | We propose methods to produce preview tables for compact presentation of important entity types and relationships in entity graphs. |
122 | Speedup Graph Processing by Graph Ordering | Hao Wei, Jeffrey Xu Yu, Can Lu, Xuemin Lin | In this paper, we focus on CPU speedup for graph computing in general by reducing the CPU cache miss ratio for different graph algorithms. |
123 | ROLL: Fast In-Memory Generation of Gigantic Scale-free Networks | Ali Hadian, Sadegh Nobari, Behrooz Minaei-Bidgoli, Qiang Qu | In this paper, we propose ROLL-tree, a fast in-memory roulette wheel data structure that accelerates Barabási-Albert (BA) network generation by exploiting the statistical behaviors of the underlying growth model. |
124 | Functional Dependencies for Graphs | Wenfei Fan, Yinghui Wu, Jingbo Xu | We propose a class of functional dependencies for graphs, referred to as GFDs. |
125 | SLING: A Near-Optimal Index Structure for SimRank | Boyu Tian, Xiaokui Xiao | Scalable SimRank computation has been the subject of extensive research for more than a decade, and yet, none of the existing solutions can efficiently derive SimRank scores on large graphs with provable accuracy guarantees. |
126 | Query Planning for Evaluating SPARQL Property Paths | Nikolay Yakovets, Parke Godfrey, Jarek Gryz | The extension of SPARQL in version 1.1 with property paths offers a type of regular path query for RDF graph databases. |
127 | Robust Query Processing in Co-Processor-accelerated Databases | Sebastian Breß, Henning Funke, Jens Teubner | In this paper, we identify two effects that limit performance when co-processor resources become scarce. |
128 | How to Architect a Query Compiler | Amir Shaikhha, Yannis Klonatos, Lionel Parreaux, Lewis Brown, Mohammad Dashti, Christoph Koch | We propose to use a stack of multiple DSLs on different levels of abstraction with lowering in multiple steps to make query compilers easier to build and extend, ultimately allowing us to create more convincing and sustainable compiler-based data management systems. |
129 | Automated Demand-driven Resource Scaling in Relational Database-as-a-Service | Sudipto Das, Feng Li, Vivek R. Narasayya, Arnd Christian König | We present a solution that enables a database-as-a-service (DaaS) to auto-scale container sizes on behalf of its tenants. |
130 | GPL: A GPU-based Pipelined Query Processing Engine | Johns Paul, Jiong He, Bingsheng He | In this paper, we propose GPL, a novel pipelined query execution engine to improve the resource utilization of query co-processing on the GPU. |
131 | Towards a Hybrid Design for Fast Query Processing in DB2 with BLU Acceleration Using Graphical Processing Units: A Technology Demonstration | Sina Meraji, Berni Schiefer, Lan Pham, Lee Chu, Peter Kokosielis, Adam Storm, Wayne Young, Chang Ge, Geoffrey Ng, Kajan Kanagaratnam | In this paper, we show how we use Nvidia GPUs and host CPU cores for faster query processing in a DB2 database using BLU Acceleration (DB2’s column store technology). |
132 | An Experimental Comparison of Thirteen Relational Equi-Joins in Main Memory | Stefan Schuh, Xiao Chen, Jens Dittrich | In this paper we try to determine which main-memory equi-join algorithms perform best by experimentally comparing thirteen of them. |
133 | Top-k Relevant Semantic Place Retrieval on Spatial RDF Data | Jieming Shi, Dingming Wu, Nikos Mamoulis | In this work, we propose and study a novel location-based keyword search query on RDF data. |
134 | Local Similarity Search for Unstructured Text | Pei Wang, Chuan Xiao, Jianbin Qin, Wei Wang, Xiaoyang Zhang, Yoshiharu Ishikawa | In this paper, we study the problem of local similarity search to find partially replicated text. |
135 | Similarity Join over Array Data | Weijie Zhao, Florin Rusu, Bin Dong, Kesheng Wu | In this paper, we introduce a novel distributed similarity join operator for multi-dimensional arrays. |
136 | LazyLSH: Approximate Nearest Neighbor Search for Multiple Distance Functions with a Single Index | Yuxin Zheng, Qi Guo, Anthony K.H. Tung, Sai Wu | In this paper, we propose LazyLSH that answers approximate nearest neighbor queries for multiple Lp metrics with theoretical guarantees. |
137 | Set-based Similarity Search for Time Series | Jinglin Peng, Hongzhi Wang, Jianzhong Li, Hong Gao | In this paper, we propose a novel approach, STS3, that processes k-NN queries by transforming time series into sets and measuring their similarity with the Jaccard metric. |
138 | Range-based Obstructed Nearest Neighbor Queries | Huaijie Zhu, Xiaochun Yang, Bin Wang, Wang-Chien Lee | In this paper, we study a novel variant of obstructed nearest neighbor queries, namely, range-based obstructed nearest neighbor (RONN) search. |
139 | Rheem: Enabling Multi-Platform Task Execution | Divy Agrawal, Lamine Ba, Laure Berti-Equille, Sanjay Chawla, Ahmed Elmagarmid, Hossam Hammady, Yasser Idris, Zoi Kaoudi, Zuhair Khayyat, Sebastian Kruse, Mourad Ouzzani, Paolo Papotti, Jorge-Arnulfo Quiane-Ruiz, Nan Tang, Mohammed J. Zaki | We demonstrate the strengths of the system using real-world scenarios from three different applications: machine learning, data cleaning, and data fusion. |
140 | Emma in Action: Declarative Dataflows for Scalable Data Analysis | Alexander Alexandrov, Andreas Salzmann, Georgi Krastev, Asterios Katsifodimos, Volker Markl | To retain a sufficient level of abstraction and lower the barrier of entry for data scientists, projects like Spark and Flink currently offer domain-specific APIs on top of their parallel collection abstractions. |
141 | Wildfire: Concurrent Blazing Data Ingest and Analytics | Ronald Barber, Matt Huras, Guy Lohman, C. Mohan, Rene Mueller, Fatma Özcan, Hamid Pirahesh, Vijayshankar Raman, Richard Sidle, Oleg Sidorkin, Adam Storm, Yuanyuan Tian, Pinar Tözun | We demonstrate Hybrid Transactional and Analytics Processing (HTAP) on the Spark platform by the Wildfire prototype, which can ingest up to ~6 million inserts per second per node and simultaneously perform complex SQL analytics queries. |
142 | Efficient Query Processing on Many-core Architectures: A Case Study with Intel Xeon Phi Processor | Xuntao Cheng, Bingsheng He, Mian Lu, Chiew Tong Lau, Huynh Phung Huynh, Rick Siow Mong Goh | In this demonstration, we present PhiDB, an OLAP query processor with simultaneous multi-threading (SMT) capabilities on Xeon Phi as a case study for parallel database performance on future many-core processors. |
143 | ReproZip: Computational Reproducibility With Ease | Fernando Chirigati, Rémi Rampin, Dennis Shasha, Juliana Freire | We present ReproZip, the recommended packaging tool for the SIGMOD Reproducibility Review. |
144 | CLAMS: Bringing Quality to Data Lakes | Mina Farid, Alexandra Roatis, Ihab F. Ilyas, Hella-Franziska Hoffmann, Xu Chu | We present CLAMS, a system to discover and enforce expressive integrity constraints from large amounts of lake data with very limited schema information (e.g., represented as RDF triples). |
145 | FERARI: A Prototype for Complex Event Processing over Streaming Multi-cloud Platforms | Ioannis Flouris, Vasiliki Manikaki, Nikos Giatrakos, Antonios Deligiannakis, Minos Garofalakis, Michael Mock, Sebastian Bothe, Inna Skarbovsky, Fabiana Fournier, Marko Stajcer, Tomislav Krizan, Jonathan Yom-Tov, Taji Curin | In this demo, we present FERARI, a prototype that enables real-time Complex Event Processing (CEP) for large volume event data streams over distributed topologies. |
146 | Constance: An Intelligent Data Lake System | Rihan Hai, Sandra Geisler, Christoph Quix | We propose Constance, a Data Lake system with sophisticated metadata management over raw data extracted from heterogeneous data sources. |
147 | Exploring Privacy-Accuracy Tradeoffs using DPComp | Michael Hay, Ashwin Machanavajjhala, Gerome Miklau, Yan Chen, Dan Zhang, George Bissias | In this demonstration we present DPComp, a publicly-accessible web-based system, designed to support a broad community of users, including data analysts, privacy researchers, and data owners. |
148 | Interactive Search and Exploration of Waveform Data with Searchlight | Alexander Kalinin, Ugur Cetintemel, Stan Zdonik | Interactive Search and Exploration of Waveform Data with Searchlight |
149 | Ontology-Based Integration of Streaming and Static Relational Data with Optique | Evgeny Kharlamov, Sebastian Brandt, Ernesto Jimenez-Ruiz, Yannis Kotidis, Steffen Lamparter, Theofilos Mailis, Christian Neuenstadt, Özgür Özçep, Christoph Pinkel, Christoforos Svingos, Dmitriy Zheleznyakov, Ian Horrocks, Yannis Ioannidis, Ralf Moeller | In this work we show how Semantic Technologies implemented in our system Optique can simplify complex diagnostics by providing an abstraction layer, an ontology, that integrates heterogeneous data. |
150 | The CloudMdsQL Multistore System | Boyan Kolev, Carlyna Bondiombouy, Patrick Valduriez, Ricardo Jimenez-Peris, Raquel Pau, José Pereira | In this demonstration, we present the Cloud Multidatastore Query Language (CloudMdsQL) and its query engine. |
151 | ActiveClean: An Interactive Data Cleaning Framework For Modern Machine Learning | Sanjay Krishnan, Michael J. Franklin, Ken Goldberg, Jiannan Wang, Eugene Wu | We propose ActiveClean, a progressive framework for training Machine Learning models with data cleaning. |
152 | Wander Join: Online Aggregation for Joins | Feifei Li, Bin Wu, Ke Yi, Zhuoyue Zhao | We introduce a new approach, wander join, to the online aggregation problem by performing random walks over the underlying join graph. |
153 | PerNav: A Route Summarization Framework for Personalized Navigation | Yaguang Li, Han Su, Ugur Demiryurek, Bolong Zheng, Kai Zeng, Cyrus Shahabi | In this paper, we study PerNav, a route summarization framework for personalized navigation whose goal is to generate more intuitive and customized turn-by-turn directions based on user-generated content. |
154 | Making the Case for Query-by-Voice with EchoQuery | Gabriel Lyons, Vinh Tran, Carsten Binnig, Ugur Cetintemel, Tim Kraska | With this demonstration, we make the case for querying database systems using a voice-based interface, a new querying and interaction paradigm we call Query-by-Voice (QbV). |
155 | QUEPA: QUerying and Exploring a Polystore by Augmentation | Antonio Maccioni, Edoardo Basili, Riccardo Torlone | QUEPA implements a lightweight mechanism for data integration in the polystore and operates in a plug-and-play mode, reducing the need for ad-hoc configurations and for middleware layers involving standard APIs, unified query languages, or shared data models. |
156 | REACT: Context-Sensitive Recommendations for Data Analysis | Tova Milo, Amit Somech | In this demo we present REACT, a system that hooks into the analysis UI and provides users with personalized recommendations of analysis actions. |
157 | PerfEnforce Demonstration: Data Analytics with Performance Guarantees | Jennifer Ortiz, Brendan Lee, Magdalena Balazinska | We demonstrate PerfEnforce, a dynamic scaling engine for analytics services. |
158 | High-Performance Geospatial Analytics in HyPerSpace | Varun Pandey, Andreas Kipf, Dimitri Vorona, Tobias Mühlbauer, Thomas Neumann, Alfons Kemper | In this demonstration, we present HyPerSpace, an extension to the high-performance main-memory database system HyPer developed at the Technical University of Munich, capable of processing geospatial queries with sub-second latencies. |
159 | What Makes a Good Physical plan?: Experiencing Hardware-Conscious Query Optimization with Candomblé | Holger Pirk, Oscar Moll, Sam Madden | We developed a system called Candomblé that lets database performance engineers interactively examine, optimize, and evaluate query plans using a touch-based interface. |
160 | SnappyData: A Hybrid Transactional Analytical Store Built On Spark | Jags Ramnarayan, Barzan Mozafari, Sumedh Wale, Sudhir Menon, Neeraj Kumar, Hemant Bhanawat, Soubhik Chakraborty, Yogesh Mahajan, Rishitesh Mishra, Kishor Bachhav | With SnappyData, an open source platform, we propose a unified engine for real-time operational analytics, delivering stream analytics, OLTP and OLAP in a single integrated solution. |
161 | SourceSight: Enabling Effective Source Selection | Theodoros Rekatsinas, Amol Deshpande, Xin Luna Dong, Lise Getoor, Divesh Srivastava | In this demonstration we present SourceSight, a system that allows users to interactively explore a large number of heterogeneous data sources, and discover valuable sets of sources for diverse integration tasks. |
162 | BART in Action: Error Generation and Empirical Evaluations of Data-Cleaning Systems | Donatello Santoro, Patricia C. Arocena, Boris Glavic, Giansalvatore Mecca, Renée J. Miller, Paolo Papotti | Many automatic or semi-automatic data-repairing algorithms have been proposed in the last few years, each with its own strengths and weaknesses. |
163 | RxSpatial: Reactive Spatial Library for Real-Time Location Tracking and Processing | Youying Shi, Abdeltawab M. Hendawi, Hossam Fattah, Mohamed Ali | In this demo, we present RxSpatial, a real-time reactive spatial library that consists of (1) a front-end, a programming interface for developers who are familiar with the Reactive framework and the Microsoft Spatial Library, and (2) a back-end for processing spatial operations in a streaming fashion. |
164 | Web-based Benchmarks for Forecasting Systems: The ECAST Platform | Robert Ulbricht, Claudio Hartmann, Martin Hahmann, Hilko Donker, Wolfgang Lehner | We propose the ECAST online platform to provide web-based benchmarks for forecasting systems. |
165 | Energy Elasticity on Heterogeneous Hardware using Adaptive Resource Reconfiguration LIVE | Annett Ungethüm, Thomas Kissinger, Willi-Wolfram Mentzel, Dirk Habich, Wolfgang Lehner | In this demo, we introduce the concept of energy elasticity and propose the energy-control loop as an implementation of this concept. |
166 | QFix: Demonstrating Error Diagnosis in Query Histories | Xiaolan Wang, Alexandra Meliou, Eugene Wu | In this demo proposal, we outline the design of QFix, a query-centric framework that derives explanations and repairs for discrepancies in relational data based on potential errors in the queries that operated on the data. |
167 | CoDAR: Revealing the Generalized Procedure & Recommending Algorithms of Community Detection | Xiang Ying, Chaokun Wang, Meng Wang, Jeffrey Xu Yu, Jun Zhang | In this paper, we design a tool called CoDAR, which reveals the generalized procedure of community detection and monitors real-time structural changes of the network during the detection process. |
168 | DB-Risk: The Game of Global Database Placement | Victor Zakhary, Faisal Nawab, Divyakant Agrawal, Amr El Abbadi | We propose an optimization framework that automatically derives a geo-replication placement plan with the objective of minimizing latency. |
169 | Quegel: A General-Purpose System for Querying Big Graphs | Qizhen Zhang, Da Yan, James Cheng | In this demonstration, we introduce a general-purpose system for querying big graphs, called Quegel, which treats queries as first-class citizens in the design of its computing model. |
170 | Introduction to Spark 2.0 for Database Researchers | Michael Armbrust, Doug Bateman, Reynold Xin, Matei Zaharia | This tutorial covers the core APIs for using Spark 2.0, including DataFrames, Datasets, SQL, streaming and machine learning pipelines. |
171 | Design Tradeoffs of Data Access Methods | Manos Athanassoulis, Stratos Idreos | In this tutorial we survey recent developments in access method design and we place them in the design space where each approach focuses primarily on one or a subset of read performance, update performance, and memory utilization. |
172 | Data Cleaning: Overview and Emerging Challenges | Xu Chu, Ihab F. Ilyas, Sanjay Krishnan, Jiannan Wang | Data Cleaning: Overview and Emerging Challenges |
173 | Querying Geo-Textual Data: Spatial Keyword Queries and Beyond | Gao Cong, Christian S. Jensen | The tutorial offers an overview of the problems addressed in the literature on spatial keyword queries and of the pertinent concepts and techniques. |
174 | Provenance: On and Behind the Screens | Melanie Herschel, Marcel Hlawatsch | We present some fundamental concepts of visualization before discussing possible visualizations for provenance. |
175 | Microblogs Data Management Systems: Querying, Analysis, and Visualization | Mohamed F. Mokbel, Amr Magdy | In this tutorial, we give a 1.5-hour overview of microblog data management, analysis, visualization, and systems. |
176 | The Challenges of Global-scale Data Management | Faisal Nawab, Divyakant Agrawal, Amr El Abbadi | In this tutorial we survey recent developments in global-scale data management (GSDM), focusing on fundamental challenges and advancements as well as open research opportunities. |
177 | Semistructured Models, Queries and Algebras in the Big Data Era: Tutorial Summary | Yannis Papakonstantinou | The tutorial presents the fundamentals of the algebras while abstracting away modeling differences that are not applicable. |
178 | Automatic Entity Recognition and Typing in Massive Text Data | Xiang Ren, Ahmed El-Kishky, Heng Ji, Jiawei Han | In this tutorial, we introduce data-driven methods to recognize typed entities of interest in massive, domain-specific text corpora. |
179 | Big Graph Analytics Systems | Da Yan, Yingyi Bu, Yuanyuan Tian, Amol Deshpande, James Cheng | The topics covered in this tutorial include programming models and algorithm design, computation models, communication mechanisms, out-of-core support, fault tolerance, dynamic graph support, and so on. |
180 | Constructing Join Histograms from Histograms with q-error Guarantees | Kaleb Alway, Anisoara Nica | In this paper we present a novel construction algorithm for building a join histogram that accepts two single-column histograms over different attributes, each with q-error guarantees, and produces a histogram over the result of the join operation on these attributes. |
181 | Graph Summarization for Geo-correlated Trends Detection in Social Networks | Colin Biafore, Faisal Nawab | We tackle this problem by providing effective graph summarizations aimed at the application of geo-correlated trends detection in social networks. |
182 | M3: Scaling Up Machine Learning via Memory Mapping | Dezhi Fang, Duen Horng Chau | We propose to use virtual memory (memory mapping), which has shown strong potential for scaling up graph mining on a single machine, for general machine learning. |
183 | K-means Split Revisited: Well-grounded Approach and Experimental Evaluation | Valentin Grigorev, George Chernishev | In this paper we study an existing k-means node split algorithm. |
184 | Main Memory Adaptive Denormalization | Zezhou Liu, Stratos Idreos | We introduce adaptive denormalization for modern main memory systems. |
185 | Adaptive Data Skipping in Main-Memory Systems | Wilson Qin, Stratos Idreos | We introduce adaptive data skipping as a framework for structures and techniques that respond to a vast array of data distributions and query workloads. |
186 | Searching Web Data using MinHash LSH | BiChen Rao, Erkang Zhu | In this extended abstract, we explore the use of MinHash Locality Sensitive Hashing (MinHash LSH) to address the problem of indexing and searching Web data. |
187 | Research Contribution as a Measure of Influence | Lais M.A. Rocha, Mirella M. Moro | We propose the 3c-index that measures the influence degree of researchers by evaluating the links they establish between communities. |
188 | Vectorizing an In Situ Query Engine | Panagiotis Sioulas, Anastasia Ailamaki | In this paper, we investigate the effect of vector processing on raw data querying. |
189 | Exploring Visualization of Data Transforms | Larry Xu | We present the concept of "tweening" of resultsets as a method of incrementally visualizing data transformations, and explore approaches towards generating these resultset tweens. |
190 | Minimizing Average Regret Ratio in Database | Sepanta Zeighami, Raymond Chi-Wing Wong | We propose "average regret ratio" as a metric to measure users’ satisfaction after a user sees k selected points of a database, instead of all of the points in the database. |