Paper Digest: SIGMOD 2016 Highlights

June 16, 2016 (updated June 26, 2020)

The conference of the ACM Special Interest Group on Management of Data (SIGMOD) is one of the top venues for research on database management systems and data management technology.

To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights/summaries to quickly grasp the main idea of each paper.

If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.

Paper Digest Team
team@paperdigest.org

TABLE 1: SIGMOD 2016 Papers

#	Title	Authors	Highlight
1	Building Machine Learning Systems that Understand	Jeff Dean	In this talk, I will highlight some of the advances that have been made in deep learning and suggest some interesting directions for future research.
2	Learning Linear Regression Models over Factorized Joins	Maximilian Schleich, Dan Olteanu, Radu Ciucanu	We propose a new paradigm for computing batch gradient descent that exploits the factorized computation and representation of the training datasets, a rewriting of the regression objective function that decouples the computation of cofactors of model parameters from their convergence, and the commutativity of cofactor computation with relational union and projection.
3	To Join or Not to Join?: Thinking Twice about Joins before Feature Selection	Arun Kumar, Jeffrey Naughton, Jignesh M. Patel, Xiaojin Zhu	In this work, we show that the features brought in by key-foreign key joins can often be ignored without affecting ML accuracy significantly, i.e., we can "avoid joins safely."
4	Real-time Video Recommendation Exploration	Yanxiang Huang, Bin Cui, Jie Jiang, Kunqian Hong, Wenyu Zhang, Yiran Xie	To address the deficiencies of current recommendation systems, we introduce some new techniques to provide real-time and accurate recommendations to users in the video recommendation system of Tencent Inc.
5	Towards Globally Optimal Crowdsourcing Quality Management: The Uniform Worker Setting	Akash Das Sarma, Aditya Parameswaran, Jennifer Widom	In this paper, we focus on filtering, where tasks require the evaluation of a yes/no predicate, and rating, where tasks elicit integer scores from a finite domain.
6	Building the Enterprise Fabric for Big Data with Vertica and Spark Integration	Jeff LeFevre, Rui Liu, Cornelio Inigo, Lupita Paz, Edward Ma, Malu Castellanos, Meichun Hsu	In this paper, we present our initial efforts toward a solution that satisfies the above requirements by integrating the HPE Vertica enterprise database with Apache Spark’s open source big data computation engine.
7	Truss Decomposition of Probabilistic Graphs: Semantics and Algorithms	Xin Huang, Wei Lu, Laks V.S. Lakshmanan	In this paper, given a probabilistic graph G, a number k and a parameter γ ∈ (0,1], we define a (k,γ)-truss as a maximal connected subgraph H ⊆ G in which, for each edge, the probability that it is contained in at least (k-2) triangles is at least γ.
8	Efficient and Progressive Group Steiner Tree Search	Rong-Hua Li, Lu Qin, Jeffrey Xu Yu, Rui Mao	To overcome the limitations of existing approaches, we propose PrunedDP, an efficient and progressive group Steiner tree (GST) algorithm.
9	Publishing Attributed Social Graphs with Formal Privacy Guarantees	Zach Jorgensen, Ting Yu, Graham Cormode	We introduce an approach to release attributed social graphs under the strong guarantee of differential privacy.
10	Publishing Graph Degree Distribution with Node Differential Privacy	Wei-Yen Day, Ninghui Li, Min Lyu	In this paper, we investigate the problem of publishing the degree distribution of a graph under node-DP by exploring the projection approach to reduce the sensitivity.
11	Principled Evaluation of Differentially Private Algorithms using DPBench	Michael Hay, Ashwin Machanavajjhala, Gerome Miklau, Yan Chen, Dan Zhang	In this paper, we propose a set of evaluation principles which we argue are essential for sound evaluation of differentially private algorithms.
12	PrivTree: A Differentially Private Algorithm for Hierarchical Decompositions	Jun Zhang, Xiaokui Xiao, Xing Xie	To remedy the deficiency of existing solutions, we present PrivTree, a histogram construction algorithm that adopts hierarchical decomposition but completely eliminates the dependency on a pre-defined tree height h.
13	Adaptive Indexing over Encrypted Numeric Data	Panagiotis Karras, Artyom Nikitin, Muhammad Saad, Rudrika Bhatt, Denis Antyukhov, Stratos Idreos	In this paper, we propose and analyze a scheme for lightweight and indexable encryption, based on linear-algebra operations.
14	Practical Private Range Search Revisited	Ioannis Demertzis, Stavros Papadopoulos, Odysseas Papapetrou, Antonios Deligiannakis, Minos Garofalakis	In this paper, we take an interdisciplinary approach, which combines the rigor of security formulations and proofs with efficient data management techniques. We construct a wide set of novel schemes with realistic security/performance trade-offs, adopting the notion of Searchable Symmetric Encryption (SSE) primarily proposed for keyword search.
15	Privacy Preserving Subgraph Matching on Large Graphs in Cloud	Zhao Chang, Lei Zou, Feifei Li	To reduce the search space for a subgraph matching query, we propose a cost model to select the more effective label combinations.
16	The Snowflake Elastic Data Warehouse	Benoit Dageville, Thierry Cruanes, Marcin Zukowski, Vadim Antonov, Artin Avanes, Jon Bock, Jonathan Claybaugh, Daniel Engovatov, Martin Hentschel, Jiansheng Huang, Allison W. Lee, Ashish Motivala, Abdul Q. Munir, Steven Pelley, Peter Povinec, Greg Rahn, Spyridon Triantafyllis, Philipp Unterbrunner	In this paper, we describe the design of Snowflake and its novel multi-cluster, shared-data architecture.
17	Closing the Functional and Performance Gap between SQL and NoSQL	Zhen Hua Liu, Beda Hammerschmidt, Doug McMahon, Ying Liu, Hui Joe Chang	In this paper, we present enhancements to Oracle’s JSON data management in the upcoming 12cR2 release.
18	Have Your Data and Query It Too: From Key-Value Caching to Big Data Management	Dipti Borkar, Ravi Mayuram, Gerald Sangudi, Michael Carey	This paper describes the architectural changes needed to address the requirements posed by next-generation database applications.
19	Ambry: LinkedIn’s Scalable Geo-Distributed Object Store	Shadi A. Noghabi, Sriram Subramanian, Priyesh Narayanan, Sivabalan Narayanan, Gopalakrishna Holla, Mammad Zadeh, Tianwei Li, Indranil Gupta, Roy H. Campbell	We present Ambry, a production-quality system for storing large immutable data (called blobs).
20	SQL Schema Design: Foundations, Normal Forms, and Normalization	Henning Köhler, Sebastian Link	Unfortunately, relational normalization only works for idealized database instances in which duplicates and null markers are not present.
21	SQLShare: Results from a Multi-Year SQL-as-a-Service Experiment	Shrainik Jain, Dominik Moritz, Daniel Halperin, Bill Howe, Ed Lazowska	Our contributions include a system design for delivering databases into these contexts, a description of a public research query workload dataset released to advance research in analytic data systems, and an initial analysis of the workload that provides evidence of new use cases under-supported in existing systems.
22	Automatic Generation of Normalized Relational Schemas from Nested Key-Value Data	Michael DiScala, Daniel J. Abadi	In this paper we present an algorithm that automatically transforms the denormalized, nested data commonly found in NoSQL systems into traditional relational data that can be stored in a standard RDBMS.
23	Data Blocks: Hybrid OLTP and OLAP on Compressed Storage using both Vectorization and Compilation	Harald Lang, Tobias Mühlbauer, Florian Funke, Peter A. Boncz, Thomas Neumann, Alfons Kemper	This work aims at reducing the main-memory footprint in high performance hybrid OLTP & OLAP databases, while retaining high query performance and transactional throughput.
24	GeckoFTL: Scalable Flash Translation Techniques For Very Large Flash Devices	Niv Dayan, Philippe Bonnet, Stratos Idreos	In this paper, we identify a key component of the metadata called the Page Validity Bitmap (PVB) as the bottleneck.
25	SHARE Interface in Flash Storage for Relational and NoSQL Databases	Gihwan Oh, Chiyoung Seo, Ravi Mayuram, Yang-Suk Kee, Sang-Won Lee	In this paper, we propose a flash storage interface, SHARE.
26	Accelerating Relational Databases by Leveraging Remote Memory and RDMA	Feng Li, Sudipto Das, Manoj Syamala, Vivek R. Narasayya	We implemented the scenarios in the Microsoft SQL Server engine and present the first end-to-end study demonstrating the benefits of remote memory for a variety of micro-benchmarks and industry-standard benchmarks.
27	FPTree: A Hybrid SCM-DRAM Persistent and Concurrent B-Tree for Storage Class Memory	Ismail Oukid, Johan Lasperas, Anisoara Nica, Thomas Willhalm, Wolfgang Lehner	In this paper we propose a novel hybrid SCM-DRAM persistent and concurrent B-Tree, named Fingerprinting Persistent Tree (FPTree) that achieves similar performance to DRAM-based counterparts.
28	Micro-architectural Analysis of In-memory OLTP	Utku Sirin, Pinar Tözün, Danica Porobic, Anastasia Ailamaki	This paper sheds light on the micro-architectural behavior of in-memory database systems by analyzing it and contrasting it with the behavior of disk-based systems when running OLTP workloads.
29	iBFS: Concurrent Breadth-First Search on GPUs	Hang Liu, H. Howie Huang, Yang Hu	In this work, we focus on a special class of graph traversal algorithms, concurrent BFS, in which multiple breadth-first traversals are performed simultaneously on the same graph.
30	Tornado: A System For Real-Time Iterative Analysis Over Evolving Data	Xiaogang Shi, Bin Cui, Yingxia Shao, Yunhai Tong	In this paper, we propose a novel execution model to obtain timely results at given instants.
31	EmptyHeaded: A Relational Engine for Graph Processing	Christopher R. Aberger, Susan Tu, Kunle Olukotun, Christopher Ré	We present EmptyHeaded, a high-level engine that supports a rich datalog-like query language and achieves performance comparable to that of low-level engines.
32	GTS: A Fast and Scalable Graph Processing Method based on Streaming Topology to GPUs	Min-Soo Kim, Kyuhyeon An, Himchan Park, Hyunseok Seo, Jinwook Kim	Here, we propose GTS, a fast and scalable graph processing method that handles even RMAT32 (64 billion edges) very efficiently using only a single machine.
33	Graph Analytics Through Fine-Grained Parallelism	Zechao Shang, Feifei Li, Jeffrey Xu Yu, Zhiwei Zhang, Hong Cheng	Efficient graph analytics thus becomes an important subject of study.
34	Hybrid Pulling/Pushing for I/O-Efficient Distributed and Iterative Graph Computing	Zhigang Wang, Yu Gu, Yubin Bao, Ge Yu, Jeffrey Xu Yu	This paper proposes a hybrid solution to support switching between push and pull adaptively, to obtain optimal performance for distributed systems with disk-resident data in different scenarios.
35	Scalable Pattern Sharing on Event Streams	Medhabi Ray, Chuan Lei, Elke A. Rundensteiner	In this work, we design the SPASS framework, which successfully tackles demanding complex event processing (CEP) workloads.
36	How to Win a Hot Dog Eating Contest: Distributed Incremental View Maintenance with Batch Updates	Milos Nikolic, Mohammad Dashti, Christoph Koch	In this paper, we study low-latency incremental computation of complex SQL queries in both local and distributed streaming environments.
37	Sharing-Aware Outlier Analytics over High-Volume Data Streams	Lei Cao, Jiayuan Wang, Elke A. Rundensteiner	In this work we propose a sharing-aware multi-query execution strategy for outlier detection on data streams called SOP.
38	THEMIS: Fairness in Federated Stream Processing under Overload	Evangelia Kalyvianaki, Marco Fiscato, Theodoros Salonidis, Peter Pietzuch	We describe THEMIS, a federated stream processing system for resource-starved, multi-site deployments.
39	SABER: Window-Based Hybrid Stream Processing for Heterogeneous Architectures	Alexandros Koliousis, Matthias Weidlich, Raul Castro Fernandez, Alexander L. Wolf, Paolo Costa, Peter Pietzuch	We describe Saber, a hybrid high-performance relational stream processing engine for CPUs and GPGPUs.
40	Range Thresholding on Streams	Miao Qiao, Junhao Gan, Yufei Tao	We propose the first algorithm that breaks the quadratic barrier by reducing the computation cost dramatically to O(n + m), up to a polylogarithmic factor.
41	Bridging the Archipelago between Row-Stores and Column-Stores for Hybrid Workloads	Joy Arulraj, Andrew Pavlo, Prashanth Menon	To overcome this barrier, we present a hybrid DBMS architecture that efficiently supports varied workloads on the same database.
42	An Effective Syntax for Bounded Relational Queries	Yang Cao, Wenfei Fan	We provide quadratic-time algorithms to check the coverage of a query Q and to generate a bounded query plan for a covered Q.
43	Wander Join: Online Aggregation via Random Walks	Feifei Li, Bin Wu, Ke Yi, Zhuoyue Zhao	This paper proposes a new approach, the wander join algorithm, to the online aggregation problem by performing random walks over the underlying join graph.
44	Quickr: Lazily Approximating Complex AdHoc Queries in BigData Clusters	Srikanth Kandula, Anil Shanbhag, Aleksandar Vitorovic, Matthaios Olma, Robert Grandl, Surajit Chaudhuri, Bolin Ding	We present a system that approximates the answer to complex ad-hoc queries in big-data clusters by injecting samplers on-the-fly and without requiring pre-existing samples.
45	A Study of Sorting Algorithms on Approximate Memory	Shuang Chen, Shunning Jiang, Bingsheng He, Xueyan Tang	In this paper, we study one of the most basic database operations, sorting, on a hybrid storage system with both precise storage and approximate storage.
46	Distributed Wavelet Thresholding for Maximum Error Metrics	Ioannis Mytilinis, Dimitrios Tsoumakos, Nectarios Koziris	To that end, we present i) a general framework for the parallelization of existing dynamic programming algorithms, ii) a parallel version of one such DP-based algorithm and iii) a new parallel greedy algorithm for the problem.
47	Sample + Seek: Approximating Aggregates with Distribution Precision Guarantee	Bolin Ding, Silu Huang, Surajit Chaudhuri, Kaushik Chakrabarti, Chi Wang	We propose a novel sampling scheme called measure-biased sampling to address the former challenge.
48	Stop-and-Stare: Optimal Sampling Algorithms for Viral Marketing in Billion-scale Networks	Hung T. Nguyen, My T. Thai, Thang N. Dinh	In this paper, we propose SSA and D-SSA, two novel sampling frameworks for viral marketing problems based on influence maximization (IM).
49	Spheres of Influence for More Effective Viral Marketing	Yasir Mehmood, Francesco Bonchi, David García-Soriano	We thus formalize the Typical Cascade problem which requires, for a given source node s, to find the set of nodes minimizing the expected Jaccard distance to all the possible cascades from s.
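50	Continuous Influence Maximization: What Discounts Should We Offer to Social Network Users?	Yu Yang, Xiangbo Mao, Jian Pei, Xiaofei He	In this paper, we systematically tackle continuous influence maximization, i.e., the problem of deciding what discounts to offer to social network users.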
51	Holistic Influence Maximization: Combining Scalability and Efficiency with Opinion-Aware Models	Sainyam Galhotra, Akhil Arora, Shourya Roy	In this paper, we propose a holistic solution to the influence maximization (IM) problem. Under the OI model, we introduce a novel problem of Maximizing the Effective Opinion (MEO) of influenced users.
52	Potential and Pitfalls of Domain-Specific Information Extraction at Web Scale	Astrid Rheinländer, Mario Lehmann, Anja Kunkel, Jörg Meier, Ulf Leser	In this paper, we report our experiences from building such a system for comparing the "web view" on health-related topics with that derived from a controlled scientific corpus, i.e., Medline.
53	Robust and Noise Resistant Wrapper Induction	Tim Furche, Jinsong Guo, Sebastian Maneth, Christian Schallhart	We introduce a restricted wrapper language as a subset of XPath and show that even for this restricted language, inducing optimal queries according to a suitable scoring is infeasible.
54	Goods: Organizing Google’s Datasets	Alon Halevy, Flip Korn, Natalya F. Noy, Christopher Olston, Neoklis Polyzotis, Sudip Roy, Steven Euijong Whang	In this paper, we present GOODS, a project to rethink how we organize structured datasets at scale, in a setting where teams use diverse and often idiosyncratic ways to produce the datasets and where there is no centralized system for storing and querying them.
55	Multi-Source Uncertain Entity Resolution at Yad Vashem: Transforming Holocaust Victim Reports into People	Tomer Sagi, Avigdor Gal, Omer Barkol, Ruth Bergman, Alexander Avram	In this work we describe an entity resolution project performed at Yad Vashem, the central repository of Holocaust-era information.
56	A Hybrid Approach to Functional Dependency Discovery	Thorsten Papenbrock, Felix Naumann	For this reason, database research has proposed various algorithms for functional dependency discovery.
57	Ontological Pathfinding	Yang Chen, Sean Goldberg, Daisy Zhe Wang, Soumitra Siddharth Johri	We propose the Ontological Pathfinding algorithm (OP) that scales to web-scale knowledge bases via a series of parallelization and optimization techniques: a relational knowledge base model to apply inference rules in batches, a new rule mining algorithm that parallelizes the join queries, a novel partitioning algorithm to break the mining tasks into smaller independent sub-tasks, and a pruning strategy to eliminate unsound and resource-consuming rules before applying them.
58	Extracting Databases from Dark Data with DeepDive	Ce Zhang, Jaeho Shin, Christopher Ré, Michael Cafarella, Feng Niu	DeepDive is distinctive when compared to previous information extraction systems in its ability to obtain very high precision and recall at reasonable engineering cost; in a number of applications, we have used DeepDive to create databases with accuracy that meets that of human annotators.
59	Estimating the Impact of Unknown Unknowns on Aggregate Query Results	Yeounoh Chung, Michael Lind Mortensen, Carsten Binnig, Tim Kraska	In this work, we develop and analyze techniques to estimate the impact of the unknown data (a.k.a., unknown unknowns) on simple aggregate queries.
60	Constraint-Variance Tolerant Data Repairing	Shaoxu Song, Han Zhu, Jianmin Wang	To address the oversimplified and overrefined constraint inaccuracies, in this paper, we propose to repair data by allowing a small variation (with both predicate insertion and deletion) on the constraints.
61	Interactive and Deterministic Data Cleaning	Jian He, Enzo Veltri, Donatello Santoro, Guoliang Li, Giansalvatore Mecca, Paolo Papotti, Nan Tang	We present Falcon, an interactive, deterministic, and declarative data cleaning system, which uses SQL update queries as the language to repair data.
62	Sequential Data Cleaning: A Statistical Approach	Aoqian Zhang, Shaoxu Song, Jianmin Wang	We formalize the likelihood-based cleaning problem, show its NP-hardness, devise exact algorithms, and propose several approximate/heuristic methods to trade off effectiveness for efficiency.
63	Learning-Based Cleansing for Indoor RFID Data	Asif Iqbal Baba, Manfred Jaeger, Hua Lu, Torben Bach Pedersen, Wei-Shinn Ku, Xike Xie	We propose the Indoor RFID Multi-variate Hidden Markov Model (IR-MHMM) to capture the uncertainties of indoor RFID data as well as the correlation of moving object locations and object RFID readings.
64	PrivateClean: Data Cleaning and Differential Privacy	Sanjay Krishnan, Jiannan Wang, Michael J. Franklin, Ken Goldberg, Tim Kraska	This paper explores the link between data cleaning and differential privacy in a framework we call PrivateClean.
65	RDFind: Scalable Conditional Inclusion Dependency Discovery in RDF Datasets	Sebastian Kruse, Anja Jentzsch, Thorsten Papenbrock, Zoi Kaoudi, Jorge-Arnulfo Quiané-Ruiz, Felix Naumann	In our experimental evaluation, we show that RDFind is up to 419 times faster than the state of the art while considering a more general class of conditional inclusion dependencies (CINDs).
66	Cost-Effective Crowdsourced Entity Resolution: A Partial-Order Approach	Chengliang Chai, Guoliang Li, Jian Li, Dong Deng, Jianhua Feng	To address these problems, we propose a cost-effective crowdsourced entity resolution framework, which significantly reduces the monetary cost while keeping high quality.
67	Topic Exploration in Spatio-Temporal Document Collections	Kaiqi Zhao, Lisi Chen, Gao Cong	In this paper, we study the problem of efficiently mining topics from spatio-temporal documents within a user specified bounded region and timespan, to provide users with insights about events, trends, and public concerns within the specified region and time period.
68	ParTime: Parallel Temporal Aggregation	Markus Pilman, Martin Kaufmann, Florian Köhl, Donald Kossmann, Damien Profeta	This paper presents ParTime, a parallel algorithm for temporal aggregation.
69	Data Polygamy: The Many-Many Relationships among Urban Spatio-Temporal Data Sets	Fernando Chirigati, Harish Doraiswamy, Theodoros Damoulas, Juliana Freire	To address these challenges, we propose Data Polygamy, a scalable topology-based framework that allows users to query for statistically significant relationships between spatio-temporal data sets.
70	Distributed Evaluation of Top-k Temporal Joins	Julien Pilourdault, Vincent Leroy, Sihem Amer-Yahia	We show how to exploit the nature of temporal predicates and the properties of their associated scoring semantics to design TKIJ, an efficient query evaluation approach on a distributed Map-Reduce architecture.
71	AT-GIS: Highly Parallel Spatial Query Processing with Associative Transducers	Peter Ogden, David Thomas, Peter Pietzuch	Our goal is to fully exploit the parallelism offered by modern multi-core CPUs for parsing and query execution, thus providing the performance of a cluster with the resources of a single machine.
72	Towards Best Region Search for Data Exploration	Kaiyu Feng, Gao Cong, Sourav S. Bhowmick, Wen-Chih Peng, Chunyan Miao	This paper introduces a novel problem called the best region search (BRS) problem and proposes SliceBRS, an efficient algorithm that finds the exact answer to the BRS problem.
73	Simba: Efficient In-Memory Spatial Analytics	Dong Xie, Feifei Li, Bin Yao, Gefei Li, Liang Zhou, Minyi Guo	We present the Simba (Spatial In-Memory Big data Analytics) system that offers scalable and efficient in-memory spatial query processing and analytics for big spatial data.
74	Realtime Data Processing at Facebook	Guoqiang Jerry Chen, Janet L. Wiener, Shridhar Iyer, Anshul Jaiswal, Ran Lei, Nikhil Simha, Wei Wang, Kevin Wilfong, Tim Williamson, Serhat Yilmaz	In this paper, we identify five important design decisions that affect the ease of use, performance, fault tolerance, scalability, and correctness of realtime data processing systems.
75	SparkR: Scaling R Programs with Spark	Shivaram Venkataraman, Zongheng Yang, Davies Liu, Eric Liang, Hossein Falaki, Xiangrui Meng, Reynold Xin, Ali Ghodsi, Michael Franklin, Ion Stoica, Matei Zaharia	We present SparkR, an R package that provides a frontend to Apache Spark and uses Spark’s distributed computation engine to enable large scale data analysis from the R shell.
76	VectorH: Taking SQL-on-Hadoop to the Next Level	Andrei Costea, Adrian Ionescu, Bogdan Răducanu, Michał Switakowski, Cristian Bârca, Juliusz Sompolski, Alicja Łuszczak, Michał Szafrański, Giel de Nijs, Peter Boncz	We describe the changes made to single-server Vectorwise to turn it into a Hadoop-based MPP system, encompassing workload management, parallel query optimization and execution, HDFS storage, transaction processing and Spark integration.
77	Adaptive Logging: Optimizing Logging and Recovery Costs in Distributed In-memory Databases	Chang Yao, Divyakant Agrawal, Gang Chen, Beng Chin Ooi, Sai Wu	The percentage of data logging versus command logging becomes a tuning knob between the performance of transaction processing and recovery to meet different OLTP requirements, and a model is proposed to guide such tuning.
78	Big Data Analytics with Datalog Queries on Spark	Alexander Shkapsky, Mohan Yang, Matteo Interlandi, Hsuan Chiu, Tyson Condie, Carlo Zaniolo	Towards this goal, we propose compilation and optimization techniques that tackle the important problem of efficiently supporting recursion in Spark.
79	An Efficient MapReduce Cube Algorithm for Varied Data Distributions	Tova Milo, Eyal Altshuler	To address this problem, we consider cube computation in MapReduce, the popular paradigm for distributed big data processing, and present an efficient algorithm for computing cubes over large data sets.
80	Diversified Top-k Subgraph Querying in a Large Graph	Zhengwei Yang, Ada Wai-Chee Fu, Ruifeng Liu	In this work, we study the problem of top-k diversified subgraph querying that asks for a set of up to k subgraphs isomorphic to a given query graph, and that covers the largest number of vertices.
81	Graph Indexing for Shortest-Path Finding over Dynamic Sub-Graphs	Mohamed S. Hassan, Walid G. Aref, Ahmed M. Aly	This paper introduces Edge-Disjoint Partitioning (EDP, for short), a new technique for efficiently answering edge-constrained shortest path (ECSP) queries over dynamic graphs.
82	Efficient Subgraph Matching by Postponing Cartesian Products	Fei Bi, Lijun Chang, Xuemin Lin, Lu Qin, Wenjie Zhang	In this paper, we study the problem of subgraph matching that extracts all subgraph isomorphic embeddings of a query graph q in a large data graph G.
83	Adding Counting Quantifiers to Graph Patterns	Wenfei Fan, Yinghui Wu, Jingbo Xu	This paper proposes quantified graph patterns (QGPs), an extension of graph patterns by supporting simple counting quantifiers on edges.
84	DUALSIM: Parallel Subgraph Enumeration in a Massive Graph on a Single Machine	Hyeonji Kim, Juneyoung Lee, Sourav S. Bhowmick, Wook-Shin Han, JeongHoon Lee, Seongyun Ko, Moath H.A. Jarrah	In this paper, we design and implement a disk-based, single machine parallel subgraph enumeration solution called DualSim that can handle massive graphs without maintaining exponential numbers of partial results.
85	Distributed Set Reachability	Sairam Gurajada, Martin Theobald	In this paper, we focus on the efficient and scalable processing of set-reachability queries over a distributed, directed data graph.
86	Fast Multi-Column Sorting in Main-Memory Column-Stores	Wenjian Xu, Ziqiang Feng, Eric Lo	In this paper, we propose a new technique called "code massaging", which manipulates the bits across the columns so that the overall sorting time can be reduced by eliminating some rounds of sorting and/or by improving the degree of SIMD data level parallelism.
87	Elastic Pipelining in an In-Memory Database Cluster	Li Wang, Minqi Zhou, Zhenjie Zhang, Yin Yang, Aoying Zhou, Dina Bitton	To tackle this problem, we propose elastic pipelining, which makes it possible to optimize intra-node parallelism assignments in the pipelines based on the actual workload at runtime.
88	Page As You Go: Piecewise Columnar Access In SAP HANA	Reza Sherkat, Colin Florendo, Mihnea Andrei, Anil K. Goel, Anisoara Nica, Peter Bumbulis, Ivan Schreter, Günter Radestock, Christian Bensberg, Daniel Booss, Heiko Gerwens	As an alternative approach, we propose to reduce the unit of load and eviction from column to a contiguous portion of the in-memory columnar representation, which we call a page.
89	Hybrid Garbage Collection for Multi-Version Concurrency Control in SAP HANA	Juchang Lee, Hyungyu Shin, Chang Gyoo Park, Seongyun Ko, Jaeyun Noh, Yongjae Chuh, Wolfgang Stephan, Wook-Shin Han	In this paper, we present an efficient and effective garbage collector called HybridGC in SAP HANA.
90	UpBit: Scalable In-Memory Updatable Bitmap Indexing	Manos Athanassoulis, Zheng Yan, Stratos Idreos	In this paper, we propose scalable in-memory Updatable Bitmap indexing (UpBit), which offers efficient updates, without hurting read performance.
91	FluxQuery: An Execution Framework for Highly Interactive Query Workloads	Roee Ebenstein, Niranjan Kamat, Arnab Nandi	We propose a novel model to interpret the variability of likely queries in a workload.
92	iOLAP: Managing Uncertainty for Efficient Incremental OLAP	Kai Zeng, Sameer Agarwal, Ion Stoica	In this paper, we propose iOLAP, an incremental OLAP query engine that provides a smooth trade-off between query accuracy and latency, and fulfills a full spectrum of user requirements from approximate but timely query execution to a more traditional accurate query execution.
93	Dynamic Prefetching of Data Tiles for Interactive Visualization	Leilani Battle, Remco Chang, Michael Stonebraker	In this paper, we present ForeCache, a general-purpose tool for exploratory browsing of large datasets.
94	Expressive Query Construction through Direct Manipulation of Nested Relational Results	Eirik Bakke, David R. Karger	This paper presents the first visual query system to meet all three requirements in a single design.
95	Shasta: Interactive Reporting At Scale	Gokul Nath Babu Manoharan, Stephan Ellner, Karl Schnaitter, Sridatta Chegu, Alejandro Estrella-Balderrama, Stephan Gudmundson, Apurv Gupta, Ben Handy, Bart Samwel, Chad Whipkey, Larysa Aharkava, Himani Apte, Nitin Gangahar, Jun Xu, Shivakumar Venkataraman, Divyakant Agrawal, Jeffrey D. Ullman	We describe Shasta, a middleware system built at Google to support interactive reporting in complex user-facing applications related to Google’s Internet advertising business.
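
The highlight for paper 7 defines a (k,γ)-truss through a per-edge probability: the chance that an edge lies in at least (k-2) triangles. The sketch below shows one way that quantity could be computed for a single edge. It is an illustration under stated assumptions, not the authors' algorithm: it assumes edges exist independently, so that, given edge (u, v), each common neighbour w closes a triangle independently with probability p(u,w)·p(v,w), and the triangle count then follows a Poisson-binomial distribution whose tail a small dynamic program can sum. It also evaluates the probability over the whole input graph rather than over a candidate subgraph H. The helper names (build_adjacency, truss_edge_probability, satisfies_gamma) are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edge_probs):
    """Turn {(u, v): p} into an undirected adjacency map adj[u][v] = p."""
    adj = defaultdict(dict)
    for (u, v), p in edge_probs.items():
        adj[u][v] = p
        adj[v][u] = p
    return adj


def truss_edge_probability(adj, u, v, k):
    """Pr[edge (u, v) exists and lies in at least k - 2 triangles].

    Assumption (hypothetical model): edges are independent, so given (u, v)
    each common neighbour w closes a triangle independently with
    probability p(u, w) * p(v, w).
    """
    p_uv = adj[u][v]
    tri_probs = [adj[u][w] * adj[v][w] for w in adj[u] if w != v and w in adj[v]]
    # dist[t] = Pr[exactly t triangles | (u, v) exists] (Poisson-binomial DP)
    dist = [1.0]
    for q in tri_probs:
        nxt = [0.0] * (len(dist) + 1)
        for t, mass in enumerate(dist):
            nxt[t] += mass * (1.0 - q)  # triangle through w absent
            nxt[t + 1] += mass * q      # triangle through w present
        dist = nxt
    need = max(k - 2, 0)
    return p_uv * sum(dist[need:])  # empty slice (need too large) gives 0


def satisfies_gamma(adj, u, v, k, gamma):
    """Per-edge (k, gamma) condition, evaluated on the graph held in adj."""
    return truss_edge_probability(adj, u, v, k) >= gamma


if __name__ == "__main__":
    edges = {("a", "b"): 0.9, ("a", "c"): 0.9, ("b", "c"): 0.9,
             ("a", "d"): 0.5, ("b", "d"): 0.5}
    adj = build_adjacency(edges)
    print(round(truss_edge_probability(adj, "a", "b", 3), 5))  # 0.77175
```

For example, in a triangle a-b-c whose edges each have probability 0.9, plus a node d attached to a and b with probability 0.5, edge (a, b) lies in at least one triangle (k = 3) with probability 0.9 × (1 − 0.19 × 0.75) ≈ 0.772. The dynamic program keeps the cost linear in the number of common neighbours per triangle-count level, rather than enumerating every subset of triangles.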
96	Datometry Hyper-Q: Bridging the Gap Between Real-Time and Historical Analytics	Lyublena Antova, Rhonda Baldwin, Derrick Bryant, Tuan Cao, Michael Duller, John Eshleman, Zhongxian Gu, Entong Shen, Mohamed A. Soliman, F. Michael Waas	In this paper we present Hyper-Q, a data virtualization platform that overcomes the chasm.
97	Time Adaptive Sketches (Ada-Sketches) for Summarizing Data Streams	Anshumali Shrivastava, Arnd Christian König, Mikhail Bilenko	In this work, we describe a new method, Time-adaptive Sketches (Ada-sketch), that overcomes these limitations while extending and providing a strict generalization of several popular sketching algorithms.
98	Streaming Algorithms for Robust Distinct Elements	Di Chen, Qin Zhang	In this paper, we formalize the problem of robust distinct elements, and develop space and time-efficient streaming algorithms for datasets in the Euclidean space, using a novel technique we call bucket sampling.
99	Augmented Sketch: Faster and More Accurate Stream Processing	Pratanu Roy, Arijit Khan, Gustavo Alonso	Approximation algorithms are often used to estimate the frequency of items on high-volume, fast data streams.
100	Matrix Sketching Over Sliding Windows	Zhewei Wei, Xuancheng Liu, Feifei Li, Shuo Shang, Xiaoyong Du, Ji-Rong Wen	With this observation, we present three general frameworks for matrix sketching on sliding windows.
101	Graph Stream Summarization: From Big Bang to Big Crunch	Nan Tang, Qing Chen, Prasenjit Mitra	We present TCM, a novel generalized graph stream summary.
102	Scalable Approximate Query Tracking over Highly Distributed Data Streams	Nikos Giatrakos, Antonios Deligiannakis, Minos Garofalakis	In this paper, we propose novel techniques that effectively tackle the aforementioned scalability problems by exploiting a carefully designed sample of the remote sites for efficient approximate query tracking.
103	A Hybrid B+-tree as Solution for In-Memory Indexing on CPU-GPU Heterogeneous Computing Platforms	Amirhesam Shahvarani, Hans-Arno Jacobsen	In this paper, we propose a novel design for a B+-tree based on the heterogeneous computing platform and the hybrid memory architecture found in GPUs.
104	Low-Overhead Asynchronous Checkpointing in Main-Memory Database Systems	Kun Ren, Thaddeus Diamond, Daniel J. Abadi, Alexander Thomson	This paper presents Checkpointing Asynchronously using Logical Consistency (CALC), a lightweight, asynchronous technique for capturing database snapshots that does not require a physical point of consistency to create a checkpoint, and avoids conspicuous latency spikes incurred by other database snapshotting schemes.
105	T-Part: Partitioning of Transactions for Forward-Pushing in Deterministic Database Systems	Shan-Hung Wu, Tsai-Yu Feng, Meng-Kai Liao, Shao-Kan Pi, Yu-Shan Lin	In this paper, we present T-Part, a transaction execution engine that partitions transactions in a deterministic database system to handle unforeseeable workloads or workloads whose data are hard to partition.
106	Reducing the Storage Overhead of Main-Memory OLTP Databases with Hybrid Indexes	Huanchen Zhang, David G. Andersen, Andrew Pavlo, Michael Kaminsky, Lin Ma, Rui Shen	To reduce the storage overhead of indexes, we propose a two-stage index: the first stage ingests all incoming entries and is kept small for fast read and write operations.
107	Design Principles for Scaling Multi-core OLTP Under High Contention	Kun Ren, Jose M. Faleiro, Daniel J. Abadi	In this paper we identify two prevalent design principles that limit the multi-core scalability of many (but not all) transactional database systems on contended workloads: the multi-purpose nature of execution threads in these systems, and the lack of advance planning of data access.
108	DBSherlock: A Performance Diagnostic Tool for Transactional Databases	Dong Young Yoon, Ning Niu, Barzan Mozafari	This paper presents a practical tool for assisting DBAs in quickly and reliably diagnosing performance problems in an OLTP database.
109	TARDiS: A Branch-and-Merge Approach To Weak Consistency	Natacha Crooks, Youer Pu, Nancy Estrada, Trinabh Gupta, Lorenzo Alvisi, Allen Clement	This paper presents the design, implementation, and evaluation of TARDiS (Transactional Asynchronously Replicated Divergent Store), a transactional key-value store explicitly designed for weakly-consistent systems.
110	TicToc: Time Traveling Optimistic Concurrency Control	Xiangyao Yu, Andrew Pavlo, Daniel Sanchez, Srinivas Devadas	In this paper we present TicToc, a new optimistic concurrency control algorithm that avoids the scalability and concurrency bottlenecks of prior timestamp ordering (T/O) schemes.
111	Scaling Multicore Databases via Constrained Parallel Execution	Zhaoguo Wang, Shuai Mu, Yang Cui, Han Yi, Haibo Chen, Jinyang Li	In this paper, we describe a new concurrency control scheme, interleaving constrained concurrency control (IC3), which provides serializability while allowing parallel execution of certain conflicting transactions.
112	Towards a Non-2PC Transaction Management in Distributed Database Systems	Qian Lin, Pengfei Chang, Gang Chen, Beng Chin Ooi, Kian-Lee Tan, Zhengkui Wang	In this paper, we propose a transaction management scheme called LEAP to avoid the 2PC protocol within distributed transaction processing.
113	ERMIA: Fast Memory-Optimized Database System for Heterogeneous Workloads	Kangnyeon Kim, Tianzheng Wang, Ryan Johnson, Ippokratis Pandis	In this paper, we present ERMIA, a memory-optimized database system built from scratch to handle heterogeneous workloads.
114	Transaction Healing: Scaling Optimistic Concurrency Control on Multicores	Yingjun Wu, Chee-Yong Chan, Kian-Lee Tan	In this paper, we propose a new concurrency-control mechanism, called transaction healing, that exploits program semantics to scale the conventional OCC towards dozens of cores even under highly contended workloads.
115	Enabling Incremental Query Re-Optimization	Mengmeng Liu, Zachary G. Ives, Boon Thau Loo	We investigate how to build a cost-based optimizer that recomputes the optimal plan incrementally given new cost information, much as a stream engine constantly updates its outputs given new data.
116	Sampling-Based Query Re-Optimization	Wentao Wu, Jeffrey F. Naughton, Harneet Singh	In this paper, we propose a low-cost post-processing step that can take a plan produced by the optimizer, detect when the optimizer has likely made a mistake (for example, in its cardinality estimates), and take steps to fix it.
117	A Fast Randomized Algorithm for Multi-Objective Query Optimization	Immanuel Trummer, Christoph Koch	In this work, we present the first algorithm for multi-objective query optimization whose complexity is polynomial in the query size.
118	Operator and Query Progress Estimation in Microsoft SQL Server Live Query Statistics	Kukjin Lee, Arnd Christian König, Vivek Narasayya, Bolin Ding, Surajit Chaudhuri, Brent Ellwein, Alexey Eksarevskiy, Manbeen Kohli, Jacob Wyant, Praneeta Prakash, Rimma Nehme, Jiexing Li, Jeff Naughton	We describe the design and implementation of the new Live Query Statistics (LQS) feature in Microsoft SQL Server 2016.
119	Optimization of Nested Queries using the NF2 Algebra	Jürgen Hölsch, Michael Grossniklaus, Marc H. Scholl	In this paper, we argue that the NF2 (non-first normal form) algebra, which was originally designed to process nested tables, is a better approach to nested query optimization as it fulfills two key requirements.
120	Extracting Equivalent SQL from Imperative Code in Database Applications	K. Venkatesh Emani, Karthik Ramachandra, Subhro Bhattacharya, S. Sudarshan	In this paper we address the problem of deriving equivalent SQL from imperative code by extracting a concise algebraic representation of (parts of) an application, which may include imperative code as well as SQL queries.
121	Generating Preview Tables for Entity Graphs	Ning Yan, Sona Hasani, Abolfazl Asudeh, Chengkai Li	We propose methods to produce preview tables for compact presentation of important entity types and relationships in entity graphs.
122	Speedup Graph Processing by Graph Ordering	Hao Wei, Jeffrey Xu Yu, Can Lu, Xuemin Lin	In this paper, we focus on CPU speedup for graph computing in general by reducing the CPU cache miss ratio for different graph algorithms.
123	ROLL: Fast In-Memory Generation of Gigantic Scale-free Networks	Ali Hadian, Sadegh Nobari, Behrooz Minaei-Bidgoli, Qiang Qu	In this paper, we propose ROLL-tree, a fast in-memory roulette wheel data structure that accelerates Barabási-Albert (BA) network generation by exploiting the statistical behaviors of the underlying growth model.
124	Functional Dependencies for Graphs	Wenfei Fan, Yinghui Wu, Jingbo Xu	We propose a class of functional dependencies for graphs, referred to as GFDs.
125	SLING: A Near-Optimal Index Structure for SimRank	Boyu Tian, Xiaokui Xiao	Scalable SimRank computation has been the subject of extensive research for more than a decade, and yet, none of the existing solutions can efficiently derive SimRank scores on large graphs with provable accuracy guarantees.
126	Query Planning for Evaluating SPARQL Property Paths	Nikolay Yakovets, Parke Godfrey, Jarek Gryz	The extension of SPARQL in version 1.1 with property paths offers a type of regular path query for RDF graph databases.
127	Robust Query Processing in Co-Processor-accelerated Databases	Sebastian Breß, Henning Funke, Jens Teubner	In this paper, we identify two effects that limit performance when co-processor resources become scarce.
128	How to Architect a Query Compiler	Amir Shaikhha, Yannis Klonatos, Lionel Parreaux, Lewis Brown, Mohammad Dashti, Christoph Koch	We propose to use a stack of multiple DSLs on different levels of abstraction with lowering in multiple steps to make query compilers easier to build and extend, ultimately allowing us to create more convincing and sustainable compiler-based data management systems.
129	Automated Demand-driven Resource Scaling in Relational Database-as-a-Service	Sudipto Das, Feng Li, Vivek R. Narasayya, Arnd Christian König	We present a solution to enable a DaaS to auto-scale container sizes on behalf of its tenants.
130	GPL: A GPU-based Pipelined Query Processing Engine	Johns Paul, Jiong He, Bingsheng He	In this paper, we propose GPL, a novel pipelined query execution engine to improve the resource utilization of query co-processing on the GPU.
131	Towards a Hybrid Design for Fast Query Processing in DB2 with BLU Acceleration Using Graphical Processing Units: A Technology Demonstration	Sina Meraji, Berni Schiefer, Lan Pham, Lee Chu, Peter Kokosielis, Adam Storm, Wayne Young, Chang Ge, Geoffrey Ng, Kajan Kanagaratnam	In this paper, we show how we use Nvidia GPUs and host CPU cores for faster query processing in a DB2 database using BLU Acceleration (DB2’s column store technology).
132	An Experimental Comparison of Thirteen Relational Equi-Joins in Main Memory	Stefan Schuh, Xiao Chen, Jens Dittrich	In this paper we experimentally compare thirteen relational equi-join algorithms in main memory to determine which one to use.
133	Top-k Relevant Semantic Place Retrieval on Spatial RDF Data	Jieming Shi, Dingming Wu, Nikos Mamoulis	In this work, we propose and study a novel location-based keyword search query on RDF data.
134	Local Similarity Search for Unstructured Text	Pei Wang, Chuan Xiao, Jianbin Qin, Wei Wang, Xiaoyang Zhang, Yoshiharu Ishikawa	In this paper, we study the problem of local similarity search to find partially replicated text.
135	Similarity Join over Array Data	Weijie Zhao, Florin Rusu, Bin Dong, Kesheng Wu	In this paper, we introduce a novel distributed similarity join operator for multi-dimensional arrays.
136	LazyLSH: Approximate Nearest Neighbor Search for Multiple Distance Functions with a Single Index	Yuxin Zheng, Qi Guo, Anthony K.H. Tung, Sai Wu	In this paper, we propose LazyLSH that answers approximate nearest neighbor queries for multiple Lp metrics with theoretical guarantees.
137	Set-based Similarity Search for Time Series	Jinglin Peng, Hongzhi Wang, Jianzhong Li, Hong Gao	In this paper, we propose a novel approach, STS3, to process k-NN queries by transforming time series into sets and measuring their similarity under the Jaccard metric.
138	Range-based Obstructed Nearest Neighbor Queries	Huaijie Zhu, Xiaochun Yang, Bin Wang, Wang-Chien Lee	In this paper, we study a novel variant of obstructed nearest neighbor queries, namely, range-based obstructed nearest neighbor (RONN) search.
139	Rheem: Enabling Multi-Platform Task Execution	Divy Agrawal, Lamine Ba, Laure Berti-Equille, Sanjay Chawla, Ahmed Elmagarmid, Hossam Hammady, Yasser Idris, Zoi Kaoudi, Zuhair Khayyat, Sebastian Kruse, Mourad Ouzzani, Paolo Papotti, Jorge-Arnulfo Quiane-Ruiz, Nan Tang, Mohammed J. Zaki	We demonstrate the strengths of the system using real-world scenarios from three different applications: machine learning, data cleaning, and data fusion.
140	Emma in Action: Declarative Dataflows for Scalable Data Analysis	Alexander Alexandrov, Andreas Salzmann, Georgi Krastev, Asterios Katsifodimos, Volker Markl	To retain a sufficient level of abstraction and lower the barrier of entry for data scientists, projects like Spark and Flink currently offer domain-specific APIs on top of their parallel collection abstractions.
141	Wildfire: Concurrent Blazing Data Ingest and Analytics	Ronald Barber, Matt Huras, Guy Lohman, C. Mohan, Rene Mueller, Fatma Özcan, Hamid Pirahesh, Vijayshankar Raman, Richard Sidle, Oleg Sidorkin, Adam Storm, Yuanyuan Tian, Pinar Tözun	We demonstrate Hybrid Transactional and Analytics Processing (HTAP) on the Spark platform by the Wildfire prototype, which can ingest up to ~6 million inserts per second per node and simultaneously perform complex SQL analytics queries.
142	Efficient Query Processing on Many-core Architectures: A Case Study with Intel Xeon Phi Processor	Xuntao Cheng, Bingsheng He, Mian Lu, Chiew Tong Lau, Huynh Phung Huynh, Rick Siow Mong Goh	In this demonstration, we present PhiDB, an OLAP query processor with simultaneous multi-threading (SMT) capabilities on Xeon Phi as a case study for parallel database performance on future many-core processors.
143	ReproZip: Computational Reproducibility With Ease	Fernando Chirigati, Rémi Rampin, Dennis Shasha, Juliana Freire	We present ReproZip, the recommended packaging tool for the SIGMOD Reproducibility Review.
144	CLAMS: Bringing Quality to Data Lakes	Mina Farid, Alexandra Roatis, Ihab F. Ilyas, Hella-Franziska Hoffmann, Xu Chu	We present CLAMS, a system to discover and enforce expressive integrity constraints from large amounts of lake data with very limited schema information (e.g., represented as RDF triples).
145	FERARI: A Prototype for Complex Event Processing over Streaming Multi-cloud Platforms	Ioannis Flouris, Vasiliki Manikaki, Nikos Giatrakos, Antonios Deligiannakis, Minos Garofalakis, Michael Mock, Sebastian Bothe, Inna Skarbovsky, Fabiana Fournier, Marko Stajcer, Tomislav Krizan, Jonathan Yom-Tov, Taji Curin	In this demo, we present FERARI, a prototype that enables real-time Complex Event Processing (CEP) for large volume event data streams over distributed topologies.
146	Constance: An Intelligent Data Lake System	Rihan Hai, Sandra Geisler, Christoph Quix	We propose Constance, a Data Lake system with sophisticated metadata management over raw data extracted from heterogeneous data sources.
147	Exploring Privacy-Accuracy Tradeoffs using DPComp	Michael Hay, Ashwin Machanavajjhala, Gerome Miklau, Yan Chen, Dan Zhang, George Bissias	In this demonstration we present DPComp, a publicly-accessible web-based system, designed to support a broad community of users, including data analysts, privacy researchers, and data owners.
148	Interactive Search and Exploration of Waveform Data with Searchlight	Alexander Kalinin, Ugur Cetintemel, Stan Zdonik	Interactive Search and Exploration of Waveform Data with Searchlight
149	Ontology-Based Integration of Streaming and Static Relational Data with Optique	Evgeny Kharlamov, Sebastian Brandt, Ernesto Jimenez-Ruiz, Yannis Kotidis, Steffen Lamparter, Theofilos Mailis, Christian Neuenstadt, Özgür Özçep, Christoph Pinkel, Christoforos Svingos, Dmitriy Zheleznyakov, Ian Horrocks, Yannis Ioannidis, Ralf Moeller	In this work we show how Semantic Technologies implemented in our system Optique can simplify complex diagnostic tasks by providing an abstraction layer (an ontology) that integrates heterogeneous data.
150	The CloudMdsQL Multistore System	Boyan Kolev, Carlyna Bondiombouy, Patrick Valduriez, Ricardo Jimenez-Peris, Raquel Pau, José Pereira	In this demonstration, we present the Cloud Multidatastore Query Language (CloudMdsQL) and its query engine.
151	ActiveClean: An Interactive Data Cleaning Framework For Modern Machine Learning	Sanjay Krishnan, Michael J. Franklin, Ken Goldberg, Jiannan Wang, Eugene Wu	We propose ActiveClean, a progressive framework for training machine learning models with data cleaning.
152	Wander Join: Online Aggregation for Joins	Feifei Li, Bin Wu, Ke Yi, Zhuoyue Zhao	We introduce a new approach, wander join, to the online aggregation problem by performing random walks over the underlying join graph.
153	PerNav: A Route Summarization Framework for Personalized Navigation	Yaguang Li, Han Su, Ugur Demiryurek, Bolong Zheng, Kai Zeng, Cyrus Shahabi	In this paper, we study PerNav, a route summarization framework for personalized navigation whose goal is to generate more intuitive and customized turn-by-turn directions based on user-generated content.
154	Making the Case for Query-by-Voice with EchoQuery	Gabriel Lyons, Vinh Tran, Carsten Binnig, Ugur Cetintemel, Tim Kraska	With this demonstration, we make the case for querying database systems using a voice-based interface, a new querying and interaction paradigm we call Query-by-Voice (QbV).
155	QUEPA: QUerying and Exploring a Polystore by Augmentation	Antonio Maccioni, Edoardo Basili, Riccardo Torlone	QUEPA implements a lightweight mechanism for data integration in the polystore and operates in plug-and-play mode, reducing the need for ad hoc configurations and for middleware layers involving standard APIs, unified query languages, or shared data models.
156	REACT: Context-Sensitive Recommendations for Data Analysis	Tova Milo, Amit Somech	In this demo we present REACT, a system that hooks into the analysis UI and provides users with personalized recommendations of analysis actions.
157	PerfEnforce Demonstration: Data Analytics with Performance Guarantees	Jennifer Ortiz, Brendan Lee, Magdalena Balazinska	We demonstrate PerfEnforce, a dynamic scaling engine for analytics services.
158	High-Performance Geospatial Analytics in HyPerSpace	Varun Pandey, Andreas Kipf, Dimitri Vorona, Tobias Mühlbauer, Thomas Neumann, Alfons Kemper	In this demonstration, we present HyPerSpace, an extension to the high-performance main-memory database system HyPer developed at the Technical University of Munich, capable of processing geospatial queries with sub-second latencies.
159	What Makes a Good Physical plan?: Experiencing Hardware-Conscious Query Optimization with Candomblé	Holger Pirk, Oscar Moll, Sam Madden	We developed a system called Candomblé that lets database performance engineers interactively examine, optimize, and evaluate query plans using a touch-based interface.
160	SnappyData: A Hybrid Transactional Analytical Store Built On Spark	Jags Ramnarayan, Barzan Mozafari, Sumedh Wale, Sudhir Menon, Neeraj Kumar, Hemant Bhanawat, Soubhik Chakraborty, Yogesh Mahajan, Rishitesh Mishra, Kishor Bachhav	With SnappyData, an open source platform, we propose a unified engine for real-time operational analytics, delivering stream analytics, OLTP and OLAP in a single integrated solution.
161	SourceSight: Enabling Effective Source Selection	Theodoros Rekatsinas, Amol Deshpande, Xin Luna Dong, Lise Getoor, Divesh Srivastava	In this demonstration we present SourceSight, a system that allows users to interactively explore a large number of heterogeneous data sources and discover valuable sets of sources for diverse integration tasks.
162	BART in Action: Error Generation and Empirical Evaluations of Data-Cleaning Systems	Donatello Santoro, Patricia C. Arocena, Boris Glavic, Giansalvatore Mecca, Renée J. Miller, Paolo Papotti	Many automatic or semi-automatic data-repairing algorithms have been proposed in the last few years, each with its own strengths and weaknesses.
163	RxSpatial: Reactive Spatial Library for Real-Time Location Tracking and Processing	Youying Shi, Abdeltawab M. Hendawi, Hossam Fattah, Mohamed Ali	In this demo, we present RxSpatial, a real-time reactive spatial library that consists of (1) a front end, a programming interface for developers who are familiar with the Reactive framework and the Microsoft Spatial Library, and (2) a back end for processing spatial operations in a streaming fashion.
164	Web-based Benchmarks for Forecasting Systems: The ECAST Platform	Robert Ulbricht, Claudio Hartmann, Martin Hahmann, Hilko Donker, Wolfgang Lehner	We propose ECAST, an online platform for web-based benchmarking of forecasting systems.
165	Energy Elasticity on Heterogeneous Hardware using Adaptive Resource Reconfiguration LIVE	Annett Ungethüm, Thomas Kissinger, Willi-Wolfram Mentzel, Dirk Habich, Wolfgang Lehner	In this demo, we introduce the concept of energy elasticity and propose the energy-control loop as an implementation of this concept.
166	QFix: Demonstrating Error Diagnosis in Query Histories	Xiaolan Wang, Alexandra Meliou, Eugene Wu	In this demo proposal, we outline the design of QFix, a query-centric framework that derives explanations and repairs for discrepancies in relational data based on potential errors in the queries that operated on the data.
167	CoDAR: Revealing the Generalized Procedure & Recommending Algorithms of Community Detection	Xiang Ying, Chaokun Wang, Meng Wang, Jeffrey Xu Yu, Jun Zhang	In this paper, we design a tool called CoDAR, which reveals the generalized procedure of community detection and monitors real-time structural changes of the network during the detection process.
168	DB-Risk: The Game of Global Database Placement	Victor Zakhary, Faisal Nawab, Divyakant Agrawal, Amr El Abbadi	We propose an optimization framework that automatically derives a geo-replication placement plan with the objective of minimizing latency.
169	Quegel: A General-Purpose System for Querying Big Graphs	Qizhen Zhang, Da Yan, James Cheng	In this demonstration, we introduce a general-purpose system for querying big graphs, called Quegel, which treats queries as first-class citizens in the design of its computing model.
170	Introduction to Spark 2.0 for Database Researchers	Michael Armbrust, Doug Bateman, Reynold Xin, Matei Zaharia	This tutorial covers the core APIs for using Spark 2.0, including DataFrames, Datasets, SQL, streaming and machine learning pipelines.
171	Design Tradeoffs of Data Access Methods	Manos Athanassoulis, Stratos Idreos	In this tutorial we survey recent developments in access method design and we place them in the design space where each approach focuses primarily on one or a subset of read performance, update performance, and memory utilization.
172	Data Cleaning: Overview and Emerging Challenges	Xu Chu, Ihab F. Ilyas, Sanjay Krishnan, Jiannan Wang	Data Cleaning: Overview and Emerging Challenges
173	Querying Geo-Textual Data: Spatial Keyword Queries and Beyond	Gao Cong, Christian S. Jensen	The tutorial offers an overview of the problems addressed in the literature on spatial keyword queries, as well as pertinent concepts and techniques.
174	Provenance: On and Behind the Screens	Melanie Herschel, Marcel Hlawatsch	We present some fundamental concepts of visualization before discussing possible visualizations for provenance.
175	Microblogs Data Management Systems: Querying, Analysis, and Visualization	Mohamed F. Mokbel, Amr Magdy	In this tutorial, we give a 1.5-hour overview of microblog data management, analysis, visualization, and systems.
176	The Challenges of Global-scale Data Management	Faisal Nawab, Divyakant Agrawal, Amr El Abbadi	In this tutorial we survey recent developments in global-scale data management (GSDM), focusing on fundamental challenges and advancements as well as open research opportunities.
177	Semistructured Models, Queries and Algebras in the Big Data Era: Tutorial Summary	Yannis Papakonstantinou	The tutorial presents the algebras’ fundamentals while abstracting away modeling differences that are not applicable.
178	Automatic Entity Recognition and Typing in Massive Text Data	Xiang Ren, Ahmed El-Kishky, Heng Ji, Jiawei Han	In this tutorial, we introduce data-driven methods to recognize typed entities of interest in massive, domain-specific text corpora.
179	Big Graph Analytics Systems	Da Yan, Yingyi Bu, Yuanyuan Tian, Amol Deshpande, James Cheng	The topics covered in this tutorial include programming models and algorithm design, computation models, communication mechanisms, out-of-core support, fault tolerance, dynamic graph support, and so on.
180	Constructing Join Histograms from Histograms with q-error Guarantees	Kaleb Alway, Anisoara Nica	In this paper we present a novel construction algorithm for building a join histogram that accepts two single-column histograms over different attributes, each with q-error guarantees, and produces a histogram over the result of the join operation on these attributes.
181	Graph Summarization for Geo-correlated Trends Detection in Social Networks	Colin Biafore, Faisal Nawab	We provide effective graph summarizations aimed at detecting geo-correlated trends in social networks.
182	M3: Scaling Up Machine Learning via Memory Mapping	Dezhi Fang, Duen Horng Chau	We propose applying memory mapping to scale up general machine learning.
183	K-means Split Revisited: Well-grounded Approach and Experimental Evaluation	Valentin Grigorev, George Chernishev	In this paper we study an existing k-means node split algorithm.
184	Main Memory Adaptive Denormalization	Zezhou Liu, Stratos Idreos	We introduce adaptive denormalization for modern main memory systems.
185	Adaptive Data Skipping in Main-Memory Systems	Wilson Qin, Stratos Idreos	We introduce adaptive data skipping as a framework for structures and techniques that respond to a vast array of data distributions and query workloads.
186	Searching Web Data using MinHash LSH	BiChen Rao, Erkang Zhu	In this extended abstract, we explore the use of MinHash Locality Sensitive Hashing (MinHash LSH) to address the problem of indexing and searching Web data.
187	Research Contribution as a Measure of Influence	Lais M.A. Rocha, Mirella M. Moro	We propose the 3c-index that measures the influence degree of researchers by evaluating the links they establish between communities.
188	Vectorizing an In Situ Query Engine	Panagiotis Sioulas, Anastasia Ailamaki	In this paper, we investigate the effect of vector processing on raw data querying.
189	Exploring Visualization of Data Transforms	Larry Xu	We present the concept of "tweening" of resultsets as a method of incrementally visualizing data transformations, and explore approaches towards generating these resultset tweens.
190	Minimizing Average Regret Ratio in Database	Sepanta Zeighami, Raymond Chi-Wing Wong	We propose "average regret ratio" as a metric to measure users’ satisfaction after a user sees k selected points of a database, instead of all of the points in the database.