Paper Digest: ICDE 2019 Highlights

April 7, 2019 | October 16, 2019 | admin

The IEEE International Conference on Data Engineering (ICDE) addresses research issues in designing, building, managing, and evaluating advanced data-intensive systems and applications. In 2019, it is to be held in Macau, China.

To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.

We thank all authors for writing these interesting papers, and readers for reading our digests. If you do not want to miss any interesting academic paper, you are welcome to sign up for our free paper digest service to get new paper updates customized to your own interests on a daily basis.

Paper Digest Team
team@paperdigest.org

TABLE 1: ICDE 2019 Papers/Talks/Tutorials/Demos/…

No.	Title	Authors	Highlight
1	Data Management at Huawei: Recent Accomplishments and Future Challenges	J. Chen et al.	In this paper, we will go through recent advancements in Huawei data management technologies including a petabyte scale enterprise analytics platform (FusionInsight MPPDB) and a highly available in-memory database for telecommunication networks (GMDB).
2	KV-Match: A Subsequence Matching Approach Supporting Normalization and Time Warping	J. Wu, P. Wang, N. Pan, C. Wang, W. Wang and J. Wang	In this paper, we propose a novel problem, named constrained normalized subsequence matching problem (cNSM), which adds some constraints to NSM problem.
3	Efficient Maximal Spatial Clique Enumeration	C. Zhang, Y. Zhang, W. Zhang, L. Qin and J. Yang	In this paper, we investigate this problem in the context of spatial database.
4	Cluster-Based Subscription Matching for Geo-Textual Data Streams	L. Chen, S. Shang, K. Zheng and P. Kalnis	To solve the CSM problem, we propose a novel solution to cluster, feed, and summarize a stream of geo-textual messages efficiently.
5	Time-Dependent Hop Labeling on Road Network	L. Li, S. Wang and X. Zhou	In this paper, we aim to answer the fastest path profile query on time-dependent road network faster by extending the 2-hop labeling approach, which is fast in answering shortest distance query on the static graph.
6	Weight-Constrained Route Planning Over Time-Dependent Graphs	Y. Yuan, X. Lian, G. Wang, L. Chen, Y. Ma and Y. Wang	In this paper, we study the WRP problem over a large time-dependent graph by incorporating continuous time and weight functions into it.
7	Skyline Queries Constrained by Multi-cost Transportation Networks	Q. Gong, H. Cao and P. Nagarkar	In this paper, we propose a new type of skyline queries whose evaluation is constrained by a multi-cost transportation network (MCTN) and whose answers are off the network.
8	Online Social Media Recommendation Over Streams	X. Zhou, D. Qin, X. Lu, L. Chen and Y. Zhang	In this paper, we propose a novel framework for the social recommendation over streams.
9	Canonicalization of Open Knowledge Bases with Side Information from the Source Text	X. Lin and L. Chen	In this paper, we propose to perform canonicalization over Open IE triples by incorporating the side information from the original data sources, including the candidate entities of the noun phrases detected in the source text, the types of the candidate entities and the domain knowledge of the source text.
10	Walking with Perception: Efficient Random Walk Sampling via Common Neighbor Awareness	Y. Li et al.	To address this issue, we propose a common neighbor aware random walk framework called CNARW, which leverages weighted walking by differentiating the next-hop candidate nodes to speed up the convergence.
11	Building a Broad Knowledge Graph for Products	X. L. Dong	In this talk, we ask the question: Can one build a knowledge graph (KG) for all products in the world?
12	SimMeme: A Search Engine for Internet Memes	T. Milo, A. Somech and B. Youngmann	To address this problem, we present SimMeme, a Meme-dedicated search engine.
13	A Hierarchical Framework for Top-k Location-Aware Error-Tolerant Keyword Search	J. Yang, Y. Zhang, X. Zhou, J. Wang, H. Hu and C. Xing	In this paper, we propose a novel framework to solve the problem of top-k location-aware similarity search with fuzzy token matching.
14	2ED: An Efficient Entity Extraction Algorithm Using Two-Level Edit-Distance	Z. Wen, D. Deng, R. Zhang and R. Kotagiri	In this paper, we propose an efficient character-level and token-level edit-distance based algorithm called FuzzyED.
15	Bridging Quantities in Tables and Text	Y. Ibrahim, M. Riedewald, G. Weikum and D. Zeinalipour-Yazti	This paper introduces the quantity alignment problem: bidirectional linking between textual mentions of quantities and the corresponding table cells, in order to support advanced content summarization and faster navigation between explanations in text and details in tables.
16	An Efficient Insertion Operator in Dynamic Ridesharing Services	Y. Xu, Y. Tong, Y. Shi, Q. Tao, K. Xu and W. Li	In this paper, we propose a novel partition framework and a dynamic programming based insertion with a time complexity of O(n²).
17	Auction-Based Order Dispatch and Pricing in Ridesharing	L. Zheng, P. Cheng and L. Chen	In this paper, we propose solutions for the bonus-offering scenario of ridesharing platforms (service providers).
18	When Geo-Text Meets Security: Privacy-Preserving Boolean Spatial Keyword Queries	N. Cui, J. Li, X. Yang, B. Wang, M. Reynolds and Y. Xiang	Therefore, in this work, we first propose and formalize the problem of privacy-preserving boolean spatial keyword query under the widely accepted Known Background Threat Model.
19	Moving Object Linking Based on Historical Trace	F. Jin, W. Hua, J. Xu and X. Zhou	In this work, we study the problem of moving object linking based on their historical traces.
20	Knowledge Graphs and Enterprise AI: The Promise of an Enabling Technology	L. Bellomarini, D. Fakhoury, G. Gottlob and E. Sallinger	We propose knowledge graphs as the reference technology for the enterprise AI context, i.e., the complex of entities, properties and relationships that shape a business domain and constitute a common backbone for all AI-driven applications.
21	ImageProof: Enabling Authentication for Large-Scale Image Retrieval	S. Guo, J. Xu, C. Zhang, C. Xu and T. Xiang	In this paper, we take the first step in studying the problem of query authentication for large-scale image retrieval.
22	Time Constrained Continuous Subgraph Search Over Streaming Graphs	Y. Li, L. Zou, M. T. Özsu and D. Zhao	In this paper, we study the subgraph (isomorphism) search over streaming graph data that obeys timing order constraints over the occurrence of edges in the stream.
23	Utilizing Dynamic Properties of Sharing Bits and Registers to Estimate User Cardinalities Over Time	P. Wang, P. Jia, X. Zhang, J. Tao, X. Guan and D. Towsley	To address these problems, we develop novel bit and register sharing algorithms, which use a bit array and a register array to build a compact sketch of all users’ connected items respectively.
24	Tracking Influential Nodes in Time-Decaying Dynamic Interaction Networks	J. Zhao, S. Shang, P. Wang, J. C. S. Lui and X. Zhang	In this work, we address the dynamic influence challenge by designing efficient streaming methods that can identify influential nodes from highly dynamic node interaction streams.
25	Fast and Accurate Graph Stream Summarization	X. Gou, L. Zou, C. Zhao and T. Yang	In this paper, we propose a novel Graph Stream Sketch (GSS for short) to summarize the graph streams, which has linear space cost O(|E|) (E is the edge set of the graph) and constant update time cost (O(1)) and supports most kinds of queries over graph streams with the controllable errors.
26	Mining Periodic Cliques in Temporal Networks	H. Qin, R. Li, G. Wang, L. Qin, Y. Cheng and Y. Yuan	In this paper, we study a problem of seeking periodic communities in a temporal network, where each edge is associated with a set of timestamps.
27	Coloring Embedder: A Memory Efficient Data Structure for Answering Multi-set Query	Y. Tong et al.	In this paper, we propose a new data structure named coloring embedder, which is fast, accurate as well as memory efficient.
28	Mining Order-Preserving Submatrices Under Data Uncertainty: A Possible-World Approach	J. Cheng, D. Yan, X. Hao and W. Ng	An optimized dynamic programming approach is proposed to compute the probability that a row supports a particular column permutation, and several effective pruning rules are introduced to efficiently prune insignificant OPSMs.
29	Multi-Dimensional Genomic Data Management for Region-Preserving Operations	O. Horlova, A. Kaitoua, V. Markl and S. Ceri	In this paper, we focus on the efficient execution of region-preserving GMQL operations, in which the regions of the result are a subset of the regions of one of the operands; most GMQL operations are region-preserving.
30	Improved Algorithms for Maximal Clique Search in Uncertain Networks	R. Li, Q. Dai, G. Wang, Z. Ming, L. Qin and J. X. Yu	To overcome this issue, we propose two new core-based pruning algorithms to reduce the uncertain graph size without missing any maximal (k, t)-clique.
31	Lazo: A Cardinality-Based Method for Coupled Estimation of Jaccard Similarity and Containment	R. Castro Fernandez, J. Min, D. Nava and S. Madden	The main contribution of this paper is LAZO, a method to simultaneously estimate both the similarity and containment of datasets, based on a redefinition of Jaccard similarity which takes into account the cardinality of each set.
32	TARDIS: Distributed Indexing Framework for Big Time Series Data	L. Zhang, N. Alghamdi, M. Y. Eltabakh and E. A. Rundensteiner	In this paper, we propose the TARDIS distributed indexing framework to overcome the aforementioned limitations.
33	Mostly Order Preserving Dictionaries	C. Liu, M. Umbenhower, H. Jiang, P. Subramaniam, J. Ma and A. J. Elmore	In this work we bridge this gap by introducing mostly ordered dictionaries that use a best effort dictionary generation based on sampling the input dataset.
34	Multi-copy Cuckoo Hashing	D. Li, R. Du, Z. Liu, T. Yang and B. Cui	To address the problem, we propose an efficient Cuckoo hashing scheme called Multi-copy Cuckoo or McCuckoo.
35	Efficient Scalable Multi-attribute Index Selection Using Recursive Strategies	R. Schlosser, J. Kossmann and M. Boissier	We introduce a novel recursive strategy that does not exclude index candidates in advance and effectively accounts for index interaction.
36	To Index or Not to Index: Optimizing Exact Maximum Inner Product Search	F. Abuzaid, G. Sethi, P. Bailis and M. Zaharia	In this paper, we show that a hardware-efficient brute-force approach, blocked matrix multiply (BMM), can outperform the state-of-the-art MIPS solvers by over an order of magnitude, for some, but not all, inputs.
37	Is There a Data Science and Engineering Brain Drain? If So, How Can We Rebalance Them?	J. Pei	Is There a Data Science and Engineering Brain Drain? If So, How Can We Rebalance Them?
38	Distributed In-memory Trajectory Similarity Search and Join on Road Network	H. Yuan and G. Li	To support trajectory similarity search and join, we propose a filtering-refine framework.
39	Stochastic Weight Completion for Road Networks Using Graph Convolutional Networks	J. Hu, C. Guo, B. Yang and C. S. Jensen	We propose a generic learning framework called Graph Convolutional Weight Completion (GCWC) that exploits the topology of a road network graph and the correlations of weights among adjacent edges to estimate stochastic weights for all edges.
40	Identifying the Most Interactive Object in Spatial Databases	D. Amagata and T. Hara	This paper investigates a new query, called an MIO query, that retrieves the Most Interactive Object in a given spatial dataset.
41	An Efficient Framework for Correctness-Aware kNN Queries on Road Networks	D. He, S. Wang, X. Zhou and R. Cheng	Motivated by this, we propose a framework on correctness-aware kNN queries which aim to optimize system throughput while guaranteeing query correctness on moving objects.
42	MPR - A Partitioning-Replication Framework for Multi-Processing kNN Search on Road Networks	S. Luo, B. Kao, X. Wu and R. Cheng	We propose the MPR (Multi-layer Partitioning-Replication) mechanism that orchestrates CPU cores and schedules kNN query and index update processes to run on the cores.
43	AppUsage2Vec: Modeling Smartphone App Usage for Prediction	S. Zhao et al.	In this paper, we propose a novel framework for app usage prediction, called AppUsage2Vec, inspired by Doc2Vec.
44	iFair: Learning Individually Fair Data Representations for Algorithmic Decision Making	P. Lahoti, K. P. Gummadi and G. Weikum	The paper introduces a method for probabilistically mapping user records into a low-rank representation that reconciles individual fairness and the utility of classifiers and rankings in downstream applications.
45	DBSCAN-MS: Distributed Density-Based Clustering in Metric Spaces	K. Yang, Y. Gao, R. Ma, L. Chen, S. Wu and G. Chen	In this paper, we propose DBSCAN-MS, a distributed density-based clustering in metric spaces.
46	Computing Trajectory Similarity in Linear Time: A Generic Seed-Guided Neural Metric Learning Approach	D. Yao, G. Cong, C. Zhang and J. Bi	We propose NeuTraj to accelerate trajectory similarity computation.
47	Bursty Event Detection Throughout Histories	D. Paul, Y. Peng and F. Li	In this paper, we investigate the problem of bursty event detection where we define burst as the acceleration over the incoming rate of an event mentioning.
48	RUM: Network Representation Learning Using Motifs	Y. Yu, Z. Lu, J. Liu, G. Zhao and J. Wen	Towards the leveraging of graph motifs that constitute higher-order organizations in a network, we propose two strategies, namely MotifWalk and MotifRe-weighting for learning motif-aware network embeddings.
49	Finding Significant Items in Data Streams	T. Yang, H. Zhang, D. Yang, Y. Huang and X. Li	In this paper, we define a new issue, named finding top-k significant items, and propose a novel algorithm namely LTC to address this issue.
50	Knowledge-Aware Deep Dual Networks for Text-Based Mortality Prediction	N. Liu, P. Lu, W. Zhang and J. Wang	To address the above issues, we propose novel Knowledge-aware Deep Dual Networks (K-DDN) for the text-based mortality prediction task.
51	Robust High Dimensional Stream Classification with Novel Class Detection	Z. Wang, Z. Kong, S. Changra, H. Tao and L. Khan	In this paper, we focus on addressing this challenge by proposing an effective learning framework called CNN-based Prototype Ensemble (CPE) for novel class detection and correction.
52	Towards the Completion of a Domain-Specific Knowledge Base with Emerging Query Terms	S. Jiang, J. Liang, Y. Xiao, H. Tang, H. Huang and J. Tan	In this paper, we use the product knowledge base in the largest Chinese e-commerce platform, Taobao, as an example to investigate a completion procedure of a domain-specific knowledge base.
53	Cooperation-Aware Task Assignment in Spatial Crowdsourcing	P. Cheng, L. Chen and J. Ye	In this paper, we consider an important spatial crowdsourcing problem, namely cooperation-aware spatial crowdsourcing (CA-SC), where spatial tasks (e.g., collecting the Wi-Fi signal strength in one building) are time-constrained and require more than one worker to complete; thus, the cooperation among assigned workers is essential to the result.
54	Minimizing Maximum Delay of Task Assignment in Spatial Crowdsourcing	Z. Chen, P. Cheng, Y. Zeng and L. Chen	In this paper, we study the minimizing maximum delay spatial crowdsourcing (MMD-SC) problem and propose solutions aiming at achieving a worst case controlled task assignment.
55	Physical Representation-Based Predicate Optimization for a Visual Analytics Database	M. R. Anderson, M. Cafarella, G. Ros and T. F. Wenisch	In this paper, we propose TAHOMA, which generates and evaluates many potential classifier cascades that jointly optimize the CNN architecture and input data representation.
56	Adaptive Dynamic Bipartite Graph Matching: A Reinforcement Learning Approach	Y. Wang, Y. Tong, C. Long, P. Xu, K. Xu and W. Lv	In this paper, we propose the dynamic bipartite graph matching (DBGM) problem to be better aligned with real-world applications and devise a novel adaptive batch-based solution framework with a constant competitive ratio.
57	Scalable Frequent Sequence Mining with Flexible Subsequence Constraints	A. Renz-Wieland, M. Bertsch and R. Gemulla	We derive a general framework for frequent sequence mining under this model and propose the D-SEQ and D-CAND algorithms within this framework.
58	Adaptive Influence Blocking: Minimizing the Negative Spread by Observation-Based Policies	Q. Shi, C. Wang, D. Ye, J. Chen, Y. Feng and C. Chen	Motivated by the above considerations, we propose a novel Adaptive Influence Blocking (AIB) problem.
59	Fraction-Score: A New Support Measure for Co-location Pattern Mining	H. K. Chan, C. Long, D. Yan and R. C. Wong	In this paper, we propose a new measure called Fraction-Score whose idea is to count instances fractionally if they overlap.
60	Discovery and Ranking of Functional Dependencies	Z. Wei and S. Link	Utilizing new data structures and original techniques for the dynamic computation of stripped partitions, we devise a new hybridization strategy that outperforms the best algorithms in terms of efficiency, column-, and row-scalability.
61	Adaptive Deep Reuse: Accelerating CNN Training on the Fly	L. Ning, H. Guan and X. Shen	This work proposes adaptive deep reuse, a method for accelerating CNN training by identifying and avoiding the unnecessary computations contained in each specific training on the fly.
62	Slice Finder: Automated Data Slicing for Model Validation	Y. Chung, T. Kraska, N. Polyzotis, K. H. Tae and S. E. Whang	Unlike general techniques (e.g., clustering) that can find arbitrary slices, our goal is to find interpretable slices (which are easier to take action compared to arbitrary subsets) that are large and problematic.
63	Answering Why-Questions for Subgraph Queries in Multi-attributed Graphs	Q. Song, M. H. Namaki and Y. Wu	We introduce measures that characterize good query rewrites by incorporating both query editing cost and answer closeness.
64	Neural Multi-task Recommendation from Multi-behavior Data	C. Gao et al.	In this work, we contribute a new solution named NMTR (short for Neural Multi-Task Recommendation) for learning recommender systems from user multi-behavior data.
65	AUC-MF: Point of Interest Recommendation with AUC Maximization	P. Han, S. Shang, A. Sun, P. Zhao, K. Zheng and P. Kalnis	In this paper, we propose AUC-MF to address the POI recommendation problem by maximizing Area Under the ROC curve (AUC).
66	Efficient Batch One-Hop Personalized PageRanks	S. Luo, X. Xiao, W. Lin and B. Kao	To address the limitations of existing algorithms, this paper presents Baton, an algorithm for batch one-hop PPR that offers strong practical efficiency.
67	BB-Tree: A Main-Memory Index Structure for Multidimensional Range Queries	S. Sprenger, P. Schäfer and U. Leser	We present the BB-Tree, a fast and space-efficient index structure for processing multidimensional workloads in main memory.
68	Explaining Queries Over Web Tables to Non-experts	J. Berant, D. Deutch, A. Globerson, T. Milo and T. Wolfson	We introduce novel query explanations that provide a graphic representation of the query cell-based provenance (in its execution on a given table).
69	Nonlinear Models Over Normalized Data	Z. Cheng and N. Koudas	In this paper we study the implementation of popular nonlinear ML models and in particular independent Gaussian Mixture Models (IGMM) over normalized data.
70	Continuous Search on Dynamic Spatial Keyword Objects	Y. Dong, H. Chen and H. Kitagawa	In this paper, we define a novel query problem to continuously search for dynamic spatial keyword objects.
71	Top-K Frequent Term Queries on Streaming Data	S. Farazi and D. Rafiei	We propose two variations of reverse spatial term queries on streaming data and an approach for efficiently evaluating them with some bounds on the error.
72	Parallel and Distributed Processing of Reverse Top-k Queries	P. Nikitopoulos, G. A. Sfyris, A. Vlachou, C. Doulkeridis and O. Telelis	In this paper, we address the problem of processing reverse top-k queries in a parallel and distributed setting.
73	Distributed Discovery of Functional Dependencies	H. Saxena, L. Golab and I. F. Ilyas	We address the problem of discovering functional dependencies from distributed big data.
74	Enumerating k-Vertex Connected Components in Large Graphs	D. Wen, L. Qin, Y. Zhang, L. Chang and L. Chen	In this paper, given a graph G and an integer k, we study the problem of computing all k-VCCs in G.
75	AID: An Adaptive Image Data Index for Interactive Multilevel Visualization	S. Ghosh, A. Eldawy and S. Jais	This paper introduces the first adaptive visualization index that combines both data and images to provide a scalable, interactive visualization while minimizing the index size and index construction time.
76	Collecting Preference Rankings Under Local Differential Privacy	J. Yang, X. Cheng, S. Su, R. Chen, Q. Ren and Y. Liu	In this paper, we initiate the study of collecting preference rankings under local differential privacy.
77	Muses: Distributed Data Migration System for Polystores	A. Kaitoua, T. Rabl, A. Katsifodimos and V. Markl	In this paper we present Muses, a distributed, high-performance data migration engine that is able to forward, transform, repartition, and broadcast data between distributed engines’ instances efficiently.
78	PriSTE: From Location Privacy to Spatiotemporal Event Privacy	Y. Cao, Y. Xiao, L. Xiong and L. Bai	To address this problem, we define the spatiotemporal event as a new privacy goal, which can be formalized as Boolean expressions between location and time predicates.
79	Continuous Range Queries Over Multi-attribute Trajectories	J. Xu, Z. Bao and H. Lu	In this paper, we study continuous range queries over multi-attribute trajectories.
80	Insecurity and Hardness of Nearest Neighbor Queries Over Encrypted Data	R. Li, A. X. Liu, Y. Liu, H. Xu and H. Yuan	By viewing dimensions of the data and the encrypted data as source signals and observed signals, respectively, we formally prove and experimentally demonstrate that ASPE is actually insecure against even ciphertext only attacks, using signal processing theory.
81	Modeling Multidimensional User Preferences for Collaborative Filtering	F. Khawar and N. L. Zhang	In this paper, we propose such a method for implicit feedback data.
82	A Queueing-Theoretic Framework for Vehicle Dispatching in Dynamic Car-Hailing	P. Cheng, C. Feng, L. Chen and Z. Wang	In this paper, we consider an important dynamic car-hailing problem, namely maximum revenue vehicle dispatching (MRVD), in which rider requests dynamically arrive and drivers need to serve as many riders as possible such that the entire revenue of the platform is maximized.
83	Maximizing the Utility in Location-Based Mobile Advertising	P. Cheng, X. Lian, L. Chen and S. Liu	In this paper, we consider a location-based advertising problem, namely the maximum utility advertisement assignment (MUAA) problem: given estimates of the interests of customers and the contexts of the vendors, we want to maximize the overall utility of ads by determining the ads sent to each customer, subject to the constraints of the capacities of customers, the distance ranges, and the budgets of vendors.
84	Automated Grading of SQL Queries	B. Chandra, A. Banerjee, U. Hazra, M. Joseph and S. Sudarshan	In this paper, we discuss techniques to award partial marks to student SQL queries, in case they are incorrect, based on a weighted equivalence edit distance metric.
85	Index-Based Optimal Algorithm for Computing K-Cores in Large Uncertain Graphs	B. Yang, D. Wen, L. Qin, Y. Zhang, L. Chang and R. Li	To overcome these drawbacks, we have developed an index-based solution for computing (k, η)-cores in this paper.
86	Parameter Discovery in Unsupervised Clustering	V. Clement and T. Heinis	In this paper, we introduce the idea of simple assumptions about the global distribution of some property of the data leading to local, actionable insights.
87	Interaction-Aware Arrangement for Event-Based Social Networks	F. Kou, Z. Zhou, H. Cheng, J. Du, Y. Shi and P. Xu	In this work, we propose a new event-participant arrangement problem called Interaction-aware Global Event-Participant Arrangement (IGEPA).
88	Optimizing Cross-Platform Data Movement	S. Kruse, Z. Kaoudi, J. Quiane-Ruiz, S. Chawla, F. Naumann and B. Contreras-Rojas	In this paper, we present the graph-based data movement strategy used by Rheem, our open-source cross-platform system.
89	Highly Efficient Pattern Mining Based on Transaction Decomposition	Y. Djenouri, J. Chun-Wei Lin, K. Nørvåg and H. Ramampiaro	This paper introduces a highly efficient pattern mining technique called Clustering-Based Pattern Mining (CBPM).
90	Procrastination-Aware Scheduling: A Bipartite Graph Perspective	L. Wang, Y. Tong, C. Hu, L. Chen and Y. Li	In this paper, we first propose the Procrastination-aware Scheduling Problem (PSP) to model an appropriate schedule. We find the PSP is NP-hard in the strong sense and design an approximation algorithm.
91	Hankel Matrix Factorization for Tagged Time Series to Recover Missing Values During Blackouts	S. Wu, L. Wang, T. Wu, X. Tao and J. Lu	While the existing approaches for missing value recovery in time series could not handle this issue properly, in this work, we propose a Hankel matrix factorization-based approach for tagged time series called HKMF-T, following the idea of decomposing a data sequence into the smooth trend and the external impact components.
92	PerRD: A System for Personalized Route Description	H. Su, G. Cong, W. Chen, Q. Su, B. Zheng and K. Zheng	In this paper, we study a Personalized Route Description system dubbed PerRD, whose goal is to generate more customized and intuitive route descriptions based on user-generated content.
93	Scalable Metric Similarity Join Using MapReduce	J. Wu, Y. Zhang, J. Wang, C. Lin, Y. Fu and C. Xing	In this paper, we propose SMS-Join, a parallel framework to support similarity join in metric space based on the MapReduce paradigm.
94	An Indexing Framework for Efficient Visual Exploratory Subgraph Search in Graph Databases	C. Wang, M. Xie, S. S. Bhowmick, B. Choi, X. Xiao and S. Zhou	In this paper, we present two novel index structures called VACCINE and ADVISE to efficiently support exploratory subgraph search in a visual environment (VESS).
95	I-LSH: I/O Efficient c-Approximate Nearest Neighbor Search in High-Dimensional Space	W. Liu, H. Wang, Y. Zhang, W. Wang and L. Qin	In this paper, we introduce an incremental search based c-ANN search algorithm, named I-LSH.
96	Computing a Near-Maximum Independent Set in Dynamic Graphs	W. Zheng, C. Piao, H. Cheng and J. X. Yu	Since computing the exact MIS is intractable, we compute the high-quality (large-size) independent set for dynamic graphs in this paper, where 4 graph updating operations are allowed: adding or deleting a vertex or an edge.
97	T-Sample: A Dual Reservoir-Based Sampling Method for Characterizing Large Graph Streams	L. Zhang, H. Jiang, F. Wang, D. Feng and Y. Xie	This paper proposes a new method, called triangle-induced reservoir sampling, or T-Sample, to produce connected edge samples.
98	Real Time Principal Component Analysis	R. R. Chowdhury, M. A. Adnan and R. K. Gupta	In this paper, we propose a variant of PCA that is suited for real-time applications.
99	A Fast Sketch Method for Mining User Similarities Over Fully Dynamic Graph Streams	P. Jia, P. Wang, J. Tao and X. Guan	Based on the sketch built on-the-fly, we develop a method to estimate user similarities over time.
100	A Collaborative Framework for Similarity Enforcement in Synthetic Scaling of Relational Datasets	J. W. Zhang and Y. C. Tay	This paper proposes ASPECT, which adopts a different approach, for relational datasets.
101	Meta Diagram Based Active Social Networks Alignment	Y. Ren, C. C. Aggarwal and J. Zhang	In this paper, we will study the network alignment problem to fuse online social networks specifically.
102	Entity Integrity, Referential Integrity, and Query Optimization with Embedded Uniqueness Constraints	Z. Wei, U. Leck and S. Link	Entity Integrity, Referential Integrity, and Query Optimization with Embedded Uniqueness Constraints
103	Efficient Pattern Mining Based Cryptanalysis for Privacy-Preserving Record Linkage	A. Vidanage, T. Ranbaduge, P. Christen and R. Schnell	Here we present a cryptanalysis attack that can re-identify attribute values encoded in BFs.
104	ECOQUG: An Effective Ensemble Community Scoring Function	C. Wang, H. Wang, C. Zhou, J. Li and H. Gao	In this paper, we propose a new community scoring function, ECOQUG.
105	CN-Probase: A Data-Driven Approach for Large-Scale Chinese Taxonomy Construction	J. Chen et al.	In this paper, we focus on automatic Chinese taxonomy construction and propose an effective generation and verification framework to build a large-scale and high-quality Chinese taxonomy.
106	Understanding Data Science Lifecycle Provenance via Graph Segmentation and Summarization	H. Miao and A. Deshpande	In this paper, we propose two high-level graph query operators to address the verboseness and evolving nature of such provenance graphs.
107	Contextual Community Search Over Large Social Networks	L. Chen, C. Liu, K. Liao, J. Li and R. Zhou	In this paper, we propose a novel parameter-free contextual community model for attributed community search.
108	Efficient Partitioning and Query Processing of Spatio-Temporal Graphs with Trillion Edges	M. Ding and S. Chen	In this paper, we define a formal spatio-temporal graph model based on real-world applications, and propose PAST, a framework for efficient PArtitioning and query processing of Spatio-Temporal graphs.
109	Top-K Entity Resolution with Adaptive Locality-Sensitive Hashing	V. Verroios and H. Garcia-Molina	In this paper we introduce the problem of top-k entity resolution and we summarize a novel approach for this problem; full details are presented in a technical report.
110	Finding Average Regret Ratio Minimizing Set in Database	S. Zeighami and R. C. Wong	In this paper, we would like to find a set of k points such that on average, the satisfaction (ratio) of a user is maximized.
111	HyMJ: A Hybrid Structure-Aware Approach to Distributed Multi-way Join Query	G. Zhu et al.	In this paper, we present a novel hybrid structure-aware multi-way join algorithm called HyMJ, which combines the one-round and multi-round algorithms to compute the hybrid query efficiently.
112	Accelerate MaxBRkNN Search by kNN Estimation	X. Chen, X. Cao, Z. Xu, Y. Zhang, S. Shang and W. Zhang	Observing this, we develop an approach which computes kNN for only promising clients by utilising a two-level grid index (ADPGI) to reduce the cost substantially.
113	Efficient Bottom-Up Discovery of Multi-scale Time Series Correlations Using Mutual Information	N. Ho, T. B. Pedersen, M. Vu, V. L. Ho and C. A. N. Biscio	This paper presents an approach to search for synchronous correlations in big time series that displays all three properties: the proposed method (i) utilizes the metric of mutual information from information theory, providing a strong theoretical foundation, (ii) is able to discover correlations at multiple temporal scales, and (iii) works in an efficient, bottom-up fashion, making it scalable to large datasets.
114	Fingerprinting Big Data: The Case of KNN Graph Construction	R. Guerraoui, A. Kermarrec, O. Ruas and F. Taïani	We propose fingerprinting, a new technique that consists in constructing compact, fast-to-compute and privacy-preserving binary representations of datasets.
115	Outer and Anti Joins in Temporal-Probabilistic Databases	K. Papaioannou, M. Theobald and M. Boehlen	For the computation of TP joins with negation, we introduce generalized lineage-aware temporal windows, a mechanism that binds an interval to the lineages of all the matching valid tuples of each input relation.
116	Workload-Driven Fragment Allocation for Partially Replicated Databases Using Linear Programming	S. Halfpap and R. Schlosser	In this paper, we define a linear programming (LP) model to calculate the set of partial replicas with the lowest overall memory capacity while evenly balancing the query load.
117	OSMAC: Optimizing Subgraph Matching Algorithms with Community Structure	Y. Lou and C. Wang	This paper proposes an optimization method named OSMAC to accelerate subgraph matching algorithms with the community structure of data graphs.
118	Workload-Aware Subgraph Query Caching and Processing in Large Graphs	Y. Liang and P. Zhao	In this paper, we address subgraph queries with the availability of query workload information, W = {w1,…, wn}, where wi ?
119	How I Learned to Stop Worrying and Love Re-optimization	M. Perron, Z. Shang, T. Kraska and M. Stonebraker	In this paper we investigate why this is still the case, despite decades of improvements to cost models, plan enumeration, and cardinality estimation.
120	CRA: Enabling Data-Intensive Applications in Containerized Environments	I. Sabek, B. Chandramouli and U. F. Minhas	In this paper, we factor out the commonalities in a large majority of these applications, into a generic dataflow layer called Common Runtime for Applications (CRA).
121	Scalable Similarity Joins of Tokenized Strings	A. Metwally and C. Huang	We propose a scalable distributed framework, Tokenized-String Joiner (TSJ), that adopts existing scalable string-join algorithms as building blocks to perform NSLD-joins.
122	MLlib*: Fast Training of GLMs Using Spark MLlib	Z. Zhang, J. Jiang, W. Wu, C. Zhang, L. Yu and B. Cui	In this paper, we study the performance of MLlib with a focus on training generalized linear models using gradient descent.
123	DirectLoad: A Fast Web-Scale Index System Across Large Regional Centers	A. Qin, M. Xiao, J. Ma, D. Tan, R. Lee and X. Zhang	In this paper, we show the effectiveness and efficiency of an in-memory index updating system, which is disruptive to the framework in a conventional memory hierarchy.
124	Presto: SQL on Everything	R. Sethi et al.	In this paper, we outline a selection of use cases that Presto supports at Facebook.
125	Improving RDF Query Performance Using In-memory Virtual Columns in Oracle Database	E. I. Chong, M. Perry and S. Das	In this paper, we propose to use in-memory virtual columns to avoid value table joins.
126	Rima: An RDMA-Accelerated Model-Parallelized Solution to Large-Scale Matrix Factorization	J. Geng, D. Li and S. Wang	To address these drawbacks, we propose Rima, which uses ring-based model parallelism to solve large-scale MF with higher communication efficiency.
127	SEBDB: Semantics Empowered BlockChain DataBase	Y. Zhu, Z. Zhang, C. Jin, A. Zhou and Y. Yan	In this paper, we propose and implement a novel blockchain database, called SEBDB, which leverages the functionality of existing databases that has been optimized for decades.
128	Large Scale Traffic Signal Network Optimization – A Paradigm Shift Driven by Big Data	L. Yu et al.	In this paper, we will introduce our method for large scale traffic signal optimization, which is the major module of Alibaba’s city brain solution.
129	Domain-Independent Automated Processing of Free-Form Text Data in Telecom	R. Bhowmik and A. Akyamac	In this paper, we propose a domain-agnostic, unsupervised approach that deploys a multi-stage text processing pipeline for automatically discovering the key topics and categories from free-form text documents.
130	DRIVEN: a Framework for Efficient Data Retrieval and Clustering in Vehicular Networks	B. Havers, R. Duvignau, H. Najdataei, V. Gulisano, A. C. Koppisetty and M. Papatriantafilou	The goal of the DRIVEN framework, presented here, is to address these challenges for a data gathering and distance-based clustering tool in the context of vehicular networks.
131	Accurate Product Attribute Extraction on the Field	M. Rezk, L. Alonso Alemany, L. Nio and T. Zhang	In this paper we present a bootstrapping approach for attribute value extraction that minimizes the need for human intervention.
132	CATS: Cross-Platform E-Commerce Fraud Detection	H. Weng et al.	In this paper, we present an efficient, platform-independent, and robust e-commerce fraud detection system, CATS, to detect frauds for different large-scale e-commerce platforms.
133	Caladrius: A Performance Modelling Service for Distributed Stream Processing Systems	F. Kalim et al.	We find that general traffic trends in most jobs lend themselves well to prediction.
134	FAIR: Fraud Aware Impression Regulation System in Large-Scale Real-Time E-Commerce Search Platform	Z. Li et al.	In this paper, we propose the first fraud aware impression regulation system (FAIR) which is data-driven and can work in large-scale e-commerce platforms.
135	Accelerating Partial Evaluation in Distributed SPARQL Query Evaluation	P. Peng, L. Zou and R. Guan	To improve the efficiency of finding partial matches further, we propose an optimization that communicates variables’ candidates among sites to avoid redundant computations.
136	Micro-Browsing Models for Search Snippets	M. A. Islam, R. Srikant and S. Basu	In this paper, we propose a novel formulation: a micro-browsing model for how users read result snippets.
137	Interpretable Multi-task Learning for Product Quality Prediction with Attention Mechanism	C. Yeh, Y. Fan and W. Peng	In this paper, we investigate the problem of mining multivariate time series data generated from sensors mounted on manufacturing stations for early product quality prediction.
138	Learning Effective Embeddings From Crowdsourced Labels: An Educational Case Study	G. Xu, W. Ding, J. Tang, S. Yang, G. Y. Huang and Z. Liu	In this paper, we investigate the above problem and propose a novel framework of Representation Learning with crowdsourced Labels, i.e., “RLL”, which learns representation of data with crowdsourced labels by jointly and coherently solving the challenges introduced by limited and inconsistent labels.
139	A Prescription Trend Analysis using Medical Insurance Claim Big Data	K. Umemoto, K. Goda, N. Mitsutake and M. Kitsuregawa	(1) We propose a latent variable model that simulates the medication behavior of physicians to accurately reproduce monthly prescription time series from the MIC data, where prescription links between the diseases and medicines are missing.
140	Differential Data Quality Verification on Partitioned Data	S. Schelter et al.	We therefore present a differential generalization of the computational model of Deequ, based on algebraic states with monoid properties.
141	Logan: A Distributed Online Log Parser	A. Agrawal, R. Karlupia and R. Gupta	In this paper, we train a data-driven log parser on our new Apache Spark dataset, the largest application log dataset yet.
142	WebPut: A Web-Aided Data Imputation System for the General Type of Missing String Attribute Values	S. Shan et al.	In this demonstration, we present an end-to-end web-aided data imputation prototype system named WebPut.
143	Blockplane: A Global-Scale Byzantizing Middleware	F. Nawab and M. Sadoghi	We propose Blockplane, a middleware that enables making existing benign systems tolerate byzantine failures.
144	FGreat: Focused Graph Query Autocompletion	N. Ng, P. Yi, Z. Zhang, B. Choi, S. S. Bhowmick and J. Xu	This demonstration presents an interactive visual Focused GRaph quEry AutocompleTion framework, called FGreat.
145	Aucher: Multi-modal Queries on Live Audio Streams in Real-Time	Z. Wen, M. Liang, B. He, Z. Xia and B. Li	This paper demonstrates a real-time search system called Aucher for live audio streams.
146	SAC: A System for Big Data Lineage Tracking	M. Tang et al.	To address this issue, we build Spark-Atlas-Connector (SAC for short), a new system to track data lineage in a distributed computation platform, e.g., Spark.
147	A Gossip-Based System for Fast Approximate Score Computation in Multinomial Bayesian Networks	A. Zachariah, P. Rao, A. Katib, M. Senapati and K. Barnard	In this paper, we present a system for fast approximate score computation, a fundamental task for score-based structure learning of multinomial Bayesian networks.
148	Faster, Higher, Stronger: Redesigning Spreadsheets for Scale	M. Bendre et al.	We demonstrate three key features of DATASPREAD to address the aforementioned spreadsheet scalability challenges in interactivity, navigability, and expressiveness.
149	RecovDB: Accurate and Efficient Missing Blocks Recovery for Large Time Series	I. Arous, M. Khayati, P. Cudré-Mauroux, Y. Zhang, M. Kersten and S. Stalinlov	In this demo, we present RecovDB, a relational database system enhanced with advanced matrix decomposition technology for missing blocks recovery.
150	AI Pro: Data Processing Framework for AI Models	R. Frost, D. Paul and F. Li	We present AI Pro, an open-source framework for data processing with Artificial Intelligence (AI) models.
151	IVLG: Interactive Visualization of Large Graphs	M. Krommyda, V. Kantere and Y. Vassiliou	In order to overcome the limitations regarding the volume of the presented information, we have developed a novel technique that enables the interactive visualization as one continuous graph of datasets with millions of elements.
152	Just in Time: Personal Temporal Insights for Altering Model Decisions	N. Boer, D. Deutch, N. Frost and T. Milo	To this end, we propose a novel framework that provides users with insights and plans for changing their classification in particular future time points.
153	GeoSparkViz in Action: A Data System with Built-in Support for Geospatial Visualization	J. Yu, A. Tahir and M. Sarwat	This paper demonstrates GeoSparkViz, a full-fledged system that allows the user to load, prepare, integrate and execute MapViz tasks in the same system.
154	BENU: Distributed Subgraph Enumeration with Backtracking-Based Framework	Z. Wang, R. Gu, W. Hu, C. Yuan and Y. Huang	Different from those join-based algorithms, we develop a new backtracking-based framework BENU for distributed subgraph enumeration.
155	Hybrid.Poly: A Consolidated Interactive Analytical Polystore System	M. Podkorytov and M. Gubanov	Here we describe HYBRID.POLY, a consolidated in-memory polystore engine designed to support heterogeneous large-scale data and interactively process complex analytical workloads.
156	EXPLAINER: Entity Resolution Explanations	A. Ebaid, S. Thirumuruganathan, W. G. Aref, A. Elmagarmid and M. Ouzzani	In this demo, we propose ExplainER, a tool to understand and explain entity resolution classifiers with different granularity levels of explanations.
157	CEP-Wizard: Automatic Deployment of Distributed Complex Event Processing	Y. Shin, S. Yoon, P. Trirat and J. Lee	In this demonstration, we present CEP-Wizard, a framework for automatically configuring and deploying a distributed CEP engine with minimum effort.
158	A Comparison of Allocation Algorithms for Partially Replicated Databases	S. Halfpap and R. Schlosser	In this paper, we test and compare state-of-the-art allocation algorithms for partial replication.
159	PePPer: Fine-Grained Personal Access Control via Peer Probing	Y. Amsterdamer and O. Drien	To enable peers to manage access control rights on such data we introduce PePPer, a tool for fine-grained, personal access control.
160	COBRA: Compression Via Abstraction of Provenance for Hypothetical Reasoning	D. Deutch, Y. Moskovitch and N. Rinetzky	To this end, we present a framework that reduces provenance size.
161	CogLearn: A Cognitive Graph-Oriented Online Learning System	Y. Pian, Y. Lu, P. Chen and Q. Duan	We propose and implement a novel online learning system, called CogLearn, to support learner’s self-awareness and reflective thinking, which urges a proper form of knowledge representation together with individual learner’s cognitive status.
162	GRIT: Consistent Distributed Transactions Across Polyglot Microservices with Multiple Databases	G. Zhang, K. Ren, J. Ahn and S. Ben-Romdhane	In this demo we present GRIT: a system that resolves this challenge by cleverly leveraging deterministic database technologies and an optimistic concurrency control (OCC) protocol.
163	vABS: Towards Verifiable Attribute-Based Search Over Shared Cloud Data	Y. Ji, C. Xu, J. Xu and H. Hu	In this demonstration, we present a system called vABS, which enables verifiable Attribute-Based Search over shared cloud data.
164	Efficient Synchronization of State-Based CRDTs	V. Enes, P. S. Almeida, C. Baquero and J. Leitão	In this paper we: 1) identify two sources of inefficiency in current synchronization algorithms for delta-based CRDTs; 2) bring the concept of join decomposition to state-based CRDTs; 3) exploit join decompositions to obtain optimal deltas and 4) improve the efficiency of synchronization algorithms; and finally, 5) experimentally evaluate the improved algorithms.
165	An Environment-Aware Market Strategy for Data Allocation and Dynamic Migration in Cloud Database	T. Wang et al.	To this end, this paper presents an environment-aware market strategy based system, named e-MARS, for reasonable data migration to achieve query load balance in cloud database.
166	Vaite: A Visualization-Assisted Interactive Big Urban Trajectory Data Exploration System	C. Yang, Y. Zhang, B. Tang and M. Zhu	In this work, we architect and implement a visualization-assisted big urban trajectory data exploration system (Vaite) to address these challenges.
167	SciDetector: Scientific Event Discovery by Tracking Variable Source Data Streaming	Z. Duan et al.	We present the design of and a demonstration for SciDetector, a system of scientific research for online analysis.
168	Demonstrating Spindra: A Geographic Knowledge Graph Management System	Y. Sun, J. Yu and M. Sarwat	In this paper, we demonstrate a system, namely Spindra, that provides efficient management of geographic knowledge graphs.
169	Native Storage Techniques for Data Management	I. Petrov, A. Koch, S. Hardock, T. Vincon and C. Riegger	In the present tutorial we perform a cross-cut analysis of database storage management from the perspective of modern storage technologies.
170	Crowdsourcing Database Systems: Overview and Challenges	C. Chai, J. Fan, G. Li, J. Wang and Y. Zheng	In this tutorial, we will survey and synthesize a wide spectrum of existing studies on crowdsourcing database systems.
171	Telco Big Data Research and Open Problems	C. Costa and D. Zeinalipour-Yazti	In this tutorial, we overview the state-of-the-art in telco big data analytics by focusing on a set of basic principles, namely: (i) real-time analytics and detection; (ii) experience, behavior and retention analytics; (iii) privacy; and (iv) storage.
172	Geospatial Data Management in Apache Spark: A Tutorial	J. Yu and M. Sarwat	A follow-up section presents the common approaches used by the practitioners to extend Spark and introduces the vital components in a generic spatial data management system.
173	Hierarchical Decomposition of Big Graphs	Y. Zhang, L. Qin, F. Zhang and W. Zhang	Subsequently, we provide an overview of the existing models and the computation algorithms under different computing environments.
174	Cohesive Subgraph Computation Over Large Sparse Graphs	L. Chang and L. Qin	In this tutorial, we survey the models and state-of-the-art algorithms for efficient cohesive subgraph computation based on different cohesiveness measures. Finally, we present open problems for future research.
175	Robust Query Processing: Mission Possible	J. R. Haritsa	In this tutorial, we will present these novel research approaches, characterize their strengths and limitations, and enumerate open technical problems that remain to be solved to make robust query processing a contemporary reality.
176	Automated Documentation of End-to-End Experiments in Data Science	S. Redyuk	We aim at reducing manual overhead for experimenting researchers, and intend to create a novel approach in dataflow and metadata tracking based on the analysis of the experiment source code.
177	Explaining Results of Data-Driven Applications	N. Frost	This paper demonstrates approaches for interpretability in two applications: Natural Language Queries, and Machine Learning Classifiers, followed by a discussion of open problems and future work.
178	Towards Explaining the Effects of Data Preprocessing on Machine Learning	C. V. Gonzalez Zelaya	In this initial work we define a simple metric, which we call volatility, to measure the effect of including/excluding a specific step on predictions made by the resulting model.
179	Don’t Fear the REAPER: A Framework for Materializing and Reusing Deep-Learning Models	M. B. Sigl	The aim of this research is to reduce training time of machine learning from a data-management perspective through model reuse, and shed some light on the above relationship in the case when reusing a model is appropriate.
180	Knowledge Representation for Emotion Intelligence	S. Wang	We have introduced two kinds of improving embedding methods (MEC and Emo2Vec) for the sentiment words embedding.
181	Disambiguation and Result Expansion in Keyword Search Over Relational Databases	N. Hormozi	In this paper, we are going to describe how we are improving state of the art in various stages of a keyword-search pipeline in order to retrieve the answers that best match the user’s intent.
182	Event Recommendation using Social Media	S. Madisetty	We plan to use event related discussions in social media as a signal for estimating the popularity of the events.
183	Learning Individual Models for Imputation	A. Zhang, S. Song, Y. Sun and J. Wang	In this study, enlightened by the conditional dependencies that hold conditionally over certain tuples rather than the whole relation, we propose to learn a regression model individually for each complete tuple together with its neighbors.
184	Location Inference for Non-Geotagged Tweets in User Timelines [Extended Abstract]	P. Li, H. Lu, N. Kanhabua, S. Zhao and G. Pan	Subsequently, we adapt machine learning models to our setting and design classifiers that classify each tweet cluster into one of the pre-defined location classes at the city level.
185	Efficient Parallel Skyline Query Processing for High-Dimensional Data	M. Tang, Y. Yu, W. G. Aref, Q. M. Malluhi and M. Ouzzani	More specifically, in this paper, we are tackling the data straggler and data skew challenges introduced by distributed skyline query processing, as well as the ensuing high computation cost of merging skyline candidates.
186	On Generalizing Collective Spatial Keyword Queries (Extended Abstract)	H. K. Chan, C. Long and R. C. Wong	In this paper, we design a unified cost function which generalizes the majority of existing cost functions for CoSKQ and develop a unified approach which works as well as (and sometimes better than) best-known approaches based on different cost functions.
187	A Novel Representation and Compression for Queries on Trajectories in Road Networks (Extended Abstract)	X. Yang, B. Wang, K. Yang, C. Liu and B. Zheng	In this paper, we explore characteristics of the trajectories in road networks, which have motivated the idea of coding trajectories by associating timestamps with relative spatial path and locations.
188	Efficient Multi-Class Probabilistic SVMs on GPUs	Z. Wen, J. Shi, B. He, J. Chen and Y. Chen	To overcome the challenges, we propose GMP-SVM to reduce high latency memory accesses and memory consumption through batch processing, computation/data reusing and sharing.
189	C2Net: A Network-Efficient Approach to Collision Counting LSH Similarity Join (Extended Abstract)	H. Li, S. Nutanong, H. Xu, C. Yu and F. Ha	This paper focuses on collision counting LSH-based similarity join in MapReduce and proposes a network-efficient solution called C2Net to improve the utilization of MapReduce combiners.
190	LinkBlackHole*: Robust Overlapping Community Detection Using Link Embedding (Extended Abstract)	J. Kim, S. Lim, J. Lee and B. S. Lee	This paper proposes LinkBlackHole*, a novel algorithm for finding communities that are (i) overlapping in nodes and (ii) mixing (not separating clearly) in links.
191	Fusion OLAP: Fusing the Pros of MOLAP and ROLAP Together for In-memory OLAP (Extended Abstract)	Y. Zhang, Y. Zhang, S. Wang and J. Lu	In this paper, we propose a novel Fusion OLAP model that fuses the multi-dimensional computing model and the relational storage model to combine the best aspects of both the MOLAP and ROLAP worlds.
192	In Search of Indoor Dense Regions: An Approach Using Indoor Positioning Data	H. Li, H. Lu, L. Shou, G. Chen and K. Chen	In this paper, we propose a data-driven approach that finds top-k indoor dense regions by using indoor positioning data.
193	CurrentClean: Spatio-Temporal Cleaning of Stale Data	M. Milani, Z. Zheng and F. Chiang	We introduce a spatio-temporal probabilistic model that captures the database update patterns to infer stale values, and propose a set of inference rules that model spatio-temporal update patterns commonly seen in real data.
194	Optimizing Quality for Probabilistic Skyline Computation and Probabilistic Similarity Search (Extended Abstract)	X. Miao, Y. Gao, L. Zhou, W. Wang and Q. Li	In this paper, we propose an efficient optimization framework, termed as QueryClean, for both probabilistic skyline computation and probabilistic similarity search.
195	On Efficiently Answering Why-Not Range-Based Skyline Queries in Road Networks (Extended Abstract)	X. Miao, Y. Gao, S. Guo and G. Chen	In this paper, we systematically carry out the study of why-not questions on the r-skyline query in the road network (abbreviated as the why-not RSQ problem).
196	SLADE: A Smart Large-Scale Task Decomposer in Crowdsourcing	Y. Tong, L. Chen, Z. Zhou, H. V. Jagadish, L. Shou and W. Lv	In this paper, we propose the Smart Large-scAle task DEcomposer (SLADE) problem, which aims to decompose a large-scale crowdsourcing task to achieve the desired reliability at a minimal cost.
197	XINA: Explainable Instance Alignment using Dominance Relationship (Extended Abstract)	J. Yeo, H. Park, S. Lee, E. W. Lee and S. Hwang	In this extended abstract, we present an instance alignment framework, namely XINA, for KB integration.
198	A Hardware-Accelerated Solution for Hierarchical Index-Based Merge-Join (Extended Abstract)	Z. Zhou, C. Yu, S. Nutanong, Y. Cui, C. Fu and C. J. Xue	In this paper, we develop a novel solution to accelerate the processing of sort-merge join queries with low match rates.
199	Finding Most Popular Indoor Semantic Locations Using Uncertain Mobility Data	H. Li, H. Lu, L. Shou, G. Chen and K. Chen	In this work, we use uncertain historical indoor mobility data to find the top-k popular indoor semantic locations with the highest flow values.
200	Uncertain Graph Sparsification (Extended Abstract)	P. Parchas, N. Papailiou, D. Papadias and F. Bonchi	To overcome this problem, we introduce the first sparsification techniques aimed explicitly at uncertain graphs.
201	Rule-Based Entity Resolution on Database with Hidden Temporal Information (Extended Abstract)	H. Wang, X. Ding, J. Li and H. Gao	In this paper, we deal with the problem of rule-based entity resolution on imprecise temporal data.
202	BRIGHT – Drift-Aware Demand Predictions for Taxi Networks (Extended Abstract)	A. Saadallah, L. Moreira-Matias, R. Sousa, J. Khiari, E. Jenelius and J. Gama	In this paper, we propose BRIGHT: a drift-aware supervised learning framework which aims to provide accurate predictions for short-term horizon taxi demand quantities through a creative ensemble of time series analysis methods that handle distinct types of concept drift.
203	Order-Sensitive Imputation for Clustered Missing Values (Extended Abstract)	Q. Ma, Y. Gu, W. Lee and G. Yu	To study the issue of missing values (MVs), we propose the Order-Sensitive Imputation for Clustered Missing values (OSICM) framework, in which missing values are imputed sequentially such that the values filled earlier in the process are also used for later imputation of other MVs.
204	Fine-Grained Provenance for Matching & ETL	N. Zheng, A. Alawini and Z. G. Ives	We propose PROVision, a provenance-driven troubleshooting tool that supports ETL and matching computations and traces extraction of content within data objects.
205	DeepDirect: Learning Directions of Social Ties with Edge-Based Network Embedding (Extended Abstract)	C. Wang, C. Wang, Z. Wang, X. Ye, J. X. Yu and B. Wang	This paper presents the problem of tie direction learning which learns the directionality function of directed social networks.
206	A Utility-Optimized Framework for Personalized Private Histogram Estimation (Extended Abstract)	Y. Nie, W. Yang, L. Huang, X. Xie, Z. Zhao and S. Wang	In this poster, we for the first time propose a framework to optimize the utility of histogram estimation with these two privacy requirements.
207	Near-Accurate Multiset Reconciliation (Extended Abstract)	L. Luo, D. Guo, X. Zhao, J. Wu, O. Rottenstreich and X. Luo	In this paper, we extend the set reconciliation problem into three design rationales: (i) multiset support; (ii) near 100% reconciliation accuracy; (iii) communication-friendly and time-saving.
208	Answering Why-Not Group Spatial Keyword Queries (Extended Abstract)	B. Zheng et al.	We propose a three-phase framework for efficiently computing the WGSK.
209	Effective and Efficient Community Search Over Large Directed Graphs (Extended Abstract)	Y. Fang, Z. Wang, R. Cheng, H. Wang and J. Hu	In this paper, we study the problem of CS on directed graph.
210	Exploring Communities in Large Profiled Graphs (Extended Abstract)	Y. Chen, Y. Fang, R. Cheng, Y. Li, X. Chen and J. Zhang	In this paper, we study profiled community search (PCS), where CS is performed on a profiled graph.
211	Index-Based Densest Clique Percolation Community Search in Networks (Extended Abstract)	L. Yuan, L. Qin, W. Zhang, L. Chang and J. Yang	Motivated by this, in this paper, we adopt the k-clique percolation community model and study the densest clique percolation community search problem which aims to find the k-clique percolation community with the maximum k value that contains a given set of query nodes.
212	Unsupervised String Transformation Learning for Entity Consolidation	D. Deng et al.	For this purpose, we propose a data-driven method to standardize the variant values based on two observations: (1) the variant values usually can be transformed to the same representation (e.g., “Mary Lee” and “Lee, Mary”) and (2) the same transformation often appears repeatedly across different clusters (e.g., transpose the first and last name).
213	A Semi-Supervised Framework of Clustering Selection for De-Duplication	S. Kushagra, H. Saxena, I. F. Ilyas and S. Ben-David	In this paper, we make the following contributions.
214	Scaling Up Subgraph Query Processing with Efficient Subgraph Matching	S. Sun and Q. Luo	As such, in this paper, we study whether, and if so, how to utilize efficient subgraph matching to improve subgraph query processing.
215	Efficient Parallel Subgraph Enumeration on a Single Machine	S. Sun, Y. Che, L. Wang and Q. Luo	In this paper, we develop an efficient parallel subgraph enumeration algorithm for a single machine, named LIGHT.
216	Fast Dual Simulation Processing of Graph Database Queries	S. Mennicke, J. Kalo, D. Nagel, H. Kroll and W. Balke	In this paper we bridge this gap by introducing a new dual simulation process operating on SPARQL queries.
217	Efficient and Incremental Clustering Algorithms on Star-Schema Heterogeneous Graphs	L. Chen, Y. Gao, Y. Zhang, C. S. Jensen and B. Zheng	In this paper, we represent attributed graphs as star-schema heterogeneous graphs, where attributes are modeled as different types of graph nodes.
218	G*-Tree: An Efficient Spatial Index on Road Networks	Z. Li, L. Chen and Y. Wang	In this paper, we propose an efficient hierarchical index, G*-tree, to optimize spatial queries on road networks.
219	DBSVEC: Density-Based Clustering Using Support Vector Expansion	Z. Wang, R. Zhang, J. Qi and B. Yuan	To address this problem, we propose a novel approximate density-based clustering algorithm named DBSVEC.
220	A Joint Context-Aware Embedding for Trip Recommendations	J. He, J. Qi and K. Ramamohanarao	In this study, we propose a POI embedding model to jointly learn the impact of these contextual factors.
221	AIR: Attentional Intention-Aware Recommender Systems	T. Chen, H. Yin, H. Chen, R. Yan, Q. V. H. Nguyen and X. Li	Hence, in this paper, we propose AIR, namely attentional intention-aware recommender systems to predict category-wise future user intention and collectively exploit the rich heterogeneous user interaction behaviors (i.e., multiple types of user behaviors).
222	No, That’s Not My Feedback: TV Show Recommendation Using Watchable Interval	K. Cho, Y. Lee, K. Han, J. Choi and S. Kim	In order to reflect this new concept into the TV show recommendation, we propose a novel framework based on collaborative filtering.
223	Adaptive Wavelet Clustering for Highly Noisy Data	Z. Chen, J. Liu, Y. Deng, K. He and J. E. Hopcroft	In this paper we make progress on the unsupervised task of mining arbitrarily shaped clusters in highly noisy datasets, which is a task present in many real-world applications.
224	An Efficient Parallel Keyword Search Engine on Knowledge Graphs	Y. Yang, D. Agrawal, H. V. Jagadish, A. K. H. Tung and S. Wu	In this paper, we attempt to address this need by leveraging advances in hardware technologies, e.g. multi-core CPUs and GPUs.
225	Towards Longitudinal Analytics on Social Media Data	F. Xia, B. Yang, C. Yu, W. Qian and A. Zhou	We study a fundamental functionality in longitudinal analytics: the top-k temporal keyword (TkTK) querying.
226	LCJoin: Set Containment Join via List Crosscutting	D. Deng, C. Yang, S. Shang, F. Zhu, L. Liu and L. Shao	In contrast, in this paper, we propose to intersect all the inverted lists simultaneously while skipping many irrelevant entries in the lists.
227	Bridging the Semantic Gap with SQL Query Logs in Natural Language Interfaces to Databases	C. Baik, H. V. Jagadish and Y. Li	In this paper, we propose leveraging information from the SQL query log of a database to enhance the performance of existing NLIDBs with respect to these challenges.
228	MF-Join: Efficient Fuzzy String Similarity Join with Multi-level Filtering	J. Wang, C. Lin and C. Zaniolo	In this paper, we propose MF-Join, a multi-level filtering approach for fuzzy string similarity join.
229	Finding Temporal Influential Users Over Evolving Social Networks	S. Huang, Z. Bao, J. S. Culpepper and B. Zhang	In this paper we study the problem of Distinct Influence Maximization (DIM) where the goal is to identify a seed set of influencers who maximize the number of distinct users influenced over a predefined window of time.
230	Seed Selection and Social Coupon Allocation for Redemption Maximization in Online Social Networks	T. Chang, Y. Shi, D. Yang and W. Chen	In the paper, we investigate not only the seed selection problem but also the effect of SC allocation for optimizing the redemption rate which represents the efficiency of SC allocation.
231	Keyword-Centric Community Search	Z. Zhang, X. Huang, J. Xu, B. Choi and Z. Shang	We design a new function of keyword closeness and propose efficient algorithms to solve the KCCS problem.
232	Cohesive Group Nearest Neighbor Queries Over Road-Social Networks	F. Guo, Y. Yuan, G. Wang, L. Chen, X. Lian and Z. Wang	In this paper, we study a new problem: a GNN search on a road network that incorporates cohesive social relationships (CGNN).
233	Maximizing Multifaceted Network Influence	Y. Li, J. Fan, G. Ovchinnikov and P. Karras	In this paper, we propose the Optimal Influential Pieces Assignment (OIPA) problem, which is to assign k distinct pieces of an information campaign to k promoters, so as to achieve the highest viral adoption in a network.
234	GB-KMV: An Augmented KMV Sketch for Approximate Containment Similarity Search	Y. Yang, Y. Zhang, W. Zhang and Z. Huang	In this paper, we study the problem of approximate containment similarity search. We provide a set of theoretical analyses to underpin the proposed augmented KMV sketch technique, and show that it outperforms the state-of-the-art technique LSH-E in terms of estimation accuracy under practical assumptions.
235	ARROW: Approximating Reachability Using Random Walks Over Web-Scale Graphs	N. Sengupta, A. Bagchi, M. Ramanath and S. Bedathur	In this paper, we show that ARROW, despite its simplicity, is near-accurate and scales to graphs with tens of millions of vertices and hundreds of millions of edges.
236	Taster: Self-Tuning, Elastic and Online Approximate Query Processing	M. Olma, O. Papapetrou, R. Appuswamy and A. Ailamaki	In this paper, we present Taster, a self-tuning, elastic, online AQP engine that synergistically combines the benefits of online and offline AQP.
237	An Iterative Scheme for Leverage-Based Approximate Aggregation	S. Han, H. Wang, J. Wan and J. Li	To address this problem, we propose a novel approach to calculate the aggregation answers with a high accuracy using only a small portion of the data.
238	Deletion Propagation for Multiple Key Preserving Conjunctive Queries: Approximations and Complexity	Z. Cai, D. Miao and Y. Li	The investigated problem is a variant of the standard deletion propagation problem, where given a source database D, a set of key preserving conjunctive queries Q, and the set of views V obtained by the queries in Q, we try to identify a set T of tuples from D whose elimination prevents all the tuples in a given set of deletions on views ?
239	Enumerating Minimal Weight Set Covers	Z. Ajami and S. Cohen	Thus, we present an algorithm that enumerates all minimal weight set covers in polynomial delay (i.e., with polynomial time between results) in ?
240	Constraints-Based Explanations of Classifications	D. Deutch and N. Frost	We propose a simple generic approach for explaining classifications, by identifying relevant parts of the input whose perturbation would be significant in affecting the classification.
241	KARL: Fast Kernel Aggregation Queries	T. N. Chan, M. L. Yiu and H. U. Leong	In this paper, we propose a novel and effective bounding technique to speed up the computation of kernel aggregation.
242	Assessing and Remedying Coverage for a Given Dataset	A. Asudeh, Z. Jin and H. V. Jagadish	In this paper, we assess the coverage of a given dataset over multiple categorical attributes.
243	Social Influence-Based Group Representation Learning for Group Recommendation	H. Yin, Q. Wang, K. Zheng, Z. Li, J. Yang and X. Zhou	In this paper, we propose a novel group recommender system, namely SIGR (short for “Social Influence-based Group Recommender”), which takes an attention mechanism and a bipartite graph embedding model BGEM as building blocks. We create two large-scale benchmark datasets and conduct extensive experiments on them.
244	MIDAS: Finding the Right Web Sources to Fill Knowledge Gaps	X. Wang, X. L. Dong, Y. Li and A. Meliou	In this paper, we present MIDAS, a system that harnesses the results of automated knowledge extraction pipelines to repair the bottleneck in industrial knowledge creation and augmentation processes.
245	Exploiting Centrality Information with Graph Convolutions for Network Representation Learning	H. Chen, H. Yin, T. Chen, Q. V. H. Nguyen, W. Peng and X. Li	We propose a generalizable model, namely GraphCSC, that utilizes both linkage information and centrality information to learn low-dimensional vector representations for network vertices.
246	Route Recommendations on Road Networks for Arbitrary User Preference Functions	P. Yawalkar and S. Ranu	In this paper, we study the query where a user provides a set of relevant PoIs and wants to identify the optimal route covering these PoIs.
247	NSCaching: Simple and Efficient Negative Sampling for Knowledge Graph Embedding	Y. Zhang, Q. Yao, Y. Shao and L. Chen	In this paper, motivated by the observation that negative triplets with large scores are important but rare, we propose to directly keep track of them with cache.
248	ServeDB: Secure, Verifiable, and Efficient Range Queries on Outsourced Database	S. Wu, Q. Li, G. Li, D. Yuan, X. Yuan and C. Wang	In this paper, we propose a secure and scalable scheme that can support multi-dimensional range queries over encrypted data.
249	Collecting and Analyzing Multidimensional Data with Local Differential Privacy	N. Wang et al.	In this paper, we point out that the fundamental problem of collecting multidimensional data under LDP has not been addressed sufficiently, and there remains much room for improvement even for basic tasks such as computing the mean value over a single numeric attribute under LDP.
250	Partitioned Data Security on Outsourced Sensitive and Non-Sensitive Data	S. Mehrotra, S. Sharma, J. Ullman and A. Mishra	We propose a new secure approach, entitled query binning (QB) that allows non-sensitive parts of the data to be outsourced in clear-text while guaranteeing that no information is leaked by the joint processing of non-sensitive data (in clear-text) and sensitive data (in encrypted form).
251	SecEQP: A Secure and Efficient Scheme for SkNN Query Problem Over Encrypted Geodata on Cloud	X. Lei, A. X. Liu, R. Li and G. Tu	In this paper, we propose the Secure and Efficient Query Processing (SecEQP) scheme to address the secure k nearest neighbor (SkNN) query problem.
252	Joins Over Encrypted Data with Fine Granular Security	F. Hahn, N. Loza and F. Kerschbaum	In this paper we present a different approach: Instead of implementing a stand-alone join operator that reveals the frequency of each element in the column, we show how to construct joins over encrypted data after selection operations have been applied.
253	Column-Oriented Database Acceleration Using FPGAs	S. Watanabe, K. Fujimoto, Y. Saeki, Y. Fujikawa and H. Yoshino	To overcome this drawback, we developed a column-oriented DBMS and a field-programmable-gate-array-based acceleration engine.
254	Hardware-Conscious Hash-Joins on GPUs	P. Sioulas, P. Chrysogelos, M. Karpathiotakis, R. Appuswamy and A. Ailamaki	In this paper, we present the design and implementation of a family of novel, partitioning-based GPU-join algorithms that are tuned to exploit various GPU hardware characteristics for working around the two main limitations of GPUs: limited memory capacity and slow PCIe interface.
255	TuFast: A Lightweight Parallelization Library for Graph Analytics	Z. Shang, J. X. Yu and Z. Zhang	In this paper, we present a lightweight transactional memory (TM) library TuFast which provides easy-to-use primitives for the end-user to agilely develop fast shared memory graph parallelization on a multi-core server.
256	LDC: A Lower-Level Driven Compaction Method to Optimize SSD-Oriented Key-Value Stores	Y. Chai, Y. Chai, X. Wang, H. Wei, N. Bao and Y. Liang	Aiming to optimize both the tail latency and the system throughput, in this paper, we propose a novel Lower-level Driven Compaction (LDC) method for LSM-tree KV stores.
257	No False Negatives: Accepting All Useful Schedules in a Fast Serializable Many-Core System	D. Durner and T. Neumann	We introduce a novel multi-version concurrency protocol that achieves high performance while reducing the number of aborted schedules to a minimum and providing the best isolation level.
258	Discovering Maximal Motif Cliques in Large Heterogeneous Information Networks	J. Hu, R. Cheng, K. C. Chang, A. Sankar, Y. Fang and B. Y. H. Lam	We thus present the META algorithm, which employs advanced pruning strategies to effectively reduce the search space.
259	REPT: A Streaming Algorithm of Approximating Global and Local Triangle Counts in Parallel	P. Wang, P. Jia, Y. Qi, Y. Sun, J. Tao and X. Guan	To solve these problems, we develop a novel parallel method REPT to significantly reduce the covariance (even completely eliminate the covariance for some cases) between sampled triangles.
260	AsterixDB Mid-Flight: A Case Study in Building Systems in Academia	M. J. Carey	This paper describes the experiences that the author and his (mostly UC-based) partners in software crime have had that culminated in the Big Data Management System now available as Apache AsterixDB.
261	Information Diffusion Prediction via Recurrent Cascades Convolution	X. Chen, F. Zhou, K. Zhang, G. Trajcevski, T. Zhong and F. Zhang	To capture both the underlying structures governing the spread of information and inherent dependencies between re-tweeting behaviors of users, we propose a semi-supervised method, called Recurrent Cascades Convolutional Networks (CasCN), which explicitly models and predicts cascades through learning the latent representation of both structural and temporal information, without involving any other features.
262	Finding Densest Lasting Subgraphs in Dynamic Graphs: A Stochastic Approach	X. Liu, T. Ge and Y. Wu	We propose a framework called Expectation-Maximization with Utility functions (EMU), a novel stochastic approach that nontrivially extends the conventional EM approach.
263	Multicapacity Facility Selection in Networks	A. Logins, P. Karras and C. S. Jensen	We present the first, to our knowledge, solution to the MCFS problem that achieves both scalability and high quality, the Wide Matching Algorithm (WMA).
264	An MBR-Oriented Approach for Efficient Skyline Query Processing	J. Zhang, W. Wang, X. Jiang, W. Ku and H. Lu	This research proposes an advanced approach that improves the efficiency of skyline query processing by significantly reducing the computational cost on object comparisons, i.e., dominance tests between objects.
265	Dynamic Set kNN Self-Join	D. Amagata, T. Hara and C. Xiao	In this paper, we study a novel problem, dynamic set kNN self-join, i.e., for each set, we continuously compute its k nearest neighbor sets.
266	Packed Memory Arrays – Rewired	D. De Leo and P. Boncz	PMAs have been studied mostly theoretically but suffer from practical problems, as we show in this paper.
267	GEM²-Tree: A Gas-Efficient Structure for Authenticated Range Queries in Blockchain	C. Zhang, C. Xu, J. Xu, Y. Tang and B. Choi	In this paper, we take the first step toward studying authenticated range queries in the hybrid-storage blockchain.
268	Effective Filters and Linear Time Verification for Tree Similarity Joins	T. Hütter, M. Pawlik, R. Löschinger and N. Augsten	In this paper, we present a scalable solution for the tree similarity join that is based on (1) an effective indexing technique that leverages both the labels and the structure of trees to reduce the number of candidates, (2) an efficient upper bound filter that moves many of the candidates directly to the join result without additional verification, (3) a linear time verification technique for the remaining candidates that avoids the expensive tree edit distance.