Paper Digest: ICDE 2020 Highlights

April 19, 2020 (updated August 18, 2020)

The IEEE International Conference on Data Engineering (ICDE) addresses research issues in designing, building, managing, and evaluating advanced data-intensive systems and applications. In 2020, it is being held virtually due to the COVID-19 pandemic.

To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights/summaries to quickly grasp the main idea of each paper.

If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.

Paper Digest Team
team@paperdigest.org

TABLE 1: ICDE 2020 Papers

#	Title	Authors	Highlight
1	Real-Time Cross Online Matching in Spatial Crowdsourcing	Y. Cheng, B. Li, X. Zhou, Y. Yuan, G. Wang and L. Chen	In this paper, we propose Cross Online Matching (COM), which enables a platform to “borrow” unoccupied crowd workers from other platforms to complete user requests.
2	Predictive Task Assignment in Spatial Crowdsourcing: A Data-driven Approach	Y. Zhao, K. Zheng, Y. Cui, H. Su, F. Zhu and X. Zhou	We propose a two-phase data-driven framework.
3	Curiosity-Driven Energy-Efficient Worker Scheduling in Vehicular Crowdsourcing: A Deep Reinforcement Learning Approach	C. H. Liu et al.	In this paper, we explicitly consider the use of unmanned vehicular workers, e.g., drones and driverless cars, which are more controllable and can be deployed in remote or dangerous areas to carry out long-term and harsh tasks as a vehicular crowdsourcing (VC) campaign.
4	Crowdsourced Collective Entity Resolution with Relational Match Propagation	J. Huang, W. Hu, Z. Bao and Y. Qu	In this paper, we propose a novel approach called crowdsourced collective ER, which leverages the relationships between entities to infer matches jointly rather than independently.
5	An End-to-End Deep RL Framework for Task Arrangement in Crowdsourcing Platforms	C. Shan, N. Mamoulis, R. Cheng, G. Li, X. Li and Y. Qian	In this paper, we propose a Deep Reinforcement Learning (RL) framework for task arrangement, which is a critical problem for the success of crowdsourcing platforms.
6	Efficient Bidirectional Order Dependency Discovery	Y. Jin, L. Zhu and Z. Tan	In this paper, we adopt a strategy that decouples the impact of m from that of n, and that still finds all minimal valid bidirectional order dependencies.
7	Efficient Diversity-Driven Ensemble for Deep Neural Networks	W. Zhang, J. Jiang, Y. Shao and B. Cui	As the effect of ensemble learning is more pronounced if ensemble members are accurate and diverse, we propose a method named Efficient Diversity-Driven Ensemble (EDDE) to address both the diversity and the efficiency of an ensemble.
8	Adaptive Network Alignment with Unsupervised and Multi-order Convolutional Networks	H. T. Trung, T. Van Vinh, N. T. Tam, H. Yin, M. Weidlich and N. Q. Viet Hung	In this paper, we propose a fully unsupervised network alignment framework based on a multi-order embedding model.
9	A Natural Language Interface for Database: Achieving Transfer-learnability Using Adversarial Method for Question Understanding	W. Wang, Y. Tian, H. Wang and W. Ku	In this work, we show that it is possible to separate data-specific components from latent semantic structures in expressing relational queries in a natural language.
10	Array-based Data Management for Genomics	O. Horlova, A. Kaitoua and S. Ceri	Specifically, we define a wide spectrum of operations over datasets that are represented using arrays, and we show that the array-based implementation scales well on Spark, thanks in part to a data representation that is effectively used for supporting machine learning.
11	Group Recommendation with Latent Voting Mechanism	L. Guo, H. Yin, Q. Wang, B. Cui, Z. Huang and L. Cui	Instead of exploring new heuristics or vanilla attention-based mechanisms, we propose a new social self-attention based aggregation strategy, namely Group Self-Attention (GroupSA), which directly models the interactions among group members.
12	Price-aware Recommendation with Graph Convolutional Networks	Y. Zheng, C. Gao, X. He, Y. Li and D. Jin	To address the first difficulty, we propose to model the transitive relationship between user-to-item and item-to-price, taking inspiration from the recently developed Graph Convolutional Networks (GCN).
13	Syndrome-aware Herb Recommendation with Multi-Graph Convolution Network	Y. Jin, W. Zhang, X. He, X. Wang and X. Wang	Specifically, given a set of symptoms to treat, we aim to generate an overall syndrome representation by effectively fusing the embeddings of all the symptoms in the set, so as to mimic how a doctor induces the syndromes.
14	PoisonRec: An Adaptive Data Poisoning Framework for Attacking Black-box Recommender Systems	J. Song et al.	In this paper, we propose an adaptive data poisoning framework, PoisonRec, which can automatically learn effective attack strategies on various recommender systems with very limited knowledge.
15	Toward Recommendation for Upskilling: Modeling Skill Improvement and Item Difficulty in Action Sequences	K. Umemoto, T. Milo and M. Kitsuregawa	We propose a progression model that uses latent variables to learn the monotonically non-decreasing progression of user skills.
16	Exploring Finer Granularity within the Cores: Efficient (k,p)-Core Computation	C. Zhang et al.	In this paper, we propose and study a novel cohesive subgraph model, named (k,p)-core, which is a maximal subgraph where each vertex has at least k neighbours and at least p fraction of its neighbours in the subgraph.
17	Kaskade: Graph Views for Efficient Graph Analytics	J. M. F. da Trindade, K. Karanasos, C. Curino, S. Madden and J. Shun	In this work, we leverage structural properties of graphs and queries to automatically derive materialized graph views that can dramatically speed up query evaluation.
18	Efficient Top-k Edge Structural Diversity Search	Q. Zhang, R. Li, Q. Yang, G. Wang and L. Qin	In this work, we perform the first systematic study of the top-k edge structural diversity search problem on large graphs.
19	Adaptive Relation Discovery from Focusing Seeds on Large Networks	Z. Wang, C. Wang, W. Wang, X. Gu, B. Li and D. Meng	To support such applications, a new problem called adaptive relation discovery from focusing seeds (A-RDFS) is proposed and studied in this article.
20	Repairing Entities using Star Constraints in Multirelational Graphs	P. Lin, Q. Song, Y. Wu and J. Pi	We propose a class of constraints called star functional dependencies (StarFDs).
21	Practical Anonymous Subscription with Revocation Based on Broadcast Encryption	X. Yi, R. Paulet, E. Bertino and F. Rao	In this paper, we consider the problem where a client wishes to subscribe to some product or service provided by a server while maintaining their anonymity.
22	SVkNN: Efficient Secure and Verifiable k-Nearest Neighbor Query on the Cloud Platform	N. Cui, X. Yang, B. Wang, J. Li and G. Wang	In this paper, we study the problem of secure and verifiable k-nearest neighbor query (SVkNN).
23	An Anomaly Detection System for the Protection of Relational Database Systems against Data Leakage by Application Programs	D. Fadolalkarim, E. Bertino and A. Sallam	In this paper, we propose AD-PROM, an Anomaly Detection system that aims at protecting relational database systems against malicious/compromised applications PROgraMs aiming at stealing data.
24	SFour: A Protocol for Cryptographically Secure Record Linkage at Scale	B. Khurram and F. Kerschbaum	To this end, we propose the first known efficient privacy-preserving record linkage (PRL) protocol that runs in subquadratic time, provides high accuracy, and guarantees cryptographic security in the semi-honest security model.
25	I/O Efficient Approximate Nearest Neighbour Search based on Learned Functions	M. Li, Y. Zhang, Y. Sun, W. Wang, I. W. Tsang and X. Lin	In this paper, we introduce a novel data-sensitive indexing and query processing framework for ANNS with an emphasis on optimizing I/O efficiency, especially sequential I/Os.
26	Efficient Query Processing with Optimistically Compressed Hash Tables & Strings in the USSR	T. Gubner, V. Leis and P. Boncz	In this work, we propose three complementary techniques to improve this representation: Domain-Guided Prefix Suppression bit-packs keys and values tightly to reduce hash table record width.
27	UniKV: Toward High-Performance and Scalable KV Storage in Mixed Workloads via Unified Indexing	Q. Zhang, Y. Li, P. P. C. Lee, Y. Xu, Q. Cui and L. Tang	We design UniKV, which unifies the key design ideas of hash indexing and the LSM-tree in a single system.
28	Improved Correlated Sampling for Join Size Estimation	T. Wang and C. Chan	Based on this framework, we propose a new correlated sampling based technique to address the limitations of existing techniques.
29	MESSI: In-Memory Data Series Indexing	B. Peng, P. Fatourou and T. Palpanas	In this work, we propose MESSI, the first data series index designed for in-memory operation on modern hardware.
30	Spatial Transition Learning on Road Networks with Deep Probabilistic Models	X. Li, G. Cong and Y. Cheng	In this paper, we study the problem of predicting the most likely traveling route on the road network between two given locations by considering the real-time traffic.
31	Active Model Selection for Positive Unlabeled Time Series Classification	S. Liang, Y. Zhang and J. Ma	Focusing on the widely adopted self-training one-nearest-neighbor (ST-1NN) paradigm, we propose a model selection framework based on active learning (AL).
32	Neighbor Profile: Bagging Nearest Neighbors for Unsupervised Time Series Mining	Y. He, X. Chu and Y. Wang	Thereafter, we point out three inherent issues (low-quality density estimation, gravity-defiant behavior, and lack of a reusable model) that deteriorate the performance of matrix profile in both efficiency and subsequence quality. To overcome these issues, we propose Neighbor Profile to robustly model the subsequence density by bagging nearest neighbors for the discovery of frequent/rare subsequences.
33	Massively-Parallel Change Detection for Satellite Time Series Data with Missing Values	F. Gieseke, S. Rosca, T. Henriksen, J. Verbesselt and C. E. Oancea	In this work, we propose a novel massively-parallel implementation for a state-of-the-art change detection method and demonstrate its potential in the context of monitoring deforestation.
34	Skyline Cohesive Group Queries in Large Road-social Networks	Q. Li, Y. Zhu and J. X. Yu	In this paper, we take a new approach to consider the constraints equally and study a skyline query.
35	Anchored Vertex Exploration for Community Engagement in Social Networks	T. Cai, J. Li, N. A. Hasan Haldar, A. Mian, J. Yearwood and T. Sellis	Given a set of keywords W, a structure cohesive parameter k, and a budget parameter l, our objective is to find l users who can induce a maximal expanded community.
36	Optimizing Knowledge Graphs through Voting-based User Feedback	R. Yang, X. Lin, J. Xu, Y. Yang and L. He	To address these issues, in this paper, we propose an interactive framework that refines and optimizes knowledge graphs through user votes.
37	AutoSF: Searching Scoring Functions for Knowledge Graph Embedding	Y. Zhang, Q. Yao, W. Dai and L. Chen	In this paper, inspired by the recent success of automated machine learning (AutoML), we propose to automatically design SFs (AutoSF) for distinct KGs by the AutoML techniques.
38	Semantic Guided and Response Times Bounded Top-k Similarity Search over Knowledge Graphs	Y. Wang, A. Khan, T. Wu, J. Jin and H. Yan	In this paper, we propose a semantic-guided and response-time-bounded graph query to return the top-k answers effectively and efficiently.
39	PPKWS: An Efficient Framework for Keyword Search on Public-Private Networks	J. Jiang, X. Huang, B. Choi, J. Xu, S. S. Bhowmick and L. Xu	In this paper, we propose a new keyword search framework, called public-private keyword search (PPKWS).
40	Privacy-preserving Real-time Anomaly Detection Using Edge Computing	S. Mehnaz and E. Bertino	We propose a privacy-preserving framework that enables efficient anomaly detection on encrypted data by leveraging a lightweight, aggregation-optimized encryption scheme to encrypt the data before offloading it to the edge.
41	To Warn or Not to Warn: Online Signaling in Audit Games	C. Yan, H. Xu, Y. Vorobeychik, B. Li, D. Fabbri and B. A. Malin	In this paper, we formalize this auditing problem as a Signaling Audit Game (SAG), in which we model the interactions between an auditor and an attacker in the context of signaling and the usability cost is represented as a factor of the auditor's payoff.
42	One-sided Differential Privacy	I. Kotsogiannis, S. Doudalis, S. Haney, A. Machanavajjhala and S. Mehrotra	In this work we introduce one-sided differential privacy (OSDP) that offers provable privacy guarantees to the sensitive records.
43	Providing Input-Discriminative Protection for Local Differential Privacy	X. Gu, M. Li, L. Xiong and Y. Cao	In this paper, we tackle the challenge of providing input-discriminative protection to reflect the distinct privacy requirements of different inputs.
44	Differentially Private Online Task Assignment in Spatial Crowdsourcing: A Tree-based Approach	Q. Tao, Y. Tong, Z. Zhou, Y. Shi, L. Chen and K. Xu	In this paper, we investigate privacy protection for online task assignment with the objective of minimizing the total distance, an important task assignment formulation in spatial crowdsourcing.
45	ChainLink: Indexing Big Time Series Data For Long Subsequence Matching	N. Alghamdi, L. Zhang, H. Zhang, E. A. Rundensteiner and M. Y. Eltabakh	In this work, we propose a lightweight distributed indexing framework, called ChainLink, that supports approximate kNN queries over TB-scale time series data.
46	Random Sampling for Group-By Queries	T. D. Nguyen, M. Shih, S. S. Parvathaneni, B. Xu, D. Srivastava and S. Tirthapura	We present CVOPT, a query- and data-driven sampling framework for a set of group-by queries.
47	PA-Tree: Polled-Mode Asynchronous B+ Tree for NVMe	L. Wang, Z. Zhang, B. He and Z. Zhang	To tackle this problem, we propose PA-Tree, an NVMe-friendly B+ Tree with a novel, polled-mode, asynchronous execution paradigm to process multiple index operations in an interleaved and asynchronous manner.
48	Distributed Streaming Set Similarity Join	J. Yang, W. Zhang, X. Wang, Y. Zhang and X. Lin	In this paper, we study the problem of efficient streaming set similarity join over distributed systems, which has broad applications in data cleaning and data integration tasks, such as online near-duplicate detection.
49	Cool, a COhort OnLine analytical processing system	Z. Xie, H. Ying, C. Yue, M. Zhang, G. Chen and B. C. Ooi	In this paper, we present Cool, a cohort online analytical processing system.
50	TransN: Heterogeneous Network Representation Learning by Translating Node Embeddings	Z. Li et al.	To address this problem, in this paper, we propose TransN, a novel multi-view network embedding framework for heterogeneous networks.
51	An Adaptive Master-Slave Regularized Model for Unexpected Revenue Prediction Enhanced with Alternative Data	J. Xu, J. Zhou, Y. Jia, J. Li and X. Hui	Thus, we propose an adaptive master-slave regularized model, called AMS for short, to effectively leverage alternative data for unexpected revenue prediction.
52	Exact and Consistent Interpretation of Piecewise Linear Models Hidden behind APIs: A Closed Form Solution	Z. Cong, L. Chu, L. Wang, X. Hu and J. Pei	In this paper, we propose an elegant closed form solution named OpenAPI to compute exact and consistent interpretations for the family of Piecewise Linear Models (PLM), which includes many popular classification models.
53	Statistical Estimation of Diffusion Network Topologies	K. Han, Y. Tian, Y. Zhang, L. Han, H. Huang and Y. Gao	In this work, we investigate the problem of how to infer the topology of a diffusion network from only the final infection statuses of nodes.
54	Multiple Dense Subtensor Estimation with High Density Guarantee	Q. Duong, H. Ramampiaro and K. Nørvåg	We address these drawbacks by providing both theoretical and practical solutions for estimating multiple dense subtensors in tensor data.
55	Efficient Approximation Algorithms for Adaptive Target Profit Maximization	K. Huang, J. Tang, X. Xiao, A. Sun and A. Lim	To acquire an overall understanding, we study the adaptive TPM problem under both the oracle model and the noise model, and propose the ADG and AddATP algorithms, respectively, to address them with strong theoretical guarantees.
56	Efficient Bitruss Decomposition for Large-scale Bipartite Graphs	K. Wang, X. Lin, L. Qin, W. Zhang and Y. Zhang	In this paper, we study the bitruss decomposition problem, which aims to find all the k-bitrusses for k ≥ 0.
57	Kaleido: An Efficient Out-of-core Graph Mining System on A Single Machine	C. Zhao, Z. Zhang, P. Xu, T. Zheng and J. Guo	In this paper, we present Kaleido, an efficient single machine, out-of-core graph mining system which treats disks as an extension of memory.
58	Finding the Best k in Core Decomposition: A Time and Space Optimal Solution	D. Chu et al.	In this paper, given a graph and a scoring metric, we aim to efficiently find the best value of k such that the score of the k-core (or k-core set) is the highest.
59	Updates-Aware Graph Pattern based Node Matching	G. Sun, G. Liu, Y. Wang and X. Zhou	In this paper, we first analyze and detect the elimination relationships between the updates. Then, we construct an Elimination Hierarchy Tree (EH-Tree) to index these elimination relationships.
60	Dataset Discovery in Data Lakes	A. Bogatu, A. A. A. Fernandes, N. W. Paton and N. Konstantinou	We refer to this as the problem of dataset discovery in data lakes and this paper contributes an effective and efficient solution to it.
61	Swapping Repair for Misplaced Attribute Values	Y. Sun, S. Song, C. Wang and J. Wang	In a holistic view of all (swapped) attributes, we propose to evaluate the likelihood of a swapping repaired tuple by studying its distances (similarity) to neighbors.
62	Interactive Cleaning for Progressive Visualization through Composite Questions	Y. Luo, C. Chai, X. Qin, N. Tang and G. Li	In this paper, we study the problem of interactive cleaning for progressive visualization (ICPV): given a bad visualization V, the goal is to obtain a “cleaned” visualization V′ that is far from V, under a given (small) budget w.r.t. human cost.
63	User-driven Error Detection for Time Series with Events	K. Le and P. Papotti	In this work, we exploit active learning to detect both errors and events in a single solution that aims at minimizing user interaction.
64	An Agile Sample Maintenance Approach for Agile Analytics	H. Zhang, Y. Zhang, Z. He, Y. Jing, K. Zhang and X. S. Wang	This paper proposes an adaptive sample update (ASU) approach that avoids re-sampling from scratch as much as possible by monitoring the data distribution, and uses instead an incremental update method before a re-sampling becomes necessary.
65	Continuously Tracking Core Items in Data Streams with Probabilistic Decays	J. Zhao, P. Wang, J. Tao, S. Zhang and J. C. S. Lui	This paper investigates the core items tracking (CIT) problem, where the goal is to continuously track representative items, called core items, in a data stream so as to best represent/summarize the stream.
66	The Art of Efficient In-memory Query Processing on NUMA Systems: a Systematic Approach	P. Memarzia, S. Ray and V. C. Bhavsar	In this work, we evaluate a variety of strategies that aim to accelerate memory-intensive data analytics workloads on NUMA systems.
67	Speeding Up GED Verification for Graph Similarity Search	L. Chang, X. Feng, X. Lin, L. Qin, W. Zhang and D. Ouyang	In this paper, we aim to speed up GED verification, which is orthogonal to the index structures used in the filtering phase.
68	Scaling Out Schema-free Stream Joins	D. Gjurovski and S. Michel	In this work, we consider computing natural joins over massive streams of JSON documents that do not adhere to a specific schema.
69	Contribution Maximization in Probabilistic Datalog	T. Milo, Y. Moskovitch and B. Youngmann	To overcome this, we propose an optimized algorithm that injects a refined variant of the classic Magic Sets technique, integrated with a sampling method, into IM algorithms, achieving significant savings in space and execution time.
70	Multiscale Frequent Co-movement Pattern Mining	S. Helmi and F. Banaei-Kashani	In this paper, we propose a novel and efficient framework for co-movement pattern mining.
71	Self-paced Ensemble for Highly Imbalanced Massive Data Classification	Z. Liu et al.	Taking those factors into consideration, we propose a novel framework for imbalance classification that aims to generate a strong ensemble by self-paced harmonizing data hardness via under-sampling.
72	SAN : Scale-Space Attention Networks	Y. Garg, K. S. Candan and M. L. Sapino	We propose an innovative robust feature learning framework, scale-invariant attention networks (SAN), that identifies salient regions in the input data for the CNN to focus on.
73	A Novel Approach to Learning Consensus and Complementary Information for Multi-View Data Clustering	K. Luong and R. Nayak	We propose a novel optimal manifold for multi-view data which is the most consensed manifold embedded in the high-dimensional multi-view data.
74	Summarizing Hierarchical Multidimensional Data	A. Kim, L. V. S. Lakshmanan and D. Srivastava	In this paper, we propose Tree Summaries, which attain this challenging goal over arbitrary hierarchical multidimensional data sets.
75	Efficient Team Formation in Social Networks based on Constrained Pattern Graph	Y. Kou et al.	In order to solve this problem, we present an efficient team formation method based on Constrained Pattern Graph (called CPG).
76	Effective and Efficient Truss Computation over Large Heterogeneous Information Networks	Y. Yang, Y. Fang, X. Lin, W. Zhang and Y. Fang	In this paper, we study the problem of truss computation over HINs, which aims to find groups of vertices that are of the same type and densely connected.
77	Index-Free Approach with Theoretical Guarantee for Efficient Random Walk with Restart Query	D. Lin, R. C. Wong, M. Xie and V. J. Wei	Motivated by this, in this paper, we propose an index-free algorithm called Residue-Accumulated approach (ResAcc) which returns answers with a theoretical guarantee efficiently.
78	Optimization of GPU-based Sparse Matrix Multiplication for Large Sparse Networks	J. Lee, S. Kang, Y. Yu, Y. Jo, S. Kim and Y. Park	To solve these challenges, this paper proposes a new optimization pass called Block Reorganizer, which balances the total computations of each computing unit on target GPUs, based on the outer-product-based expansion process, and reduces the memory pressure during the merge process.
79	VAC: Vertex-Centric Attributed Community Search	Q. Liu, Y. Zhu, M. Zhao, X. Huang, J. Xu and Y. Gao	To make up for these deficiencies, in this paper, we study a novel attributed community search called vertex-centric attributed community (VAC) search.
80	Online Anomalous Trajectory Detection with Deep Generative Sequence Modeling	Y. Liu, K. Zhao, G. Cong and Z. Bao	To this end, we propose a novel model, namely Gaussian Mixture Variational Sequence AutoEncoder (GM-VSAE), to tackle these challenges.
81	Mobility-Aware Dynamic Taxi Ridesharing	Z. Liu, Z. Gong, J. Li and K. Wu	In this paper, we consider the mobility-aware taxi ridesharing problem, and present mT- Share to address these limitations.
82	Online Trichromatic Pickup and Delivery Scheduling in Spatial Crowdsourcing	B. Zheng et al.	In order to quickly respond to submitted tasks, we propose a greedy solution that finds the schedule with the highest utility-cost ratio.
83	Task Allocation in Dependency-aware Spatial Crowdsourcing	W. Ni, P. Cheng, L. Chen and X. Lin	In this paper, we consider a spatial crowdsourcing scenario, where the tasks may have some dependencies among them.
84	Parallel Semantic Trajectory Similarity Join	L. Chen, S. Shang, C. S. Jensen, B. Yao and P. Kalnis	We consider the problem of semantic trajectory similarity join (STS-Join).
85	Being Happy with the Least: Achieving α-happiness with Minimum Number of Tuples	M. Xie, R. C. Wong, P. Peng and V. J. Tsotras	In this paper, we study the min-size version of the regret minimization query; that is, we want to determine the minimum number of tuples needed to keep users happy at a given level.
86	Improving Neural Relation Extraction with Implicit Mutual Relations	J. Kuang, Y. Cao, J. Zheng, X. He, M. Gao and A. Zhou	In contrast to existing distant supervision approaches, which suffer from insufficient training corpora for extracting relations, our proposal mines implicit mutual relations from massive unlabeled corpora and transfers the semantic information of entity pairs into the RE model, which is more expressive and semantically plausible.
87	SONG: Approximate Nearest Neighbor Search on GPU	W. Zhao, S. Tan and P. Li	In this paper, we present a novel framework that decouples the graph search algorithm into 3 stages in order to parallelize the performance-critical distance computation.
88	R2LSH: A Nearest Neighbor Search Scheme Based on Two-dimensional Projected Spaces	K. Lu and M. Kudo	In this paper, we propose a novel and easy-to-implement disk-based method named R2LSH to answer ANN queries in high-dimensional spaces.
89	Online Indices for Predictive Top-k Entity and Aggregate Queries on Knowledge Graphs	Y. Li, T. Ge and C. Chen	To improve query processing efficiency, we propose an incremental index on top of low dimensional entity vectors transformed from network embedding vectors.
90	Enabling Efficient Random Access to Hierarchically-Compressed Data	F. Zhang, J. Zhai, X. Shen, O. Mutlu and X. Du	This paper presents a set of techniques that successfully eliminate the limitation, and for the first time, establishes the feasibility of effectively handling both data traversal operations and random data accesses on hierarchically-compressed data.
91	Adaptive Top-k Overlap Set Similarity Joins	Z. Yang, B. Zheng, G. Li, X. Zhao, X. Zhou and C. S. Jensen	To avoid this problem, we propose a solution to the top-k overlap set similarity join (TkOSSJ) that returns k pairs of sets with the highest overlap similarities.
92	Load Shedding for Complex Event Processing: Input-based and State-based Techniques	B. Zhao, N. Q. Viet Hung and M. Weidlich	In this work, we therefore complement input-based load shedding with a state-based technique that discards partial matches.
93	SPEAr: Expediting Stream Processing with Accuracy Guarantees	N. R. Katsipoulakis, A. Labrinidis and P. K. Chrysanthis	We built SPEAr on top of Storm and our experiments indicate that it can reduce processing times by more than an order of magnitude, use more than an order of magnitude less memory, and offer accuracy guarantees in real-world benchmarks.
94	Temporal Network Representation Learning via Historical Neighborhoods Aggregation	S. Huang, Z. Bao, G. Li, Y. Zhou and J. S. Culpepper	In this paper, we propose the Embedding via Historical Neighborhoods Aggregation (EHNA) algorithm.
95	An Interval-centric Model for Distributed Computing over Temporal Graphs	S. Gandhi and Y. Simmhan	We propose an interval-centric computing model (ICM) for distributed and iterative processing of temporal graphs, where a vertex's time-interval is a unit of data-parallel computation.
96	CrashSim: An Efficient Algorithm for Computing SimRank over Static and Temporal Graphs	M. Li, F. M. Choudhury, R. Borovica-Gajic, Z. Wang, J. Xin and J. Li	In this work, we first propose CrashSim, an index-free algorithm for single-source SimRank computation in static graphs.
97	Efficiently Answering Span-Reachability Queries in Large Temporal Graphs	D. Wen, Y. Huang, Y. Zhang, L. Qin, W. Zhang and X. Lin	In this paper, we define a new reachability model, called span-reachability, which is designed to relax the time-order dependency and identify the relationship between entities in a given time period.
98	Turbocharging Geospatial Visualization Dashboards via a Materialized Sampling Cube Approach	J. Yu and M. Sarwat	In this paper, we present a middleware framework that runs on top of a SQL data system with the purpose of increasing the interactivity of geospatial visualization dashboards.
99	Sya: Enabling Spatial Awareness inside Probabilistic Knowledge Base Construction	I. Sabek and M. F. Mokbel	This paper presents Sya; the first spatial probabilistic knowledge base construction system, based on Markov Logic Networks (MLN).
100	Fast Query Decomposition for Batch Shortest Path Processing in Road Networks	L. Li, M. Zhang, W. Hua and X. Zhou	In this paper, we aim to improve the performance of batch shortest path algorithms by revisiting the problem of query clustering.
101	Efficient Attribute-Constrained Co-Located Community Search	J. Luo, X. Cao, X. Xie, Q. Qu, Z. Xu and C. S. Jensen	We study the problem of attribute-constrained co-located community (ACOC) search, which returns a community that satisfies three properties: i) structural cohesiveness: the members in the community are densely connected; ii) spatial co-location: the members are close to each other; and iii) attribute constraint: a set of attributes are covered by the attributes associated with the members.
102	Indoor Top-k Keyword-aware Routing Query	Z. Feng, T. Liu, H. Li, H. Lu, L. Shou and J. Xu	In this paper, we study the indoor top-k keyword-aware routing query (IKRQ).
103	Latte: A Native Table Engine On NVMe Storage	J. Chu, Y. Tu, Y. Zhang and C. Weng	To fully exploit the hardware potential of NVMe devices, we propose a lightweight native storage stack called Lightstack to minimize the software overhead.
104	Doubleheader Logging: Eliminating Journal Write Overhead for Mobile DBMS	S. Oh, W. Kim, J. Seo, H. Song, S. H. Noh and B. Nam	In this work, we propose a crash consistent in-place update logging method – doubleheader logging (DHL) for SQLite.
105	GSI: GPU-friendly Subgraph Isomorphism	L. Zeng, L. Zou, M. T. Özsu, L. Hu and F. Zhang	In this paper, we propose a GPU-friendly subgraph isomorphism algorithm, GSI.
106	FPGA-based Compaction Engine for Accelerating LSM-tree Key-Value Stores	X. Sun, J. Yu, Z. Zhou and C. J. Xue	In this paper, we design and implement an FPGA-based compaction engine to accelerate compaction in LSM-tree based key-value stores.
107	Getting Swole: Generating Access-Aware Code with Predicate Pullups	A. Crotty, A. Galakatos and T. Kraska	Therefore, we propose SWOLE, the first access-aware code generation strategy.
108	Video Monitoring Queries	N. Koudas, R. Li and I. Xarchakos	In particular, we introduce a set of approximate filters to speed up queries that involve objects of specific types (e.g., cars, trucks) on video frames with associated spatial relationships among them (e.g., car left of truck).
109	Reinforcement Learning with Tree-LSTM for Join Order Selection	X. Yu, G. Li, C. Chai and N. Tang	In this paper, we present RTOS, a novel learned optimizer that uses Reinforcement learning with Tree-structured long short-term memory (LSTM) for join Order Selection.
110	Approximate Query Processing for Data Exploration using Deep Generative Models	S. Thirumuruganathan, S. Hasan, N. Koudas and G. Das	In this work, we explore the usage of deep learning (DL) for answering aggregate queries specifically for interactive applications such as data exploration and visualization.
111	SuRF: Identification of Interesting Data Regions with Surrogate Models	F. Savva, C. Anagnostopoulos and P. Triantafillou	This paper studies the reverse problem: analysts provide a cut-off value for a statistic of interest, and in turn our proposed framework efficiently identifies multidimensional regions whose statistic exceeds (or is below) the given cut-off value (according to the user's needs).
112	Two-Level Data Compression using Machine Learning in Time Series Database	X. Yu et al.	In this paper, we propose a two-level compression model that selects a proper compression scheme for each individual point, so that diverse patterns can be captured at a fine granularity.
113	SeeMoRe: A Fault-Tolerant Protocol for Hybrid Cloud Environments	M. J. Amiri, S. Maiyya, D. Agrawal and A. E. Abbadi	In this paper, we consider a private cloud consisting of nonmalicious nodes (crash-only failures) and a public cloud with possible malicious failures.
114	On Sharding Open Blockchains with Smart Contracts	Y. Tao, B. Li, J. Jiang, H. C. Ng, C. Wang and B. Li	In this paper, we propose, analyze, and implement a new distributed and dynamic sharding system to substantially improve the throughput of blockchain systems based on smart contracts, while requiring minimum cross-shard communication.
115	G-thinker: A Distributed Framework for Mining Subgraphs in a Big Graph	D. Yan, G. Guo, M. M. Rahman Chowdhury, M. Tamer Özsu, W. Ku and J. C. S. Lui	We propose the first truly CPU-bound distributed framework called G-thinker that adopts a user-friendly subgraph-centric vertex-pulling API for writing distributed subgraph mining algorithms.
116	DynaMast: Adaptive Dynamic Mastering for Replicated Systems	M. Abebe, B. Glasbergen and K. Daudjee	We present DynaMast, a lazily replicated, multi-master database system that guarantees one-site transaction execution while effectively distributing both reads and updates among multiple sites.
117	Fela: Incorporating Flexible Parallelism and Elastic Tuning to Accelerate Large-Scale DML	J. Geng, D. Li and S. Wang	Targeting these existing drawbacks, we propose Fela, which incorporates both flexible parallelism and elastic tuning mechanism to accelerate DML.
118	Sequence-Aware Factorization Machines for Temporal Predictive Analytics	T. Chen, H. Yin, Q. V. Hung Nguyen, W. Peng, X. Li and X. Zhou	Hence, in this paper, we propose a novel Sequence-Aware Factorization Machine (SeqFM) for temporal predictive analytics, which models feature interactions by fully investigating the effect of sequential dependencies.
119	Stochastic Origin-Destination Matrix Forecasting Using Dual-Stage Graph Convolutional, Recurrent Neural Networks	J. Hu, B. Yang, C. Guo, C. S. Jensen and H. Xiong	To solve this problem, we propose a generic learning framework that (i) employs matrix factorization and graph convolutional neural networks to contend with the data sparseness while capturing spatial correlations and that (ii) captures spatio-temporal dynamics via recurrent neural networks extended with graph convolutions.
120	Query Results over Ongoing Databases that Remain Valid as Time Passes By	Y. Mülle and M. H. Böhlen	We propose a solution that keeps ongoing time points uninstantiated during query processing.
121	Indoor Mobility Semantics Annotation Using Coupled Conditional Markov Networks	H. Li, H. Lu, M. A. Cheema, L. Shou and G. Chen	This work studies the annotation of indoor mobility semantics that describe an object’s mobility event (what) at a semantic indoor region (where) during a time period (when).
122	Towards Factorized SVM with Gaussian Kernels over Normalized Data	K. Yang, Y. Gao, L. Liang, B. Yao, S. Wen and G. Chen	In this paper, we focus on the factorized SVM with Gaussian kernels over normalized data.
123	FESIA: A Fast and SIMD-Efficient Set Intersection Approach on Modern CPUs	J. Zhang, Y. Lu, D. G. Spampinato and F. Franchetti	In this paper, we present FESIA, a new set intersection approach on modern CPUs.
124	Low-Latency Communication for Fast DBMS Using RDMA and Shared Memory	P. Fent, A. v. Renen, A. Kipf, V. Leis, T. Neumann and A. Kemper	We propose L5, a high-performance communication layer for database systems.
125	ML-based Cross-Platform Query Optimization	Z. Kaoudi, J. Quiané-Ruiz, B. Contreras-Rojas, R. Pardo-Meza, A. Troudi and S. Chawla	We overcome these challenges in Robopt, a novel vector-based optimizer we have built for Rheem, a cross-platform system.
126	Automatic View Generation with Deep Learning and Reinforcement Learning	H. Yuan, G. Li, L. Feng, J. Sun and Y. Han	To address this problem, we propose an automatic view generation method which judiciously selects “highly beneficial” subqueries to generate materialized views.
127	ColumnSGD: A Column-oriented Framework for Distributed Stochastic Gradient Descent	Z. Zhang, W. Wu, J. Jiang, L. Yu, B. Cui and C. Zhang	Following this locality property, we develop a simple yet powerful computation framework that significantly reduces communication overheads and memory footprints compared to RowSGD, for large-scale ML models such as generalized linear models (GLMs) and factorization machines (FMs).
128	In-database connected component analysis	H. Bögeholz, M. Brand and R. Todor	We describe a Big Data-practical, SQL-implementable algorithm for efficiently determining connected components for graph data stored in a Massively Parallel Processing (MPP) relational database.
129	Towards Concurrent Stateful Stream Processing on Multicore Processors	S. Zhang, Y. Wu, F. Zhang and B. He	This paper introduces TStream, a novel DSPS supporting efficient concurrent state access on multicore processors.
130	PSGraph: How Tencent trains extremely large-scale graphs with Spark?	J. Jiang et al.	To address these challenges, we develop a new graph processing system, called PSGraph, which uses Spark executor and PyTorch to perform calculation, and develops a distributed parameter server to store frequently accessed models.
131	JUST: JD Urban Spatio-Temporal Data Engine	R. Li et al.	This paper presents JUST, i.e., JD Urban Spatio-Temporal data engine, which can efficiently manage big spatio-temporal data in a convenient way.
132	Oracle Database In-Memory on Active Data Guard: Real-time Analytics on a Standby Database	S. Pendse et al.	In this paper, we explore and address the key challenges involved in building the DBIM-on-ADG infrastructure, including synchronized maintenance of the In-Memory Column Store on the Standby database, with high-speed OLTP activity continuously modifying data on the Primary database.
133	Data Sentinel: A Declarative Production-Scale Data Validation Platform	A. Swami, S. Vasudevan and J. Huyn	The contributions of this paper include the following: 1) Data Sentinel, a declarative production-scale data validation platform successfully deployed at LinkedIn 2) A generic design to build and deploy similar systems for production environments 3) Experiences and lessons learned that can benefit practitioners with similar objectives.
134	Turbine: Facebook’s Service Management Platform for Stream Processing	Y. Mei et al.	This paper describes the Turbine architecture, discusses the design choices behind it, and shares several case studies demonstrating Turbine capabilities in production.
135	Speed Kit: A Polyglot & GDPR-Compliant Approach For Caching Personalized Content	W. Wingerath et al.	In this paper, we present Speed Kit as a radically different approach for content distribution that combines (1) a polyglot architecture for efficiently caching personalized content with (2) a natively GDPR-compliant client proxy that handles all sensitive information within the user device.
136	De-Health: All Your Online Health Information Are Belong to Us	S. Ji et al.	In this paper, we study the privacy of online health data.
137	Maxson: Reduce Duplicate Parsing Overhead on Raw Data	X. Shi et al.	In this paper, we start with a study with a real production workload in Alibaba, which consists of over 3 million queries on JSON.
138	Automatic Calibration of Road Intersection Topology using Trajectories	L. Zhao et al.	To address the above challenges, we propose a three-phase calibration framework, called CITT.
139	SAFE: Scalable Automatic Feature Engineering Framework for Industrial Tasks	Q. Shi, Y. Zhang, L. Li, X. Yang, M. Li and J. Zhou	In this paper, we proposed a staged method named SAFE (Scalable Automatic Feature Engineering), which can provide excellent efficiency and scalability, along with requisite interpretability and promising performance.
140	Cross-Graph Convolution Learning for Large-Scale Text-Picture Shopping Guide in E-Commerce Search	T. Zhang et al.	In this work, a new e-commerce search service named text-picture shopping guide (TPSG) is investigated and deployed to one of the most popular shopping platforms called Taobao.
141	Billion-scale Recommendation with Heterogeneous Side Information at Taobao	A. Pfadler, H. Zhao, J. Wang, L. Wang, P. Huang and D. L. Lee	To address these challenges, in this work, we present a flexible and highly scalable Side Information (SI) enhanced Skip-Gram (SISG) framework, which is deployed at Taobao.
142	Hierarchical Bipartite Graph Neural Networks: Towards Large-Scale E-commerce Applications	Z. Li et al.	To address these problems, we propose a novel method with Hierarchical bipartite Graph Neural Network (HiGNN) to handle large-scale e-commerce tasks.
143	LoCEC: Local Community-based Edge Classification in Large Online Social Networks	C. Song et al.	To tackle the challenges, we propose a Local Community-based Edge Classification (LoCEC) framework that classifies user relationships in a social network into real-world social connection types.
144	APTrace: A Responsive System for Agile Enterprise Level Causality Analysis	J. Gui et al.	In this paper, we propose a novel system, APTrace, to meet both of the above requirements.
145	HomoPAI: A Secure Collaborative Machine Learning Platform based on Homomorphic Encryption	Q. Li et al.	We propose HomoPAI, an HE-based secure collaborative machine learning system, enabling a more promising scenario, where data from multiple data owners could be securely processed.
146	ForkBase: Immutable, Tamper-evident Storage Substrate for Branchable Applications	Q. Lin et al.	To this end, we developed ForkBase to make Git for data practical. ForkBase is a distributed, immutable storage system designed for data version management and data collaborative operation.
147	MC-Explorer: Analyzing and Visualizing Motif-Cliques on Large Networks	B. Li et al.	To extract insight from this rich information source, we propose MC-Explorer, which is an advanced analysis and visualization system.
148	JODA: A Vertically Scalable, Lightweight JSON Processor for Big Data Transformations	N. Schäfer and S. Michel	We describe the demonstration of JODA (Json On Demand Analytics), an approach to handling large amounts of JSON documents in a vertically scalable manner.
149	vCBIR: A Verifiable Search Engine for Content-Based Image Retrieval	S. Guo, Y. Ji, C. Zhang, C. Xu and J. Xu	We demonstrate vCBIR, a verifiable search engine for Content-Based Image Retrieval.
150	MusX: Online Exploring and Visualizing Graph-Based Musical Adaptations	F. Lévesque, M. St-Germain, D. Piché, J. Gauvin, M. Gagnon and T. Hurtut	In this paper, we present a detailed description of MusX along with design and technical considerations, and the demonstration scenarios we intend to present to the audience.
151	RIDE: A System for Generalized Region of Interest Discovery and Exploration	Q. Liu, L. Zheng and L. Chen	To address the challenge of conducting ROI queries on the increasingly complex spatial data, we present RIDE, an efficient and effective system for generalized ROI Discovery and Exploration.
152	PocketView: A Concise and Informative Data Summarizer	Y. Xi, N. Wang, S. Hao, W. Yang and L. Li	In this demonstration, we propose a summarizer system called PocketView that is able to create a data summarization through a pocket view of the table.
153	CSQ System: A System to Support Constrained Skyline Queries on Transportation Networks	Q. Gong, J. Liu and H. Cao	In this paper, we present a system to answer MCTN-constrained CSQs, namely CSQ system.
154	SUDAF: Sharing User-Defined Aggregate Functions	C. Zhang, F. Toumani and B. Doreau	We present SUDAF (Sharing User-Defined Aggregate Functions), a declarative framework that allows users to formulate UDAFs as mathematical expressions and use them in SQL statements.
155	Automating Software Citation using GitCite	L. Chen and S. B. Davidson	This paper presents GitCite, a model for software citation with version control which enables citations to be inferred for any project component based on a small number of explicit citations attached to subdirectories/files, and an implementation that integrates with Git and GitHub.
156	SCLPD: Smart Cargo Loading Plan Decision Framework	J. Liu, J. Mao, J. Liao, H. Hu, Y. Guo and A. Zhou	This paper puts forward a system implementation of smart cargo loading plan decision framework (SCLPD for short) for steel logistics industry.
157	DCDT: A Digital Clock Drawing Test System for Cognitive Impairment Screening	F. Xu, Y. Ding, Z. Ling, X. Li, Y. Li and S. Wang	In this paper, we’d like to introduce DCDT, a novel Clock Drawing Test system based on digital collection and intellectualized analysis.
158	Kronos: Lightweight Knowledge-based Event Analysis in Cyber-Physical Data Streams	M. H. Namaki et al.	We demonstrate Kronos, a framework and system that automatically extracts highly dynamic knowledge for complex event analysis in Cyber-Physical systems.
159	DLEEL: Multi-Predicate Spatial Queries on User-generated Streaming Data	A. Almaslukh, L. Abdelhafeez and A. Magdy	This paper demonstrates DLEEL, a research system that supports scalable spatial queries with multiple predicates on user-generated data streams, such as social media streams.
160	Querying Streaming System Monitoring Data for Enterprise System Anomaly Detection	P. Gao et al.	In the demo, we aim to show the complete usage scenario of SAQL by (1) performing an APT attack in a controlled environment, and (2) using SAQL to detect the abnormal behaviors in real time by querying the collected stream of system monitoring data that contains the attack traces.
161	SAD: An Unsupervised System for Subsequence Anomaly Detection	P. Boniol, M. Linardi, F. Roncallo and T. Palpanas	In this demonstration, we present a system for unsupervised Subsequence Anomaly Detection (SAD) that uses the NorM method.
162	Machine Learning Meets Big Spatial Data	I. Sabek and M. F. Mokbel	In this 90-minute tutorial, we comprehensively review the state-of-the-art work in the intersection of machine learning and big spatial data.
163	On the Integration of Machine Learning and Array Databases	S. Villarroya and P. Baumann	This tutorial focuses on the integration of machine learning algorithms and array databases.
164	Visualization Systems for Linked Datasets	M. Krommyda and V. Kantere	We present here a survey on these techniques, their strengths and weaknesses as well as the datasets that they can support.
165	Modern Large-Scale Data Management Systems after 40 Years of Consensus	M. J. Amiri, D. Agrawal and A. E. Abbadi	In this tutorial, we discuss consensus protocols that are used in modern large-scale data management systems, classify them into different categories based on their assumptions on network synchrony, failure model of nodes, etc., and elaborate on their main advantages and limitations.
166	Advances in Cryptography and Secure Hardware for Data Outsourcing	S. Sharma, A. Burtsev and S. Mehrotra	This tutorial focuses on recent advances in secure cloud-based data outsourcing based on cryptographic (encryption, secret-sharing, and multi-party computation (MPC)) and hardware-based approaches.
167	PushdownDB: Accelerating a DBMS Using S3 Computation	X. Yu et al.	This paper studies the effectiveness of pushing parts of DBMS analytics queries into the Simple Storage Service (S3) of Amazon Web Services (AWS), using a recently released capability called S3 Select.
168	Task Deployment Recommendation with Worker Availability	D. Wei, S. B. Roy and S. Amer-Yahia	We propose BatchStrat, an optimization-driven middle layer that recommends deployment strategies to a batch of requests by accounting for worker availability.
169	Towards Extracting Highlights From Recorded Live Videos: An Implicit Crowdsourcing Approach	R. Jiang, C. Qu, J. Wang, C. Wang and Y. Zheng	In this paper, we propose LIGHTOR, a novel implicit crowd-sourcing approach to overcome these limitations.
170	Crowdsourcing-based Data Extraction from Visualization Charts	C. Chai, G. Li, J. Fan and Y. Luo	In this paper, we introduce a crowdsourcing approach that leverages human ability to extract data from visualization charts.
171	Predicting Origin-Destination Flow via Multi-Perspective Graph Convolutional Network	H. Shi et al.	We propose Multi-Perspective Graph Convolutional Networks (MPGCN) to capture the complex dependencies.
172	Learning to Simulate Vehicle Trajectories from Demonstrations	G. Zheng, H. Liu, K. Xu and Z. Li	Considering the complexity and non-linearity of the real-world traffic, this paper unprecedentedly treats the problem of traffic simulation as a learning problem, and proposes learning to simulate (L2S) vehicle trajectory.
173	FakeDetector: Effective Fake News Detection with Deep Diffusive Neural Network	J. Zhang, B. Dong and P. S. Yu	This paper introduces a novel gated graph neural network, namely FAKEDETECTOR.
174	Mining Verb-Oriented Commonsense Knowledge	J. Liu et al.	In this paper, we focus on the automatic acquisition of a typical kind of implicit verb-oriented commonsense knowledge (e.g., “person eats food”), which is the concept level knowledge of verb phrases.
175	Automated Anomaly Detection in Large Sequences	P. Boniol, M. Linardi, F. Roncallo and T. Palpanas	In this work, we address these problems, and propose NorM, a novel approach, suitable for domain-agnostic anomaly detection.
176	SLED: Semi-supervised Locally-weighted Ensemble Detector	S. Zhang, D. T. Jung Huang, G. Dobbie and Y. S. Koh	In this research, we propose a semi-supervised locally-weighted ensemble detector (SLED), where the relative performance among its base detectors is characterized by a set of weights learned in a semi-supervised manner.
177	Hierarchical Quick Shift Guided Recurrent Clustering	M. C. Altinigneli, L. Miklautz, C. Böhm and C. Plant	We propose a novel density-based mode-seeking Hierarchical Quick Shift clustering algorithm with an optional Recurrent Neural Network (RNN) to jointly learn the cluster assignments for every sample and the underlying dynamics of the mode-seeking clustering process.
178	Matrix Profile XVII: Indexing the Matrix Profile to Allow Arbitrary Range Queries	Y. Zhu, C. M. Yeh, Z. Zimmerman and E. Keogh	In this work we introduce a novel indexing framework that allows queries about arbitrary ranges to be answered in quasilinear time, allowing such queries to be interactive for the first time.
179	D-Tucker: Fast and Memory-Efficient Tucker Decomposition for Dense Tensors	J. Jang and U. Kang	In this paper, we propose D-Tucker, a fast and memory-efficient method for Tucker decomposition on large dense tensors.
180	A Unified Framework for Multi-view Spectral Clustering	G. Zhong and C. Pun	To address these, we design a unified multi-view spectral clustering scheme to learn the discrete cluster indicator matrix in one stage.
181	Comprehensive and Efficient Data Labeling via Adaptive Model Scheduling	M. Yuan, L. Zhang, X. Li and H. Xiong	In this paper, we present an Adaptive Model Scheduling framework, consisting of 1) a deep reinforcement learning-based approach to predict the value of unexecuted models by mining semantic relationship among diverse models, and 2) two heuristic algorithms to adaptively schedule models under deadline or deadline-memory constraints.
182	Target Privacy Preserving for Social Networks	Z. Jiang, L. Sun, P. S. Yu, H. Li, J. Ma and Y. Shen	In this paper, we incorporate the realistic scenario of key protection into link privacy preserving and propose the target-link privacy preserving (TPP) model: target links, referred to as targets, are the most important and sensitive objectives that would be intentionally attacked by adversaries and thus need privacy protection, while other links of less privacy concern are properly released to maintain the graph utility.
183	Traffic Incident Detection: A Trajectory-based Approach	X. Han, T. Grubenmann, R. Cheng, S. C. Wong, X. Li and W. Sun	In this paper, we ask the question: Can ID be performed on sparse traffic data (e.g., location data obtained from GPS devices equipped on vehicles)? As these data may not be enough to describe the state of the roads involved, they can undermine the effectiveness of existing ID solutions.
184	Collective Entity Alignment via Adaptive Features	W. Zeng, X. Zhao, J. Tang and X. Lin	To fill this gap, we propose a collective EA framework.
185	InvaliDB: Scalable Push-Based Real-Time Queries on Top of Pull-Based Databases	W. Wingerath, F. Gessert and N. Ritter	To address these issues, we propose the system design InvaliDB which combines linear read and write scalability for real-time queries with superior query expressiveness and legacy compatibility.
186	Discovering Band Order Dependencies	P. Li, J. Szlichta, M. Bohlen and D. Srivastava	We introduce band ODs to model the semantics of attributes that are monotonically related with small variations without there being an intrinsic violation of semantics.
187	The Pastwatch: On the usability of provenance data in relational databases	O. AlOmeir, E. Y. Lai, M. Milani and R. Pottinger	We present a set of criteria that any provenance exploration tool must have and introduce Pastwatch, a provenance exploration system that adheres to those criteria.
188	Query-driven Repair of Functional Dependency Violations	S. Giannakopoulou, M. Karpathiotakis and A. Ailamaki	We propose an approach that performs probabilistic repair of functional dependency violations on-demand, driven by the exploratory analysis that users perform.
189	Outdated Fact Detection in Knowledge Bases	S. Hao, C. Chai, G. Li, N. Tang, N. Wang and X. Yu	In this paper, we propose a novel human-in-the-loop approach for outdated fact detection in KBs.
190	Preserving Contextual Information in Relational Matrix Operations	O. Dolmatova, N. Augsten and M. H. Böhlen	We propose relational matrix operations that support the analysis of data stored in tables and that preserve contextual information.
191	Stay Ahead of Poachers: Illegal Wildlife Poaching Prediction and Patrol Planning Under Uncertainty with Field Test Evaluations (Short Version)	L. Xu et al.	In this paper, we take an end-to-end approach to the data-to-deployment pipeline for anti-poaching.
192	Telescope: An Automatic Feature Extraction and Transformation Approach for Time Series Forecasting on a Level-Playing Field	A. Bauer, M. Züfle, N. Herbst, S. Kounev and V. Curtef	In this paper, we propose a fully automated machine learning-based forecasting approach.
193	Auto-Model: Utilizing Research Papers and HPO Techniques to Deal with the CASH problem	C. Wang, H. Wang, T. Mu, J. Li and H. Gao	In this paper, we design the Auto-Model approach, which makes full use of known information in the related research paper and introduces hyperparameter optimization techniques, to solve the CASH problem effectively.
194	Toward Sampling for Deep Learning Model Diagnosis	P. Mehta, S. Portillo, M. Balazinska and A. Connolly	In this paper, we develop a novel data sampling technique that produces approximate but accurate results for these model debugging queries.
195	Approximate Quantiles for Datacenter Telemetry Monitoring	G. Lim, M. S. Hassan, Z. Jin, S. Volos and M. Jeon	To address these challenges, we propose AOMG, an efficient and accurate quantile approximation algorithm that capitalizes on insights from our workload study.
196	An Intersectional Definition of Fairness	J. R. Foulds, R. Islam, K. N. Keya and S. Pan	We propose differential fairness, a multi-attribute definition of fairness in machine learning which is informed by intersectionality, a critical lens arising from the humanities literature, leveraging connections between differential privacy and legal notions of fairness.
197	Towards Locally Differentially Private Generic Graph Metric Estimation	Q. Ye, H. Hu, M. H. Au, X. Meng and X. Xiao	In this paper, we address these two issues by presenting LF-GDPR, the first LDP-enabled graph metric estimation framework for graph analysis.
198	BFT-Store: Storage Partition for Permissioned Blockchain via Erasure Coding	X. Qi, Z. Zhang, C. Jin and A. Zhou	This paper proposes a novel storage engine, called BFT-Store, to enhance storage scalability by integrating erasure coding with Byzantine Fault Tolerance (BFT) consensus protocol.
199	Reasoning about the Future in Blockchain Databases	S. Cohen, A. Rosenthal and A. Zohar	The main issue that we tackle is the need to reason about possible worlds, due to the uncertainty in transaction appending.
200	Deciding When to Trade Data Freshness for Performance in MongoDB-as-a-Service	C. Huang, M. Cahill, A. Fekete and U. Rohm	It should be better to choose the appropriate Read Preference setting at runtime, as we describe in this paper. We show how a system can detect when the primary copy is saturated in MongoDB-as-a-Service, and use this to choose where reads should be done to improve overall performance.
201	PrefixFPM: A Parallel Framework for General-Purpose Frequent Pattern Mining	D. Yan, W. Qu, G. Guo and X. Wang	This paper presents PrefixFPM, a general-purpose framework for FPM that is able to fully utilize the CPU cores in a multicore machine.
202	HBP: Hotness Balanced Partition for Prioritized Iterative Graph Computations	S. Gong, Y. Zhang and G. Yu	To accelerate prioritized iterative graph computations, we propose Hotness Balanced Partition (HBP) and a stream-based partition algorithm Pb-HBP.
203	Computing Mutual Information of Big Categorical Data and Its Application to Feature Grouping	J. Li, C. Zhang, J. Zhang and X. Qin	This paper develops a parallel computing system – MiCS – for mutual information of big categorical data on the Spark computing platform.
204	StructSim: Querying Structural Node Similarity at Billion Scale	X. Chen, L. Lai, L. Qin and X. Lin	In this paper, we propose a new framework StructSim to compute nodes’ role similarity.
205	HowSim: A General and Effective Similarity Measure on Heterogeneous Information Networks	Y. Wang et al.	To address this problem, we extend SimRank, a well-known similarity measure for homogeneous graphs, to HINs, by introducing the concept of decay graph.
206	GraphAE: Adaptive Embedding across Graphs	B. Yan and C. Wang	In this paper, we present an interesting graph embedding problem called Adaptive Task (AT), and propose a unified framework for this adaptive task, which introduces two types of alignment to learn adaptive node embedding across graphs.
207	FlashSchema: Achieving High Quality XML Schemas with Powerful Inference Algorithms and Large-scale Schema Data	Y. Li, J. Cao, H. Chen, T. Ge, Z. Xu and Q. Peng	In this paper, we propose a tool FlashSchema for high quality XML schema design, which supports both one-pass and interactive schema design and schema recommendation.
208	Efficient Structural Clustering in Large Uncertain Graphs	Y. Liang, T. Hu and P. Zhao	In this paper, we develop a new, decomposition-based method, ProbSCAN, for efficient reliable structural similarity computation with theoretically improved complexity.
209	Efficient Weighted Independent Set Computation over Large Graphs	W. Zheng, J. Gu, P. Peng and J. X. Yu	Following the reduction-and-branching strategy, we propose an exact algorithm to compute the maximum weighted independent set.
210	Keys as Features for Graph Entity Matching	T. Deng, L. Hou and Z. Han	We treat entity matching as a classification problem, and propose GMKSLEM, a supervised learning method for graph entity matching.
211	Online Pricing with Reserve Price Constraint for Personal Data Markets	C. Niu, Z. Zheng, F. Wu, S. Tang and G. Chen	In this paper, we study how the data broker can maximize her cumulative revenue by posting reasonable prices for sequential queries.
212	Permutation Index: Exploiting Data Skew for Improved Query Performance	W. Zhang and K. A. Ross	In this paper, we propose a novel index structure for repositioning data items to concentrate popular items into the same cache lines, resulting in better spatial locality, and better utilization of limited cache resources.
213	Efficient Locality-Sensitive Hashing Over High-Dimensional Data Streams	C. Yang, D. Deng, S. Shang and L. Shao	In this paper, we present PDA-LSH, a novel and practical disk-based LSH index that can offer efficient support for both updates and searches.
214	A Class of R*-tree Indexes for Spatial-Visual Search of Geo-tagged Street Images	A. Alfarrarjeh et al.	Therefore, we propose a class of R-tree indexes, particularly, by associating each node with two separate minimum bounding rectangles (MBR), one for spatial and the other for (dimension-reduced) visual properties of their contained images, and adapting the R-tree optimization criteria to both property types.
215	Graph Embeddings for One-pass Processing of Heterogeneous Queries	C. T. Duong et al.	Specifically, we propose graph-based models in which both data and queries incorporate information of different modalities.
216	Fast Error-tolerant Location-aware Query Autocompletion	J. Wang and C. Lin	In this paper, we propose a novel framework AutoEL to support error-tolerant location-aware query autocompletion.
217	TrajMesa: A Distributed NoSQL Storage Engine for Big Trajectory Data	R. Li et al.	This paper proposes a holistic distributed NoSQL trajectory storage engine, TrajMesa, based on GeoMesa, an open-source indexing toolkit for spatio-temporal data.
218	Learning to Rank Paths in Spatial Networks	S. B. Yang and B. Yang	We present PathRank, a data-driven framework for ranking paths based on historical trajectories.
219	A Hybrid Learning Approach to Stochastic Routing	S. A. Pedersen, B. Yang and C. S. Jensen	We propose a hybrid approach that combines convolution and machine learning-based estimation to take into account dependencies among distributions in order to improve accuracy.
220	Shortest Path Queries for Indoor Venues with Temporal Variations	T. Liu et al.	In this paper, we define a new type of query called Indoor Temporal-variation aware Shortest Path Query (ITSPQ).
221	DAG: A General Model for Privacy-Preserving Data Mining : (Extended Abstract)	S. G. Teo, J. Cao and V. C. S. Lee	To address this issue, we propose a privacy model DAG (Directed Acyclic Graph) that consists of a set of fundamental secure operators (e.g., +, -, ×, /, and power).
222	TIDY: Publishing a Time Interval Dataset with Differential Privacy (Extended abstract)	W. Jung, S. Kwon and K. Shim	We propose the TIDY (publishing Time Intervals via Differential privacY) algorithm to release time interval data under differential privacy.
223	The Power of Bounds: Answering Approximate Earth Mover’s Distance with Parametric Bounds (Extended abstract)	T. N. Chan, M. Lung Yiu and L. H. U	In this work, we study how to compute approximate EMD value with bounded error, using these bound functions.
224	On Nearby-Fit Spatial Keyword Queries (Extended Abstract)	V. J. Wei, R. Chi-Wing Wong, C. Long and P. Hui	In this paper, we propose a new type of query called nearby-fit spatial keyword query (NSKQ), where an optimal object is defined based not only on the location and the keywords of the object itself, but also on those of the objects nearby.
225	ChronoGraph: Enabling temporal graph traversals for efficient information diffusion analysis over time	J. Byun, S. Woo and D. Kim	ChronoGraph: Enabling temporal graph traversals for efficient information diffusion analysis over time
226	Efficient Distance Sensitivity Oracles for Real-World Graph Data	J. Lee and C. Chung	Motivated by this, we develop two practical distance sensitivity oracles for directed graphs as variants of Transit Node Routing, and effective speed-up techniques with a slight loss of accuracy.
227	Demythization of Structural XML Query Processing: Comparison of Holistic and Binary Approaches (Extended Abstract)	P. Luk?, R. Baca, M. Kr?tk? and T. Wang Ling	However, a thorough analytical and experimental comparison of binary and holistic joins has been missing despite an enormous research effort in this area. In this paper, we try to fill this gap.
228	Answering Skyline Queries over Incomplete Data with Crowdsourcing (Extended Abstract)	X. Miao, Y. Gao, S. Guo, L. Chen, J. Yin and Q. Li	In this paper, we study the problem of skyline queries over incomplete data with crowdsourcing.
229	HisRect: Features from Historical Visits and Recent Tweet for Co-Location Judgement	P. Li, H. Lu, Q. Zheng, S. Li and G. Pan	This study explores the problem of co-location judgement, i.e., to decide whether two Twitter users are co-located at some point-of-interest (POI).
230	K-SPIN: Efficiently Processing Spatial Keyword Queries on Road Networks (Extended Abstract)	T. Abeywickrama, M. A. Cheema and A. Khan	Instead, we propose the K-SPIN framework, which uses an alternative keyword separation strategy that is more suitable for road networks.
231	ESPM: Efficient Spatial Pattern Matching (Extended Abstract)	H. Chen, Y. Fang, Y. Zhang, W. Zhang and L. Wang	To enhance the performance of SPM, in this paper we propose a novel Efficient Spatial Pattern Matching (ESPM) algorithm, which exploits the inverted linear quadtree index and computes matched node pairs and object pairs level by level in a top-down manner.
232	A Transformation-based Framework for KNN Set Similarity Search (Extended Abstract)	Y. Zhang, J. Wu, J. Wang and C. Xing	In this paper, we use the widely applied Jaccard similarity to quantify the similarity between two sets, but our proposed techniques can be easily extended to other set-based similarity functions.
233	Matrix Factorization with Interval-Valued Data	M. Li, F. Di Mauro, K. Selçuk Candan and M. L. Sapino	In this paper, we propose matrix decomposition techniques that consider the existence of interval-valued data.
234	Neighborhood density correlation clustering	Z. Wang and L. Zhong	In this paper, after analyzing the advantages and disadvantages of existing clustering algorithms, we propose a new neighborhood density correlation clustering (NDCC) algorithm that quickly discovers arbitrarily shaped clusters and avoids the clustering errors caused by iso-density points between clusters.
235	Spark Performance Optimization Analysis In Memory Management with Deploy Mode In Standalone Cluster Computing	D. M. Adinew, Z. Shijie and Y. Liao	Our primary goal is to investigate how performance improves in relation to Spark executor memory, the number of executors, the number of cores, and the deploy-mode parameter configuration in a standalone cluster model.
236	Design of Database Systems with DRAM-only Heterogeneous Memory Architecture	Y. Qiao	This thesis aims to develop techniques that enable relational and NoSQL databases to take full advantage of the envisioned low-cost heterogeneous DRAM system.
237	Data Series Indexing Gone Parallel	B. Peng	In this Ph.D. work, we present the first data series indexing solutions, for both on-disk and in-memory data, that are designed to inherently take advantage of multi-core architectures, in order to accelerate similarity search processing times.
238	Area Queries Based on Voronoi Diagrams	Y. Li and G. Liu	In view of this issue, we propose a method of iteratively generating candidates based on Voronoi diagrams and apply it to area queries.
239	Picube for Fast Exploration of Large Datasets	W. Fu	Inspired by this, we propose a partitioned, inductively aggregated data-cube, picube.