Paper Digest: ICDE 2020 Highlights
The IEEE International Conference on Data Engineering (ICDE) addresses research issues in designing, building, managing, and evaluating advanced data-intensive systems and applications. In 2020, it is to be held virtually due to the COVID-19 pandemic.
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.
If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.
Paper Digest Team
team@paperdigest.org
TABLE 1: ICDE 2020 Papers
No. | Title | Authors | Highlight |
---|---|---|---|
1 | Real-Time Cross Online Matching in Spatial Crowdsourcing | Y. Cheng, B. Li, X. Zhou, Y. Yuan, G. Wang and L. Chen | In this paper, we propose a Cross Online Matching (COM), which enables a platform to “borrow” unoccupied crowd workers from other platforms for completing the user requests. |
2 | Predictive Task Assignment in Spatial Crowdsourcing: A Data-driven Approach | Y. Zhao, K. Zheng, Y. Cui, H. Su, F. Zhu and X. Zhou | We propose a two-phase data-driven framework. |
3 | Curiosity-Driven Energy-Efficient Worker Scheduling in Vehicular Crowdsourcing: A Deep Reinforcement Learning Approach | C. H. Liu et al. | In this paper, we explicitly consider the use of unmanned vehicular workers, e.g., drones and driverless cars, which are more controllable and can be deployed in remote or dangerous areas to carry out long-term and harsh tasks as a vehicular crowdsourcing (VC) campaign. |
4 | Crowdsourced Collective Entity Resolution with Relational Match Propagation | J. Huang, W. Hu, Z. Bao and Y. Qu | In this paper, we propose a novel approach called crowdsourced collective ER, which leverages the relationships between entities to infer matches jointly rather than independently. |
5 | An End-to-End Deep RL Framework for Task Arrangement in Crowdsourcing Platforms | C. Shan, N. Mamoulis, R. Cheng, G. Li, X. Li and Y. Qian | In this paper, we propose a Deep Reinforcement Learning (RL) framework for task arrangement, which is a critical problem for the success of crowdsourcing platforms. |
6 | Efficient Bidirectional Order Dependency Discovery | Y. Jin, L. Zhu and Z. Tan | In this paper, we adopt a strategy that decouples the impact of m from that of n, and that still finds all minimal valid bidirectional order dependencies. |
7 | Efficient Diversity-Driven Ensemble for Deep Neural Networks | W. Zhang, J. Jiang, Y. Shao and B. Cui | As the effect of ensemble learning is more pronounced if ensemble members are accurate and diverse, we propose a method named Efficient Diversity-Driven Ensemble (EDDE) to address both the diversity and the efficiency of an ensemble. |
8 | Adaptive Network Alignment with Unsupervised and Multi-order Convolutional Networks | H. T. Trung, T. Van Vinh, N. T. Tam, H. Yin, M. Weidlich and N. Q. Viet Hung | In this paper, we propose a fully unsupervised network alignment framework based on a multi-order embedding model. |
9 | A Natural Language Interface for Database: Achieving Transfer-learnability Using Adversarial Method for Question Understanding | W. Wang, Y. Tian, H. Wang and W. Ku | In this work, we show that it is possible to separate data specific components from latent semantic structures in expressing relational queries in a natural language. |
10 | Array-based Data Management for Genomics | O. Horlova, A. Kaitoua and S. Ceri | Specifically, we define a wide spectrum of operations over datasets which are represented using arrays, and we show that the array-based implementation scales well upon Spark, also thanks to a data representation which is effectively used for supporting machine learning. |
11 | Group Recommendation with Latent Voting Mechanism | L. Guo, H. Yin, Q. Wang, B. Cui, Z. Huang and L. Cui | Instead of exploring new heuristic or vanilla attention-based mechanism, we propose a new social self-attention based aggregation strategy by directly modeling the interactions among group members, namely Group Self-Attention (GroupSA). |
12 | Price-aware Recommendation with Graph Convolutional Networks | Y. Zheng, C. Gao, X. He, Y. Li and D. Jin | Towards the first difficulty, we propose to model the transitive relationship between user-to-item and item-to-price, taking the inspiration from the recently developed Graph Convolution Networks (GCN). |
13 | Syndrome-aware Herb Recommendation with Multi-Graph Convolution Network | Y. Jin, W. Zhang, X. He, X. Wang and X. Wang | Specifically, given a set of symptoms to treat, we aim to generate an overall syndrome representation by effectively fusing the embeddings of all the symptoms in the set, so as to mimic how a doctor induces the syndromes. |
14 | PoisonRec: An Adaptive Data Poisoning Framework for Attacking Black-box Recommender Systems | J. Song et al. | In this paper, we propose an adaptive data poisoning framework, PoisonRec, which can automatically learn effective attack strategies on various recommender systems with very limited knowledge. |
15 | Toward Recommendation for Upskilling: Modeling Skill Improvement and Item Difficulty in Action Sequences | K. Umemoto, T. Milo and M. Kitsuregawa | We propose a progression model that uses latent variables to learn the monotonically non-decreasing progression of user skills. |
16 | Exploring Finer Granularity within the Cores: Efficient (k,p)-Core Computation | C. Zhang et al. | In this paper, we propose and study a novel cohesive subgraph model, named (k,p)-core, which is a maximal subgraph where each vertex has at least k neighbours and at least p fraction of its neighbours in the subgraph. |
17 | Kaskade: Graph Views for Efficient Graph Analytics | J. M. F. da Trindade, K. Karanasos, C. Curino, S. Madden and J. Shun | In this work, we leverage structural properties of graphs and queries to automatically derive materialized graph views that can dramatically speed up query evaluation. |
18 | Efficient Top-k Edge Structural Diversity Search | Q. Zhang, R. Li, Q. Yang, G. Wang and L. Qin | In this work, we perform the first systematic study of the top-k edge structural diversity search problem on large graphs. |
19 | Adaptive Relation Discovery from Focusing Seeds on Large Networks | Z. Wang, C. Wang, W. Wang, X. Gu, B. Li and D. Meng | To support such applications, a new problem called adaptive relation discovery from focusing seeds (A-RDFS) is proposed and studied in this article. |
20 | Repairing Entities using Star Constraints in Multirelational Graphs | P. Lin, Q. Song, Y. Wu and J. Pi | (1) We propose a class of constraints called star functional dependencies (StarFDs). |
21 | Practical Anonymous Subscription with Revocation Based on Broadcast Encryption | X. Yi, R. Paulet, E. Bertino and F. Rao | In this paper we consider the problem where a client wishes to subscribe to some product or service provided by a server, but maintain their anonymity. |
22 | SVkNN: Efficient Secure and Verifiable k-Nearest Neighbor Query on the Cloud Platform | N. Cui, X. Yang, B. Wang, J. Li and G. Wang | In this paper, we study the problem of secure and verifiable k nearest neighbor query (SVkNN). |
23 | An Anomaly Detection System for the Protection of Relational Database Systems against Data Leakage by Application Programs | D. Fadolalkarim, E. Bertino and A. Sallam | In this paper, we propose AD-PROM, an Anomaly Detection system that aims at protecting relational database systems against malicious/compromised applications PROgraMs aiming at stealing data. |
24 | SFour: A Protocol for Cryptographically Secure Record Linkage at Scale | B. Khurram and F. Kerschbaum | To this end, we propose the first known efficient PRL protocol that runs in subquadratic time, provides high accuracy, and guarantees cryptographic security in the semi-honest security model. |
25 | I/O Efficient Approximate Nearest Neighbour Search based on Learned Functions | M. Li, Y. Zhang, Y. Sun, W. Wang, I. W. Tsang and X. Lin | In this paper, we introduce a novel data-sensitive indexing and query processing framework for ANNS with an emphasis on optimizing the I/O efficiency, especially, the sequential I/Os. |
26 | Efficient Query Processing with Optimistically Compressed Hash Tables & Strings in the USSR | T. Gubner, V. Leis and P. Boncz | In this work, we propose three complementary techniques to improve this representation: Domain-Guided Prefix Suppression bit-packs keys and values tightly to reduce hash table record width. |
27 | UniKV: Toward High-Performance and Scalable KV Storage in Mixed Workloads via Unified Indexing | Q. Zhang, Y. Li, P. P. C. Lee, Y. Xu, Q. Cui and L. Tang | We design UniKV, which unifies the key design ideas of hash indexing and the LSM-tree in a single system. |
28 | Improved Correlated Sampling for Join Size Estimation | T. Wang and C. Chan | Based on this framework, we propose a new correlated sampling based technique to address the limitations of existing techniques. |
29 | MESSI: In-Memory Data Series Indexing | B. Peng, P. Fatourou and T. Palpanas | In this work, we propose MESSI, the first data series index designed for in-memory operation on modern hardware. |
30 | Spatial Transition Learning on Road Networks with Deep Probabilistic Models | X. Li, G. Cong and Y. Cheng | In this paper, we study the problem of predicting the most likely traveling route on the road network between two given locations by considering the real-time traffic. |
31 | Active Model Selection for Positive Unlabeled Time Series Classification | S. Liang, Y. Zhang and J. Ma | Focusing on the widely adopted self-training one-nearest-neighbor (ST-1NN) paradigm, we propose a model selection framework based on active learning (AL). |
32 | Neighbor Profile: Bagging Nearest Neighbors for Unsupervised Time Series Mining | Y. He, X. Chu and Y. Wang | Thereafter, we point out the inherent three issues: low-quality density estimation, gravity defiant behavior, and lack of reusable model, which deteriorate the performance of matrix profile in both efficiency and subsequence quality. To overcome these issues, we propose Neighbor Profile to robustly model the subsequence density by bagging nearest neighbors for the discovery of frequent/rare subsequences. |
33 | Massively-Parallel Change Detection for Satellite Time Series Data with Missing Values | F. Gieseke, S. Rosca, T. Henriksen, J. Verbesselt and C. E. Oancea | In this work, we propose a novel massively-parallel implementation for a state-of-the-art change detection method and demonstrate its potential in the context of monitoring deforestation. |
34 | Skyline Cohesive Group Queries in Large Road-social Networks | Q. Li, Y. Zhu and J. X. Yu | In this paper, we take a new approach to consider the constraints equally and study a skyline query. |
35 | Anchored Vertex Exploration for Community Engagement in Social Networks | T. Cai, J. Li, N. A. Hasan Haldar, A. Mian, J. Yearwood and T. Sellis | Given a set of keywords W, a structure cohesive parameter k, and a budget parameter l, our objective is to find l number of users who can induce a maximal expanded community. |
36 | Optimizing Knowledge Graphs through Voting-based User Feedback | R. Yang, X. Lin, J. Xu, Y. Yang and L. He | To address these issues, in this paper, we propose an interactive framework that refines and optimizes knowledge graphs through user votes. |
37 | AutoSF: Searching Scoring Functions for Knowledge Graph Embedding | Y. Zhang, Q. Yao, W. Dai and L. Chen | In this paper, inspired by the recent success of automated machine learning (AutoML), we propose to automatically design SFs (AutoSF) for distinct KGs by the AutoML techniques. |
38 | Semantic Guided and Response Times Bounded Top-k Similarity Search over Knowledge Graphs | Y. Wang, A. Khan, T. Wu, J. Jin and H. Yan | In this paper, we propose a semantic-guided and response-time-bounded graph query to return the top-k answers effectively and efficiently. |
39 | PPKWS: An Efficient Framework for Keyword Search on Public-Private Networks | J. Jiang, X. Huang, B. Choi, J. Xu, S. S. Bhowmick and L. Xu | In this paper, we propose a new keyword search framework, called public-private keyword search (PPKWS). |
40 | Privacy-preserving Real-time Anomaly Detection Using Edge Computing | S. Mehnaz and E. Bertino | We propose a privacy-preserving framework that enables efficient anomaly detection on encrypted data by leveraging a lightweight and aggregation optimized encryption scheme to encrypt the data before off-loading the data to the edge. |
41 | To Warn or Not to Warn: Online Signaling in Audit Games | C. Yan, H. Xu, Y. Vorobeychik, B. Li, D. Fabbri and B. A. Malin | In this paper, we formalize this auditing problem as a Signaling Audit Game (SAG), in which we model the interactions between an auditor and an attacker in the context of signaling and the usability cost is represented as a factor of the auditor's payoff. |
42 | One-sided Differential Privacy | I. Kotsogiannis, S. Doudalis, S. Haney, A. Machanavajjhala and S. Mehrotra | In this work we introduce one-sided differential privacy (OSDP) that offers provable privacy guarantees to the sensitive records. |
43 | Providing Input-Discriminative Protection for Local Differential Privacy | X. Gu, M. Li, L. Xiong and Y. Cao | In this paper, we tackle the challenge of providing input-discriminative protection to reflect the distinct privacy requirements of different inputs. |
44 | Differentially Private Online Task Assignment in Spatial Crowdsourcing: A Tree-based Approach | Q. Tao, Y. Tong, Z. Zhou, Y. Shi, L. Chen and K. Xu | In this paper, we investigate privacy protection for online task assignment with the objective of minimizing the total distance, an important task assignment formulation in spatial crowdsourcing. |
45 | ChainLink: Indexing Big Time Series Data For Long Subsequence Matching | N. Alghamdi, L. Zhang, H. Zhang, E. A. Rundensteiner and M. Y. Eltabakh | In this work, we propose a lightweight distributed indexing framework, called ChainLink, that supports approximate kNN queries over TB-scale time series data. |
46 | Random Sampling for Group-By Queries | T. D. Nguyen, M. Shih, S. S. Parvathaneni, B. Xu, D. Srivastava and S. Tirthapura | We present CVOPT, a query- and data-driven sampling framework for a set of group-by queries. |
47 | PA-Tree: Polled-Mode Asynchronous B+ Tree for NVMe | L. Wang, Z. Zhang, B. He and Z. Zhang | To tackle this problem, we propose PA-Tree, an NVMe-friendly B+ Tree with a novel, polled-mode, asynchronous execution paradigm to process multiple index operations in an interleaved and asynchronous manner. |
48 | Distributed Streaming Set Similarity Join | J. Yang, W. Zhang, X. Wang, Y. Zhang and X. Lin | In this paper, we study the problem of efficient stream set similarity join over distributed systems, which has broad applications in data cleaning and data integration tasks, such as on-line near-duplicate detection. |
49 | Cool, a COhort OnLine analytical processing system | Z. Xie, H. Ying, C. Yue, M. Zhang, G. Chen and B. C. Ooi | In this paper, we present Cool, a cohort online analytical processing system. |
50 | TransN: Heterogeneous Network Representation Learning by Translating Node Embeddings | Z. Li et al. | To address this problem, in this paper, we propose TransN, a novel multi-view network embedding framework for heterogeneous networks. |
51 | An Adaptive Master-Slave Regularized Model for Unexpected Revenue Prediction Enhanced with Alternative Data | J. Xu, J. Zhou, Y. Jia, J. Li and X. Hui | Thus we proposed an adaptive master-slave regularized model, called AMS for short, to effectively leverage alternative data for unexpected revenue prediction. |
52 | Exact and Consistent Interpretation of Piecewise Linear Models Hidden behind APIs: A Closed Form Solution | Z. Cong, L. Chu, L. Wang, X. Hu and J. Pei | In this paper, we propose an elegant closed form solution named OpenAPI to compute exact and consistent interpretations for the family of Piecewise Linear Models (PLM), which includes many popular classification models. |
53 | Statistical Estimation of Diffusion Network Topologies | K. Han, Y. Tian, Y. Zhang, L. Han, H. Huang and Y. Gao | In this work, we investigate the problem of how to infer the topology of a diffusion network from only the final infection statuses of nodes. |
54 | Multiple Dense Subtensor Estimation with High Density Guarantee | Q. Duong, H. Ramampiaro and K. Nørvåg | We address these drawbacks by providing both theoretical and practical solutions for estimating multiple dense subtensors in tensor data. |
55 | Efficient Approximation Algorithms for Adaptive Target Profit Maximization | K. Huang, J. Tang, X. Xiao, A. Sun and A. Lim | To acquire an overall understanding, we study the adaptive TPM problem under both the oracle model and the noise model, and propose ADG and AddATP algorithms to address them with strong theoretical guarantees, respectively. |
56 | Efficient Bitruss Decomposition for Large-scale Bipartite Graphs | K. Wang, X. Lin, L. Qin, W. Zhang and Y. Zhang | In this paper, we study the bitruss decomposition problem which aims to find all the k-bitrusses for k ≥ 0. |
57 | Kaleido: An Efficient Out-of-core Graph Mining System on A Single Machine | C. Zhao, Z. Zhang, P. Xu, T. Zheng and J. Guo | In this paper, we present Kaleido, an efficient single machine, out-of-core graph mining system which treats disks as an extension of memory. |
58 | Finding the Best k in Core Decomposition: A Time and Space Optimal Solution | D. Chu et al. | In this paper, given a graph and a scoring metric, we aim to efficiently find the best value of k such that the score of the k-core (or k-core set) is the highest. |
59 | Updates-Aware Graph Pattern based Node Matching | G. Sun, G. Liu, Y. Wang and X. Zhou | In this paper, we first analyze and detect the elimination relationships between the updates. Then, we construct an Elimination Hierarchy Tree (EH-Tree) to index these elimination relationships. |
60 | Dataset Discovery in Data Lakes | A. Bogatu, A. A. A. Fernandes, N. W. Paton and N. Konstantinou | We refer to this as the problem of dataset discovery in data lakes and this paper contributes an effective and efficient solution to it. |
61 | Swapping Repair for Misplaced Attribute Values | Y. Sun, S. Song, C. Wang and J. Wang | In a holistic view of all (swapped) attributes, we propose to evaluate the likelihood of a swapping repaired tuple by studying its distances (similarity) to neighbors. |
62 | Interactive Cleaning for Progressive Visualization through Composite Questions | Y. Luo, C. Chai, X. Qin, N. Tang and G. Li | In this paper, we study the problem of interactive cleaning for progressive visualization (ICPV): Given a bad visualization V, it is to obtain a “cleaned” visualization V′ whose distance is far from V, under a given (small) budget w.r.t. human cost. |
63 | User-driven Error Detection for Time Series with Events | K. Le and P. Papotti | In this work, we exploit active learning to detect both errors and events in a single solution that aims at minimizing user interaction. |
64 | An Agile Sample Maintenance Approach for Agile Analytics | H. Zhang, Y. Zhang, Z. He, Y. Jing, K. Zhang and X. S. Wang | This paper proposes an adaptive sample update (ASU) approach that avoids re-sampling from scratch as much as possible by monitoring the data distribution, and uses instead an incremental update method before a re-sampling becomes necessary. |
65 | Continuously Tracking Core Items in Data Streams with Probabilistic Decays | J. Zhao, P. Wang, J. Tao, S. Zhang and J. C. S. Lui | This paper investigates the core items tracking (CIT) problem where the goal is to continuously track representative items, called core items, in a data stream so as to best represent/summarize the stream. |
66 | The Art of Efficient In-memory Query Processing on NUMA Systems: a Systematic Approach | P. Memarzia, S. Ray and V. C. Bhavsar | In this work, we evaluate a variety of strategies that aim to accelerate memory-intensive data analytics workloads on NUMA systems. |
67 | Speeding Up GED Verification for Graph Similarity Search | L. Chang, X. Feng, X. Lin, L. Qin, W. Zhang and D. Ouyang | In this paper, we aim to speed up GED verification, which is orthogonal to the index structures used in the filtering phase. |
68 | Scaling Out Schema-free Stream Joins | D. Gjurovski and S. Michel | In this work, we consider computing natural joins over massive streams of JSON documents that do not adhere to a specific schema. |
69 | Contribution Maximization in Probabilistic Datalog | T. Milo, Y. Moskovitch and B. Youngmann | To overcome this, we propose an optimized algorithm which injects a refined variant of the classic Magic Sets technique, integrated with a sampling method, into IM algorithms, achieving a significant saving of space and execution time. |
70 | Multiscale Frequent Co-movement Pattern Mining | S. Helmi and F. Banaei-Kashani | In this paper, we propose a novel and efficient framework for co-movement pattern mining. |
71 | Self-paced Ensemble for Highly Imbalanced Massive Data Classification | Z. Liu et al. | Taking those factors into consideration, we propose a novel framework for imbalance classification that aims to generate a strong ensemble by self-paced harmonizing data hardness via under-sampling. |
72 | SAN: Scale-Space Attention Networks | Y. Garg, K. S. Candan and M. L. Sapino | We propose an innovative robust feature learning framework, scale-invariant attention networks (SAN), that identifies salient regions in the input data for the CNN to focus on. |
73 | A Novel Approach to Learning Consensus and Complementary Information for Multi-View Data Clustering | K. Luong and R. Nayak | We propose a novel optimal manifold for multi-view data which is the most consensed manifold embedded in the high-dimensional multi-view data. |
74 | Summarizing Hierarchical Multidimensional Data | A. Kim, L. V. S. Lakshmanan and D. Srivastava | In this paper, we propose Tree Summaries, which attain this challenging goal over arbitrary hierarchical multidimensional data sets. |
75 | Efficient Team Formation in Social Networks based on Constrained Pattern Graph | Y. Kou et al. | In order to solve this problem, we present an efficient team formation method based on Constrained Pattern Graph (called CPG). |
76 | Effective and Efficient Truss Computation over Large Heterogeneous Information Networks | Y. Yang, Y. Fang, X. Lin, W. Zhang and Y. Fang | In this paper, we study the problem of truss computation over HINs, which aims to find groups of vertices that are of the same type and densely connected. |
77 | Index-Free Approach with Theoretical Guarantee for Efficient Random Walk with Restart Query | D. Lin, R. C. Wong, M. Xie and V. J. Wei | Motivated by this, in this paper, we propose an index-free algorithm called Residue-Accumulated approach (ResAcc) which returns answers with a theoretical guarantee efficiently. |
78 | Optimization of GPU-based Sparse Matrix Multiplication for Large Sparse Networks | J. Lee, S. Kang, Y. Yu, Y. Jo, S. Kim and Y. Park | To solve these challenges, this paper proposes a new optimization pass called Block Reorganizer, which balances the total computations of each computing unit on target GPUs, based on the outer-product-based expansion process, and reduces the memory pressure during the merge process. |
79 | VAC: Vertex-Centric Attributed Community Search | Q. Liu, Y. Zhu, M. Zhao, X. Huang, J. Xu and Y. Gao | To make up for these deficiencies, in this paper, we study a novel attributed community search called vertex-centric attributed community (VAC) search. |
80 | Online Anomalous Trajectory Detection with Deep Generative Sequence Modeling | Y. Liu, K. Zhao, G. Cong and Z. Bao | To this end, we propose a novel model, namely Gaussian Mixture Variational Sequence AutoEncoder (GM-VSAE), to tackle these challenges. |
81 | Mobility-Aware Dynamic Taxi Ridesharing | Z. Liu, Z. Gong, J. Li and K. Wu | In this paper, we consider the mobility-aware taxi ridesharing problem, and present mT-Share to address these limitations. |
82 | Online Trichromatic Pickup and Delivery Scheduling in Spatial Crowdsourcing | B. Zheng et al. | In order to quickly respond to submitted tasks, we propose a greedy solution that finds the schedule with the highest utility-cost ratio. |
83 | Task Allocation in Dependency-aware Spatial Crowdsourcing | W. Ni, P. Cheng, L. Chen and X. Lin | In this paper, we consider a spatial crowdsourcing scenario, where the tasks may have some dependencies among them. |
84 | Parallel Semantic Trajectory Similarity Join | L. Chen, S. Shang, C. S. Jensen, B. Yao and P. Kalnis | We consider the problem of semantic trajectory similarity join (STS-Join). |
85 | Being Happy with the Least: Achieving α-happiness with Minimum Number of Tuples | M. Xie, R. C. Wong, P. Peng and V. J. Tsotras | In this paper, we study the min-size version of the regret minimization query; that is, we want to determine the least tuples needed to keep users happy at a given level. |
86 | Improving Neural Relation Extraction with Implicit Mutual Relations | J. Kuang, Y. Cao, J. Zheng, X. He, M. Gao and A. Zhou | In contrast to existing distant supervision approaches that suffer from insufficient training corpora to extract relations, our proposal of mining implicit mutual relation from the massive unlabeled corpora transfers the semantic information of entity pairs into the RE model, which is more expressive and semantically plausible. |
87 | SONG: Approximate Nearest Neighbor Search on GPU | W. Zhao, S. Tan and P. Li | In this paper, we present a novel framework that decouples the searching on graph algorithm into 3 stages, in order to parallelize the performance-crucial distance computation. |
88 | R2LSH: A Nearest Neighbor Search Scheme Based on Two-dimensional Projected Spaces | K. Lu and M. Kudo | In this paper, we propose a novel and easy-to-implement disk-based method named R2LSH to answer ANN queries in high-dimensional spaces. |
89 | Online Indices for Predictive Top-k Entity and Aggregate Queries on Knowledge Graphs | Y. Li, T. Ge and C. Chen | To improve query processing efficiency, we propose an incremental index on top of low dimensional entity vectors transformed from network embedding vectors. |
90 | Enabling Efficient Random Access to Hierarchically-Compressed Data | F. Zhang, J. Zhai, X. Shen, O. Mutlu and X. Du | This paper presents a set of techniques that successfully eliminate the limitation, and for the first time, establishes the feasibility of effectively handling both data traversal operations and random data accesses on hierarchically-compressed data. |
91 | Adaptive Top-k Overlap Set Similarity Joins | Z. Yang, B. Zheng, G. Li, X. Zhao, X. Zhou and C. S. Jensen | To avoid this problem, we propose a solution to the top-k overlap set similarity join (TkOSSJ) that returns k pairs of sets with the highest overlap similarities. |
92 | Load Shedding for Complex Event Processing: Input-based and State-based Techniques | B. Zhao, N. Q. Viet Hung and M. Weidlich | In this work, we therefore complement input-based load shedding with a state-based technique that discards partial matches. |
93 | SPEAr: Expediting Stream Processing with Accuracy Guarantees | N. R. Katsipoulakis, A. Labrinidis and P. K. Chrysanthis | We built SPEAr on top of Storm and our experiments indicate that it can reduce processing times by more than an order of magnitude, use more than an order of magnitude less memory, and offer accuracy guarantees in real-world benchmarks. |
94 | Temporal Network Representation Learning via Historical Neighborhoods Aggregation | S. Huang, Z. Bao, G. Li, Y. Zhou and J. S. Culpepper | In this paper, we propose the Embedding via Historical Neighborhoods Aggregation (EHNA) algorithm. |
95 | An Interval-centric Model for Distributed Computing over Temporal Graphs | S. Gandhi and Y. Simmhan | We propose an interval-centric computing model (ICM) for distributed and iterative processing of temporal graphs, where a vertex's time-interval is a unit of data-parallel computation. |
96 | CrashSim: An Efficient Algorithm for Computing SimRank over Static and Temporal Graphs | M. Li, F. M. Choudhury, R. Borovica-Gajic, Z. Wang, J. Xin and J. Li | In this work, we first propose CrashSim, an index-free algorithm for single-source SimRank computation in static graphs. |
97 | Efficiently Answering Span-Reachability Queries in Large Temporal Graphs | D. Wen, Y. Huang, Y. Zhang, L. Qin, W. Zhang and X. Lin | In this paper, we define a new reachability model, called span-reachability, designed to relax the time order dependency and identify the relationship between entities in a given time period. |
98 | Turbocharging Geospatial Visualization Dashboards via a Materialized Sampling Cube Approach | J. Yu and M. Sarwat | In this paper, we present a middleware framework that runs on top of a SQL data system with the purpose of increasing the interactivity of geospatial visualization dashboards. |
99 | Sya: Enabling Spatial Awareness inside Probabilistic Knowledge Base Construction | I. Sabek and M. F. Mokbel | This paper presents Sya, the first spatial probabilistic knowledge base construction system, based on Markov Logic Networks (MLN). |
100 | Fast Query Decomposition for Batch Shortest Path Processing in Road Networks | L. Li, M. Zhang, W. Hua and X. Zhou | In this paper, we aim to improve the performance of batch shortest path algorithms by revisiting the problem of query clustering. |
101 | Efficient Attribute-Constrained Co-Located Community Search | J. Luo, X. Cao, X. Xie, Q. Qu, Z. Xu and C. S. Jensen | We study the problem of attribute-constrained co-located community (ACOC) search, which returns a community that satisfies three properties: i) structural cohesiveness: the members in the community are densely connected; ii) spatial co-location: the members are close to each other; and iii) attribute constraint: a set of attributes are covered by the attributes associated with the members. |
102 | Indoor Top-k Keyword-aware Routing Query | Z. Feng, T. Liu, H. Li, H. Lu, L. Shou and J. Xu | In this paper, we study the indoor top-k keyword-aware routing query (IKRQ). |
103 | Latte: A Native Table Engine on NVMe Storage | J. Chu, Y. Tu, Y. Zhang and C. Weng | To fully exploit the hardware potential of NVMe devices, we propose a lightweight native storage stack called Lightstack to minimize the software overhead. |
104 | Doubleheader Logging: Eliminating Journal Write Overhead for Mobile DBMS | S. Oh, W. Kim, J. Seo, H. Song, S. H. Noh and B. Nam | In this work, we propose a crash consistent in-place update logging method – doubleheader logging (DHL) for SQLite. |
105 | GSI: GPU-friendly Subgraph Isomorphism | L. Zeng, L. Zou, M. T. Özsu, L. Hu and F. Zhang | In this paper, we propose a GPU-friendly subgraph isomorphism algorithm, GSI. |
106 | FPGA-based Compaction Engine for Accelerating LSM-tree Key-Value Stores | X. Sun, J. Yu, Z. Zhou and C. J. Xue | In this paper, we design and implement an FPGA-based compaction engine to accelerate compaction in LSM-tree based key-value stores. |
107 | Getting Swole: Generating Access-Aware Code with Predicate Pullups | A. Crotty, A. Galakatos and T. Kraska | Therefore, we propose SWOLE, the first access-aware code generation strategy. |
108 | Video Monitoring Queries | N. Koudas, R. Li and I. Xarchakos | In particular we introduce a set of approximate filters to speed up queries that involve objects of specific type (e.g., cars, trucks, etc.) on video frames with associated spatial relationships among them (e.g., car left of truck). |
109 | Reinforcement Learning with Tree-LSTM for Join Order Selection | X. Yu, G. Li, C. Chai and N. Tang | In this paper, we present RTOS, a novel learned optimizer that uses Reinforcement learning with Tree-structured long short-term memory (LSTM) for join Order Selection. |
110 | Approximate Query Processing for Data Exploration using Deep Generative Models | S. Thirumuruganathan, S. Hasan, N. Koudas and G. Das | In this work, we explore the usage of deep learning (DL) for answering aggregate queries specifically for interactive applications such as data exploration and visualization. |
111 | SuRF: Identification of Interesting Data Regions with Surrogate Models | F. Savva, C. Anagnostopoulos and P. Triantafillou | This paper studies the reverse problem: analysts provide a cut-off value for a statistic of interest and in turn our proposed framework efficiently identifies multidimensional regions whose statistic exceeds (or is below) the given cut-off value (according to user's needs). |
112 | Two-Level Data Compression using Machine Learning in Time Series Database | X. Yu et al. | In this paper, we propose a two-level compression model that selects a proper compression scheme for each individual point, so that diverse patterns can be captured at a fine granularity. |
113 | SeeMoRe: A Fault-Tolerant Protocol for Hybrid Cloud Environments | M. J. Amiri, S. Maiyya, D. Agrawal and A. E. Abbadi | In this paper, we consider a private cloud consisting of nonmalicious nodes (crash-only failures) and a public cloud with possible malicious failures. |
114 | On Sharding Open Blockchains with Smart Contracts | Y. Tao, B. Li, J. Jiang, H. C. Ng, C. Wang and B. Li | In this paper, we propose, analyze, and implement a new distributed and dynamic sharding system to substantially improve the throughput of blockchain systems based on smart contracts, while requiring minimum cross-shard communication. |
115 | G-thinker: A Distributed Framework for Mining Subgraphs in a Big Graph | D. Yan, G. Guo, M. M. Rahman Chowdhury, M. Tamer Özsu, W. Ku and J. C. S. Lui | We propose the first truly CPU-bound distributed framework called G-thinker that adopts a user-friendly subgraph-centric vertex-pulling API for writing distributed subgraph mining algorithms. |
116 | DynaMast: Adaptive Dynamic Mastering for Replicated Systems | M. Abebe, B. Glasbergen and K. Daudjee | We present DynaMast, a lazily replicated, multi-master database system that guarantees one-site transaction execution while effectively distributing both reads and updates among multiple sites. |
117 | Fela: Incorporating Flexible Parallelism and Elastic Tuning to Accelerate Large-Scale DML | J. Geng, D. Li and S. Wang | Targeting these existing drawbacks, we propose Fela, which incorporates both flexible parallelism and an elastic tuning mechanism to accelerate DML. |
118 | Sequence-Aware Factorization Machines for Temporal Predictive Analytics | T. Chen, H. Yin, Q. V. Hung Nguyen, W. Peng, X. Li and X. Zhou | Hence, in this paper, we propose a novel Sequence-Aware Factorization Machine (SeqFM) for temporal predictive analytics, which models feature interactions by fully investigating the effect of sequential dependencies. |
119 | Stochastic Origin-Destination Matrix Forecasting Using Dual-Stage Graph Convolutional, Recurrent Neural Networks | J. Hu, B. Yang, C. Guo, C. S. Jensen and H. Xiong | To solve this problem, we propose a generic learning framework that (i) employs matrix factorization and graph convolutional neural networks to contend with the data sparseness while capturing spatial correlations and that (ii) captures spatio-temporal dynamics via recurrent neural networks extended with graph convolutions. |
120 | Query Results over Ongoing Databases that Remain Valid as Time Passes By | Y. Mülle and M. H. Böhlen | We propose a solution that keeps ongoing time points uninstantiated during query processing. |
121 | Indoor Mobility Semantics Annotation Using Coupled Conditional Markov Networks | H. Li, H. Lu, M. A. Cheema, L. Shou and G. Chen | This work studies the annotation of indoor mobility semantics that describe an object’s mobility event (what) at a semantic indoor region (where) during a time period (when). |
122 | Towards Factorized SVM with Gaussian Kernels over Normalized Data | K. Yang, Y. Gao, L. Liang, B. Yao, S. Wen and G. Chen | In this paper, we focus on the factorized SVM with gaussian kernels over normalized data. |
123 | FESIA: A Fast and SIMD-Efficient Set Intersection Approach on Modern CPUs | J. Zhang, Y. Lu, D. G. Spampinato and F. Franchetti | In this paper, we present FESIA, a new set intersection approach on modern CPUs. |
124 | Low-Latency Communication for Fast DBMS Using RDMA and Shared Memory | P. Fent, A. v. Renen, A. Kipf, V. Leis, T. Neumann and A. Kemper | We propose L5, a high-performance communication layer for database systems. |
125 | ML-based Cross-Platform Query Optimization | Z. Kaoudi, J. Quiané-Ruiz, B. Contreras-Rojas, R. Pardo-Meza, A. Troudi and S. Chawla | We overcome these challenges in Robopt, a novel vector-based optimizer we have built for Rheem, a cross-platform system. |
126 | Automatic View Generation with Deep Learning and Reinforcement Learning | H. Yuan, G. Li, L. Feng, J. Sun and Y. Han | To address this problem, we propose an automatic view generation method which judiciously selects “highly beneficial” subqueries to generate materialized views. |
127 | ColumnSGD: A Column-oriented Framework for Distributed Stochastic Gradient Descent | Z. Zhang, W. Wu, J. Jiang, L. Yu, B. Cui and C. Zhang | Following this locality property, we develop a simple yet powerful computation framework that significantly reduces communication overheads and memory footprints compared to RowSGD, for large-scale ML models such as generalized linear models (GLMs) and factorization machines (FMs). |
128 | In-database connected component analysis | H. Bögeholz, M. Brand and R. Todor | We describe a Big Data-practical, SQL-implementable algorithm for efficiently determining connected components for graph data stored in a Massively Parallel Processing (MPP) relational database. |
129 | Towards Concurrent Stateful Stream Processing on Multicore Processors | S. Zhang, Y. Wu, F. Zhang and B. He | This paper introduces TStream, a novel DSPS supporting efficient concurrent state access on multicore processors. |
130 | PSGraph: How Tencent trains extremely large-scale graphs with Spark? | J. Jiang et al. | To address these challenges, we develop a new graph processing system, called PSGraph, which uses Spark executor and PyTorch to perform calculation, and develops a distributed parameter server to store frequently accessed models. |
131 | JUST: JD Urban Spatio-Temporal Data Engine | R. Li et al. | This paper presents JUST, i.e., JD Urban Spatio-Temporal data engine, which can efficiently manage big spatio-temporal data in a convenient way. |
132 | Oracle Database In-Memory on Active Data Guard: Real-time Analytics on a Standby Database | S. Pendse et al. | In this paper, we explore and address the key challenges involved in building the DBIM-on-ADG infrastructure, including synchronized maintenance of the In-Memory Column Store on the Standby database, with high-speed OLTP activity continuously modifying data on the Primary database. |
133 | Data Sentinel: A Declarative Production-Scale Data Validation Platform | A. Swami, S. Vasudevan and J. Huyn | The contributions of this paper include the following: 1) Data Sentinel, a declarative production-scale data validation platform successfully deployed at LinkedIn 2) A generic design to build and deploy similar systems for production environments 3) Experiences and lessons learned that can benefit practitioners with similar objectives. |
134 | Turbine: Facebook's Service Management Platform for Stream Processing | Y. Mei et al. | This paper describes the Turbine architecture, discusses the design choices behind it, and shares several case studies demonstrating Turbine capabilities in production. |
135 | Speed Kit: A Polyglot & GDPR-Compliant Approach For Caching Personalized Content | W. Wingerath et al. | In this paper, we present Speed Kit as a radically different approach for content distribution that combines (1) a polyglot architecture for efficiently caching personalized content with (2) a natively GDPR-compliant client proxy that handles all sensitive information within the user device. |
136 | De-Health: All Your Online Health Information Are Belong to Us | S. Ji et al. | In this paper, we study the privacy of online health data. |
137 | Maxson: Reduce Duplicate Parsing Overhead on Raw Data | X. Shi et al. | In this paper, we start with a study with a real production workload in Alibaba, which consists of over 3 million queries on JSON. |
138 | Automatic Calibration of Road Intersection Topology using Trajectories | L. Zhao et al. | To address the above challenges, we propose a three-phase calibration framework, called CITT. |
139 | SAFE: Scalable Automatic Feature Engineering Framework for Industrial Tasks | Q. Shi, Y. Zhang, L. Li, X. Yang, M. Li and J. Zhou | In this paper, we proposed a staged method named SAFE (Scalable Automatic Feature Engineering), which can provide excellent efficiency and scalability, along with requisite interpretability and promising performance. |
140 | Cross-Graph Convolution Learning for Large-Scale Text-Picture Shopping Guide in E-Commerce Search | T. Zhang et al. | In this work, a new e-commerce search service named text-picture shopping guide (TPSG) is investigated and deployed to one of the most popular shopping platforms called Taobao. |
141 | Billion-scale Recommendation with Heterogeneous Side Information at Taobao | A. Pfadler, H. Zhao, J. Wang, L. Wang, P. Huang and D. L. Lee | To address these challenges, in this work, we present a flexible and highly scalable Side Information (SI) enhanced Skip-Gram (SISG) framework, which is deployed at Taobao. |
142 | Hierarchical Bipartite Graph Neural Networks: Towards Large-Scale E-commerce Applications | Z. Li et al. | To address these problems, we propose a novel method with Hierarchical bipartite Graph Neural Network (HiGNN) to handle large-scale e-commerce tasks. |
143 | LoCEC: Local Community-based Edge Classification in Large Online Social Networks | C. Song et al. | To tackle the challenges, we propose a Local Community-based Edge Classification (LoCEC) framework that classifies user relationships in a social network into real-world social connection types. |
144 | APTrace: A Responsive System for Agile Enterprise Level Causality Analysis | J. Gui et al. | In this paper, we propose a novel system, APTrace, to meet both of the above requirements. |
145 | HomoPAI: A Secure Collaborative Machine Learning Platform based on Homomorphic Encryption | Q. Li et al. | We propose HomoPAI, an HE-based secure collaborative machine learning system, enabling a more promising scenario, where data from multiple data owners could be securely processed. |
146 | ForkBase: Immutable, Tamper-evident Storage Substrate for Branchable Applications | Q. Lin et al. | To this end, we developed ForkBase to make Git for data practical. ForkBase is a distributed, immutable storage system designed for data version management and data collaborative operation. |
147 | MC-Explorer: Analyzing and Visualizing Motif-Cliques on Large Networks | B. Li et al. | To extract insight from this rich information source, we propose MC-Explorer, which is an advanced analysis and visualization system. |
148 | JODA: A Vertically Scalable, Lightweight JSON Processor for Big Data Transformations | N. Schäfer and S. Michel | We describe the demonstration of JODA (Json On Demand Analytics), an approach to handling large amounts of JSON documents in a vertically scalable manner. |
149 | vCBIR: A Verifiable Search Engine for Content-Based Image Retrieval | S. Guo, Y. Ji, C. Zhang, C. Xu and J. Xu | We demonstrate vCBIR, a verifiable search engine for Content-Based Image Retrieval. |
150 | MusX: Online Exploring and Visualizing Graph-Based Musical Adaptations | F. Lévesque, M. St-Germain, D. Piché, J. Gauvin, M. Gagnon and T. Hurtut | In this paper, we present a detailed description of MusX along with design and technical considerations, and the demonstration scenarios we intend to present to the audience. |
151 | RIDE: A System for Generalized Region of Interest Discovery and Exploration | Q. Liu, L. Zheng and L. Chen | To address the challenge of conducting ROI queries on the increasingly complex spatial data, we present RIDE, an efficient and effective system for generalized ROI Discovery and Exploration. |
152 | PocketView: A Concise and Informative Data Summarizer | Y. Xi, N. Wang, S. Hao, W. Yang and L. Li | In this demonstration, we propose a summarizer system called PocketView that is able to create a data summarization through a pocket view of the table. |
153 | CSQ System: A System to Support Constrained Skyline Queries on Transportation Networks | Q. Gong, J. Liu and H. Cao | In this paper, we present a system to answer MCTN-constrained CSQs, namely CSQ system. |
154 | SUDAF: Sharing User-Defined Aggregate Functions | C. Zhang, F. Toumani and B. Doreau | We present SUDAF (Sharing User-Defined Aggregate Functions), a declarative framework that allows users to formulate UDAFs as mathematical expressions and use them in SQL statements. |
155 | Automating Software Citation using GitCite | L. Chen and S. B. Davidson | This paper presents GitCite, a model for software citation with version control which enables citations to be inferred for any project component based on a small number of explicit citations attached to subdirectories/files, and an implementation that integrates with Git and GitHub. |
156 | SCLPD: Smart Cargo Loading Plan Decision Framework | J. Liu, J. Mao, J. Liao, H. Hu, Y. Guo and A. Zhou | This paper puts forward a system implementation of smart cargo loading plan decision framework (SCLPD for short) for steel logistics industry. |
157 | DCDT: A Digital Clock Drawing Test System for Cognitive Impairment Screening | F. Xu, Y. Ding, Z. Ling, X. Li, Y. Li and S. Wang | In this paper, we'd like to introduce DCDT, a novel Clock Drawing Test system based on digital collection and intellectualized analysis. |
158 | Kronos: Lightweight Knowledge-based Event Analysis in Cyber-Physical Data Streams | M. H. Namaki et al. | We demonstrate Kronos, a framework and system that automatically extracts highly dynamic knowledge for complex event analysis in Cyber-Physical systems. |
159 | DLEEL: Multi-Predicate Spatial Queries on User-generated Streaming Data | A. Almaslukh, L. Abdelhafeez and A. Magdy | This paper demonstrates DLEEL, a research system that supports scalable spatial queries with multiple predicates on user-generated data streams, such as social media streams. |
160 | Querying Streaming System Monitoring Data for Enterprise System Anomaly Detection | P. Gao et al. | In the demo, we aim to show the complete usage scenario of SAQL by (1) performing an APT attack in a controlled environment, and (2) using SAQL to detect the abnormal behaviors in real time by querying the collected stream of system monitoring data that contains the attack traces. |
161 | SAD: An Unsupervised System for Subsequence Anomaly Detection | P. Boniol, M. Linardi, F. Roncallo and T. Palpanas | In this demonstration, we present a system for unsupervised Subsequence Anomaly Detection (SAD) that uses the NorM method. |
162 | Machine Learning Meets Big Spatial Data | I. Sabek and M. F. Mokbel | In this 90-minute tutorial, we comprehensively review the state-of-the-art work in the intersection of machine learning and big spatial data. |
163 | On the Integration of Machine Learning and Array Databases | S. Villarroya and P. Baumann | This tutorial focuses on the integration of machine learning algorithms and array databases. |
164 | Visualization Systems for Linked Datasets | M. Krommyda and V. Kantere | We present here a survey on these techniques, their strengths and weaknesses as well as the datasets that they can support. |
165 | Modern Large-Scale Data Management Systems after 40 Years of Consensus | M. J. Amiri, D. Agrawal and A. E. Abbadi | In this tutorial, we discuss consensus protocols that are used in modern large-scale data management systems, classify them into different categories based on their assumptions on network synchrony, failure model of nodes, etc., and elaborate on their main advantages and limitations. |
166 | Advances in Cryptography and Secure Hardware for Data Outsourcing | S. Sharma, A. Burtsev and S. Mehrotra | This tutorial focuses on recent advances in secure cloud-based data outsourcing based on cryptographic (encryption, secret-sharing, and multi-party computation (MPC)) and hardware-based approaches. |
167 | PushdownDB: Accelerating a DBMS Using S3 Computation | X. Yu et al. | This paper studies the effectiveness of pushing parts of DBMS analytics queries into the Simple Storage Service (S3) of Amazon Web Services (AWS), using a recently released capability called S3 Select. |
168 | Task Deployment Recommendation with Worker Availability | D. Wei, S. B. Roy and S. Amer-Yahia | We propose BatchStrat, an optimization-driven middle layer that recommends deployment strategies to a batch of requests by accounting for worker availability. |
169 | Towards Extracting Highlights From Recorded Live Videos: An Implicit Crowdsourcing Approach | R. Jiang, C. Qu, J. Wang, C. Wang and Y. Zheng | In this paper, we propose LIGHTOR, a novel implicit crowd-sourcing approach to overcome these limitations. |
170 | Crowdsourcing-based Data Extraction from Visualization Charts | C. Chai, G. Li, J. Fan and Y. Luo | In this paper, we introduce a crowdsourcing approach that leverages human ability to extract data from visualization charts. |
171 | Predicting Origin-Destination Flow via Multi-Perspective Graph Convolutional Network | H. Shi et al. | We propose Multi-Perspective Graph Convolutional Networks (MPGCN) to capture the complex dependencies. |
172 | Learning to Simulate Vehicle Trajectories from Demonstrations | G. Zheng, H. Liu, K. Xu and Z. Li | Considering the complexity and non-linearity of the real-world traffic, this paper unprecedentedly treats the problem of traffic simulation as a learning problem, and proposes learning to simulate (L2S) vehicle trajectory. |
173 | FakeDetector: Effective Fake News Detection with Deep Diffusive Neural Network | J. Zhang, B. Dong and P. S. Yu | This paper introduces a novel gated graph neural network, namely FAKEDETECTOR. |
174 | Mining Verb-Oriented Commonsense Knowledge | J. Liu et al. | In this paper, we focus on the automatic acquisition of a typical kind of implicit verb-oriented commonsense knowledge (e.g., “person eats food”), which is the concept level knowledge of verb phrases. |
175 | Automated Anomaly Detection in Large Sequences | P. Boniol, M. Linardi, F. Roncallo and T. Palpanas | In this work, we address these problems, and propose NorM, a novel approach, suitable for domain-agnostic anomaly detection. |
176 | SLED: Semi-supervised Locally-weighted Ensemble Detector | S. Zhang, D. T. Jung Huang, G. Dobbie and Y. S. Koh | In this research, we propose a semi-supervised locally-weighted ensemble detector (SLED), where the relative performance among its base detectors is characterized by a set of weights learned in a semi-supervised manner. |
177 | Hierarchical Quick Shift Guided Recurrent Clustering | M. C. Altinigneli, L. Miklautz, C. Böhm and C. Plant | We propose a novel density-based mode-seeking Hierarchical Quick Shift clustering algorithm with an optional Recurrent Neural Network (RNN) to jointly learn the cluster assignments for every sample and the underlying dynamics of the mode-seeking clustering process. |
178 | Matrix Profile XVII: Indexing the Matrix Profile to Allow Arbitrary Range Queries | Y. Zhu, C. M. Yeh, Z. Zimmerman and E. Keogh | In this work we introduce a novel indexing framework that allows queries about arbitrary ranges to be answered in quasilinear time, allowing such queries to be interactive for the first time. |
179 | D-Tucker: Fast and Memory-Efficient Tucker Decomposition for Dense Tensors | J. Jang and U. Kang | In this paper, we propose D-Tucker, a fast and memory-efficient method for Tucker decomposition on large dense tensors. |
180 | A Unified Framework for Multi-view Spectral Clustering | G. Zhong and C. Pun | To address these, we design a unified multi-view spectral clustering scheme to learn the discrete cluster indicator matrix in one stage. |
181 | Comprehensive and Efficient Data Labeling via Adaptive Model Scheduling | M. Yuan, L. Zhang, X. Li and H. Xiong | In this paper, we present an Adaptive Model Scheduling framework, consisting of 1) a deep reinforcement learning-based approach to predict the value of unexecuted models by mining semantic relationship among diverse models, and 2) two heuristic algorithms to adaptively schedule models under deadline or deadline-memory constraints. |
182 | Target Privacy Preserving for Social Networks | Z. Jiang, L. Sun, P. S. Yu, H. Li, J. Ma and Y. Shen | In this paper, we incorporate the realistic scenario of key protection into link privacy preserving and propose the target-link privacy preserving (TPP) model: target links, referred to as targets, are the most important and sensitive objectives that would be intentionally attacked by adversaries and therefore need privacy protection, while other links of less privacy concern are properly released to maintain the graph utility. |
183 | Traffic Incident Detection: A Trajectory-based Approach | X. Han, T. Grubenmann, R. Cheng, S. C. Wong, X. Li and W. Sun | In this paper, we ask the question: Can ID be performed on sparse traffic data (e.g., location data obtained from GPS devices equipped on vehicles)? As these data may not be enough to describe the state of the roads involved, they can undermine the effectiveness of existing ID solutions. |
184 | Collective Entity Alignment via Adaptive Features | W. Zeng, X. Zhao, J. Tang and X. Lin | To fill this gap, we propose a collective EA framework. |
185 | InvaliDB: Scalable Push-Based Real-Time Queries on Top of Pull-Based Databases | W. Wingerath, F. Gessert and N. Ritter | To address these issues, we propose the system design InvaliDB which combines linear read and write scalability for real-time queries with superior query expressiveness and legacy compatibility. |
186 | Discovering Band Order Dependencies | P. Li, J. Szlichta, M. Bohlen and D. Srivastava | We introduce band ODs to model the semantics of attributes that are monotonically related with small variations without there being an intrinsic violation of semantics. |
187 | The Pastwatch: On the usability of provenance data in relational databases | O. AlOmeir, E. Y. Lai, M. Milani and R. Pottinger | We present a set of criteria that any provenance exploration tool must have and introduce Pastwatch, a provenance exploration system that adheres to those criteria. |
188 | Query-driven Repair of Functional Dependency Violations | S. Giannakopoulou, M. Karpathiotakis and A. Ailamaki | We propose an approach that performs probabilistic repair of functional dependency violations on-demand, driven by the exploratory analysis that users perform. |
189 | Outdated Fact Detection in Knowledge Bases | S. Hao, C. Chai, G. Li, N. Tang, N. Wang and X. Yu | In this paper, we propose a novel human-in-the-loop approach for outdated fact detection in KBs. |
190 | Preserving Contextual Information in Relational Matrix Operations | O. Dolmatova, N. Augsten and M. H. Böhlen | We propose relational matrix operations that support the analysis of data stored in tables and that preserve contextual information. |
191 | Stay Ahead of Poachers: Illegal Wildlife Poaching Prediction and Patrol Planning Under Uncertainty with Field Test Evaluations (Short Version) | L. Xu et al. | In this paper, we take an end-to-end approach to the data-to-deployment pipeline for anti-poaching. |
192 | Telescope: An Automatic Feature Extraction and Transformation Approach for Time Series Forecasting on a Level-Playing Field | A. Bauer, M. Züfle, N. Herbst, S. Kounev and V. Curtef | In this paper, we propose a fully automated machine learning-based forecasting approach. |
193 | Auto-Model: Utilizing Research Papers and HPO Techniques to Deal with the CASH problem | C. Wang, H. Wang, T. Mu, J. Li and H. Gao | In this paper, we design the Auto-Model approach, which makes full use of known information in the related research paper and introduces hyperparameter optimization techniques, to solve the CASH problem effectively. |
194 | Toward Sampling for Deep Learning Model Diagnosis | P. Mehta, S. Portillo, M. Balazinska and A. Connolly | In this paper, we develop a novel data sampling technique that produces approximate but accurate results for these model debugging queries. |
195 | Approximate Quantiles for Datacenter Telemetry Monitoring | G. Lim, M. S. Hassan, Z. Jin, S. Volos and M. Jeon | To address these challenges, we propose AOMG, an efficient and accurate quantile approximation algorithm that capitalizes on insights from our workload study. |
196 | An Intersectional Definition of Fairness | J. R. Foulds, R. Islam, K. N. Keya and S. Pan | We propose differential fairness, a multi-attribute definition of fairness in machine learning which is informed by intersectionality, a critical lens arising from the humanities literature, leveraging connections between differential privacy and legal notions of fairness. |
197 | Towards Locally Differentially Private Generic Graph Metric Estimation | Q. Ye, H. Hu, M. H. Au, X. Meng and X. Xiao | In this paper, we address these two issues by presenting LF-GDPR, the first LDP-enabled graph metric estimation framework for graph analysis. |
198 | BFT-Store: Storage Partition for Permissioned Blockchain via Erasure Coding | X. Qi, Z. Zhang, C. Jin and A. Zhou | This paper proposes a novel storage engine, called BFT-Store, to enhance storage scalability by integrating erasure coding with Byzantine Fault Tolerance (BFT) consensus protocol. |
199 | Reasoning about the Future in Blockchain Databases | S. Cohen, A. Rosenthal and A. Zohar | The main issue that we tackle is the need to reason about possible worlds, due to the uncertainty in transaction appending. |
200 | Deciding When to Trade Data Freshness for Performance in MongoDB-as-a-Service | C. Huang, M. Cahill, A. Fekete and U. Rohm | It should be better to choose the appropriate Read Preference setting at runtime, as we describe in this paper. We show how a system can detect when the primary copy is saturated in MongoDB-as-a-Service, and use this to choose where reads should be done to improve overall performance. |
201 | PrefixFPM: A Parallel Framework for General-Purpose Frequent Pattern Mining | D. Yan, W. Qu, G. Guo and X. Wang | This paper presents PrefixFPM, a general-purpose framework for FPM that is able to fully utilize the CPU cores in a multicore machine. |
202 | HBP: Hotness Balanced Partition for Prioritized Iterative Graph Computations | S. Gong, Y. Zhang and G. Yu | To accelerate prioritized iterative graph computations, we propose Hotness Balanced Partition (HBP) and a stream-based partition algorithm Pb-HBP. |
203 | Computing Mutual Information of Big Categorical Data and Its Application to Feature Grouping | J. Li, C. Zhang, J. Zhang and X. Qin | This paper develops a parallel computing system – MiCS – for mutual information of big categorical data on the Spark computing platform. |
204 | StructSim: Querying Structural Node Similarity at Billion Scale | X. Chen, L. Lai, L. Qin and X. Lin | In this paper, we propose a new framework StructSim to compute nodes' role similarity. |
205 | HowSim: A General and Effective Similarity Measure on Heterogeneous Information Networks | Y. Wang et al. | To address this problem, we extend SimRank, a well-known similarity measure for homogeneous graphs, to HINs, by introducing the concept of decay graph. |
206 | GraphAE: Adaptive Embedding across Graphs | B. Yan and C. Wang | In this paper, we present an interesting graph embedding problem called Adaptive Task (AT), and propose a unified framework for this adaptive task, which introduces two types of alignment to learn adaptive node embedding across graphs. |
207 | FlashSchema: Achieving High Quality XML Schemas with Powerful Inference Algorithms and Large-scale Schema Data | Y. Li, J. Cao, H. Chen, T. Ge, Z. Xu and Q. Peng | In this paper, we propose a tool FlashSchema for high quality XML schema design, which supports both one-pass and interactive schema design and schema recommendation. |
208 | Efficient Structural Clustering in Large Uncertain Graphs | Y. Liang, T. Hu and P. Zhao | In this paper, we develop a new, decomposition-based method, ProbSCAN, for efficient reliable structural similarity computation with theoretically improved complexity. |
209 | Efficient Weighted Independent Set Computation over Large Graphs | W. Zheng, J. Gu, P. Peng and J. X. Yu | Following the reduction-and-branching strategy, we propose an exact algorithm to compute the maximum weighted independent set. |
210 | Keys as Features for Graph Entity Matching | T. Deng, L. Hou and Z. Han | We treat entity matching as a classification problem, and propose GMKSLEM, a supervised learning method for graph entity matching. |
211 | Online Pricing with Reserve Price Constraint for Personal Data Markets | C. Niu, Z. Zheng, F. Wu, S. Tang and G. Chen | In this paper, we study how the data broker can maximize her cumulative revenue by posting reasonable prices for sequential queries. |
212 | Permutation Index: Exploiting Data Skew for Improved Query Performance | W. Zhang and K. A. Ross | In this paper, we propose a novel index structure for repositioning data items to concentrate popular items into the same cache lines, resulting in better spatial locality, and better utilization of limited cache resources. |
213 | Efficient Locality-Sensitive Hashing Over High-Dimensional Data Streams | C. Yang, D. Deng, S. Shang and L. Shao | In this paper, we present PDA-LSH, a novel and practical disk-based LSH index that can offer efficient support for both updates and searches. |
214 | A Class of R*-tree Indexes for Spatial-Visual Search of Geo-tagged Street Images | A. Alfarrarjeh et al. | Therefore, we propose a class of R*-tree indexes, particularly, by associating each node with two separate minimum bounding rectangles (MBR), one for spatial and the other for (dimension-reduced) visual properties of their contained images, and adapting the R*-tree optimization criteria to both property types. |
215 | Graph Embeddings for One-pass Processing of Heterogeneous Queries | C. T. Duong et al. | Specifically, we propose graph-based models in which both data and queries incorporate information of different modalities. |
216 | Fast Error-tolerant Location-aware Query Autocompletion | J. Wang and C. Lin | In this paper, we propose a novel framework AutoEL to support error-tolerant location-aware query autocompletion. |
217 | TrajMesa: A Distributed NoSQL Storage Engine for Big Trajectory Data | R. Li et al. | This paper proposes a holistic distributed NoSQL trajectory storage engine, TrajMesa, based on GeoMesa, an open-source indexing toolkit for spatio-temporal data. |
218 | Learning to Rank Paths in Spatial Networks | S. B. Yang and B. Yang | We present PathRank, a data-driven framework for ranking paths based on historical trajectories. |
219 | A Hybrid Learning Approach to Stochastic Routing | S. A. Pedersen, B. Yang and C. S. Jensen | We propose a hybrid approach that combines convolution and machine learning-based estimation to take into account dependencies among distributions in order to improve accuracy. |
220 | Shortest Path Queries for Indoor Venues with Temporal Variations | T. Liu et al. | In this paper, we define a new type of query called Indoor Temporal-variation aware Shortest Path Query (ITSPQ). |
221 | DAG: A General Model for Privacy-Preserving Data Mining : (Extended Abstract) | S. G. Teo, J. Cao and V. C. S. Lee | To address this issue, we propose a privacy model DAG (Directed Acyclic Graph) that consists of a set of fundamental secure operators (e.g., +, -, ×, /, and power). |
222 | TIDY: Publishing a Time Interval Dataset with Differential Privacy (Extended abstract) | W. Jung, S. Kwon and K. Shim | We propose the TIDY (publishing Time Intervals via Differential privacY) algorithm to release time interval data under differential privacy. |
223 | The Power of Bounds: Answering Approximate Earth Mover's Distance with Parametric Bounds (Extended abstract) | T. N. Chan, M. Lung Yiu and L. H. U | In this work, we study how to compute approximate EMD value with bounded error, using these bound functions. |
224 | On Nearby-Fit Spatial Keyword Queries (Extended Abstract) | V. J. Wei, R. Chi-Wing Wong, C. Long and P. Hui | In this paper, we propose a new type of query called nearby-fit spatial keyword query (NSKQ), where an optimal object is defined based not only on the location and the keywords of the object itself, but also on those of the objects nearby. |
225 | ChronoGraph: Enabling temporal graph traversals for efficient information diffusion analysis over time | J. Byun, S. Woo and D. Kim | ChronoGraph: Enabling temporal graph traversals for efficient information diffusion analysis over time |
226 | Efficient Distance Sensitivity Oracles for Real-World Graph Data | J. Lee and C. Chung | Motivated by this, we develop two practical distance sensitivity oracles for directed graphs as variants of Transit Node Routing, and effective speed-up techniques with a slight loss of accuracy. |
227 | Demythization of Structural XML Query Processing: Comparison of Holistic and Binary Approaches (Extended Abstract) | P. Lukáš, R. Baca, M. Krátký and T. Wang Ling | However, a thorough analytical and experimental comparison of binary and holistic joins has been missing despite an enormous research effort in this area. In this paper, we try to fill this gap. |
228 | Answering Skyline Queries over Incomplete Data with Crowdsourcing (Extended Abstract) | X. Miao, Y. Gao, S. Guo, L. Chen, J. Yin and Q. Li | In this paper, we study the problem of skyline queries over incomplete data with crowdsourcing. |
229 | HisRect: Features from Historical Visits and Recent Tweet for Co-Location Judgement | P. Li, H. Lu, Q. Zheng, S. Li and G. Pan | This study explores the problem of co-location judgement, i.e., to decide whether two Twitter users are co-located at some point-of-interest (POI). |
230 | K-SPIN: Efficiently Processing Spatial Keyword Queries on Road Networks : (Extended Abstract) | T. Abeywickrama, M. A. Cheema and A. Khan | Instead, we propose the K-SPIN framework, which uses an alternative keyword separation strategy that is more suitable on road networks. |
231 | ESPM: Efficient Spatial Pattern Matching (Extended Abstract) | H. Chen, Y. Fang, Y. Zhang, W. Zhang and L. Wang | To enhance the performance of SPM, in this paper we propose a novel Efficient Spatial Pattern Matching (ESPM) algorithm, which exploits the inverted linear quadtree index and computes matched node pairs and object pairs level by level in a top-down manner. |
232 | A Transformation-based Framework for KNN Set Similarity Search (Extended Abstract) | Y. Zhang, J. Wu, J. Wang and C. Xing | In this paper we use the widely applied Jaccard to quantify the similarity between two sets, but our proposed techniques can be easily extended to other set-based similarity functions. |
233 | Matrix Factorization with Interval-Valued Data | M. Li, F. Di Mauro, K. Selçuk Candan and M. L. Sapino | In this paper, we propose matrix decomposition techniques that consider the existence of interval-valued data. |
234 | Neighborhood density correlation clustering | Z. Wang and L. Zhong | In this paper, by analyzing the advantages and disadvantages of existing clustering analysis algorithms, a new neighborhood density correlation clustering (NDCC) algorithm for quickly discovering arbitrary shaped clusters is proposed that avoids the clustering errors caused by iso-density points between clusters. |
235 | Spark Performance Optimization Analysis In Memory Management with Deploy Mode In Standalone Cluster Computing | D. M. Adinew, Z. Shijie and Y. Liao | Investigating how performance is increased in relation to Spark executor memory, number of executors, number of cores, and deploy mode parameters configuration in a standalone cluster model is our primary goal. |
236 | Design of Database Systems with DRAM-only Heterogeneous Memory Architecture | Y. Qiao | This thesis aims to develop techniques that enable relational and NoSQL databases to take full advantage of the envisioned low-cost heterogeneous DRAM system. |
237 | Data Series Indexing Gone Parallel | B. Peng | In this Ph.D. work, we present the first data series indexing solutions, for both on-disk and in-memory data, that are designed to inherently take advantage of multi-core architectures, in order to accelerate similarity search processing times. |
238 | Area Queries Based on Voronoi Diagrams | Y. Li and G. Liu | In view of this issue, we propose a method of iteratively generating candidates based on Voronoi diagrams and apply it to area queries. |
239 | Picube for Fast Exploration of Large Datasets | W. Fu | Inspired by this, we propose a partitioned, inductively aggregated data-cube, picube. |