Paper Digest: ICDE 2019 Highlights
IEEE International Conference on Data Engineering (ICDE) addresses research issues in designing, building, managing, and evaluating advanced data-intensive systems and applications. In 2019, it is to be held in Macau, China.
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights and summaries to quickly get the main idea of each paper.
We thank all authors for writing these interesting papers, and readers for reading our digests. If you do not want to miss any interesting academic paper, you are welcome to sign up for our free paper digest service to get new paper updates customized to your own interests on a daily basis.
Paper Digest Team
team@paperdigest.org
TABLE 1: ICDE 2019 Papers/Talks/Tutorials/Demos/…
No. | Title | Authors | Highlight |
---|---|---|---|
1 | Data Management at Huawei: Recent Accomplishments and Future Challenges | J. Chen et al. | In this paper, we will go through recent advancements in Huawei data management technologies including a petabyte scale enterprise analytics platform (FusionInsight MPPDB) and a highly available in-memory database for telecommunication networks (GMDB). |
2 | KV-Match: A Subsequence Matching Approach Supporting Normalization and Time Warping | J. Wu, P. Wang, N. Pan, C. Wang, W. Wang and J. Wang | In this paper, we propose a novel problem, named constrained normalized subsequence matching problem (cNSM), which adds some constraints to NSM problem. |
3 | Efficient Maximal Spatial Clique Enumeration | C. Zhang, Y. Zhang, W. Zhang, L. Qin and J. Yang | In this paper, we investigate this problem in the context of spatial database. |
4 | Cluster-Based Subscription Matching for Geo-Textual Data Streams | L. Chen, S. Shang, K. Zheng and P. Kalnis | To solve the CSM problem, we propose a novel solution to cluster, feed, and summarize a stream of geo-textual messages efficiently. |
5 | Time-Dependent Hop Labeling on Road Network | L. Li, S. Wang and X. Zhou | In this paper, we aim to answer the fastest path profile query on time-dependent road network faster by extending the 2-hop labeling approach, which is fast in answering shortest distance query on the static graph. |
6 | Weight-Constrained Route Planning Over Time-Dependent Graphs | Y. Yuan, X. Lian, G. Wang, L. Chen, Y. Ma and Y. Wang | In this paper, we study the WRP problem over a large time-dependent graph by incorporating continuous time and weight functions into it. |
7 | Skyline Queries Constrained by Multi-cost Transportation Networks | Q. Gong, H. Cao and P. Nagarkar | In this paper, we propose a new type of skyline queries whose evaluation is constrained by a multi-cost transportation network (MCTN) and whose answers are off the network. |
8 | Online Social Media Recommendation Over Streams | X. Zhou, D. Qin, X. Lu, L. Chen and Y. Zhang | In this paper, we propose a novel framework for the social recommendation over streams. |
9 | Canonicalization of Open Knowledge Bases with Side Information from the Source Text | X. Lin and L. Chen | In this paper, we propose to perform canonicalization over Open IE triples by incorporating the side information from the original data sources, including the candidate entities of the noun phrases detected in the source text, the types of the candidate entities and the domain knowledge of the source text. |
10 | Walking with Perception: Efficient Random Walk Sampling via Common Neighbor Awareness | Y. Li et al. | To address this issue, we propose a common neighbor aware random walk framework called CNARW, which leverages weighted walking by differentiating the next-hop candidate nodes to speed up the convergence. |
11 | Building a Broad Knowledge Graph for Products | X. L. Dong | In this talk, we ask the question: Can one build a knowledge graph (KG) for all products in the world? |
12 | SimMeme: A Search Engine for Internet Memes | T. Milo, A. Somech and B. Youngmann | To address this problem, we present SimMeme, a Meme-dedicated search engine. |
13 | A Hierarchical Framework for Top-k Location-Aware Error-Tolerant Keyword Search | J. Yang, Y. Zhang, X. Zhou, J. Wang, H. Hu and C. Xing | In this paper, we propose a novel framework to solve the problem of top-k location-aware similarity search with fuzzy token matching. |
14 | 2ED: An Efficient Entity Extraction Algorithm Using Two-Level Edit-Distance | Z. Wen, D. Deng, R. Zhang and R. Kotagiri | In this paper, we propose an efficient character-level and token-level edit-distance based algorithm called FuzzyED. |
15 | Bridging Quantities in Tables and Text | Y. Ibrahim, M. Riedewald, G. Weikum and D. Zeinalipour-Yazti | This paper introduces the quantity alignment problem: bidirectional linking between textual mentions of quantities and the corresponding table cells, in order to support advanced content summarization and faster navigation between explanations in text and details in tables. |
16 | An Efficient Insertion Operator in Dynamic Ridesharing Services | Y. Xu, Y. Tong, Y. Shi, Q. Tao, K. Xu and W. Li | In this paper, we propose a novel partition framework and a dynamic programming based insertion with a time complexity of O(n²). |
17 | Auction-Based Order Dispatch and Pricing in Ridesharing | L. Zheng, P. Cheng and L. Chen | In this paper, we propose solutions for the bonus-offering scenario of ridesharing platforms (service providers). |
18 | When Geo-Text Meets Security: Privacy-Preserving Boolean Spatial Keyword Queries | N. Cui, J. Li, X. Yang, B. Wang, M. Reynolds and Y. Xiang | Therefore, in this work, we first propose and formalize the problem of privacy-preserving boolean spatial keyword query under the widely accepted Known Background Threat Model. |
19 | Moving Object Linking Based on Historical Trace | F. Jin, W. Hua, J. Xu and X. Zhou | In this work, we study the problem of moving object linking based on their historical traces. |
20 | Knowledge Graphs and Enterprise AI: The Promise of an Enabling Technology | L. Bellomarini, D. Fakhoury, G. Gottlob and E. Sallinger | We propose knowledge graphs as the reference technology for the enterprise AI context, i.e., the complex of entities, properties and relationships that shape a business domain and constitute a common backbone for all AI-driven applications. |
21 | ImageProof: Enabling Authentication for Large-Scale Image Retrieval | S. Guo, J. Xu, C. Zhang, C. Xu and T. Xiang | In this paper, we take the first step in studying the problem of query authentication for large-scale image retrieval. |
22 | Time Constrained Continuous Subgraph Search Over Streaming Graphs | Y. Li, L. Zou, M. T. Özsu and D. Zhao | In this paper, we study the subgraph (isomorphism) search over streaming graph data that obeys timing order constraints over the occurrence of edges in the stream. |
23 | Utilizing Dynamic Properties of Sharing Bits and Registers to Estimate User Cardinalities Over Time | P. Wang, P. Jia, X. Zhang, J. Tao, X. Guan and D. Towsley | To address these problems, we develop novel bit and register sharing algorithms, which use a bit array and a register array to build a compact sketch of all users’ connected items respectively. |
24 | Tracking Influential Nodes in Time-Decaying Dynamic Interaction Networks | J. Zhao, S. Shang, P. Wang, J. C. S. Lui and X. Zhang | In this work, we address the dynamic influence challenge by designing efficient streaming methods that can identify influential nodes from highly dynamic node interaction streams. |
25 | Fast and Accurate Graph Stream Summarization | X. Gou, L. Zou, C. Zhao and T. Yang | In this paper, we propose a novel Graph Stream Sketch (GSS for short) to summarize the graph streams, which has linear space cost O(|E|) (E is the edge set of the graph) and constant update time cost (O(1)) and supports most kinds of queries over graph streams with the controllable errors. |
26 | Mining Periodic Cliques in Temporal Networks | H. Qin, R. Li, G. Wang, L. Qin, Y. Cheng and Y. Yuan | In this paper, we study a problem of seeking periodic communities in a temporal network, where each edge is associated with a set of timestamps. |
27 | Coloring Embedder: A Memory Efficient Data Structure for Answering Multi-set Query | Y. Tong et al. | In this paper, we propose a new data structure named coloring embedder, which is fast, accurate as well as memory efficient. |
28 | Mining Order-Preserving Submatrices Under Data Uncertainty: A Possible-World Approach | J. Cheng, D. Yan, X. Hao and W. Ng | An optimized dynamic programming approach is proposed to compute the probability that a row supports a particular column permutation, and several effective pruning rules are introduced to efficiently prune insignificant OPSMs. |
29 | Multi-Dimensional Genomic Data Management for Region-Preserving Operations | O. Horlova, A. Kaitoua, V. Markl and S. Ceri | In this paper, we focus on the efficient execution of region-preserving GMQL operations, in which the regions of the result are a subset of the regions of one of the operands; most GMQL operations are region-preserving. |
30 | Improved Algorithms for Maximal Clique Search in Uncertain Networks | R. Li, Q. Dai, G. Wang, Z. Ming, L. Qin and J. X. Yu | To overcome this issue, we propose two new core-based pruning algorithms to reduce the uncertain graph size without missing any maximal (k, t)-clique. |
31 | Lazo: A Cardinality-Based Method for Coupled Estimation of Jaccard Similarity and Containment | R. Castro Fernandez, J. Min, D. Nava and S. Madden | The main contribution of this paper is LAZO, a method to simultaneously estimate both the similarity and containment of datasets, based on a redefinition of Jaccard similarity which takes into account the cardinality of each set. |
32 | TARDIS: Distributed Indexing Framework for Big Time Series Data | L. Zhang, N. Alghamdi, M. Y. Eltabakh and E. A. Rundensteiner | In this paper, we propose the TARDIS distributed indexing framework to overcome the aforementioned limitations. |
33 | Mostly Order Preserving Dictionaries | C. Liu, M. Umbenhower, H. Jiang, P. Subramaniam, J. Ma and A. J. Elmore | In this work we bridge this gap by introducing mostly ordered dictionaries that use a best effort dictionary generation based on sampling the input dataset. |
34 | Multi-copy Cuckoo Hashing | D. Li, R. Du, Z. Liu, T. Yang and B. Cui | To address the problem, we propose an efficient Cuckoo hashing scheme called Multi-copy Cuckoo or McCuckoo. |
35 | Efficient Scalable Multi-attribute Index Selection Using Recursive Strategies | R. Schlosser, J. Kossmann and M. Boissier | We introduce a novel recursive strategy that does not exclude index candidates in advance and effectively accounts for index interaction. |
36 | To Index or Not to Index: Optimizing Exact Maximum Inner Product Search | F. Abuzaid, G. Sethi, P. Bailis and M. Zaharia | In this paper, we show that a hardware-efficient brute-force approach, blocked matrix multiply (BMM), can outperform the state-of-the-art MIPS solvers by over an order of magnitude, for some, but not all, inputs. |
37 | Is There a Data Science and Engineering Brain Drain? If So, How Can We Rebalance Them? | J. Pei | Is There a Data Science and Engineering Brain Drain? If So, How Can We Rebalance Them? |
38 | Distributed In-memory Trajectory Similarity Search and Join on Road Network | H. Yuan and G. Li | To support trajectory similarity search and join, we propose a filtering-refine framework. |
39 | Stochastic Weight Completion for Road Networks Using Graph Convolutional Networks | J. Hu, C. Guo, B. Yang and C. S. Jensen | We propose a generic learning framework called Graph Convolutional Weight Completion (GCWC) that exploits the topology of a road network graph and the correlations of weights among adjacent edges to estimate stochastic weights for all edges. |
40 | Identifying the Most Interactive Object in Spatial Databases | D. Amagata and T. Hara | This paper investigates a new query, called an MIO query, that retrieves the Most Interactive Object in a given spatial dataset. |
41 | An Efficient Framework for Correctness-Aware kNN Queries on Road Networks | D. He, S. Wang, X. Zhou and R. Cheng | Motivated by this, we propose a framework on correctness-aware kNN queries which aim to optimize system throughput while guaranteeing query correctness on moving objects. |
42 | MPR – A Partitioning-Replication Framework for Multi-Processing kNN Search on Road Networks | S. Luo, B. Kao, X. Wu and R. Cheng | We propose the MPR (Multi-layer Partitioning-Replication) mechanism that orchestrates CPU cores and schedules kNN query and index update processes to run on the cores. |
43 | AppUsage2Vec: Modeling Smartphone App Usage for Prediction | S. Zhao et al. | In this paper, we propose a novel framework for app usage prediction, called AppUsage2Vec, inspired by Doc2Vec. |
44 | iFair: Learning Individually Fair Data Representations for Algorithmic Decision Making | P. Lahoti, K. P. Gummadi and G. Weikum | The paper introduces a method for probabilistically mapping user records into a low-rank representation that reconciles individual fairness and the utility of classifiers and rankings in downstream applications. |
45 | DBSCAN-MS: Distributed Density-Based Clustering in Metric Spaces | K. Yang, Y. Gao, R. Ma, L. Chen, S. Wu and G. Chen | In this paper, we propose DBSCAN-MS, a distributed density-based clustering in metric spaces. |
46 | Computing Trajectory Similarity in Linear Time: A Generic Seed-Guided Neural Metric Learning Approach | D. Yao, G. Cong, C. Zhang and J. Bi | We propose NeuTraj to accelerate trajectory similarity computation. |
47 | Bursty Event Detection Throughout Histories | D. Paul, Y. Peng and F. Li | In this paper, we investigate the problem of bursty event detection where we define burst as the acceleration over the incoming rate of an event mentioning. |
48 | RUM: Network Representation Learning Using Motifs | Y. Yu, Z. Lu, J. Liu, G. Zhao and J. Wen | Towards the leveraging of graph motifs that constitute higher-order organizations in a network, we propose two strategies, namely MotifWalk and MotifRe-weighting for learning motif-aware network embeddings. |
49 | Finding Significant Items in Data Streams | T. Yang, H. Zhang, D. Yang, Y. Huang and X. Li | In this paper, we define a new issue, named finding top-k significant items, and propose a novel algorithm namely LTC to address this issue. |
50 | Knowledge-Aware Deep Dual Networks for Text-Based Mortality Prediction | N. Liu, P. Lu, W. Zhang and J. Wang | To address the above issues, we propose novel Knowledge-aware Deep Dual Networks (K-DDN) for the text-based mortality prediction task. |
51 | Robust High Dimensional Stream Classification with Novel Class Detection | Z. Wang, Z. Kong, S. Chandra, H. Tao and L. Khan | In this paper, we focus on addressing this challenge by proposing an effective learning framework called CNN-based Prototype Ensemble (CPE) for novel class detection and correction. |
52 | Towards the Completion of a Domain-Specific Knowledge Base with Emerging Query Terms | S. Jiang, J. Liang, Y. Xiao, H. Tang, H. Huang and J. Tan | In this paper, we use the product knowledge base in the largest Chinese e-commerce platform, Taobao, as an example to investigate a completion procedure of a domain-specific knowledge base. |
53 | Cooperation-Aware Task Assignment in Spatial Crowdsourcing | P. Cheng, L. Chen and J. Ye | In this paper, we consider an important spatial crowdsourcing problem, namely cooperation-aware spatial crowdsourcing (CA-SC), where spatial tasks (e.g., collecting the Wi-Fi signal strength in one building) are time-constrained and require more than one worker to complete thus the cooperation among assigned workers is essential to the result. |
54 | Minimizing Maximum Delay of Task Assignment in Spatial Crowdsourcing | Z. Chen, P. Cheng, Y. Zeng and L. Chen | In this paper, we study the minimizing maximum delay spatial crowdsourcing (MMD-SC) problem and propose solutions aiming at achieving a worst case controlled task assignment. |
55 | Physical Representation-Based Predicate Optimization for a Visual Analytics Database | M. R. Anderson, M. Cafarella, G. Ros and T. F. Wenisch | In this paper, we propose TAHOMA, which generates and evaluates many potential classifier cascades that jointly optimize the CNN architecture and input data representation. |
56 | Adaptive Dynamic Bipartite Graph Matching: A Reinforcement Learning Approach | Y. Wang, Y. Tong, C. Long, P. Xu, K. Xu and W. Lv | In this paper, we propose the dynamic bipartite graph matching (DBGM) problem to be better aligned with real-world applications and devise a novel adaptive batch-based solution framework with a constant competitive ratio. |
57 | Scalable Frequent Sequence Mining with Flexible Subsequence Constraints | A. Renz-Wieland, M. Bertsch and R. Gemulla | We derive a general framework for frequent sequence mining under this model and propose the D-SEQ and D-CAND algorithms within this framework. |
58 | Adaptive Influence Blocking: Minimizing the Negative Spread by Observation-Based Policies | Q. Shi, C. Wang, D. Ye, J. Chen, Y. Feng and C. Chen | Motivated by the above considerations, we propose a novel Adaptive Influence Blocking (AIB) problem. |
59 | Fraction-Score: A New Support Measure for Co-location Pattern Mining | H. K. Chan, C. Long, D. Yan and R. C. Wong | In this paper, we propose a new measure called Fraction-Score whose idea is to count instances fractionally if they overlap. |
60 | Discovery and Ranking of Functional Dependencies | Z. Wei and S. Link | Utilizing new data structures and original techniques for the dynamic computation of stripped partitions, we devise a new hybridization strategy that outperforms the best algorithms in terms of efficiency, column-, and row-scalability. |
61 | Adaptive Deep Reuse: Accelerating CNN Training on the Fly | L. Ning, H. Guan and X. Shen | This work proposes adaptive deep reuse, a method for accelerating CNN training by identifying and avoiding the unnecessary computations contained in each specific training on the fly. |
62 | Slice Finder: Automated Data Slicing for Model Validation | Y. Chung, T. Kraska, N. Polyzotis, K. H. Tae and S. E. Whang | Unlike general techniques (e.g., clustering) that can find arbitrary slices, our goal is to find interpretable slices (which are easier to take action compared to arbitrary subsets) that are large and problematic. |
63 | Answering Why-Questions for Subgraph Queries in Multi-attributed Graphs | Q. Song, M. H. Namaki and Y. Wu | We introduce measures that characterize good query rewrites by incorporating both query editing cost and answer closeness. |
64 | Neural Multi-task Recommendation from Multi-behavior Data | C. Gao et al. | In this work, we contribute a new solution named NMTR (short for Neural Multi-Task Recommendation) for learning recommender systems from user multi-behavior data. |
65 | AUC-MF: Point of Interest Recommendation with AUC Maximization | P. Han, S. Shang, A. Sun, P. Zhao, K. Zheng and P. Kalnis | In this paper, we propose AUC-MF to address the POI recommendation problem by maximizing Area Under the ROC curve (AUC). |
66 | Efficient Batch One-Hop Personalized PageRanks | S. Luo, X. Xiao, W. Lin and B. Kao | To address the limitations of existing algorithms, this paper presents Baton, an algorithm for batch one-hop PPR that offers strong practical efficiency. |
67 | BB-Tree: A Main-Memory Index Structure for Multidimensional Range Queries | S. Sprenger, P. Schäfer and U. Leser | We present the BB-Tree, a fast and space-efficient index structure for processing multidimensional workloads in main memory. |
68 | Explaining Queries Over Web Tables to Non-experts | J. Berant, D. Deutch, A. Globerson, T. Milo and T. Wolfson | We introduce novel query explanations that provide a graphic representation of the query cell-based provenance (in its execution on a given table). |
69 | Nonlinear Models Over Normalized Data | Z. Cheng and N. Koudas | In this paper we study the implementation of popular nonlinear ML models and in particular independent Gaussian Mixture Models (IGMM) over normalized data. |
70 | Continuous Search on Dynamic Spatial Keyword Objects | Y. Dong, H. Chen and H. Kitagawa | In this paper, we define a novel query problem to continuously search for dynamic spatial keyword objects. |
71 | Top-K Frequent Term Queries on Streaming Data | S. Farazi and D. Rafiei | We propose two variations of reverse spatial term queries on streaming data and an approach for efficiently evaluating them with some bounds on the error. |
72 | Parallel and Distributed Processing of Reverse Top-k Queries | P. Nikitopoulos, G. A. Sfyris, A. Vlachou, C. Doulkeridis and O. Telelis | In this paper, we address the problem of processing reverse top-k queries in a parallel and distributed setting. |
73 | Distributed Discovery of Functional Dependencies | H. Saxena, L. Golab and I. F. Ilyas | We address the problem of discovering functional dependencies from distributed big data. |
74 | Enumerating k-Vertex Connected Components in Large Graphs | D. Wen, L. Qin, Y. Zhang, L. Chang and L. Chen | In this paper, given a graph G and an integer k, we study the problem of computing all k-VCCs in G. |
75 | AID: An Adaptive Image Data Index for Interactive Multilevel Visualization | S. Ghosh, A. Eldawy and S. Jais | This paper introduces the first adaptive visualization index that combines both data and images to provide a scalable, interactive visualization while minimizing the index size and index construction time. |
76 | Collecting Preference Rankings Under Local Differential Privacy | J. Yang, X. Cheng, S. Su, R. Chen, Q. Ren and Y. Liu | In this paper, we initiate the study of collecting preference rankings under local differential privacy. |
77 | Muses: Distributed Data Migration System for Polystores | A. Kaitoua, T. Rabl, A. Katsifodimos and V. Markl | In this paper we present Muses, a distributed, high-performance data migration engine that is able to forward, transform, repartition, and broadcast data between distributed engines’ instances efficiently. |
78 | PriSTE: From Location Privacy to Spatiotemporal Event Privacy | Y. Cao, Y. Xiao, L. Xiong and L. Bai | To address this problem, we define the spatiotemporal event as a new privacy goal, which can be formalized as Boolean expressions between location and time predicates. |
79 | Continuous Range Queries Over Multi-attribute Trajectories | J. Xu, Z. Bao and H. Lu | In this paper, we study continuous range queries over multi-attribute trajectories. |
80 | Insecurity and Hardness of Nearest Neighbor Queries Over Encrypted Data | R. Li, A. X. Liu, Y. Liu, H. Xu and H. Yuan | By viewing dimensions of the data and the encrypted data as source signals and observed signals, respectively, we formally prove and experimentally demonstrate that ASPE is actually insecure against even ciphertext only attacks, using signal processing theory. |
81 | Modeling Multidimensional User Preferences for Collaborative Filtering | F. Khawar and N. L. Zhang | In this paper, we propose such a method for implicit feedback data. |
82 | A Queueing-Theoretic Framework for Vehicle Dispatching in Dynamic Car-Hailing | P. Cheng, C. Feng, L. Chen and Z. Wang | In this paper, we consider an important dynamic car-hailing problem, namely maximum revenue vehicle dispatching (MRVD), in which rider requests dynamically arrive and drivers need to serve as many riders as possible such that the entire revenue of the platform is maximized. |
83 | Maximizing the Utility in Location-Based Mobile Advertising | P. Cheng, X. Lian, L. Chen and S. Liu | In this paper, we consider a location-based advertising problem, namely maximum utility advertisement assignment (MUAA) problem, with the estimation of the interests of customers and the contexts of the vendors, we want to maximize the overall utility of ads by determining the ads sent to each customer subject to the constraints of the capacities of customers, the distance ranges and the budgets of vendors. |
84 | Automated Grading of SQL Queries | B. Chandra, A. Banerjee, U. Hazra, M. Joseph and S. Sudarshan | In this paper, we discuss techniques to award partial marks to student SQL queries, in case they are incorrect, based on a weighted equivalence edit distance metric. |
85 | Index-Based Optimal Algorithm for Computing K-Cores in Large Uncertain Graphs | B. Yang, D. Wen, L. Qin, Y. Zhang, L. Chang and R. Li | To overcome these drawbacks, we have developed an index-based solution for computing (k, η)-cores in this paper. |
86 | Parameter Discovery in Unsupervised Clustering | V. Clement and T. Heinis | In this paper, we introduce the idea of simple assumptions about the global distribution of some property of the data leading to local, actionable insights. |
87 | Interaction-Aware Arrangement for Event-Based Social Networks | F. Kou, Z. Zhou, H. Cheng, J. Du, Y. Shi and P. Xu | In this work, we propose a new event-participant arrangement problem called Interaction-aware Global Event-Participant Arrangement (IGEPA). |
88 | Optimizing Cross-Platform Data Movement | S. Kruse, Z. Kaoudi, J. Quiane-Ruiz, S. Chawla, F. Naumann and B. Contreras-Rojas | In this paper, we present the graph-based data movement strategy used by Rheem, our open-source cross-platform system. |
89 | Highly Efficient Pattern Mining Based on Transaction Decomposition | Y. Djenouri, J. Chun-Wei Lin, K. Nørvåg and H. Ramampiaro | This paper introduces a highly efficient pattern mining technique called Clustering-Based Pattern Mining (CBPM). |
90 | Procrastination-Aware Scheduling: A Bipartite Graph Perspective | L. Wang, Y. Tong, C. Hu, L. Chen and Y. Li | In this paper, we first propose the Procrastination-aware Scheduling Problem (PSP) to model an appropriate schedule. We find the PSP is NP-hard in the strong sense and design an approximation algorithm. |
91 | Hankel Matrix Factorization for Tagged Time Series to Recover Missing Values During Blackouts | S. Wu, L. Wang, T. Wu, X. Tao and J. Lu | While the existing approaches for missing value recovery in time series could not handle this issue properly, in this work, we propose a Hankel matrix factorization-based approach for tagged time series called HKMF-T, following the idea of decomposing a data sequence into the smooth trend and the external impact components. |
92 | PerRD: A System for Personalized Route Description | H. Su, G. Cong, W. Chen, Q. Su, B. Zheng and K. Zheng | In this paper, we study a Personalized Route Description system dubbed PerRD – with which the goal is to generate more customized and intuitive route descriptions based on user generated content. |
93 | Scalable Metric Similarity Join Using MapReduce | J. Wu, Y. Zhang, J. Wang, C. Lin, Y. Fu and C. Xing | In this paper, we propose SMS-Join, a parallel framework to support similarity join in metric space based on the MapReduce paradigm. |
94 | An Indexing Framework for Efficient Visual Exploratory Subgraph Search in Graph Databases | C. Wang, M. Xie, S. S. Bhowmick, B. Choi, X. Xiao and S. Zhou | In this paper, we present two novel index structures called VACCINE and ADVISE to efficiently support exploratory subgraph search in a visual environment (VESS). |
95 | I-LSH: I/O Efficient c-Approximate Nearest Neighbor Search in High-Dimensional Space | W. Liu, H. Wang, Y. Zhang, W. Wang and L. Qin | In this paper, we introduce an incremental search based c-ANN search algorithm, named I-LSH. |
96 | Computing a Near-Maximum Independent Set in Dynamic Graphs | W. Zheng, C. Piao, H. Cheng and J. X. Yu | Since computing the exact MIS is intractable, we compute the high-quality (large-size) independent set for dynamic graphs in this paper, where 4 graph updating operations are allowed: adding or deleting a vertex or an edge. |
97 | T-Sample: A Dual Reservoir-Based Sampling Method for Characterizing Large Graph Streams | L. Zhang, H. Jiang, F. Wang, D. Feng and Y. Xie | This paper proposes a new method, called triangle-induced reservoir sampling, or T-Sample, to produce connected edge samples. |
98 | Real Time Principal Component Analysis | R. R. Chowdhury, M. A. Adnan and R. K. Gupta | In this paper, we propose a variant of PCA, that is suited for real-time applications. |
99 | A Fast Sketch Method for Mining User Similarities Over Fully Dynamic Graph Streams | P. Jia, P. Wang, J. Tao and X. Guan | Based on the sketch built on-the-fly, we develop a method to estimate user similarities over time. |
100 | A Collaborative Framework for Similarity Enforcement in Synthetic Scaling of Relational Datasets | J. W. Zhang and Y. C. Tay | This paper proposes ASPECT, which adopts a different approach, for relational datasets. |
101 | Meta Diagram Based Active Social Networks Alignment | Y. Ren, C. C. Aggarwal and J. Zhang | In this paper, we will study the network alignment problem to fuse online social networks specifically. |
102 | Entity Integrity, Referential Integrity, and Query Optimization with Embedded Uniqueness Constraints | Z. Wei, U. Leck and S. Link | Entity Integrity, Referential Integrity, and Query Optimization with Embedded Uniqueness Constraints |
103 | Efficient Pattern Mining Based Cryptanalysis for Privacy-Preserving Record Linkage | A. Vidanage, T. Ranbaduge, P. Christen and R. Schnell | Here we present a cryptanalysis attack that can re-identify attribute values encoded in BFs. |
104 | ECOQUG: An Effective Ensemble Community Scoring Function | C. Wang, H. Wang, C. Zhou, J. Li and H. Gao | In this paper, we propose a new community scoring function, ECOQUG. |
105 | CN-Probase: A Data-Driven Approach for Large-Scale Chinese Taxonomy Construction | J. Chen et al. | In this paper, we focus on automatic Chinese taxonomy construction and propose an effective generation and verification framework to build a large-scale and high-quality Chinese taxonomy. |
106 | Understanding Data Science Lifecycle Provenance via Graph Segmentation and Summarization | H. Miao and A. Deshpande | In this paper, we propose two high-level graph query operators to address the verboseness and evolving nature of such provenance graphs. |
107 | Contextual Community Search Over Large Social Networks | L. Chen, C. Liu, K. Liao, J. Li and R. Zhou | In this paper, we propose a novel parameter-free contextual community model for attributed community search. |
108 | Efficient Partitioning and Query Processing of Spatio-Temporal Graphs with Trillion Edges | M. Ding and S. Chen | In this paper, we define a formal spatio-temporal graph model based on real-world applications, and propose PAST, a framework for efficient PArtitioning and query processing of Spatio-Temporal graphs. |
109 | Top-K Entity Resolution with Adaptive Locality-Sensitive Hashing | V. Verroios and H. Garcia-Molina | In this paper we introduce the problem of top-k entity resolution and we summarize a novel approach for this problem; full details are presented in a technical report. |
110 | Finding Average Regret Ratio Minimizing Set in Database | S. Zeighami and R. C. Wong | In this paper, we would like to find a set of k points such that on average, the satisfaction (ratio) of a user is maximized. |
111 | HyMJ: A Hybrid Structure-Aware Approach to Distributed Multi-way Join Query | G. Zhu et al. | In this paper, we present a novel hybrid structure-aware multi-way join algorithm called HyMJ, which combines the one-round and multi-round algorithms to compute the hybrid query efficiently. |
112 | Accelerate MaxBRkNN Search by kNN Estimation | X. Chen, X. Cao, Z. Xu, Y. Zhang, S. Shang and W. Zhang | Observing this, we develop an approach which computes kNN for only promising clients by utilising a two-level grid index (ADPGI) to reduce the cost substantially. |
113 | Efficient Bottom-Up Discovery of Multi-scale Time Series Correlations Using Mutual Information | N. Ho, T. B. Pedersen, M. Vu, V. L. Ho and C. A. N. Biscio | This paper presents an approach to search for synchronous correlations in big time series that displays all three properties: the proposed method (i) utilizes the metric of mutual information from information theory, providing a strong theoretical foundation, (ii) is able to discover correlations at multiple temporal scales, and (iii) works in an efficient, bottom-up fashion, making it scalable to large datasets. |
114 | Fingerprinting Big Data: The Case of KNN Graph Construction | R. Guerraoui, A. Kermarrec, O. Ruas and F. Taïani | We propose fingerprinting, a new technique that consists in constructing compact, fast-to-compute and privacy-preserving binary representations of datasets. |
115 | Outer and Anti Joins in Temporal-Probabilistic Databases | K. Papaioannou, M. Theobald and M. Boehlen | For the computation of TP joins with negation, we introduce generalized lineage-aware temporal windows, a mechanism that binds an interval to the lineages of all the matching valid tuples of each input relation. |
116 | Workload-Driven Fragment Allocation for Partially Replicated Databases Using Linear Programming | S. Halfpap and R. Schlosser | In this paper, we define a linear programming (LP) model to calculate the set of partial replicas with the lowest overall memory capacity while evenly balancing the query load. |
117 | OSMAC: Optimizing Subgraph Matching Algorithms with Community Structure | Y. Lou and C. Wang | This paper proposes an optimization method named OSMAC to accelerate subgraph matching algorithms with the community structure of data graphs. |
118 | Workload-Aware Subgraph Query Caching and Processing in Large Graphs | Y. Liang and P. Zhao | In this paper, we address subgraph queries with the availability of query workload information, W = {w1,…, wn}, where wi ? |
119 | How I Learned to Stop Worrying and Love Re-optimization | M. Perron, Z. Shang, T. Kraska and M. Stonebraker | In this paper we investigate why this is still the case, despite decades of improvements to cost models, plan enumeration, and cardinality estimation. |
120 | CRA: Enabling Data-Intensive Applications in Containerized Environments | I. Sabek, B. Chandramouli and U. F. Minhas | In this paper, we factor out the commonalities in a large majority of these applications, into a generic dataflow layer called Common Runtime for Applications (CRA). |
121 | Scalable Similarity Joins of Tokenized Strings | A. Metwally and C. Huang | We propose a scalable distributed framework, Tokenized-String Joiner (TSJ), that adopts existing scalable string-join algorithms as building blocks to perform NSLD-joins. |
122 | MLlib*: Fast Training of GLMs Using Spark MLlib | Z. Zhang, J. Jiang, W. Wu, C. Zhang, L. Yu and B. Cui | In this paper, we study the performance of MLlib with a focus on training generalized linear models using gradient descent. |
123 | DirectLoad: A Fast Web-Scale Index System Across Large Regional Centers | A. Qin, M. Xiao, J. Ma, D. Tan, R. Lee and X. Zhang | In this paper, we show the effectiveness and efficiency of an in-memory index updating system, which is disruptive to the framework in a conventional memory hierarchy. |
124 | Presto: SQL on Everything | R. Sethi et al. | In this paper, we outline a selection of use cases that Presto supports at Facebook. |
125 | Improving RDF Query Performance Using In-memory Virtual Columns in Oracle Database | E. I. Chong, M. Perry and S. Das | In this paper, we propose to use in-memory virtual columns to avoid value table joins. |
126 | Rima: An RDMA-Accelerated Model-Parallelized Solution to Large-Scale Matrix Factorization | J. Geng, D. Li and S. Wang | Targeting at these drawbacks, we propose Rima, which uses ring-based model parallelism to solve large-scale MF with higher communication efficiency. |
127 | SEBDB: Semantics Empowered BlockChain DataBase | Y. Zhu, Z. Zhang, C. Jin, A. Zhou and Y. Yan | In this paper, we propose and implement a novel blockchain database, called SEBDB, which leverages the existing databases’ functionality which are optimized for decades. |
128 | Large Scale Traffic Signal Network Optimization – A Paradigm Shift Driven by Big Data | L. Yu et al. | In this paper, we will introduce our method for large scale traffic signal optimization, which is the major module of Alibaba’s city brain solution. |
129 | Domain-Independent Automated Processing of Free-Form Text Data in Telecom | R. Bhowmik and A. Akyamac | In this paper, we propose a domain-agnostic, unsupervised approach that deploys a multi-stage text processing pipeline for automatically discovering the key topics and categories from free-form text documents. |
130 | DRIVEN: a Framework for Efficient Data Retrieval and Clustering in Vehicular Networks | B. Havers, R. Duvignau, H. Najdataei, V. Gulisano, A. C. Koppisetty and M. Papatriantafilou | The goal of the DRIVEN framework, presented here, is to address these challenges for a data gathering and distance-based clustering tool in the context of vehicular networks. |
131 | Accurate Product Attribute Extraction on the Field | M. Rezk, L. Alonso Alemany, L. Nio and T. Zhang | In this paper we present a bootstrapping approach for attribute value extraction that minimizes the need for human intervention. |
132 | CATS: Cross-Platform E-Commerce Fraud Detection | H. Weng et al. | In this paper, we present an efficient, platform-independent, and robust e-commerce fraud detection system, CATS, to detect frauds for different large-scale e-commerce platforms. |
133 | Caladrius: A Performance Modelling Service for Distributed Stream Processing Systems | F. Kalim et al. | We find that general traffic trends in most jobs lend themselves well to prediction. |
134 | FAIR: Fraud Aware Impression Regulation System in Large-Scale Real-Time E-Commerce Search Platform | Z. Li et al. | In this paper, we propose the first fraud aware impression regulation system (FAIR) which is data-driven and can work in large-scale e-commerce platforms. |
135 | Accelerating Partial Evaluation in Distributed SPARQL Query Evaluation | P. Peng, L. Zou and R. Guan | To improve the efficiency of finding partial matches further, we propose an optimization that communicates variables’ candidates among sites to avoid redundant computations. |
136 | Micro-Browsing Models for Search Snippets | M. A. Islam, R. Srikant and S. Basu | In this paper, we propose a novel formulation: a micro-browsing model for how users read result snippets. |
137 | Interpretable Multi-task Learning for Product Quality Prediction with Attention Mechanism | C. Yeh, Y. Fan and W. Peng | In this paper, we investigate the problem of mining multivariate time series data generated from sensors mounted on manufacturing stations for early product quality prediction. |
138 | Learning Effective Embeddings From Crowdsourced Labels: An Educational Case Study | G. Xu, W. Ding, J. Tang, S. Yang, G. Y. Huang and Z. Liu | In this paper, we investigate the above problem and propose a novel framework of Representation Learning with crowdsourced Labels, i.e., “RLL”, which learns representation of data with crowdsourced labels by jointly and coherently solving the challenges introduced by limited and inconsistent labels. |
139 | A Prescription Trend Analysis using Medical Insurance Claim Big Data | K. Umemoto, K. Goda, N. Mitsutake and M. Kitsuregawa | We propose a latent variable model that simulates the medication behavior of physicians to accurately reproduce monthly prescription time series from the MIC data, where prescription links between the diseases and medicines are missing. |
140 | Differential Data Quality Verification on Partitioned Data | S. Schelter et al. | We therefore present a differential generalization of the computational model of Deequ, based on algebraic states with monoid properties. |
141 | Logan: A Distributed Online Log Parser | A. Agrawal, R. Karlupia and R. Gupta | In this paper, we train a data-driven log parser on our new Apache Spark dataset, the largest application log dataset yet. |
142 | WebPut: A Web-Aided Data Imputation System for the General Type of Missing String Attribute Values | S. Shan et al. | In this demonstration, we present an end-to-end web-aided data imputation prototype system named WebPut. |
143 | Blockplane: A Global-Scale Byzantizing Middleware | F. Nawab and M. Sadoghi | We propose Blockplane, a middleware that enables making existing benign systems tolerate byzantine failures. |
144 | FGreat: Focused Graph Query Autocompletion | N. Ng, P. Yi, Z. Zhang, B. Choi, S. S. Bhowmick and J. Xu | This demonstration presents an interactive visual Focused GRaph quEry AutocompleTion framework, called FGreat. |
145 | Aucher: Multi-modal Queries on Live Audio Streams in Real-Time | Z. Wen, M. Liang, B. He, Z. Xia and B. Li | This paper demonstrates a real-time search system called Aucher for live audio streams. |
146 | SAC: A System for Big Data Lineage Tracking | M. Tang et al. | To address this issue, we build Spark-Atlas-Connector (short as SAC), a new system to track data lineage in a distributed computation platform, e.g., Spark. |
147 | A Gossip-Based System for Fast Approximate Score Computation in Multinomial Bayesian Networks | A. Zachariah, P. Rao, A. Katib, M. Senapati and K. Barnard | In this paper, we present a system for fast approximate score computation, a fundamental task for score-based structure learning of multinomial Bayesian networks. |
148 | Faster, Higher, Stronger: Redesigning Spreadsheets for Scale | M. Bendre et al. | We demonstrate three key features of DATASPREAD to address the aforementioned spreadsheet scalability challenges in interactivity, navigability, and expressiveness. |
149 | RecovDB: Accurate and Efficient Missing Blocks Recovery for Large Time Series | I. Arous, M. Khayati, P. Cudré-Mauroux, Y. Zhang, M. Kersten and S. Stalinlov | In this demo, we present RecovDB, a relational database system enhanced with advanced matrix decomposition technology for missing blocks recovery. |
150 | AI Pro: Data Processing Framework for AI Models | R. Frost, D. Paul and F. Li | We present AI Pro, an open-source framework for data processing with Artificial Intelligence (AI) models. |
151 | IVLG: Interactive Visualization of Large Graphs | M. Krommyda, V. Kantere and Y. Vassiliou | In order to overcome the limitations regarding the volume of the presented information, we have developed a novel technique that enables the interactive visualization as one continuous graph of datasets with millions of elements. |
152 | Just in Time: Personal Temporal Insights for Altering Model Decisions | N. Boer, D. Deutch, N. Frost and T. Milo | To this end, we propose a novel framework that provides users with insights and plans for changing their classification in particular future time points. |
153 | GeoSparkViz in Action: A Data System with Built-in Support for Geospatial Visualization | J. Yu, A. Tahir and M. Sarwat | This paper demonstrates GeoSparkViz, a full-fledged system that allows the user to load, prepare, integrate and execute MapViz tasks in the same system. |
154 | BENU: Distributed Subgraph Enumeration with Backtracking-Based Framework | Z. Wang, R. Gu, W. Hu, C. Yuan and Y. Huang | Different from those join-based algorithms, we develop a new backtracking-based framework BENU for distributed subgraph enumeration. |
155 | Hybrid.Poly: A Consolidated Interactive Analytical Polystore System | M. Podkorytov and M. Gubanov | Here we describe HYBRID.POLY, a consolidated in-memory polystore engine [2], designed to support heterogeneous large-scale data and interactively process complex analytical workloads. |
156 | EXPLAINER: Entity Resolution Explanations | A. Ebaid, S. Thirumuruganathan, W. G. Aref, A. Elmagarmid and M. Ouzzani | In this demo, we propose ExplainER, a tool to understand and explain entity resolution classifiers with different granularity levels of explanations. |
157 | CEP-Wizard: Automatic Deployment of Distributed Complex Event Processing | Y. Shin, S. Yoon, P. Trirat and J. Lee | In this demonstration, we present CEP-Wizard, a framework of automatically configuring and deploying a distributed CEP engine with minimum effort. |
158 | A Comparison of Allocation Algorithms for Partially Replicated Databases | S. Halfpap and R. Schlosser | In this paper, we test and compare state-of-the-art allocation algorithms for partial replication. |
159 | PePPer: Fine-Grained Personal Access Control via Peer Probing | Y. Amsterdamer and O. Drien | To enable peers to manage access control rights on such data we introduce PePPer, a tool for fine-grained, personal access control. |
160 | COBRA: Compression Via Abstraction of Provenance for Hypothetical Reasoning | D. Deutch, Y. Moskovitch and N. Rinetzky | To this end, we present a framework that allows to reduce provenance size. |
161 | CogLearn: A Cognitive Graph-Oriented Online Learning System | Y. Pian, Y. Lu, P. Chen and Q. Duan | We propose and implement a novel online learning system, called CogLearn, to support learner’s self-awareness and reflective thinking, which urges a proper form of knowledge representation together with individual learner’s cognitive status. |
162 | GRIT: Consistent Distributed Transactions Across Polyglot Microservices with Multiple Databases | G. Zhang, K. Ren, J. Ahn and S. Ben-Romdhane | In this demo we present GRIT: a system that resolves this challenge by cleverly leveraging deterministic database technologies and an optimistic concurrency control (OCC) protocol. |
163 | vABS: Towards Verifiable Attribute-Based Search Over Shared Cloud Data | Y. Ji, C. Xu, J. Xu and H. Hu | In this demonstration, we present a system called vABS, which enables verifiable Attribute-Based Search over shared cloud data. |
164 | Efficient Synchronization of State-Based CRDTs | V. Enes, P. S. Almeida, C. Baquero and J. Leitão | In this paper we: 1) identify two sources of inefficiency in current synchronization algorithms for delta-based CRDTs; 2) bring the concept of join decomposition to state-based CRDTs; 3) exploit join decompositions to obtain optimal deltas and 4) improve the efficiency of synchronization algorithms; and finally, 5) experimentally evaluate the improved algorithms. |
165 | An Environment-Aware Market Strategy for Data Allocation and Dynamic Migration in Cloud Database | T. Wang et al. | To this end, this paper presents an environment-aware market strategy based system, named e-MARS, for reasonable data migration to achieve query load balance in cloud database. |
166 | Vaite: A Visualization-Assisted Interactive Big Urban Trajectory Data Exploration System | C. Yang, Y. Zhang, B. Tang and M. Zhu | In this work, we architect and implement a visualization-assisted big urban trajectory data exploration system (Vaite) to address these challenges. |
167 | SciDetector: Scientific Event Discovery by Tracking Variable Source Data Streaming | Z. Duan et al. | We present the design of and a demonstration for SciDetector, a system of scientific research for online analysis. |
168 | Demonstrating Spindra: A Geographic Knowledge Graph Management System | Y. Sun, J. Yu and M. Sarwat | In this paper, we demonstrate a system, namely Spindra, that provides efficient management of geographic knowledge graphs. |
169 | Native Storage Techniques for Data Management | I. Petrov, A. Koch, S. Hardock, T. Vincon and C. Riegger | In the present tutorial we perform a cross-cut analysis of database storage management from the perspective of modern storage technologies. |
170 | Crowdsourcing Database Systems: Overview and Challenges | C. Chai, J. Fan, G. Li, J. Wang and Y. Zheng | In this tutorial, we will survey and synthesize a wide spectrum of existing studies on crowdsourcing database systems. |
171 | Telco Big Data Research and Open Problems | C. Costa and D. Zeinalipour-Yazti | In this tutorial, we overview the state-of-the-art in telco big data analytics by focusing on a set of basic principles, namely: (i) real-time analytics and detection; (ii) experience, behavior and retention analytics; (iii) privacy; and (iv) storage. |
172 | Geospatial Data Management in Apache Spark: A Tutorial | J. Yu and M. Sarwat | A follow-up section presents the common approaches used by the practitioners to extend Spark and introduces the vital components in a generic spatial data management system. |
173 | Hierarchical Decomposition of Big Graphs | Y. Zhang, L. Qin, F. Zhang and W. Zhang | Subsequently, we provide an overview of the existing models and the computation algorithms under different computing environments. |
174 | Cohesive Subgraph Computation Over Large Sparse Graphs | L. Chang and L. Qin | In this tutorial, we survey the models and state-of-the-art algorithms for efficient cohesive subgraph computation based on different cohesiveness measures. Finally, we present open problems for future research. |
175 | Robust Query Processing: Mission Possible | J. R. Haritsa | In this tutorial, we will present these novel research approaches, characterize their strengths and limitations, and enumerate open technical problems that remain to be solved to make robust query processing a contemporary reality. |
176 | Automated Documentation of End-to-End Experiments in Data Science | S. Redyuk | We aim at reducing manual overhead for experimenting researchers, and intend to create a novel approach in dataflow and metadata tracking based on the analysis of the experiment source code. |
177 | Explaining Results of Data-Driven Applications | N. Frost | This paper demonstrates approaches for interpretability in two applications: Natural Language Queries, and Machine Learning Classifiers, followed by a discussion of open problems and future work. |
178 | Towards Explaining the Effects of Data Preprocessing on Machine Learning | C. V. Gonzalez Zelaya | In this initial work we define a simple metric, which we call volatility, to measure the effect of including/excluding a specific step on predictions made by the resulting model. |
179 | Don’t Fear the REAPER: A Framework for Materializing and Reusing Deep-Learning Models | M. B. Sigl | The aim of this research is to reduce training time of machine learning from a data-management perspective through model reuse, and shed some light on the above relationship in the case when reusing a model is appropriate. |
180 | Knowledge Representation for Emotion Intelligence | S. Wang | We have introduced two kinds of improving embedding methods (MEC and Emo2Vec) for the sentiment words embedding. |
181 | Disambiguation and Result Expansion in Keyword Search Over Relational Databases | N. Hormozi | In this paper, we are going to describe how we are improving state of the art in various stages of a keyword-search pipeline in order to retrieve the answers that best match the user’s intent. |
182 | Event Recommendation using Social Media | S. Madisetty | We plan to use event related discussions in social media as a signal for estimating the popularity of the events. |
183 | Learning Individual Models for Imputation | A. Zhang, S. Song, Y. Sun and J. Wang | In this study, enlightened by the conditional dependencies that hold conditionally over certain tuples rather than the whole relation, we propose to learn a regression model individually for each complete tuple together with its neighbors. |
184 | Location Inference for Non-Geotagged Tweets in User Timelines [Extended Abstract] | P. Li, H. Lu, N. Kanhabua, S. Zhao and G. Pan | Subsequently, we adapt machine learning models to our setting and design classifiers that classify each tweet cluster into one of the pre-defined location classes at the city level. |
185 | Efficient Parallel Skyline Query Processing for High-Dimensional Data | M. Tang, Y. Yu, W. G. Aref, Q. M. Malluhi and M. Ouzzani | More specifically, in this paper, we are tackling the data straggler and data skew challenges introduced by distributed skyline query processing, as well as the ensuing high computation cost of merging skyline candidates. |
186 | On Generalizing Collective Spatial Keyword Queries (Extended Abstract) | H. K. Chan, C. Long and R. C. Wong | In this paper, we design a unified cost function which generalizes the majority of existing cost functions for CoSKQ and develop a unified approach which works as well as (and sometimes better than) best-known approaches based on different cost functions. |
187 | A Novel Representation and Compression for Queries on Trajectories in Road Networks (Extended Abstract) | X. Yang, B. Wang, K. Yang, C. Liu and B. Zheng | In this paper, we explore characteristics of the trajectories in road networks, which have motivated the idea of coding trajectories by associating timestamps with relative spatial path and locations. |
188 | Efficient Multi-Class Probabilistic SVMs on GPUs | Z. Wen, J. Shi, B. He, J. Chen and Y. Chen | To overcome the challenges, we propose GMP-SVM to reduce high latency memory accesses and memory consumption through batch processing, computation/data reusing and sharing. |
189 | C2Net: A Network-Efficient Approach to Collision Counting LSH Similarity Join (Extended Abstract) | H. Li, S. Nutanong, H. Xu, C. Yu and F. Ha | This paper focuses on collision counting LSH-based similarity join in MapReduce and proposes a network-efficient solution called C2Net to improve the utilization of MapReduce combiners. |
190 | LinkBlackHole*: Robust Overlapping Community Detection Using Link Embedding (Extended Abstract) | J. Kim, S. Lim, J. Lee and B. S. Lee | This paper proposes LinkBlackHole*, a novel algorithm for finding communities that are (i) overlapping in nodes and (ii) mixing (not separating clearly) in links. |
191 | Fusion OLAP: Fusing the Pros of MOLAP and ROLAP Together for In-memory OLAP (Extended Abstract) | Y. Zhang, Y. Zhang, S. Wang and J. Lu | In this paper, we propose a novel Fusion OLAP model to fuse the multi-dimensional computing model and relational storage model together to make the best aspects of both MOLAP and ROLAP worlds. |
192 | In Search of Indoor Dense Regions: An Approach Using Indoor Positioning Data | H. Li, H. Lu, L. Shou, G. Chen and K. Chen | In this paper, we propose a data-driven approach that finds top-k indoor dense regions by using indoor positioning data. |
193 | CurrentClean: Spatio-Temporal Cleaning of Stale Data | M. Milani, Z. Zheng and F. Chiang | We introduce a spatio-temporal probabilistic model that captures the database update patterns to infer stale values, and propose a set of inference rules that model spatio-temporal update patterns commonly seen in real data. |
194 | Optimizing Quality for Probabilistic Skyline Computation and Probabilistic Similarity Search (Extended Abstract) | X. Miao, Y. Gao, L. Zhou, W. Wang and Q. Li | In this paper, we propose an efficient optimization framework, termed as QueryClean, for both probabilistic skyline computation and probabilistic similarity search. |
195 | On Efficiently Answering Why-Not Range-Based Skyline Queries in Road Networks (Extended Abstract) | X. Miao, Y. Gao, S. Guo and G. Chen | In this paper, we systematically carry out the study of why-not questions on the r-skyline query in the road network (abbreviated as the why-not RSQ problem). |
196 | SLADE: A Smart Large-Scale Task Decomposer in Crowdsourcing | Y. Tong, L. Chen, Z. Zhou, H. V. Jagadish, L. Shou and W. Lv | In this paper, we propose the Smart Large-scAle task DEcomposer (SLADE) problem, which aims to decompose a large-scale crowdsourcing task to achieve the desired reliability at a minimal cost. |
197 | XINA: Explainable Instance Alignment using Dominance Relationship (Extended Abstract) | J. Yeo, H. Park, S. Lee, E. W. Lee and S. Hwang | In this extended abstract, we present an instance alignment framework, namely XINA, for KB integration. |
198 | A Hardware-Accelerated Solution for Hierarchical Index-Based Merge-Join (Extended Abstract) | Z. Zhou, C. Yu, S. Nutanong, Y. Cui, C. Fu and C. J. Xue | In this paper, we develop a novel solution to accelerate the processing of sort-merge join queries with low match rates. |
199 | Finding Most Popular Indoor Semantic Locations Using Uncertain Mobility Data | H. Li, H. Lu, L. Shou, G. Chen and K. Chen | In this work, we use uncertain historical indoor mobility data to find the top-k popular indoor semantic locations with the highest flow values. |
200 | Uncertain Graph Sparsification (Extended Abstract) | P. Parchas, N. Papailiou, D. Papadias and F. Bonchi | To overcome this problem, we introduce the first sparsification techniques aimed explicitly at uncertain graphs. |
201 | Rule-Based Entity Resolution on Database with Hidden Temporal Information (Extended Abstract) | H. Wang, X. Ding, J. Li and H. Gao | In this paper, we deal with the problem of rule-based entity resolution on imprecise temporal data. |
202 | BRIGHT – Drift-Aware Demand Predictions for Taxi Networks (Extended Abstract) | A. Saadallah, L. Moreira-Matias, R. Sousa, J. Khiari, E. Jenelius and J. Gama | In this paper, we propose BRIGHT: a drift-aware supervised learning framework which aims to provide accurate predictions for short-term horizon taxi demand quantities through a creative ensemble of time series analysis methods that handle distinct types of concept drift. |
203 | Order-Sensitive Imputation for Clustered Missing Values (Extended Abstract) | Q. Ma, Y. Gu, W. Lee and G. Yu | To study the issue of missing values (MVs), we propose the Order-Sensitive Imputation for Clustered Missing values (OSICM) framework, in which missing values are imputed sequentially such that the values filled earlier in the process are also used for later imputation of other MVs. |
204 | Fine-Grained Provenance for Matching & ETL | N. Zheng, A. Alawini and Z. G. Ives | We propose PROVision, a provenance-driven troubleshooting tool that supports ETL and matching computations and traces extraction of content within data objects. |
205 | DeepDirect: Learning Directions of Social Ties with Edge-Based Network Embedding (Extended Abstract) | C. Wang, C. Wang, Z. Wang, X. Ye, J. X. Yu and B. Wang | This paper presents the problem of tie direction learning which learns the directionality function of directed social networks. |
206 | A Utility-Optimized Framework for Personalized Private Histogram Estimation (Extended Abstract) | Y. Nie, W. Yang, L. Huang, X. Xie, Z. Zhao and S. Wang | In this poster, we for the first time propose a framework to optimize the utility of histogram estimation with these two privacy requirements. |
207 | Near-Accurate Multiset Reconciliation (Extended Abstract) | L. Luo, D. Guo, X. Zhao, J. Wu, O. Rottenstreich and X. Luo | In this paper, we extend the set reconciliation problem into three design rationales: (i) multiset support; (ii) near 100% reconciliation accuracy; (iii) communication-friendly and time-saving. |
208 | Answering Why-Not Group Spatial Keyword Queries (Extended Abstract) | B. Zheng et al. | We propose a three-phase framework for efficiently computing the WGSK. |
209 | Effective and Efficient Community Search Over Large Directed Graphs (Extended Abstract) | Y. Fang, Z. Wang, R. Cheng, H. Wang and J. Hu | In this paper, we study the problem of CS on directed graph. |
210 | Exploring Communities in Large Profiled Graphs (Extended Abstract) | Y. Chen, Y. Fang, R. Cheng, Y. Li, X. Chen and J. Zhang | In this paper, we study profiled community search (PCS), where CS is performed on a profiled graph. |
211 | Index-Based Densest Clique Percolation Community Search in Networks (Extended Abstract) | L. Yuan, L. Qin, W. Zhang, L. Chang and J. Yang | Motivated by this, in this paper, we adopt the k-clique percolation community model and study the densest clique percolation community search problem which aims to find the k-clique percolation community with the maximum k value that contains a given set of query nodes. |
212 | Unsupervised String Transformation Learning for Entity Consolidation | D. Deng et al. | For this purpose, we propose a data-driven method to standardize the variant values based on two observations: (1) the variant values usually can be transformed to the same representation (e.g., “Mary Lee” and “Lee, Mary”) and (2) the same transformation often appears repeatedly across different clusters (e.g., transpose the first and last name). |
213 | A Semi-Supervised Framework of Clustering Selection for De-Duplication | S. Kushagra, H. Saxena, I. F. Ilyas and S. Ben-David | In this paper, we make the following contributions. |
214 | Scaling Up Subgraph Query Processing with Efficient Subgraph Matching | S. Sun and Q. Luo | As such, in this paper, we study whether, and if so, how to utilize efficient subgraph matching to improve subgraph query processing. |
215 | Efficient Parallel Subgraph Enumeration on a Single Machine | S. Sun, Y. Che, L. Wang and Q. Luo | In this paper, we develop an efficient parallel subgraph enumeration algorithm for a single machine, named LIGHT. |
216 | Fast Dual Simulation Processing of Graph Database Queries | S. Mennicke, J. Kalo, D. Nagel, H. Kroll and W. Balke | In this paper we bridge this gap by introducing a new dual simulation process operating on SPARQL queries. |
217 | Efficient and Incremental Clustering Algorithms on Star-Schema Heterogeneous Graphs | L. Chen, Y. Gao, Y. Zhang, C. S. Jensen and B. Zheng | In this paper, we represent attributed graphs as star-schema heterogeneous graphs, where attributes are modeled as different types of graph nodes. |
218 | G*-Tree: An Efficient Spatial Index on Road Networks | Z. Li, L. Chen and Y. Wang | In this paper, we propose an efficient hierarchical index, G*-tree, to optimize spatial queries on road networks. |
219 | DBSVEC: Density-Based Clustering Using Support Vector Expansion | Z. Wang, R. Zhang, J. Qi and B. Yuan | To address this problem, we propose a novel approximate density-based clustering algorithm named DBSVEC. |
220 | A Joint Context-Aware Embedding for Trip Recommendations | J. He, J. Qi and K. Ramamohanarao | In this study, we propose a POI embedding model to jointly learn the impact of these contextual factors. |
221 | AIR: Attentional Intention-Aware Recommender Systems | T. Chen, H. Yin, H. Chen, R. Yan, Q. V. H. Nguyen and X. Li | Hence, in this paper, we propose AIR, namely attentional intention-aware recommender systems to predict category-wise future user intention and collectively exploit the rich heterogeneous user interaction behaviors (i.e., multiple types of user behaviors). |
222 | No, That’s Not My Feedback: TV Show Recommendation Using Watchable Interval | K. Cho, Y. Lee, K. Han, J. Choi and S. Kim | In order to reflect this new concept into the TV show recommendation, we propose a novel framework based on collaborative filtering. |
223 | Adaptive Wavelet Clustering for Highly Noisy Data | Z. Chen, J. Liu, Y. Deng, K. He and J. E. Hopcroft | In this paper we make progress on the unsupervised task of mining arbitrarily shaped clusters in highly noisy datasets, which is a task present in many real-world applications. |
224 | An Efficient Parallel Keyword Search Engine on Knowledge Graphs | Y. Yang, D. Agrawal, H. V. Jagadish, A. K. H. Tung and S. Wu | In this paper, we attempt to address this need by leveraging advances in hardware technologies, e.g. multi-core CPUs and GPUs. |
225 | Towards Longitudinal Analytics on Social Media Data | F. Xia, B. Yang, C. Yu, W. Qian and A. Zhou | We study a fundamental functionality in longitudinal analytics: the top-k temporal keyword (TkTK) querying. |
226 | LCJoin: Set Containment Join via List Crosscutting | D. Deng, C. Yang, S. Shang, F. Zhu, L. Liu and L. Shao | In contrast, in this paper, we propose to intersect all the inverted lists simultaneously while skipping many irrelevant entries in the lists. |
227 | Bridging the Semantic Gap with SQL Query Logs in Natural Language Interfaces to Databases | C. Baik, H. V. Jagadish and Y. Li | In this paper, we propose leveraging information from the SQL query log of a database to enhance the performance of existing NLIDBs with respect to these challenges. |
228 | MF-Join: Efficient Fuzzy String Similarity Join with Multi-level Filtering | J. Wang, C. Lin and C. Zaniolo | In this paper, we propose MF-Join, a multi-level filtering approach for fuzzy string similarity join. |
229 | Finding Temporal Influential Users Over Evolving Social Networks | S. Huang, Z. Bao, J. S. Culpepper and B. Zhang | In this paper we study the problem of Distinct Influence Maximization (DIM) where the goal is to identify a seed set of influencers who maximize the number of distinct users influenced over a predefined window of time. |
230 | Seed Selection and Social Coupon Allocation for Redemption Maximization in Online Social Networks | T. Chang, Y. Shi, D. Yang and W. Chen | In this paper, we investigate not only the seed selection problem but also the effect of social coupon (SC) allocation on optimizing the redemption rate, which represents the efficiency of SC allocation. |
231 | Keyword-Centric Community Search | Z. Zhang, X. Huang, J. Xu, B. Choi and Z. Shang | We design a new function of keyword closeness and propose efficient algorithms to solve the KCCS problem. |
232 | Cohesive Group Nearest Neighbor Queries Over Road-Social Networks | F. Guo, Y. Yuan, G. Wang, L. Chen, X. Lian and Z. Wang | In this paper, we study a new problem: a GNN search on a road network that incorporates cohesive social relationships (CGNN). |
233 | Maximizing Multifaceted Network Influence | Y. Li, J. Fan, G. Ovchinnikov and P. Karras | In this paper, we propose the Optimal Influential Pieces Assignment (OIPA) problem, which is to assign k distinct pieces of an information campaign to k promoters, so as to achieve the highest viral adoption in a network. |
234 | GB-KMV: An Augmented KMV Sketch for Approximate Containment Similarity Search | Y. Yang, Y. Zhang, W. Zhang and Z. Huang | In this paper, we study the problem of approximate containment similarity search. We provide theoretical analysis to underpin the proposed augmented KMV sketch technique, and show that it outperforms the state-of-the-art technique LSH-E in terms of estimation accuracy under practical assumptions. |
235 | ARROW: Approximating Reachability Using Random Walks Over Web-Scale Graphs | N. Sengupta, A. Bagchi, M. Ramanath and S. Bedathur | In this paper, we show that ARROW, despite its simplicity, is near-accurate and scales to graphs with tens of millions of vertices and hundreds of millions of edges. |
236 | Taster: Self-Tuning, Elastic and Online Approximate Query Processing | M. Olma, O. Papapetrou, R. Appuswamy and A. Ailamaki | In this paper, we present Taster, a self-tuning, elastic, online AQP engine that synergistically combines the benefits of online and offline AQP. |
237 | An Iterative Scheme for Leverage-Based Approximate Aggregation | S. Han, H. Wang, J. Wan and J. Li | To address this problem, we propose a novel approach to calculate the aggregation answers with a high accuracy using only a small portion of the data. |
238 | Deletion Propagation for Multiple Key Preserving Conjunctive Queries: Approximations and Complexity | Z. Cai, D. Miao and Y. Li | The investigated problem is a variant of the standard deletion propagation problem, where given a source database D, a set of key preserving conjunctive queries Q, and the set of views V obtained by the queries in Q, we try to identify a set T of tuples from D whose elimination prevents all the tuples in a given set of deletions on views ? |
239 | Enumerating Minimal Weight Set Covers | Z. Ajami and S. Cohen | Thus, we present an algorithm that enumerates all minimal weight set covers in polynomial delay (i.e., with polynomial time between results) in ? |
240 | Constraints-Based Explanations of Classifications | D. Deutch and N. Frost | We propose a simple generic approach for explaining classifications, by identifying relevant parts of the input whose perturbation would be significant in affecting the classification. |
241 | KARL: Fast Kernel Aggregation Queries | T. N. Chan, M. L. Yiu and H. U. Leong | In this paper, we propose a novel and effective bounding technique to speed up the computation of kernel aggregation. |
242 | Assessing and Remedying Coverage for a Given Dataset | A. Asudeh, Z. Jin and H. V. Jagadish | In this paper, we assess the coverage of a given dataset over multiple categorical attributes. |
243 | Social Influence-Based Group Representation Learning for Group Recommendation | H. Yin, Q. Wang, K. Zheng, Z. Li, J. Yang and X. Zhou | In this paper, we propose a novel group recommender system, namely SIGR (short for “Social Influence-based Group Recommender”), which takes an attention mechanism and a bipartite graph embedding model BGEM as building blocks. We create two large-scale benchmark datasets and conduct extensive experiments on them. |
244 | MIDAS: Finding the Right Web Sources to Fill Knowledge Gaps | X. Wang, X. L. Dong, Y. Li and A. Meliou | In this paper, we present MIDAS, a system that harnesses the results of automated knowledge extraction pipelines to repair the bottleneck in industrial knowledge creation and augmentation processes. |
245 | Exploiting Centrality Information with Graph Convolutions for Network Representation Learning | H. Chen, H. Yin, T. Chen, Q. V. H. Nguyen, W. Peng and X. Li | We propose a generalizable model, namely GraphCSC, that utilizes both linkage information and centrality information to learn low-dimensional vector representations for network vertices. |
246 | Route Recommendations on Road Networks for Arbitrary User Preference Functions | P. Yawalkar and S. Ranu | In this paper, we study the query where a user provides a set of relevant PoIs and wants to identify the optimal route covering these PoIs. |
247 | NSCaching: Simple and Efficient Negative Sampling for Knowledge Graph Embedding | Y. Zhang, Q. Yao, Y. Shao and L. Chen | In this paper, motivated by the observation that negative triplets with large scores are important but rare, we propose to directly keep track of them with cache. |
248 | ServeDB: Secure, Verifiable, and Efficient Range Queries on Outsourced Database | S. Wu, Q. Li, G. Li, D. Yuan, X. Yuan and C. Wang | In this paper, we propose a secure and scalable scheme that can support multi-dimensional range queries over encrypted data. |
249 | Collecting and Analyzing Multidimensional Data with Local Differential Privacy | N. Wang et al. | In this paper, we point out that the fundamental problem of collecting multidimensional data under LDP has not been addressed sufficiently, and there remains much room for improvement even for basic tasks such as computing the mean value over a single numeric attribute under LDP. |
250 | Partitioned Data Security on Outsourced Sensitive and Non-Sensitive Data | S. Mehrotra, S. Sharma, J. Ullman and A. Mishra | We propose a new secure approach, entitled query binning (QB) that allows non-sensitive parts of the data to be outsourced in clear-text while guaranteeing that no information is leaked by the joint processing of non-sensitive data (in clear-text) and sensitive data (in encrypted form). |
251 | SecEQP: A Secure and Efficient Scheme for SkNN Query Problem Over Encrypted Geodata on Cloud | X. Lei, A. X. Liu, R. Li and G. Tu | In this paper, we propose the Secure and Efficient Query Processing (SecEQP) scheme to address the secure k nearest neighbor (SkNN) query problem. |
252 | Joins Over Encrypted Data with Fine Granular Security | F. Hahn, N. Loza and F. Kerschbaum | In this paper we present a different approach: Instead of implementing a stand-alone join operator that reveals the frequency of each element in the column, we show how to construct joins over encrypted data after selection operations have been applied. |
253 | Column-Oriented Database Acceleration Using FPGAs | S. Watanabe, K. Fujimoto, Y. Saeki, Y. Fujikawa and H. Yoshino | To overcome this drawback, we developed a column-oriented DBMS and a field-programmable-gate-array-based acceleration engine. |
254 | Hardware-Conscious Hash-Joins on GPUs | P. Sioulas, P. Chrysogelos, M. Karpathiotakis, R. Appuswamy and A. Ailamaki | In this paper, we present the design and implementation of a family of novel, partitioning-based GPU-join algorithms that are tuned to exploit various GPU hardware characteristics for working around the two main limitations of GPUs: limited memory capacity and a slow PCIe interface. |
255 | TuFast: A Lightweight Parallelization Library for Graph Analytics | Z. Shang, J. X. Yu and Z. Zhang | In this paper, we present a lightweight transactional memory (TM) library TuFast which provides easy-to-use primitives for the end-user to agilely develop fast shared memory graph parallelization on a multi-core server. |
256 | LDC: A Lower-Level Driven Compaction Method to Optimize SSD-Oriented Key-Value Stores | Y. Chai, Y. Chai, X. Wang, H. Wei, N. Bao and Y. Liang | Aiming to optimize both the tail latency and the system throughput, in this paper, we propose a novel Lower-level Driven Compaction (LDC) method for LSM-tree KV stores. |
257 | No False Negatives: Accepting All Useful Schedules in a Fast Serializable Many-Core System | D. Durner and T. Neumann | We introduce a novel multi-version concurrency protocol that achieves high performance while reducing the number of aborted schedules to a minimum and providing the best isolation level. |
258 | Discovering Maximal Motif Cliques in Large Heterogeneous Information Networks | J. Hu, R. Cheng, K. C. Chang, A. Sankar, Y. Fang and B. Y. H. Lam | We thus present the META algorithm, which employs advanced pruning strategies to effectively reduce the search space. |
259 | REPT: A Streaming Algorithm of Approximating Global and Local Triangle Counts in Parallel | P. Wang, P. Jia, Y. Qi, Y. Sun, J. Tao and X. Guan | To solve these problems, we develop a novel parallel method REPT to significantly reduce the covariance (even completely eliminate the covariance for some cases) between sampled triangles. |
260 | AsterixDB Mid-Flight: A Case Study in Building Systems in Academia | M. J. Carey | This paper describes the experiences that the author and his (mostly UC-based) partners in software crime have had that culminated in the Big Data Management System now available as Apache AsterixDB. |
261 | Information Diffusion Prediction via Recurrent Cascades Convolution | X. Chen, F. Zhou, K. Zhang, G. Trajcevski, T. Zhong and F. Zhang | To capture both the underlying structures governing the spread of information and inherent dependencies between re-tweeting behaviors of users, we propose a semi-supervised method, called Recurrent Cascades Convolutional Networks (CasCN), which explicitly models and predicts cascades through learning the latent representation of both structural and temporal information, without involving any other features. |
262 | Finding Densest Lasting Subgraphs in Dynamic Graphs: A Stochastic Approach | X. Liu, T. Ge and Y. Wu | We propose a framework called Expectation-Maximization with Utility functions (EMU), a novel stochastic approach that nontrivially extends the conventional EM approach. |
263 | Multicapacity Facility Selection in Networks | A. Logins, P. Karras and C. S. Jensen | We present the first, to our knowledge, solution to the MCFS problem that achieves both scalability and high quality, the Wide Matching Algorithm (WMA). |
264 | An MBR-Oriented Approach for Efficient Skyline Query Processing | J. Zhang, W. Wang, X. Jiang, W. Ku and H. Lu | This research proposes an advanced approach that improves the efficiency of skyline query processing by significantly reducing the computational cost on object comparisons, i.e., dominance tests between objects. |
265 | Dynamic Set kNN Self-Join | D. Amagata, T. Hara and C. Xiao | In this paper, we study a novel problem, dynamic set kNN self-join, i.e., for each set, we continuously compute its k nearest neighbor sets. |
266 | Packed Memory Arrays – Rewired | D. De Leo and P. Boncz | PMAs have been studied mostly theoretically but suffer from practical problems, as we show in this paper. |
267 | GEM^2-Tree: A Gas-Efficient Structure for Authenticated Range Queries in Blockchain | C. Zhang, C. Xu, J. Xu, Y. Tang and B. Choi | In this paper, we take the first step toward studying authenticated range queries in the hybrid-storage blockchain. |
268 | Effective Filters and Linear Time Verification for Tree Similarity Joins | T. Hütter, M. Pawlik, R. Löschinger and N. Augsten | In this paper, we present a scalable solution for the tree similarity join that is based on (1) an effective indexing technique that leverages both the labels and the structure of trees to reduce the number of candidates, (2) an efficient upper bound filter that moves many of the candidates directly to the join result without additional verification, and (3) a linear time verification technique for the remaining candidates that avoids the expensive tree edit distance. |