Paper Digest: SIGMOD 2013 Highlights
The ACM SIGMOD International Conference on Management of Data (SIGMOD) is one of the top conferences on database management systems and data management technology.
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights and summaries to quickly get the main idea of each paper.
If you do not want to miss any interesting academic papers, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.
Paper Digest Team
team@paperdigest.org
TABLE 1: SIGMOD 2013 Papers
No. | Title | Authors | Highlight |
---|---|---|---|
1 | Cumulon: optimizing statistical data analysis in the cloud | Botong Huang, Shivnath Babu, Jun Yang | We present Cumulon, a system designed to help users rapidly develop and intelligently deploy matrix-based big-data analysis programs in the cloud. |
2 | Shark: SQL and rich analytics at scale | Reynold S. Xin, Josh Rosen, Matei Zaharia, Michael J. Franklin, Scott Shenker, Ion Stoica | Shark: SQL and rich analytics at scale |
3 | Parallel analytics as a service | Petrie Wong, Zhian He, Eric Lo | This paper presents Thrifty, a prototype implementation of MPPDB-as-a-service. |
4 | MESSIAH: missing element-conscious SLCA nodes search in XML data | Ba Quan Truong, Sourav S Bhowmick, Curtis Dyreson, Aixin Sun | In this paper, we generalize the SLCA search paradigm to support queries involving missing elements. |
5 | Indexing for subtree similarity-search using edit distance | Sara Cohen | This paper proposes the first index structure for subtree similarity-search, provided that the unit cost function is used. |
6 | Discovering XSD keys from XML data | Marcelo Arenas, Jonny Daenen, Frank Neven, Martin Ugarte, Jan Van den Bussche, Stijn Vansummeren | The algorithm leverages known discovery algorithms for functional dependencies in the relational model, but incorporates the above mentioned properties to assess and refine the quality of derived keys. |
7 | A scalable lock manager for multicores | Hyungsoo Jung, Hyuck Han, Alan D. Fekete, Gernot Heiser, Heon Y. Yeom | Our analysis of MySQL identifies latch contention within the lock manager as the bottleneck responsible for this collapse. |
8 | Controlled lock violation | Goetz Graefe, Mark Lillibridge, Harumi Kuno, Joseph Tucek, Alistair Veitch | Thus, we set out to achieve the same goals as early lock release but with a different, simpler, and more robust approach. |
9 | X-FTL: transactional FTL for SQLite databases | Woon-Hak Kang, Sang-Won Lee, Bongki Moon, Gi-Hwan Oh, Changwoo Min | In this paper, we propose X-FTL, a transactional flash translation layer (FTL) for SQLite databases. |
10 | Optimal splitters for temporal and multi-version databases | Wangchao Le, Feifei Li, Yufei Tao, Robert Christensen | We introduce the concept of optimal splitters for temporal and multi-version databases, which induce a partition of the input data set, and guarantee that the size of the maximum bucket be minimized among all possible configurations, given a budget for the desired number of buckets. |
11 | Building an efficient RDF store over a relational database | Mihaela A. Bornea, Julian Dolby, Anastasios Kementsietsidis, Kavitha Srinivas, Patrick Dantressangle, Octavian Udrea, Bishwaranjan Bhattacharjee | In this paper, we describe a novel storage and query mechanism for RDF which works on top of existing relational representations. |
12 | Automatic synthesis of out-of-core algorithms | Yannis Klonatos, Andres Nötzli, Andrej Spielmann, Christoph Koch, Victor Kuncak | We present a system for the automatic synthesis of efficient algorithms specialized for a particular memory hierarchy and a set of storage devices. |
13 | InfoGather+: semantic matching and annotation of numeric and time-varying attributes in web tables | Meihui Zhang, Kaushik Chakrabarti | Our key insight is to leverage the wealth of tables on the web and infer label information from semantically matching columns of other web tables; this complements "local" extraction from column headers. |
14 | Value invention in data exchange | Patricia C. Arocena, Boris Glavic, Renee J. Miller | In this paper, we present two techniques for understanding when the Skolem functions needed to represent the correct semantics of incomplete information are computationally well-behaved. |
15 | Indexing methods for moving object databases: games and other applications | Hanan Samet, Jagan Sankaranarayanan, Michael Auerbach | Indexing methods for moving object databases: games and other applications |
16 | I/O efficient: computing SCCs in massive graphs | Zhiwei Zhang, Jeffrey Xu Yu, Lu Qin, Lijun Chang, Xuemin Lin | We propose a new two phase algorithm, namely, tree construction and tree search. |
17 | TF-Label: a topological-folding labeling scheme for reachability querying in a large graph | James Cheng, Silu Huang, Huanhuan Wu, Ada Wai-Chee Fu | We propose TF-label, an efficient and scalable labeling scheme for processing reachability queries. |
18 | Efficiently computing k-edge connected components via graph decomposition | Lijun Chang, Jeffrey Xu Yu, Lu Qin, Xuemin Lin, Chengfei Liu, Weifa Liang | As a result, our algorithm for computing k-edge connected components significantly improves the time complexity of an existing state-of-the-art technique from $O(\lvert V\rvert^2 \lvert E\rvert + \lvert V\rvert^3 \log \lvert V\rvert)$ to $O(h \times l \times \lvert E\rvert)$. |
19 | An online cost sensitive decision-making method in crowdsourcing systems | Jinyang Gao, Xuan Liu, Beng Chin Ooi, Haixun Wang, Gang Chen | In this paper, we design and implement a cost sensitive method for crowdsourcing. |
20 | Leveraging transitive relations for crowdsourced joins | Jiannan Wang, Guoliang Li, Tim Kraska, Michael J. Franklin, Jianhua Feng | In this paper, we focus on the crowdsourced join query which aims to utilize humans to find all pairs of matching objects from two collections. |
21 | Crowd mining | Yael Amsterdamer, Yael Grossman, Tova Milo, Pierre Senellart | Based on these, we design a framework of generic components, used for choosing the best questions to ask the crowd and mining significant patterns from the answers. |
22 | Efficient sentiment correlation for large-scale demographics | Mikalai Tsytsarau, Sihem Amer-Yahia, Themis Palpanas | We propose a scalable approach for sentiment indexing and aggregation that works on multiple time granularities and uses incrementally updateable data structures for online operation. |
23 | EBM: an entropy-based model to infer social strength from spatiotemporal data | Huy Pham, Cyrus Shahabi, Yan Liu | In this paper, we are interested in inferring these social connections by analyzing people’s location information, which is useful in a variety of application domains from sales and marketing to intelligence analysis. |
24 | Online search of overlapping communities | Wanyun Cui, Yanghua Xiao, Haixun Wang, Yiqi Lu, Wei Wang | In this paper, we focus on online search of overlapping communities, that is, given a query vertex, we find meaningful overlapping communities the vertex belongs to in an online manner. |
25 | BitWeaving: fast scans for main memory data processing | Yinan Li, Jignesh M. Patel | In this paper, we propose a technique called BitWeaving that exploits the parallelism available at the bit level in modern processors. |
26 | Performance and resource modeling in highly-concurrent OLTP workloads | Barzan Mozafari, Carlo Curino, Alekh Jindal, Samuel Madden | In this paper, we introduce our framework, called DBSeer, that addresses this problem by employing statistical models that provide resource and performance analysis and prediction for highly concurrent OLTP workloads. |
27 | ODYS: an approach to building a massively-parallel search engine using a DB-IR tightly-integrated parallel DBMS for higher-level functionality | Kyu-Young Whang, Tae-Seob Yun, Yeon-Mi Yeo, Il-Yeol Song, Hyuk-Yoon Kwon, In-Joong Kim | In this paper, we propose a new approach of building a massively-parallel search engine using a DB-IR tightly-integrated parallel DBMS. |
28 | Massive graph triangulation | Xiaocheng Hu, Yufei Tao, Chin-Wan Chung | Motivated by this, we develop a new algorithm that is provably I/O and CPU efficient at the same time, without making any assumption on the input G at all. |
29 | TurboISO: towards ultrafast and robust subgraph isomorphism search in large graph databases | Wook-Shin Han, Jinsoo Lee, Jeong-Hoon Lee | In this paper, we present an efficient and robust subgraph search solution, called TurboISO, which is turbo-charged with two novel concepts, candidate region exploration and the combine and permute strategy (in short, Comb/Perm). |
30 | Fast exact shortest-path distance queries on large networks by pruned landmark labeling | Takuya Akiba, Yoichi Iwata, Yuichi Yoshida | We propose a new exact method for shortest-path distance queries on large-scale networks. |
31 | Improving regular-expression matching on strings using negative factors | Xiaochun Yang, Bin Wang, Tao Qiu, Yaoshu Wang, Chen Li | In this paper we propose a novel technique that prunes false negatives by utilizing negative factors, which are substrings that cannot appear in an answer. |
32 | String similarity measures and joins with synonyms | Jiaheng Lu, Chunbin Lin, Wei Wang, Chen Li, Haiyong Wang | Because using synonyms in similarity measures is, while expressive, computationally expensive (NP-hard), we propose an efficient algorithm, called selective-expansion, which guarantees the optimality in many real scenarios. |
33 | Efficient top-k algorithms for approximate substring matching | Younghoon Kim, Kyuseok Shim | To reduce the number of expensive distance computations, the proposed algorithms utilize our novel filtering techniques, which take advantage of q-grams and available inverted q-gram indexes. |
34 | Towards high-throughput Gibbs sampling at scale: a study across storage managers | Ce Zhang, Christopher Ré | We find that both new theoretical and new algorithmic techniques are required to understand the tradeoff space for each choice. |
35 | Latch-free data structures for DBMS: design, implementation, and evaluation | Takashi Horikawa | This paper investigates these LF data structures with a particular focus on their applicability and effectiveness. |
36 | DBMS metrology: measuring query time | Sabah Currim, Richard T. Snodgrass, Young-Kyoon Suh, Rui Zhang, Matthew Wong Johnson, Cheng Yi | We review relevant process and overall measures obtainable from the Linux kernel and introduce a structural causal model relating these measures. |
37 | Quality and efficiency for kernel density estimates in large data | Yan Zheng, Jeffrey Jestes, Jeff M. Phillips, Feifei Li | We propose randomized and deterministic algorithms with quality guarantees which are orders of magnitude more efficient than previous algorithms. |
38 | Efficient ad-hoc search for personalized PageRank | Yasuhiro Fujiwara, Makoto Nakatsuji, Hiroaki Shiokawa, Takeshi Mishima, Makoto Onizuka | The goal of this paper is to efficiently find the top-k nodes with exact node ranking so as to effectively support interactive similarity search based on PPR. |
39 | Provenance-based dictionary refinement in information extraction | Sudeepa Roy, Laura Chiticariu, Vitaly Feldman, Frederick R. Reiss, Huaiyu Zhu | In this paper, we study the dictionary refinement problem and address the above challenges. |
40 | CS2: a new database synopsis for query estimation | Feng Yu, Wen-Chi Hou, Cheng Luo, Dunren Che, Mengxia Zhu | We introduce a statistical technique, called reverse sample, and design a powerful estimator, called reverse estimator, to fully utilize correlated sample tuples for query estimation. |
41 | Branch-and-bound algorithm for reverse top-k queries | Akrivi Vlachou, Christos Doulkeridis, Kjetil Nørvåg, Yannis Kotidis | We propose an intuitive branch-and-bound algorithm for processing reverse top-k queries efficiently and discuss novel optimizations to boost its performance. |
42 | On the correct and complete enumeration of the core search space | Guido Moerkotte, Pit Fender, Marius Eich | We present three conflict detectors. |
43 | Trinity: a distributed graph engine on a memory cloud | Bin Shao, Haixun Wang, Yatao Li | In this paper, we introduce Trinity, a general purpose graph engine over a distributed memory cloud. |
44 | Characterizing tenant behavior for placement and crisis mitigation in multitenant DBMSs | Aaron J. Elmore, Sudipto Das, Alexander Pucher, Divyakant Agrawal, Amr El Abbadi, Xifeng Yan | We present Delphi, a self-managing system controller for a multitenant DBMS, and Pythia, a technique to learn behavior through observation and supervision using DBMS-agnostic database level performance measures. |
45 | Minimal MapReduce algorithms | Yufei Tao, Wenqing Lin, Xiaokui Xiao | This paper presents the notion of minimal algorithm, that is, an algorithm that guarantees the best parallelization in multiple aspects at the same time, up to a small constant factor. |
46 | NADEEF: a commodity data cleaning system | Michele Dallachiesa, Amr Ebaid, Ahmed Eldawy, Ahmed Elmagarmid, Ihab F. Ilyas, Mourad Ouzzani, Nan Tang | In this paper, we present NADEEF, an extensible, generalized and easy-to-deploy data cleaning platform. |
47 | Don’t be SCAREd: use SCalable Automatic REpairing with maximal likelihood and bounded changes | Mohamed Yakout, Laure Berti-Équille, Ahmed K. Elmagarmid | In this paper, we propose a new data repairing approach that is based on maximizing the likelihood of replacement data given the data distribution, which can be modeled using statistical machine learning techniques. |
48 | Determining the relative accuracy of attributes | Yang Cao, Wenfei Fan, Wenyuan Yu | This paper proposes a model for determining relative accuracy. |
49 | Photon: fault-tolerant and scalable joining of continuous data streams | Rajagopal Ananthanarayanan, Venkatesh Basker, Sumit Das, Ashish Gupta, Haifeng Jiang, Tianhao Qiu, Alexey Reznichenko, Deomid Ryabkov, Manpreet Singh, Shivakumar Venkataraman | In this paper, we describe the architecture of Photon, a geographically distributed system for joining multiple continuously flowing streams of data in real-time with high scalability and low latency, where the streams may be unordered or delayed. We also present challenges and solutions in maintaining large persistent state across geographically distant locations, and highlight the design principles that emerged from our experience. |
50 | Utility-maximizing event stream suppression | Di Wang, Yeye He, Elke Rundensteiner, Jeffrey F. Naughton | In this paper we consider how to suppress events in a stream to reduce the disclosure of sensitive patterns while maximizing the detection of nonsensitive patterns. |
51 | ε-Matching: event processing over noisy sequences in real time | Zheng Li, Tingjian Ge, Cindy X. Chen | Instead of the traditional approach of learning a distribution of the stream first and then processing queries, we propose a new approach that efficiently does the matching based on an error model. |
52 | Toward practical query pricing with QueryMarket | Paraschos Koutris, Prasang Upadhyaya, Magdalena Balazinska, Bill Howe, Dan Suciu | We develop a new pricing system, QueryMarket, for flexible query pricing in a data market based on an earlier theoretical framework (Koutris et al., PODS 2012). |
53 | Generalized scale independence through incremental precomputation | Michael Armbrust, Eric Liang, Tim Kraska, Armando Fox, Michael J. Franklin, David A. Patterson | In this paper, we describe a scale-independent view selection and maintenance system, which uses novel static analysis techniques that ensure that created views do not themselves become scaling bottlenecks. |
54 | Simulation of database-valued Markov chains using SimSQL | Zhuhua Cai, Zografoula Vagena, Luis Perez, Subramanian Arumugam, Peter J. Haas, Christopher Jermaine | This paper describes the SimSQL system, which allows for SQL-based specification, simulation, and querying of database-valued Markov chains, i.e., chains whose value at any time step comprises the contents of an entire database. |
55 | Recursive mechanism: towards node differential privacy and unrestricted joins | Shixi Chen, Shuigeng Zhou | In this paper, we propose a novel differentially private mechanism that supports unrestricted joins, to release an approximation of a linear statistic of the result of some positive relational algebra calculation over a sensitive database. |
56 | PrivGene: differentially private model fitting using genetic algorithms | Jun Zhang, Xiaokui Xiao, Yin Yang, Zhenjie Zhang, Marianne Winslett | Motivated by this, we propose PrivGene, a general-purpose differentially private model fitting solution based on genetic algorithms (GA). |
57 | Information preservation in statistical privacy and bayesian estimation of unattributed histograms | Bing-Rong Lin, Daniel Kifer | In statistical privacy, utility refers to two concepts: information preservation — how much statistical information is retained by a sanitizing algorithm, and usability — how (and with how much difficulty) does one extract this information to build statistical models, answer queries, etc. |
58 | Collective spatial keyword queries: a distance owner-driven approach | Cheng Long, Raymond Chi-Wing Wong, Ke Wang, Ada Wai-Chee Fu | In this paper, we study the CoSKQ problem and address the above issues. |
59 | TOUCH: in-memory spatial join by hierarchical data-oriented partitioning | Sadegh Nobari, Farhan Tauheed, Thomas Heinis, Panagiotis Karras, Stéphane Bressan, Anastasia Ailamaki | In this paper we develop TOUCH, a novel in-memory spatial join algorithm that uses hierarchical data-oriented space partitioning, thereby keeping both its memory footprint and the number of comparisons low. |
60 | Finding time period-based most frequent path in big trajectory data | Wuman Luo, Haoyu Tan, Lei Chen, Lionel M. Ni | In this paper, we study a new path finding query which finds the most frequent path (MFP) during user-specified time periods in large-scale historical trajectory data. |
61 | Integrating scale out and fault tolerance in stream processing using operator state management | Raul Castro Fernandez, Matteo Migliavacca, Evangelia Kalyvianaki, Peter Pietzuch | Based on them, we describe an integrated approach for dynamic scale out and recovery of stateful operators. |
62 | Quantiles over data streams: an experimental study | Lu Wang, Ge Luo, Ke Yi, Graham Cormode | In this paper, we remedy this deficit by providing a taxonomy of different methods, and describe efficient implementations. |
63 | An efficient query indexing mechanism for filtering geo-textual data | Lisi Chen, Gao Cong, Xin Cao | In particular, we propose a hybrid index, called IQ-tree, and novel cost models for managing a stream of incoming Boolean Range Continuous queries. |
64 | Bolt-on causal consistency | Peter Bailis, Ali Ghodsi, Joseph M. Hellerstein, Ion Stoica | We consider the problem of separating consistency-related safety properties from availability and durability in distributed data stores via the application of a "bolt-on" shim layer that upgrades the safety of an underlying general-purpose data store. |
65 | RTP: robust tenant placement for elastic in-memory database clusters | Jan Schaffner, Tim Januschowski, Megan Kercher, Tim Kraska, Hasso Plattner, Michael J. Franklin, Dean Jacobs | In this paper, we consider algorithms that elastically contract and expand a cluster of in-memory databases depending on tenants’ behavior over time while maintaining response time guarantees. |
66 | Inter-media hashing for large-scale retrieval from heterogeneous data sources | Jingkuan Song, Yang Yang, Yi Yang, Zi Huang, Heng Tao Shen | In this paper, we present a new multimedia retrieval paradigm to innovate large-scale search of heterogeneous multimedia data. |
67 | Mind the gap: large-scale frequent sequence mining | Iris Miliaraki, Klaus Berberich, Rainer Gemulla, Spyros Zoupanos | In this paper, we propose MG-FSM, a scalable algorithm for frequent sequence mining on MapReduce. |
68 | Reverse engineering complex join queries | Meihui Zhang, Hazem Elmeleegy, Cecilia M. Procopiuc, Divesh Srivastava | In this paper, we propose an efficient algorithm that discovers queries with arbitrary join graphs. |
69 | A direct mining approach to efficient constrained graph pattern discovery | Feida Zhu, Zequn Zhang, Qiang Qu | In this paper, we propose a direct mining framework to solve the problem and illustrate our ideas in the context of a particular type of constrained frequent patterns — the "skinny" patterns, which are graph patterns with a long backbone from which short twigs branch out. |
70 | Calibrating trajectory data for similarity-based analysis | Han Su, Kai Zheng, Haozhou Wang, Jiamin Huang, Xiaofang Zhou | In this paper, we pioneer a systematic approach to trajectory calibration that is a process to transform a heterogeneous trajectory dataset to one with (almost) unified sampling strategies. |
71 | On optimal worst-case matching | Cheng Long, Raymond Chi-Wing Wong, Philip S. Yu, Minhao Jiang | In this paper, we propose a new problem called Spatial Matching for Minimizing Maximum matching distance (SPM-MM). |
72 | Shortest path and distance queries on road networks: towards bridging theory and practice | Andy Diwen Zhu, Hui Ma, Xiaokui Xiao, Siqiang Luo, Youze Tang, Shuigeng Zhou | This paper presents Arterial Hierarchy (AH), an index structure that narrows the gap between theory and practice in answering shortest path and distance queries on road networks. |
73 | Fine-grained disclosure control for app ecosystems | Gabriel M. Bender, Lucja Kot, Johannes Gehrke, Christoph Koch | In this paper, we develop a new model of disclosure in app ecosystems. |
74 | Lightweight authentication of linear algebraic queries on data streams | Stavros Papadopoulos, Graham Cormode, Antonios Deligiannakis, Minos Garofalakis | We consider a stream outsourcing setting, where a data owner delegates the management of a set of disjoint data streams to an untrusted server. |
75 | Column imprints: a secondary index structure | Lefteris Sidirourgos, Martin Kersten | In this paper, we introduce column imprint, a simple but efficient cache conscious secondary index. |
76 | DeltaNI: an efficient labeling scheme for versioned hierarchical data | Jan Finis, Robert Brunel, Alfons Kemper, Thomas Neumann, Franz Färber, Norman May | We propose the DeltaNI index as a versioned pendant of the nested intervals (NI) labeling scheme. |
77 | Big data in capital markets | Alex Nazaruk, Michael Rauchman | In this talk we present and analyze forces behind the wide proliferation of electronic securities trading in US stocks and options markets. |
78 | Managing database technology at enterprise scale | Paul Yaron | This talk will discuss the challenges and strategies of managing the evolving ecosystem of "all data", from information security, to internal virtualization strategies. |
79 | We are drowning in a sea of least publishable units (LPUs) | David J. DeWitt, Ihab F. Ilyas, Jeffrey Naughton, Michael Stonebraker | In order to improve the quality of the papers being published we must reduce the number being submitted. |
80 | Rethinking eventual consistency | Philip A. Bernstein, Sudipto Das | We present a framework for comparing these criteria and mechanisms, to help architects navigate through this complex design space. |
81 | Workload management for big data analytics | Ashraf Aboulnaga, Shivnath Babu | Workload management for big data analytics |
82 | Knowledge harvesting in the big-data era | Fabian Suchanek, Gerhard Weikum | This tutorial presents state-of-the-art methods, recent advances, research opportunities, and open challenges along this avenue of knowledge harvesting and its applications. |
83 | Machine learning for big data | Tyson Condie, Paul Mineiro, Neoklis Polyzotis, Markus Weimer | This tutorial introduces current applications, techniques and systems with the aim of cross-fertilizing research between the database and machine learning communities. This leads to the closing of the seminar, where we introduce two sets of open research questions: Better systems support for the already established use cases of Machine Learning and support for recent advances in Machine Learning research. |
84 | Data management perspectives on business process management: tutorial overview | Richard Hull, Jianwen Su, Roman Vaculin | A recent approach to BPM, based on "business artifacts", is centered on a modeling framework that places data and process on an equal footing. |
85 | Data stream warehousing | Lukasz Golab, Theodore Johnson | Data stream warehousing |
86 | Data-driven neuroscience: enabling breakthroughs via innovative data management | Alexandros Stougiannis, Mirjana Pavlovic, Farhan Tauheed, Thomas Heinis, Anastasia Ailamaki | The level of detail of their models is unprecedented as they model details on the subcellular level (e.g., the neurotransmitter). |
87 | TsingNUS: a location-based service system towards live city | Guoliang Li, Nan Zhang, Ruicheng Zhong, Sitong Liu, Weihuang Huang, Ju Fan, Kian-Lee Tan, Lizhu Zhou, Jianhua Feng | We present our system towards live city, called TsingNUS, aiming to provide users with more user-friendly location-aware search experiences. |
88 | WOW: what the world of (data) warehousing can learn from the World of Warcraft | Rene Mueller, Tim Kaldewey, Guy M. Lohman, John McPherson | WOW: what the world of (data) warehousing can learn from the World of Warcraft |
89 | Rule-based application development using Webdamlog | Serge Abiteboul, Émilien Antoine, Gerome Miklau, Julia Stoyanovich, Jules Testard | We present the WebdamLog system for managing distributed data on the Web in a peer-to-peer manner. |
90 | Speeding up database applications with Pyxis | Alvin Cheung, Owen Arden, Samuel Madden, Andrew C. Myers | We propose to demonstrate Pyxis, a system that optimizes database applications by pushing computation to the database server. |
91 | Peckalytics: analyzing experts and interests on Twitter | Alex Cheng, Nilesh Bansal, Nick Koudas | Our aim is to facilitate targeting and optimization of advertising campaigns on the Twitter platform. |
92 | Packing experiments for sharing and publication | Fernando Chirigati, Dennis Shasha, Juliana Freire | As a step towards simplifying the process of creating reproducible experiments, we have developed ReproZip, a tool that automatically captures the provenance of experiments and packs all the necessary files, library dependencies and variables to reproduce the results. |
93 | CHIC: a combination-based recommendation system | Manasi Vartak, Samuel Madden | In this demonstration, we present CHIC, a first-of-its-kind, combination-based recommendation system for clothing. |
94 | Noah: a dynamic ridesharing system | Charles Tian, Yan Huang, Zhi Liu, Favyen Bastani, Ruoming Jin | Noah: a dynamic ridesharing system |
95 | A query answering system for data with evolution relationships | Siarhei Bykau, Flavio Rizzolo, Yannis Velegrakis | We present the TrenDS, a system for exploiting the evolution relationships between the structures in the database. |
96 | GeoDeepDive: statistical inference using familiar data-processing languages | Ce Zhang, Vidhya Govindaraju, Jackson Borchardt, Tim Foltz, Christopher Ré, Shanan Peters | We describe our proposed demonstration of GeoDeepDive, a system that helps geoscientists discover information and knowledge buried in the text, tables, and figures of geology journal articles. |
97 | Fact checking and analyzing the web | François Goasdoué, Konstantinos Karanasos, Yannis Katsis, Julien Leblay, Ioana Manolescu, Stamatis Zampetakis | We propose to demonstrate FactMinder, a fact checking and analysis assistance application. |
98 | Data mining algorithms as a service in the cloud exploiting relational database systems | Carlos Ordonez, Javier García-García, Carlos Garcia-Alvarado, Wellington Cabrera, Veerabhadran Baladandayuthapani, Mohammed S. Quraishi | We present a novel cloud system based on DBMS technology, where data mining algorithms are offered as a service. |
99 | SONDY: an open source platform for social dynamics mining and analysis | Adrien Guille, Cécile Favre, Hakim Hacid, Djamel A. Zighed | This paper describes SONDY, a tool for analysis of trends and dynamics in online social network data. |
100 | Interactive data mining with 3D-parallel-coordinate-trees | Elke Achtert, Hans-Peter Kriegel, Erich Schubert, Arthur Zimek | Here, we provide a tool to explore complex data sets using 3D-parallel-coordinate-trees, along with a number of approaches to arrange the axes. |
101 | Stat!: an interactive analytics environment for big data | Mike Barnett, Badrish Chandramouli, Robert DeLine, Steven Drucker, Danyel Fisher, Jonathan Goldstein, Patrick Morrison, John Platt | We demonstrate Stat! |
102 | PARAS: interactive parameter space exploration for association rule mining | Abhishek Mukherji, Xika Lin, Christopher R. Botaish, Jason Whitehouse, Elke A. Rundensteiner, Matthew O. Ward, Carolina Ruiz | We demonstrate our PARAS technology for supporting interactive association mining at near real-time speeds. |
103 | STEM: a spatio-temporal miner for bursty activity | Theodoros Lappas, Marcos R. Vieira, Dimitrios Gunopulos, Vassilis J. Tsotras | In this paper we describe STEM (Spatio-TEmporal Miner), a system for finding spatiotemporal burstiness patterns in a collection of spatially distributed frequency streams. |
104 | The farm: where pig scripts are bred and raised | Craig P. Sayers, Alkis Simitsis, Georgia Koutrika, Alejandro Guerrero Gonzalez, David Tamez Cantu, Meichun Hsu | To further assist developers, and support novice users, we offer "The Farm", a catalog of scriptable services supporting creation, discovery, composition, and optimized execution. |
105 | LinkIT: privacy preserving record linkage and integration via transformations | Luca Bonomi, Li Xiong, James J. Lu | We propose to demonstrate an open-source tool, LinkIT, for privacy preserving record Linkage and Integration via data Transformations. |
106 | Secure database-as-a-service with Cipherbase | Arvind Arasu, Spyros Blanas, Ken Eguro, Manas Joglekar, Raghav Kaushik, Donald Kossmann, Ravi Ramamurthy, Prasang Upadhyaya, Ramarathnam Venkatesan | In this demonstration we outline the functionality of Cipherbase — a full fledged SQL database system that supports the full generality of a database system while providing high data confidentiality. |
107 | DBalancer: distributed load balancing for NoSQL data-stores | Ioannis Konstantinou, Dimitrios Tsoumakos, Ioannis Mytilinis, Nectarios Koziris | In this demonstration, we present the DBalancer, a generic distributed module that can be installed on top of a typical NoSQL data-store and provide an efficient and highly configurable load balancing mechanism. |
108 | COCCUS: self-configured cost-based query services in the cloud | Ioannis Konstantinou, Verena Kantere, Dimitrios Tsoumakos, Nectarios Koziris | We demonstrate COCCUS, a modular system for cost-aware query execution, adaptive query charge and optimization of cloud data services. |
109 | Workload optimization using SharedDB | Georgios Giannikis, Darko Makreshanski, Gustavo Alonso, Donald Kossmann | Workload optimization using SharedDB |
110 | SciQL: array data processing inside an RDBMS | Ying Zhang, Martin Kersten, Stefan Manegold | To bridge the gap between the needs of the Data-Intensive Research fields and the current DBMS technologies, we have introduced SciQL (pronounced as ‘cycle’). |
111 | Iterative parallel data processing with Stratosphere: an inside look | Stephan Ewen, Sebastian Schelter, Kostas Tzoumas, Daniel Warneke, Volker Markl | In prior work, we have shown how to extend and use a parallel data flow system to efficiently run iterative algorithms in a shared-nothing environment. |
112 | CARTILAGE: adding flexibility to the Hadoop skeleton | Alekh Jindal, Jorge Quiané-Ruiz, Samuel Madden | In this paper, we present CARTILAGE, a comprehensive data storage framework built on top of HDFS. |
113 | Continuous outlier detection in data streams: an extensible framework and state-of-the-art algorithms | Dimitrios Georgiadis, Maria Kontaki, Anastasios Gounaris, Apostolos N. Papadopoulos, Kostas Tsichlas, Yannis Manolopoulos | In this demo, we show both an extendible framework for outlier detection algorithms and specific outlier detection algorithms for the demanding case where outlier detection is continuously performed over a data stream. |
114 | FAST: differentially private real-time aggregate monitor with filtering and adaptive sampling | Liyue Fan, Li Xiong, Vaidy Sunderam | We propose FAST, a real-time system that allows differentially private aggregate sharing and time-series analytics. |
115 | Execution and optimization of continuous queries with cyclops | Harold Lim, Shivnath Babu | Cyclops employs a cost-based approach for picking the most suitable engine and plan for executing a given query. |
116 | Less watts, more performance: an intelligent storage engine for data appliances | Louis Woods, Jens Teubner, Gustavo Alonso | In this demonstration, we present Ibex, a novel storage engine featuring hybrid, FPGA-accelerated query processing. |
117 | A demonstration of SQLVM: performance isolation in multi-tenant relational database-as-a-service | Vivek Narasayya, Sudipto Das, Manoj Syamala, Surajit Chaudhuri, Feng Li, Hyunjung Park | A demonstration of SQLVM: performance isolation in multi-tenant relational database-as-a-service |
118 | SQUIN: a traversal based query execution system for the web of linked data | Olaf Hartig | We demonstrate the query execution system SQUIN which implements a novel query execution approach. |
119 | GRDB: a system for declarative and interactive analysis of noisy information networks | Walaa Eldin Moustafa, Hui Miao, Amol Deshpande, Lise Getoor | GRDB: a system for declarative and interactive analysis of noisy information networks |
120 | HiNGE: enabling temporal network analytics at scale | Udayan Khurana, Amol Deshpande | In this demonstration proposal, we present HiNGE (Historical Network/Graph Explorer), a system that enables interactive exploration and analytics over large evolving networks through visualization and node-centric metric computations. |
121 | Research-insight: providing insight on research by publication network analysis | Fangbo Tao, Xiao Yu, Kin Hou Lei, George Brova, Xiao Cheng, Jiawei Han, Rucha Kanade, Yizhou Sun, Chi Wang, Lidan Wang, Tim Weninger | Much knowledge can be derived from such information networks if we systematically develop an effective and scalable database-oriented information network analysis technology. |
122 | QUBLE: blending visual subgraph query formulation with query processing on large networks | Ho Hoang Hung, Sourav S Bhowmick, Ba Quan Truong, Byron Choi, Shuigeng Zhou | In this demonstration, we present a novel system called QUBLE (QUery Blender for Large nEtworks) to realize this novel paradigm on large networks. |
123 | StreamWorks: a system for dynamic graph search | Sutanay Choudhury, Lawrence Holder, George Chin, Abhik Ray, Sherman Beus, John Feo | The goal of our work is to enable real-time search capabilities for graph databases. |
124 | Query processing on prefix trees live | Thomas Kissinger, Benjamin Schlegel, Dirk Habich, Wolfgang Lehner | In this demonstration proposal, we present DexterDB, which implements our novel prefix tree-based processing model that makes indexes the first-class citizen of the database system. |
125 | xPAD: a platform for analytic data flows | Alkis Simitsis, Kevin Wilkinson, Petar Jovanovic | We present xPAD, our platform to manage analytic data flows. |
126 | PBS at work: advancing data management with consistency metrics | Peter Bailis, Shivaram Venkataraman, Michael J. Franklin, Joseph M. Hellerstein, Ion Stoica | A large body of recent work has proposed analytical and empirical techniques for quantifying the data consistency properties of distributed data stores. |
127 | The power of data use management in action | Prasang Upadhyaya, Nick Anderson, Magdalena Balazinska, Bill Howe, Raghav Kaushik, Ravi Ramamurthy, Dan Suciu | The power of data use management in action |
128 | CTrace: semantic comparison of multi-granularity process traces | Qing Liu, Kerry Taylor, Xiang Zhao, Geoffrey Squire, Xuemin Lin, Corne Kloppers, Richard Miller | We present CTrace, a system that (1) lets users explore the conceptual abstraction of large process traces with different levels of granularity, and (2) provides semantic comparison among traces in which both the structural and the semantic similarity are considered. |
129 | The big data ecosystem at LinkedIn | Roshan Sumbaly, Jay Kreps, Sam Shah | In particular, we present our solutions to the “last mile” issues in providing a rich developer ecosystem. |
130 | On brewing fresh espresso: LinkedIn’s distributed data serving platform | Lin Qiao, Kapil Surlaker, Shirshanka Das, Tom Quiggle, Bob Schulman, Bhaskar Ghosh, Antony Curtis, Oliver Seeliger, Zhen Zhang, Aditya Auradar, Chris Beaver, Gregory Brandt, Mihir Gandhi, Kishore Gopalakrishna, Wai Ip, Swaroop Jgadish, Shi Lu, Alexander Pachev, Aditya Ramesh, Abraham Sebastian, Rupa Shanbhag, Subbu Subramaniam, Yun Sun, Sajid Topiwala, Cuong Tran, Jemiah Westerman, David Zhang | This paper describes the motivation and design principles involved in building Espresso, the data model and capabilities exposed to clients, details of the replication and secondary indexing implementation and presents a set of experimental results that characterize the performance of the system along various dimensions. |
131 | Fast data in the era of big data: Twitter’s real-time related query suggestion architecture | Gilad Mishne, Jeff Dalton, Zhenghua Li, Aneesh Sharma, Jimmy Lin | We present the architecture behind Twitter’s real-time related query suggestion and spelling correction service. |
132 | Enhancements to SQL server column stores | Per-Ake Larson, Cipri Clinciu, Campbell Fraser, Eric N. Hanson, Mostafa Mokhtar, Michal Nowakiewicz, Vassilis Papadimos, Susan L. Price, Srikumar Rangarajan, Remus Rusanu, Mayukh Saubhasik | This paper gives an overview of SQL Server’s column stores and batch processing, in particular the enhancements introduced in the upcoming release. |
133 | Query containment in entity SQL | Guillem Rull, Philip A. Bernstein, Ivo Garcia dos Santos, Yannis Katsis, Sergey Melnik, Ernest Teniente | We describe a software architecture we have developed for a constructive containment checker of Entity SQL queries defined over extended ER schemas expressed in Microsoft’s Entity Data Model. |
134 | Timeline index: a unified data structure for processing queries on temporal data in SAP HANA | Martin Kaufmann, Amin Amiri Manjili, Panagiotis Vagenas, Peter Michael Fischer, Donald Kossmann, Franz Färber, Norman May | In this paper, we develop the Timeline Index as a novel, unified data structure that efficiently supports temporal operators such as temporal aggregation, time travel, and temporal joins. |
135 | LinkBench: a database benchmark based on the Facebook social graph | Timothy G. Armstrong, Vamsi Ponnekanti, Dhruba Borthakur, Mark Callaghan | In this paper we present a new synthetic benchmark called LinkBench. |
136 | BigBench: towards an industry standard benchmark for big data analytics | Ahmad Ghazal, Tilmann Rabl, Minqing Hu, Francois Raab, Meikel Poess, Alain Crolotte, Hans-Arno Jacobsen | In this paper, we present BigBench, an end-to-end big data benchmark proposal. |
137 | Building, maintaining, and using knowledge bases: a report from the trenches | Omkar Deshpande, Digvijay S. Lamba, Michel Tourn, Sanjib Das, Sri Subramaniam, Anand Rajaraman, Venky Harinarayan, AnHai Doan | In this paper we describe such a process. |
138 | Query processing on smart SSDs: opportunities and challenges | Jaeyoung Do, Yang-Suk Kee, Jignesh M. Patel, Chanik Park, Kwanghyun Park, David J. DeWitt | We have implemented an initial prototype of Microsoft SQL Server running on a Samsung Smart SSD. |
139 | Micro adaptivity in Vectorwise | Bogdan Răducanu, Peter Boncz, Marcin Zukowski | In this paper, we (i) characterize a number of factors that cause performance diversity between primitive flavors, (ii) describe an ε-greedy learning algorithm that casts the flavor selection into a multi-armed bandit problem, and (iii) describe the software framework for Micro Adaptivity that we implemented in the Vectorwise system. |
140 | Hekaton: SQL server’s memory-optimized OLTP engine | Cristian Diaconu, Craig Freedman, Erik Ismert, Per-Ake Larson, Pravin Mittal, Ryan Stonecipher, Nitin Verma, Mike Zwilling | To achieve this it uses only latch-free data structures and a new optimistic, multiversion concurrency control technique. |
141 | Split query processing in polybase | David J. DeWitt, Alan Halverson, Rimma Nehme, Srinath Shankar, Josep Aguilar-Saborit, Artin Avanes, Miro Flasza, Jim Gramling | This paper presents Polybase, a feature of SQL Server PDW V2 that allows users to manage and query data stored in a Hadoop cluster using the standard SQL query language. |
142 | Petabyte scale databases and storage systems at Facebook | Dhruba Borthakur | We describe the techniques we have used to map the Facebook Graph Database into a set of relational tables. We present an alternate set of benchmark techniques that measure capacity of a database, the value/byte in that database and the efficiency of inbuilt crowd-sourcing techniques to reduce administration costs of that database. |
143 | Incremental mapping compilation in an object-to-relational mapping system | Philip A. Bernstein, Marie Jacob, Jorge Pérez, Guillem Rull, James F. Terwilliger | We define the problem formally, present algorithms to solve it for Microsoft’s Entity Framework, and report on an implementation. |
144 | FriendRouter: real-time path finder in social networks | Wladston Viana, Mirella M. Moro | Here, we introduce a tool for finding paths between social network users in real-time, a task that classical solutions are not tailored for. |
145 | Adaptive log compression for massive log data | Robert Christensen, Feifei Li | We present a novel adaptive log compression scheme. |
146 | BUZZARD: a NUMA-aware in-memory indexing system | Lukas M. Maas, Thomas Kissinger, Dirk Habich, Wolfgang Lehner | Using adaptive data partitioning techniques, BUZZARD distributes a prefix-tree-based index across the NUMA system and hands off incoming requests to worker threads located on each partition’s respective NUMA node. |
147 | Resa: realtime elastic streaming analytics in the cloud | Tian Tan, Richard T.B. Ma, Marianne Winslett, Yin Yang, Yong Yu, Zhenjie Zhang | We propose Resa, a novel framework for robust, elastic and realtime stream processing in the cloud. |
148 | Natural language question answering over RDF data | Ruizhe Huang, Lei Zou | In this work, we propose a methodology to translate natural language questions into SPARQL queries, which can be answered by existing RDF engines. |
149 | Mobile interaction and query optimization in a protein-ligand data analysis system | Marvin Lapeine, Katherine G. Herbert, Emily Hill, Nina M. Goodey | Our approach applies standards as well as uses novel mechanisms to help improve performance time. |
150 | SIGMOD 2013 new researcher symposium | Anish Das Sarma, Xin Luna Dong | SIGMOD 2013 new researcher symposium |