Paper Digest: KDD 2015 Highlights
The ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) is one of the top data mining conferences in the world.
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.
If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.
Paper Digest Team
team@paperdigest.org
TABLE 1: KDD 2015 Papers
No. | Title | Authors | Highlight |
---|---|---|---|
1 | Online Controlled Experiments: Lessons from Running A/B/n Tests for 12 Years | Ron Kohavi | We provide an introduction, share real examples, key lessons, and cultural challenges. |
2 | MOOCS: What Have We Learned? | Daphne Koller | I will show how MOOCs provide opportunities for open-ended projects, intercultural learner interactions, and collaborative learning. |
3 | Machine Learning and Causal Inference for Policy Evaluation | Susan Athey | Specifically, we propose to divide the features of a model into causal features, whose values may be manipulated in a counterfactual policy environment, and attributes. |
4 | Data, Knowledge and Discovery: Machine Learning meets Natural Science | Hugh Durrant-Whyte | This talk will describe a number of applied machine learning projects addressing real-world inference problems in physical, life and social science areas. |
5 | Large-Scale Distributed Bayesian Matrix Factorization using Stochastic Gradient MCMC | Sungjin Ahn, Anoop Korattikara, Nathan Liu, Suju Rajan, Max Welling | In this paper, we propose a scalable distributed Bayesian matrix factorization algorithm using stochastic gradient MCMC. |
6 | TimeMachine: Timeline Generation for Knowledge-Base Entities | Tim Althoff, Xin Luna Dong, Kevin Murphy, Safa Alai, Van Dang, Wei Zhang | We present a method called TIMEMACHINE to generate a timeline of events and relations for entities in a knowledge base. |
7 | Estimating Local Intrinsic Dimensionality | Laurent Amsaleg, Oussama Chelly, Teddy Furon, Stéphane Girard, Michael E. Houle, Ken-ichi Kawarabayashi, Michael Nett | This paper is concerned with the estimation of a local measure of intrinsic dimensionality (ID) recently proposed by Houle. |
8 | Portraying Collective Spatial Attention in Twitter | Émilien Antoine, Adam Jatowt, Shoko Wakamiya, Yukiko Kawai, Toyokazu Akiyama | In this paper we demonstrate a novel visualization system for analyzing how Twitter users collectively talk about space and for uncovering correlations between geographical locations of Twitter users and the locations they tweet about. |
9 | Accelerating Dynamic Time Warping Clustering with a Novel Admissible Pruning Strategy | Nurjahan Begum, Liudmila Ulanova, Jun Wang, Eamonn Keogh | In this work, we address this lethargy in two ways. |
10 | Efficient Online Evaluation of Big Data Stream Classifiers | Albert Bifet, Gianmarco de Francisci Morales, Jesse Read, Geoff Holmes, Bernhard Pfahringer | In this paper we propose a new evaluation methodology for big data streams. |
11 | Dynamically Modeling Patient’s Health State from Electronic Medical Records: A Time Series Approach | Karla L. Caballero Barajas, Ram Akella | In this paper, we present a method to dynamically estimate the probability of mortality inside the Intensive Care Unit (ICU) by combining heterogeneous data. |
12 | Facets: Fast Comprehensive Mining of Coevolving High-order Time Series | Yongjie Cai, Hanghang Tong, Wei Fan, Ping Ji, Qing He | In this paper, we propose a comprehensive method, FACETS, to simultaneously model all these three challenges. |
13 | Online Outlier Exploration Over Large Datasets | Lei Cao, Mingrui Wei, Di Yang, Elke A. Rundensteiner | In this work, we present the first online outlier exploration platform, called ONION, that enables analysts to effectively explore anomalies even in large datasets. |
14 | BatchRank: A Novel Batch Mode Active Learning Framework for Hierarchical Classification | Shayok Chakraborty, Vineeth Balasubramanian, Adepu Ravi Sankar, Sethuraman Panchanathan, Jieping Ye | In this paper, we propose a novel BMAL algorithm (BatchRank) for hierarchical classification. |
15 | On the Formation of Circles in Co-authorship Networks | Tanmoy Chakraborty, Sikhar Patranabis, Pawan Goyal, Animesh Mukherjee | In this paper, we propose an unsupervised approach to automatically detect circles in an ego network such that each circle represents a densely knit community of researchers. |
16 | Heterogeneous Network Embedding via Deep Architectures | Shiyu Chang, Wei Han, Jiliang Tang, Guo-Jun Qi, Charu C. Aggarwal, Thomas S. Huang | In this paper, we examine the scenario of a heterogeneous network with nodes and content of various types. |
17 | Differentially Private High-Dimensional Data Publication via Sampling-Based Inference | Rui Chen, Qian Xiao, Yu Zhang, Jianliang Xu | In this paper, we consider the problem of releasing high-dimensional data with differential privacy guarantees. |
18 | Efficient Algorithms for Public-Private Social Networks | Flavio Chierichetti, Alessandro Epasto, Ravi Kumar, Silvio Lattanzi, Vahab Mirrokni | We introduce the public-private model of graphs. |
19 | Warm Start for Parameter Selection of Linear Classifiers | Bo-Yu Chu, Chia-Hua Ho, Cheng-Hao Tsai, Chieh-Yen Lin, Chih-Jen Lin | Our aim is to devise effective warm-start strategies to efficiently solve this sequence of optimization problems. |
20 | Stream Sampling for Frequency Cap Statistics | Edith Cohen | We present a sampling framework for unaggregated data that uses a single pass (for streams) or two passes (for distributed data) and state proportional to the desired sample size. |
21 | Adaptation Algorithm and Theory Based on Generalized Discrepancy | Corinna Cortes, Mehryar Mohri, Andrés Muñoz Medina | We present a new algorithm for domain adaptation improving upon the discrepancy minimization algorithm (DM), which was previously shown to outperform a number of popular algorithms designed for this task. |
22 | Optimal Action Extraction for Random Forests and Boosted Trees | Zhicheng Cui, Wenlin Chen, Yujie He, Yixin Chen | To address this problem, we present a novel framework to post-process any ATM classifier to extract an optimal actionable plan that can change a given input to a desired class with a minimum cost. |
23 | Dynamic Matrix Factorization with Priors on Unknown Values | Robin Devooght, Nicolas Kourtellis, Amin Mantrach | In this work, we build on this assumption, and introduce a novel dynamic matrix factorization framework that allows to set an explicit prior on unknown values. |
24 | CoupledLP: Link Prediction in Coupled Networks | Yuxiao Dong, Jing Zhang, Jie Tang, Nitesh V. Chawla, Bai Wang | We propose a unified framework, CoupledLP, to solve the problem. |
25 | Unsupervised Feature Selection with Adaptive Structure Learning | Liang Du, Yi-Dong Shen | To address this, we propose a unified learning framework which performs structure learning and feature selection simultaneously. |
26 | Dirichlet-Hawkes Processes with Applications to Clustering Continuous-Time Document Streams | Nan Du, Mehrdad Farajtabar, Amr Ahmed, Alexander J. Smola, Le Song | In this paper, we propose a novel random process, referred to as the Dirichlet-Hawkes process, to take into account both information in a unified framework. |
27 | Beyond Triangles: A Distributed Framework for Estimating 3-profiles of Large Graphs | Ethan R. Elenberg, Karthikeyan Shanmugam, Michael Borokhovich, Alexandros G. Dimakis | For the harder problem of ego 3-profiles, we introduce an algorithm that can estimate profiles of hundreds of thousands of vertices in parallel, in the timescale of minutes. |
28 | Hierarchical Graph-Coupled HMMs for Heterogeneous Personalized Health Data | Kai Fan, Marisa Eisenberg, Alison Walsh, Allison Aiello, Katherine Heller | The purpose of this study is to leverage modern technology (mobile or web apps) to enrich epidemiology data and infer the transmission of disease. |
29 | More Constraints, Smaller Coresets: Constrained Matrix Approximation of Sparse Big Data | Dan Feldman, Tamir Tassa | We suggest a generic data reduction technique with provable guarantees for computing the low rank approximation of a matrix under some $\ell_z$ error, and constrained factorizations, such as the Non-negative Matrix Factorization (NMF). |
30 | Certifying and Removing Disparate Impact | Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, Suresh Venkatasubramanian | Instead of requiring access to the process, we propose making inferences based on the data it uses. |
31 | RSC: Mining and Modeling Temporal Activity in Social Media | Alceu Ferraz Costa, Yuto Yamaguchi, Agma Juci Machado Traina, Caetano Traina, Christos Faloutsos | In this paper we analyze time-stamp data from social media services and find that the distribution of postings inter-arrival times (IAT) is characterized by four patterns: (i) positive correlation between consecutive IATs, (ii) heavy tails, (iii) periodic spikes and (iv) bimodal distribution. |
32 | A Clustering-Based Framework to Control Block Sizes for Entity Resolution | Jeffrey Fisher, Peter Christen, Qing Wang, Erhard Rahm | We propose two novel hierarchical clustering approaches which can generate blocks within a specified size range, and we present a penalty function which allows control of the trade-off between block quality and block size in the clustering process. |
33 | Who Supported Obama in 2012?: Ecological Inference through Distribution Regression | Seth R. Flaxman, Yu-Xiang Wang, Alexander J. Smola | We present a new solution to the “ecological inference” problem, of learning individual-level associations from aggregate data. |
34 | Real Estate Ranking via Mixed Land-use Latent Models | Yanjie Fu, Guannan Liu, Spiros Papadimitriou, Hui Xiong, Yong Ge, Hengshu Zhu, Chen Zhu | To that end, in this paper, we develop a geographical function ranking method, named FuncDivRank, by incorporating the functional diversity of communities into real estate appraisal. |
35 | Adaptive Message Update for Fast Affinity Propagation | Yasuhiro Fujiwara, Makoto Nakatsuji, Hiroaki Shiokawa, Yasutoshi Ida, Machiko Toyoda | This paper proposes an efficient algorithm that guarantees the same clustering results as the original algorithm. |
36 | Monitoring Least Squares Models of Distributed Streams | Moshe Gabel, Daniel Keren, Assaf Schuster | We propose the first monitoring algorithm for multivariate regression models of distributed data streams that guarantees a bounded model error. |
37 | Reconstructing Textual Documents from n-grams | Matthias Gallé, Matías Tealdi | Instead, we propose another method consisting in adding strategically fictitious n-grams and show that a noised corpus like that is much harder to reconstruct while increasing only little the perplexity of a language model obtained through it. |
38 | Anatomical Annotations for Drosophila Gene Expression Patterns via Multi-Dimensional Visual Descriptors Integration: Multi-Dimensional Feature Learning | Hongchang Gao, Lin Yan, Weidong Cai, Heng Huang | We propose a novel structured sparsity-inducing norms based feature learning model to integrate the multi-dimensional visual descriptors for Drosophila gene expression patterns annotations. |
39 | Selective Hashing: Closing the Gap between Radius Search and k-NN Search | Jinyang Gao, H.V. Jagadish, Beng Chin Ooi, Sheng Wang | We propose a novel indexing scheme called Selective Hashing, where a disjoint set of indices are built with different granularities and each point is only stored in the most effective index. |
40 | Using Local Spectral Methods to Robustify Graph-Based Learning Algorithms | David F. Gleich, Michael W. Mahoney | We study robustness with respect to the details of graph constructions, errors in node labeling, degree variability, and a variety of other real-world heterogeneities, studying these methods through a precise relationship with mincut problems. |
41 | Instance Weighting for Patient-Specific Risk Stratification Models | Jen J. Gong, Thoralf M. Sundt, James D. Rawn, John V. Guttag | In this paper, we present an approach to address the problem of small data using transfer learning methods in the context of developing risk models for cardiac surgeries. |
42 | A Deep Hybrid Model for Weather Forecasting | Aditya Grover, Ashish Kapoor, Eric Horvitz | We explore new directions with forecasting weather as a data-intensive challenge that involves inferences across space and time. |
43 | Network Lasso: Clustering and Optimization in Large Graphs | David Hallac, Jure Leskovec, Stephen Boyd | In this paper, we introduce the network lasso, a generalization of the group lasso to a network setting that allows for simultaneous clustering and optimization on graphs. |
44 | Learning Tree Structure in Multi-Task Learning | Lei Han, Yu Zhang | To the best of our knowledge, there is no work to learn the tree structure among tasks and model parameters simultaneously under the regularization framework and in this paper, we develop a TAsk Tree (TAT) model for MTL to achieve this. |
45 | Probabilistic Community and Role Model for Social Networks | Yu Han, Jie Tang | In this paper, we propose a unified probabilistic framework, the Community Role Model (CRM), to model a social network. |
46 | Real-Time Top-R Topic Detection on Twitter with Topic Hijack Filtering | Kohei Hayashi, Takanori Maehara, Masashi Toyoda, Ken-ichi Kawarabayashi | In this paper, we integrate both the extraction of meaningful topics and the filtering of messages over the Twitter stream. |
47 | Non-exhaustive, Overlapping Clustering via Low-Rank Semidefinite Programming | Yangyang Hou, Joyce Jiyoung Whang, David F. Gleich, Inderjit S. Dhillon | We propose a novel convex semidefinite program (SDP) as a relaxation of the non-exhaustive, overlapping clustering problem. |
48 | Inferring Air Quality for Station Location Recommendation Based on Urban Big Data | Hsun-Ping Hsieh, Shou-De Lin, Yu Zheng | We design a semi-supervised inference model utilizing existing monitoring data together with heterogeneous city dynamics, including meteorology, human mobility, structure of road networks, and point of interests (POIs). |
49 | Website Optimization Problem and Its Solutions | Shuhei Iitsuka, Yutaka Matsuo | By combining organized algorithms and devices, we propose a rapid testing method that detects high-performing variations with few users. |
50 | Reciprocity in Social Networks with Capacity Constraints | Bo Jiang, Zhi-Li Zhang, Don Towsley | In this paper we study the problem of maximizing achievable reciprocity for an ensemble of digraphs with the same prescribed in- and out-degree sequences. |
51 | Learning with Similarity Functions on Graphs using Matchings of Geometric Embeddings | Fredrik D. Johansson, Devdatt Dubhashi | We develop and apply the Balcan-Blum-Srebro (BBS) theory of classification via similarity functions (which are not necessarily kernels) to the problem of graph classification. |
52 | Structured Hedging for Resource Allocations with Leverage | Nicholas Johnson, Arindam Banerjee | In this paper, we present a formulation for hedging online resource allocations with leverage and propose an efficient data mining algorithm (SHERAL). We pose the problem as a constrained online convex optimization problem. |
53 | Improved Bounds on the Dot Product under Random Projection and Random Sign Projection | Ata Kaban | In this paper we provide improved bounds on the dot product under random projection that matches the optimal bounds on the Euclidean distance. |
54 | Accelerated Alternating Direction Method of Multipliers | Mojtaba Kadkhodaie, Konstantina Christakopoulou, Maziar Sanjabi, Arindam Banerjee | In this paper, we introduce the Accelerated Alternating Direction Method of Multipliers (A2DM2) which solves problems with the same structure as ADMM. |
55 | Deep Computational Phenotyping | Zhengping Che, David Kale, Wenzhe Li, Mohammad Taha Bahadori, Yan Liu | We propose two novel modifications to standard neural net training that address challenges and exploit properties that are peculiar, if not exclusive, to medical data. |
56 | Leveraging Social Context for Modeling Topic Evolution | Janani Kalyanam, Amin Mantrach, Diego Saez-Trumper, Hossein Vahabi, Gert Lanckriet | In particular, our goal is to both qualitatively and quantitatively analyze when social context actually helps with TDE. |
57 | Scalable Blocking for Privacy Preserving Record Linkage | Alexandros Karakasidis, Georgia Koloniari, Vassilios S. Verykios | To this end, we propose Multi-Sampling Transitive Closure for Encrypted Fields (MS-TCEF), a novel privacy preserving blocking technique based on the use of reference sets. |
58 | Real Time Recommendations from Connoisseurs | Noriaki Kawamae | In this paper, we set the goal of real time recommendation, to present these items instantly. |
59 | Towards Decision Support and Goal Achievement: Identifying Action-Outcome Relationships From Social Media | Emre Kıcıman, Matthew Richardson | In this paper, we investigate the feasibility of mining the relationship between actions and their outcomes from the aggregated timelines of individuals posting experiential microblog reports. |
60 | On Estimating the Swapping Rate for Categorical Data | Daniel Kifer | In this paper, we consider the problem of inferring such parameters from the data. |
61 | Simultaneous Discovery of Common and Discriminative Topics via Joint Nonnegative Matrix Factorization | Hannah Kim, Jaegul Choo, Jingu Kim, Chandan K. Reddy, Haesun Park | To address such needs, this paper presents a novel topic modeling method based on joint nonnegative matrix factorization, which simultaneously discovers common as well as discriminative topics given multiple document sets. |
62 | A Decision Tree Framework for Spatiotemporal Sequence Prediction | Taehwan Kim, Yisong Yue, Sarah Taylor, Iain Matthews | We present a decision tree framework for learning an accurate non-parametric spatiotemporal sequence predictor. |
63 | TOPTRAC: Topical Trajectory Pattern Mining | Younghoon Kim, Jiawei Han, Cangzhou Yuan | In this paper, we present a latent topic-based clustering algorithm to discover patterns in the trajectories of geo-tagged text messages. |
64 | From Group to Individual Labels Using Deep Features | Dimitrios Kotzias, Misha Denil, Nando de Freitas, Padhraic Smyth | In this paper we focus on the problem of learning classifiers to make predictions at the instance level. |
65 | VEWS: A Wikipedia Vandal Early Warning System | Srijan Kumar, Francesca Spezzano, V.S. Subrahmanian | We describe specific behaviors that distinguish between vandals and non-vandals. We leverage multiple classical ML approaches, but develop 3 novel sets of features. |
66 | Unified and Contrasting Cuts in Multiple Graphs: Application to Medical Imaging Segmentation | Chia-Tung Kuo, Xiang Wang, Peter Walker, Owen Carmichael, Jieping Ye, Ian Davidson | In this paper we study two such questions: i) For a collection of graphs find a single cut that is good for all the graphs and ii) For two collections of graphs find a single cut that is good for one collection but poor for the other. |
67 | Reducing the Unlabeled Sample Complexity of Semi-Supervised Multi-View Learning | Chao Lan, Jun Huan | In this paper, we improve the state-of-art u.s.c. from O(1/ε) to O(log 1/ε) for small error ε, under mild conditions. |
68 | Maximum Likelihood Postprocessing for Differential Privacy under Consistency Constraints | Jaewoo Lee, Yue Wang, Daniel Kifer | In this paper, to further improve accuracy, we formulate this post-processing step as a constrained maximum likelihood estimation problem, which is equivalent to constrained L1 minimization. |
69 | Online Influence Maximization | Siyu Lei, Silviu Maniu, Luyi Mo, Reynold Cheng, Pierre Senellart | In this paper, we study IM in the absence of complete information on influence probability. |
70 | The Child is Father of the Man: Foresee the Success at the Early Stage | Liangyue Li, Hanghang Tong | In this paper, we propose a joint predictive model to forecast the long-term scientific impact at the early stage, which simultaneously addresses a number of these open challenges, including the scholarly feature design, the non-linearity, the domain-heterogeneity and dynamics. |
71 | 0-Bit Consistent Weighted Sampling | Ping Li | We provide a simple solution by discarding t* (which we refer to as the "0-bit" scheme). |
72 | On the Discovery of Evolving Truth | Yaliang Li, Qi Li, Jing Gao, Lu Su, Bo Zhao, Wei Fan, Jiawei Han | To address this problem, we investigate the temporal relations among both object truths and source reliability, and propose an incremental truth discovery framework that can dynamically update object truths and source weights upon the arrival of new data. |
73 | MASCOT: Memory-efficient and Accurate Sampling for Counting Local Triangles in Graph Streams | Yongsub Lim, U Kang | In this paper, we propose MASCOT, a memory-efficient and accurate method for local triangle estimation in a graph stream based on edge sampling. |
74 | A Learning-based Framework to Handle Multi-round Multi-party Influence Maximization on Social Networks | Su-Chen Lin, Shou-De Lin, Ming-Syan Chen | Considering nowadays companies providing similar products or services compete with each other for resources and customers, this work proposes a learning-based framework to tackle the multi-round competitive influence maximization problem on a social network. |
75 | Temporal Phenotyping from Longitudinal Electronic Health Records: A Graph Based Framework | Chuanren Liu, Fei Wang, Jianying Hu, Hui Xiong | To address this challenge, in this paper, we develop a novel representation, namely the temporal graph, for such event sequences. |
76 | Spectral Ensemble Clustering | Hongfu Liu, Tongliang Liu, Junjie Wu, Dacheng Tao, Yun Fu | We therefore propose SEC, an efficient Spectral Ensemble Clustering method based on co-association matrix. |
77 | Fast and Memory-Efficient Significant Pattern Mining via Permutation Testing | Felipe Llinares-López, Mahito Sugiyama, Laetitia Papaxanthos, Karsten Borgwardt | We present a novel algorithm for significant pattern mining, Westfall-Young light. |
78 | Influence at Scale: Distributed Computation of Complex Contagion in Networks | Brendan Lucier, Joel Oren, Yaron Singer | We describe a novel sampling approach that can be used to design scalable algorithms with provable performance guarantees. |
79 | FaitCrowd: Fine Grained Truth Discovery for Crowdsourced Data Aggregation | Fenglong Ma, Yaliang Li, Qi Li, Minghui Qiu, Jing Gao, Shi Zhi, Lu Su, Bo Zhao, Heng Ji, Jiawei Han | To capture various expertise levels on different topics, we propose FaitCrowd, a fine grained truth discovery model for the task of aggregating conflicting data collected from multiple users/sources. |
80 | Algorithmic Cartography: Placing Points of Interest and Ads on Maps | Mohammad Mahdian, Okke Schrijvers, Sergei Vassilvitskii | We present simple, approximately optimal selection algorithms, coupled with incentive compatible pricing schemes in case of advertiser supplied points of interest. |
81 | Dimensionality Reduction Via Graph Structure Learning | Qi Mao, Li Wang, Steve Goodison, Yijun Sun | We present a new dimensionality reduction setting for a large family of real-world problems. |
82 | Robust Treecode Approximation for Kernel Machines | William B. March, Bo Xiao, Sameer Tharakan, Chenhan D. Yu, George Biros | We present a theoretical error analysis of our treecode and relate it to the error of Nystrom methods. |
83 | Inferring Networks of Substitutable and Complementary Products | Julian McAuley, Rahul Pandey, Jure Leskovec | Our goal in this paper is to learn the semantics of substitutes and complements from the text of online reviews. |
84 | Data-Driven Activity Prediction: Algorithms, Evaluation Methodology, and Applications | Bryan Minor, Janardhan Rao Doppa, Diane J. Cook | In this paper, we make three main contributions. |
85 | Scalable Large Near-Clique Detection in Large-Scale Networks via Sampling | Michael Mitzenmacher, Jakub Pachocki, Richard Peng, Charalampos Tsourakakis, Shen Chen Xu | In this paper we focus on a family of poly-time solvable formulations, known as the k-clique densest subgraph problem (k-Clique-DSP) [57]. |
86 | Graph Query Reformulation with Diversity | Davide Mottin, Francesco Bonchi, Francesco Gullo | We study a problem of graph-query reformulation enabling explorative query-driven discovery in graph databases. |
87 | Flexible and Robust Multi-Network Clustering | Jingchao Ni, Hanghang Tong, Wei Fan, Xiang Zhang | In this paper, we propose a flexible and robust framework that allows multiple underlying clustering structures across different networks. |
88 | Extreme States Distribution Decomposition Method for Search Engine Online Evaluation | Kirill Nikolaev, Alexey Drutsa, Ekaterina Gladkikh, Alexander Ulianov, Gleb Gusev, Pavel Serdyukov | We provide a thorough theoretical analysis of our approach and show experimentally that, other things being equal, it produces more sensitive OEC than the average. |
89 | Simultaneous Modeling of Multiple Diseases for Mortality Prediction in Acute Hospital Care | Nozomi Nori, Hisashi Kashima, Kazuto Yamashita, Hiroshi Ikai, Yuichi Imanaka | In this paper, we incorporate disease-specific contexts into mortality modeling by formulating the mortality prediction problem as a multi-task learning problem in which a task corresponds to a disease. |
90 | Fast and Robust Parallel SGD Matrix Factorization | Jinoh Oh, Wook-Shin Han, Hwanjo Yu, Xiaoqian Jiang | This paper proposes a fast and robust parallel SGD matrix factorization algorithm, called MLGF-MF, which is robust to skewed matrices and runs efficiently on block-storage devices (e.g., SSD disks) as well as shared-memory. |
91 | Efficient PageRank Tracking in Evolving Networks | Naoto Ohsaka, Takanori Maehara, Ken-ichi Kawarabayashi | In this paper, we propose an efficient online algorithm for tracking personalized PageRank in an evolving network. |
92 | Quick Sensitivity Analysis for Incremental Data Modification and Its Application to Leave-one-out CV in Linear Classification Problems | Shota Okumura, Yoshiki Suzuki, Ichiro Takeuchi | We introduce a novel sensitivity analysis framework for large scale classification problems that can be used when a small number of instances are incrementally added or removed. |
93 | Non-transitive Hashing with Latent Similarity Components | Mingdong Ou, Peng Cui, Fei Wang, Jun Wang, Wenwu Zhu | In this paper, we propose a non-transitive hashing method, namely Multi-Component Hashing (MuCH), to identify the latent similarity components to cope with the non-transitive similarity relationships. |
94 | Optimal Kernel Group Transformation for Exploratory Regression Analysis and Graphics | Pan Chao, Qiming Huang, Michael Zhu | In this article, we propose to use optimal group transformations as a general approach for exploring the relationship between Y and X. |
95 | Discovering and Exploiting Deterministic Label Relationships in Multi-Label Learning | Christina Papagiannopoulou, Grigorios Tsoumakas, Ioannis Tsamardinos | This work presents a probabilistic method for enforcing adherence of the marginal probabilities of a multi-label model to automatically discovered deterministic relationships among labels. |
96 | Subspace Clustering Using Log-determinant Rank Approximation | Chong Peng, Zhao Kang, Huiqing Li, Qiang Cheng | We apply the method of augmented Lagrangian multipliers to optimize this non-convex rank approximation-based objective function and obtain closed-form solutions for all subproblems of minimizing different variables alternatively. |
97 | A PCA-Based Change Detection Framework for Multidimensional Data Streams: Change Detection in Multidimensional Data Streams | Abdulhakim A. Qahtan, Basma Alharbi, Suojin Wang, Xiangliang Zhang | In this paper, we propose a framework for detecting changes in multidimensional data streams based on principal component analysis, which is used for projecting data into a lower dimensional space, thus facilitating density estimation and change-score calculations. |
98 | State-Driven Dynamic Sensor Selection and Prediction with State-Stacked Sparseness | Guo-Jun Qi, Charu Aggarwal, Deepak Turaga, Daby Sow, Phil Anno | We introduce the notion of state-stacked sparseness to select a subset of the most critical sensors as a function of evolving system state. |
99 | SCRAM: A Sharing Considered Route Assignment Mechanism for Fair Taxi Route Recommendations | Shiyou Qian, Jian Cao, Frédéric Le Mouël, Issam Sahel, Minglu Li | In the paper, we propose SCRAM, a sharing considered route assignment mechanism for fair taxi route recommendations. |
100 | Locally Densest Subgraph Discovery | Lu Qin, Rong-Hua Li, Lijun Chang, Chengqi Zhang | In this paper, we aim to discover top-k such representative locally densest subgraphs of a graph. |
101 | Virus Propagation in Multiple Profile Networks | Angeliki Rapti, Spyros Sioutas, Kostas Tsichlas, Giannis Tzimas | Can we predict what proportion of the network will actually get "infected" (e.g., spread the idea or buy the competing product), when the nodes of the network appear to have different sensitivity based on their profile? |
102 | Collective Opinion Spam Detection: Bridging Review Networks and Metadata | Shebuti Rayana, Leman Akoglu | In this work, we propose a new holistic approach called SPEAGLE that utilizes clues from all metadata (text, timestamp, rating) as well as relational data (network), and harness them collectively under a unified framework to spot suspicious users and reviews, as well as products targeted by spam. |
103 | ClusType: Effective Entity Recognition and Typing by Relation Phrase-Based Clustering | Xiang Ren, Ahmed El-Kishky, Chi Wang, Fangbo Tao, Clare R. Voss, Jiawei Han | In this paper, we investigate entity recognition (ER) with distant-supervision and propose a novel relation phrase-based ER framework, called ClusType, that runs data-driven phrase mining to generate entity mention candidates and relation phrases, and enforces the principle that relation phrases should be softly clustered when propagating type information between their argument entities. |
104 | Mining Frequent Itemsets through Progressive Sampling with Rademacher Averages | Matteo Riondato, Eli Upfal | We present an algorithm to extract an high-quality approximation of the (top-k) Frequent itemsets (FIs) from random samples of a transactional dataset. |
105 | Why It Happened: Identifying and Modeling the Reasons of the Happening of Social Events | Yu Rong, Hong Cheng, Zhiyu Mo | Many models have been proposed to explain how information diffuses. |
106 | Matrix Completion with Queries | Natali Ruchansky, Mark Crovella, Evimaria Terzi | In this work, we address this problem by proposing an active version of matrix completion, where queries can be made to the true underlying matrix. |
107 | Stochastic Divergence Minimization for Online Collapsed Variational Bayes Zero Inference of Latent Dirichlet Allocation | Issei Sato, Hiroshi Nakagawa | We reformulate the existing SCVB0 inference by using the stochastic divergence minimization algorithm, with which convergence can be analyzed in terms of Martingale convergence theory. |
108 | Bayesian Poisson Tensor Factorization for Inferring Multilateral Relations from Sparse Dyadic Event Counts | Aaron Schein, John Paisley, David M. Blei, Hanna Wallach | We present a Bayesian tensor factorization model for inferring latent group structures from dynamic pairwise interaction patterns. |
109 | TimeCrunch: Interpretable Dynamic Graph Summarization | Neil Shah, Danai Koutra, Tianmin Zou, Brian Gallagher, Christos Faloutsos | Our main contributions are (a) formulation: we show how to formalize this problem as minimizing the encoding cost in a data compression paradigm, (b) algorithm: we propose TIMECRUNCH, an effective, scalable and parameter-free method for finding coherent, temporal patterns in dynamic graphs and (c) practicality: we apply our method to several large, diverse real-world datasets with up to 36 million edges and 6.3 million nodes. |
110 | Inside Jokes: Identifying Humorous Cartoon Captions | Dafna Shahaf, Eric Horvitz, Robert Mankoff | Motivated by the prospect of creating computational models of humor, we study the influence of the language of cartoon captions on the perceived humorousness of the cartoons. |
111 | Community Detection based on Distance Dynamics | Junming Shao, Zhichao Han, Qinli Yang, Tao Zhou | In this paper, we introduce a new community detection algorithm, called Attractor, which automatically spots communities in a network by examining the changes of "distances" among nodes (i.e. distance dynamics). |
112 | Discovery of Meaningful Rules in Time Series | Mohammad Shokoohi-Yekta, Yanping Chen, Bilson Campana, Bing Hu, Jesin Zakaria, Eamonn Keogh | In this work, we show why these ideas are not directly suitable for rule discovery in time series. |
113 | An Evaluation of Parallel Eccentricity Estimation Algorithms on Undirected Real-World Graphs | Julian Shun | This paper presents efficient shared-memory parallel implementations and the first comprehensive experimental study of graph eccentricity estimation algorithms in the literature. |
114 | Efficient Latent Link Recommendation in Signed Networks | Dongjin Song, David A. Meyer, Dacheng Tao | Since GAUC weights each pairwise comparison equally and the calculation of GAUC requires quadratic time, we derive two lower bounds of GAUC which can be computed in linear time and put more emphasis on ranking positive links on the top and negative links at the bottom of a ranking list. |
115 | Turn Waste into Wealth: On Simultaneous Clustering and Cleaning over Dirty Data | Shaoxu Song, Chunping Li, Xiaoquan Zhang | To this end, we study a novel problem of clustering and repairing over dirty data at the same time. |
116 | Set Cover at Web Scale | Stergios Stergiou, Kostas Tsioutsiouliklis | In this work we give the first MapReduce Set Cover algorithm that scales to problem sizes of ∼ 1 trillion elements and runs in log_p Δ iterations for a nearly optimum approximation ratio of p ln Δ, where Δ is the cardinality of the largest set in F. A web crawler is a system for bulk downloading of web pages. |
117 | Exploiting Relevance Feedback in Knowledge Graph Search | Yu Su, Shengqi Yang, Huan Sun, Mudhakar Srivatsa, Sue Kase, Michelle Vanni, Xifeng Yan | In this paper, we study how to improve graph query by relevance feedback. |
118 | LINKAGE: An Approach for Comprehensive Risk Prediction for Care Management | Zhaonan Sun, Fei Wang, Jianying Hu | In this paper, we propose a data-driven comprehensive risk prediction method, named LINKAGE, which can be used to jointly assess a set of associated risks in support of holistic care management. |
119 | Transitive Transfer Learning | Ben Tan, Yangqiu Song, Erheng Zhong, Qiang Yang | To solve the TTL problem, we propose a learning framework to mimic the human learning process. |
120 | PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Networks | Jian Tang, Meng Qu, Qiaozhu Mei | In this paper, we fill this gap by proposing a semi-supervised representation learning method for text data, which we call the predictive text embedding (PTE). |
121 | An Effective Marketing Strategy for Revenue Maximization with a Quantity Constraint | Ya-Wen Teng, Chih-Hua Tai, Philip S. Yu, Ming-Syan Chen | To fill this gap, in this paper, we aim to maximize the revenue by considering the quantity constraint on the promoted commodity. |
122 | Scaling Up Stochastic Dual Coordinate Ascent | Kenneth Tran, Saghar Hosseini, Lin Xiao, Thomas Finley, Mikhail Bilenko | In this paper, we introduce an asynchronous parallel version of the algorithm, analyze its convergence properties, and propose a solution for primal-dual synchronization required to achieve convergence in practice. |
123 | Discovering Valuable Items from Massive Data | Hastagiri P. Vanchinathan, Andreas Marfurt, Charles-Antoine Robelin, Donald Kossmann, Andreas Krause | We present an algorithm, GP-SELECT, which utilizes prior knowledge about similarity between items, expressed as a kernel function. |
124 | Deep Learning Architecture with Dynamically Programmed Layers for Brain Connectome Prediction | Vivek Veeriah, Rohit Durvasula, Guo-Jun Qi | It is critical in the research for epilepsy and other neuropathological diseases. |
125 | Incorporating World Knowledge to Document Clustering via Heterogeneous Information Networks | Chenguang Wang, Yangqiu Song, Ahmed El-Kishky, Dan Roth, Ming Zhang, Jiawei Han | In this paper, we provide an example of using world knowledge for domain dependent document clustering. |
126 | Towards Interactive Construction of Topical Hierarchy: A Recursive Tensor Decomposition Approach | Chi Wang, Xueqing Liu, Yanglei Song, Jiawei Han | In this study, we propose a novel method, called STROD, that allows efficient and consistent modification of topic hierarchies, based on a recursive generative model and a scalable tensor decomposition inference algorithm with theoretical performance guarantee. |
127 | Collaborative Deep Learning for Recommender Systems | Hao Wang, Naiyan Wang, Dit-Yan Yeung | To address this problem, we generalize recent advances in deep learning from i.i.d. input to non-i.i.d. (CF-based) input and propose in this paper a hierarchical Bayesian model called collaborative deep learning (CDL), which jointly performs deep representation learning for the content information and collaborative filtering for the ratings (feedback) matrix. |
128 | Trading Interpretability for Accuracy: Oblique Treed Sparse Additive Models | Jialei Wang, Ryohei Fujimaki, Yosuke Motohashi | This paper proposes oblique treed sparse additive models (OT-SpAMs). |
129 | Geo-SAGE: A Geographical Sparse Additive Generative Model for Spatial Item Recommendation | Weiqing Wang, Hongzhi Yin, Ling Chen, Yizhou Sun, Shazia Sadiq, Xiaofang Zhou | In light of this, we propose Geo-SAGE, a geographical sparse additive generative model for spatial item recommendation in this paper. |
130 | Rubik: Knowledge Guided Tensor Factorization and Completion for Health Data Analytics | Yichen Wang, Robert Chen, Joydeep Ghosh, Joshua C. Denny, Abel Kho, You Chen, Bradley A. Malin, Jimeng Sun | We propose Rubik, a constrained non-negative tensor factorization and completion method for phenotyping. |
131 | Regularity and Conformity: Location Prediction Using Heterogeneous Mobility Data | Yingzi Wang, Nicholas Jing Yuan, Defu Lian, Linli Xu, Xing Xie, Enhong Chen, Yong Rui | To address these challenges, in this paper we propose a hybrid predictive model integrating both the regularity and conformity of human mobility as well as their mutual reinforcement. |
132 | Dynamic Poisson Autoregression for Influenza-Like-Illness Case Count Prediction | Zheng Wang, Prithwish Chakraborty, Sumiko R. Mekaru, John S. Brownstein, Jieping Ye, Naren Ramakrishnan | In this paper, we focus on short-term ILI case count prediction and develop a dynamic Poisson autoregressive model with exogenous input variables (DPARX) for flu forecasting. |
133 | Cinema Data Mining: The Smell of Fear | Jörg Wicker, Nicolas Krauter, Bettina Derstorff, Christof Stönner, Efstratios Bourtsoukidis, Thomas Klüpfel, Jonathan Williams, Stefan Kramer | The paper introduces a new field of application for data mining, where trace gas responses of people reacting on-line to films shown in cinemas (or movie theaters) are related to the semantic content of the films themselves. |
134 | Predicting Winning Price in Real Time Bidding with Censored Data | Wush Chi-Hsuan Wu, Mi-Yen Yeh, Ming-Syan Chen | We propose to leverage the machine learning and statistical methods to train the winning price model from the bidding history. |
135 | Diversifying Restricted Boltzmann Machine for Document Modeling | Pengtao Xie, Yuntian Deng, Eric Xing | To solve this problem, we propose Diversified RBM (DRBM) which diversifies the hidden units, to make them cover not only the dominant topics, but also those in the long-tail region. |
136 | Edge-Weighted Personalized PageRank: Breaking A Decade-Old Performance Barrier | Wenlei Xie, David Bindel, Alan Demers, Johannes Gehrke | In this paper, we describe the first fast algorithm for computing PageRank on general graphs when the edge weights are personalized. |
137 | Petuum: A New Platform for Distributed Machine Learning on Big Data | Eric P. Xing, Qirong Ho, Wei Dai, Jin-Kyu Kim, Jinliang Wei, Seunghak Lee, Xun Zheng, Pengtao Xie, Abhimanu Kumar, Yaoliang Yu | We propose a general-purpose framework that systematically addresses data- and model-parallel challenges in large-scale ML, by leveraging several fundamental properties underlying ML programs that make them different from conventional operation-centric programs: error tolerance, dynamic structure, and nonuniform convergence; all stem from the optimization-centric nature shared in ML programs’ mathematical definitions, and the iterative-convergent behavior of their algorithmic solutions. |
138 | Longitudinal LASSO: Jointly Learning Features and Temporal Contingency for Outcome Prediction | Tingyang Xu, Jiangwen Sun, Jinbo Bi | We propose an approach to automatically and simultaneously determine both the relevant features and the relevant temporal points that impact the current outcome of the dependent variable. |
139 | Performance Modeling and Scalability Optimization of Distributed Deep Learning Systems | Feng Yan, Olatunji Ruwase, Yuxiong He, Trishul Chilimbi | This paper develops performance models that quantify the impact of these partitioning and provisioning decisions on overall distributed system performance and scalability. |
140 | Deep Graph Kernels | Pinar Yanardag, S.V.N. Vishwanathan | In this paper, we present Deep Graph Kernels, a unified framework to learn latent representations of sub-structures for graphs, inspired by latest advancements in language modeling and deep learning. |
141 | Model Multiple Heterogeneity via Hierarchical Multi-Latent Space Learning | Pei Yang, Jingrui He | To address this problem, we propose a Hierarchical Multi-Latent Space (HiMLS) learning approach to jointly model the triple types of heterogeneity. |
142 | Structural Graphical Lasso for Learning Mouse Brain Connectivity | Sen Yang, Qian Sun, Shuiwang Ji, Peter Wonka, Ian Davidson, Jieping Ye | Motivated by the hierarchical structure of the brain networks, we consider the problem of estimating a graphical model with tree-structural regularization in this paper. |
143 | Entity Matching across Heterogeneous Sources | Yang Yang, Yizhou Sun, Jie Tang, Bo Ma, Juanzi Li | In this paper, we formalize the problem as entity matching across heterogeneous sources and propose a probabilistic topic model to solve the problem. |
144 | An Efficient Semi-Supervised Clustering Algorithm with Sequential Constraints | Jinfeng Yi, Lijun Zhang, Tianbao Yang, Wei Liu, Jun Wang | To address this challenge, we propose an efficient dynamic semi-supervised clustering framework that casts the clustering problem into a search problem over a feasible convex set, i.e., a convex hull with its extreme points being an ensemble of m data partitions. |
145 | Assembler: Efficient Discovery of Spatial Co-evolving Patterns in Massive Geo-sensory Data | Chao Zhang, Yu Zheng, Xiuli Ma, Jiawei Han | In this paper, we propose a two-stage method called Assembler. |
146 | Dynamic Topic Modeling for Monitoring Market Competition from Online Text and Image Data | Hao Zhang, Gunhee Kim, Eric P. Xing | We propose a dynamic topic model for monitoring temporal evolution of market competition by jointly leveraging tweets and their associated images. |
147 | Organizational Chart Inference | Jiawei Zhang, Philip S. Yu, Yuanhua Lv | In this paper, we study the IOC (Inference of Organizational Chart) problem: identifying a company's internal organizational chart based on the heterogeneous online ESN launched in it. |
148 | Panther: Fast Top-k Similarity Search on Large Networks | Jing Zhang, Jie Tang, Cong Ma, Hanghang Tong, Yu Jing, Juanzi Li | In this paper, we propose a sampling method that provably and accurately estimates the similarity between vertices. |
149 | A Collective Bayesian Poisson Factorization Model for Cold-start Local Event Recommendation | Wei Zhang, Jianyong Wang | In this work, we address the new problem of cold-start local event recommendation in EBSNs. |
150 | Statistical Arbitrage Mining for Display Advertising | Weinan Zhang, Jun Wang | In this paper, we propose a novel data mining paradigm called Statistical Arbitrage Mining (SAM) focusing on mining and exploiting price discrepancies between two pricing schemes. |
151 | Deep Model Based Transfer and Multi-Task Learning for Biological Image Analysis | Wenlu Zhang, Rongjian Li, Tao Zeng, Qian Sun, Sudhir Kumar, Jieping Ye, Shuiwang Ji | Here, we developed problem-independent feature extraction methods to generate hierarchical representations for ISH images. |
152 | COSNET: Connecting Heterogeneous Social Networks with Local and Global Consistency | Yutao Zhang, Jie Tang, Zhilin Yang, Jian Pei, Philip S. Yu | In this paper, we propose COSNET (COnnecting heterogeneous Social NETworks with local and global consistency), a novel energy-based model, to address this problem by considering both local and global consistency among multiple networks. |
153 | SAME but Different: Fast and High Quality Gibbs Parameter Estimation | Huasha Zhao, Biye Jiang, John F. Canny, Bobby Jaros | In this paper we explore the application of SAME to graphical model inference on modern hardware. |
154 | Multi-Task Learning for Spatio-Temporal Event Forecasting | Liang Zhao, Qian Sun, Jieping Ye, Feng Chen, Chang-Tien Lu, Naren Ramakrishnan | This paper proposes a novel multi-task learning framework which aims to concurrently address all the challenges. |
155 | SEISMIC: A Self-Exciting Point Process Model for Predicting Tweet Popularity | Qingyuan Zhao, Murat A. Erdogdu, Hera Y. He, Anand Rajaraman, Jure Leskovec | In this paper, we focus on predicting the final number of reshares of a given post. |
156 | Linear Time Samplers for Supervised Topic Models using Compositional Proposals | Xun Zheng, Yaoliang Yu, Eric P. Xing | In this work we extend the recent sampling advances for unsupervised LDA models to supervised tasks. |
157 | L∞ Error and Bandwidth Selection for Kernel Density Estimates of Large Data | Yan Zheng, Jeff M. Phillips | In this paper we investigate the challenges in using L∞ (or worst case) error, a stronger measure than L1 or L2. |
158 | Modeling Truth Existence in Truth Discovery | Shi Zhi, Bo Zhao, Wenzhu Tong, Jing Gao, Dian Yu, Heng Ji, Jiawei Han | By incorporating these three measures, we propose a probabilistic graphical model, which simultaneously infers truth as well as source quality without any a priori training involving ground truth answers. |
159 | Cuckoo Linear Algebra | Li Zhou, David G. Andersen, Mu Li, Alexander J. Smola | In this paper we present a novel data structure for sparse vectors based on Cuckoo hashing. |
160 | Integrating Vertex-centric Clustering with Edge-centric Clustering for Meta Path Graph Analysis | Yang Zhou, Ling Liu, David Buttler | This paper presents a meta path graph clustering framework, VEPATHCLUSTER, that combines meta path vertex-centric clustering with meta path edge-centric clustering for improving the clustering quality of heterogeneous networks. |
161 | Modeling User Mobility for Location Promotion in Location-based Social Networks | Wen-Yuan Zhu, Wen-Chih Peng, Ling-Jyh Chen, Kai Zheng, Xiaofang Zhou | In this paper, we investigate the key techniques that can help businesses promote their locations by advertising wisely through the underlying LBSNs. |
162 | Co-Clustering based Dual Prediction for Cargo Pricing Optimization | Yada Zhu, Hongxia Yang, Jingrui He | In particular, we propose a probabilistic framework to simultaneously construct dual predictive models and uncover the co-clusters of originations and destinations. |
163 | Debiasing Crowdsourced Batches | Honglei Zhuang, Aditya Parameswaran, Dan Roth, Jiawei Han | In this paper, we study the data annotation bias when data items are presented as batches to be judged by workers simultaneously. |
164 | Query Workloads for Data Series Indexes | Kostas Zoumpatianos, Yin Lou, Themis Palpanas, Johannes Gehrke | In this work, we show that random workloads are inherently not suitable for the task at hand and we argue that there is a need for carefully generating a query workload. |
165 | Scaling Machine Learning and Statistics for Web Applications | Deepak Agarwal | I will provide an overview of these challenges and the strategies we have adopted at LinkedIn to address those. |
166 | Hadoop’s Impact on the Future of Data Management | Amr Awadallah | Hadoop’s Impact on the Future of Data Management |
167 | Should You Trust Your Money to a Robot? | Vasant Dhar | Should You Trust Your Money to a Robot? |
168 | Data Science at Visa | Waqar Hasan, Min Wang | We will describe technical achievements we have made in the area of fraud and cover some open challenges in data science. |
169 | How Artificial Intelligence and Big Data Created Rocket Fuel: A Case Study | George John | The case study presentation will present a fast-paced overview of the business and technology context for Rocket Fuel at inception and at present, key learnings and decisions, and the road ahead. |
170 | Optimizing Marketing Impact through Data Driven Decisioning | Anil Kamath | In this talk we will show how data science and optimization techniques can be applied to cross channel data to attribute marketing effectiveness, drive media planning and real-time optimization of campaigns. |
171 | Powering Real-time Decision Engines in Finance and Healthcare using Open Source Software | Bassel Ojjeh | This presentation covers how, in collaboration with financial services and healthcare institutions, we built an OSS project to deliver a real-time decisioning engine for their respective applications. |
172 | Clouded Intelligence | Joseph Sirosh | In this talk I will review what these trends mean for the future of data science and show examples of revolutionary applications that you can build using cloud platforms. |
173 | Data Science from the Lab to the Field to the Enterprise | Christopher White | This presentation will cover previous work at DARPA, experience building real-world applications for defense and law enforcement to analyze data, and the future of computer science as an enabler for content discovery, information extraction, relevance determination, and information visualization. |
174 | User Modeling in Telecommunications and Internet Industry | Qiang Yang | What are the "pain" points of users? In this talk, I will discuss my own experience on user modeling with big data. |
175 | The Effectiveness of Marketing Strategies in Social Media: Evidence from Promotional Events | Panagiotis Adamopoulos, Vilma Todri | We use a real-world data set and employ a promising research approach combining econometric with predictive modeling techniques in a causal estimation framework that allows for more accurate counterfactuals. |
176 | Personalizing LinkedIn Feed | Deepak Agarwal, Bee-Chung Chen, Qi He, Zhenhao Hua, Guy Lebanon, Yiming Ma, Pannagadatta Shivaswamy, Hsiao-Ping Tseng, Jaewon Yang, Liang Zhang | More specifically, we focus on the personalization models by generating three kinds of affinity scores: Viewer-ActivityType Affinity, Viewer-Actor Affinity, and Viewer-Actor-ActivityType Affinity. |
177 | Whither Social Networks for Web Search? | Rakesh Agrawal, Behzad Golshan, Evangelos Papalexakis | We present the results of our empirical study that indicates that by mining Twitter data one can obtain search results that are quite distinct from those produced by Google and Bing. |
178 | Exploiting Data Mining for Authenticity Assessment and Protection of High-Quality Italian Wines from Piedmont | Marco Arlorio, Jean Daniel Coisson, Giorgio Leonardi, Monica Locatelli, Luigi Portinale | Following Wagstaff’s proposal for practical exploitation of machine learning (and data mining) approaches, we describe how data have been collected and prepared for the production of different datasets, how suitable classification models have been identified and how the interpretation of the results suggests the emergence of an active role of classification techniques, based on standard chemical profiling, for the assessment of the authenticity of the wines targeted by the study. |
179 | Predictive Approaches for Low-Cost Preventive Medicine Program in Developing Countries | Yukino Baba, Hisashi Kashima, Yasunobu Nohara, Eiko Kai, Partha Ghosh, Rafiqul Islam, Ashir Ahmed, Masahiro Kuroda, Sozo Inoue, Tatsuo Hiramatsu, Michio Kimura, Shuji Shimizu, Kunihisa Kobayashi, Koji Tsuda, Masashi Sugiyama, Mathieu Blondel, Naonori Ueda, Masaru Kitsuregawa, Naoki Nakashima | In this study, we investigate predictive modeling for providing a low-cost preventive medicine program. In our two-year-long field study in Bangladesh, we collected the health checkup results of 15,075 subjects, the data of 6,607 prescriptions, and the follow-up examination results of 2,109 subjects. |
180 | Dynamic Hierarchical Classification for Patient Risk-of-Readmission | Senjuti Basu Roy, Ankur Teredesai, Kiyana Zolfaghar, Rui Liu, David Hazel, Stacey Newman, Albert Marinez | In this paper, we describe a supervised learning framework, Dynamic Hierarchical Classification (DHC), for patient risk-of-readmission prediction. |
181 | ALOJA-ML: A Framework for Automating Characterization and Knowledge Discovery in Hadoop Deployments | Josep Lluís Berral, Nicolas Poggi, David Carrera, Aaron Call, Rob Reinauer, Daron Green | This article presents ALOJA-Machine Learning (ALOJA-ML), an extension to the ALOJA project that uses machine learning techniques to interpret Hadoop benchmark performance data and performance tuning; here we detail the approach, efficacy of the model and initial results. |
182 | Multi-View Incident Ticket Clustering for Optimal Ticket Dispatching | Mirela Madalina Botezatu, Jasmina Bogojeska, Ioana Giurgiu, Hagen Voelzer, Dorothea Wiesmann | We present a novel technique that optimizes the dispatching of incident tickets to the agents in an IT Service Support Environment. |
183 | Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission | Rich Caruana, Yin Lou, Johannes Gehrke, Paul Koch, Marc Sturm, Noemie Elhadad | In the 30-day hospital readmission case study, we show that the same methods scale to large datasets containing hundreds of thousands of patients and thousands of attributes while remaining intelligible and providing accuracy comparable to the best (unintelligible) machine learning methods. |
184 | User Conditional Hashtag Prediction for Images | Emily Denton, Jason Weston, Manohar Paluri, Lubomir Bourdev, Rob Fergus | We explore two ways of combining these heterogeneous features into a learning framework: (i) simple concatenation; and (ii) a 3-way multiplicative gating, where the image model is conditioned on the user metadata. |
185 | Big Data System for Analyzing Risky Procurement Entities | Amit Dhurandhar, Bruce Graves, Rajesh Ravi, Gopikrishanan Maniachari, Markus Ettl | In this paper, we describe a robust tool to identify procurement related fraud/risk, though the general design and the analytical components could be adapted to detecting fraud in other domains. |
186 | Probabilistic Modeling of a Sales Funnel to Prioritize Leads | Brendan Andrew Duncan, Charles Peter Elkan | Specifically, we present two models, called DQM for direct qualification model and FFM for full funnel model, that can be used to rank initial leads based on their probability of conversion to a sales opportunity, probability of successful sale, and/or expected revenue. |
187 | Online Topic-based Social Influence Analysis for the Wimbledon Championships | Varun R. Embar, Indrajit Bhattacharya, Vinayaka Pandit, Roman Vaculin | In this paper, we define various functional and usability criteria that social influence scores should satisfy, and propose a multi-dimensional definition of influence that satisfies these criteria. |
188 | Collective Spammer Detection in Evolving Multi-Relational Social Networks | Shobeir Fakhraei, James Foulds, Madhusudana Shashanka, Lise Getoor | Motivated by the Tagged.com social network, we develop methods to identify spammers in evolving multi-relational social networks. |
189 | Utilizing Text Mining on Online Medical Forums to Predict Label Change due to Adverse Drug Reactions | Ronen Feldman, Oded Netzer, Aviv Peretz, Binyamin Rosenfeld | We present an end-to-end text mining methodology for relation extraction of adverse drug reactions (ADRs) from medical forums on the Web. |
190 | One-Pass Ranking Models for Low-Latency Product Recommendations | Antonino Freno, Martin Saveski, Rodolphe Jenatton, Cédric Archambeau | In this paper, we investigate how the practical challenges faced in this setting can be tackled via an online learning to rank approach. |
191 | On the Reliability of Profile Matching Across Large Online Social Networks | Oana Goga, Patrick Loiseau, Robin Sommer, Renata Teixeira, Krishna P. Gummadi | In this paper, we study the extent to which we can reliably match profiles in practice, across real-world social networks, by exploiting public attributes, i.e., information users publicly provide about themselves. |
192 | E-commerce in Your Inbox: Product Recommendations at Scale | Mihajlo Grbovic, Vladan Radosavljevic, Nemanja Djuric, Narayan Bhamidipati, Jaikit Savla, Varun Bhagwan, Doug Sharp | In this paper we describe a system that leverages user purchase history determined from e-mail receipts to deliver highly personalized product ads to Yahoo Mail users. |
193 | Gender and Interest Targeting for Sponsored Post Advertising at Tumblr | Mihajlo Grbovic, Vladan Radosavljevic, Nemanja Djuric, Narayan Bhamidipati, Ananth Nagarajan | In this paper, we present a framework that enabled two of the key targeted advertising components for Tumblr, gender and interest targeting. |
194 | Mining Administrative Data to Spur Urban Revitalization | Ben Green, Alejandra Caro, Matthew Conway, Robert Manduca, Tom Plagge, Abby Miller | In this paper, we apply data science techniques to administrative data to help the City of Memphis, Tennessee improve distressed neighborhoods. |
195 | Measuring Causal Impact of Online Actions via Natural Experiments: Application to Display Advertising | Daniel N. Hill, Robert Moakler, Alan E. Hubbard, Vadim Tsemekhman, Foster Provost, Kiril Tsemekhman | Here we present a novel framework for estimating causal effects that relies on neither randomized experiments nor adjusting for the potentially explosive number of variables used in predictive models. |
196 | Focusing on the Long-term: It’s Good for Users and Business | Henning Hohnhold, Deirdre O’Brien, Diane Tang | The results presented in this paper are generalizable in two major ways. |
197 | Traffic Measurement and Route Recommendation System for Mass Rapid Transit (MRT) | Thomas Holleczek, Dang The Anh, Shanyang Yin, Yunye Jin, Spiros Antonatos, Han Leong Goh, Samantha Low, Amy Shi-Nash | We have therefore developed and deployed a traffic measurement system for a key player in the transportation industry to gain insights into crowd behavior for planning purposes. |
198 | Real-Time Bid Prediction using Thompson Sampling-Based Expert Selection | Elena Ikonomovska, Sina Jafarpour, Ali Dasdan | In this paper we propose to use probability sampling (via Thompson Sampling) as a meta-learning algorithm that samples from the pool of experts for the purpose of bid prediction. |
199 | Life-stage Prediction for Product Recommendation in E-commerce | Peng Jiang, Yadong Zhu, Yi Zhang, Quan Yuan | In this paper, we found obvious correlation between life stage and purchasing behavior in many E-commerce categories. |
200 | Visual Search at Pinterest | Yushi Jing, David Liu, Dmitry Kislyuk, Andrew Zhai, Jiajing Xu, Jeff Donahue, Sarah Tavel | We demonstrate that, with the availability of distributed computation platforms such as Amazon Web Services and open-source tools, it is possible for a small engineering team to build, launch and maintain a cost-effective, large-scale visual search system. |
201 | Discovering Collective Narratives of Theme Parks from Large Collections of Visitors’ Photo Streams | Gunhee Kim, Leonid Sigal | We present an approach for generating pictorial storylines from large collections of online photo streams shared by visitors to theme parks (e.g. Disneyland), along with publicly available information such as visitor’s maps. |
202 | A Machine Learning Framework to Identify Students at Risk of Adverse Academic Outcomes | Himabindu Lakkaraju, Everaldo Aguiar, Carl Shan, David Miller, Nasir Bhanpuri, Rayid Ghani, Kecia L. Addison | This paper describes a machine learning framework to identify such students, discusses features that are useful for this task, applies several classification algorithms, and evaluates them using metrics important to school administrators. |
203 | Probabilistic Graphical Models of Dyslexia | Yair Lakretz, Gal Chechik, Naama Friedmann, Michal Rosen-Zvi | In this study, introducing a novel approach, we use two families of probabilistic graphical models to analyze patterns of reading errors made by dyslexic people: an LDA-based model and two Naïve Bayes models which differ by their assumptions about the generation process of reading errors. |
204 | Promoting Positive Post-Click Experience for In-Stream Yahoo Gemini Users | Mounia Lalmas, Janette Lehmann, Guy Shaked, Fabrizio Silvestri, Gabriele Tolomei | In this paper, we describe the method we have implemented in Yahoo Gemini to measure the post-click experience on Yahoo mobile news streams via an automatic analysis of advert landing pages. |
205 | Generic and Scalable Framework for Automated Time-series Anomaly Detection | Nikolay Laptev, Saeed Amizadeh, Ian Flint | This paper introduces a generic and scalable framework for automated anomaly detection on large scale time-series data. |
206 | Leveraging Knowledge Bases for Contextual Entity Exploration | Joonseok Lee, Ariel Fuxman, Bo Zhao, Yuanhua Lv | We present a system called Lewis for retrieving contextually relevant entity results leveraging a knowledge graph, and perform a large scale crowdsourcing experiment in the context of an e-reader scenario, which shows that Lewis can outperform the state-of-the-art contextual entity recommendation systems by more than 20% in terms of the MAP score. |
207 | Click-through Prediction for Advertising in Twitter Timeline | Cheng Li, Yue Lu, Qiaozhu Mei, Dong Wang, Sandeep Pandey | We present the problem of click-through prediction for advertising in Twitter timeline, which displays a stream of Tweets from accounts a user chooses to follow. |
208 | Predicting Voice Elicited Emotions | Ying Li, Jose D. Contreras, Luis J. Salazar | We present the research, and product development and deployment, of Voice Analyzer by Jobaline Inc. |
209 | Discovery of Glaucoma Progressive Patterns Using Hierarchical MDL-Based Clustering | Shigeru Maya, Kai Morino, Hiroshi Murata, Ryo Asaoka, Kenji Yamanishi | In this paper, we propose a method to cluster the spatial patterns of the visual field in glaucoma patients to analyze the progression patterns of glaucoma. |
210 | Distributed Personalization | Xu Miao, Chun-Te Chu, Lijun Tang, Yitong Zhou, Joel Young, Anmol Bhasin | In this paper, we formalize the generic personalization problem as an optimization problem. |
211 | Voltage Correlations in Smart Meter Data | Rajendu Mitra, Ramachandra Kota, Sambaran Bandyopadhyay, Vijay Arya, Brian Sullivan, Richard Mueller, Heather Storey, Gerard Labut | This work shows that voltage time series measurements collected from customer smart meters exhibit correlations that are consistent with the hierarchical structure of the distribution network. |
212 | Analyzing Invariants in Cyber-Physical Systems using Latent Factor Regression | Marjan Momtazpour, Jinghe Zhang, Saifur Rahman, Ratnesh Sharma, Naren Ramakrishnan | We describe a latent factor approach to infer invariants underlying system variables and how we can leverage these relationships to monitor a cyber-physical system. |
213 | Predicting Future Scientific Discoveries Based on a Networked Analysis of the Past Literature | Meenakshi Nagarajan, Angela D. Wilkins, Benjamin J. Bachman, Ilya B. Novikov, Shenghua Bao, Peter J. Haas, María E. Terrón-Díaz, Sumit Bhatia, Anbu K. Adikesavan, Jacques J. Labrie, Sam Regenbogen, Christie M. Buchovecky, Curtis R. Pickering, Linda Kato, Andreas M. Lisewski, Ana Lelescu, Houyin Zhang, Stephen Boyer, Griff Weber, Ying Chen, Lawrence Donehower, Scott Spangler, Olivier Lichtarge | We present KnIT, the Knowledge Integration Toolkit, a system for accelerating scientific discovery and predicting previously unknown protein-protein interactions. |
214 | Learning a Hierarchical Monitoring System for Detecting and Diagnosing Service Issues | Vinod Nair, Ameya Raul, Shwetabh Khanduja, Vikas Bahirwani, Qihong Shao, Sundararajan Sellamanickam, Sathiya Keerthi, Steve Herbert, Sudheer Dhulipalla | We propose a machine learning based framework for building a hierarchical monitoring system to detect and diagnose service issues. |
215 | Predictive Modeling for Public Health: Preventing Childhood Lead Poisoning | Eric Potash, Joe Brew, Alexander Loewi, Subhabrata Majumdar, Andrew Reece, Joe Walsh, Eric Rozier, Emile Jorgenson, Raed Mansour, Rayid Ghani | This paper describes joint work with the Chicago Department of Public Health (CDPH) in which we build a model that predicts the risk of a child being poisoned so that an intervention can take place before that happens. |
216 | Proof Protocol for a Machine Learning Technique Making Longitudinal Predictions in Dynamic Contexts | Kevin B. Pratt | We propose necessary components of the proof protocol and demonstrate results visualizations to support evaluation of the proof components. |
217 | An Architecture for Agile Machine Learning in Real-Time Applications | Johann Schleier-Smith | Machine learning techniques have proved effective in recommender systems and other applications, yet teams working to deploy them lack many of the advantages that those in more established software disciplines today take for granted. |
218 | Scalable Machine Learning Approaches for Neighborhood Classification Using Very High Resolution Remote Sensing Imagery | Manu Sethi, Yupeng Yan, Anand Rangarajan, Ranga Raju Vatsavai, Sanjay Ranka | A semi-supervised learning approach for identifying neighborhoods is presented which employs superpixel tessellation representations of VHR imagery. |
219 | Early Identification of Violent Criminal Gang Members | Elham Shaabani, Ashkan Aleali, Paulo Shakarian, John Bertetto | In this paper, we study the problem of early identification of violent gang members. |
220 | Spoken English Grading: Machine Learning with Crowd Intelligence | Vinay Shashidhar, Nishant Pandey, Varun Aggarwal | In this paper, we address the problem of grading spontaneous speech using a combination of machine learning and crowdsourcing. |
221 | Effective Audience Extension in Online Advertising | Jianqiang Shen, Sahin Cem Geyik, Ali Dasdan | In this paper, we formally define the audience extension problem, propose an algorithm that extends a given audience set efficiently under multiple desirable criteria, and experimentally validate its performance. |
222 | Going In-Depth: Finding Longform on the Web | Virginia Smith, Miriam Connor, Isabelle Stanton | In this work, we develop a system to automatically identify longform content across the web. |
223 | Early Prediction of Cardiac Arrest (Code Blue) using Electronic Medical Records | Sriram Somanchi, Samrachana Adhikari, Allen Lin, Elena Eneva, Rayid Ghani | In this paper, we describe our work, in partnership with NorthShore University HealthSystem, that preemptively flags patients who are likely to go into cardiac arrest, using signals extracted from demographic information, hospitalization history, vitals and laboratory measurements in patient-level electronic medical records. |
224 | When-To-Post on Social Networks | Nemanja Spasojevic, Zhisheng Li, Adithya Rao, Prantik Bhattacharyya | In this study, we formulate a when-to-post problem, where the objective is to find the best times for a user to post on social networks in order to maximize the probability of audience responses. |
225 | Mining for Causal Relationships: A Data-Driven Study of the Islamic State | Andrew Stanton, Amanda Thart, Ashish Jain, Priyank Vyas, Arpan Chatterjee, Paulo Shakarian | In this paper, we present a data-driven approach to analyzing this group using a dataset consisting of 2200 incidents of military activity surrounding ISIS and the forces that oppose it (including Iraqi, Syrian, and the American-led coalition). |
226 | Transfer Learning for Bilingual Content Classification | Qian Sun, Mohammad Amin, Baoshi Yan, Craig Martell, Vita Markman, Anmol Bhasin, Jieping Ye | In this paper, we take the spam (Spanish) job posting detection as the target problem and build a generic machine learning pipeline for multi-lingual spam detection. |
227 | FrauDetector: A Graph-Mining-based Framework for Fraudulent Phone Call Detection | Vincent S. Tseng, Jia-Ching Ying, Che-Wei Huang, Yimin Kao, Kuan-Ta Chen | In this paper, we develop a graph-mining-based fraudulent phone call detection framework for a mobile application to automatically annotate fraudulent phone numbers with a "fraud" tag, which is a crucial prerequisite for distinguishing fraudulent phone calls from normal phone calls. |
228 | Efficient Long-Term Degradation Profiling in Time Series for Complex Physical Systems | Liudmila Ulanova, Tan Yan, Haifeng Chen, Guofei Jiang, Eamonn Keogh, Kai Zhang | In this work, we introduce a novel time series analysis technique that allows the decomposition of the time series into trend and fluctuation components, providing the monitoring software with actionable information about the changes of the system’s behavior over time. |
229 | Interpreting Advertiser Intent in Sponsored Search | Bhanu C. Vattikonda, Santhosh Kodipaka, Hongyan Zhou, Vacha Dave, Saikat Guha, Alex C. Snoeren | Past work has employed a bag-of-words approach using features extracted from both the query and potential sponsored result to train the ranker. |
230 | Client Clustering for Hiring Modeling in Work Marketplaces | Vasilis Verroios, Panagiotis Papadimitriou, Ramesh Johari, Hector Garcia-Molina | We propose a Maximum Likelihood definition of the "optimal" client clustering along with an efficient Expectation-Maximization clustering algorithm that can be applied in large marketplaces. |
231 | Discerning Tactical Patterns for Professional Soccer Teams: An Enhanced Topic Model with Applications | Qing Wang, Hengshu Zhu, Wei Hu, Zhiyong Shen, Yuan Yao | To this end, in this paper we propose an unsupervised approach to automatically discerning the typical tactics, i.e., tactical patterns, of soccer teams through mining the historical match logs. |
232 | Predicting Serves in Tennis using Style Priors | Xinyu Wei, Patrick Lucey, Stuart Morgan, Peter Carr, Machar Reid, Sridha Sridharan | In this paper we present a method which recommends the most likely serves of a player in a given context. |
233 | Smart Pacing for Effective Online Ad Campaign Optimization | Jian Xu, Kuang-chih Lee, Wentong Li, Hang Qi, Quan Lu | In this paper, we propose a smart pacing approach in which the delivery pace of each campaign is learned from both offline and online data to achieve smooth delivery and optimal performance goals. |
234 | From Infrastructure to Culture: A/B Testing Challenges in Large Scale Social Networks | Ya Xu, Nanyu Chen, Addrian Fernandez, Omar Sinno, Anmol Bhasin | In this paper, we describe in depth the experimentation platform we have built at LinkedIn and the challenges that arise particularly when running A/B tests at large scale in a social network setting. |
235 | Tornado Forecasting with Multiple Markov Boundaries | Kui Yu, Dawei Wang, Wei Ding, Jian Pei, David L. Small, Shafiqul Islam, Xindong Wu | In this work, we provide a new solution to use the concept of multiple Markov boundaries in local causal discovery to identify multiple sets of the precursors for tornado forecasting. |
236 | Gas Concentration Reconstruction for Coal-Fired Boilers Using Gaussian Process | Chao Yuan, Matthias Behmann, Bernhard Meerbeck | We propose a Bayesian approach based on Gaussian process (GP) to address both image reconstruction and path arrangement problems, simultaneously. |
237 | Annotating Needles in the Haystack without Looking: Product Information Extraction from Emails | Weinan Zhang, Amr Ahmed, Jie Yang, Vanja Josifovski, Alex J. Smola | To accomplish this task, we propose a hybrid approach, which basically trains a CRF model using the labels predicted by binary classifiers (weak learners). |
238 | Forecasting Fine-Grained Air Quality Based on Big Data | Yu Zheng, Xiuwen Yi, Ming Li, Ruiyuan Li, Zhangqing Shan, Eric Chang, Tianrui Li | In this paper, we forecast the reading of an air quality monitoring station over the next 48 hours, using a data-driven method that considers current meteorological data, weather forecasts, and air quality data of the station and that of other stations within a few hundred kilometers. |
239 | Building Discriminative User Profiles for Large-scale Content Recommendation | Erheng Zhong, Nathan Liu, Yue Shi, Suju Rajan | In this paper, we propose a hybrid solution that makes use of a latent factor model to infer user interest vectors. |
240 | Stock Constrained Recommendation in Tmall | Wenliang Zhong, Rong Jin, Cheng Yang, Xiaowei Yan, Qi Zhang, Qiang Li | We address this challenge by developing a dual method that reduces the number of variables from n^2 to n, significantly improving the computational efficiency. |
241 | Predicting Ambulance Demand: a Spatio-Temporal Kernel Approach | Zhengyi Zhou, David S. Matteson | We propose a predictive method using spatio-temporal kernel density estimation (stKDE) to address these challenges, and provide spatial density predictions for ambulance demand in Toronto, Canada as it varies over hourly intervals. |
242 | Web Personalization and Recommender Systems | Shlomo Berkovsky, Jill Freyne | This tutorial will provide the participants with a broad overview and a thorough understanding of algorithms and practically deployed Web and mobile applications of personalized technologies. |
243 | Graph-Based User Behavior Modeling: From Prediction to Fraud Detection | Alex Beutel, Leman Akoglu, Christos Faloutsos | In this tutorial we will answer these questions – connecting graph analysis tools for user behavior modeling to anomaly and fraud detection. |
244 | Data-Driven Product Innovation | Xin Fu, Hernán Asorey | In this tutorial, we introduce the framework that we created to nurture data-driven product innovations. |
245 | Dense Subgraph Discovery: KDD 2015 tutorial | Aristides Gionis, Charalampos E. Tsourakakis | In this tutorial we aim to provide a comprehensive overview of (i) major algorithmic techniques for finding dense subgraphs in large graphs and (ii) graph mining applications that rely on dense subgraph extraction. |
246 | Diffusion in Social and Information Networks: Research Problems, Probabilistic Models and Machine Learning Methods | Manuel Gomez Rodriguez, Le Song | In this tutorial, we will present several diffusion models designed for fine-grained large-scale diffusion and social event data, present some canonical research problems in the context of diffusion, and introduce state-of-the-art algorithms to solve some of these problems, in particular, network estimation, influence estimation and control, and rumor source identification. |
247 | Social Media Anomaly Detection: Challenges and Solutions | Yan Liu, Sanjay Chawla | In this tutorial, we survey existing work on social media anomaly detection, focusing on the new anomalous phenomena in social media and most recent techniques to detect those special types of anomalies. |
248 | Automatic Entity Recognition and Typing from Massive Text Corpora: A Phrase and Network Mining Approach | Xiang Ren, Ahmed El-Kishky, Chi Wang, Jiawei Han | In this tutorial, we introduce data-driven methods to recognize typed entities of interest in massive, domain-specific text corpora. |
249 | VC-Dimension and Rademacher Averages: From Statistical Learning Theory to Sampling Algorithms | Matteo Riondato, Eli Upfal | In this tutorial, we survey the use of Rademacher Averages and the VC-dimension in sampling-based algorithms for graph analysis and pattern mining. |
250 | Large Scale Distributed Data Science using Apache Spark | James G. Shanahan, Laing Dai | This tutorial will provide an accessible introduction to Spark and its potential to revolutionize academic and commercial data science practices. |
251 | Medical Mining: KDD 2015 Tutorial | Myra Spiliopoulou, Pedro Pereira Rodrigues, Ernestina Menasalvas | The purpose of this tutorial is to contribute to this learning process. |
252 | Big Data Analytics: Optimization and Randomization | Tianbao Yang, Qihang Lin, Rong Jin | In the first part, we plan to present the state-of-the-art large-scale optimization algorithms, including various stochastic gradient descent methods, stochastic coordinate descent methods and distributed optimization algorithms, for solving various machine learning problems. |
253 | Data Driven Science: SIGKDD Panel | Katharina Morik, Hugh Durrant-Whyte, Gary Hill, Dietmar Müller, Tanya Berger-Wolf | Knowledge discovery methods are finding broad application in all areas of scientific endeavor, to explore experimental data, to discover new models, to propose new scientific theories and ideas. |