Paper Digest: WWW 2017 Highlights
The Web Conference (WWW) is one of the top internet conferences in the world.
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.
If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to get updates on new conference digests.
Paper Digest Team
team@paperdigest.org
TABLE 1: WWW 2017 Papers
No. | Title | Authors | Highlight |
---|---|---|---|
1 | Taming the Data Deluge to Unravel the Mysteries of the Universe | Melanie Johnston-Hollitt | Thus the project aims to build the most data intensive scientific experiment ever, in some of the most remote places on Earth. |
2 | The Web-Wide World | Mark Pesce | How do we use it? |
3 | Web Mail is not Dead!: It’s Just Not Human Anymore | Yoelle Maarek | In this talk, I first share some elements of this journey that led us to this critical finding that 90% of today’s Web Mail is sent by automatic scripts [1]. |
4 | Deals or No Deals: Contract Design for Online Advertising | Vahab Mirrokni, Hamid Nazerzadeh | We propose a constant-factor approximation algorithm for maximizing the revenue that can be obtained from these deals. |
5 | Budget Management Strategies in Repeated Auctions | Santiago Balseiro, Anthony Kim, Mohammad Mahdian, Vahab Mirrokni | In particular, we consider six different budget management strategies including probabilistic throttling, thresholding, bid shading, reserve pricing, and multiplicative boosting. |
6 | GSP: The Cinderella of Mechanism Design | Christopher A. Wilkens, Ruggiero Cavallo, Rad Niazadeh | We give a deep justification for GSP’s success: advertisers’ preferences map to a model we call value maximization; they do not maximize profit as the standard theory would believe. |
7 | Horizon-Independent Optimal Pricing in Repeated Auctions with Truthful and Strategic Buyers | Alexey Drutsa | We study revenue optimization learning algorithms for repeated posted-price auctions where a seller interacts with a (truthful or strategic) buyer that holds a fixed valuation. |
8 | Sponsored Search Auctions with Rich Ads | Ruggiero Cavallo, Prabhakar Krishnamurthy, Maxim Sviridenko, Christopher A. Wilkens | In this paper we report on our efforts to redesign a search ad selling system from the ground up in this new context, proposing a mechanism that optimizes an entire slate of ads globally and computes prices that achieve properties analogous to those held by GSP in the original, simpler setting of uniform ads. |
9 | Prices and Subsidies in the Sharing Economy | Zhixuan Fang, Longbo Huang, Adam Wierman | In this paper, we focus on the design of prices and subsidies in sharing platforms. |
10 | Segmenting Two-Sided Markets | Siddhartha Banerjee, Sreenivas Gollapudi, Kostas Kollias, Kamesh Munagala | We develop efficient algorithms for throughput (i.e. volume of trade) and welfare maximization with provable guarantees under a variety of assumptions on the demand and supply functions. |
11 | An Experimental Evaluation of Regret-Based Econometrics | Noam Nisan, Gali Noti | Using data obtained in a controlled ad-auction experiment that we ran, we evaluate the regret-based approach to econometrics that was recently suggested by Nekipelov, Syrgkanis, and Tardos (EC 2015). |
12 | Usage Patterns and the Economics of the Public Cloud | Cinar Kilcioglu, Justin M. Rao, Aadharsh Kannan, R. Preston McAfee | We examine the economics of demand and supply in cloud computing. |
13 | Understanding and Discovering Deliberate Self-harm Content in Social Media | Yilin Wang, Jiliang Tang, Jundong Li, Baoxin Li, Yali Wan, Clayton Mellina, Neil O’Hare, Yi Chang | In this paper, we aim to understand self-harm content and provide automatic approaches to its detection. |
14 | Mobile Sensing at the Service of Mental Well-being: a Large-scale Longitudinal Study | Sandra Servia-Rodríguez, Kiran K. Rachuri, Cecilia Mascolo, Peter J. Rentfrow, Neal Lathia, Gillian M. Sandstrom | In this paper we report on what we believe is the largest longitudinal in-the-wild study of mood through smartphones. |
15 | Harnessing the Web for Population-Scale Physiological Sensing: A Case Study of Sleep and Performance | Tim Althoff, Eric Horvitz, Ryen W. White, Jamie Zeitzer | We present the largest study to date on the impact of objectively measured real-world sleep on performance enabled through a reframing of everyday interactions with a web search engine as a series of performance tasks. |
16 | Cataloguing Treatments Discussed and Used in Online Autism Communities | Shaodian Zhang, Tian Kang, Lin Qiu, Weinan Zhang, Yong Yu, Noémie Elhadad | In this paper, we rely on machine learning methods to automatically identify attributions of mentions of treatments from an online autism community. |
17 | Web Application Migration with Closure Reconstruction | Jin-woo Kwon, Soo-Mook Moon | In this paper, we propose a novel approach to fully serialize closures. |
18 | AppHolmes: Detecting and Characterizing App Collusion among Third-Party Android Markets | Mengwei Xu, Yun Ma, Xuanzhe Liu, Felix Xiaozhu Lin, Yunxin Liu | In this paper, we present the first in-depth study of app collusion, in which one app surreptitiously launches others in the background without user’s awareness. |
19 | The Long-Standing Privacy Debate: Mobile Websites vs Mobile Apps | Elias P. Papadopoulos, Michalis Diamantaris, Panagiotis Papadopoulos, Thanasis Petsas, Sotiris Ioannidis, Evangelos P. Markatos | In this paper, we aim to respond to this question: which of the two options protects the users’ privacy in the best way, apps or browsers? |
20 | An Explorative Study of the Mobile App Ecosystem from App Developers’ Perspective | Haoyu Wang, Zhe Liu, Yao Guo, Xiangqun Chen, Miao Zhang, Guoai Xu, Jason Hong | This paper presents a study of the mobile app ecosystem from the perspective of app developers. |
21 | Neural Collaborative Filtering | Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, Tat-Seng Chua | In this work, we strive to develop techniques based on neural networks to tackle the key problem in recommendation — collaborative filtering — on the basis of implicit feedback. |
22 | Learning to Recommend Accurate and Diverse Items | Peizhe Cheng, Shuaiqiang Wang, Jun Ma, Jiankai Sun, Hui Xiong | In this study, we investigate diversified recommendation problem by supervised learning, seeking significant improvement in diversity while maintaining accuracy. |
23 | Collaborative Metric Learning | Cheng-Kang Hsieh, Longqi Yang, Yin Cui, Tsung-Yi Lin, Serge Belongie, Deborah Estrin | In this work, we study the connection between metric learning and collaborative filtering. |
24 | Beyond Globally Optimal: Focused Learning for Improved Recommendations | Alex Beutel, Ed H. Chi, Zhiyuan Cheng, Hubert Pham, John Anderson | As a result, we ask: how can we learn additional models to improve the recommendation quality for a specified subset of items? |
25 | AutoCyclone: Automatic Mining of Cyclic Online Activities with Robust Tensor Factorization | Tsubasa Takahashi, Bryan Hooi, Christos Faloutsos | We present CycloneM, a unifying model to capture both cyclic patterns and outliers, and CycloneFact, a novel algorithm which solves the above problem. |
26 | Assessing Percolation Threshold Based on High-Order Non-Backtracking Matrices | Yuan Lin, Wei Chen, Zhongzhi Zhang | In this paper, we study high-order non-backtracking matrices and their application to assessing percolation threshold. |
27 | Large Scale Density-friendly Graph Decomposition via Convex Programming | Maximilien Danisch, T.-H. Hubert Chan, Mauro Sozio | In our work, we devise an efficient algorithm which is able to compute exact locally-dense decompositions in real-world graphs containing up to billions of edges. |
28 | nTD: Noise-Profile Adaptive Tensor Decomposition | Xinsheng Li, K. Selçuk Candan, Maria Luisa Sapino | In this paper, we propose a Noise-Profile Adaptive Tensor Decomposition (nTD) method, which aims to tackle both of these challenges. |
29 | Measuring and Improving the Reliability of Wide-Area Cloud Paths | Osama Haq, Mamoon Raja, Fahad R. Dogar | We find that cloud paths are more predictable than public Internet paths, with an order of magnitude lower loss rate and jitter at the tail (95th percentile and beyond). |
30 | Blotter: Low Latency Transactions for Geo-Replicated Storage | Henrique Moniz, João Leitão, Ricardo J. Dias, Johannes Gehrke, Nuno Preguiça, Rodrigo Rodrigues | In this paper we use a recently proposed isolation level, called Non-Monotonic Snapshot Isolation, to achieve ACID transactions with low latency. |
31 | Ten Blue Links on Mars | Charles L.A. Clarke, Gordon V. Cormack, Jimmy Lin, Adam Roegiest | In this paper, we formulate the searching from Mars problem as a tradeoff between "effort" (waiting for responses from Earth) and "data transfer" (pre-fetching or caching data on Mars). |
32 | Legion: Enriching Internet Services with Peer-to-Peer Interactions | Albert van der Linde, Pedro Fouto, João Leitão, Nuno Preguiça, Santiago Castiñeira, Annette Bieniusa | In this paper, we propose to extend user-centric Internet services with peer-to-peer interactions. |
33 | Inferring Individual Attributes from Search Engine Queries and Auxiliary Information | Luca Soldaini, Elad Yom-Tov | To facilitate research on specific topics of interest, especially in medicine, we introduce an algorithm for identifying a trait of interest in anonymous users. |
34 | Using Participatory Web-based Surveillance Data to Improve Seasonal Influenza Forecasting in Italy | Daniela Perrotta, Michele Tizzoni, Daniela Paolotti | In this study, we investigate how combining both traditional and participatory Web-based surveillance data can provide accurate predictions for seasonal influenza in real-time fashion. |
35 | Forecasting Seasonal Influenza Fusing Digital Indicators and a Mechanistic Disease Model | Qian Zhang, Nicola Perra, Daniela Perrotta, Michele Tizzoni, Daniela Paolotti, Alessandro Vespignani | Here, we propose the first seasonal influenza forecast framework based on a stochastic, spatially structured mechanistic model (individual level microsimulation) initialized with geo-localized microblogging data. |
36 | PhLeGrA: Graph Analytics in Pharmacology over the Web of Life Sciences Linked Open Data | Maulik R. Kamdar, Mark A. Musen | We present the PhLeGrA platform for Linked Graph Analytics in Pharmacology in this paper. |
37 | Reducing Latency by Eliminating Synchrony | Min Hong Yun, Songtao He, Lin Zhong | We present Presto, an asynchronous design of the input to display path. |
38 | Almond: The Architecture of an Open, Crowdsourced, Privacy-Preserving, Programmable Virtual Assistant | Giovanni Campagna, Rakesh Ramesh, Silei Xu, Michael Fischer, Monica S. Lam | This paper presents the architecture of Almond, an open, crowdsourced, privacy-preserving and programmable virtual assistant for online services and the Internet of Things (IoT). |
39 | DeepSense: A Unified Deep Learning Framework for Time-Series Mobile Sensing Data Processing | Shuochao Yao, Shaohan Hu, Yiran Zhao, Aston Zhang, Tarek Abdelzaher | To this end, we propose DeepSense, a deep learning framework that directly addresses the aforementioned noise and feature customization challenges in a unified manner. |
40 | Regions, Periods, Activities: Uncovering Urban Dynamics via Cross-Modal Representation Learning | Chao Zhang, Keyang Zhang, Quan Yuan, Haoruo Peng, Yu Zheng, Tim Hanratty, Shaowen Wang, Jiawei Han | To bridge the gap, we present CrossMap, a novel cross-modal representation learning method that uncovers urban dynamics with massive GTSM data. |
41 | Fairness in Package-to-Group Recommendations | Dimitris Serbos, Shuyao Qi, Nikos Mamoulis, Evaggelia Pitoura, Panayiotis Tsaparas | In this paper, we focus on a novel aspect of package-to-group recommendations, that of fairness. |
42 | Streaming Recommender Systems | Shiyu Chang, Yang Zhang, Jiliang Tang, Dawei Yin, Yi Chang, Mark A. Hasegawa-Johnson, Thomas S. Huang | In this paper, we investigate the problem of recommendation with stream inputs. |
43 | What Your Images Reveal: Exploiting Visual Contents for Point-of-Interest Recommendation | Suhang Wang, Yilin Wang, Jiliang Tang, Kai Shu, Suhas Ranganath, Huan Liu | In this paper, we study the problem of enhancing POI recommendation with visual contents. |
44 | A General Model for Out-of-town Region Recommendation | Tuan-Anh Nguyen Pham, Xutao Li, Gao Cong | In this paper, we introduce a novel problem called Region Recommendation, which aims to recommend an out-of-town region of POIs that are likely to be visited by a user. |
45 | Linear Additive Markov Processes | Ravi Kumar, Maithra Raghu, Tamás Sarlós, Andrew Tomkins | We introduce LAMP: the Linear Additive Markov Process. |
46 | Submodular Optimization Over Sliding Windows | Alessandro Epasto, Silvio Lattanzi, Sergei Vassilvitskii, Morteza Zadimoghaddam | In this work, we study this question in the context of data streams, where elements arrive one at a time, and we want to design low-memory and fast update-time algorithms that maintain a good solution. |
47 | When Hashes Met Wedges: A Distributed Algorithm for Finding High Similarity Vectors | Aneesh Sharma, C. Seshadhri, Ashish Goel | Our work directly addresses this challenge by introducing a new algorithm — WHIMP — that solves this problem efficiently in the MapReduce model. |
48 | A Fast and Provable Method for Estimating Clique Counts Using Turán’s Theorem | Shweta Jain, C. Seshadhri | We present a new randomized algorithm that provably approximates the number of k-cliques, for any constant k. |
49 | Exploring HTTP Header Manipulation In-The-Wild | Gareth Tyson, Shan Huang, Felix Cuadrado, Ignacio Castro, Vasile C. Perta, Arjuna Sathiaseelan, Steve Uhlig | In this paper, we collect data on thousands of networks to understand how they intercept HTTP headers in-the-wild. |
50 | Push or Request: An Investigation of HTTP/2 Server Push for Improving Mobile Performance | Sanae Rosen, Bo Han, Shuai Hao, Z. Morley Mao, Feng Qian | In this paper, we investigate the benefits and challenges of using Server Push on mobile devices. |
51 | Performance Monitoring and Root Cause Analysis for Cloud-hosted Web Applications | Hiranya Jayathilaka, Chandra Krintz, Rich Wolski | In this paper, we describe Roots – a system for automatically identifying the "root cause" of performance anomalies in web applications deployed in Platform-as-a-Service (PaaS) clouds. |
52 | BOAT: Building Auto-Tuners with Structured Bayesian Optimization | Valentin Dalibard, Michael Schaarschmidt, Eiko Yoneki | We present BOAT, a framework which allows developers to build efficient bespoke auto-tuners for their system, in situations where generic auto-tuners fail. |
53 | Investigating the Healthiness of Internet-Sourced Recipes: Implications for Meal Planning and Recommender Systems | Christoph Trattner, David Elsweiler | This can be tempered, however, with simple post-filtering approaches, which we show by experiment are better suited to some algorithms than others. |
54 | Sangoshthi: Empowering Community Health Workers through Peer Learning in Rural India | Deepika Yadav, Pushpendra Singh, Kyle Montague, Vijay Kumar, Deepak Sood, Madeline Balaam, Drishti Sharma, Mona Duggal, Tom Bartindale, Delvin Varghese, Patrick Olivier | In this paper, we propose Sangoshthi, a low-cost mobile based training and learning platform that fits well into the environment of low-Internet access. |
55 | Is Saki #delicious?: The Food Perception Gap on Instagram and Its Relation to Health | Ferda Ofli, Yusuf Aytar, Ingmar Weber, Raggi al Hammouri, Antonio Torralba | Here we propose to use recent advances in image recognition to tackle this problem. |
56 | The Spread of Physical Activity Through Social Networks | David Stück, Haraldur Tómas Hallgrímsson, Greg Ver Steeg, Alessandro Epasto, Luca Foschini | In this study, we consider the evolution of the physical activity of 44.5 thousand Fitbit users as they interact on the Fitbit social network, in relation to their health status. |
57 | Drawing Sound Conclusions from Noisy Judgments | David Goldberg, Andrew Trotman, Xiao Wang, Wei Min, Zongru Wan | We introduce equations and algorithms that can adjust the metrics to the values they would have had if there were no annotation errors. |
58 | Exploring Query Auto-Completion and Click Logs for Contextual-Aware Web Search and Query Suggestion | Liangda Li, Hongbo Deng, Anlei Dong, Yi Chang, Ricardo Baeza-Yates, Hongyuan Zha | Our paper proposes to model users’ behaviors on both QAC and click logs simultaneously by utilizing both logs as the contextual data of each other. |
59 | Web Search as a Linguistic Tool | Adam Fourney, Meredith Ringel Morris, Ryen W. White | In this paper we report the results of two surveys that investigate how, when, and why people use web search to support low-level, language-related tasks. |
60 | Query Expansion Based on a Feedback Concept Model for Microblog Retrieval | Yashen Wang, Heyan Huang, Chong Feng | We tackle the problem of improving microblog retrieval algorithms by proposing a Feedback Concept Model for query expansion. |
61 | Why Do Cascade Sizes Follow a Power-Law? | Karol Wegrzycki, Piotr Sankowski, Andrzej Pacuk, Piotr Wygocki | We introduce a random directed acyclic graph and use it to model the information diffusion network. |
62 | DeepCas: An End-to-end Predictor of Information Cascades | Cheng Li, Jiaqi Ma, Xiaoxiao Guo, Qiaozhu Mei | We present algorithms that learn the representation of cascade graphs in an end-to-end manner, which significantly improve the performance of cascade prediction over strong baselines including feature based methods, node embedding methods, and graph kernel methods. |
63 | Cascades: A View from Audience | Rahmtin Rotabi, Krishna Kamath, Jon Kleinberg, Aneesh Sharma | Cascades on social and information networks have been a tremendously popular subject of study in the past decade, and there is a considerable literature on phenomena such as diffusion mechanisms, virality, cascade prediction, and peer network effects. |
64 | Detecting Large Reshare Cascades in Social Networks | Karthik Subbian, B. Aditya Prakash, Lada Adamic | In contrast, in this paper, we propose SANSNET, a network-agnostic approach instead. |
65 | Identifying Value in Crowdsourced Wireless Signal Measurements | Zhijing Li, Ana Nika, Xinyi Zhang, Yanzi Zhu, Yuanshun Yao, Ben Y. Zhao, Haitao Zheng | Instead, we propose feature clustering, a novel application of unsupervised learning to detect hidden correlation between measurement instances, their features, and localization accuracy. |
66 | Collaborative Optimization for Collective Decision-making in Continuous Spaces | Nikhil Garg, Vijay Kamble, Ashish Goel, David Marn, Kamesh Munagala | We propose a meta-algorithm called Iterative Local Voting for collective decision-making in this setting, in which voters are sequentially sampled and asked to modify a candidate solution within some local neighborhood of its current value, as defined by a ball in some chosen norm. |
67 | Location Privacy-Preserving Task Allocation for Mobile Crowdsensing with Differential Geo-Obfuscation | Leye Wang, Dingqi Yang, Xiao Han, Tianben Wang, Daqing Zhang, Xiaojuan Ma | Hence, in this paper, we propose a location privacy-preserving task allocation framework with geo-obfuscation to protect users’ locations during task assignments. |
68 | PaRE: A System for Personalized Route Guidance | Yaguang Li, Han Su, Ugur Demiryurek, Bolong Zheng, Tieke He, Cyrus Shahabi | In this paper, we study a Personalized RoutE Guidance System dubbed PaRE – with which the goal is to generate more customized and intuitive directions based on user generated content. |
69 | Who Controls the Internet?: Analyzing Global Threats using Property Graph Traversals | Milivoj Simeonovski, Giancarlo Pellegrino, Christian Rossow, Michael Backes | To close this gap, we present a technique that models services, providers, and dependencies as a property graph. |
70 | Tools for Automated Analysis of Cybercriminal Markets | Rebecca S. Portnoff, Sadia Afroz, Greg Durrett, Jonathan K. Kummerfeld, Taylor Berg-Kirkpatrick, Damon McCoy, Kirill Levchenko, Vern Paxson | In this work, we propose an automated, top-down approach for analyzing underground forums. |
71 | Tracking Phishing Attacks Over Time | Qian Cui, Guy-Vincent Jourdan, Gregor V. Bochmann, Russell Couturier, Iosif-Viorel Onut | In this paper, we look at this problem from a new angle. |
72 | Security Challenges in an Increasingly Tangled Web | Deepak Kumar, Zane Ma, Zakir Durumeric, Ariana Mirian, Joshua Mason, J. Alex Halderman, Michael Bailey | In this paper, we investigate the current state of web dependencies and explore two security challenges associated with the increasing reliance on external services: (1) the expanded attack surface associated with serving unknown, implicitly trusted third-party content, and (2) how the increased set of external dependencies impacts HTTPS adoption. |
73 | Blood Pressure Prediction via Recurrent Models with Contextual Layer | Xiaohan Li, Shu Wu, Liang Wang | In this paper, we propose a novel model named recurrent models with contextual layer, which can model the sequential measurement data and contextual data simultaneously to predict the trend of users’ BP. |
74 | Enhancing Feature Selection Using Word Embeddings: The Case of Flu Surveillance | Vasileios Lampos, Bin Zou, Ingemar Johansson Cox | In this paper, we use neural word embeddings, trained on social media content from Twitter, to determine, in an unsupervised manner, how strongly textual features are semantically linked to an underlying health concept. |
75 | Adverse Drug Event Detection in Tweets with Semi-Supervised Convolutional Neural Networks | Kathy Lee, Ashequl Qadir, Sadid A. Hasan, Vivek Datla, Aaditya Prakash, Joey Liu, Oladimeji Farri | In this work, we build several semi-supervised convolutional neural network (CNN) models for ADE classification in tweets, specifically leveraging different types of unlabeled data in developing the models to address the problem. |
76 | DeepMood: Forecasting Depressed Mood Based on Self-Reported Histories via Recurrent Neural Networks | Yoshihiko Suhara, Yinzhan Xu, Alex ‘Sandy’ Pentland | This paper develops a recurrent neural network algorithm that incorporates categorical embedding layers for forecasting depression. |
77 | GPOP: Scalable Group-level Popularity Prediction for Online Content in Social Networks | Minh X. Hoang, Xuan-Hong Dang, Xiang Wu, Zhenyu Yan, Ambuj K. Singh | In this paper, we claim that a novel approach based on group-level popularity is necessary and more practical, given that users naturally organize themselves into clusters and that users within a cluster react to online content in a uniform manner. |
78 | Expecting to be HIP: Hawkes Intensity Processes for Social Media Popularity | Marian-Andrei Rizoiu, Lexing Xie, Scott Sanner, Manuel Cebrian, Honglin Yu, Pascal Van Hentenryck | We develop a novel mathematical model, the Hawkes intensity process, which can explain the complex popularity history of each video according to its type of content, network of diffusion, and sensitivity to promotion. |
79 | Taming the Unpredictability of Cultural Markets with Social Influence | Andrés Abeliuk, Gerardo Berbeglia, Pascal Van Hentenryck, Tad Hogg, Kristina Lerman | In this paper, we report results of an experimental study that shows that unpredictability is not an inherent property of social influence. |
80 | Predicting the Success of Online Petitions Leveraging Multidimensional Time-Series | Julia Proskurnia, Przemyslaw Grabowicz, Ryota Kobayashi, Carlos Castillo, Philippe Cudré-Mauroux, Karl Aberer | In this paper, we tackle the problem of predicting the evolution of a time series of user activity on the web in a manner that is both accurate and interpretable, using related time series to produce a more accurate prediction. |
81 | Dynamic Key-Value Memory Networks for Knowledge Tracing | Jiani Zhang, Xingjian Shi, Irwin King, Dit-Yan Yeung | To solve these problems, this work introduces a new model called Dynamic Key-Value Memory Networks (DKVMN) that can exploit the relationships between underlying concepts and directly output a student’s mastery level of each concept. |
82 | How Users Explore Ontologies on the Web: A Study of NCBO’s BioPortal Usage Logs | Simon Walk, Lisette Espín-Noboa, Denis Helic, Markus Strohmaier, Mark A. Musen | To that end, we study and group users according to their browsing behavior on BioPortal and use data mining techniques to characterize and compare exploration strategies across ontologies. |
83 | Type-based Semantic Optimization for Scalable RDF Graph Pattern Matching | HyeongSik Kim, Padmashree Ravindra, Kemafor Anyanwu | In this paper, we address the challenge of type-based query optimization for RDF graph pattern queries. |
84 | Extracting Emerging Knowledge from Social Media | Marco Brambilla, Stefano Ceri, Emanuele Della Valle, Riccardo Volonterio, Felix Xavier Acero Salazar | Thus, we propose a method for discovering emerging entities by extracting them from social content. |
85 | Distilling Task Knowledge from How-To Communities | Cuong Xuan Chu, Niket Tandon, Gerhard Weikum | This paper presents a method for automatically constructing a formal knowledge base on tasks and task-solving steps, by tapping the contents of online communities such as WikiHow. |
86 | Constructing and Evaluating a Novel Crowdsourcing-based Paraphrased Opinion Spam Dataset | Seongsoon Kim, Seongwoon Lee, Donghyeon Park, Jaewoo Kang | In this paper, we introduce a novel dataset called Paraphrased OPinion Spam (POPS), which contains a new type of review spam that imitates real human opinions using crowdsourcing. |
87 | Optimizing the Recency-Relevancy Trade-off in Online News Recommendations | Abhijnan Chakraborty, Saptarshi Ghosh, Niloy Ganguly, Krishna P. Gummadi | In this work, we focus on automatically recommending front-page stories in such media websites. |
88 | Distilling Information Reliability and Source Trustworthiness from Digital Traces | Behzad Tabibian, Isabel Valera, Mehrdad Farajtabar, Le Song, Bernhard Schölkopf, Manuel Gomez-Rodriguez | In particular, we propose a temporal point process modeling framework which links the temporal behavior of the users to information reliability and source trustworthiness. |
89 | An Army of Me: Sockpuppets in Online Discussion Communities | Srijan Kumar, Justin Cheng, Jure Leskovec, V.S. Subrahmanian | In this work, we study sockpuppetry across nine discussion communities, and show that sockpuppets differ from ordinary users in terms of their posting behavior, linguistic traits, as well as social network structure. |
90 | SMARTGEN: Exposing Server URLs of Mobile Apps With Selective Symbolic Execution | Chaoshun Zuo, Zhiqiang Lin | We have thus developed SMARTGEN to feature selective symbolic execution for the purpose of automatically generating server request messages to expose the server URLs by extracting and solving user input constraints in mobile apps. |
91 | On the Content Security Policy Violations due to the Same-Origin Policy | Dolière Francis Some, Nataliia Bielova, Tamara Rezk | In this work, we describe how CSP may be violated due to the SOP when a page contains an embedded iframe from the same origin. |
92 | Transparent Web Service Auditing via Network Provenance Functions | Adam Bates, Wajih Ul Hassan, Kevin Butler, Alin Dobra, Bradley Reaves, Patrick Cable, Thomas Moyer, Nabil Schear | In this work, we present a transparent provenance-based approach for auditing web services through the introduction of Network Provenance Functions (NPFs). |
93 | J-Force: Forced Execution on JavaScript | Kyungtae Kim, I Luk Kim, Chung Hwan Kim, Yonghwi Kwon, Yunhui Zheng, Xiangyu Zhang, Dongyan Xu | In this paper, we propose J-FORCE, a crash-free forced JavaScript execution engine to systematically explore possible execution paths and reveal malicious behaviors in such malware. |
94 | User Personalized Satisfaction Prediction via Multiple Instance Deep Learning | Zheqian Chen, Ben Gao, Huimin Zhang, Zhou Zhao, Haifeng Liu, Deng Cai | In this paper, we settle this issue by developing a new multiple instance deep learning framework. |
95 | What Makes a Link Successful on Wikipedia? | Dimitar Dimitrov, Philipp Singer, Florian Lemmerich, Markus Strohmaier | Starting from this observation, we set out to study large-scale click data from Wikipedia in order to understand what makes a link successful. |
96 | Cats and Captions vs. Creators and the Clock: Comparing Multimodal Content to Context in Predicting Relative Popularity | Jack Hessel, Lillian Lee, David Mimno | In this work, we separate out the influence of these non-content factors in several ways. |
97 | Clustered Model Adaption for Personalized Sentiment Analysis | Lin Gong, Benjamin Haines, Hongning Wang | We propose to capture humans’ variable and idiosyncratic sentiment via building personalized sentiment classification models at a group level. |
98 | Exact Computation of Influence Spread by Binary Decision Diagrams | Takanori Maehara, Hirofumi Suzuki, Masakazu Ishihata | We propose the first algorithm to compute influence spread exactly under the independent cascade model. |
99 | Secure Centrality Computation Over Multiple Networks | Gilad Asharov, Francesco Bonchi, David Garcia-Soriano, Tamir Tassa | We tackle this problem and devise a protocol which is highly scalable and still provably secure. |
100 | Interplay between Social Influence and Network Centrality: A Comparative Study on Shapley Centrality and Single-Node-Influence Centrality | Wei Chen, Shang-Hua Teng | We present a comprehensive comparative study of these two centrality measures. |
101 | Portfolio Optimization for Influence Spread | Naoto Ohsaka, Yuichi Yoshida | To address this issue, we adopt conditional value at risk (CVaR) as a risk measure, and propose an algorithm that computes a portfolio over seed sets with a provable guarantee on its CVaR. |
102 | Extracting and Ranking Travel Tips from User-Generated Reviews | Ido Guy, Avihai Mejer, Alexander Nus, Fiana Raiber | In this work, we propose to extract short practical tips from user reviews. |
103 | Information Extraction in Illicit Web Domains | Mayank Kejriwal, Pedro Szekely | In this paper, we propose a lightweight, feature-agnostic Information Extraction (IE) paradigm specifically designed for such domains. |
104 | Learning to Extract Events from Knowledge Base Revisions | Alexander Konovalov, Benjamin Strauss, Alan Ritter, Brendan O’Connor | In this paper we demonstrate the feasibility of accurately identifying entity-transition-events, from real-time news and social media text streams, that drive changes to a knowledge base. |
105 | CoType: Joint Extraction of Typed Entities and Relations with Knowledge Bases | Xiang Ren, Zeqiu Wu, Wenqi He, Meng Qu, Clare R. Voss, Heng Ji, Tarek F. Abdelzaher, Jiawei Han | In this paper, we investigate joint extraction of typed entities and relations with labeled data heuristically obtained from knowledge bases (i.e., distant supervision). |
106 | Correlation Clustering with Low-Rank Matrices | Nate Veldt, Anthony I. Wirth, David F. Gleich | In this paper we explore how to solve the correlation clustering objective exactly when the data to be clustered can be represented by a low-rank matrix. |
107 | Consistent Weighted Sampling Made More Practical | Wei Wu, Bin Li, Ling Chen, Chengqi Zhang | In this paper, we propose a Practical CWS (PCWS) algorithm. |
108 | Leveraging Large Amounts of Weakly Supervised Data for Multi-Language Sentiment Classification | Jan Deriu, Aurelien Lucchi, Valeria De Luca, Aliaksei Severyn, Simon Müller, Mark Cieliebak, Thomas Hofmann, Martin Jaggi | This paper presents a novel approach for multi-lingual sentiment classification in short texts. |
109 | Theory of the GMM Kernel | Ping Li, Cun-Hui Zhang | In this study, we develop theoretical results for both the cosine and the GMM (generalized min-max) kernel, which is a generalization of the resemblance. |
110 | Bimodal Distribution and Co-Bursting in Review Spam Detection | Huayi Li, Geli Fei, Shuai Wang, Bing Liu, Weixiang Shao, Arjun Mukherjee, Jidong Shao | Existing approaches to detecting spam reviews and reviewers employed review contents, reviewer behaviors, star rating patterns, and reviewer-product networks for detection. |
111 | Detecting Collusive Spamming Activities in Community Question Answering | Yuli Liu, Yiqun Liu, Ke Zhou, Min Zhang, Shaoping Ma | To shed light on these research questions, we propose a unified framework to tackle the challenge of detecting collusive spamming activities of CQA. |
112 | FLOCK: Combating Astroturfing on Livestreaming Platforms | Neil Shah | Our work provides a number of major contributions: (a) formulation: we are the first to introduce and characterize the viewbot fraud problem in livestreaming platforms, (b) methodology: we propose FLOCK, a principled and unsupervised method which efficiently and effectively identifies botted broadcasts and their constituent botted views, and (c) practicality: our approach achieves over 98% precision in identifying botted broadcasts and over 90% precision/recall against sizable synthetically generated viewbot attacks on a real-world livestreaming workload of over 16 million views and 92 thousand broadcasts. |
113 | Can You Spot the Fakes?: On the Limitations of User Feedback in Online Social Networks | David Mandell Freeman | In this work we provide the first public, data-driven assessment of whether the above assumption is true: are some users better at reporting than others? |
114 | Modeling Consumer Preferences and Price Sensitivities from Large-Scale Grocery Shopping Transaction Logs | Mengting Wan, Di Wang, Matt Goldman, Matt Taddy, Justin Rao, Jie Liu, Dimitrios Lymberopoulos, Julian McAuley | In this study, we seek to bridge the gap between large-scale recommender systems and established consumer theories from economics, and propose a nested feature-based matrix factorization framework to model both preferences and price sensitivities. |
115 | Do "Also-Viewed" Products Help User Rating Prediction? | Chanyoung Park, Donghyun Kim, Jinoh Oh, Hwanjo Yu | In this paper, we propose a matrix co-factorization method that leverages information hidden in the so-called "also-viewed" products, i.e., a list of products that has also been viewed by users who have viewed a target product. |
116 | Monetary Discount Strategies for Real-Time Promotion Campaign | Ying-Chun Lin, Chi-Hsuan Huang, Chu-Cheng Hsieh, Yu-Chen Shu, Kun-Ta Chuang | To make real-time promotion more effective in pursuit of better profits, we propose two discount-giving strategies: one algorithm based on kernel density estimation and another based on Thompson sampling. |
117 | Predicting Latent Structured Intents from Shopping Queries | Chao-Yuan Wu, Amr Ahmed, Gowtham Ramani Kumar, Ritendra Datta | In this paper we study the problem of inferring the latent intent from unstructured queries and mapping them to structured attributes. |
118 | EOMM: An Engagement Optimized Matchmaking Framework | Zhengxing Chen, Su Xue, John Kolen, Navid Aghdaie, Kazi A. Zaman, Yizhou Sun, Magy Seif El-Nasr | In this paper, we propose an Engagement Optimized Matchmaking (EOMM) framework that maximizes overall player engagement. |
119 | Back To The Source: An Online Approach for Sensor Placement and Source Localization | Brunella Spinelli, L. Elisa Celis, Patrick Thiran | We propose the first online approach to source localization: We deploy a priori only a small number of sensors (which reveal if they are reached by an infection) and then iteratively choose the best location to place a new sensor in order to localize the source. |
120 | What’s in a Name?: Understanding Profile Name Reuse on Twitter | Enrico Mariconti, Jeremiah Onaolapo, Syed Sharique Ahmad, Nicolas Nikiforou, Manuel Egele, Nick Nikiforakis, Gianluca Stringhini | In this paper, we provide a large-scale study of the phenomenon of profile name reuse on Twitter. |
121 | Fairness Beyond Disparate Treatment & Disparate Impact: Learning Classification without Disparate Mistreatment | Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, Krishna P. Gummadi | To account for and avoid such unfairness, in this paper, we introduce a new notion of unfairness, disparate mistreatment, which is defined in terms of misclassification rates. |
122 | Sampling from Social Networks with Attributes | Claudia Wagner, Philipp Singer, Fariba Karimi, Jürgen Pfeffer, Markus Strohmaier | In this paper, we explore the sensitivity of different sampling techniques (node sampling, edge sampling, random walk sampling, and snowball sampling) on social networks with attributes. |
123 | Automated Template Generation for Question Answering over Knowledge Graphs | Abdalghani Abujabal, Mohamed Yahya, Mirek Riedewald, Gerhard Weikum | This paper presents QUINT, a system that automatically learns utterance-query templates solely from user questions paired with their answers. |
124 | A Semantic Graph-Based Approach for Mining Common Topics from Multiple Asynchronous Text Streams | Long Chen, Joemon M. Jose, Haitao Yu, Fajie Yuan | In this paper, we propose a semantic graph based topic modelling approach for structuring asynchronous text streams. |
125 | Neural Network-based Question Answering over Knowledge Graphs on Word and Character Level | Denis Lukovnikov, Asja Fischer, Jens Lehmann, Sören Auer | In this work, we follow a quite different approach: We train a neural network for answering simple questions in an end-to-end manner, leaving all decisions to the model. |
126 | Detecting Duplicate Posts in Programming QA Communities via Latent Semantics and Association Rules | Wei Emma Zhang, Quan Z. Sheng, Jey Han Lau, Ermyas Abebe | In this paper, we propose a methodology designed for the PCQA domain to detect duplicate questions. |
127 | How Public Is My Private Life?: Privacy in Online Dating | Camille Cobb, Tadayoshi Kohno | We present the results of a survey we designed to examine privacy-related risks, practices, and expectations of people who use or have used online dating, then delve deeper using semi-structured interviews. |
128 | Trajectory Recovery From Ash: User Privacy Is NOT Preserved in Aggregated Mobility Data | Fengli Xu, Zhen Tu, Yong Li, Pengyu Zhang, Xiaoming Fu, Depeng Jin | We develop an attack system that is able to exploit the uniqueness and regularity of human mobility to recover individuals’ trajectories from the aggregated mobility data without any prior knowledge. |
129 | The Onions Have Eyes: A Comprehensive Structure and Privacy Analysis of Tor Hidden Services | Iskander Sanchez-Rola, Davide Balzarotti, Igor Santos | To fill this gap, we developed a dedicated analysis platform and used it to crawl and analyze over 1.5M URLs hosted in 7257 onion domains. |
130 | De-anonymizing Web Browsing Data with Social Networks | Jessica Su, Ansh Shukla, Sharad Goel, Arvind Narayanan | Our theoretical contribution applies to any type of transactional data and is robust to noisy observations, generalizing a wide range of previous de-anonymization attacks. |
131 | Explicit Semantic Ranking for Academic Search via Knowledge Graph Embedding | Chenyan Xiong, Russell Power, Jamie Callan | This paper introduces Explicit Semantic Ranking (ESR), a new ranking technique that leverages knowledge graph embedding. |
132 | Neighbor-Aware Search for Approximate Labeled Graph Matching using the Chi-Square Statistics | Sourav Dutta, Pratik Nayek, Arnab Bhattacharya | This paper presents a novel technique to characterize the subgraph similarity based on statistical significance captured by chi-square statistic. |
133 | Learning to Match using Local and Distributed Representations of Text for Web Search | Bhaskar Mitra, Fernando Diaz, Nick Craswell | We propose a novel document ranking model composed of two separate deep neural networks, one that matches the query and the document using a local representation, and another that matches the query and the document using learned distributed representations. |
134 | Using the Delay in a Treatment Effect to Improve Sensitivity and Preserve Directionality of Engagement Metrics in A/B Experiments | Alexey Drutsa, Gleb Gusev, Pavel Serdyukov | In this paper, we study how the delay property of user learning can be used to improve sensitivity of several popular metrics of user loyalty and activity. |
135 | GB-CENT: Gradient Boosted Categorical Embedding and Numerical Trees | Qian Zhao, Yue Shi, Liangjie Hong | Since in real-world applications we usually have both abundant numerical features and categorical features with large cardinality (e.g. geolocations, IDs, tags etc.), we design a new model, called GB-CENT, which leverages latent factor embedding and tree components to achieve the merits of both while avoiding their demerits. |
136 | Decoupled Collaborative Ranking | Jun Hu, Ping Li | We propose a new pointwise collaborative ranking approach for recommender systems, which focuses on improving ranking performance at the top of recommended list. |
137 | LETOR Methods for Unsupervised Rank Aggregation | Avradeep Bhowmik, Joydeep Ghosh | In this manuscript we propose a novel framework to bypass these issues by using object attributes to augment the standard rank aggregation framework. |
138 | A Generic Coordinate Descent Framework for Learning from Implicit Feedback | Immanuel Bayer, Xiangnan He, Bhargav Kanagal, Steffen Rendle | In this paper, we provide a new framework for deriving efficient CD algorithms for complex recommender models. |
139 | On Analyzing User Topic-Specific Platform Preferences Across Multiple Social Media Sites | Roy Ka-Wei Lee, Tuan-Anh Hoang, Ee-Peng Lim | To model social media topics as well as platform preferences of users, we propose a new topic model known as MultiPlatform-LDA (MultiLDA). |
140 | Competition and Selection Among Conventions | Rahmtin Rotabi, Cristian Danescu-Niculescu-Mizil, Jon Kleinberg | In this work we study a setting in which we can cleanly track the competition among conventions while explicitly taking these sources of complexity into account. |
141 | Discussion Quality Diffuses in the Digital Public Square | George Berry, Sean J. Taylor | We present the results of a study on large public Facebook Pages where we randomly used two different methods—most recent and social feedback—to order comments on posts. |
142 | When Confidence and Competence Collide: Effects on Online Decision-Making Discussions | Liye Fu, Lillian Lee, Cristian Danescu-Niculescu-Mizil | Our goal in this work is to understand the effects of confidence-competence misalignment on the dynamics and outcomes of discussions. |
143 | Ex Machina: Personal Attacks Seen at Scale | Ellery Wulczyn, Nithum Thain, Lucas Dixon | The contribution of this paper is to develop and illustrate a method that combines crowdsourcing and machine learning to analyze personal attacks at scale. |
144 | Temporal Effects on Hashtag Reuse in Twitter: A Cognitive-Inspired Hashtag Recommendation Approach | Dominik Kowald, Subhash Chandra Pujari, Elisabeth Lex | In this paper, we study temporal hashtag usage practices in Twitter with the aim of designing a cognitive-inspired hashtag recommendation algorithm we call BLLi,s. |
145 | Exploring Rated Datasets with Rating Maps | Sihem Amer-Yahia, Sofia Kleisarchaki, Naresh Kumar Kolloju, Laks V.S. Lakshmanan, Ruben H. Zamar | In this paper, we develop a framework for finding and exploring population segments and their opinions. |
146 | Modeling the Dynamics of Learning Activity on the Web | Charalampos Mavroforakis, Isabel Valera, Manuel Gomez-Rodriguez | In this paper, we introduce a novel modeling framework for clustering continuous-time grouped streaming data, the Hierarchical Dirichlet Hawkes process (HDHP), which allows us to automatically uncover a wide variety of learning patterns from detailed traces of learning activity. |
147 | ESCAPE: Efficiently Counting All 5-Vertex Subgraphs | Ali Pinar, C. Seshadhri, Vaidyanathan Vishal | We introduce an algorithmic framework that can be adopted to count any small pattern in a graph and apply this framework to compute exact counts for all 5-vertex subgraphs. |
148 | The k-peak Decomposition: Mapping the Global Structure of Graphs | Priya Govindan, Chenghong Wang, Chumeng Xu, Hongyu Duan, Sucheta Soundarajan | To resolve this issue, we propose the k-peak graph decomposition method, based on the k-core algorithm, which finds the centers of distinct regions in the graph. |
149 | Scalable Motif-aware Graph Clustering | Charalampos E. Tsourakakis, Jakub Pachocki, Michael Mitzenmacher | We develop new methods based on graph motifs for graph clustering, allowing more efficient detection of communities within networks. |
150 | Indexing Public-Private Graphs | Aaron Archer, Silvio Lattanzi, Peter Likarish, Sergei Vassilvitskii | We consider the reachability indexing problem for private-public directed graphs. |
151 | Pinning Down Abuse on Google Maps | Danny Yuxing Huang, Doug Grundman, Kurt Thomas, Abhishek Kumar, Elie Bursztein, Kirill Levchenko, Alex C. Snoeren | In this paper, we investigate a new form of blackhat search engine optimization that targets local listing services like Google Maps. |
152 | Extended Tracking Powers: Measuring the Privacy Diffusion Enabled by Browser Extensions | Oleksii Starov, Nick Nikiforakis | In this paper, we report on the first large-scale study of privacy leakage enabled by extensions. |
153 | Security Implications of Redirection Trail in Popular Websites Worldwide | Li Chang, Hsu-Chun Hsiao, Wei Jeng, Tiffany Hyun-Jin Kim, Wei-Hsi Lin | This paper reports a well-rounded investigation to analyze the wellness of URL redirection security. |
154 | Some Recipes Can Do More Than Spoil Your Appetite: Analyzing the Security and Privacy Risks of IFTTT Recipes | Milijana Surbatovich, Jassim Aljuraidan, Lujo Bauer, Anupam Das, Limin Jia | To gain an in-depth understanding of the potential security and privacy risks, we build an information-flow model to analyze how often IFTTT recipes involve potential integrity or secrecy violations. |
155 | Characterizing Email Search using Large-scale Behavioral Logs and Surveys | Qingyao Ai, Susan T. Dumais, Nick Craswell, Dan Liebling | In this paper we report the results of a large-scale log analysis of email search and complement this with a survey to better understand email search intent and success. |
156 | Template Induction over Unstructured Email Corpora | Julia Proskurnia, Marc-Allen Cartright, Lluis Garcia-Pueyo, Ivo Krka, James B. Wendt, Tobias Kaufmann, Balint Miklos | We propose a technique for inducing high quality templates from plain text emails at scale based on the suffix array data structure. |
157 | Situational Context for Ranking in Personal Search | Hamed Zamani, Michael Bendersky, Xuanhui Wang, Mingyang Zhang | We propose two context-aware ranking models based on neural networks. |
158 | The Demographics of Mail Search and their Application to Query Suggestion | David Carmel, Liane Lewin-Eytan, Alex Libov, Yoelle Maarek, Ariel Raviv | We study here the characteristics of Web mail searchers, and explore how demographic signals such as location, age, gender, and inferred income, influence their search behavior. |
159 | Promoting Relevant Results in Time-Ranked Mail Search | David Carmel, Liane Lewin-Eytan, Alex Libov, Yoelle Maarek, Ariel Raviv | We describe three hero-selection algorithms we have devised and the associated experiments we have conducted in Yahoo mail. |
160 | AttriInfer: Inferring User Attributes in Online Social Networks Using Markov Random Fields | Jinyuan Jia, Binghui Wang, Le Zhang, Neil Zhenqiang Gong | In this work, we propose AttriInfer, a new method to infer user attributes in online social networks. |
161 | Neural Underpinnings of Website Legitimacy and Familiarity Detection: An fNIRS Study | Ajaya Neupane, Nitesh Saxena, Leanne Hirshfield | In this paper, we study the neural underpinnings relevant to user-centered web security through the lens of functional near-infrared spectroscopy (fNIRS). |
162 | Probabilistic Visitor Stitching on Cross-Device Web Logs | Sungchul Kim, Nikhil Kini, Jay Pujara, Eunyee Koh, Lise Getoor | We introduce a general, probabilistic approach to visitor stitching using features and attributes commonly contained in web logs. |
163 | Why We Read Wikipedia | Philipp Singer, Florian Lemmerich, Robert West, Leila Zia, Ellery Wulczyn, Markus Strohmaier, Jure Leskovec | The goal of this paper is to fill this gap by combining a survey of Wikipedia readers with a log-based analysis of user activity. |
164 | Learning Personalized Preference of Strong and Weak Ties for Social Recommendation | Xin Wang, Steven C.H. Hoi, Martin Ester, Jiajun Bu, Chun Chen | Despite the extensive studies, no existing work has attempted to distinguish and learn the personalized preferences between strong and weak ties, two important terms widely used in social sciences, for each individual in social recommendation. |
165 | Cross View Link Prediction by Learning Noise-resilient Representation Consensus | Xiaokai Wei, Linchuan Xu, Bokai Cao, Philip S. Yu | In this paper, we study the problem of Cross View Link Prediction (CVLP) on partially observable networks, where the focus is to recommend nodes with only links to nodes with only attributes (or vice versa). |
166 | Semi-supervised Clustering in Attributed Heterogeneous Information Networks | Xiang Li, Yao Wu, Martin Ester, Ben Kao, Xin Wang, Yudian Zheng | We study the problem of clustering objects in an AHIN, taking into account objects’ similarities with respect to both object attribute values and their structural connectedness in the network. |
167 | An Efficient Approach to Event Detection and Forecasting in Dynamic Multivariate Social Media Networks | Minglai Shao, Jianxin Li, Feng Chen, Hongyi Huang, Shuai Zhang, Xunxun Chen | This paper presents a generic framework, namely dynamic multivariate evolving anomalous subgraphs scanning (DMGraphScan), to address this problem in dynamic multivariate social media networks. |