Paper Digest: ACL 2018 Highlights
The Annual Meeting of the Association for Computational Linguistics (ACL) is one of the top natural language processing conferences in the world. In 2018 it was held in Melbourne, Australia. There were 2,571 paper submissions, of which 256 were accepted as long papers and 125 as short papers.
To help the AI community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.
We thank all authors for writing these interesting papers, and readers for reading our digests. If you do not want to miss any interesting AI paper, you are welcome to sign up for our free paper digest service to get new paper updates customized to your own interests on a daily basis.
Paper Digest Team
team@paperdigest.org
TABLE 1: ACL 2018 Long Papers
# | Title | Authors | Highlight
---|---|---|---
1 | Probabilistic FastText for Multi-Sense Word Embeddings | Ben Athiwaratkun, Andrew Wilson, Anima Anandkumar | We introduce Probabilistic FastText, a new model for word embeddings that can capture multiple word senses, sub-word structure, and uncertainty information. |
2 | A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors | Mikhail Khodak, Nikunj Saunshi, Yingyu Liang, Tengyu Ma, Brandon Stewart, Sanjeev Arora | This paper introduces a la carte embedding, a simple and general alternative to the usual word2vec-based approaches for building such representations that is based upon recent theoretical results for GloVe-like embeddings. We introduce a new dataset showing how the a la carte method requires fewer examples of words in context to learn high-quality embeddings and we obtain state-of-the-art results on a nonce task and some unsupervised document classification tasks. |
3 | Unsupervised Learning of Distributional Relation Vectors | Shoaib Jameel, Zied Bouraoui, Steven Schockaert | In this paper, we introduce a novel method which directly learns relation vectors from co-occurrence statistics. |
4 | Explicit Retrofitting of Distributional Word Vectors | Goran Glavaš, Ivan Vulić | In this work, in contrast, we transform external lexico-semantic relations into training examples which we use to learn an explicit retrofitting model (ER). |
5 | Unsupervised Neural Machine Translation with Weight Sharing | Zhen Yang, Wei Chen, Feng Wang, Bo Xu | To address this issue, we introduce an extension by utilizing two independent encoders but sharing some partial weights which are responsible for extracting high-level representations of the input sentences. |
6 | Triangular Architecture for Rare Language Translation | Shuo Ren, Wenhu Chen, Shujie Liu, Mu Li, Ming Zhou, Shuai Ma | By introducing another rich language Y, we propose a novel triangular training architecture (TA-NMT) to leverage bilingual data (Y,Z) (may be small) and (X,Y) (can be rich) to improve the translation performance of low-resource pairs. |
7 | Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates | Taku Kudo | The question addressed in this paper is whether it is possible to harness the segmentation ambiguity as a noise to improve the robustness of NMT. |
8 | The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation | Mia Xu Chen, Orhan Firat, Ankur Bapna, Melvin Johnson, Wolfgang Macherey, George Foster, Llion Jones, Mike Schuster, Noam Shazeer, Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Zhifeng Chen, Yonghui Wu, Macduff Hughes | In this paper, we tease apart the new architectures and their accompanying techniques in two ways. |
9 | Ultra-Fine Entity Typing | Eunsol Choi, Omer Levy, Yejin Choi, Luke Zettlemoyer | We introduce a new entity typing task: given a sentence with an entity mention, the goal is to predict a set of free-form phrases (e.g. skyscraper, songwriter, or criminal) that describe appropriate types for the target entity. |
10 | Hierarchical Losses and New Resources for Fine-grained Entity Typing and Linking | Shikhar Murty, Patrick Verga, Luke Vilnis, Irena Radovanovic, Andrew McCallum | This paper presents new methods using real and complex bilinear mappings for integrating hierarchical information, yielding substantial improvement over flat predictions in entity linking and fine-grained entity typing, and achieving new state-of-the-art results for end-to-end models on the benchmark FIGER dataset. We also present two new human-annotated datasets containing wide and deep hierarchies which we will release to the community to encourage further research in this direction: \textit{MedMentions}, a collection of PubMed abstracts in which 246k mentions have been mapped to the massive UMLS ontology; and \textit{TypeNet}, which aligns Freebase types with the WordNet hierarchy to obtain nearly 2k entity types. |
11 | Improving Knowledge Graph Embedding Using Simple Constraints | Boyang Ding, Quan Wang, Bin Wang, Li Guo | This paper, by contrast, investigates the potential of using very simple constraints to improve KG embedding. |
12 | Towards Understanding the Geometry of Knowledge Graph Embeddings | Chandrahas, Aditya Sharma, Partha Talukdar | Despite this popularity and effectiveness of KG embeddings in various tasks (e.g., link prediction), geometric understanding of such embeddings (i.e., arrangement of entity and relation vectors in vector space) is unexplored – we fill this gap in the paper. |
13 | A Unified Model for Extractive and Abstractive Summarization using Inconsistency Loss | Wan-Ting Hsu, Chieh-Kai Lin, Ming-Ying Lee, Kerui Min, Jing Tang, Min Sun | We propose a unified model combining the strength of extractive and abstractive summarization. |
14 | Extractive Summarization with SWAP-NET: Sentences and Words from Alternating Pointer Networks | Aishwarya Jadhav, Vaibhav Rajan | We present a new neural sequence-to-sequence model for extractive summarization called SWAP-NET (Sentences and Words from Alternating Pointer Networks). |
15 | Retrieve, Rerank and Rewrite: Soft Template Based Neural Summarization | Ziqiang Cao, Wenjie Li, Sujian Li, Furu Wei | Inspired by the traditional template-based summarization approaches, this paper proposes to use existing summaries as soft templates to guide the seq2seq model. |
16 | Simple and Effective Text Simplification Using Semantic and Neural Methods | Elior Sulem, Omri Abend, Ari Rappoport | Here we present a simple and efficient splitting algorithm based on an automatic semantic parser. |
17 | Obtaining Reliable Human Ratings of Valence, Arousal, and Dominance for 20,000 English Words | Saif Mohammad | We present the NRC VAD Lexicon, which has human ratings of valence, arousal, and dominance for more than 20,000 English words. |
18 | Comprehensive Supersense Disambiguation of English Prepositions and Possessives | Nathan Schneider, Jena D. Hwang, Vivek Srikumar, Jakob Prange, Austin Blodgett, Sarah R. Moeller, Aviram Stern, Adi Bitan, Omri Abend | We introduce a new annotation scheme, corpus, and task for the disambiguation of prepositions and possessives in English. |
19 | A Corpus with Multi-Level Annotations of Patients, Interventions and Outcomes to Support Language Processing for Medical Literature | Benjamin Nye, Junyi Jessy Li, Roma Patel, Yinfei Yang, Iain Marshall, Ani Nenkova, Byron Wallace | We present a corpus of 5,000 richly annotated abstracts of medical articles describing clinical randomized controlled trials. |
20 | Efficient Online Scalar Annotation with Bounded Support | Keisuke Sakaguchi, Benjamin Van Durme | We describe a novel method for efficiently eliciting scalar annotations for dataset construction and system quality estimation by human judgments. |
21 | Neural Argument Generation Augmented with Externally Retrieved Evidence | Xinyu Hua, Lu Wang | In this work, we study a novel task on automatically generating arguments of a different stance for a given statement. |
22 | A Stylometric Inquiry into Hyperpartisan and Fake News | Martin Potthast, Johannes Kiesel, Kevin Reinartz, Janek Bevendorff, Benno Stein | We report on a comparative style analysis of hyperpartisan (extremely one-sided) news and fake news. |
23 | Retrieval of the Best Counterargument without Prior Topic Knowledge | Henning Wachsmuth, Shahbaz Syed, Benno Stein | To operationalize our hypothesis, we simultaneously model the similarity and dissimilarity of pairs of arguments, based on the words and embeddings of the arguments’ premises and conclusions. |
24 | LinkNBed: Multi-Graph Representation Learning with Entity Linkage | Rakshit Trivedi, Bunyamin Sisman, Xin Luna Dong, Christos Faloutsos, Jun Ma, Hongyuan Zha | To this end, we propose LinkNBed, a deep relational learning framework that learns entity and relationship representations across multiple graphs. |
25 | Probabilistic Embedding of Knowledge Graphs with Box Lattice Measures | Luke Vilnis, Xiang Li, Shikhar Murty, Andrew McCallum | In this work we show that a broad class of models that assign probability measures to OE can never capture negative correlation, which motivates our construction of a novel box lattice and accompanying probability measure to capture anti-correlation and even disjoint concepts, while still providing the benefits of probabilistic modeling, such as the ability to perform rich joint and conditional queries over arbitrary sets of concepts, and both learning from and predicting calibrated uncertainty. |
26 | Graph-to-Sequence Learning using Gated Graph Neural Networks | Daniel Beck, Gholamreza Haffari, Trevor Cohn | In this work we propose a new model that encodes the full structural information contained in the graph. |
27 | Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context | Urvashi Khandelwal, He He, Peng Qi, Dan Jurafsky | In this paper, we investigate the role of context in an LSTM LM, through ablation studies. |
28 | Bridging CNNs, RNNs, and Weighted Finite-State Machines | Roy Schwartz, Sam Thomson, Noah A. Smith | In this paper we present SoPa, a new model that aims to bridge these two approaches. |
29 | Zero-shot Learning of Classifiers from Natural Language Quantification | Shashank Srivastava, Igor Labutov, Tom Mitchell | We present a framework through which a set of explanations of a concept can be used to learn a classifier without access to any labeled examples. |
30 | Sentence-State LSTM for Text Representation | Yue Zhang, Qi Liu, Linfeng Song | We investigate an alternative LSTM structure for encoding text, which consists of a parallel state for each word. |
31 | Universal Language Model Fine-tuning for Text Classification | Jeremy Howard, Sebastian Ruder | We propose Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduce techniques that are key for fine-tuning a language model. |
32 | Evaluating neural network explanation methods using hybrid documents and morphosyntactic agreement | Nina Poerner, Hinrich Schütze, Benjamin Roth | We show empirically that LIMSSE, LRP and DeepLIFT are the most effective explanation methods and recommend them for explaining DNNs in NLP. |
33 | Improving Text-to-SQL Evaluation Methodology | Catherine Finegan-Dollak, Jonathan K. Kummerfeld, Li Zhang, Karthik Ramanathan, Sesh Sadasivam, Rui Zhang, Dragomir Radev | We identify limitations of and propose improvements to current evaluations of text-to-SQL systems. |
34 | Semantic Parsing with Syntax- and Table-Aware SQL Generation | Yibo Sun, Duyu Tang, Nan Duan, Jianshu Ji, Guihong Cao, Xiaocheng Feng, Bing Qin, Ting Liu, Ming Zhou | We present a generative model to map natural language questions into SQL queries. |
35 | Multitask Parsing Across Semantic Representations | Daniel Hershcovich, Omri Abend, Ari Rappoport | In this paper we tackle the challenging task of improving semantic parsing performance, taking UCCA parsing as a test case, and AMR, SDP and Universal Dependencies (UD) parsing as auxiliary tasks. |
36 | Character-Level Models versus Morphology in Semantic Role Labeling | Gözde Gül Şahin, Mark Steedman | In this work, we train various types of SRL models that use word, character and morphology level information and analyze how the performance of characters compares to that of words and morphology for several languages. |
37 | AMR Parsing as Graph Prediction with Latent Alignment | Chunchuan Lyu, Ivan Titov | We introduce a neural parser which treats alignments as latent variables within a joint probabilistic model of concepts, relations and alignments. |
38 | Accurate SHRG-Based Semantic Parsing | Yufei Chen, Weiwei Sun, Xiaojun Wan | We demonstrate that an SHRG-based parser can produce semantic graphs much more accurately than previously shown, by relating synchronous production rules to the syntacto-semantic composition process. |
39 | Using Intermediate Representations to Solve Math Word Problems | Danqing Huang, Jin-Ge Yao, Chin-Yew Lin, Qingyu Zhou, Jian Yin | In this work we present an intermediate meaning representation scheme that tries to reduce this gap. |
40 | Discourse Representation Structure Parsing | Jiangming Liu, Shay B. Cohen, Mirella Lapata | We introduce an open-domain neural semantic parser which generates formal meaning representations in the style of Discourse Representation Theory (DRT; Kamp and Reyle 1993). |
41 | Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms | Dinghan Shen, Guoyin Wang, Wenlin Wang, Martin Renqiang Min, Qinliang Su, Yizhe Zhang, Chunyuan Li, Ricardo Henao, Lawrence Carin | In this paper, we conduct a point-by-point comparative study between Simple Word-Embedding-based Models (SWEMs), consisting of parameter-free pooling operations, relative to word-embedding-based RNN/CNN models. |
42 | ParaNMT-50M: Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations | John Wieting, Kevin Gimpel | We describe ParaNMT-50M, a dataset of more than 50 million English-English sentential paraphrase pairs. |
43 | Event2Mind: Commonsense Inference on Events, Intents, and Reactions | Hannah Rashkin, Maarten Sap, Emily Allaway, Noah A. Smith, Yejin Choi | We investigate a new commonsense inference task: given an event described in a short free-form text (“X drinks coffee in the morning”), a system reasons about the likely intents (“X wants to stay awake”) and reactions (“X feels alert”) of the event’s participants. To support this study, we construct a new crowdsourced corpus of 25,000 event phrases covering a diverse range of everyday events and situations. |
44 | Neural Adversarial Training for Semi-supervised Japanese Predicate-argument Structure Analysis | Shuhei Kurita, Daisuke Kawahara, Sadao Kurohashi | In this paper, we propose a novel Japanese PAS analysis model based on semi-supervised adversarial training with a raw corpus. |
45 | Improving Event Coreference Resolution by Modeling Correlations between Event Coreference Chains and Document Topic Structures | Prafulla Kumar Choubey, Ruihong Huang | This paper proposes a novel approach for event coreference resolution that models correlations between event coreference chains and document topical structures through an Integer Linear Programming formulation. |
46 | DSGAN: Generative Adversarial Training for Distant Supervision Relation Extraction | Pengda Qin, Weiran Xu, William Yang Wang | In this paper, we introduce an adversarial learning framework, which we named DSGAN, to learn a sentence-level true-positive generator. |
47 | Extracting Relational Facts by an End-to-End Neural Model with Copy Mechanism | Xiangrong Zeng, Daojian Zeng, Shizhu He, Kang Liu, Jun Zhao | In this paper, we propose an end-to-end model based on sequence-to-sequence learning with copy mechanism, which can jointly extract relational facts from sentences of any of these classes. |
48 | Self-regulation: Employing a Generative Adversarial Network to Improve Event Detection | Yu Hong, Wenxuan Zhou, Jingli Zhang, Guodong Zhou, Qiaoming Zhu | In this paper, we propose a self-regulated learning approach by utilizing a generative adversarial network to generate spurious features. |
49 | Context-Aware Neural Model for Temporal Information Extraction | Yuanliang Meng, Anna Rumshisky | We propose a context-aware neural network model for temporal information extraction. |
50 | Temporal Event Knowledge Acquisition via Identifying Narratives | Wenlin Yao, Ruihong Huang | Inspired by the double temporality characteristic of narrative texts, we propose a novel approach for acquiring rich temporal “before/after” event knowledge across sentences in narrative stories. |
51 | Textual Deconvolution Saliency (TDS) : a deep tool box for linguistic analysis | Laurent Vanni, Melanie Ducoffe, Carlos Aguilar, Frederic Precioso, Damon Mayaffre | In this paper, we propose a new strategy, called Text Deconvolution Saliency (TDS), to visualize linguistic information detected by a CNN for text classification. |
52 | Coherence Modeling of Asynchronous Conversations: A Neural Entity Grid Approach | Shafiq Joty, Muhammad Tasnim Mohiuddin, Dat Tien Nguyen | We propose a novel coherence model for written asynchronous conversations (e.g., forums, emails), and show its applications in coherence assessment and thread reconstruction tasks. |
53 | Deep Reinforcement Learning for Chinese Zero Pronoun Resolution | Qingyu Yin, Yu Zhang, Wei-Nan Zhang, Ting Liu, William Yang Wang | In this paper, we show how to integrate these goals, applying deep reinforcement learning to deal with the task. |
54 | Entity-Centric Joint Modeling of Japanese Coreference Resolution and Predicate Argument Structure Analysis | Tomohide Shibata, Sadao Kurohashi | This paper presents an entity-centric joint model for Japanese coreference resolution and predicate argument structure analysis. |
55 | Constraining MGbank: Agreement, L-Selection and Supertagging in Minimalist Grammars | John Torr | This paper reports on two strategies that have been implemented for improving the efficiency and precision of wide-coverage Minimalist Grammar (MG) parsing. |
56 | Not that much power: Linguistic alignment is influenced more by low-level linguistic features rather than social power | Yang Xu, Jeremy Cole, David Reitter | This work characterizes the effect of power on alignment with logistic regression models in two datasets, finding that the effect vanishes or is reversed after controlling for low-level features such as utterance length. |
57 | TutorialBank: A Manually-Collected Corpus for Prerequisite Chains, Survey Extraction and Resource Recommendation | Alexander Fabbri, Irene Li, Prawat Trairatvorakul, Yijiao He, Weitai Ting, Robert Tung, Caitlin Westerfield, Dragomir Radev | To address this situation, we introduce TutorialBank, a new, publicly available dataset which aims to facilitate NLP education and research. We are releasing the dataset and present several avenues for further research. |
58 | Give Me More Feedback: Annotating Argument Persuasiveness and Related Attributes in Student Essays | Winston Carlile, Nishant Gurrapadi, Zixuan Ke, Vincent Ng | We present the first corpus of essays that are simultaneously annotated with argument components, argument persuasiveness scores, and attributes of argument components that impact an argument’s persuasiveness. |
59 | Inherent Biases in Reference-based Evaluation for Grammatical Error Correction | Leshem Choshen, Omri Abend | Concretely, we show that LCB incentivizes GEC systems to avoid correcting even when they can generate a valid correction. |
60 | The price of debiasing automatic metrics in natural language evaluation | Arun Chaganty, Stephen Mussmann, Percy Liang | In this paper, we use control variates to combine automatic metrics with human evaluation to obtain an unbiased estimator with lower cost than human evaluation alone. |
61 | Neural Document Summarization by Jointly Learning to Score and Select Sentences | Qingyu Zhou, Nan Yang, Furu Wei, Shaohan Huang, Ming Zhou, Tiejun Zhao | In this paper, we present a novel end-to-end neural network framework for extractive document summarization by jointly learning to score and select sentences. |
62 | Unsupervised Abstractive Meeting Summarization with Multi-Sentence Compression and Budgeted Submodular Maximization | Guokan Shang, Wensi Ding, Zekun Zhang, Antoine Tixier, Polykarpos Meladianos, Michalis Vazirgiannis, Jean-Pierre Lorré | We introduce a novel graph-based framework for abstractive meeting speech summarization that is fully unsupervised and does not rely on any annotations. |
63 | Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting | Yen-Chun Chen, Mohit Bansal | Inspired by how humans summarize long documents, we propose an accurate and fast summarization model that first selects salient sentences and then rewrites them abstractively (i.e., compresses and paraphrases) to generate a concise overall summary. |
64 | Soft Layer-Specific Multi-Task Summarization with Entailment and Question Generation | Han Guo, Ramakanth Pasunuru, Mohit Bansal | We improve these important aspects of abstractive summarization via multi-task learning with the auxiliary tasks of question generation and entailment generation, where the former teaches the summarization model how to look for salient questioning-worthy details, and the latter teaches the model how to rewrite a summary which is a directed-logical subset of the input document. |
65 | Modeling and Prediction of Online Product Review Helpfulness: A Survey | Gerardo Ocampo Diaz, Vincent Ng | This paper provides an overview of the most relevant work in helpfulness prediction and understanding in the past decade, discusses the insights gained from said work, and provides guidelines for future research. |
66 | Mining Cross-Cultural Differences and Similarities in Social Media | Bill Yuchen Lin, Frank F. Xu, Kenny Zhu, Seung-won Hwang | In this paper, we study the problem of computing such cross-cultural differences and similarities. |
67 | Classification of Moral Foundations in Microblog Political Discourse | Kristen Johnson, Dan Goldwasser | The contributions of this work include a dataset annotated for the moral foundations, annotation guidelines, and probabilistic graphical models which show the usefulness of jointly modeling abstract political slogans, as opposed to the unigrams of previous works, with policy frames for the prediction of the morality underlying political tweets. |
68 | Coarse-to-Fine Decoding for Neural Semantic Parsing | Li Dong, Mirella Lapata | In this work, we propose a structure-aware neural architecture which decomposes the semantic parsing process into two stages. |
69 | Confidence Modeling for Neural Semantic Parsing | Li Dong, Chris Quirk, Mirella Lapata | In this work we focus on confidence modeling for neural semantic parsers which are built upon sequence-to-sequence models. |
70 | StructVAE: Tree-structured Latent Variable Models for Semi-supervised Semantic Parsing | Pengcheng Yin, Chunting Zhou, Junxian He, Graham Neubig | We introduce StructVAE, a variational auto-encoding model for semi-supervised semantic parsing, which learns both from limited amounts of parallel data, and readily-available unlabeled NL utterances. |
71 | Sequence-to-Action: End-to-End Semantic Graph Generation for Semantic Parsing | Bo Chen, Le Sun, Xianpei Han | This paper proposes a neural semantic parsing approach – Sequence-to-Action, which models semantic parsing as an end-to-end semantic graph generation process. |
72 | On the Limitations of Unsupervised Bilingual Dictionary Induction | Anders Søgaard, Sebastian Ruder, Ivan Vulić | Unsupervised machine translation – i.e., not assuming any cross-lingual supervision signal, whether a dictionary, translations, or comparable corpora – seems impossible, but nevertheless, Lample et al. (2017) recently proposed a fully unsupervised machine translation (MT) model. |
73 | A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings | Mikel Artetxe, Gorka Labaka, Eneko Agirre | This work proposes an alternative approach based on a fully unsupervised initialization that explicitly exploits the structural similarity of the embeddings, and a robust self-learning algorithm that iteratively improves this solution. |
74 | A Multi-lingual Multi-task Architecture for Low-resource Sequence Labeling | Ying Lin, Shengqi Yang, Veselin Stoyanov, Heng Ji | We propose a multi-lingual multi-task architecture to develop supervised models with a minimal amount of labeled data for sequence labeling. |
75 | Two Methods for Domain Adaptation of Bilingual Tasks: Delightfully Simple and Broadly Applicable | Viktor Hangya, Fabienne Braune, Alexander Fraser, Hinrich Schütze | We make two contributions. |
76 | Knowledgeable Reader: Enhancing Cloze-Style Reading Comprehension with External Commonsense Knowledge | Todor Mihaylov, Anette Frank | We introduce a neural reading comprehension model that integrates external commonsense knowledge, encoded as a key-value memory, in a cloze-style setting. |
77 | Multi-Relational Question Answering from Narratives: Machine Reading and Reasoning in Simulated Worlds | Igor Labutov, Bishan Yang, Anusha Prakash, Amos Azaria | In this work, we look towards a practical use-case of QA over user-instructed knowledge that uniquely combines elements of both structured QA over knowledge bases, and unstructured QA over narrative, introducing the task of multi-relational QA over personal narrative. |
78 | Simple and Effective Multi-Paragraph Reading Comprehension | Christopher Clark, Matt Gardner | We introduce a method of adapting neural paragraph-level question answering models to the case where entire documents are given as input. |
79 | Semantically Equivalent Adversarial Rules for Debugging NLP models | Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin | To automatically detect this behavior for individual instances, we present semantically equivalent adversaries (SEAs) – semantic-preserving perturbations that induce changes in the model’s predictions. |
80 | Style Transfer Through Back-Translation | Shrimai Prabhumoye, Yulia Tsvetkov, Ruslan Salakhutdinov, Alan W Black | This paper introduces a new method for automatic style transfer. |
81 | Generating Fine-Grained Open Vocabulary Entity Type Descriptions | Rajarshi Bhowmik, Gerard de Melo | In this paper, we introduce a dynamic memory-based network that generates a short open vocabulary description of an entity by jointly leveraging induced fact embeddings as well as the dynamic context of the generated sequence of words. |
82 | Hierarchical Neural Story Generation | Angela Fan, Mike Lewis, Yann Dauphin | We explore story generation: creative systems that can build coherent and fluent passages of text about a topic. We collect a large dataset of 300K human-written stories paired with writing prompts from an online forum. |
83 | No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling | Xin Wang, Wenhu Chen, Yuan-Fang Wang, William Yang Wang | Therefore, we propose an Adversarial REward Learning (AREL) framework to learn an implicit reward function from human demonstrations, and then optimize policy search with the learned reward function. |
84 | Bridging Languages through Images with Deep Partial Canonical Correlation Analysis | Guy Rotman, Ivan Vulić, Roi Reichart | In particular, we propose a novel model based on Partial Canonical Correlation Analysis (PCCA). |
85 | Illustrative Language Understanding: Large-Scale Visual Grounding with Image Search | Jamie Kiros, William Chan, Geoffrey Hinton | We introduce Picturebook, a large-scale lookup operation to ground language via ‘snapshots’ of our physical world accessed through image search. |
86 | What Action Causes This? Towards Naive Physical Action-Effect Prediction | Qiaozi Gao, Shaohua Yang, Joyce Chai, Lucy Vanderwende | Towards this goal, this paper introduces a new task on naive physical action-effect prediction, which addresses the relations between concrete actions (expressed in the form of verb-noun pairs) and their effects on the state of the physical world as depicted by images. We collected a dataset for this task and developed an approach that harnesses web image data through distant supervision to facilitate learning for action-effect prediction. |
87 | Transformation Networks for Target-Oriented Sentiment Classification | Xin Li, Lidong Bing, Wai Lam, Bei Shi | After re-examining the drawbacks of attention mechanism and the obstacles that block CNN to perform well in this classification task, we propose a new model that achieves new state-of-the-art results on a few benchmarks. |
88 | Target-Sensitive Memory Networks for Aspect Sentiment Classification | Shuai Wang, Sahisnu Mazumder, Bing Liu, Mianwei Zhou, Yi Chang | To tackle this problem, we propose the target-sensitive memory networks (TMNs). |
89 | Identifying Transferable Information Across Domains for Cross-domain Sentiment Classification | Raksha Sharma, Pushpak Bhattacharyya, Sandipan Dandapat, Himanshu Sharad Bhatt | We present a novel approach based on χ2 test and cosine-similarity between context vector of words to identify polarity preserving significant words across domains. |
90 | Unpaired Sentiment-to-Sentiment Translation: A Cycled Reinforcement Learning Approach | Jingjing Xu, Xu Sun, Qi Zeng, Xiaodong Zhang, Xuancheng Ren, Houfeng Wang, Wenjie Li | To solve this problem, we propose a cycled reinforcement learning method that enables training on unpaired data by collaboration between a neutralization module and an emotionalization module. |
91 | Discourse Marker Augmented Network with Reinforcement Learning for Natural Language Inference | Boyuan Pan, Yazheng Yang, Zhou Zhao, Yueting Zhuang, Deng Cai, Xiaofei He | We observe that people usually use some discourse markers such as “so” or “but” to represent the logical relationship between two sentences. |
92 | Working Memory Networks: Augmenting Memory Networks with a Relational Reasoning Module | Juan Pavez, Héctor Allende, Héctor Allende-Cid | To solve these issues, we introduce the Working Memory Network, a MemNN architecture with a novel working memory storage and reasoning module. |
93 | Reasoning with Sarcasm by Reading In-Between | Yi Tay, Anh Tuan Luu, Siu Cheung Hui, Jian Su | More specifically, we propose an attention-based neural model that looks in-between instead of across, enabling it to explicitly model contrast and incongruity. |
94 | Adversarial Contrastive Estimation | Avishek Joey Bose, Huan Ling, Yanshuai Cao | In this work, we view contrastive learning as an abstraction of all such methods and augment the negative sampler into a mixture distribution containing an adversarially learned sampler. |
95 | Adaptive Scaling for Sparse Detection in Information Extraction | Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun | In this paper, we propose \textit{adaptive scaling}, an algorithm which can handle the positive sparsity problem and directly optimize over F-measure via dynamic cost-sensitive learning. |
96 | Strong Baselines for Neural Semi-Supervised Learning under Domain Shift | Sebastian Ruder, Barbara Plank | In this paper, we re-evaluate classic general-purpose bootstrapping approaches in the context of neural networks under domain shifts vs. recent neural approaches and propose a novel multi-task tri-training method that reduces the time and space complexity of classic tri-training. |
97 | Fluency Boost Learning and Inference for Neural Grammatical Error Correction | Tao Ge, Furu Wei, Ming Zhou | Most of the neural sequence-to-sequence (seq2seq) models for grammatical error correction (GEC) have two limitations: (1) a seq2seq model may not be well generalized with only limited error-corrected data; (2) a seq2seq model may fail to completely correct a sentence with multiple errors through normal seq2seq inference. |
98 | A Neural Architecture for Automated ICD Coding | Pengtao Xie, Eric Xing | In this paper, we build a neural architecture for automated coding. |
99 | Domain Adaptation with Adversarial Training and Graph Embeddings | Firoj Alam, Shafiq Joty, Muhammad Imran | In this paper, we study the problem of classifying social media posts during a crisis event (e.g., Earthquake). |
100 | TDNN: A Two-stage Deep Neural Network for Prompt-independent Automated Essay Scoring | Cancan Jin, Ben He, Kai Hui, Le Sun | In particular, in the first stage, using the rated essays for non-target prompts as the training data, a shallow model is learned to select essays with an extreme quality for the target prompt, serving as pseudo training data; in the second stage, an end-to-end hybrid deep model is proposed to learn a prompt-dependent rating model consuming the pseudo training data from the first step. |
101 | Unsupervised Discrete Sentence Representation Learning for Interpretable Neural Dialog Generation | Tiancheng Zhao, Kyusong Lee, Maxine Eskenazi | We present an unsupervised discrete sentence representation learning method that can integrate with any existing encoder-decoder dialog models for interpretable response generation. |
102 | Learning to Control the Specificity in Neural Response Generation | Ruqing Zhang, Jiafeng Guo, Yixing Fan, Yanyan Lan, Jun Xu, Xueqi Cheng | To address this problem, we propose a novel controlled response generation mechanism to handle different utterance-response relationships in terms of specificity. |
103 | Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network | Xiangyang Zhou, Lu Li, Daxiang Dong, Yi Liu, Ying Chen, Wayne Xin Zhao, Dianhai Yu, Hua Wu | In this paper, we investigate matching a response with its multi-turn context using dependency information based entirely on attention. |
104 | MojiTalk: Generating Emotional Responses at Scale | Xianda Zhou, William Yang Wang | In this paper, we take a more radical approach: we exploit the idea of leveraging Twitter data that are naturally labeled with emojis. We collect a large corpus of Twitter conversations that include emojis in the response and assume the emojis convey the underlying emotions of the sentence. |
105 | Taylor’s law for Human Linguistic Sequences | Tatsuru Kobayashi, Kumiko Tanaka-Ishii | This article describes a new way to quantify Taylor’s law in natural language and conducts Taylor analysis of over 1100 texts across 14 languages. |
106 | A Framework for Representing Language Acquisition in a Population Setting | Jordan Kodner, Christopher Cerezo Falco | We compare the strengths and weaknesses of existing approaches and propose a new analytic framework which combines previous network models’ ability to capture realistic social structure with more practical and elegant computational properties. |
107 | Prefix Lexicalization of Synchronous CFGs using Synchronous TAG | Logan Born, Anoop Sarkar | We show that an epsilon-free, chain-free synchronous context-free grammar (SCFG) can be converted into a weakly equivalent synchronous tree-adjoining grammar (STAG) which is prefix lexicalized. |
108 | Straight to the Tree: Constituency Parsing with Neural Syntactic Distance | Yikang Shen, Zhouhan Lin, Athul Paul Jacob, Alessandro Sordoni, Aaron Courville, Yoshua Bengio | In this work, we propose a novel constituency parsing scheme. |
109 | Gaussian Mixture Latent Vector Grammars | Yanpeng Zhao, Liwen Zhang, Kewei Tu | We introduce Latent Vector Grammars (LVeGs), a new framework that extends latent variable grammars such that each nonterminal symbol is associated with a continuous vector space representing the set of (infinitely many) subtypes of the nonterminal. |
110 | Extending a Parser to Distant Domains Using a Few Dozen Partially Annotated Examples | Vidur Joshi, Matthew Peters, Mark Hopkins | For more syntactically distant domains, we provide a simple way to adapt a parser using only dozens of partial annotations. |
111 | Paraphrase to Explicate: Revealing Implicit Noun-Compound Relations | Vered Shwartz, Ido Dagan | We propose a neural model that generalizes better by representing paraphrases in a continuous space, generalizing for both unseen noun-compounds and rare paraphrases. |
112 | Searching for the X-Factor: Exploring Corpus Subjectivity for Word Embeddings | Maksim Tkachenko, Chong Cher Chia, Hady Lauw | Through systematic comparative analyses, we establish this to be the case indeed. |
113 | Word Embedding and WordNet Based Metaphor Identification and Interpretation | Rui Mao, Chenghua Lin, Frank Guerin | In this paper, we propose an unsupervised learning method that identifies and interprets metaphors at word-level without any preprocessing, outperforming strong baselines in the metaphor identification task. |
114 | Incorporating Latent Meanings of Morphological Compositions to Enhance Word Embeddings | Yang Xu, Jiawei Liu, Wei Yang, Liusheng Huang | In this paper, we explore to employ the latent meanings of morphological compositions of words to train and enhance word embeddings. |
115 | A Stochastic Decoder for Neural Machine Translation | Philip Schulz, Wilker Aziz, Trevor Cohn | To this end, we present a deep generative model of machine translation which incorporates a chain of latent variables, in order to account for local lexical and syntactic variation in parallel corpora. |
116 | Forest-Based Neural Machine Translation | Chunpeng Ma, Akihiro Tamura, Masao Utiyama, Tiejun Zhao, Eiichiro Sumita | This paper proposes a forest-based NMT method that translates a linearized packed forest under a simple sequence-to-sequence framework (i.e., a forest-to-sequence NMT model). |
117 | Context-Aware Neural Machine Translation Learns Anaphora Resolution | Elena Voita, Pavel Serdyukov, Rico Sennrich, Ivan Titov | We introduce a context-aware neural machine translation model designed in such way that the flow of information from the extended context to the translation model can be controlled and analyzed. |
118 | Document Context Neural Machine Translation with Memory Networks | Sameen Maruf, Gholamreza Haffari | We present a document-level neural machine translation model which takes both source and target document context into account using memory networks. |
119 | Which Melbourne? Augmenting Geocoding with Maps | Milan Gritta, Mohammad Taher Pilehvar, Nigel Collier | We introduce a geocoder (location mention disambiguator) that achieves state-of-the-art (SOTA) results on three diverse datasets by exploiting the implicit lexical clues. We also introduce an open-source dataset for geoparsing of news events covering global disease outbreaks and epidemics to help future evaluation in geoparsing. |
120 | Learning Prototypical Goal Activities for Locations | Tianyu Jiang, Ellen Riloff | Our research aims to learn goal-acts for specific locations using a text corpus and semi-supervised learning. |
121 | Guess Me if You Can: Acronym Disambiguation for Enterprises | Yang Li, Bo Zhao, Ariel Fuxman, Fangbo Tao | In this work we propose an end-to-end framework to tackle all these challenges. |
122 | A Multi-Axis Annotation Scheme for Event Temporal Relations | Qiang Ning, Hao Wu, Dan Roth | This paper proposes a new multi-axis modeling to better capture the temporal structure of events. |
123 | Exemplar Encoder-Decoder for Neural Conversation Generation | Gaurav Pandey, Danish Contractor, Vineet Kumar, Sachindra Joshi | In this paper we present the Exemplar Encoder-Decoder network (EED), a novel conversation model that learns to utilize *similar* examples from training data to generate responses. |
124 | DialSQL: Dialogue Based Structured Query Generation | Izzeddin Gur, Semih Yavuz, Yu Su, Xifeng Yan | Rather than solely relying on algorithmic innovations, in this work, we introduce DialSQL, a dialogue-based structured query generation framework that leverages human intelligence to boost the performance of existing algorithms via user interaction. |
125 | Conversations Gone Awry: Detecting Early Signs of Conversational Failure | Justine Zhang, Jonathan Chang, Cristian Danescu-Niculescu-Mizil, Lucas Dixon, Yiqing Hua, Dario Taraborelli, Nithum Thain | In this work, we introduce the task of predicting from the very start of a conversation whether it will get out of hand. |
126 | Are BLEU and Meaning Representation in Opposition? | Ondřej Cífka, Ondřej Bojar | We propose several variations of the attentive NMT architecture bringing this meeting point back. |
127 | Automatic Metric Validation for Grammatical Error Correction | Leshem Choshen, Omri Abend | We propose MAEGE, an automatic methodology for GEC metric validation, that overcomes many of the difficulties in the existing methodology. |
128 | The Hitchhiker’s Guide to Testing Statistical Significance in Natural Language Processing | Rotem Dror, Gili Baumer, Segev Shlomov, Roi Reichart | Based on this discussion we propose a simple practical protocol for statistical significance test selection in NLP setups and accompany this protocol with a brief survey of the most relevant tests. |
129 | Distilling Knowledge for Search-based Structured Prediction | Yijia Liu, Wanxiang Che, Huaipeng Zhao, Bing Qin, Ting Liu | In this paper, we distill an ensemble of multiple models trained with different initialization into a single model. |
130 | Stack-Pointer Networks for Dependency Parsing | Xuezhe Ma, Zecong Hu, Jingzhou Liu, Nanyun Peng, Graham Neubig, Eduard Hovy | We introduce a novel architecture for dependency parsing: stack-pointer networks (StackPtr). |
131 | Twitter Universal Dependency Parsing for African-American and Mainstream American English | Su Lin Blodgett, Johnny Wei, Brendan O’Connor | We describe our standards for handling Twitter- and AAE-specific features and evaluate a variety of cross-domain strategies for improving parsing with no, or very little, in-domain labeled data, including a new data synthesis approach. |
132 | LSTMs Can Learn Syntax-Sensitive Dependencies Well, But Modeling Structure Makes Them Better | Adhiguna Kuncoro, Chris Dyer, John Hale, Dani Yogatama, Stephen Clark, Phil Blunsom | Using the same diagnostic, we show that, in fact, LSTMs do succeed in learning such dependencies, provided they have enough capacity. |
133 | Sequicity: Simplifying Task-oriented Dialogue Systems with Single Sequence-to-Sequence Architectures | Wenqiang Lei, Xisen Jin, Min-Yen Kan, Zhaochun Ren, Xiangnan He, Dawei Yin | We propose a novel, holistic, extendable framework based on a single sequence-to-sequence (seq2seq) model which can be optimized with supervised or reinforcement learning. |
134 | An End-to-end Approach for Handling Unknown Slot Values in Dialogue State Tracking | Puyang Xu, Qi Hu | We describe in this paper an E2E architecture based on the pointer network (PtrNet) that can effectively extract unknown slot values while still obtaining state-of-the-art accuracy on the standard DSTC2 benchmark. |
135 | Global-Locally Self-Attentive Encoder for Dialogue State Tracking | Victor Zhong, Caiming Xiong, Richard Socher | In this paper, we propose the Global-Locally Self-Attentive Dialogue State Tracker (GLAD), which learns representations of the user utterance and previous system actions with global-local modules. |
136 | Mem2Seq: Effectively Incorporating Knowledge Bases into End-to-End Task-Oriented Dialog Systems | Andrea Madotto, Chien-Sheng Wu, Pascale Fung | In this paper, we propose a novel yet simple end-to-end differentiable model called memory-to-sequence (Mem2Seq) to address this issue. |
137 | Tailored Sequence to Sequence Models to Different Conversation Scenarios | Hainan Zhang, Yanyan Lan, Jiafeng Guo, Jun Xu, Xueqi Cheng | In this paper, we propose two tailored optimization criteria for Seq2Seq to different conversation scenarios, i.e., the maximum generated likelihood for specific-requirement scenario, and the conditional value-at-risk for diverse-requirement scenario. |
138 | Knowledge Diffusion for Neural Dialogue Generation | Shuman Liu, Hongshen Chen, Zhaochun Ren, Yang Feng, Qun Liu, Dawei Yin | In this paper, we propose a neural knowledge diffusion (NKD) model to introduce knowledge into dialogue generation. |
139 | Generating Informative Responses with Controlled Sentence Function | Pei Ke, Jian Guan, Minlie Huang, Xiaoyan Zhu | In this paper, we present a model to generate informative responses with controlled sentence function. |
140 | Sentiment Adaptive End-to-End Dialog Systems | Weiyan Shi, Zhou Yu | Therefore, we propose to include user sentiment obtained through multimodal information (acoustic, dialogic and textual), in the end-to-end learning framework to make systems more user-adaptive and effective. |
141 | Embedding Learning Through Multilingual Concept Induction | Philipp Dufter, Mengjie Zhao, Martin Schmitt, Alexander Fraser, Hinrich Schütze | We present a new method for estimating vector space representations of words: embedding learning by concept induction. |
142 | Isomorphic Transfer of Syntactic Structures in Cross-Lingual NLP | Edoardo Maria Ponti, Roi Reichart, Anna Korhonen, Ivan Vulić | In this paper, we measure cross-lingual syntactic variation, or anisomorphism, in the UD treebank collection, considering both morphological and structural properties. |
143 | Language Modeling for Code-Mixing: The Role of Linguistic Theory based Synthetic Data | Adithya Pratapa, Gayatri Bhat, Monojit Choudhury, Sunayana Sitaram, Sandipan Dandapat, Kalika Bali | We present a computational technique for creation of grammatically valid artificial CM data based on the Equivalence Constraint Theory. |
144 | Chinese NER Using Lattice LSTM | Yue Zhang, Jie Yang | We investigate a lattice-structured LSTM model for Chinese NER, which encodes a sequence of input characters as well as all potential words that match a lexicon. |
145 | Nugget Proposal Networks for Chinese Event Detection | Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun | In this paper, we propose Nugget Proposal Networks (NPNs), which can solve the word-trigger mismatch problem by directly proposing entire trigger nuggets centered at each character regardless of word boundaries. |
146 | Higher-order Relation Schema Induction using Tensor Factorization with Back-off and Aggregation | Madhav Nimishakavi, Manish Gupta, Partha Talukdar | In this paper, we propose Tensor Factorization with Back-off and Aggregation (TFBA), a novel framework for the HRSI problem. |
147 | Discovering Implicit Knowledge with Unary Relations | Michael Glass, Alfio Gliozzo | In this paper we propose a new methodology to identify relations between two entities, consisting of detecting a very large number of unary relations, and using them to infer missing entities. |
148 | Improving Entity Linking by Modeling Latent Relations between Mentions | Phong Le, Ivan Titov | Unlike previous approaches, which relied on supervised systems or heuristics to predict these relations, we treat relations as latent variables in our neural entity-linking model. |
149 | Dating Documents using Graph Convolution Networks | Shikhar Vashishth, Shib Sankar Dasgupta, Swayambhu Nath Ray, Partha Talukdar | In this paper, we propose NeuralDater, a Graph Convolutional Network (GCN) based document dating approach which jointly exploits the syntactic and temporal graph structures of a document in a principled way. |
150 | A Graph-to-Sequence Model for AMR-to-Text Generation | Linfeng Song, Yue Zhang, Zhiguo Wang, Daniel Gildea | We introduce a neural graph-to-sequence model, using a novel LSTM structure for directly encoding graph-level semantics. |
151 | GTR-LSTM: A Triple Encoder for Sentence Generation from RDF Data | Bayu Distiawan Trisedya, Jianzhong Qi, Rui Zhang, Wei Wang | To preserve as much information from RDF triples as possible, we propose a novel graph-based triple encoder. |
152 | Learning to Write with Cooperative Discriminators | Ari Holtzman, Jan Buys, Maxwell Forbes, Antoine Bosselut, David Golub, Yejin Choi | We propose a unified learning framework that collectively addresses all the above issues by composing a committee of discriminators that can guide a base RNN generator towards more globally coherent generations. |
153 | A Neural Approach to Pun Generation | Zhiwei Yu, Jiwei Tan, Xiaojun Wan | In this paper, we propose neural network models for homographic pun generation, and they can generate puns without requiring any pun data for training. |
154 | Learning to Generate Move-by-Move Commentary for Chess Games from Large-Scale Social Forum Data | Harsh Jhamtani, Varun Gangal, Eduard Hovy, Graham Neubig, Taylor Berg-Kirkpatrick | We introduce a new large-scale chess commentary dataset and propose methods to generate commentary for individual moves in a chess game. |
155 | From Credit Assignment to Entropy Regularization: Two New Algorithms for Neural Sequence Prediction | Zihang Dai, Qizhe Xie, Eduard Hovy | In this work, we study the credit assignment problem in reward augmented maximum likelihood (RAML) learning, and establish a theoretical equivalence between the token-level counterpart of RAML and the entropy regularized reinforcement learning. |
156 | DuoRC: Towards Complex Language Understanding with Paraphrased Reading Comprehension | Amrita Saha, Rahul Aralikatte, Mitesh M. Khapra, Karthik Sankaranarayanan | We propose DuoRC, a novel dataset for Reading Comprehension (RC) that motivates several new challenges for neural approaches in language understanding beyond those offered by existing RC datasets. |
157 | Stochastic Answer Networks for Machine Reading Comprehension | Xiaodong Liu, Yelong Shen, Kevin Duh, Jianfeng Gao | We propose a simple yet robust stochastic answer network (SAN) that simulates multi-step reasoning in machine reading comprehension. |
158 | Multi-Granularity Hierarchical Attention Fusion Networks for Reading Comprehension and Question Answering | Wei Wang, Ming Yan, Chen Wu | This paper describes a novel hierarchical attention network for reading comprehension style question answering, which aims to answer questions for a given narrative paragraph. |
159 | Joint Training of Candidate Extraction and Answer Selection for Reading Comprehension | Zhen Wang, Jiachen Liu, Xinyan Xiao, Yajuan Lyu, Tian Wu | In this paper, we formulate reading comprehension as an extract-then-select two-stage procedure. |
160 | Efficient and Robust Question Answering from Minimal Context over Documents | Sewon Min, Victor Zhong, Richard Socher, Caiming Xiong | In this paper, we study the minimal context required to answer the question, and find that most questions in existing datasets can be answered with a small set of sentences. |
161 | Denoising Distantly Supervised Open-Domain Question Answering | Yankai Lin, Haozhe Ji, Zhiyuan Liu, Maosong Sun | To address these issues, we propose a novel DS-QA model which employs a paragraph selector to filter out those noisy paragraphs and a paragraph reader to extract the correct answer from those denoised paragraphs. |
162 | Question Condensing Networks for Answer Selection in Community Question Answering | Wei Wu, Xu Sun, Houfeng Wang | In this paper, we propose the Question Condensing Networks (QCN) to make use of the subject-body relationship of community questions. |
163 | Towards Robust Neural Machine Translation | Yong Cheng, Zhaopeng Tu, Fandong Meng, Junjie Zhai, Yang Liu | In this paper, we propose to improve the robustness of NMT models with adversarial stability training. |
164 | Attention Focusing for Neural Machine Translation by Bridging Source and Target Embeddings | Shaohui Kuang, Junhui Li, António Branco, Weihua Luo, Deyi Xiong | In this paper, we seek to somewhat shorten the distance between source and target words in that procedure, and thus strengthen their association, by means of a method we term bridging source and target word embeddings. |
165 | Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning | Julia Kreutzer, Joshua Uyheng, Stefan Riezler | We present a study on reinforcement learning (RL) from human bandit feedback for sequence-to-sequence learning, exemplified by the task of bandit neural machine translation (NMT). |
166 | Accelerating Neural Transformer via an Average Attention Network | Biao Zhang, Deyi Xiong, Jinsong Su | To alleviate this issue, we propose an average attention network as an alternative to the self-attention network in the decoder of the neural Transformer. |
167 | How Much Attention Do You Need? A Granular Analysis of Neural Machine Translation Architectures | Tobias Domhan | In this work we take a fine-grained look at the different architectures for NMT. |
168 | Weakly Supervised Semantic Parsing with Abstract Examples | Omer Goldman, Veronica Latcinnik, Ehud Nave, Amir Globerson, Jonathan Berant | In this work we propose that in closed worlds with clear semantic types, one can substantially alleviate these problems by utilizing an abstract representation, where tokens in both the language utterance and program are lifted to an abstract form. |
169 | Improving a Neural Semantic Parser by Counterfactual Learning from Human Bandit Feedback | Carolin Lawrence, Stefan Riezler | We show how to apply this learning framework to neural semantic parsing. |
170 | AMR dependency parsing with a typed semantic algebra | Jonas Groschwitz, Matthias Lindemann, Meaghan Fowlie, Mark Johnson, Alexander Koller | We present a semantic parser for Abstract Meaning Representations which learns to parse strings into tree representations of the compositional structure of an AMR graph. |
171 | Sequence-to-sequence Models for Cache Transition Systems | Xiaochang Peng, Linfeng Song, Daniel Gildea, Giorgio Satta | In this paper, we present a sequence-to-sequence based approach for mapping natural language sentences to AMR semantic graphs. |
172 | Batch IS NOT Heavy: Learning Word Representations From All Samples | Xin Xin, Fajie Yuan, Xiangnan He, Joemon M. Jose | In this work, we propose AllVec that uses batch gradient learning to generate word representations from all training samples. |
173 | Backpropagating through Structured Argmax using a SPIGOT | Hao Peng, Sam Thomson, Noah A. Smith | We introduce structured projection of intermediate gradients (SPIGOT), a new method for backpropagating through neural networks that include hard-decision structured predictions (e.g., parsing) in intermediate layers. |
174 | Learning How to Actively Learn: A Deep Imitation Learning Approach | Ming Liu, Wray Buntine, Gholamreza Haffari | We introduce a method that learns an AL “policy” using “imitation learning” (IL). |
175 | Training Classifiers with Natural Language Explanations | Braden Hancock, Paroma Varma, Stephanie Wang, Martin Bringmann, Percy Liang, Christopher Ré | In this work, we propose BabbleLabble, a framework for training classifiers in which an annotator provides a natural language explanation for each labeling decision. |
176 | Did the Model Understand the Question? | Pramod Kaushik Mudrakarta, Ankur Taly, Mukund Sundararajan, Kedar Dhamdhere | We analyze state-of-the-art deep learning models for three tasks: question answering on (1) images, (2) tables, and (3) passages of text. |
177 | Harvesting Paragraph-level Question-Answer Pairs from Wikipedia | Xinya Du, Claire Cardie | We propose a neural network approach that incorporates coreference knowledge via a novel gating mechanism. We apply our system (composed of an answer span extraction system and the passage-level QG system) to the 10,000 top ranking Wikipedia articles and create a corpus of over one million question-answer pairs. |
178 | Multi-Passage Machine Reading Comprehension with Cross-Passage Answer Verification | Yizhong Wang, Kai Liu, Jing Liu, Wei He, Yajuan Lyu, Hua Wu, Sujian Li, Haifeng Wang | To address this problem, we propose an end-to-end neural model that enables those answer candidates from different passages to verify each other based on their content representations. |
179 | Language Generation via DAG Transduction | Yajie Ye, Weiwei Sun, Xiaojun Wan | In this paper, we propose a novel DAG transducer to perform graph-to-program transformation. |
180 | A Distributional and Orthographic Aggregation Model for English Derivational Morphology | Daniel Deutsch, John Hewitt, Dan Roth | In this work, we tackle the task of derived word generation. |
181 | Deep-speare: A joint neural model of poetic language, meter and rhyme | Jey Han Lau, Trevor Cohn, Timothy Baldwin, Julian Brooke, Adam Hammond | In this paper, we propose a joint architecture that captures language, rhyme and meter for sonnet modelling. |
182 | NeuralREG: An end-to-end approach to referring expression generation | Thiago Castro Ferreira, Diego Moussallem, Ákos Kádár, Sander Wubben, Emiel Krahmer | In this paper, we present a new approach (NeuralREG), relying on deep neural networks, which makes decisions about form and content in one go without explicit feature extraction. |
183 | Stock Movement Prediction from Tweets and Historical Prices | Yumo Xu, Shay B. Cohen | We treat these three complexities and present a novel deep generative model jointly exploiting text and price signals for this task. |
184 | Rumor Detection on Twitter with Tree-structured Recursive Neural Networks | Jing Ma, Wei Gao, Kam-Fai Wong | In this work, we try to learn discriminative features from tweet content by following their non-sequential propagation structure and generate more powerful representations for identifying different types of rumors. |
185 | Visual Attention Model for Name Tagging in Multimodal Social Media | Di Lu, Leonardo Neves, Vitor Carvalho, Ning Zhang, Heng Ji | In this paper, we explore the task of name tagging in multimodal social media posts. |
186 | Multimodal Named Entity Disambiguation for Noisy Social Media Posts | Seungwhan Moon, Leonardo Neves, Vitor Carvalho | We introduce the new Multimodal Named Entity Disambiguation (MNED) task for multimodal social media posts such as Snapchat or Instagram captions, which are composed of short captions with accompanying images. To this end, we build a new dataset called SnapCaptionsKB, a collection of Snapchat image captions submitted to public and crowd-sourced stories, with named entity mentions fully annotated and linked to entities in an external knowledge base. |
187 | Semi-supervised User Geolocation via Graph Convolutional Networks | Afshin Rahimi, Trevor Cohn, Timothy Baldwin | In this paper, we propose GCN, a multiview geolocation model based on Graph Convolutional Networks, that uses both text and network context. |
188 | Document Modeling with External Attention for Sentence Extraction | Shashi Narayan, Ronald Cardenas, Nikos Papasarantopoulos, Shay B. Cohen, Mirella Lapata, Jiangsheng Yu, Yi Chang | We propose to use external information to improve document modeling for problems that can be framed as sentence extraction. |
189 | Neural Models for Documents with Metadata | Dallas Card, Chenhao Tan, Noah A. Smith | In this paper, we build on recent advances in variational inference methods and propose a general neural framework, based on topic models, to enable flexible incorporation of metadata and allow for rapid exploration of alternative models. |
190 | NASH: Toward End-to-End Neural Architecture for Generative Semantic Hashing | Dinghan Shen, Qinliang Su, Paidamoyo Chapfuwa, Wenlin Wang, Guoyin Wang, Ricardo Henao, Lawrence Carin | In this paper, we present an *end-to-end* Neural Architecture for Semantic Hashing (NASH), where the binary hashing codes are treated as *Bernoulli* latent variables. |
191 | Large-Scale QA-SRL Parsing | Nicholas FitzGerald, Julian Michael, Luheng He, Luke Zettlemoyer | We present a new large-scale corpus of Question-Answer driven Semantic Role Labeling (QA-SRL) annotations, and the first high-quality QA-SRL parser. |
192 | Syntax for Semantic Role Labeling, To Be, Or Not To Be | Shexia He, Zuchao Li, Hai Zhao, Hongxiao Bai | We propose an enhanced argument labeling model, accompanied by an extended k-order argument pruning algorithm, for effectively exploiting syntactic information. |
193 | Situated Mapping of Sequential Instructions to Actions with Single-step Reward Observation | Alane Suhr, Yoav Artzi | We propose a learning approach for mapping context-dependent sequential instructions to actions. |
194 | Marrying Up Regular Expressions with Neural Networks: A Case Study for Spoken Language Understanding | Bingfeng Luo, Yansong Feng, Zheng Wang, Songfang Huang, Rui Yan, Dongyan Zhao | In this paper, we ask the question: “Can we combine a neural network (NN) with regular expressions (RE) to improve supervised learning for NLP?” |
195 | Token-level and sequence-level loss smoothing for RNN language models | Maha Elbayad, Laurent Besacier, Jakob Verbeek | We extend this approach to token-level loss smoothing, and propose improvements to the sequence-level smoothing approach. |
196 | Numeracy for Language Models: Evaluating and Improving their Ability to Predict Numbers | Georgios Spithourakis, Sebastian Riedel | In this paper, we explore different strategies for modelling numerals with language models, such as memorisation and digit-by-digit composition, and propose a novel neural architecture that uses a continuous probability density function to model numerals from an open vocabulary. |
197 | To Attend or not to Attend: A Case Study on Syntactic Structures for Semantic Relatedness | Amulya Gupta, Zhu Zhang | Our models are evaluated on two semantic relatedness tasks: semantic relatedness scoring for sentence pairs (SemEval 2012, Task 6 and SemEval 2014, Task 1) and paraphrase detection for question pairs (Quora, 2017). |
198 | What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties | Alexis Conneau, German Kruszewski, Guillaume Lample, Loïc Barrault, Marco Baroni | We introduce here 10 probing tasks designed to capture simple linguistic features of sentences, and we use them to study embeddings generated by three different encoders trained in eight distinct ways, uncovering intriguing properties of both encoders and training methods. |
199 | Robust Distant Supervision Relation Extraction via Deep Reinforcement Learning | Pengda Qin, Weiran Xu, William Yang Wang | To do this, our paper describes a radical solution: we explore a deep reinforcement learning strategy to generate the false-positive indicator, where we automatically recognize false positives for each relation type without any supervised information. |
200 | Interpretable and Compositional Relation Learning by Joint Training with an Autoencoder | Ryo Takahashi, Ran Tian, Kentaro Inui | In this paper we investigate a dimension reduction technique by training relations jointly with an autoencoder, which is expected to better capture compositional constraints. |
201 | Zero-Shot Transfer Learning for Event Extraction | Lifu Huang, Heng Ji, Kyunghyun Cho, Ido Dagan, Sebastian Riedel, Clare Voss | Most previous supervised event extraction methods have relied on features derived from manual annotations, and thus cannot be applied to new event types without extra annotation effort. |
202 | Recursive Neural Structural Correspondence Network for Cross-domain Aspect and Opinion Co-Extraction | Wenya Wang, Sinno Jialin Pan | In this paper, we develop a novel recursive neural network that can effectively reduce domain shift at the word level through syntactic relations. |
203 | Deep Dyna-Q: Integrating Planning for Task-Completion Dialogue Policy Learning | Baolin Peng, Xiujun Li, Jianfeng Gao, Jingjing Liu, Kam-Fai Wong | To address these issues, we present Deep Dyna-Q, which to our knowledge is the first deep RL framework that integrates planning for task-completion dialogue policy learning. |
204 | Learning to Ask Questions in Open-domain Conversational Systems with Typed Decoders | Yansen Wang, Chenyi Liu, Minlie Huang, Liqiang Nie | We observe that a good question is a natural composition of interrogatives, topic words, and ordinary words. |
205 | Personalizing Dialogue Agents: I have a dog, do you have pets too? | Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela, Jason Weston | In this work we present the task of making chit-chat more engaging by conditioning on profile information. |
206 | Efficient Large-Scale Neural Domain Classification with Personalized Attention | Young-Bum Kim, Dongchan Kim, Anjishnu Kumar, Ruhi Sarikaya | In this paper, we explore the task of mapping spoken language utterances to one of thousands of natural language understanding domains in intelligent personal digital assistants (IPDAs). |
207 | Multimodal Affective Analysis Using Hierarchical Attention Strategy with Word-Level Alignment | Yue Gu, Kangning Yang, Shiyu Fu, Shuhong Chen, Xinyu Li, Ivan Marsic | Addressing such issues, we introduce a hierarchical multimodal architecture with attention and word-level fusion to classify utterance-level sentiment and emotion from text and audio data. |
208 | Multimodal Language Analysis in the Wild: CMU-MOSEI Dataset and Interpretable Dynamic Fusion Graph | AmirAli Bagher Zadeh, Paul Pu Liang, Soujanya Poria, Erik Cambria, Louis-Philippe Morency | In this paper we introduce CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI), the largest dataset of sentiment analysis and emotion recognition to date. |
209 | Efficient Low-rank Multimodal Fusion With Modality-Specific Factors | Zhun Liu, Ying Shen, Varun Bharadhwaj Lakshminarasimhan, Paul Pu Liang, AmirAli Bagher Zadeh, Louis-Philippe Morency | In this paper, we propose the Low-rank Multimodal Fusion method, which performs multimodal fusion using low-rank tensors to improve efficiency. |
210 | Discourse Coherence: Concurrent Explicit and Implicit Relations | Hannah Rohde, Alexander Johnson, Nathan Schneider, Bonnie Webber | Our prior work suggests that multiple discourse relations can be simultaneously operative between two segments for reasons not predicted by the literature. |
211 | A Spatial Model for Extracting and Visualizing Latent Discourse Structure in Text | Shashank Srivastava, Nebojsa Jojic | We present a generative probabilistic model of documents as sequences of sentences, and show that inference in it can lead to extraction of long-range latent discourse structure from a collection of documents. |
212 | Joint Reasoning for Temporal and Causal Relations | Qiang Ning, Zhili Feng, Hao Wu, Dan Roth | This paper presents a joint inference framework for them using constrained conditional models (CCMs). |
213 | Modeling Naive Psychology of Characters in Simple Commonsense Stories | Hannah Rashkin, Antoine Bosselut, Maarten Sap, Kevin Knight, Yejin Choi | To facilitate research addressing this challenge, we introduce a new annotation framework to explain naive psychology of story characters as fully-specified chains of mental states with respect to motivations and emotional reactions. Our work presents a new large-scale dataset with rich low-level annotations and establishes baseline performance on several new tasks, suggesting avenues for future research. |
214 | A Deep Relevance Model for Zero-Shot Document Filtering | Chenliang Li, Wei Zhou, Feng Ji, Yu Duan, Haiqing Chen | In this paper, we propose a novel deep relevance model for zero-shot document filtering, named DAZER. |
215 | Disconnected Recurrent Neural Networks for Text Categorization | Baoxin Wang | In this paper, we present a novel model named disconnected recurrent neural network (DRNN), which incorporates position-invariance into RNN. |
216 | Joint Embedding of Words and Labels for Text Classification | Guoyin Wang, Chunyuan Li, Wenlin Wang, Yizhe Zhang, Dinghan Shen, Xinyuan Zhang, Ricardo Henao, Lawrence Carin | We introduce an attention framework that measures the compatibility of embeddings between text sequences and labels. |
217 | Neural Sparse Topical Coding | Min Peng, Qianqian Xie, Yanchun Zhang, Hua Wang, Xiuzhen Zhang, Jimin Huang, Gang Tian | We propose Neural Sparse Topical Coding (NSTC), a novel neural model based on the sparsity-enhanced topic model Sparse Topical Coding (STC). |
218 | Document Similarity for Texts of Varying Lengths via Hidden Topics | Hongyu Gong, Tarek Sakakini, Suma Bhat, JinJun Xiong | In this paper, we present a document matching approach to bridge this gap, by comparing the texts in a common space of hidden topics. |
219 | Eyes are the Windows to the Soul: Predicting the Rating of Text Quality Using Gaze Behaviour | Sandeep Mathias, Diptesh Kanojia, Kevin Patel, Samarth Agrawal, Abhijit Mishra, Pushpak Bhattacharyya | In this paper, we show that gaze behaviour does indeed help in effectively predicting the rating of text quality. |
220 | Multi-Input Attention for Unsupervised OCR Correction | Rui Dong, David Smith | We propose a novel approach to OCR post-correction that exploits repeated texts in large corpora both as a source of noisy target outputs for unsupervised training and as a source of evidence when decoding. |
221 | Building Language Models for Text with Named Entities | Md Rizwan Parvez, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang | In this paper, we propose a novel and effective approach to building a language model which can learn the entity names by leveraging their entity type information. We also introduce two benchmark datasets based on recipes and Java programming codes, on which we evaluate the proposed model. |
222 | hyperdoc2vec: Distributed Representations of Hypertext Documents | Jialong Han, Yan Song, Wayne Xin Zhao, Shuming Shi, Haisong Zhang | In this paper, we propose a general embedding approach for hyper-documents, namely, hyperdoc2vec, along with four criteria characterizing necessary information that hyper-document embedding models should preserve. |
223 | Entity-Duet Neural Ranking: Understanding the Role of Knowledge Graph Semantics in Neural Information Retrieval | Zhenghao Liu, Chenyan Xiong, Maosong Sun, Zhiyuan Liu | This paper presents the Entity-Duet Neural Ranking Model (EDRM), which introduces knowledge graphs to neural search systems. |
224 | Neural Natural Language Inference Models Enhanced with External Knowledge | Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Diana Inkpen, Si Wei | In this paper, we enrich the state-of-the-art neural natural language inference models with external knowledge. |
225 | AdvEntuRe: Adversarial Training for Textual Entailment with Knowledge-Guided Examples | Dongyeop Kang, Tushar Khot, Ashish Sabharwal, Eduard Hovy | We consider the problem of learning textual entailment models with limited supervision (5K-10K training examples), and present two complementary approaches for it. |
226 | Subword-level Word Vector Representations for Korean | Sungjoon Park, Jeongmin Byun, Sion Baek, Yongseok Cho, Alice Oh | In this paper, we look at improving distributed word representations for Korean using knowledge about the unique linguistic structure of Korean. To evaluate the vectors, we develop Korean test sets for word similarity and analogy and make them publicly available. |
227 | Incorporating Chinese Characters of Words for Lexical Sememe Prediction | Huiming Jin, Hao Zhu, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Fen Lin, Leyu Lin | To address this issue for Chinese, we propose a novel framework to take advantage of both internal character information and external context information of words. |
228 | SemAxis: A Lightweight Framework to Characterize Domain-Specific Word Semantics Beyond Sentiment | Jisun An, Haewoon Kwak, Yong-Yeol Ahn | Here, we propose SemAxis, a simple yet powerful framework to characterize word semantics using many semantic axes in word-vector spaces beyond sentiment. |
229 | End-to-End Reinforcement Learning for Automatic Taxonomy Induction | Yuning Mao, Xiang Ren, Jiaming Shen, Xiaotao Gu, Jiawei Han | We present a novel end-to-end reinforcement learning approach to automatic taxonomy induction from a set of terms. |
230 | Incorporating Glosses into Neural Word Sense Disambiguation | Fuli Luo, Tianyu Liu, Qiaolin Xia, Baobao Chang, Zhifang Sui | In this paper, we integrate the context and glosses of the target word into a unified framework in order to make full use of both labeled data and lexical knowledge. |
231 | Bilingual Sentiment Embeddings: Joint Projection of Sentiment Across Languages | Jeremy Barnes, Roman Klinger, Sabine Schulte im Walde | We introduce Bilingual Sentiment Embeddings (BLSE), which jointly represent sentiment information in a source and target language. |
232 | Learning Domain-Sensitive and Sentiment-Aware Word Embeddings | Bei Shi, Zihao Fu, Lidong Bing, Wai Lam | We propose a new method for learning domain-sensitive and sentiment-aware embeddings that simultaneously capture the information of sentiment semantics and domain sensitivity of individual words. |
233 | Cross-Domain Sentiment Classification with Target Domain Specific Information | Minlong Peng, Qi Zhang, Yu-gang Jiang, Xuanjing Huang | In this work, we propose a method to simultaneously extract domain-specific and domain-invariant representations and train a classifier on each representation. We also introduce a small amount of labeled target-domain data for learning domain-specific information. |
234 | Aspect Based Sentiment Analysis with Gated Convolutional Networks | Wei Xue, Tao Li | We propose a model based on convolutional neural networks and gating mechanisms, which is more accurate and efficient. |
235 | A Helping Hand: Transfer Learning for Deep Sentiment Analysis | Xin Dong, Gerard de Melo | In this work, we present an approach to feed generic cues into the training process of such networks, leading to better generalization abilities given limited training data. |
236 | Cold-Start Aware User and Product Attention for Sentiment Classification | Reinald Kim Amplayo, Jihyeok Kim, Sua Sung, Seung-won Hwang | In this paper, we present Hybrid Contextualized Sentiment Classifier (HCSC), which contains two modules: (1) a fast word encoder that returns word vectors embedded with short and long range dependency features; and (2) Cold-Start Aware Attention (CSAA), an attention mechanism that considers the existence of cold-start problem when attentively pooling the encoded word vectors. |
237 | Modeling Deliberative Argumentation Strategies on Wikipedia | Khalid Al-Khatib, Henning Wachsmuth, Kevin Lang, Jakob Herpel, Matthias Hagen, Benno Stein | In this paper, we present a model for deliberative discussions and we illustrate its operationalization. On this basis, we automatically generate a corpus with about 200,000 turns, labeled for the 13 categories. |
238 | Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning | Piyush Sharma, Nan Ding, Sebastian Goodman, Radu Soricut | We present a new dataset of image caption annotations, Conceptual Captions, which contains an order of magnitude more images than the MS-COCO dataset (Lin et al., 2014) and represents a wider variety of both images and image caption styles. |
239 | Learning Translations via Images with a Massively Multilingual Image Dataset | John Hewitt, Daphne Ippolito, Brendan Callahan, Reno Kriz, Derry Tanti Wijaya, Chris Callison-Burch | To improve image-based translation, we introduce a novel method of predicting word concreteness from images, which improves on a previous state-of-the-art unsupervised technique. To facilitate research on the task, we introduce a large-scale multilingual corpus of images, each labeled with the word it represents. |
240 | On the Automatic Generation of Medical Imaging Reports | Baoyu Jing, Pengtao Xie, Eric Xing | To cope with these challenges, we (1) build a multi-task learning framework which jointly performs the prediction of tags and the generation of paragraphs, (2) propose a co-attention mechanism to localize regions containing abnormalities and generate narrations for them, (3) develop a hierarchical LSTM model to generate long paragraphs. |
241 | Attacking Visual Language Grounding with Adversarial Examples: A Case Study on Neural Image Captioning | Hongge Chen, Huan Zhang, Pin-Yu Chen, Jinfeng Yi, Cho-Jui Hsieh | To study the robustness of language grounding to adversarial perturbations in machine vision and perception, we propose Show-and-Fool, a novel algorithm for crafting adversarial examples in neural image captioning. |
242 | Think Visually: Question Answering through Virtual Imagery | Ankit Goyal, Jian Wang, Jia Deng | In this paper, we study the problem of geometric reasoning (a form of visual reasoning) in the context of question-answering. Further, we propose two synthetic benchmarks, FloorPlanQA and ShapeIntersection, to evaluate the geometric reasoning capability of QA systems. |
243 | Interactive Language Acquisition with One-shot Visual Concept Learning through a Conversational Game | Haichao Zhang, Haonan Yu, Wei Xu | We highlight the perspective that conversational interaction serves as a natural interface both for language learning and for novel knowledge acquisition and propose a joint imitation and reinforcement approach for grounded language learning through an interactive conversational game. |
244 | A Purely End-to-End System for Multi-speaker Speech Recognition | Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, John R. Hershey | In this paper, we propose a new sequence-to-sequence framework to directly decode multiple label sequences from a single speech sequence by unifying source separation and speech recognition functions in an end-to-end manner. |
245 | A Structured Variational Autoencoder for Contextual Morphological Inflection | Lawrence Wolf-Sonkin, Jason Naradowsky, Sebastian J. Mielke, Ryan Cotterell | To this end, we introduce a novel generative latent-variable model for the semi-supervised learning of inflection generation. |
246 | Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings | Bernd Bohnet, Ryan McDonald, Gonçalo Simões, Daniel Andor, Emily Pitler, Joshua Maynez | In this paper, we investigate models that use recurrent neural networks with sentence-level context for initial character and word-based representations. |
247 | Neural Factor Graph Models for Cross-lingual Morphological Tagging | Chaitanya Malaviya, Matthew R. Gormley, Graham Neubig | In this paper we propose a method for cross-lingual morphological tagging that aims to improve information sharing between languages by relaxing this assumption. |
248 | Global Transition-based Non-projective Dependency Parsing | Carlos Gómez-Rodríguez, Tianze Shi, Lillian Lee | In this paper, we extend their approach to support non-projectivity by providing the first practical implementation of the MH₄ algorithm, an $O(n^4)$ mildly nonprojective dynamic-programming parser with very high coverage on non-projective treebanks. |
249 | Constituency Parsing with a Self-Attentive Encoder | Nikita Kitaev, Dan Klein | We demonstrate that replacing an LSTM encoder with a self-attentive architecture can lead to improvements to a state-of-the-art discriminative constituency parser. |
250 | Pre- and In-Parsing Models for Neural Empty Category Detection | Yufei Chen, Yuanyuan Zhao, Weiwei Sun, Xiaojun Wan | Motivated by the positive impact of empty category on syntactic parsing, we study neural models for pre- and in-parsing detection of empty category, which has not previously been investigated. |
251 | Composing Finite State Transducers on GPUs | Arturo Argueta, David Chiang | We show that our approach obtains speedups of up to 6 times over our serial implementation and 4.5 times over OpenFST. |
252 | Supervised Treebank Conversion: Data and Approaches | Xinzhou Jiang, Zhenghua Li, Bo Zhang, Min Zhang, Sheng Li, Luo Si | In this work, we propose, for the first time, the task of supervised treebank conversion. First, we manually construct a bi-tree aligned dataset containing over ten thousand sentences. |
253 | Object-oriented Neural Programming (OONP) for Document Understanding | Zhengdong Lu, Xianggen Liu, Haotian Cui, Yukun Yan, Daqi Zheng | We propose Object-oriented Neural Programming (OONP), a framework for semantically parsing documents in specific domains. |
254 | Finding syntax in human encephalography with beam search | John Hale, Chris Dyer, Adhiguna Kuncoro, Jonathan Brennan | Finding syntax in human encephalography with beam search |
255 | Learning to Ask Good Questions: Ranking Clarification Questions using Neural Expected Value of Perfect Information | Sudha Rao, Hal Daumé III | In this work, we build a neural network model for the task of ranking clarification questions. We create a dataset of clarification questions consisting of 77K posts paired with a clarification question (and answer) from three domains of StackExchange: askubuntu, unix and superuser. |
256 | Let’s do it “again”: A First Computational Approach to Detecting Adverbial Presupposition Triggers | Andre Cianflone, Yulan Feng, Jad Kabbara, Jackie Chi Kit Cheung | We introduce the novel task of predicting adverbial presupposition triggers, which is useful for natural language generation tasks such as summarization and dialogue systems. We introduce two new corpora, derived from the Penn Treebank and the Annotated English Gigaword dataset and investigate the use of a novel attention mechanism tailored to this task. |
TABLE 2: ACL 2018 Short Papers
Title | Authors | Highlight | |
---|---|---|---|
1 | Continuous Learning in a Hierarchical Multiscale Neural Network | Thomas Wolf, Julien Chaumond, Clement Delangue | We propose a hierarchical multi-scale language model in which short time-scale dependencies are encoded in the hidden state of a lower-level recurrent neural network while longer time-scale dependencies are encoded in the dynamic of the lower-level network by having a meta-learner update the weights of the lower-level neural network in an online meta-learning fashion. |
2 | Restricted Recurrent Neural Tensor Networks: Exploiting Word Frequency and Compositionality | Alexandre Salle, Aline Villavicencio | In this paper, we introduce restricted recurrent neural tensor networks (r-RNTN) which reserve distinct hidden layer weights for frequent vocabulary words while sharing a single set of weights for infrequent words. |
3 | Deep RNNs Encode Soft Hierarchical Syntax | Terra Blevins, Omer Levy, Luke Zettlemoyer | We present a set of experiments to demonstrate that deep recurrent neural networks (RNNs) learn internal representations that capture soft hierarchical notions of syntax from highly varied supervision. |
4 | Word Error Rate Estimation for Speech Recognition: e-WER | Ahmed Ali, Steve Renals | In this paper, we propose a novel approach to estimate WER, or e-WER, which does not require a gold-standard transcription of the test set. |
5 | Towards Robust and Privacy-preserving Text Representations | Yitong Li, Timothy Baldwin, Trevor Cohn | In this paper, we propose an approach to explicitly obscure important author characteristics at training time, such that representations learned are invariant to these attributes. |
6 | HotFlip: White-Box Adversarial Examples for Text Classification | Javid Ebrahimi, Anyi Rao, Daniel Lowd, Dejing Dou | We propose an efficient method to generate white-box adversarial examples to trick a character-level neural classifier. |
7 | Domain Adapted Word Embeddings for Improved Sentiment Classification | Prathusha K Sarma, Yingyu Liang, Bill Sethares | This paper proposes a method to combine the breadth of generic embeddings with the specificity of domain specific embeddings. |
8 | Active learning for deep semantic parsing | Long Duong, Hadi Afshar, Dominique Estival, Glen Pink, Philip Cohen, Mark Johnson | We propose several active learning strategies for overnight data collection and show that different example selection strategies per domain perform best. |
9 | Learning Thematic Similarity Metric from Article Sections Using Triplet Networks | Liat Ein Dor, Yosi Mass, Alon Halfon, Elad Venezian, Ilya Shnayderman, Ranit Aharonov, Noam Slonim | In this paper we suggest to leverage the partition of articles into sections, in order to learn thematic similarity metric between sentences. To test the performance of the learned embeddings, we create and release a sentence clustering benchmark. |
10 | Unsupervised Semantic Frame Induction using Triclustering | Dmitry Ustalov, Alexander Panchenko, Andrey Kutuzov, Chris Biemann, Simone Paolo Ponzetto | We cast the frame induction problem as a triclustering problem that is a generalization of clustering for triadic data. |
11 | Identification of Alias Links among Participants in Narratives | Sangameshwar Patil, Sachin Pawar, Swapnil Hingmire, Girish Palshikar, Vasudeva Varma, Pushpak Bhattacharyya | In this paper, we propose an approach based on linguistic knowledge for identification of aliases mentioned using proper nouns, pronouns or noun phrases with common noun headword. |
12 | Named Entity Recognition With Parallel Recurrent Neural Networks | Andrej Žukov-Gregorič, Yoram Bachrach, Sam Coope | We present a new architecture for named entity recognition. |
13 | Type-Sensitive Knowledge Base Inference Without Explicit Type Supervision | Prachi Jain, Pankaj Kumar, Mausam, Soumen Chakrabarti | State-of-the-art knowledge base completion (KBC) models predict a score for every known or unknown fact via a latent factorization over entity and relation embeddings. |
14 | A Walk-based Model on Entity Graphs for Relation Extraction | Fenia Christopoulou, Makoto Miwa, Sophia Ananiadou | We present a novel graph-based neural network model for relation extraction. |
15 | Ranking-Based Automatic Seed Selection and Noise Reduction for Weakly Supervised Relation Extraction | Van-Thuy Phi, Joan Santoso, Masashi Shimbo, Yuji Matsumoto | This paper addresses the tasks of automatic seed selection for bootstrapping relation extraction, and noise reduction for distantly supervised relation extraction. |
16 | Automatic Extraction of Commonsense LocatedNear Knowledge | Frank F. Xu, Bill Yuchen Lin, Kenny Zhu | In this paper, we study how to automatically extract such relationships through a sentence-level relation classifier and by aggregating the scores of entity pairs from a large corpus. We also release two benchmark datasets for evaluation and future research. |
17 | Neural Coreference Resolution with Deep Biaffine Attention by Joint Mention Detection and Mention Clustering | Rui Zhang, Cícero Nogueira dos Santos, Michihiro Yasunaga, Bing Xiang, Dragomir Radev | In this paper, we propose to improve the end-to-end coreference resolution system by (1) using a biaffine attention model to get antecedent scores for each possible mention, and (2) jointly optimizing the mention detection accuracy and mention clustering accuracy given the mention cluster labels. |
18 | Fully Statistical Neural Belief Tracking | Nikola Mrkšić, Ivan Vulić | This paper proposes an improvement to the existing data-driven Neural Belief Tracking (NBT) framework for Dialogue State Tracking (DST). |
19 | Some of Them Can be Guessed! Exploring the Effect of Linguistic Context in Predicting Quantifiers | Sandro Pezzelle, Shane Steinert-Threlkeld, Raffaella Bernardi, Jakub Szymanik | We study the role of linguistic context in predicting quantifiers ('few', 'all'). We collect crowdsourced data from human participants and test various models in a local (single-sentence) and a global context (multi-sentence) condition. |
20 | A Named Entity Recognition Shootout for German | Martin Riedl, Sebastian Padó | We ask how to practically build a model for German named entity recognition (NER) that performs at the state of the art for both contemporary and historical texts, i.e., a big-data and a small-data scenario. |
21 | A dataset for identifying actionable feedback in collaborative software development | Benjamin S. Meyers, Nuthan Munaiah, Emily Prud’hommeaux, Andrew Meneely, Josephine Wolff, Cecilia Ovesdotter Alm, Pradeep Murukannaiah | To understand the factors that contribute to this outcome, we analyze a novel dataset of more than one million code reviews for the Google Chromium project, from which we extract linguistic features of feedback that elicited responsive actions from coworkers. |
22 | SNAG: Spoken Narratives and Gaze Dataset | Preethi Vaidyanathan, Emily T. Prud’hommeaux, Jeff B. Pelz, Cecilia O. Alm | In this paper, we describe a new multimodal dataset that consists of gaze measurements and spoken descriptions collected in parallel during an image inspection task. |
23 | Analogical Reasoning on Chinese Morphological and Semantic Relations | Shen Li, Zhe Zhao, Renfen Hu, Wensi Li, Tao Liu, Xiaoyong Du | This paper proposes an analogical reasoning task on Chinese. |
24 | Construction of a Chinese Corpus for the Analysis of the Emotionality of Metaphorical Expressions | Dongyu Zhang, Hongfei Lin, Liang Yang, Shaowu Zhang, Bo Xu | We present an annotation scheme that contains annotations of linguistic metaphors, emotional categories (joy, anger, sadness, fear, love, disgust and surprise), and intensity. We therefore construct a significant new corpus on metaphor, with 5,605 manually annotated sentences in Chinese. |
25 | Automatic Article Commenting: the Task and Dataset | Lianhui Qin, Lemao Liu, Wei Bi, Yan Wang, Xiaojiang Liu, Zhiting Hu, Hai Zhao, Shuming Shi | This paper proposes the new task of automatic article commenting, and introduces a large-scale Chinese dataset with millions of real comments and a human-annotated subset characterizing the comments’ varying quality. |
26 | Improved Evaluation Framework for Complex Plagiarism Detection | Anton Belyy, Marina Dubova, Dmitry Nekrasov | In this paper, we study the performance of plagdet, the main measure for plagiarism detection, on manually paraphrased datasets (such as PAN Summary). |
27 | Global Encoding for Abstractive Summarization | Junyang Lin, Xu Sun, Shuming Ma, Qi Su | To tackle the problem, we propose a global encoding framework, which controls the information flow from the encoder to the decoder based on the global information of the source context. |
28 | A Language Model based Evaluator for Sentence Compression | Yang Zhao, Zhiyuan Luo, Akiko Aizawa | We herein present a language-model-based evaluator for deletion-based sentence compression and view this task as a series of deletion-and-evaluation operations using the evaluator. |
29 | Identifying and Understanding User Reactions to Deceptive and Trusted Social News Sources | Maria Glenski, Tim Weninger, Svitlana Volkova | In the present work we seek to better understand how users react to trusted and deceptive news sources across two popular, and very different, social media platforms. |
30 | Content-based Popularity Prediction of Online Petitions Using a Deep Regression Model | Shivashankar Subramanian, Timothy Baldwin, Trevor Cohn | In this work, we model this task using CNN regression with an auxiliary ordinal regression objective. |
31 | Fighting Offensive Language on Social Media with Unsupervised Text Style Transfer | Cicero Nogueira dos Santos, Igor Melnyk, Inkit Padhi | We introduce a new approach to tackle the problem of offensive language in online social media. |
32 | Diachronic degradation of language models: Insights from social media | Kokil Jaidka, Niyati Chhaya, Lyle Ungar | This study investigates the diachronic accuracy of pre-trained language models for downstream tasks in machine learning and user profiling. |
33 | Task-oriented Dialogue System for Automatic Diagnosis | Zhongyu Wei, Qianlong Liu, Baolin Peng, Huaixiao Tou, Ting Chen, Xuanjing Huang, Kam-fai Wong, Xiangying Dai | In this paper, we make a move to build a dialogue system for automatic diagnosis. We first build a dataset collected from an online medical forum by extracting symptoms from both patients’ self-reports and conversational data between patients and doctors. |
34 | Transfer Learning for Context-Aware Question Matching in Information-seeking Conversations in E-commerce | Minghui Qiu, Liu Yang, Feng Ji, Wei Zhou, Jun Huang, Haiqing Chen, Bruce Croft, Wei Lin | To alleviate these problems, we study transfer learning for multi-turn information seeking conversations in this paper. |
35 | A Multi-task Approach to Learning Multilingual Representations | Karan Singla, Dogan Can, Shrikanth Narayanan | We present a novel multi-task modeling approach to learning multilingual distributed representations of text. |
36 | Characterizing Departures from Linearity in Word Translation | Ndapa Nakashole, Raphael Flauger | We investigate the behavior of maps learned by machine translation methods. |
37 | Filtering and Mining Parallel Data in a Joint Multilingual Space | Holger Schwenk | We learn a joint multilingual sentence embedding and use the distance between sentences in different languages to filter noisy parallel data and to mine for parallel data in large news collections. |
38 | Hybrid semi-Markov CRF for Neural Sequence Labeling | Zhixiu Ye, Zhen-Hua Ling | In this paper, we improve the existing SCRF methods by employing word-level and segment-level information simultaneously. |
39 | A Study of the Importance of External Knowledge in the Named Entity Recognition Task | Dominic Seyler, Tatiana Dembelova, Luciano Del Corro, Johannes Hoffart, Gerhard Weikum | In this work, we discuss the importance of external knowledge for performing Named Entity Recognition (NER). |
40 | Improving Topic Quality by Promoting Named Entities in Topic Modeling | Katsiaryna Krasnashchok, Salim Jouili | In this paper we use named entities as domain-specific terms for news-centric content and present a new weighting model for Latent Dirichlet Allocation. |
41 | Obligation and Prohibition Extraction Using Hierarchical RNNs | Ilias Chalkidis, Ion Androutsopoulos, Achilleas Michos | We consider the task of detecting contractual obligations and prohibitions. |
42 | Paper Abstract Writing through Editing Mechanism | Qingyun Wang, Zhihao Zhou, Lifu Huang, Spencer Whitehead, Boliang Zhang, Heng Ji, Kevin Knight | We present a paper abstract writing system based on an attentive neural sequence-to-sequence model that can take a title as input and automatically generate an abstract. |
43 | Conditional Generators of Words Definitions | Artyom Gadetsky, Ilya Yakubovskiy, Dmitry Vetrov | In this work, we study the problem of word ambiguities in definition modeling and propose a possible solution by employing latent variable modeling and soft attention mechanisms. |
44 | CNN for Text-Based Multiple Choice Question Answering | Akshay Chaturvedi, Onkar Pandit, Utpal Garain | In this paper, we propose a Convolutional Neural Network (CNN) model for text-based multiple choice question answering where questions are based on a particular article. |
45 | Narrative Modeling with Memory Chains and Semantic Supervision | Fei Liu, Trevor Cohn, Timothy Baldwin | Inspired by previous studies on ROC Story Cloze Test, we propose a novel method, tracking various semantic aspects with external neural memory chains while encouraging each to focus on a particular semantic aspect. |
46 | Injecting Relational Structural Representation in Neural Networks for Question Similarity | Antonio Uva, Daniele Bonadiman, Alessandro Moschitti | In this paper, we propose to inject structural representations in NNs by (i) learning a model with Tree Kernels (TKs) on relatively few pairs of questions (few thousands) as gold standard (GS) training data is typically scarce, (ii) predicting labels on a very large corpus of question pairs, and (iii) pre-training NNs on such large corpus. |
47 | A Simple and Effective Approach to Coverage-Aware Neural Machine Translation | Yanyang Li, Tong Xiao, Yinqiao Li, Qiang Wang, Changming Xu, Jingbo Zhu | We offer a simple and effective method to seek a better balance between model confidence and length preference for Neural Machine Translation (NMT). |
48 | Dynamic Sentence Sampling for Efficient Training of Neural Machine Translation | Rui Wang, Masao Utiyama, Eiichiro Sumita | Here, we propose an efficient method to dynamically sample the sentences in order to accelerate the NMT training. |
49 | Compositional Representation of Morphologically-Rich Input for Neural Machine Translation | Duygu Ataman, Marcello Federico | As a solution, various studies proposed segmenting words into sub-word units and performing translation at the sub-lexical level. |
50 | Extreme Adaptation for Personalized Neural Machine Translation | Paul Michel, Graham Neubig | In this paper, we propose a simple and parameter-efficient adaptation technique that only requires adapting the bias of the output softmax to each particular user of the MT system, either directly or through a factored approximation. |
51 | Multi-representation ensembles and delayed SGD updates improve syntax-based NMT | Danielle Saunders, Felix Stahlberg, Adrià de Gispert, Bill Byrne | We formulate beam search over such ensembles using WFSTs, and describe a delayed SGD update training procedure that is especially effective for long representations like linearized syntax. |
52 | Learning from Chunk-based Feedback in Neural Machine Translation | Pavel Petrushkov, Shahram Khadivi, Evgeny Matusov | We propose a simple and effective way of utilizing such feedback in NMT training. |
53 | Bag-of-Words as Target for Neural Machine Translation | Shuming Ma, Xu Sun, Yizhong Wang, Junyang Lin | In this paper, we propose an approach that uses both the sentences and the bag-of-words as targets in the training stage, in order to encourage the model to generate the potentially correct sentences that do not appear in the training set. |
54 | Improving Beam Search by Removing Monotonic Constraint for Neural Machine Translation | Raphael Shu, Hideki Nakayama | Despite its simplicity, we show that the proposed decoding algorithm enhances the quality of selected hypotheses and improves the translations even for high-performance models on an English-Japanese translation task. |
55 | Leveraging distributed representations and lexico-syntactic fixedness for token-level prediction of the idiomaticity of English verb-noun combinations | Milton King, Paul Cook | In this paper we propose and evaluate models for classifying VNC usages as idiomatic or literal, based on a variety of approaches to forming distributed representations. |
56 | Using pseudo-senses for improving the extraction of synonyms from word embeddings | Olivier Ferret | In this article, we propose Pseudofit, a new method for specializing word embeddings according to semantic similarity without any external knowledge. |
57 | Hearst Patterns Revisited: Automatic Hypernym Detection from Large Text Corpora | Stephen Roller, Douwe Kiela, Maximilian Nickel | In this paper, we study the performance of both approaches on several hypernymy tasks and find that simple pattern-based methods consistently outperform distributional methods on common benchmark datasets. |
58 | Jointly Predicting Predicates and Arguments in Neural Semantic Role Labeling | Luheng He, Kenton Lee, Omer Levy, Luke Zettlemoyer | We propose an end-to-end approach for jointly predicting all predicates, argument spans, and the relations between them. |
59 | Sparse and Constrained Attention for Neural Machine Translation | Chaitanya Malaviya, Pedro Ferreira, André F. T. Martins | We experiment with various sparse and constrained attention transformations and propose a new one, constrained sparsemax, shown to be differentiable and sparse. |
60 | Neural Hidden Markov Model for Machine Translation | Weiyue Wang, Derui Zhu, Tamer Alkhouli, Zixuan Gan, Hermann Ney | We study a neural hidden Markov model (HMM) consisting of neural network-based alignment and lexicon models, which are trained jointly using the forward-backward algorithm. |
61 | Bleaching Text: Abstract Features for Cross-lingual Gender Prediction | Rob van der Goot, Nikola Ljubešić, Ian Matroos, Malvina Nissim, Barbara Plank | We propose an alternative: bleaching text, i.e., transforming lexical strings into more abstract features. |
62 | Orthographic Features for Bilingual Lexicon Induction | Parker Riley, Daniel Gildea | This work extends embedding-based methods to incorporate these features, resulting in significant accuracy gains for related languages. |
63 | Neural Cross-Lingual Coreference Resolution And Its Application To Entity Linking | Gourab Kundu, Avi Sil, Radu Florian, Wael Hamza | We propose an entity-centric neural crosslingual coreference model that builds on multi-lingual embeddings and language independent features. |
64 | Judicious Selection of Training Data in Assisting Language for Multilingual Neural NER | Rudra Murthy, Anoop Kunchukuttan, Pushpak Bhattacharyya | To alleviate this problem, we propose a metric based on symmetric KL divergence to filter out the highly divergent training instances in the assisting language. |
65 | Neural Open Information Extraction | Lei Cui, Furu Wei, Ming Zhou | In this paper, we propose a neural Open IE approach with an encoder-decoder framework. |
66 | Document Embedding Enhanced Event Detection with Hierarchical and Supervised Attention | Yue Zhao, Xiaolong Jin, Yuanzhuo Wang, Xueqi Cheng | In this paper, we propose a novel Document Embedding Enhanced Bi-RNN model, called DEEB-RNN, to detect events in sentences. |
67 | Learning Matching Models with Weak Supervision for Response Selection in Retrieval-based Chatbots | Yu Wu, Wei Wu, Zhoujun Li, Ming Zhou | We propose a method that can leverage unlabeled data to learn a matching model for response selection in retrieval-based chatbots. |
68 | Improving Slot Filling in Spoken Language Understanding with Joint Pointer and Attention | Lin Zhao, Zhe Feng | We present a generative neural network model for slot filling based on a sequence-to-sequence (Seq2Seq) model together with a pointer network, in the situation where only sentence-level slot annotations are available in the spoken dialogue data. |
69 | Large-Scale Multi-Domain Belief Tracking with Knowledge Sharing | Osman Ramadan, Paweł Budzianowski, Milica Gašić | In this paper, a novel approach is introduced that fully utilizes semantic similarity between dialogue utterances and the ontology terms, allowing the information to be shared across domains. |
70 | Modeling discourse cohesion for discourse parsing via memory network | Yanyan Jia, Yuan Ye, Yansong Feng, Yuxuan Lai, Rui Yan, Dongyan Zhao | In this paper, we propose a new transition-based discourse parser that makes use of memory networks to take discourse cohesion into account. |
71 | SciDTB: Discourse Dependency TreeBank for Scientific Abstracts | An Yang, Sujian Li | In this paper, we present SciDTB, a domain-specific discourse treebank annotated on scientific articles. |
72 | Predicting accuracy on large datasets from smaller pilot data | Mark Johnson, Peter Anderson, Mark Dras, Mark Steedman | We introduce a new performance extrapolation task to evaluate how well different extrapolations predict accuracy on larger training sets. |
73 | The Influence of Context on Sentence Acceptability Judgements | Jean-Philippe Bernardy, Shalom Lappin, Jey Han Lau | We investigate the influence that document context exerts on human acceptability judgements for English sentences, via two sets of experiments. |
74 | Do Neural Network Cross-Modal Mappings Really Bridge Modalities? | Guillem Collell, Marie-Francine Moens | Here, we propose a new similarity measure and two ad hoc experiments to shed light on this issue. |
75 | Policy Gradient as a Proxy for Dynamic Oracles in Constituency Parsing | Daniel Fried, Dan Klein | We explore using a policy gradient method as a parser-agnostic alternative. |
76 | Linear-time Constituency Parsing with RNNs and Dynamic Programming | Juneki Hong, Liang Huang | We propose a linear-time constituency parser with RNNs and dynamic programming using graph-structured stack and beam search, which runs in time $O(n b^2)$ where $b$ is the beam size. |
77 | Simpler but More Accurate Semantic Dependency Parsing | Timothy Dozat, Christopher D. Manning | We extend the LSTM-based syntactic parser of Dozat and Manning (2017) to train on and generate these graph structures. |
78 | Simplified Abugidas | Chenchen Ding, Masao Utiyama, Eiichiro Sumita | An abugida is a writing system where the consonant letters represent syllables with a default vowel and other vowels are denoted by diacritics. |
79 | Automatic Academic Paper Rating Based on Modularized Hierarchical Convolutional Neural Network | Pengcheng Yang, Xu Sun, Wei Li, Shuming Ma | In this paper, in order to assist professionals in evaluating academic papers, we propose a novel task: automatic academic paper rating (AAPR), which automatically determines whether to accept academic papers. We build a new dataset for this task and propose a novel modularized hierarchical convolutional neural network to achieve automatic academic paper rating. |
80 | Automated essay scoring with string kernels and word embeddings | Mădălina Cozma, Andrei Butnaru, Radu Tudor Ionescu | In this work, we present an approach based on combining string kernels and word embeddings for automatic essay scoring. |
81 | Party Matters: Enhancing Legislative Embeddings with Author Attributes for Vote Prediction | Anastassia Kornilova, Daniel Argyle, Vladimir Eidelman | In this paper, we show that text alone is insufficient for modeling voting outcomes in new contexts, as session changes lead to changes in the underlying data generation process. |
82 | Dynamic and Static Topic Model for Analyzing Time-Series Document Collections | Rem Hida, Naoya Takeishi, Takehisa Yairi, Koichi Hori | To this end, we propose a dynamic and static topic model, which simultaneously considers the dynamic structures of the temporal topic evolution and the static structures of the topic hierarchy at each time. |
83 | PhraseCTM: Correlated Topic Modeling on Phrases within Markov Random Fields | Weijing Huang | We propose a novel topic model PhraseCTM and a two-stage method to find out the correlated topics at phrase level. |
84 | A Document Descriptor using Covariance of Word Vectors | Marwan Torki | In this paper, we address the problem of finding a novel document descriptor based on the covariance matrix of the word vectors of a document. |
85 | Learning with Structured Representations for Negation Scope Extraction | Hao Li, Wei Lu | We design approaches based on conditional random fields (CRF), semi-Markov CRF, as well as latent-variable CRF models to capture such information. |
86 | End-Task Oriented Textual Entailment via Deep Explorations of Inter-Sentence Interactions | Wenpeng Yin, Dan Roth, Hinrich Schütze | We propose DEISTE (deep explorations of inter-sentence interactions for textual entailment) for this entailment task. |
87 | Sense-Aware Neural Models for Pun Location in Texts | Yitao Cai, Yin Li, Xiaojun Wan | In this paper, we focus on the task of pun location, which aims to identify the pun word in a given short text. |
88 | A Rank-Based Similarity Metric for Word Embeddings | Enrico Santus, Hongmin Wang, Emmanuele Chersoni, Yue Zhang | In this paper, we report experiments with a rank-based metric for WE, which performs comparably to vector cosine in similarity estimation and outperforms it in the recently-introduced and challenging task of outlier detection, thus suggesting that rank-based measures can improve clustering quality. |
89 | Addressing Noise in Multidialectal Word Embeddings | Alexander Erdmann, Nasser Zalmout, Nizar Habash | We make three contributions to address this noise. |
90 | GNEG: Graph-Based Negative Sampling for word2vec | Zheng Zhang, Pierre Zweigenbaum | To this end, we pre-compute word co-occurrence statistics from the corpus and apply network algorithms, such as random walks, to them. |
91 | Unsupervised Learning of Style-sensitive Word Vectors | Reina Akama, Kento Watanabe, Sho Yokoi, Sosuke Kobayashi, Kentaro Inui | This paper presents the first study aimed at capturing stylistic similarity between words in an unsupervised manner. |
92 | Exploiting Document Knowledge for Aspect-level Sentiment Classification | Ruidan He, Wee Sun Lee, Hwee Tou Ng, Daniel Dahlmeier | In this paper, we explore two approaches that transfer knowledge from document-level data, which is much less expensive to obtain, to improve the performance of aspect-level sentiment classification. |
93 | Modeling Sentiment Association in Discourse for Humor Recognition | Lizhen Liu, Donghai Zhang, Wei Song | This paper proposes to model sentiment association between discourse units to indicate how the punchline breaks the expectation of the setup. |
94 | Double Embeddings and CNN-based Sequence Labeling for Aspect Extraction | Hu Xu, Bing Liu, Lei Shu, Philip S. Yu | Unlike other highly sophisticated supervised deep learning models, this paper proposes a novel and yet simple CNN model employing two types of pre-trained embeddings for aspect extraction: general-purpose embeddings and domain-specific embeddings. |
95 | Will it Blend? Blending Weak and Strong Labeled Data in a Neural Network for Argumentation Mining | Eyal Shnarch, Carlos Alzate, Lena Dankin, Martin Gleize, Yufang Hou, Leshem Choshen, Ranit Aharonov, Noam Slonim | We propose a methodology to blend high quality but scarce strong labeled data with noisy but abundant weak labeled data during the training of neural networks. In addition, we provide a manually annotated data set for the task of topic-dependent evidence detection. |
96 | Investigating Audio, Video, and Text Fusion Methods for End-to-End Automatic Personality Prediction | Onno Kampman, Elham J. Barezi, Dario Bertero, Pascale Fung | We propose a tri-modal architecture to predict Big Five personality trait scores from video clips with different channels for audio, text, and video data. |
97 | An Empirical Study of Building a Strong Baseline for Constituency Parsing | Jun Suzuki, Sho Takase, Hidetaka Kamigaito, Makoto Morishita, Masaaki Nagata | We incorporate several techniques that were mainly developed in natural language generation tasks, e.g., machine translation and summarization, and demonstrate that the sequence-to-sequence model achieves the current top-notch parsers’ performance (almost) without requiring any explicit task-specific knowledge or architecture of constituent parsing. |
98 | Parser Training with Heterogeneous Treebanks | Sara Stymne, Miryam de Lhoneux, Aaron Smith, Joakim Nivre | We go on to propose a new method based on treebank embeddings. |
99 | Generalized chart constraints for efficient PCFG and TAG parsing | Stefan Grünewald, Sophie Henning, Alexander Koller | We generalize chart constraints to more expressive grammar formalisms and describe a neural tagger which predicts chart constraints at very high precision. |
100 | Exploring Semantic Properties of Sentence Embeddings | Xunjie Zhu, Tingfeng Li, Gerard de Melo | In this paper, we assess to what extent prominent sentence embedding methods exhibit select semantic properties. |
101 | Scoring Lexical Entailment with a Supervised Directional Similarity Network | Marek Rei, Daniela Gerz, Ivan Vulić | We present the Supervised Directional Similarity Network, a novel neural architecture for learning task-specific transformation functions on top of general-purpose word embeddings. |
102 | Extracting Commonsense Properties from Embeddings with Limited Human Guidance | Yiben Yang, Larry Birnbaum, Ji-Ping Wang, Doug Downey | We propose and assess methods for extracting one type of commonsense knowledge, object-property comparisons, from pre-trained embeddings. |
103 | Breaking NLI Systems with Sentences that Require Simple Lexical Inferences | Max Glockner, Vered Shwartz, Yoav Goldberg | We create a new NLI test set that shows the deficiency of state-of-the-art models in inferences that require lexical and world knowledge. |
104 | Adaptive Knowledge Sharing in Multi-Task Learning: Improving Low-Resource Neural Machine Translation | Poorya Zaremoodi, Wray Buntine, Gholamreza Haffari | We address this issue by extending the recurrent units with multiple “blocks” along with a trainable “routing network”. |
105 | Automatic Estimation of Simultaneous Interpreter Performance | Craig Stewart, Nikolai Vogler, Junjie Hu, Jordan Boyd-Graber, Graham Neubig | We propose the task of predicting simultaneous interpreter performance by building on existing methodology for quality estimation (QE) of machine translation output. |
106 | Polyglot Semantic Role Labeling | Phoebe Mulcaire, Swabha Swayamdipta, Noah A. Smith | We experiment with a new approach where we combine resources from different languages in the CoNLL 2009 shared task to build a single polyglot semantic dependency parser. |
107 | Learning Cross-lingual Distributed Logical Representations for Semantic Parsing | Yanyan Zou, Wei Lu | In this work, we present a study to show how learning distributed representations of the logical forms from data annotated in different languages can be used for improving the performance of a monolingual semantic parser. |
108 | Enhancing Drug-Drug Interaction Extraction from Texts by Molecular Structure Information | Masaki Asada, Makoto Miwa, Yutaka Sasaki | We propose a novel neural method to extract drug-drug interactions (DDIs) from texts using external drug molecular structure information. |
109 | diaNED: Time-Aware Named Entity Disambiguation for Diachronic Corpora | Prabal Agarwal, Jannik Strötgen, Luciano del Corro, Johannes Hoffart, Gerhard Weikum | This paper presents the first time-aware method for NED that resolves ambiguities even when mention contexts give only few cues. |
110 | Examining Temporality in Document Classification | Xiaolei Huang, Michael J. Paul | This study investigates how document classifiers trained on documents from certain time intervals perform on documents from other time intervals, considering both seasonal intervals (intervals that repeat across years, e.g., winter) and non-seasonal intervals (e.g., specific years). |
111 | Personalized Language Model for Query Auto-Completion | Aaron Jaech, Mari Ostendorf | We show how an adaptable language model can be used to generate personalized completions and how the model can use online updating to make predictions for users not seen during training. |
112 | Personalized Review Generation By Expanding Phrases and Attending on Aspect-Aware Representations | Jianmo Ni, Julian McAuley | In this paper, we focus on the problem of building assistive systems that can help users to write reviews. |
113 | Learning Simplifications for Specific Target Audiences | Carolina Scarton, Lucia Specia | We explore these two features of TS to build models tailored for specific grade levels. |
114 | Split and Rephrase: Better Evaluation and Stronger Baselines | Roee Aharoni, Yoav Goldberg | To aid this, we present a new train-development-test data split and neural models augmented with a copy-mechanism, outperforming the best reported baseline by 8.68 BLEU and fostering further progress on the task. |
115 | Autoencoder as Assistant Supervisor: Improving Text Representation for Chinese Social Media Text Summarization | Shuming Ma, Xu Sun, Junyang Lin, Houfeng Wang | In this work, we supervise the learning of the representation of the source content with that of the summary. |
116 | Long Short-Term Memory as a Dynamically Computed Element-wise Weighted Sum | Omer Levy, Kenton Lee, Nicholas FitzGerald, Luke Zettlemoyer | We present an alternative view to explain the success of LSTMs: the gates themselves are versatile recurrent models that provide more representational power than previously appreciated. |
117 | On the Practical Computational Power of Finite Precision RNNs for Language Recognition | Gail Weiss, Yoav Goldberg, Eran Yahav | We consider the case of RNNs with finite precision whose computation time is linear in the input length. |
118 | A Co-Matching Model for Multi-choice Reading Comprehension | Shuohang Wang, Mo Yu, Jing Jiang, Shiyu Chang | This paper proposes a new co-matching approach to this problem, which jointly models whether a passage can match both a question and a candidate answer. |
119 | Tackling the Story Ending Biases in The Story Cloze Test | Rishi Sharma, James Allen, Omid Bakhshandeh, Nasrin Mostafazadeh | In order to shed some light on this issue, we have performed various data analysis and analyzed a variety of top performing models presented for this task. |
120 | A Multi-sentiment-resource Enhanced Attention Network for Sentiment Classification | Zeyang Lei, Yujiu Yang, Min Yang, Yi Liu | In this paper, we propose a Multi-sentiment-resource Enhanced Attention Network (MEAN) to alleviate the problem by integrating three kinds of sentiment linguistic knowledge (e.g., sentiment lexicon, negation words, intensity words) into the deep neural network via attention mechanisms. |
121 | Pretraining Sentiment Classifiers with Unlabeled Dialog Data | Toru Shimizu, Nobuyuki Shimizu, Hayato Kobayashi | In this paper, we take the concept a step further by using a conditional language model, instead of a language model. |
122 | Disambiguating False-Alarm Hashtag Usages in Tweets for Irony Detection | Hen-Hsen Huang, Chiao-Chen Chen, Hsin-Hsi Chen | We analyze the ambiguity of hashtag usages and propose a novel neural network-based model, which incorporates linguistic information from different aspects, to disambiguate the usage of three hashtags that are widely used to collect the training data for irony detection. |
123 | Cross-Target Stance Classification with Self-Attention Networks | Chang Xu, Cécile Paris, Surya Nepal, Ross Sparks | In this work, we explore the potential for generalizing classifiers between different targets, and propose a neural model that can apply what has been learned from a source target to a destination target. |
124 | Know What You Don’t Know: Unanswerable Questions for SQuAD | Pranav Rajpurkar, Robin Jia, Percy Liang | To address these weaknesses, we present SQuADRUn, a new dataset that combines the existing Stanford Question Answering Dataset (SQuAD) with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. |
125 | ‘Lighter’ Can Still Be Dark: Modeling Comparative Color Descriptions | Olivia Winn, Smaranda Muresan | We propose a novel paradigm of grounding comparative adjectives within the realm of color descriptions. |