Paper Digest: NAACL 2019 Highlights
Download NAACL-2019-Industry-Paper-Digests.pdf – highlights of all 28 NAACL 2019 industry track papers (PDF file size is ~0.2 MB).
The North American Chapter of the Association for Computational Linguistics (NAACL) is one of the top natural language processing conferences in the world. In 2019, it was held in Minneapolis, MN. There were ~2,000 paper submissions, of which 424 were accepted. In addition, 28 industry papers were also accepted.
To help the AI community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights to quickly get the main idea of each paper.
We thank all authors for writing these interesting papers, and readers for reading our digests. If you do not want to miss any interesting AI paper, you are welcome to sign up for our free paper digest service to get new paper updates customized to your own interests on a daily basis.
Paper Digest Team
team@paperdigest.org
TABLE 1: NAACL 2019 Papers
No. | Title | Authors | Highlight |
---|---|---|---|
1 | Entity Recognition at First Sight: Improving NER with Eye Movement Information | Nora Hollenstein, Ce Zhang, | In this work, we leverage eye movement features from three corpora with recorded gaze information to augment a state-of-the-art neural model for named entity recognition (NER) with gaze embeddings. |
2 | The emergence of number and syntax units in LSTM language models | Yair Lakretz, Germán Kruszewski, Théo Desbordes, Dieuwke Hupkes, Stanislas Dehaene, Marco Baroni, | We present here a detailed study of the inner mechanics of number tracking in LSTMs at the single neuron level. |
3 | Neural Self-Training through Spaced Repetition | Hadi Amiri, | In this work, we tackle the above challenges by introducing a new data sampling technique based on spaced repetition that dynamically samples informative and diverse unlabeled instances with respect to individual learner and instance characteristics. |
4 | Neural language models as psycholinguistic subjects: Representations of syntactic state | Richard Futrell, Ethan Wilcox, Takashi Morita, Peng Qian, Miguel Ballesteros, Roger Levy, | We investigate the extent to which the behavior of neural network language models reflects incremental representations of syntactic state. |
5 | Understanding language-elicited EEG data by predicting it from a fine-tuned language model | Dan Schwartz, Tom Mitchell, | We take a step towards better understanding the ERPs by finetuning a language model to predict them. |
6 | Pre-training on high-resource speech recognition improves low-resource speech-to-text translation | Sameer Bansal, Herman Kamper, Karen Livescu, Adam Lopez, Sharon Goldwater, | We present a simple approach to improve direct speech-to-text translation (ST) when the source language is low-resource: we pre-train the model on a high-resource automatic speech recognition (ASR) task, and then fine-tune its parameters for ST. We demonstrate that our approach is effective by pre-training on 300 hours of English ASR data to improve Spanish-English ST from 10.8 to 20.2 BLEU when only 20 hours of Spanish-English ST training data are available. |
7 | Measuring the perceptual availability of phonological features during language acquisition using unsupervised binary stochastic autoencoders | Cory Shain, Micha Elsner, | In this paper, we deploy binary stochastic neural autoencoder networks as models of infant language learning in two typologically unrelated languages (Xitsonga and English). |
8 | Giving Attention to the Unexpected: Using Prosody Innovations in Disfluency Detection | Vicky Zayats, Mari Ostendorf, | This paper introduces a new approach to extracting acoustic-prosodic cues using text-based distributional prediction of acoustic cues to derive vector z-score features (innovations). |
9 | Massively Multilingual Adversarial Speech Recognition | Oliver Adams, Matthew Wiesner, Shinji Watanabe, David Yarowsky, | We report on adaptation of multilingual end-to-end speech recognition models trained on as many as 100 languages. |
10 | Lost in Interpretation: Predicting Untranslated Terminology in Simultaneous Interpretation | Nikolai Vogler, Craig Stewart, Graham Neubig, | In this paper, we propose a task of predicting which terminology simultaneous interpreters will leave untranslated, and examine methods that perform this task using supervised sequence taggers. |
11 | AudioCaps: Generating Captions for Audios in The Wild | Chris Dongjoo Kim, Byeongchang Kim, Hyunmin Lee, Gunhee Kim, | We explore the problem of Audio Captioning: generating natural language description for any kind of audio in the wild, which has been surprisingly unexplored in previous research. |
12 | “President Vows to Cut <Taxes> Hair”: Dataset and Analysis of Creative Text Editing for Humorous Headlines | Nabil Hossain, John Krumm, Michael Gamon, | We introduce, release, and analyze a new dataset, called Humicroedit, for research in computational humor. |
13 | Answer-based Adversarial Training for Generating Clarification Questions | Sudha Rao, Hal Daumé III, | We present an approach for generating clarification questions with the goal of eliciting new information that would make the given textual context more complete. |
14 | Improving Grammatical Error Correction via Pre-Training a Copy-Augmented Architecture with Unlabeled Data | Wei Zhao, Liang Wang, Kewei Shen, Ruoyu Jia, Jingming Liu, | In this paper, we propose a copy-augmented architecture for the GEC task by copying the unchanged words from the source sentence to the target sentence. |
15 | Topic-Guided Variational Auto-Encoder for Text Generation | Wenlin Wang, Zhe Gan, Hongteng Xu, Ruiyi Zhang, Guoyin Wang, Dinghan Shen, Changyou Chen, Lawrence Carin, | We propose a topic-guided variational auto-encoder (TGVAE) model for text generation. |
16 | Implementation of a Chomsky-Schützenberger n-best parser for weighted multiple context-free grammars | Thomas Ruprecht, Tobias Denkinger, | We provide the first implementation of Chomsky-Schützenberger parsing. |
17 | Phylogenic Multi-Lingual Dependency Parsing | Mathieu Dehouck, Pascal Denis, | In this paper, drawing inspiration from multi-task learning, we make use of the phylogenetic tree to guide the learning of multi-lingual dependency parsers leveraging languages structural similarities. |
18 | Discontinuous Constituency Parsing with a Stack-Free Transition System and a Dynamic Oracle | Maximin Coavoux, Shay B. Cohen, | We introduce a novel transition system for discontinuous constituency parsing. |
19 | How Bad are PoS Tagger in Cross-Corpora Settings? Evaluating Annotation Divergence in the UD Project. | Guillaume Wisniewski, François Yvon, | How Bad are PoS Tagger in Cross-Corpora Settings? Evaluating Annotation Divergence in the UD Project. |
20 | CCG Parsing Algorithm with Incremental Tree Rotation | Miloš Stanojević, Mark Steedman, | We propose a new incremental parsing algorithm for CCG following the same revealing tradition of work but having a purely syntactic approach that does not depend on access to a distinct level of semantic representation. |
21 | Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing | Hao Fu, Chunyuan Li, Xiaodong Liu, Jianfeng Gao, Asli Celikyilmaz, Lawrence Carin, | In this paper we study different scheduling schemes for $\beta$, and show that KL vanishing is caused by the lack of good latent codes when training the decoder at the beginning of optimization. |
22 | Recurrent models and lower bounds for projective syntactic decoding | Natalie Schluter, | We show how recurrent models can carry out projective maximum spanning tree decoding. |
23 | Evaluating Composition Models for Verb Phrase Elliptical Sentence Embeddings | Gijs Wijnholds, Mehrnoosh Sadrzadeh, | In this paper, we develop different models for embedding VP-elliptical sentences. |
24 | Neural Finite-State Transducers: Beyond Rational Relations | Chu-Cheng Lin, Hao Zhu, Matthew R. Gormley, Jason Eisner, | We present training and inference algorithms for locally and globally normalized variants of NFSTs. |
25 | Riemannian Normalizing Flow on Variational Wasserstein Autoencoder for Text Modeling | Prince Zizhuang Wang, William Yang Wang, | To address this problem, we introduce an improved Variational Wasserstein Autoencoder (WAE) with Riemannian Normalizing Flow (RNF) for text modeling. |
26 | A Study of Incorrect Paraphrases in Crowdsourced User Utterances | Mohammad-Ali Yaghoub-Zadeh-Fard, Boualem Benatallah, Moshe Chai Barukh, Shayan Zamanirad, | In this paper, we investigate common crowdsourced paraphrasing issues, and propose an annotated dataset called Para-Quality, for detecting the quality issues. |
27 | ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters | Abdalghani Abujabal, Rishiraj Saha Roy, Mohamed Yahya, Gerhard Weikum, | We introduce ComQA, a large dataset of real user questions that exhibit different challenging aspects such as compositionality, temporal reasoning, and comparisons. |
28 | FreebaseQA: A New Factoid QA Data Set Matching Trivia-Style Question-Answer Pairs with Freebase | Kelvin Jiang, Dekun Wu, Hui Jiang, | In this paper, we present a new data set, named FreebaseQA, for open-domain factoid question answering (QA) tasks over structured knowledge bases, like Freebase. |
29 | Simple Question Answering with Subgraph Ranking and Joint-Scoring | Wenbo Zhao, Tagyoung Chung, Anuj Goyal, Angeliki Metallinou, | Motivated by this, we present a unified framework to describe and analyze existing approaches. |
30 | Learning to Attend On Essential Terms: An Enhanced Retriever-Reader Model for Open-domain Question Answering | Jianmo Ni, Chenguang Zhu, Weizhu Chen, Julian McAuley, | In this paper we propose a retriever-reader model that learns to attend on essential terms during the question answering process. |
31 | UHop: An Unrestricted-Hop Relation Extraction Framework for Knowledge-Based Question Answering | Zi-Yuan Chen, Chih-Hung Chang, Yi-Pei Chen, Jijnasa Nayak, Lun-Wei Ku, | In this paper, we propose UHop, an unrestricted-hop framework which relaxes this restriction by use of a transition-based search framework to replace the relation-chain-based search one. |
32 | BAG: Bi-directional Attention Entity Graph Convolutional Network for Multi-hop Reasoning Question Answering | Yu Cao, Meng Fang, Dacheng Tao, | We propose a Bi-directional Attention Entity Graph Convolutional Network (BAG), leveraging relationships between nodes in an entity graph and attention information between a query and the entity graph, to solve this task. |
33 | Vector of Locally-Aggregated Word Embeddings (VLAWE): A Novel Document-level Representation | Radu Tudor Ionescu, Andrei Butnaru, | In this paper, we propose a novel representation for text documents based on aggregating word embedding vectors into document embeddings. |
34 | Multi-task Learning for Multi-modal Emotion Recognition and Sentiment Analysis | Md Shad Akhtar, Dushyant Chauhan, Deepanway Ghosal, Soujanya Poria, Asif Ekbal, Pushpak Bhattacharyya, | In this paper, we present a deep multi-task learning framework that jointly performs both sentiment and emotion analysis. |
35 | Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence | Chi Sun, Luyao Huang, Xipeng Qiu, | In this paper, we construct an auxiliary sentence from the aspect and convert ABSA to a sentence-pair classification task, such as question answering (QA) and natural language inference (NLI). |
36 | A Variational Approach to Weakly Supervised Document-Level Multi-Aspect Sentiment Classification | Ziqian Zeng, Wenxuan Zhou, Xin Liu, Yangqiu Song, | In this paper, we propose a variational approach to weakly supervised document-level multi-aspect sentiment classification. |
37 | HiGRU: Hierarchical Gated Recurrent Units for Utterance-Level Emotion Recognition | Wenxiang Jiao, Haiqin Yang, Irwin King, Michael R. Lyu, | In this paper, we address three challenges in utterance-level emotion recognition in dialogue systems: (1) the same word can deliver different emotions in different contexts; (2) some emotions are rarely seen in general dialogues; (3) long-range contextual information is hard to be effectively captured. |
38 | Learning Interpretable Negation Rules via Weak Supervision at Document Level: A Reinforcement Learning Approach | Nicolas Pröllochs, Stefan Feuerriegel, Dirk Neumann, | To the best of our knowledge, our work presents the first approach that eliminates the need for word-level negation labels, replacing them instead with document-level sentiment annotations. |
39 | Simplified Neural Unsupervised Domain Adaptation | Timothy Miller, | In this work, we show that it is possible to improve on existing neural domain adaptation algorithms by 1) jointly training the representation learner with the task learner; and 2) removing the need for heuristically-selected “pivot features.” |
40 | Learning Bilingual Sentiment-Specific Word Embeddings without Cross-lingual Supervision | Yanlin Feng, Xiaojun Wan, | In this work, we propose UBiSE (Unsupervised Bilingual Sentiment Embeddings), which learns sentiment-specific word representations for two languages in a common space without any cross-lingual supervision. |
41 | ReWE: Regressing Word Embeddings for Regularization of Neural Machine Translation Systems | Inigo Jauregi Unanue, Ehsan Zare Borzeshi, Nazanin Esmaili, Massimo Piccardi, | To mollify this problem, we propose regressing word embeddings (ReWE) as a new regularization technique in a system that is jointly trained to predict the next word in the translation (categorical value) and its word embedding (continuous value). |
42 | Lost in Machine Translation: A Method to Reduce Meaning Loss | Reuben Cohn-Gordon, Noah Goodman, | Building on Bayesian models of informative utterance production, we present a method to define a less ambiguous translation system in terms of an underlying pre-trained neural sequence-to-sequence model. |
43 | Bi-Directional Differentiable Input Reconstruction for Low-Resource Neural Machine Translation | Xing Niu, Weijia Xu, Marine Carpuat, | We aim to better exploit the limited amounts of parallel text available in low-resource settings by introducing a differentiable reconstruction loss for neural machine translation (NMT). |
44 | Code-Switching for Enhancing NMT with Pre-Specified Translation | Kai Song, Yue Zhang, Heng Yu, Weihua Luo, Kun Wang, Min Zhang, | We investigate a data augmentation method, making code-switched training data by replacing source phrases with their target translations. |
45 | Aligning Vector-spaces with Noisy Supervised Lexicon | Noa Yehezkel Lubin, Jacob Goldberger, Yoav Goldberg, | We propose a model that accounts for noisy pairs. |
46 | Understanding and Improving Hidden Representations for Neural Machine Translation | Guanlin Li, Lemao Liu, Xintong Li, Conghui Zhu, Tiejun Zhao, Shuming Shi, | Towards understanding for performance improvement, we first artificially construct a sequence of nested relative tasks and measure the feature generalization ability of the learned hidden representation over these tasks. |
47 | Content Differences in Syntactic and Semantic Representation | Daniel Hershcovich, Omri Abend, Ari Rappoport, | We target this gap, and take Universal Dependencies (UD) and UCCA as a test case. |
48 | Attentive Mimicking: Better Word Embeddings by Attending to Informative Contexts | Timo Schick, Hinrich Schütze, | In this paper, we introduce attentive mimicking: the mimicking model is given access not only to a word’s surface form, but also to all available contexts and learns to attend to the most informative and reliable contexts for computing an embedding. |
49 | Evaluating Style Transfer for Text | Remi Mir, Bjarke Felbo, Nick Obradovich, Iyad Rahwan, | We propose a set of metrics for automated evaluation and demonstrate that they are more strongly correlated and in agreement with human judgment: direction-corrected Earth Mover’s Distance, Word Mover’s Distance on style-masked texts, and adversarial classification for the respective aspects. |
50 | Big BiRD: A Large, Fine-Grained, Bigram Relatedness Dataset for Examining Semantic Composition | Shima Asaadi, Saif Mohammad, Svetlana Kiritchenko, | In this paper, we describe how we created a large, fine-grained, bigram relatedness dataset (BiRD), using a comparative annotation technique called Best-Worst Scaling. |
51 | Outlier Detection for Improved Data Quality and Diversity in Dialog Systems | Stefan Larson, Anish Mahendran, Andrew Lee, Jonathan K. Kummerfeld, Parker Hill, Michael A. Laurenzano, Johann Hauswald, Lingjia Tang, Jason Mars, | We introduce a simple and effective technique for detecting both erroneous and unique samples in a corpus of short texts using neural sentence embeddings combined with distance-based outlier detection. |
52 | Asking the Right Question: Inferring Advice-Seeking Intentions from Personal Narratives | Liye Fu, Jonathan P. Chang, Cristian Danescu-Niculescu-Mizil, | To test the capabilities of NLP systems to recover such intuition, we introduce the new task of inferring what is the advice-seeking goal behind a personal narrative. |
53 | Seeing Things from a Different Angle: Discovering Diverse Perspectives about Claims | Sihao Chen, Daniel Khashabi, Wenpeng Yin, Chris Callison-Burch, Dan Roth, | Inherently, this is a natural language understanding task, and we propose to address it as such. |
54 | IMHO Fine-Tuning Improves Claim Detection | Tuhin Chakrabarty, Christopher Hidey, Kathy McKeown, | We propose to alleviate this problem by fine-tuning a language model using a Reddit corpus of 5.5 million opinionated claims. |
55 | Joint Multiple Intent Detection and Slot Labeling for Goal-Oriented Dialog | Rashmi Gangadharaiah, Balakrishnan Narayanaswamy, | We investigate an attention-based neural network model that performs multi-label classification for identifying multiple intents and produces labels for both intents and slot-labels at the token-level. |
56 | CITE: A Corpus of Image-Text Discourse Relations | Malihe Alikhani, Sreyasi Nag Chowdhury, Gerard de Melo, Matthew Stone, | This paper presents a novel crowd-sourced resource for multimodal discourse: our resource characterizes inferences in image-text contexts in the domain of cooking recipes in the form of coherence relations. |
57 | Improving Dialogue State Tracking by Discerning the Relevant Context | Sanuj Sharma, Prafulla Kumar Choubey, Ruihong Huang, | We, therefore, propose a novel framework for DST that identifies relevant historical context by referring to the past utterances where a particular slot-value changes and uses that together with weighted system utterance to identify the relevant context. |
58 | CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog | Satwik Kottur, José M. F. Moura, Devi Parikh, Dhruv Batra, Marcus Rohrbach, | We develop CLEVR-Dialog, a large diagnostic dataset for studying multi-round reasoning in visual dialog. |
59 | Learning Outside the Box: Discourse-level Features Improve Metaphor Identification | Jesse Mu, Helen Yannakoudakis, Ekaterina Shutova, | Inspired by pragmatic accounts of metaphor, we argue that broader discourse features are crucial for better metaphor identification. |
60 | Detection of Abusive Language: the Problem of Biased Datasets | Michael Wiegand, Josef Ruppenhofer, Thomas Kleinbauer, | We discuss the impact of data bias on abusive language detection. |
61 | Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them | Hila Gonen, Yoav Goldberg, | Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them. |
62 | Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings | Thomas Manzini, Lim Yao Chong, Alan W. Black, Yulia Tsvetkov, | In this work, we propose a method to debias word embeddings in multiclass settings such as race and religion, extending the work of (Bolukbasi et al., 2016) from the binary setting, such as binary gender. |
63 | On Measuring Social Biases in Sentence Encoders | Chandler May, Alex Wang, Shikha Bordia, Samuel R. Bowman, Rachel Rudinger, | Accordingly, we extend the Word Embedding Association Test to measure bias in sentence encoders. |
64 | Gender Bias in Contextualized Word Embeddings | Jieyu Zhao, Tianlu Wang, Mark Yatskar, Ryan Cotterell, Vicente Ordonez, Kai-Wei Chang, | In this paper, we quantify, analyze and mitigate gender bias exhibited in ELMo’s contextualized word vectors. |
65 | Combining Sentiment Lexica with a Multi-View Variational Autoencoder | Alexander Miserlis Hoyle, Lawrence Wolf-Sonkin, Hanna Wallach, Ryan Cotterell, Isabelle Augenstein, | We introduce a generative model of sentiment lexica to combine disparate scales into a common latent representation. |
66 | Enhancing Opinion Role Labeling with Semantic-Aware Word Representations from Semantic Role Labeling | Meishan Zhang, Peili Liang, Guohong Fu, | In this work, we propose a simple and novel method to enhance ORL by utilizing SRL, presenting semantic-aware word representations which are learned from SRL. |
67 | Frowning Frodo, Wincing Leia, and a Seriously Great Friendship: Learning to Classify Emotional Relationships of Fictional Characters | Evgeny Kim, Roman Klinger, | In this paper, we combine these aspects into a unified framework to classify emotional relationships of fictional characters. |
68 | Generalizing Unmasking for Short Texts | Janek Bevendorff, Benno Stein, Matthias Hagen, Martin Potthast, | In this paper, we present a generalized unmasking approach which allows for authorship verification of texts as short as four printed pages with very high precision at an adjustable recall tradeoff. |
69 | Adversarial Training for Satire Detection: Controlling for Confounding Variables | Robert McHardy, Heike Adel, Roman Klinger, | We therefore propose a novel model for satire detection with an adversarial component to control for the confounding variable of publication source. |
70 | Keyphrase Generation: A Text Summarization Struggle | Erion Çano, Ondřej Bojar, | In this paper, we explore the possibility of considering the keyphrase string as an abstractive summary of the title and the abstract. First, we collect, process and release a large dataset of scientific paper metadata that contains 2.2 million records. |
71 | SEQ^3: Differentiable Sequence-to-Sequence-to-Sequence Autoencoder for Unsupervised Abstractive Sentence Compression | Christos Baziotis, Ion Androutsopoulos, Ioannis Konstas, Alexandros Potamianos, | We present a sequence-to-sequence-to-sequence autoencoder (SEQ^3), consisting of two chained encoder-decoder pairs, with words used as a sequence of discrete latent variables. |
72 | Crowdsourcing Lightweight Pyramids for Manual Summary Evaluation | Ori Shapira, David Gabay, Yang Gao, Hadar Ronen, Ramakanth Pasunuru, Mohit Bansal, Yael Amsterdamer, Ido Dagan, | We revisit the Pyramid approach, proposing a lightweight sampling-based version that is crowdsourcable. |
73 | Serial Recall Effects in Neural Language Modeling | Hassan Hajipoor, Hadi Amiri, Maseud Rahgozar, Farhad Oroumchian, | In this research, we investigate neural language models in the context of these serial recall effects. |
74 | Fast Concept Mention Grouping for Concept Map-based Multi-Document Summarization | Tobias Falke, Iryna Gurevych, | In this paper, we propose two alternative grouping techniques based on locality sensitive hashing, approximate nearest neighbor search and a fast clustering algorithm. |
75 | Syntax-aware Neural Semantic Role Labeling with Supertags | Jungo Kasai, Dan Friedman, Robert Frank, Dragomir Radev, Owen Rambow, | We introduce a new syntax-aware model for dependency-based semantic role labeling that outperforms syntax-agnostic models for English and Spanish. |
76 | Left-to-Right Dependency Parsing with Pointer Networks | Daniel Fernández-González, Carlos Gómez-Rodríguez, | We propose a novel transition-based algorithm that straightforwardly parses sentences from left to right by building n attachments, with n being the length of the input sentence. |
77 | Viable Dependency Parsing as Sequence Labeling | Michalina Strzyz, David Vilares, Carlos Gómez-Rodríguez, | We show instead that with a conventional BILSTM-based model it is possible to obtain fast and accurate parsers. |
78 | Pooled Contextualized Embeddings for Named Entity Recognition | Alan Akbik, Tanja Bergmann, Roland Vollgraf, | To address this drawback, we propose a method in which we dynamically aggregate contextualized embeddings of each unique string that we encounter. |
79 | Better Modeling of Incomplete Annotations for Named Entity Recognition | Zhanming Jie, Pengjun Xie, Wei Lu, Ruixue Ding, Linlin Li, | We highlight several pitfalls associated with learning under such a setup in the context of NER and identify limitations associated with existing approaches, proposing a novel yet easy-to-implement approach for recognizing named entities with incomplete data annotations. |
80 | Event Detection without Triggers | Shulin Liu, Yang Li, Feng Zhang, Tao Yang, Xinpeng Zhou, | In this work, we propose a novel framework dubbed as Type-aware Bias Neural Network with Attention Mechanisms (TBNNAM), which encodes the representation of a sentence based on target event types. |
81 | Sub-event detection from twitter streams as a sequence labeling problem | Giannis Bekoulis, Johannes Deleu, Thomas Demeester, Chris Develder, | This paper introduces improved methods for sub-event detection in social media streams, by applying neural sequence models not only on the level of individual posts, but also directly on the stream level. |
82 | GraphIE: A Graph-Based Framework for Information Extraction | Yujie Qian, Enrico Santus, Zhijing Jin, Jiang Guo, Regina Barzilay, | In this paper, we introduce GraphIE, a framework that operates over a graph representing a broad set of dependencies between textual units (i.e. words or sentences). |
83 | OpenKI: Integrating Open Information Extraction and Knowledge Bases with Relation Inference | Dongxu Zhang, Subhabrata Mukherjee, Colin Lockard, Luna Dong, Andrew McCallum, | In this paper, we consider advancing web-scale knowledge extraction and alignment by integrating OpenIE extractions in the form of (subject, predicate, object) triples with Knowledge Bases (KB). |
84 | Imposing Label-Relational Inductive Bias for Extremely Fine-Grained Entity Typing | Wenhan Xiong, Jiawei Wu, Deren Lei, Mo Yu, Shiyu Chang, Xiaoxiao Guo, William Yang Wang, | To model the underlying label correlations without access to manually annotated label structures, we introduce a novel label-relational inductive bias, represented by a graph propagation layer that effectively encodes both global label co-occurrence statistics and word-level similarities. |
85 | Improving Event Coreference Resolution by Learning Argument Compatibility from Unlabeled Data | Yin Jou Huang, Jing Lu, Sadao Kurohashi, Vincent Ng, | In this work, we propose a transfer learning framework for event coreference resolution that utilizes a large amount of unlabeled data to learn argument compatibility of event mentions. |
86 | Sentence Embedding Alignment for Lifelong Relation Extraction | Hong Wang, Wenhan Xiong, Mo Yu, Xiaoxiao Guo, Shiyu Chang, William Yang Wang, | Specifically, we utilize an explicit alignment model to mitigate the sentence embedding distortion of learned model when training on new data and new relations. |
87 | Description-Based Zero-shot Fine-Grained Entity Typing | Rasha Obeidat, Xiaoli Fern, Hamed Shahbazi, Prasad Tadepalli, | This work proposes a zero-shot entity typing approach that utilizes the type description available from Wikipedia to build a distributed semantic representation of the types. |
88 | Adversarial Decomposition of Text Representation | Alexey Romanov, Anna Rumshisky, Anna Rogers, David Donahue, | In this paper, we present a method for adversarial decomposition of text representation. |
89 | PoMo: Generating Entity-Specific Post-Modifiers in Context | Jun Seok Kang, Robert Logan, Zewei Chu, Yang Chen, Dheeru Dua, Kevin Gimpel, Sameer Singh, Niranjan Balasubramanian, | We introduce entity post-modifier generation as an instance of a collaborative writing task. |
90 | Improved Lexically Constrained Decoding for Translation and Monolingual Rewriting | J. Edward Hu, Huda Khayrallah, Ryan Culkin, Patrick Xia, Tongfei Chen, Matt Post, Benjamin Van Durme, | We describe vectorized dynamic beam allocation, which extends work in lexically-constrained decoding to work with batching, leading to a five-fold improvement in throughput when working with positive constraints. |
91 | Courteously Yours: Inducing courteous behavior in Customer Care responses using Reinforced Pointer Generator Network | Hitesh Golchha, Mauajama Firdaus, Asif Ekbal, Pushpak Bhattacharyya, | In this paper, we propose an effective deep learning framework for inducing courteous behavior in customer care responses. |
92 | How to Avoid Sentences Spelling Boring? Towards a Neural Approach to Unsupervised Metaphor Generation | Zhiwei Yu, Xiaojun Wan, | In order to create novel metaphors, we propose a neural approach to metaphor generation and explore the shared inferential structure of a metaphorical usage and a literal usage of a verb. |
93 | Incorporating Context and External Knowledge for Pronoun Coreference Resolution | Hongming Zhang, Yan Song, Yangqiu Song, | In this paper, we propose a two-layer model for pronoun coreference resolution that leverages both context and external knowledge, where a knowledge attention mechanism is designed to ensure the model leveraging the appropriate source of external knowledge based on different context. |
94 | Unsupervised Deep Structured Semantic Models for Commonsense Reasoning | Shuohang Wang, Sheng Zhang, Yelong Shen, Xiaodong Liu, Jingjing Liu, Jianfeng Gao, Jing Jiang, | We propose two neural network models based on the Deep Structured Semantic Models (DSSM) framework to tackle two classic commonsense reasoning tasks, Winograd Schema challenges (WSC) and Pronoun Disambiguation (PDP). |
95 | Recovering dropped pronouns in Chinese conversations via modeling their referents | Jingxuan Yang, Jianzhuo Tong, Si Li, Sheng Gao, Jun Guo, Nianwen Xue, | In this work, we present a novel end-to-end neural network model to recover dropped pronouns in conversational data. |
96 | The problem with probabilistic DAG automata for semantic graphs | Ieva Vasiljeva, Sorcha Gilroy, Adam Lopez, | We show that some DAG automata cannot be made into useful probabilistic models by the nearly universal strategy of assigning weights to transitions. |
97 | A Systematic Study of Leveraging Subword Information for Learning Word Representations | Yi Zhu, Ivan Vulić, Anna Korhonen, | In this work, we deliver such a study focusing on the variation of two crucial components required for subword-level integration into word representation models: 1) segmentation of words into subword units, and 2) subword composition functions to obtain final word representations. |
98 | Better Word Embeddings by Disentangling Contextual n-Gram Information | Prakhar Gupta, Matteo Pagliardini, Martin Jaggi, | In this paper, we show how training word embeddings jointly with bigram and even trigram embeddings, results in improved unigram embeddings. |
99 | Integration of Knowledge Graph Embedding Into Topic Modeling with Hierarchical Dirichlet Process | Dingcheng Li, Siamak Zamani, Jingyuan Zhang, Ping Li, | In this paper, we develop *topic modeling with knowledge graph embedding* (TMKGE), a Bayesian nonparametric model to employ knowledge graph (KG) embedding in the context of topic modeling, for extracting more coherent topics. |
100 | Correlation Coefficients and Semantic Textual Similarity | Vitalii Zhelezniak, Aleksandar Savkov, April Shen, Nils Hammerla, | In this work, we illustrate that for all common word vectors, cosine similarity is essentially equivalent to the Pearson correlation coefficient, which provides some justification for its use. |
101 | Generating Token-Level Explanations for Natural Language Inference | James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Arpit Mittal, | In this paper, we show that it is possible to generate token-level explanations for NLI without the need for training data explicitly annotated for this purpose. |
102 | Strong Baselines for Complex Word Identification across Multiple Languages | Pierre Finnimore, Elisabeth Fritzsch, Daniel King, Alison Sneyd, Aneeq Ur Rehman, Fernando Alva-Manchego, Andreas Vlachos, | In this paper, we present monolingual and cross-lingual CWI models that perform as well as (or better than) most models submitted to the latest CWI Shared Task. |
103 | Adaptive Convolution for Multi-Relational Learning | Xiaotian Jiang, Quan Wang, Bin Wang, | In this work we introduce ConvR, an adaptive convolutional network designed to maximize entity-relation interactions in a convolutional fashion. |
104 | Graph Pattern Entity Ranking Model for Knowledge Graph Completion | Takuma Ebisu, Ryutaro Ichise, | In this paper, we utilize graph patterns in a knowledge graph to overcome such problems. |
105 | Adversarial Training for Weakly Supervised Event Detection | Xiaozhi Wang, Xu Han, Zhiyuan Liu, Maosong Sun, Peng Li, | To address these issues, we build a large event-related candidate set with good coverage and then apply an adversarial training mechanism to iteratively identify those informative instances from the candidate set and filter out those noisy ones. |
106 | A Submodular Feature-Aware Framework for Label Subset Selection in Extreme Classification Problems | Elham J. Barezi, Ian D. Wood, Pascale Fung, Hamid R. Rabiee, | We propose a submodular maximization framework with linear cost to find informative labels which are most relevant to other labels yet least redundant with each other. |
107 | Relation Extraction with Temporal Reasoning Based on Memory Augmented Distant Supervision | Jianhao Yan, Lin He, Ruqin Huang, Jian Li, Ying Liu, | This paper formulates the problem of relation extraction with temporal reasoning and proposes a solution to predict whether two given entities participate in a relation at a given time spot. For this purpose, we construct a dataset called WIKI-TIME which additionally includes the valid period of a certain relation of two entities in the knowledge base. |
108 | Integrating Semantic Knowledge to Tackle Zero-shot Text Classification | Jingqing Zhang, Piyawat Lertvittayakumjorn, Yike Guo, | In this paper, we propose a two-phase framework together with data augmentation and feature augmentation to solve this problem. |
109 | Word-Node2Vec: Improving Word Embedding with Document-Level Non-Local Word Co-occurrences | Procheta Sen, Debasis Ganguly, Gareth Jones, | In this paper, we propose a graph-based word embedding method, named word-node2vec. |
110 | Cross-Topic Distributional Semantic Representations Via Unsupervised Mappings | Eleftheria Briakou, Nikos Athanasiou, Alexandros Potamianos, | In this work, we propose a DSM that learns multiple distributional representations of a word based on different topics. |
111 | What just happened? Evaluating retrofitted distributional word vectors | Dmetri Hayes, | We propose root-mean-square error (RMSE) as an alternative evaluation metric, and demonstrate that correlation measures and RMSE sometimes yield opposite conclusions concerning the efficacy of retrofitting. |
112 | Linguistic Knowledge and Transferability of Contextual Representations | Nelson F. Liu, Matt Gardner, Yonatan Belinkov, Matthew E. Peters, Noah A. Smith, | To shed light on the linguistic knowledge they capture, we study the representations produced by several recent pretrained contextualizers (variants of ELMo, the OpenAI transformer language model, and BERT) with a suite of sixteen diverse probing tasks. |
113 | Mutual Information Maximization for Simple and Accurate Part-Of-Speech Induction | Karl Stratos, | We focus on two training objectives that are amenable to stochastic gradient descent (SGD): a novel generalization of the classical Brown clustering objective and a recently proposed variational lower bound. |
114 | Unsupervised Recurrent Neural Network Grammars | Yoon Kim, Alexander Rush, Lei Yu, Adhiguna Kuncoro, Chris Dyer, Gábor Melis, | In this work, we experiment with unsupervised learning of RNNGs. |
115 | Cooperative Learning of Disjoint Syntax and Semantics | Serhii Havrylov, Germán Kruszewski, Armand Joulin, | In this work, we present a recursive model inspired by Choi et al. (2018) that reaches near perfect accuracy on this task. |
116 | Unsupervised Latent Tree Induction with Deep Inside-Outside Recursive Auto-Encoders | Andrew Drozdov, Patrick Verga, Mohit Yadav, Mohit Iyyer, Andrew McCallum, | We introduce the deep inside-outside recursive autoencoder (DIORA), a fully-unsupervised method for discovering syntax that simultaneously learns representations for constituents within the induced tree. |
117 | Knowledge-Augmented Language Model and Its Application to Unsupervised Named-Entity Recognition | Angli Liu, Jingfei Du, Veselin Stoyanov, | Our work demonstrates that named entities (and possibly other types of world knowledge) can be modeled successfully using predictive learning and training on large corpora of text without any additional information. |
118 | Syntax-Enhanced Neural Machine Translation with Syntax-Aware Word Representations | Meishan Zhang, Zhenghua Li, Guohong Fu, Min Zhang, | In this work, we propose a novel method to integrate source-side syntax implicitly for NMT. |
119 | Competence-based Curriculum Learning for Neural Machine Translation | Emmanouil Antonios Platanios, Otilia Stretcu, Graham Neubig, Barnabas Poczos, Tom Mitchell, | In this paper, we propose a curriculum learning framework for NMT that reduces training time, reduces the need for specialized heuristics or large batch sizes, and results in overall better performance. |
120 | Extract and Edit: An Alternative to Back-Translation for Unsupervised Neural Machine Translation | Jiawei Wu, Xin Wang, William Yang Wang, | To avoid this fundamental issue, we propose an alternative but more effective approach, extract-edit, to extract and then edit real sentences from the target monolingual corpora. |
121 | Consistency by Agreement in Zero-Shot Neural Machine Translation | Maruan Al-Shedivat, Ankur Parikh, | In this paper, we focus on zero-shot generalization, a challenging setup that tests models on translation directions they have not been optimized for at training time. |
122 | Modeling Recurrence for Transformer | Jie Hao, Xing Wang, Baosong Yang, Longyue Wang, Jinfeng Zhang, Zhaopeng Tu, | In response to this problem, we propose to directly model recurrence for Transformer with an additional recurrence encoder. |
123 | Rethinking Action Spaces for Reinforcement Learning in End-to-end Dialog Agents with Latent Variable Models | Tiancheng Zhao, Kaige Xie, Maxine Eskenazi, | This paper proposes a novel latent action framework that treats the action spaces of an end-to-end dialog agent as latent variables and develops unsupervised methods in order to induce its own action space from the data. |
124 | Skeleton-to-Response: Dialogue Generation Guided by Retrieval Memory | Deng Cai, Yan Wang, Wei Bi, Zhaopeng Tu, Xiaojiang Liu, Wai Lam, Shuming Shi, | In this paper, we propose a new framework which exploits retrieval results via a skeleton-to-response paradigm. |
125 | Jointly Optimizing Diversity and Relevance in Neural Response Generation | Xiang Gao, Sungjin Lee, Yizhe Zhang, Chris Brockett, Michel Galley, Jianfeng Gao, Bill Dolan, | In this paper, we propose a SpaceFusion model to jointly optimize diversity and relevance that essentially fuses the latent space of a sequence-to-sequence model and that of an autoencoder model by leveraging novel regularization terms. |
126 | Disentangling Language and Knowledge in Task-Oriented Dialogs | Dinesh Raghu, Nikhil Gupta, Mausam, | We propose an encoder-decoder architecture (BoSsNet) with a novel Bag-of-Sequences (BoSs) memory, which facilitates the disentangled learning of the response’s language model and its knowledge incorporation. |
127 | Tensorized Self-Attention: Efficiently Modeling Pairwise and Global Dependencies Together | Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Chengqi Zhang, | In this paper, we propose a novel attention mechanism called “Multi-mask Tensorized Self-Attention” (MTSA), which is as fast and as memory-efficient as a CNN, but significantly outperforms previous CNN-/RNN-/attention-based models. |
128 | WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations | Mohammad Taher Pilehvar, Jose Camacho-Collados, | In this paper we show that existing models have surpassed the performance ceiling of the standard evaluation dataset for the purpose, i.e., Stanford Contextual Word Similarity, and highlight its shortcomings. |
129 | Does My Rebuttal Matter? Insights from a Major NLP Conference | Yang Gao, Steffen Eger, Ilia Kuznetsov, Iryna Gurevych, Yusuke Miyao, | Aiming to fill this gap, we present a corpus that contains over 4k reviews and 1.2k author responses from ACL-2018. |
130 | Casting Light on Invisible Cities: Computationally Engaging with Literary Criticism | Shufan Wang, Mohit Iyyer, | While most previous work focuses on “distant reading” by algorithmically discovering high-level patterns from large collections of literary works, here we sharpen the focus of our methods to a single literary theory about Italo Calvino’s postmodern novel Invisible Cities, which consists of 55 short descriptions of imaginary cities. |
131 | PAWS: Paraphrase Adversaries from Word Scrambling | Yuan Zhang, Jason Baldridge, Luheng He, | This paper introduces PAWS (Paraphrase Adversaries from Word Scrambling), a new dataset with 108,463 well-formed paraphrase and non-paraphrase pairs with high lexical overlap. |
132 | Cross-Corpora Evaluation and Analysis of Grammatical Error Correction Models — Is Single-Corpus Evaluation Enough? | Masato Mita, Tomoya Mizumoto, Masahiro Kaneko, Ryo Nagata, Kentaro Inui, | This study explores the necessity of performing cross-corpora evaluation for grammatical error correction (GEC) models. |
133 | Star-Transformer | Qipeng Guo, Xipeng Qiu, Pengfei Liu, Yunfan Shao, Xiangyang Xue, Zheng Zhang, | In this paper, we present Star-Transformer, a lightweight alternative by careful sparsification. |
134 | Adaptation of Hierarchical Structured Models for Speech Act Recognition in Asynchronous Conversation | Tasnim Mohiuddin, Thanh-Tung Nguyen, Shafiq Joty, | In this paper, we propose methods to effectively leverage abundant unlabeled conversational data and the available labeled data from synchronous domains. |
135 | From legal to technical concept: Towards an automated classification of German political Twitter postings as criminal offenses | Frederike Zufall, Tobias Horsmann, Torsten Zesch, | In this article, we analyze which Twitter posts could actually be deemed offenses under German criminal law. |
136 | Joint Multi-Label Attention Networks for Social Text Annotation | Hang Dong, Wei Wang, Kaizhu Huang, Frans Coenen, | We propose a novel attention network for document annotation with user-generated tags. |
137 | Multi-Channel Convolutional Neural Network for Twitter Emotion and Sentiment Recognition | Jumayel Islam, Robert E. Mercer, Lu Xiao, | In this paper, we propose a novel use of a multi-channel convolutional neural architecture which can effectively use different emotion and sentiment indicators such as hashtags, emoticons and emojis that are present in the tweets and improve the performance of emotion and sentiment identification. |
138 | Detecting Cybersecurity Events from Noisy Short Text | Semih Yagcioglu, Mehmet Saygin Seyfioglu, Begum Citamak, Batuhan Bardak, Seren Guldamlasioglu, Azmi Yuksel, Emin Islam Tatli, | In this study, we propose a method that leverages both domain-specific word embeddings and task-specific features to detect cyber security events from tweets. We collected a new dataset of cyber security related tweets from Twitter and manually annotated a subset of 2K of them. |
139 | White-to-Black: Efficient Distillation of Black-Box Adversarial Attacks | Yotam Gil, Yoav Chai, Or Gorodissky, Jonathan Berant, | In this work, we show that the knowledge implicit in the optimization procedure can be distilled into another more efficient neural network. |
140 | Analyzing the Perceived Severity of Cybersecurity Threats Reported on Social Media | Shi Zong, Alan Ritter, Graham Mueller, Evan Wright, | In this paper, we investigate methods to analyze the severity of cybersecurity threats based on the language that is used to describe them online. |
141 | Fake News Detection using Deep Markov Random Fields | Duc Minh Nguyen, Tien Huu Do, Robert Calderbank, Nikos Deligiannis, | To overcome this limitation, we develop a graph-theoretic method that inherits the power of deep learning while at the same time utilizing the correlations among the articles. |
142 | Issue Framing in Online Discussion Fora | Mareike Hartmann, Tallulah Jansen, Isabelle Augenstein, Anders Søgaard, | In this paper, we introduce a new issue frame annotated corpus of online discussions. |
143 | Vector of Locally Aggregated Embeddings for Text Representation | Hadi Amiri, Mitra Mohtarami, | We present Vector of Locally Aggregated Embeddings (VLAE) for effective and, ultimately, lossless representation of textual content. |
144 | Predicting the Type and Target of Offensive Posts in Social Media | Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar, | In contrast, here we target several different kinds of offensive content. |
145 | Biomedical Event Extraction based on Knowledge-driven Tree-LSTM | Diya Li, Lifu Huang, Heng Ji, Jiawei Han, | To better encode contextual information and external background knowledge, we propose a novel knowledge base (KB)-driven tree-structured long short-term memory networks (Tree-LSTM) framework, incorporating two new types of features: (1) dependency structures to capture wide contexts; (2) entity properties (types and category descriptions) from external ontologies via entity linking. |
146 | Detecting cognitive impairments by agreeing on interpretations of linguistic features | Zining Zhu, Jekaterina Novikova, Frank Rudzicz, | In this paper, we take a third approach, proposing Consensus Networks (CNs), a framework to classify after reaching agreements between modalities. |
147 | Relation Extraction using Explicit Context Conditioning | Gaurav Singh, Parminder Bhatia, | We refer to such indirect relations as second-order relations, and describe an efficient implementation for computing them. |
148 | Conversation Model Fine-Tuning for Classifying Client Utterances in Counseling Dialogues | Sungjoon Park, Donghyun Kim, Alice Oh, | With proper anonymization, we collect counselor-client dialogues, define meaningful categories of client utterances with professional counselors, and develop a novel neural network model for classifying the client utterances. |
149 | Using Similarity Measures to Select Pretraining Data for NER | Xiang Dai, Sarvnaz Karimi, Ben Hachey, Cecile Paris, | We propose three cost-effective measures to quantify different aspects of similarity between source pretraining and target task data. |
150 | Predicting Annotation Difficulty to Improve Task Routing and Model Performance for Biomedical Information Extraction | Yinfei Yang, Oshin Agarwal, Chris Tar, Byron C. Wallace, Ani Nenkova, | In this paper we demonstrate that directly modeling instance difficulty can be used to improve model performance and to route instances to appropriate annotators. |
151 | Detecting Depression in Social Media using Fine-Grained Emotions | Mario Ezra Aragon, Adrian Pastor Lopez Monroy, Luis Carlos Gonzalez Gurrola, Manuel Montes-y-Gomez, | We propose a new representation called Bag of Sub-Emotions (BoSE), which represents social media documents by a set of fine-grained emotions automatically generated using a lexical resource of emotions and subword embeddings. |
152 | A Silver Standard Corpus of Human Phenotype-Gene Relations | Diana Sousa, Andre Lamurias, Francisco M Couto, | This paper presents the Phenotype-Gene Relations (PGR) corpus, a silver standard corpus of human phenotype and gene annotations and their relations. We generated this corpus using Named-Entity Recognition tools, whose results were partially evaluated by eight curators, obtaining a precision of 87.01%. |
153 | Improving Lemmatization of Non-Standard Languages with Joint Learning | Enrique Manjavacas, Ákos Kádár, Mike Kestemont, | In the present paper we aim to improve lemmatization performance on a set of non-standard historical languages in which the difficulty is increased by an additional aspect (iii): spelling variation due to lacking orthographic standards. Finally, to encourage future work on processing of non-standard varieties, we release the dataset of non-standard languages underlying the present study, which is based on openly accessible sources. |
154 | One Size Does Not Fit All: Comparing NMT Representations of Different Granularities | Nadir Durrani, Fahim Dalvi, Hassan Sajjad, Yonatan Belinkov, Preslav Nakov, | We found that while representations derived from subwords are slightly better for modeling syntax, character-based representations are superior for modeling morphology and are also more robust to noisy input. |
155 | A Simple Joint Model for Improved Contextual Neural Lemmatization | Chaitanya Malaviya, Shijie Wu, Ryan Cotterell, | We present a simple joint neural model for lemmatization and morphological tagging that achieves state-of-the-art results on 20 languages from the Universal Dependencies corpora. |
156 | A Probabilistic Generative Model of Linguistic Typology | Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, Isabelle Augenstein, | By modelling all languages and features within the same architecture, we show how structural similarities between languages can be exploited to predict typological features with near-perfect accuracy, outperforming several baselines on the task of predicting held-out features. |
157 | Quantifying the morphosyntactic content of Brown Clusters | Manuel Ciosici, Leon Derczynski, Ira Assent, | We show that increases in Average Mutual Information, the clustering algorithms’ optimization goal, are highly correlated with improvements in encoding of morphosyntactic information. |
158 | Analyzing Bayesian Crosslingual Transfer in Topic Models | Shudong Hao, Michael J. Paul, | We introduce a theoretical analysis of crosslingual transfer in probabilistic topic models. |
159 | Recursive Subtree Composition in LSTM-Based Dependency Parsing | Miryam de Lhoneux, Miguel Ballesteros, Joakim Nivre, | We investigate the impact of adding a tree layer on top of a sequential model by recursively composing subtree representations (composition) in a transition-based parser that uses features extracted by a BiLSTM. |
160 | Cross-lingual CCG Induction | Kilian Evang, | We propose an alternative making use of cross-lingual learning: an existing source-language parser is used together with a parallel corpus to induce a grammar and parsing model for a target language. |
161 | Density Matching for Bilingual Word Embedding | Chunting Zhou, Xuezhe Ma, Di Wang, Graham Neubig, | In this paper, we propose an approach that instead expresses the two monolingual embedding spaces as probability densities defined by a Gaussian mixture model, and matches the two densities using a method called normalizing flow. |
162 | Cross-Lingual Alignment of Contextual Word Embeddings, with Applications to Zero-shot Dependency Parsing | Tal Schuster, Ori Ram, Regina Barzilay, Amir Globerson, | We introduce a novel method for multilingual transfer that utilizes deep contextual embeddings, pretrained in an unsupervised fashion. |
163 | Early Rumour Detection | Kaimin Zhou, Chang Shu, Binyang Li, Jey Han Lau, | To address this, we present a novel methodology for early rumour detection. |
164 | Microblog Hashtag Generation via Encoding Conversation Contexts | Yue Wang, Jing Li, Irwin King, Michael R. Lyu, Shuming Shi, | Different from previous work considering hashtags to be inseparable, our work is the first effort to annotate hashtags with a novel sequence generation framework via viewing the hashtag as a short sequence of words. |
165 | Text Processing Like Humans Do: Visually Attacking and Shielding NLP Systems | Steffen Eger, Gözde Gül Şahin, Andreas Rücklé, Ji-Ung Lee, Claudia Schulz, Mohsen Mesgar, Krishnkant Swarnkar, Edwin Simpson, Iryna Gurevych, | We consider this as a new type of adversarial attack in NLP, a setting to which humans are very robust, as our experiments with both simple and more difficult visual perturbations demonstrate. |
166 | Something’s Brewing! Early Prediction of Controversy-causing Posts from Discussion Features | Jack Hessel, Lillian Lee, | Using data from several different communities on reddit.com, we predict the ultimate controversiality of posts, leveraging features drawn from both the textual content and the tree structure of the early comments that initiate the discussion. |
167 | No Permanent Friends or Enemies: Tracking Relationships between Nations from News | Xiaochuang Han, Eunsol Choi, Chenhao Tan, | In this work, we explore unsupervised neural models to infer relations between nations from news articles. |
168 | Improving Human Text Comprehension through Semi-Markov CRF-based Neural Section Title Generation | Sebastian Gehrmann, Steven Layne, Franck Dernoncourt, | In particular, we present an extractive pipeline for section title generation by first selecting the most salient sentence and then applying deletion-based compression. |
169 | Unifying Human and Statistical Evaluation for Natural Language Generation | Tatsunori Hashimoto, Hugh Zhang, Percy Liang, | In this paper, we propose a unified framework which evaluates both diversity and quality, based on the optimal error rate of predicting whether a sentence is human- or machine-generated. |
170 | What makes a good conversation? How controllable attributes affect human judgments | Abigail See, Stephen Roller, Douwe Kiela, Jason Weston, | In this work, we examine two controllable neural text generation methods, conditional training and weighted decoding, in order to control four important attributes for chit-chat dialogue: repetition, specificity, response-relatedness and question-asking. |
171 | An Empirical Investigation of Global and Local Normalization for Recurrent Neural Sequence Models Using a Continuous Relaxation to Beam Search | Kartik Goyal, Chris Dyer, Taylor Berg-Kirkpatrick, | In this paper, we attempt to shed light on this problem through an empirical study. |
172 | Pun Generation with Surprise | He He, Nanyun Peng, Percy Liang, | In this paper, we propose an unsupervised approach to pun generation based on lots of raw (unhumorous) text and a surprisal principle. |
173 | Single Document Summarization as Tree Induction | Yang Liu, Ivan Titov, Mirella Lapata, | In this paper, we conceptualize single-document extractive summarization as a tree induction problem. |
174 | Fixed That for You: Generating Contrastive Claims with Semantic Edits | Christopher Hidey, Kathy McKeown, | To generate contrastive claims, we create a corpus of Reddit comment pairs self-labeled by posters using the acronym FTFY (fixed that for you). |
175 | Box of Lies: Multimodal Deception Detection in Dialogues | Felix Soldner, Verónica Pérez-Rosas, Rada Mihalcea, | In this paper, we address the task of detecting multimodal deceptive cues during conversational dialogues. We introduce a multimodal dataset containing deceptive conversations between participants playing the Box of Lies game from The Tonight Show Starring Jimmy Fallon, in which they try to guess whether an object description provided by their opponent is deceptive or not. |
176 | A Crowdsourced Corpus of Multiple Judgments and Disagreement on Anaphoric Interpretation | Massimo Poesio, Jon Chamberlain, Silviu Paun, Juntao Yu, Alexandra Uma, Udo Kruschwitz, | We present a corpus of anaphoric information (coreference) crowdsourced through a game-with-a-purpose. |
177 | A Streamlined Method for Sourcing Discourse-level Argumentation Annotations from the Crowd | Tristan Miller, Maria Sukhareva, Iryna Gurevych, | We present a method that breaks down a popular but relatively complex discourse-level argument annotation scheme into a simpler, iterative procedure that can be applied even by untrained annotators. |
178 | Unsupervised Dialog Structure Learning | Weiyan Shi, Tiancheng Zhao, Zhou Yu, | We propose to extract dialog structures using a modified VRNN model with discrete latent vectors. |
179 | Modeling Document-level Causal Structures for Event Causal Relation Identification | Lei Gao, Prafulla Kumar Choubey, Ruihong Huang, | We aim to comprehensively identify all the event causal relations in a document, both within a sentence and across sentences, which is important for reconstructing pivotal event structures. |
180 | Hierarchical User and Item Representation with Three-Tier Attention for Recommendation | Chuhan Wu, Fangzhao Wu, Junxin Liu, Yongfeng Huang, | In this paper, we propose a hierarchical user and item representation model with three-tier attention to learn user and item representations from reviews for recommendation. |
181 | Text Similarity Estimation Based on Word Embeddings and Matrix Norms for Targeted Marketing | Tim vor der Brück, Marc Pouly, | Motivated by an industrial application from the domain of youth marketing, where this approach produced only mediocre results, we propose an alternative way of combining the word vectors using matrix norms. |
182 | Glocal: Incorporating Global Information in Local Convolution for Keyphrase Extraction | Animesh Prasad, Min-Yen Kan, | We address this shortcoming by allowing the proper incorporation of global information into the GCN family of models through the use of scaled node weights. |
183 | A Study of Latent Structured Prediction Approaches to Passage Reranking | Iryna Haponchyk, Alessandro Moschitti, | In this paper, we propose a structured output approach which regards rankings as latent variables. |
184 | Combining Distant and Direct Supervision for Neural Relation Extraction | Iz Beltagy, Kyle Lo, Waleed Ammar, | We improve such models by combining the distant supervision data with additional directly supervised data, which we use as supervision for the attention weights. |
185 | Tweet Stance Detection Using an Attention based Neural Ensemble Model | Umme Aymun Siddiqua, Abu Nowshed Chy, Masaki Aono, | In this paper, we propose a neural ensemble model that adopts the strengths of these two LSTM variants to learn better long-term dependencies, where each module is coupled with an attention mechanism that amplifies the contribution of important elements in the final representation. |
186 | Word Embedding-Based Automatic MT Evaluation Metric using Word Position Information | Hiroshi Echizen’ya, Kenji Araki, Eduard Hovy, | We propose a new automatic evaluation metric for machine translation. |
187 | Learning to Stop in Structured Prediction for Neural Machine Translation | Mingbo Ma, Renjie Zheng, Liang Huang, | We propose a novel ranking method which enables an optimal beam search stopping criterion. |
188 | Learning Unsupervised Multilingual Word Embeddings with Incremental Multilingual Hubs | Geert Heyman, Bregt Verreet, Ivan Vulić, Marie-Francine Moens, | In this work, we propose a new robust framework for learning unsupervised multilingual word embeddings that mitigates the instability issues. |
189 | Curriculum Learning for Domain Adaptation in Neural Machine Translation | Xuan Zhang, Pamela Shapiro, Gaurav Kumar, Paul McNamee, Marine Carpuat, Kevin Duh, | We introduce a curriculum learning approach to adapt generic neural machine translation models to a specific domain. |
190 | Improving Robustness of Machine Translation with Synthetic Noise | Vaibhav Vaibhav, Sumeet Singh, Craig Stewart, Graham Neubig, | In this paper we propose methods to enhance the robustness of MT systems by emulating naturally occurring noise in otherwise clean data. |
191 | Non-Parametric Adaptation for Neural Machine Translation | Ankur Bapna, Orhan Firat, | We propose a novel n-gram level retrieval approach that relies on local phrase level similarities, allowing us to retrieve neighbors that are useful for translation even when overall sentence similarity is low. |
192 | Online Distilling from Checkpoints for Neural Machine Translation | Hao-Ran Wei, Shujian Huang, Ran Wang, Xin-Yu Dai, Jiajun Chen, | In contrast, we propose an online knowledge distillation method. |
193 | Value-based Search in Execution Space for Mapping Instructions to Programs | Dor Muhlgay, Jonathan Herzig, Jonathan Berant, | In this work, we propose a search algorithm that uses the target world state, known at training time, to train a critic network that predicts the expected reward of every search state. |
194 | VQD: Visual Query Detection In Natural Scenes | Manoj Acharya, Karan Jariwala, Christopher Kanan, | We propose a new visual grounding task called Visual Query Detection (VQD). |
195 | Improving Natural Language Interaction with Robots Using Advice | Nikhil Mehta, Dan Goldwasser, | In this paper we take the first step towards increasing the bandwidth of this interaction, and suggest a protocol for including advice, high-level observations about the task, which can help constrain the agent’s prediction. |
196 | Generating Knowledge Graph Paths from Textual Definitions using Sequence-to-Sequence Models | Victor Prokhorov, Mohammad Taher Pilehvar, Nigel Collier, | We present a novel method for mapping unrestricted text to knowledge graph entities by framing the task as a sequence-to-sequence problem. |
197 | Shifting the Baseline: Single Modality Performance on Visual Navigation & QA | Jesse Thomason, Daniel Gordon, Yonatan Bisk, | We present unimodal ablations on three recent datasets in visual navigation and QA, seeing an up to 29% absolute gain in performance over published baselines. |
198 | ExCL: Extractive Clip Localization Using Natural Language Descriptions | Soham Ghosh, Anuva Agarwal, Zarana Parekh, Alexander Hauptmann, | In order to select the most relevant video clip corresponding to the given text description, we propose a novel extractive approach that predicts the start and end frames by leveraging cross-modal interactions between the text and video – this removes the need to retrieve and re-rank multiple proposal segments. |
199 | Detecting dementia in Mandarin Chinese using transfer learning from a parallel corpus | Bai Li, Yi-Te Hsu, Frank Rudzicz, | We propose a method to learn a correspondence between independently engineered lexicosyntactic features in two languages, using a large parallel corpus of out-of-domain movie dialogue data. |
200 | Cross-lingual Visual Verb Sense Disambiguation | Spandana Gella, Desmond Elliott, Frank Keller, | We extend this line of work to the more challenging task of cross-lingual verb sense disambiguation, introducing the MultiSense dataset of 9,504 images annotated with English, German, and Spanish verbs. |
201 | Subword-Level Language Identification for Intra-Word Code-Switching | Manuel Mager, Özlem Çetinoğlu, Katharina Kann, | In this paper, we extend the language identification task to the subword-level, such that it includes splitting mixed words while tagging each part with a language ID. |
202 | MuST-C: a Multilingual Speech Translation Corpus | Mattia A. Di Gangi, Roldano Cattoni, Luisa Bentivogli, Matteo Negri, Marco Turchi, | To fill this gap, we created MuST-C, a multilingual speech translation corpus whose size and quality will facilitate the training of end-to-end systems for SLT from English into 8 languages. |
203 | Contextualization of Morphological Inflection | Ekaterina Vylomova, Ryan Cotterell, Trevor Cohn, Timothy Baldwin, Jason Eisner, | In this paper, we isolate the task of predicting a fully inflected sentence from its partially lemmatized version. |
204 | A Robust Abstractive System for Cross-Lingual Summarization | Jessica Ouyang, Boya Song, Kathy McKeown, | We present a robust neural abstractive summarization system for cross-lingual summarization. We construct summarization corpora for documents automatically translated from three low-resource languages, Somali, Swahili, and Tagalog, using machine translation and the New York Times summarization corpus. |
205 | Improving Neural Machine Translation with Neural Syntactic Distance | Chunpeng Ma, Akihiro Tamura, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao, | We propose five strategies to improve NMT with NSD. |
206 | Measuring Immediate Adaptation Performance for Neural Machine Translation | Patrick Simianer, Joern Wuebker, John DeNero, | To this end, we propose new metrics that directly evaluate immediate adaptation performance for machine translation. |
207 | Differentiable Sampling with Flexible Reference Word Order for Neural Machine Translation | Weijia Xu, Xing Niu, Marine Carpuat, | Our new differentiable sampling algorithm addresses this issue by optimizing the probability that the reference can be aligned with the sampled output, based on a soft alignment predicted by the model itself. |
208 | Reinforcement Learning based Curriculum Optimization for Neural Machine Translation | Gaurav Kumar, George Foster, Colin Cherry, Maxim Krikun, | We consider the problem of making efficient use of heterogeneous training data in neural machine translation (NMT). |
209 | Overcoming Catastrophic Forgetting During Domain Adaptation of Neural Machine Translation | Brian Thompson, Jeremy Gwinnup, Huda Khayrallah, Kevin Duh, Philipp Koehn, | In this work, we interpret the drop in general-domain performance as catastrophic forgetting of general-domain knowledge. |
210 | Short-Term Meaning Shift: A Distributional Exploration | Marco Del Tredici, Raquel Fernández, Gemma Boleda, | We present the first exploration of meaning shift over short periods of time in online communities using distributional representations. We create a small annotated dataset and use it to assess the performance of a standard model for meaning shift detection on short-term meaning shift. |
211 | Detecting Derogatory Compounds — An Unsupervised Approach | Michael Wiegand, Maximilian Wolf, Josef Ruppenhofer, | We propose an unsupervised classification approach that incorporates linguistic properties of compounds. |
212 | Personalized Neural Embeddings for Collaborative Filtering with Text | Guangneng Hu, | We develop a Personalized Neural Embedding (PNE) framework to exploit both interactions and words seamlessly. |
213 | An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models | Alexandra Chronopoulou, Christos Baziotis, Alexandros Potamianos, | In this paper we present a conceptually simple and effective transfer learning approach that addresses the problem of catastrophic forgetting. |
214 | Incorporating Emoji Descriptions Improves Tweet Classification | Abhishek Singh, Eduardo Blanco, Wei Jin, | In this paper, we present a simple strategy to process emojis: replace them with their natural language description and use pretrained word embeddings as normally done with standard words. |
215 | Modeling Personal Biases in Language Use by Inducing Personalized Word Embeddings | Daisuke Oba, Naoki Yoshinaga, Shoetsu Sato, Satoshi Akasaki, Masashi Toyoda, | In this study, we propose a method of modeling such personal biases in word meanings (hereafter, semantic variations) with personalized word embeddings obtained by solving a task on subjective text while regarding words used by different individuals as different words. |
216 | Multi-Task Ordinal Regression for Jointly Predicting the Trustworthiness and the Leading Political Ideology of News Media | Ramy Baly, Georgi Karadzhov, Abdelrhman Saleh, James Glass, Preslav Nakov, | In particular, we propose a multi-task ordinal regression framework that models the two problems jointly. |
217 | Joint Detection and Location of English Puns | Yanyan Zou, Wei Lu, | This paper presents an approach that addresses pun detection and pun location jointly from a sequence labeling perspective. |
218 | Harry Potter and the Action Prediction Challenge from Natural Language | David Vilares, Carlos Gómez-Rodríguez, | We explore the challenge of action prediction from textual descriptions of scenes, a testbed to approximate whether text inference can be used to predict upcoming actions. |
219 | Argument Mining for Understanding Peer Reviews | Xinyu Hua, Mitko Nikolov, Nikhil Badugu, Lu Wang, | In this work, we study the content and structure of peer reviews under the argument mining framework, through automatically detecting (1) the argumentative propositions put forward by reviewers, and (2) their types (e.g., evaluating the work or making suggestions for improvement). |
220 | An annotated dataset of literary entities | David Bamman, Sejal Popat, Sheng Shen, | We present a new dataset comprising 210,532 tokens evenly drawn from 100 different English-language literary texts annotated for ACE entity categories (person, location, geo-political entity, facility, organization, and vehicle). We present empirical results demonstrating the performance of nested entity recognition models in this domain; training natively on in-domain literary data yields an improvement of over 20 absolute points in F-score (from 45.7 to 68.3), and mitigates a disparate impact in performance for male and female entities present in models trained on news data. |
221 | Abusive Language Detection with Graph Convolutional Networks | Pushkar Mishra, Marco Del Tredici, Helen Yannakoudakis, Ekaterina Shutova, | In contrast, working with graph convolutional networks (GCNs), we present the first approach that captures not only the structure of online communities but also the linguistic behavior of the users within them. |
222 | On the Importance of Distinguishing Word Meaning Representations: A Case Study on Reverse Dictionary Mapping | Mohammad Taher Pilehvar, | Through a set of experiments on a state-of-the-art reverse dictionary system based on neural networks, we show that a simple adjustment aimed at addressing the meaning conflation deficiency can lead to substantial improvements. |
223 | Factorising AMR generation through syntax | Kris Cao, Stephen Clark, | We show that decomposing the generation process this way leads to state-of-the-art single model performance generating from AMR without additional unlabelled data. |
224 | A Crowdsourced Frame Disambiguation Corpus with Ambiguity | Anca Dumitrache, Lora Aroyo, Chris Welty, | We present a resource for the task of FrameNet semantic frame disambiguation of over 5,000 word-sentence pairs from the Wikipedia corpus. |
225 | Inoculation by Fine-Tuning: A Method for Analyzing Challenge Datasets | Nelson F. Liu, Roy Schwartz, Noah A. Smith, | We introduce inoculation by fine-tuning, a new analysis method for studying challenge datasets by exposing models (the metaphorical patient) to a small amount of data from the challenge dataset (a metaphorical pathogen) and assessing how well they can adapt. |
226 | A Capsule Network-based Embedding Model for Knowledge Graph Completion and Search Personalization | Dai Quoc Nguyen, Thanh Vu, Tu Dinh Nguyen, Dat Quoc Nguyen, Dinh Phung, | In this paper, we introduce an embedding model, named CapsE, exploring a capsule network to model relationship triples (subject, relation, object). |
227 | Partial Or Complete, That’s The Question | Qiang Ning, Hangfeng He, Chuchu Fan, Dan Roth, | This paper questions this common perception, motivated by the fact that structures consist of interdependent sets of variables. |
228 | Sequential Attention with Keyword Mask Model for Community-based Question Answering | Jianxin Yang, Wenge Rong, Libin Shi, Zhang Xiong, | We propose a Sequential Attention with Keyword Mask model (SAKM) for CQA to imitate human reading behavior. |
229 | Simple Attention-Based Representation Learning for Ranking Short Social Media Posts | Peng Shi, Jinfeng Rao, Jimmy Lin, | This paper explores the problem of ranking short social media posts with respect to user queries using neural networks. |
230 | AttentiveChecker: A Bi-Directional Attention Flow Mechanism for Fact Verification | Santosh Tokala, Vishal G, Avirup Saha, Niloy Ganguly, | In this paper, we present a completely task-agnostic pipelined system, AttentiveChecker, consisting of three homogeneous Bi-Directional Attention Flow (BIDAF) networks, which are multi-layer hierarchical networks that represent the context at different levels of granularity. |
231 | Practical, Efficient, and Customizable Active Learning for Named Entity Recognition in the Digital Humanities | Alexander Erdmann, David Joseph Wrisley, Benjamin Allen, Christopher Brown, Sophie Cohen-Bodénès, Micha Elsner, Yukun Feng, Brian Joseph, Béatrice Joyeux-Prunel, Marie-Catherine de Marneffe, | Thus, we propose an active learning solution for named entity recognition, attempting to maximize a custom model’s improvement per additional unit of manual annotation. |
232 | Doc2hash: Learning Discrete Latent variables for Documents Retrieval | Yifei Zhang, Hao Zhu, | In this paper, we propose a method, Doc2hash, that solves the gradient flow problem of the discrete stochastic layer by using continuous relaxation on priors, and trains the generative model in an end-to-end manner to generate hash codes. |
233 | Evaluating Text GANs as Language Models | Guy Tevet, Gavriel Habib, Vered Shwartz, Jonathan Berant, | In this work, we propose to approximate the distribution of text generated by a GAN, which permits evaluating them with traditional probability-based LM metrics. |
234 | Latent Code and Text-based Generative Adversarial Networks for Soft-text Generation | Md Akmal Haidar, Mehdi Rezagholizadeh, Alan Do Omri, Ahmad Rashid, | In this work, we introduce a novel text-based approach called Soft-GAN to effectively exploit GAN setup for text generation. |
235 | Neural Text Generation from Rich Semantic Representations | Valerie Hajdik, Jan Buys, Michael Wayne Goodman, Emily M. Bender, | We propose neural models to generate high-quality text from structured representations based on Minimal Recursion Semantics (MRS). |
236 | Step-by-Step: Separating Planning from Realization in Neural Data-to-Text Generation | Amit Moryossef, Yoav Goldberg, Ido Dagan, | For training a plan-to-text generator, we present a method for matching reference texts to their corresponding text plans. |
237 | Evaluating Rewards for Question Generation Models | Tom Hosking, Sebastian Riedel, | We therefore optimise directly for various objectives beyond simply replicating the ground truth questions, including a novel approach using an adversarial discriminator that seeks to generate questions that are indistinguishable from real examples. |
238 | Text Generation from Knowledge Graphs with Graph Transformers | Rik Koncel-Kedziorski, Dhanush Bekal, Yi Luan, Mirella Lapata, Hannaneh Hajishirzi, | In this work, we address the problem of generating coherent multi-sentence texts from the output of an information extraction system, and in particular a knowledge graph. |
239 | Open Information Extraction from Question-Answer Pairs | Nikita Bhutani, Yoshihiko Suhara, Wang-Chiew Tan, Alon Halevy, H. V. Jagadish, | We describe NeurON, a system for extracting tuples from question-answer pairs. |
240 | Question Answering by Reasoning Across Documents with Graph Convolutional Networks | Nicola De Cao, Wilker Aziz, Ivan Titov, | We introduce a neural model which integrates and reasons relying on information spread within documents and across multiple documents. |
241 | A Qualitative Comparison of CoQA, SQuAD 2.0 and QuAC | Mark Yatskar, | We compare three new datasets for question answering: SQuAD 2.0, QuAC, and CoQA, along several of their new features: (1) unanswerable questions, (2) multi-turn interactions, and (3) abstractive answers. We show that the datasets provide complementary coverage of the first two aspects, but weak coverage of the third. Because of the datasets’ structural similarity, a single extractive model can be easily adapted to any of the datasets, and we show improved baseline results on both SQuAD 2.0 and CoQA. |
242 | BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis | Hu Xu, Bing Liu, Lei Shu, Philip Yu, | We call this problem Review Reading Comprehension (RRC). In this work, we first build an RRC dataset called ReviewRC based on a popular benchmark for aspect-based sentiment analysis. |
243 | Old is Gold: Linguistic Driven Approach for Entity and Relation Linking of Short Text | Ahmad Sakor, Isaiah Onando Mulang’, Kuldeep Singh, Saeedeh Shekarpour, Maria Esther Vidal, Jens Lehmann, Sören Auer, | In this work, we present the Falcon approach which effectively maps entities and relations within a short text to its mentions of a background knowledge graph. |
244 | Be Consistent! Improving Procedural Text Comprehension using Label Consistency | Xinya Du, Bhavana Dalvi, Niket Tandon, Antoine Bosselut, Wen-tau Yih, Peter Clark, Claire Cardie, | We present a new learning framework that leverages label consistency during training, allowing consistency bias to be built into the model. |
245 | MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms | Aida Amini, Saadia Gabriel, Shanchuan Lin, Rik Koncel-Kedziorski, Yejin Choi, Hannaneh Hajishirzi, | We introduce a large-scale dataset of math word problems and an interpretable neural math problem solver by learning to map problems to their operation programs. |
246 | DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs | Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, Matt Gardner, | We introduce a new reading comprehension benchmark, DROP, which requires Discrete Reasoning Over the content of Paragraphs. |
247 | An Encoding Strategy Based Word-Character LSTM for Chinese NER | Wei Liu, Tongge Xu, Qinghua Xu, Jiayu Song, Yueran Zu, | In this paper, we propose a novel word-character LSTM (WC-LSTM) model to add word information into the start or the end character of the word, alleviating the influence of word segmentation errors while obtaining the word boundary information. |
248 | Highly Effective Arabic Diacritization using Sequence to Sequence Modeling | Hamdy Mubarak, Ahmed Abdelali, Hassan Sajjad, Younes Samih, Kareem Darwish, | In this work, we present a unified character level sequence-to-sequence deep learning model that recovers both types of diacritics without the use of explicit feature engineering. |
249 | SC-LSTM: Learning Task-Specific Representations in Multi-Task Learning for Sequence Labeling | Peng Lu, Ting Bai, Philippe Langlais, | In order to do so, we propose a new LSTM cell which contains both shared parameters that can learn from all tasks, and task-specific parameters that can learn task-specific information. |
250 | Learning to Denoise Distantly-Labeled Data for Entity Typing | Yasumasa Onoe, Greg Durrett, | We investigate this approach on the ultra-fine entity typing task of Choi et al. (2018). |
251 | A Simple and Robust Approach to Detecting Subject-Verb Agreement Errors | Simon Flachs, Ophélie Lacroix, Marek Rei, Helen Yannakoudakis, Anders Søgaard, | We observe that rule-based error generation is less sensitive to syntactic parsing errors and irregularities than error detection and explore a simple, yet efficient approach to getting the best of both worlds: We train neural sequential labelers on the combination of large volumes of silver standard data, obtained through rule-based error generation, and gold standard data. |
252 | A Grounded Unsupervised Universal Part-of-Speech Tagger for Low-Resource Languages | Ronald Cardenas, Ying Lin, Heng Ji, Jonathan May, | In this work, we describe an approach for low-resource unsupervised POS tagging that yields fully grounded output and requires no labeled training data. |
253 | On Difficulties of Cross-Lingual Transfer with Order Differences: A Case Study on Dependency Parsing | Wasi Ahmad, Zhisong Zhang, Xuezhe Ma, Eduard Hovy, Kai-Wei Chang, Nanyun Peng, | In this paper, we investigate cross-lingual transfer and posit that an order-agnostic model will perform better when transferring to distant foreign languages. |
254 | A Multi-Task Approach for Disentangling Syntax and Semantics in Sentence Representations | Mingda Chen, Qingming Tang, Sam Wiseman, Kevin Gimpel, | We propose a generative model for a sentence that uses two latent variables, with one intended to represent the syntax of the sentence and the other to represent its semantics. |
255 | Self-Discriminative Learning for Unsupervised Document Embedding | Hong-You Chen, Chin-Hua Hu, Leila Wehbe, Shou-de Lin, | Unlike most previous work which learn the embedding based on self-prediction of the surface of text, we explicitly exploit the inter-document information and directly model the relations of documents in embedding space with a discriminative network and a novel objective. |
256 | Adaptive Convolution for Text Classification | Byung-Ju Choi, Jun-Hyung Park, SangKeun Lee, | In this paper, we present an adaptive convolution for text classification to give flexibility to convolutional neural networks (CNNs). |
257 | Zero-Shot Cross-Lingual Opinion Target Extraction | Soufian Jebbara, Philipp Cimiano, | In this work, we address the lack of available annotated data for specific languages by proposing a zero-shot cross-lingual approach for the extraction of opinion target expressions. |
258 | Adversarial Category Alignment Network for Cross-domain Sentiment Classification | Xiaoye Qu, Zhikang Zou, Yu Cheng, Yang Yang, Pan Zhou, | In this work, we propose an adversarial category alignment network (ACAN), which attempts to enhance category consistency between the source domain and the target domain. |
259 | Target-oriented Opinion Words Extraction with Target-fused Neural Sequence Labeling | Zhifang Fan, Zhen Wu, Xin-Yu Dai, Shujian Huang, Jiajun Chen, | In this paper, we propose a novel sequence labeling subtask for ABSA named TOWE (Target-oriented Opinion Words Extraction), which aims at extracting the corresponding opinion words for a given opinion target. We build four datasets for TOWE based on several popular ABSA benchmarks from laptop and restaurant reviews. |
260 | Abstractive Summarization of Reddit Posts with Multi-level Memory Networks | Byeongchang Kim, Hyunwoo Kim, Gunhee Kim, | We address the problem of abstractive summarization in two directions: proposing a novel dataset and a new model. First, we collect Reddit TIFU dataset, consisting of 120K posts from the online discussion forum Reddit. |
261 | Automatic learner summary assessment for reading comprehension | Menglin Xia, Ekaterina Kochmar, Ted Briscoe, | We present a summarization task for evaluating non-native reading comprehension and propose three novel approaches to automatically assess the learner summaries. |
262 | Data-efficient Neural Text Compression with Interactive Learning | Avinesh P.V.S, Christian M. Meyer, | In this paper, we propose a novel interactive setup to neural text compression that enables transferring a model to new domains and compression tasks with minimal human supervision. |
263 | Text Generation with Exemplar-based Adaptive Decoding | Hao Peng, Ankur Parikh, Manaal Faruqui, Bhuwan Dhingra, Dipanjan Das, | We propose a novel conditioned text generation model. |
264 | Guiding Extractive Summarization with Question-Answering Rewards | Kristjan Arumae, Fei Liu, | In this paper we describe a novel framework to guide a supervised, extractive summarization system with question-answering rewards. |
265 | Beyond task success: A closer look at jointly learning to see, ask, and GuessWhat | Ravi Shekhar, Aashish Venkatesh, Tim Baumgärtner, Elia Bruni, Barbara Plank, Raffaella Bernardi, Raquel Fernández, | We propose a grounded dialogue state encoder which addresses a foundational issue on how to integrate visual grounding with dialogue system components. |
266 | The World in My Mind: Visual Dialog with Adversarial Multi-modal Feature Encoding | Yiqun Yao, Jiaming Xu, Bo Xu, | In this paper, we propose a novel Adversarial Multi-modal Feature Encoding (AMFE) framework for effective and robust auxiliary training of visual dialog systems. |
267 | Strong and Simple Baselines for Multimodal Utterance Embeddings | Paul Pu Liang, Yao Chong Lim, Yao-Hung Hubert Tsai, Ruslan Salakhutdinov, Louis-Philippe Morency, | In this paper, we propose two simple but strong baselines to learn embeddings of multimodal utterances. |
268 | Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout | Hao Tan, Licheng Yu, Mohit Bansal, | In this paper, we present a generalizable navigational agent. |
269 | Towards Content Transfer through Grounded Text Generation | Shrimai Prabhumoye, Chris Quirk, Michel Galley, | This paper introduces the notion of Content Transfer for long-form text generation, where the task is to generate a next sentence in a document that both fits its context and is grounded in a content-rich external textual source such as a news story. As another contribution of this paper, we release a benchmark dataset of 640k Wikipedia referenced sentences paired with the source articles to encourage exploration of this new task. |
270 | Improving Machine Reading Comprehension with General Reading Strategies | Kai Sun, Dian Yu, Dong Yu, Claire Cardie, | Inspired by reading strategies identified in cognitive science, and given limited computational resources – just a pre-trained model and a fixed number of training instances – we propose three general strategies aimed to improve non-extractive machine reading comprehension (MRC): (i) BACK AND FORTH READING that considers both the original and reverse order of an input sequence, (ii) HIGHLIGHTING, which adds a trainable embedding to the text embedding of tokens that are relevant to the question and candidate answers, and (iii) SELF-ASSESSMENT that generates practice questions and candidate answers directly from the text in an unsupervised manner. |
271 | Multi-task Learning with Sample Re-weighting for Machine Reading Comprehension | Yichong Xu, Xiaodong Liu, Yelong Shen, Jingjing Liu, Jianfeng Gao, | We propose a multi-task learning framework to learn a joint Machine Reading Comprehension (MRC) model that can be applied to a wide range of MRC tasks in different domains. |
272 | Semantically-Aligned Equation Generation for Solving and Reasoning Math Word Problems | Ting-Rui Chiang, Yun-Nung Chen, | Motivated by the intuition about how humans generate equations from problem texts, this paper presents a neural approach to automatically solve math word problems by operating symbols according to their semantic meanings in texts. |
273 | Iterative Search for Weakly Supervised Semantic Parsing | Pradeep Dasigi, Matt Gardner, Shikhar Murty, Luke Zettlemoyer, Eduard Hovy, | We propose a novel iterative training algorithm that alternates between searching for consistent logical forms and maximizing the marginal likelihood of the retrieved ones. |
274 | Alignment over Heterogeneous Embeddings for Question Answering | Vikas Yadav, Steven Bethard, Mihai Surdeanu, | We propose a simple, fast, and mostly-unsupervised approach for non-factoid question answering (QA) called Alignment over Heterogeneous Embeddings (AHE). |
275 | Bridging the Gap: Attending to Discontinuity in Identification of Multiword Expressions | Omid Rohanian, Shiva Taslimipoor, Samaneh Kouchaki, Le An Ha, Ruslan Mitkov, | We introduce a new method to tag Multiword Expressions (MWEs) using a linguistically interpretable language-independent deep learning architecture. |
276 | Incorporating Word Attention into Character-Based Word Segmentation | Shohei Higashiyama, Masao Utiyama, Eiichiro Sumita, Masao Ideuchi, Yoshiaki Oida, Yohei Sakamoto, Isaac Okada, | We propose a character-based model utilizing word information to leverage the advantages of both types of models. |
277 | VCWE: Visual Character-Enhanced Word Embeddings | Chi Sun, Xipeng Qiu, Xuanjing Huang, | In this paper, we propose a model to learn Chinese word embeddings via three-level composition: (1) a convolutional neural network to extract the intra-character compositionality from the visual shape of a character; (2) a recurrent neural network with self-attention to compose character representation into word embeddings; (3) the Skip-Gram framework to capture non-compositionality directly from the contextual information. |
278 | Subword Encoding in Lattice LSTM for Chinese Word Segmentation | Jie Yang, Yue Zhang, Shuailong Liang, | We investigate subword information for Chinese word segmentation, by integrating subword embeddings trained using byte-pair encoding into a Lattice LSTM (LaLSTM) network over a character sequence. |
279 | Improving Cross-Domain Chinese Word Segmentation with Word Embeddings | Yuxiao Ye, Weikang Li, Yue Zhang, Likun Qiu, Jian Sun, | In this paper, we propose a semi-supervised word-based approach to improving cross-domain CWS given a baseline segmenter. |
280 | Neural Semi-Markov Conditional Random Fields for Robust Character-Based Part-of-Speech Tagging | Apostolos Kemos, Heike Adel, Hinrich Schütze, | In this paper, we propose to eliminate the need for tokenizers with an end-to-end character-level semi-Markov conditional random field. |
281 | Shrinking Japanese Morphological Analyzers With Neural Networks and Semi-supervised Learning | Arseny Tolmachev, Daisuke Kawahara, Sadao Kurohashi, | We propose a compact alternative to these cumbersome approaches which do not rely on any externally provided n-gram or word representations. |
282 | Neural Constituency Parsing of Speech Transcripts | Paria Jamshid Lou, Yufei Wang, Mark Johnson, | This paper studies the performance of a neural self-attentive parser on transcribed speech. |
283 | Acoustic-to-Word Models with Conversational Context Information | Suyoun Kim, Florian Metze, | In this work, we present a direct acoustic-to-word, end-to-end speech recognition model capable of utilizing the conversational context to better process long conversations. |
284 | A Dynamic Speaker Model for Conversational Interactions | Hao Cheng, Hao Fang, Mari Ostendorf, | In this work, we introduce a neural model for learning a dynamically updated speaker embedding in a conversational context. |
285 | Fluent Translations from Disfluent Speech in End-to-End Speech Translation | Elizabeth Salesky, Matthias Sperber, Alexander Waibel, | We use a sequence-to-sequence model to translate from noisy, disfluent speech to fluent text with disfluencies removed, using the recently collected copy-edited references for the Fisher Spanish-English dataset. |
286 | Relation Classification Using Segment-Level Attention-based CNN and Dependency-based RNN | Van-Hien Tran, Van-Thuy Phi, Hiroyuki Shindo, Yuji Matsumoto, | In this paper, we propose a new model effectively combining Segment-level Attention-based Convolutional Neural Networks (SACNNs) and Dependency-based Recurrent Neural Networks (DepRNNs). |
287 | Document-Level Event Factuality Identification via Adversarial Neural Network | Zhong Qian, Peifeng Li, Qiaoming Zhu, Guodong Zhou, | Document-Level Event Factuality Identification via Adversarial Neural Network |
288 | Distant Supervision Relation Extraction with Intra-Bag and Inter-Bag Attentions | Zhi-Xiu Ye, Zhen-Hua Ling, | This paper presents a neural relation extraction method to deal with the noisy training data generated by distant supervision. |
289 | Ranking-Based Autoencoder for Extreme Multi-label Classification | Bingyu Wang, Li Chen, Wei Sun, Kechen Qin, Kefeng Li, Hui Zhou, | In this paper, we propose a deep learning XML method, with a word-vector-based self-attention, followed by a ranking-based AutoEncoder architecture. |
290 | Posterior-regularized REINFORCE for Instance Selection in Distant Supervision | Qi Zhang, Siliang Tang, Xiang Ren, Fei Wu, Shiliang Pu, Yueting Zhuang, | This paper provides a new way to improve the efficiency of the REINFORCE training process. |
291 | Scalable Collapsed Inference for High-Dimensional Topic Models | Rashidul Islam, James Foulds, | In this paper, we develop an online inference algorithm for topic models which leverages stochasticity to scale well in the number of documents, sparsity to scale well in the number of topics, and which operates in the collapsed representation of the topic model for improved accuracy and run-time performance. |
292 | An Integrated Approach for Keyphrase Generation via Exploring the Power of Retrieval and Extraction | Wang Chen, Hou Pong Chan, Piji Li, Lidong Bing, Irwin King, | In this paper, we present a novel integrated approach for keyphrase generation (KG). |
293 | Predicting Malware Attributes from Cybersecurity Texts | Arpita Roy, Youngja Park, Shimei Pan, | In this paper, we propose a novel feature learning method to leverage diverse knowledge sources such as small amount of human annotations, unlabeled text and specifications about malware attribute labels. |
294 | Improving Distantly-supervised Entity Typing with Compact Latent Space Clustering | Bo Chen, Xiaotao Gu, Yufeng Hu, Siliang Tang, Guoping Hu, Yueting Zhuang, Xiang Ren, | In this work, we propose to regularize distantly supervised models with Compact Latent Space Clustering (CLSC) to bypass this problem while still effectively utilizing noisy data. |
295 | Modelling Instance-Level Annotator Reliability for Natural Language Labelling Tasks | Maolin Li, Arvid Fahlström Myrman, Tingting Mu, Sophia Ananiadou, | In this paper, we propose an unsupervised model which can handle both binary and multi-class labels. |
296 | Review-Driven Multi-Label Music Style Classification by Exploiting Style Correlations | Guangxiang Zhao, Jingjing Xu, Qi Zeng, Xuancheng Ren, Xu Sun, | This paper explores a new natural language processing task, review-driven multi-label music style classification. |
297 | Fact Discovery from Knowledge Base via Facet Decomposition | Zihao Fu, Yankai Lin, Zhiyuan Liu, Wai Lam, | To tackle this new problem, we propose a novel framework that decomposes the discovery problem into several facet discovery components. |
298 | A Richer-but-Smarter Shortest Dependency Path with Attentive Augmentation for Relation Extraction | Duy-Cat Can, Hoang-Quynh Le, Quang-Thuy Ha, Nigel Collier, | In this work, we propose a novel model that combines the advantages of these two approaches. |
299 | Bidirectional Attentive Memory Networks for Question Answering over Knowledge Bases | Yu Chen, Lingfei Wu, Mohammed J Zaki, | In this work, we propose to directly model the two-way flow of interactions between the questions and the KB via a novel Bidirectional Attentive Memory Network, called BAMnet. |
300 | BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions | Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, Kristina Toutanova, | In this paper we study yes/no questions that are naturally occurring – meaning that they are generated in unprompted and unconstrained settings. We build a reading comprehension dataset, BoolQ, of such questions, and show that they are unexpectedly challenging. |
301 | Enhancing Key-Value Memory Neural Networks for Knowledge Based Question Answering | Kun Xu, Yuxuan Lai, Yansong Feng, Zhiguo Wang, | In this paper, we propose a novel mechanism to enable conventional KV-MemNNs models to perform interpretable reasoning for complex questions. |
302 | Repurposing Entailment for Multi-Hop Question Answering Tasks | Harsh Trivedi, Heeyoung Kwon, Tushar Khot, Ashish Sabharwal, Niranjan Balasubramanian, | We introduce Multee, a general architecture that can effectively use entailment models for multi-hop QA tasks. |
303 | GenderQuant: Quantifying Mention-Level Genderedness | Ananya, Nitya Parthasarthi, Sameer Singh, | In this paper, we use existing NLP pipelines to automatically annotate gender of mentions in the text. |
304 | Analyzing Polarization in Social Media: Method and Application to Tweets on 21 Mass Shootings | Dorottya Demszky, Nikhil Garg, Rob Voigt, James Zou, Jesse Shapiro, Matthew Gentzkow, Dan Jurafsky, | We provide an NLP framework to uncover four linguistic dimensions of political polarization in social media: topic choice, framing, affect and illocutionary force. |
305 | Learning to Decipher Hate Symbols | Jing Qian, Mai ElSherief, Elizabeth Belding, William Yang Wang, | In this paper, we propose a novel task of deciphering hate symbols. To do this, we leverage the Urban Dictionary and collect a new, symbol-rich Twitter corpus of hate speech. |
306 | Long-tail Relation Extraction via Knowledge Graph Embeddings and Graph Convolution Networks | Ningyu Zhang, Shumin Deng, Zhanlin Sun, Guanying Wang, Xi Chen, Wei Zhang, Huajun Chen, | We propose a distantly supervised relation extraction approach for long-tailed, imbalanced data, which is prevalent in real-world settings. |
307 | GAN Driven Semi-distant Supervision for Relation Extraction | Pengshuai Li, Xinsong Zhang, Weijia Jia, Hai Zhao, | To address this issue, we propose a novel semi-distant supervision approach for relation extraction by constructing a small accurate dataset and properly leveraging numerous instances without relation labels. |
308 | A general framework for information extraction using dynamic span graphs | Yi Luan, Dave Wadden, Luheng He, Amy Shah, Mari Ostendorf, Hannaneh Hajishirzi, | We introduce a general framework for several information extraction tasks that share span representations using dynamically constructed span graphs. |
309 | OpenCeres: When Open Information Extraction Meets the Semi-Structured Web | Colin Lockard, Prashant Shiralkar, Xin Luna Dong, | In this paper, we define the problem of OpenIE from semi-structured websites to extract such facts, and present an approach for solving it. We also introduce a labeled evaluation dataset to motivate research in this area. |
310 | Structured Minimally Supervised Learning for Neural Relation Extraction | Fan Bai, Alan Ritter, | We present an approach to minimally supervised relation extraction that combines the benefits of learned representations and structured learning, and accurately predicts sentence-level relation mentions given only proposition-level supervision from a KB. |
311 | Neural Machine Translation of Text from Non-Native Speakers | Antonios Anastasopoulos, Alison Lui, Toan Q. Nguyen, David Chiang, | In this paper, we show that augmenting training data with sentences containing artificially-introduced grammatical errors can make the system more robust to such errors. We also present a set of Spanish translations of the JFLEG grammar error correction corpus, which allows for testing NMT robustness to real grammatical errors. |
312 | Improving Domain Adaptation Translation with Domain Invariant and Specific Information | Shuhao Gu, Yang Feng, Qun Liu, | In this paper, we propose a method to explicitly model the two kinds of information in the encoder-decoder framework so as to exploit out-of-domain data in in-domain training. |
313 | Selective Attention for Context-aware Neural Machine Translation | Sameen Maruf, André F. T. Martins, Gholamreza Haffari, | To this end, we propose a novel and scalable top-down approach to hierarchical attention for context-aware NMT which uses sparse attention to selectively focus on relevant sentences in the document context and then attends to key words in those sentences. |
314 | On Evaluation of Adversarial Perturbations for Sequence-to-Sequence Models | Paul Michel, Xian Li, Graham Neubig, Juan Pino, | Using the example of untargeted attacks on machine translation (MT), we propose a new evaluation framework for adversarial attacks on seq2seq models that takes the semantic equivalence of the pre- and post-perturbation input into account. |
315 | Accelerated Reinforcement Learning for Sentence Generation by Vocabulary Prediction | Kazuma Hashimoto, Yoshimasa Tsuruoka, | To improve the efficiency of reinforcement learning, we present a novel approach for reducing the action space based on dynamic vocabulary prediction. |
316 | Mitigating Uncertainty in Document Classification | Xuchao Zhang, Fanglan Chen, ChangTien Lu, Naren Ramakrishnan, | In this paper, we propose a novel neural-network-based model that applies a new dropout-entropy method for uncertainty measurement. |
317 | Complexity-Weighted Loss and Diverse Reranking for Sentence Simplification | Reno Kriz, Joao Sedoc, Marianna Apidianaki, Carolina Zheng, Gaurav Kumar, Eleni Miltsakaki, Chris Callison-Burch, | We aim to alleviate this issue through two main techniques: a complexity-weighted loss function, and generating a large set of diverse candidate simplifications at test time, which we rerank to promote fluency, adequacy, and simplicity. |
318 | Predicting Helpful Posts in Open-Ended Discussion Forums: A Neural Architecture | Kishaloy Halder, Min-Yen Kan, Kazunari Sugiyama, | In this paper, we address the task of identifying helpful posts in a forum thread to help users comprehend long running discussion threads, which often contain repetitive or irrelevant posts. |
319 | Text Classification with Few Examples using Controlled Generalization | Abhijit Mahabal, Jason Baldridge, Burcu Karagol Ayan, Vincent Perot, Dan Roth, | This produces task-specific semantic vectors; here, we show that a feed-forward network over these vectors is especially effective in low-data scenarios, compared to existing state-of-the-art methods. |
320 | Reinforcement Learning Based Text Style Transfer without Parallel Training Corpus | Hongyu Gong, Suma Bhat, Lingfei Wu, JinJun Xiong, Wen-mei Hwu, | In this paper, we address this challenge by using a reinforcement-learning-based generator-evaluator architecture. |
321 | Adapting RNN Sequence Prediction Model to Multi-label Set Prediction | Kechen Qin, Cheng Li, Virgil Pavlu, Javed Aslam, | We present an adaptation of RNN sequence models to the problem of multi-label classification for text, where the target is a set of labels, not a sequence. |
322 | Customizing Grapheme-to-Phoneme System for Non-Trivial Transcription Problems in Bangla Language | Sudipta Saha Shubha, Nafis Sadeq, Shafayat Ahmed, Md. Nahidul Islam, Muhammad Abdullah Adnan, Md. Yasin Ali Khan, Mohammad Zuberul Islam, | As the performance of data-driven approaches to G2P conversion depends largely on the pronunciation lexicon on which the system is trained, in this paper we develop an improved training lexicon for Bangla by identifying and categorizing the critical cases in the language and including them in the training lexicon, yielding a robust G2P conversion system. |
323 | Connecting Language and Knowledge with Heterogeneous Representations for Neural Relation Extraction | Peng Xu, Denilson Barbosa, | We help close the gap with a framework that unifies the learning of RE and KBE models leading to significant improvements over the state-of-the-art in RE. |
324 | Segmentation-free compositional n-gram embedding | Geewook Kim, Kazuki Fukui, Hidetoshi Shimodaira, | We propose a new type of representation learning method that models words, phrases and sentences seamlessly. |
325 | Exploiting Noisy Data in Distant Supervision Relation Classification | Kaijia Yang, Liang He, Xin-Yu Dai, Shujian Huang, Jiajun Chen, | Different from previous works that underutilize noisy data which inherently characterize the property of classification, in this paper, we propose RCEND, a novel framework to enhance Relation Classification by Exploiting Noisy Data. |
326 | Misspelling Oblivious Word Embeddings | Aleksandra Piktus, Necati Bora Edizel, Piotr Bojanowski, Edouard Grave, Rui Ferreira, Fabrizio Silvestri, | In this paper we present a method to learn word embeddings that are resilient to misspellings. |
327 | Learning Relational Representations by Analogy using Hierarchical Siamese Networks | Gaetano Rossiello, Alfio Gliozzo, Robert Farrell, Nicolas Fauceglia, Michael Glass, | We address relation extraction as an analogy problem by proposing a novel approach to learn representations of relations expressed by their textual mentions. Following this idea, we collect a large set of analogous pairs by matching triples in knowledge bases with web-scale corpora through distant supervision. |
328 | An Effective Label Noise Model for DNN Text Classification | Ishan Jindal, Daniel Pressel, Brian Lester, Matthew Nokleby, | In this paper, we propose an approach to training deep networks that is robust to label noise. |
329 | Understanding Learning Dynamics Of Language Models with SVCCA | Naomi Saphra, Adam Lopez, | We present the first study on the learning dynamics of neural language models, using a simple and flexible analysis method called Singular Vector Canonical Correlation Analysis (SVCCA), which enables us to compare learned representations across time and across models, without the need to evaluate directly on annotated data. |
330 | Using Large Corpus N-gram Statistics to Improve Recurrent Neural Language Models | Yiben Yang, Ji-Ping Wang, Doug Downey, | We explore a technique that uses large corpus n-gram statistics as a regularizer for training a neural network LM on a smaller corpus. |
331 | Continual Learning for Sentence Representations Using Conceptors | Tianlin Liu, Lyle Ungar, Joao Sedoc, | In this paper, we consider a continual learning scenario for sentence representations: Given a sequence of corpora, we aim to optimize the sentence encoder with respect to the new corpus while maintaining its accuracy on the old corpora. |
332 | Relation Discovery with Out-of-Relation Knowledge Base as Supervision | Yan Liang, Xin Liu, Jianwen Zhang, Yangqiu Song, | In this paper, we study the problem of using out-of-relation knowledge bases to supervise the discovery of unseen relations, where out-of-relation means that the relations to be discovered from the text corpus and those in the knowledge bases do not overlap. We construct a set of constraints between entity pairs based on the knowledge base embedding and then incorporate these constraints into relation discovery via a variational auto-encoder based algorithm. |
333 | Corpora Generation for Grammatical Error Correction | Jared Lichtarge, Chris Alberti, Shankar Kumar, Noam Shazeer, Niki Parmar, Simon Tong, | We describe two approaches for generating large parallel datasets for GEC using publicly available Wikipedia data. |
334 | Structural Supervision Improves Learning of Non-Local Grammatical Dependencies | Ethan Wilcox, Peng Qian, Richard Futrell, Miguel Ballesteros, Roger Levy, | Here we investigate whether supervision with hierarchical structure enhances learning of a range of grammatical dependencies, a question that has previously been addressed only for subject-verb agreement. |
335 | Benchmarking Approximate Inference Methods for Neural Structured Prediction | Lifu Tu, Kevin Gimpel, | In this paper, we compare these two families of inference methods on three sequence labeling datasets. |
336 | Evaluating and Enhancing the Robustness of Dialogue Systems: A Case Study on a Negotiation Agent | Minhao Cheng, Wei Wei, Cho-Jui Hsieh, | In this paper, we develop algorithms to evaluate the robustness of a dialogue agent by carefully designed attacks using adversarial agents. |
337 | Investigating Robustness and Interpretability of Link Prediction via Adversarial Modifications | Pouya Pezeshkpour, Yifan Tian, Sameer Singh, | In this paper, we propose adversarial modifications for link prediction models: identifying the fact to add into or remove from the knowledge graph that changes the prediction for a target fact after the model is retrained. |
338 | Analysis Methods in Neural Language Processing: A Survey | Yonatan Belinkov, James Glass, | In this survey paper, we review analysis methods in neural language processing, categorize them according to prominent research trends, highlight existing limitations, and point to potential directions for future work. |
339 | Transferable Neural Projection Representations | Chinnadhurai Sankar, Sujith Ravi, Zornitsa Kozareva, | In this paper, we propose a skip-gram based architecture coupled with Locality-Sensitive Hashing (LSH) projections to learn efficient dynamically computable representations. |
340 | Semantic Role Labeling with Associated Memory Network | Chaoyu Guan, Yuhao Cheng, Hai Zhao, | This paper proposes a novel syntax-agnostic SRL model enhanced by the proposed associated memory network (AMN), which makes use of inter-sentence attention of label-known associated sentences as a kind of memory to further enhance dependency-based SRL. |
341 | Better, Faster, Stronger Sequence Tagging Constituent Parsers | David Vilares, Mostafa Abdou, Anders Søgaard, | In this work, we address the following weaknesses of such constituent parsers: (a) high error rates around closing brackets of long constituents, (b) large label sets, leading to sparsity, and (c) error propagation arising from greedy decoding. |
342 | CAN-NER: Convolutional Attention Network for Chinese Named Entity Recognition | Yuying Zhu, Guoxin Wang, | In this paper, we investigate a Convolutional Attention Network called CAN for Chinese NER, which consists of a character-based convolutional neural network (CNN) with local-attention layer and a gated recurrent unit (GRU) with global self-attention layer to capture the information from adjacent characters and sentence contexts. |
343 | Decomposed Local Models for Coordinate Structure Parsing | Hiroki Teranishi, Hiroyuki Shindo, Yuji Matsumoto, | We propose a simple and accurate model for coordination boundary identification. |
344 | Multi-Task Learning for Japanese Predicate Argument Structure Analysis | Hikaru Omori, Mamoru Komachi, | To address this problem, we present a multi-task learning method for PASA and ENASA. |
345 | Domain adaptation for part-of-speech tagging of noisy user-generated text | Luisa März, Dietrich Trautmann, Benjamin Roth, | We propose an architecture that trains an out-of-domain model on a large newswire corpus, and transfers those weights by using them as a prior for a model trained on the target domain (a data-set of German Tweets) for which there is very little annotations available. |
346 | Neural Chinese Address Parsing | Hao Li, Wei Lu, Pengjun Xie, Linlin Li, | This paper introduces a new task – Chinese address parsing – the task of mapping Chinese addresses into semantically meaningful chunks. We create and publicly release a new dataset consisting of 15K Chinese addresses, and conduct extensive experiments on the dataset to investigate the model effectiveness and robustness. |
347 | Learning Hierarchical Discourse-level Structure for Fake News Detection | Hamid Karimi, Jiliang Tang, | To address these challenges, we propose Hierarchical Discourse-level Structure for Fake news detection. |
348 | DiscoFuse: A Large-Scale Dataset for Discourse-Based Sentence Fusion | Mor Geva, Eric Malmi, Idan Szpektor, Jonathan Berant, | In this paper, we propose a method for automatically-generating fusion examples from raw text and present DiscoFuse, a large scale dataset for discourse-based sentence fusion. |
349 | Linguistically-Informed Specificity and Semantic Plausibility for Dialogue Generation | Wei-Jen Ko, Greg Durrett, Junyi Jessy Li, | In this work, we examine whether specificity is solely a frequency-related notion and find that more linguistically-driven specificity measures are better suited to improving response informativeness. |
350 | Learning to Describe Unknown Phrases with Local and Global Contexts | Shonosuke Ishiwatari, Hiroaki Hayashi, Naoki Yoshinaga, Graham Neubig, Shoetsu Sato, Masashi Toyoda, Masaru Kitsuregawa, | To solve this task, we propose a neural description model that consists of two context encoders and a description decoder. |
351 | Mining Discourse Markers for Unsupervised Sentence Representation Learning | Damien Sileo, Tim Van de Cruys, Camille Pradel, Philippe Muller, | In the present work, we propose a method to automatically discover sentence pairs with relevant discourse markers, and apply it to massive amounts of data. |
352 | How Large a Vocabulary Does Text Classification Need? A Variational Approach to Vocabulary Selection | Wenhu Chen, Yu Su, Yilin Shen, Zhiyu Chen, Xifeng Yan, William Yang Wang, | In this paper, we provide a more sophisticated variational vocabulary dropout (VVD) based on variational dropout to perform vocabulary selection, which can intelligently select the subset of the vocabulary to achieve the required performance. |
353 | Subword-based Compact Reconstruction of Word Embeddings | Shota Sasaki, Jun Suzuki, Kentaro Inui, | In this paper, we propose a method of reconstructing pre-trained word embeddings using subword information that can effectively represent a large number of subword embeddings in a considerably small fixed space. |
354 | Bayesian Learning for Neural Dependency Parsing | Ehsan Shareghi, Yingzhen Li, Yi Zhu, Roi Reichart, Anna Korhonen, | We demonstrate that in the small data regime, where uncertainty around parameter estimation and model prediction matters the most, Bayesian neural modeling is very effective. |
355 | AutoSeM: Automatic Task Selection and Mixing in Multi-Task Learning | Han Guo, Ramakanth Pasunuru, Mohit Bansal, | To address these issues, we present AutoSeM, a two-stage MTL pipeline, where the first stage automatically selects the most useful auxiliary tasks via a Beta-Bernoulli multi-armed bandit with Thompson Sampling, and the second stage learns the training mixing ratio of these selected auxiliary tasks via a Gaussian Process based Bayesian optimization framework. |
356 | Studying the Inductive Biases of RNNs with Synthetic Variations of Natural Languages | Shauli Ravfogel, Yoav Goldberg, Tal Linzen, | We propose a paradigm that addresses these issues: we create synthetic versions of English, which differ from English in one or more typological parameters, and generate corpora for those languages based on a parsed English corpus. |
357 | Attention is not Explanation | Sarthak Jain, Byron C. Wallace, | In this work we perform extensive experiments across a variety of NLP tasks that aim to assess the degree to which attention weights provide meaningful “explanations” for predictions. |
358 | Playing Text-Adventure Games with Graph-Based Deep Reinforcement Learning | Prithviraj Ammanabrolu, Mark Riedl, | We present a deep reinforcement learning architecture that represents the game state as a knowledge graph which is learned during exploration. |
359 | Information Aggregation for Multi-Head Attention with Routing-by-Agreement | Jian Li, Baosong Yang, Zi-Yi Dou, Xing Wang, Michael R. Lyu, Zhaopeng Tu, | In this work, we propose to improve the information aggregation for multi-head attention with a more powerful routing-by-agreement algorithm. |
360 | Context Dependent Semantic Parsing over Temporally Structured Data | Charles Chen, Razvan Bunescu, | We describe a new semantic parsing setting that allows users to query the system using both natural language questions and actions within a graphical user interface. |
361 | Structural Scaffolds for Citation Intent Classification in Scientific Publications | Arman Cohan, Waleed Ammar, Madeleine van Zuylen, Field Cady, | We propose structural scaffolds, a multitask model to incorporate structural information of scientific papers into citations for effective classification of citation intents. In addition, we introduce a new dataset of citation intents (SciCite) which is more than five times larger and covers multiple scientific domains compared with existing datasets. |
362 | pair2vec: Compositional Word-Pair Embeddings for Cross-Sentence Inference | Mandar Joshi, Eunsol Choi, Omer Levy, Daniel Weld, Luke Zettlemoyer, | This paper proposes new methods for learning and using embeddings of word pairs that implicitly represent background knowledge about such relationships. |
363 | Submodular Optimization-based Diverse Paraphrasing and its Effectiveness in Data Augmentation | Ashutosh Kumar, Satwik Bhattamishra, Manik Bhandari, Partha Talukdar, | In this work, we focus on the task of obtaining highly diverse paraphrases while not compromising on paraphrasing quality. |
364 | Let’s Make Your Request More Persuasive: Modeling Persuasive Strategies via Semi-Supervised Neural Nets on Crowdfunding Platforms | Diyi Yang, Jiaao Chen, Zichao Yang, Dan Jurafsky, Eduard Hovy, | Building on theories of persuasion, we propose a neural network to quantify persuasiveness and identify the persuasive strategies in advocacy requests. |
365 | Recursive Routing Networks: Learning to Compose Modules for Language Understanding | Ignacio Cases, Clemens Rosenbaum, Matthew Riemer, Atticus Geiger, Tim Klinger, Alex Tamkin, Olivia Li, Sandhini Agarwal, Joshua D. Greene, Dan Jurafsky, Christopher Potts, Lauri Karttunen, | We introduce Recursive Routing Networks (RRNs), which are modular, adaptable models that learn effectively in diverse environments. To show that RRNs can learn to specialize to more fine-grained semantic distinctions, we introduce a new corpus of NLI examples involving implicative predicates, and show that the model components become fine-tuned to the inferential signatures that are characteristic of these predicates. |
366 | Structural Neural Encoders for AMR-to-text Generation | Marco Damonte, Shay B. Cohen, | We investigate the extent to which reentrancies (nodes with multiple parents) have an impact on AMR-to-text generation by comparing graph encoders to tree encoders, where reentrancies are not preserved. |
367 | Multilingual prediction of Alzheimer’s disease through domain adaptation and concept-based language modelling | Kathleen C. Fraser, Nicklas Linz, Bai Li, Kristina Lundholm Fors, Frank Rudzicz, Alexandra Konig, Jan Alexandersson, Philippe Robert, Dimitrios Kokkinakis, | Here, we compare several methods of domain adaptation to augment a small French dataset of picture descriptions (n = 57) with a much larger English dataset (n = 550), for the task of automatically distinguishing participants with dementia from controls. |
368 | Ranking and Selecting Multi-Hop Knowledge Paths to Better Predict Human Needs | Debjit Paul, Anette Frank, | We present a novel method to extract, rank, filter and select multi-hop relation paths from a commonsense knowledge resource to interpret the expression of sentiment in terms of their underlying human needs. |
369 | NLP Whack-A-Mole: Challenges in Cross-Domain Temporal Expression Extraction | Amy Olex, Luke Maffey, Bridget McInnes, | Here we explore parsing issues that arose when running our system, a tool built on Newswire text, on clinical notes in the THYME corpus. |
370 | Document-Level N-ary Relation Extraction with Multiscale Representation Learning | Robin Jia, Cliff Wong, Hoifung Poon, | In this paper, we propose a novel multiscale neural architecture for document-level n-ary relation extraction. |
371 | Inferring Which Medical Treatments Work from Reports of Clinical Trials | Eric Lehman, Jay DeYoung, Regina Barzilay, Byron C. Wallace, | In this paper, we present a new task and corpus for making this unstructured published scientific evidence actionable. We present a new corpus for this task comprising 10,000+ prompts coupled with full-text articles describing RCTs. |
372 | Decay-Function-Free Time-Aware Attention to Context and Speaker Indicator for Spoken Language Understanding | Jonggu Kim, Jong-Hyeok Lee, | To capture salient contextual information for spoken language understanding (SLU) of a dialogue, we propose time-aware models that automatically learn the latent time-decay function of the history without a manual time-decay function. |
373 | Dialogue Act Classification with Context-Aware Self-Attention | Vipul Raheja, Joel Tetreault, | This paper presents an approach to dialogue act classification based on context-aware self-attention. |
374 | Affect-Driven Dialog Generation | Pierre Colombo, Wojciech Witon, Ashutosh Modi, James Kennedy, Mubbasir Kapadia, | In this paper, we present an affect-driven dialog system, which generates emotional responses in a controlled manner using a continuous representation of emotions. |
375 | Multi-Level Memory for Task Oriented Dialogs | Revanth Gangi Reddy, Danish Contractor, Dinesh Raghu, Sachindra Joshi, | In this paper we relax the strong assumptions made by existing architectures and separate memories used for modeling dialog context and KB results. |
376 | Topic Spotting using Hierarchical Networks with Self Attention | Pooja Chitkara, Ashutosh Modi, Pravalika Avvaru, Sepehr Janghorbani, Mubbasir Kapadia, | We propose a hierarchical model with self attention for topic spotting. |
377 | Top-Down Structurally-Constrained Neural Response Generation with Lexicalized Probabilistic Context-Free Grammar | Wenchao Du, Alan W. Black, | We applied our model to the task of dialog response generation, and found that it significantly improves over a sequence-to-sequence baseline in terms of diversity and relevance. |
378 | What do Entity-Centric Models Learn? Insights from Entity Linking in Multi-Party Dialogue | Laura Aina, Carina Silberer, Ionut-Teodor Sorodoc, Matthijs Westera, Gemma Boleda, | In this paper we analyze the behavior of two recently proposed entity-centric models in a referential task, Entity Linking in Multi-party Dialogue (SemEval 2018 Task 4). |
379 | Continuous Learning for Large-scale Personalized Domain Classification | Han Li, Jihwan Lee, Sidharth Mudgal, Ruhi Sarikaya, Young-Bum Kim, | In this paper, we propose CoNDA, a neural-based approach for continuous domain adaption with normalization and regularization. |
380 | Cross-lingual Transfer Learning for Multilingual Task Oriented Dialog | Sebastian Schuster, Sonal Gupta, Rushin Shah, Mike Lewis, | In this paper, we present a new data set of 57k annotated utterances in English (43k), Spanish (8.6k) and Thai (5k) across the domains weather, alarm, and reminder. |
381 | Evaluating Coherence in Dialogue Systems using Entailment | Nouha Dziri, Ehsan Kamalloo, Kory Mathewson, Osmar Zaiane, | In this paper, we present interpretable metrics for evaluating topic coherence by making use of distributed sentence representations. |
382 | On Knowledge distillation from complex networks for response prediction | Siddhartha Arora, Mitesh M. Khapra, Harish G. Ramaswamy, | In order to overcome this, we use standard simple models which do not capture all pairwise interactions, but learn to emulate certain characteristics of a complex teacher network. |
383 | Cross-lingual Multi-Level Adversarial Transfer to Enhance Low-Resource Name Tagging | Lifu Huang, Heng Ji, Jonathan May, | We focus on improving name tagging for low-resource languages using annotations from related languages. |
384 | Unsupervised Extraction of Partial Translations for Neural Machine Translation | Benjamin Marie, Atsushi Fujita, | In this work, we assume that new translation knowledge can be extracted from monolingual data, without relying at all on existing parallel data. |
385 | Low-Resource Syntactic Transfer with Unsupervised Source Reordering | Mohammad Sadegh Rasooli, Michael Collins, | We describe a cross-lingual transfer method for dependency parsing that takes into account the problem of word order differences between source and target languages. |
386 | Revisiting Adversarial Autoencoder for Unsupervised Word Translation with Cycle Consistency and Improved Training | Tasnim Mohiuddin, Shafiq Joty, | In this work, we revisit adversarial autoencoder for unsupervised word translation and propose two novel extensions to it that yield more stable training and improved results. |
387 | Addressing word-order Divergence in Multilingual Neural Machine Translation for extremely Low Resource Languages | Rudra Murthy, Anoop Kunchukuttan, Pushpak Bhattacharyya, | To bridge this divergence, we propose to pre-order the assisting language sentences to match the word order of the source language and train the parent model. |
388 | Massively Multilingual Neural Machine Translation | Roee Aharoni, Melvin Johnson, Orhan Firat, | We perform extensive experiments in training massively multilingual NMT models, involving up to 103 distinct languages and 204 translation directions simultaneously. |
389 | A Large-Scale Comparison of Historical Text Normalization Systems | Marcel Bollmann, | This paper presents the largest study of historical text normalization done so far. |
390 | Combining Discourse Markers and Cross-lingual Embeddings for Synonym–Antonym Classification | Michael Roth, Shyam Upadhyay, | In this work, we improve the transfer by exploiting monolingual information, expressed in the form of co-occurrences with discourse markers that convey contrast. |
391 | Context-Aware Cross-Lingual Mapping | Hanan Aldarmaki, Mona Diab, | In this paper, we propose an alternative to word-level mapping that better reflects sentence-level cross-lingual similarity. |
392 | Polyglot Contextual Representations Improve Crosslingual Transfer | Phoebe Mulcaire, Jungo Kasai, Noah A. Smith, | We introduce Rosita, a method to produce multilingual contextual word representations by training a single language model on text from multiple languages. |
393 | Typological Features for Multilingual Delexicalised Dependency Parsing | Manon Scholivet, Franck Dary, Alexis Nasr, Benoit Favre, Carlos Ramisch, | Our work investigates the use of high-level language descriptions in the form of typological features for multilingual dependency parsing. |
394 | Recommendations for Datasets for Source Code Summarization | Alexander LeClair, Collin McMillan, | In this paper, we make recommendations for these standards from experimental results. We release a dataset based on prior work of over 2.1m pairs of Java methods and one sentence method descriptions from over 28k Java projects. |
395 | Question Answering as an Automatic Evaluation Metric for News Article Summarization | Matan Eyal, Tal Baumel, Michael Elhadad, | We present an alternative, extrinsic, evaluation metric for this task, Answering Performance for Evaluation of Summaries. |
396 | Understanding the Behaviour of Neural Abstractive Summarizers using Contrastive Examples | Krtin Kumar, Jackie Chi Kit Cheung, | We investigate how they achieve this performance with respect to human-written gold-standard abstracts, and whether the systems are able to understand deeper syntactic and semantic structures. We generate a set of contrastive summaries which are perturbed, deficient versions of human-written summaries, and test whether existing neural summarizers score them more highly than the human-written summaries. |
397 | Jointly Extracting and Compressing Documents with Summary State Representations | Afonso Mendes, Shashi Narayan, Sebastião Miranda, Zita Marinho, André F. T. Martins, Shay B. Cohen, | We present a new neural model for text summarization that first extracts sentences from a document and then compresses them. |
398 | News Article Teaser Tweets and How to Generate Them | Sanjeev Kumar Karn, Mark Buckley, Ulli Waltinger, Hinrich Schütze, | In this work, we define the task of teaser generation and provide an evaluation benchmark and baseline systems for the process of generating teasers. |
399 | Cross-referencing Using Fine-grained Topic Modeling | Jeffrey Lund, Piper Armstrong, Wilson Fearn, Stephen Cowley, Emily Hales, Kevin Seppi, | We develop a topic-based system for automatically producing candidate cross-references which can be easily verified by human annotators. |
400 | Conversation Initiation by Diverse News Contents Introduction | Satoshi Akasaki, Nobuhiro Kaji, | In this paper, we consider the system as a conversation initiator and propose a novel task of generating the initial utterance in open-domain non-task-oriented conversation. To address the lack of training data for this task, we constructed a novel large-scale dataset through crowd-sourcing. |
401 | Positional Encoding to Control Output Sequence Length | Sho Takase, Naoaki Okazaki, | In this paper, we propose a simple but effective extension of a sinusoidal positional encoding (Vaswani et al., 2017) so that a neural encoder-decoder model preserves the length constraint. |
402 | The Lower The Simpler: Simplifying Hierarchical Recurrent Models | Chao Wang, Hui Jiang, | To improve the training efficiency of hierarchical recurrent models without compromising their performance, we propose a strategy named “the lower the simpler”, which is to simplify the baseline models by making the lower layers simpler than the upper layers. |
403 | Using Natural Language Relations between Answer Choices for Machine Comprehension | Rajkumar Pujari, Dan Goldwasser, | In this paper, we propose a method to leverage the natural language relations between the answer choices, such as entailment and contradiction, to improve the performance of machine comprehension. |
404 | Saliency Learning: Teaching the Model Where to Pay Attention | Reza Ghaeini, Xiaoli Fern, Hamed Shahbazi, Prasad Tadepalli, | In this paper, we aim to teach the model to make the right prediction for the right reason by providing explanation training and ensuring the alignment of the model’s explanation with the ground truth explanation. |
405 | Understanding Dataset Design Choices for Multi-hop Reasoning | Jifan Chen, Greg Durrett, | In this paper, we investigate two recently proposed datasets, WikiHop and HotpotQA. |
406 | Neural Grammatical Error Correction with Finite State Transducers | Felix Stahlberg, Christopher Bryant, Bill Byrne, | We show how to improve LM-GEC by applying modelling techniques based on finite state transducers. |
407 | Convolutional Self-Attention Networks | Baosong Yang, Longyue Wang, Derek F. Wong, Lidia S. Chao, Zhaopeng Tu, | In this work, we propose novel convolutional self-attention networks, which offer SANs the abilities to 1) strengthen dependencies among neighboring elements, and 2) model the interaction between features extracted by multiple attention heads. |
408 | Rethinking Complex Neural Network Architectures for Document Classification | Ashutosh Adhikari, Achyudh Ram, Raphael Tang, Jimmy Lin, | Our work provides an open-source platform and the foundation for future work in document classification. |
409 | Pre-trained language model representations for language generation | Sergey Edunov, Alexei Baevski, Michael Auli, | In this paper, we examine different strategies to integrate pre-trained representations into sequence to sequence models and apply it to neural machine translation and abstractive summarization. |
410 | Pragmatically Informative Text Generation | Sheng Shen, Daniel Fried, Jacob Andreas, Dan Klein, | We consider two pragmatic modeling methods for text generation: one where pragmatics is imposed by information preservation, and another where pragmatics is imposed by explicit modeling of distractors. |
411 | Stochastic Wasserstein Autoencoder for Probabilistic Sentence Generation | Hareesh Bahuleyan, Lili Mou, Hao Zhou, Olga Vechtomova, | In this paper, we propose to use the Wasserstein autoencoder (WAE) for probabilistic sentence generation, where the encoder could be either stochastic or deterministic. |
412 | Benchmarking Hierarchical Script Knowledge | Yonatan Bisk, Jan Buys, Karl Pichotta, Yejin Choi, | In this paper, we introduce KidsCook, a parallel script corpus, as well as a cloze task which matches video captions with missing procedural details. |
413 | A large-scale study of the effects of word frequency and predictability in naturalistic reading | Cory Shain, | This paper examines the generalizability of this finding to more realistic conditions of sentence processing by studying effects of frequency and predictability in three large-scale naturalistic reading corpora. |
414 | Augmenting word2vec with latent Dirichlet allocation within a clinical application | Akshay Budhkar, Frank Rudzicz, | This paper presents three hybrid models that directly combine latent Dirichlet allocation and word embedding for distinguishing between speakers with and without Alzheimer’s disease from transcripts of picture descriptions. |
415 | On the Idiosyncrasies of the Mandarin Chinese Classifier System | Shijia Liu, Hongyuan Mei, Adina Williams, Ryan Cotterell, | In this paper, we introduce an information-theoretic approach to measuring idiosyncrasy; we examine how much the uncertainty in Mandarin Chinese classifiers can be reduced by knowing semantic information about the nouns that the classifiers modify. |
416 | Joint Learning of Pre-Trained and Random Units for Domain Adaptation in Part-of-Speech Tagging | Sara Meftah, Youssef Tamaazousti, Nasredine Semmar, Hassane Essafi, Fatiha Sadat, | In this paper, we propose to augment the target-network with normalised, weighted and randomly initialised units that beget a better adaptation while maintaining the valuable source knowledge. |
417 | Show Some Love to Your n-grams: A Bit of Progress and Stronger n-gram Language Modeling Baselines | Ehsan Shareghi, Daniela Gerz, Ivan Vulić, Anna Korhonen, | In this paper, we examine the recent progress in n-gram literature, running experiments on 50 languages covering all morphological language families. |
418 | Training Data Augmentation for Context-Sensitive Neural Lemmatizer Using Inflection Tables and Raw Text | Toms Bergmanis, Sharon Goldwater, | To combine the efficiency of type-based learning with the benefits of context, we propose a way to train a context-sensitive lemmatizer with little or no labeled corpus data, using inflection tables from the UniMorph project and raw text examples from Wikipedia that provide sentence contexts for the unambiguous UniMorph examples. |
419 | A Structural Probe for Finding Syntax in Word Representations | John Hewitt, Christopher D. Manning, | In this work, we propose a structural probe, which evaluates whether syntax trees are embedded in a linear transformation of a neural network’s word representation space. |
420 | CNM: An Interpretable Complex-valued Network for Matching | Qiuchi Li, Benyou Wang, Massimo Melucci, | This paper seeks to model human language by the mathematical framework of quantum physics. |
421 | CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge | Alon Talmor, Jonathan Herzig, Nicholas Lourie, Jonathan Berant, | To investigate question answering with prior knowledge, we present CommonsenseQA: a challenging new dataset for commonsense question answering. |
422 | Probing the Need for Visual Context in Multimodal Machine Translation | Ozan Caglayan, Pranava Madhyastha, Lucia Specia, Loïc Barrault, | In this paper we probe the contribution of the visual modality to state-of-the-art MMT models by conducting a systematic analysis where we partially deprive the models from source-side textual context. |
423 | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, | We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. |
424 | What’s in a Name? Reducing Bias in Bios without Access to Protected Attributes | Alexey Romanov, Maria De-Arteaga, Hanna Wallach, Jennifer Chayes, Christian Borgs, Alexandra Chouldechova, Sahin Geyik, Krishnaram Kenthapadi, Anna Rumshisky, Adam Kalai, | In the context of mitigating bias in occupation classification, we propose a method for discouraging correlation between the predicted probability of an individual’s true occupation and a word embedding of their name. |
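For readers curious how a length-aware positional encoding (paper #401, Takase & Okazaki) can control output length in practice, here is a minimal sketch: the standard sinusoidal encoding (Vaswani et al., 2017) is computed over the number of remaining tokens rather than the absolute position, so the decoder always sees how far it is from the length limit. Function names are illustrative, not taken from the paper's code.

```python
import math

def sinusoidal_pe(pos, d_model):
    """Standard sinusoidal positional encoding (Vaswani et al., 2017)."""
    return [
        math.sin(pos / 10000 ** (i / d_model)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / d_model))
        for i in range(d_model)
    ]

def length_difference_pe(pos, target_len, d_model):
    """Length-difference variant: encode the remaining length
    (target_len - pos) instead of the absolute position, so the
    model is always aware of how many tokens it may still emit."""
    return sinusoidal_pe(target_len - pos, d_model)
```

Because only the input to the sine/cosine functions changes, this drop-in substitution requires no architectural modification to a standard encoder-decoder model.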
TABLE 2: NAACL 2019 Industry Track Papers
Title | Authors | Highlight | |
---|---|---|---|
1 | Enabling Real-time Neural IME with Incremental Vocabulary Selection | Jiali Yao, Raphael Shu, Xinjian Li, Katsutoshi Ohtsuki, Hideki Nakayama, | In this work, we articulate the bottleneck of neural IME decoding to be the heavy softmax computation over a large vocabulary. |
2 | Locale-agnostic Universal Domain Classification Model in Spoken Language Understanding | Jihwan Lee, Ruhi Sarikaya, Young-Bum Kim, | In this paper, we introduce an approach for leveraging available data across multiple locales sharing the same language to 1) improve domain classification model accuracy in Spoken Language Understanding and user experience even if new locales do not have sufficient data and 2) reduce the cost of scaling the domain classifier to a large number of locales. |
3 | Practical Semantic Parsing for Spoken Language Understanding | Marco Damonte, Rahul Goel, Tagyoung Chung, | We build a transfer learning framework for executable semantic parsing. |
4 | Fast Prototyping a Dialogue Comprehension System for Nurse-Patient Conversations on Symptom Monitoring | Zhengyuan Liu, Hazel Lim, Nur Farah Ain Suhaimi, Shao Chuen Tong, Sharon Ong, Angela Ng, Sheldon Lee, Michael R. Macdonald, Savitha Ramasamy, Pavitra Krishnaswamy, Wai Leng Chow, Nancy F. Chen, | In this work, we investigate fast prototyping of a dialogue comprehension system by leveraging minimal nurse-to-patient conversations. |
5 | Graph Convolution for Multimodal Information Extraction from Visually Rich Documents | Xiaojing Liu, Feiyu Gao, Qiong Zhang, Huasha Zhao, | In this paper, we introduce a graph convolution based model to combine textual and visual information presented in VRDs. |
6 | Diversifying Reply Suggestions Using a Matching-Conditional Variational Autoencoder | Budhaditya Deb, Peter Bailey, Milad Shokouhi, | We propose a constrained-sampling approach to make the variational inference in M-CVAE efficient for our production system. |
7 | Goal-Oriented End-to-End Conversational Models with Profile Features in a Real-World Setting | Yichao Lu, Manisha Srivastava, Jared Kramer, Heba Elfardy, Andrea Kahn, Song Wang, Vikas Bhardwaj, | We present real-world results for two issue types in the customer service domain. |
8 | Detecting Customer Complaint Escalation with Recurrent Neural Networks and Manually-Engineered Features | Wei Yang, Luchen Tan, Chunwei Lu, Anqi Cui, Han Li, Xi Chen, Kun Xiong, Muzi Wang, Ming Li, Jian Pei, Jimmy Lin, | We describe a hybrid model that tackles this challenge by integrating recurrent neural networks with manually-engineered features. |
9 | Multi-Modal Generative Adversarial Network for Short Product Title Generation in Mobile E-Commerce | Jianguo Zhang, Pengcheng Zou, Zhao Li, Yao Wan, Xiuming Pan, Yu Gong, Philip S. Yu, | In this paper, we propose a Multi-Modal Generative Adversarial Network (MM-GAN) for short product title generation in E-Commerce, which innovatively incorporates image information and attribute tags from the product, as well as textual information from original long titles. |
10 | A Case Study on Neural Headline Generation for Editing Support | Kazuma Murao, Ken Kobayashi, Hayato Kobayashi, Taichi Yatsuka, Takeshi Masuyama, Tatsuru Higurashi, Yoshimune Tabuchi, | In this paper, we describe a practical use case of neural headline generation in a news aggregator, where dozens of professional editors constantly select important news articles and manually create their headlines, which are much shorter than the original headlines. |
11 | Neural Lexicons for Slot Tagging in Spoken Language Understanding | Kyle Williams, | We develop models that encode lexicon information as neural features for use in a long short-term memory (LSTM) neural network. |
12 | Active Learning for New Domains in Natural Language Understanding | Stanislav Peshterliev, John Kearney, Abhyuday Jagannatha, Imre Kiss, Spyros Matsoukas, | We propose an algorithm called Majority-CRF that uses an ensemble of classification models to guide the selection of relevant utterances, as well as a sequence labeling model to help prioritize informative examples. |
13 | Scaling Multi-Domain Dialogue State Tracking via Query Reformulation | Pushpendre Rastogi, Arpit Gupta, Tongfei Chen, Mathias Lambert, | We present a novel approach to dialogue state tracking and referring expression resolution tasks. |
14 | Are the Tools up to the Task? an Evaluation of Commercial Dialog Tools in Developing Conversational Enterprise-grade Dialog Systems | Marie Meteer, Meghan Hickey, Carmi Rothberg, David Nahamoo, Ellen Eide Kislal, | In this paper, we provide both quantitative and qualitative results in three main areas: natural language understanding, dialog, and text generation. |
15 | Development and Deployment of a Large-Scale Dialog-based Intelligent Tutoring System | Shazia Afzal, Tejas Dhamecha, Nirmal Mukhi, Renuka Sindhgatta, Smit Marvaniya, Matthew Ventura, Jessica Yarbro, | In this paper, we describe and reflect on the design, methods, decisions and assessments that led to the successful deployment of our AI driven DBT currently being used by several hundreds of college level students for practice and self-regulated study in diverse subjects like Sociology, Communications, and American Government. |
16 | Learning When Not to Answer: a Ternary Reward Structure for Reinforcement Learning Based Question Answering | Fréderic Godin, Anjishnu Kumar, Arpit Mittal, | In this paper, we investigate the challenges of using reinforcement learning agents for question-answering over knowledge graphs for real-world applications. |
17 | Extraction of Message Sequence Charts from Software Use-Case Descriptions | Girish Palshikar, Nitin Ramrakhiyani, Sangameshwar Patil, Sachin Pawar, Swapnil Hingmire, Vasudeva Varma, Pushpak Bhattacharyya, | In this paper, we describe a linguistic knowledge-based approach to extract MSCs from use-cases. |
18 | Improving Knowledge Base Construction from Robust Infobox Extraction | Boya Peng, Yejin Huh, Xiao Ling, Michele Banko, | This paper presents a robust approach that tackles all three challenges. |
19 | A k-Nearest Neighbor Approach towards Multi-level Sequence Labeling | Yue Chen, John Chen, | In this paper we present a new method for intent recognition for complex dialog management in low resource situations. |
20 | Train One Get One Free: Partially Supervised Neural Network for Bug Report Duplicate Detection and Clustering | Lahari Poddar, Leonardo Neves, William Brendel, Luis Marujo, Sergey Tulyakov, Pradeep Karuturi, | This paper proposes a neural architecture that can jointly (1) detect if two bug reports are duplicates, and (2) aggregate them into latent topics. |
21 | Robust Semantic Parsing with Adversarial Learning for Domain Generalization | Gabriel Marzinotto, Geraldine Damnati, Frederic Bechet, Benoit Favre, | We propose to perform Semantic Parsing with a domain classification adversarial task, covering various use-cases with or without explicit knowledge of the domain. |
22 | TOI-CNN: a Solution of Information Extraction on Chinese Insurance Policy | Lin Sun, Kai Zhang, Fule Ji, Zhenhua Yang, | This paper introduces the problem of Element Tagging on Insurance Policy (ETIP). We have collected a large Chinese insurance contract dataset and labeled the critical elements of seven categories to test the performance of the proposed method. |
23 | Cross-lingual Transfer Learning for Japanese Named Entity Recognition | Andrew Johnson, Penny Karanasou, Judith Gaspers, Dietrich Klakow, | This work explores cross-lingual transfer learning (TL) for named entity recognition, focusing on bootstrapping Japanese from English. |
24 | Neural Text Normalization with Subword Units | Courtney Mansfield, Ming Sun, Yuzong Liu, Ankur Gandhe, Bjorn Hoffmeister, | In this paper, we frame TN as a machine translation task and tackle it with sequence-to-sequence (seq2seq) models. |
25 | Audio De-identification – a New Entity Recognition Task | Ido Cohn, Itay Laish, Genady Beryozkin, Gang Li, Izhak Shafran, Idan Szpektor, Tzvika Hartman, Avinatan Hassidim, Yossi Matias, | To this end, we define the task of audio de-ID, in which audio spans with entity mentions should be detected. Finally, we introduce a novel metric for audio de-ID and a new evaluation benchmark consisting of a large labeled segment of the Switchboard and Fisher audio datasets and detail our pipeline’s results on it. |
26 | In Other News: a Bi-style Text-to-speech Model for Synthesizing Newscaster Voice with Limited Data | Nishant Prateek, Mateusz Łajszczak, Roberto Barra-Chicote, Thomas Drugman, Jaime Lorenzo-Trueba, Thomas Merritt, Srikanth Ronanki, Trevor Wood, | In this paper, different styles of speech are analysed based on prosodic variations; from this, a model is proposed to synthesise speech in the style of a newscaster with just a few hours of supplementary data. We pose the problem of synthesising in a target style using limited data as that of creating a bi-style model that can synthesise both neutral-style and newscaster-style speech via a one-hot vector which factorises the two styles. |
27 | Generate, Filter, and Rank: Grammaticality Classification for Production-Ready NLG Systems | Ashwini Challa, Kartikeya Upasani, Anusha Balakrishnan, Rajen Subba, | We propose the use of a generate, filter, and rank framework, in which candidate responses are first filtered to eliminate unacceptable responses, and then ranked to select the best response. We release a grammatical classification and semantic correctness classification dataset for the weather domain that consists of responses generated by 3 data-driven NLG systems. |
28 | Content-based Dwell Time Engagement Prediction Model for News Articles | Heidar Davoudi, Aijun An, Gordon Edall, | In this paper, we propose a novel content-based approach based on a deep neural network architecture for predicting article dwell times. |