Paper Digest: ACL 2019 Highlights
Download ACL-2019-Paper-Digests.pdf – highlights of all 660 (447 long + 213 short) ACL-2019 papers.
The Annual Meeting of the Association for Computational Linguistics (ACL) is one of the top natural language processing conferences in the world. In 2019, it was held in Florence, Italy. There were 2,905 paper submissions, of which 447 were accepted as long papers and 213 as short papers.
To help the AI community quickly catch up on the work presented at this conference, the Paper Digest team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights to quickly get the main idea of each paper.
We thank all authors for writing these interesting papers, and readers for reading our digests. If you do not want to miss any interesting AI paper, you are welcome to sign up for our free paper digest service to get new paper updates customized to your own interests on a daily basis.
Paper Digest Team
team@paperdigest.org
TABLE 1: ACL 2019 Papers
# | Title | Authors | Highlight |
---|---|---|---|
1 | One Time of Interaction May Not Be Enough: Go Deep with an Interaction-over-Interaction Network for Response Selection in Dialogues | Chongyang Tao, Wei Wu, Can Xu, Wenpeng Hu, Dongyan Zhao, Rui Yan, | In this work, we let utterance-response interaction go deep by proposing an interaction-over-interaction network (IoI). |
2 | Incremental Transformer with Deliberation Decoder for Document Grounded Conversations | Zekang Li, Cheng Niu, Fandong Meng, Yang Feng, Qian Li, Jie Zhou, | In this paper, we propose a novel Transformer-based architecture for multi-turn document grounded conversations. |
3 | Improving Multi-turn Dialogue Modelling with Utterance ReWriter | Hui Su, Xiaoyu Shen, Rongzhi Zhang, Fei Sun, Pengwei Hu, Cheng Niu, Jie Zhou, | In this paper, we propose rewriting the human utterance as a pre-process to help multi-turn dialogue modelling. |
4 | Do Neural Dialog Systems Use the Conversation History Effectively? An Empirical Study | Chinnadhurai Sankar, Sandeep Subramanian, Chris Pal, Sarath Chandar, Yoshua Bengio, | In this paper, we take an empirical approach to understanding how these models use the available dialog history by studying the sensitivity of the models to artificially introduced unnatural changes or perturbations to their context at test time. |
5 | Boosting Dialog Response Generation | Wenchao Du, Alan W Black, | To address this problem, we designed an iterative training process and ensemble method based on boosting. |
6 | Constructing Interpretive Spatio-Temporal Features for Multi-Turn Responses Selection | Junyu Lu, Chenbin Zhang, Zeying Xie, Guang Ling, Tom Chao Zhou, Zenglin Xu, | To address these issues, we propose a Spatio-Temporal Matching network (STM) for response selection. |
7 | Semantic Parsing with Dual Learning | Ruisheng Cao, Su Zhu, Chen Liu, Jieyu Li, Kai Yu, | In this work, we develop a semantic parsing framework with the dual learning algorithm, which enables a semantic parser to make full use of data (labeled and even unlabeled) through a dual-learning game. |
8 | Semantic Expressive Capacity with Bounded Memory | Antoine Venant, Alexander Koller, | We investigate the capacity of mechanisms for compositional semantic parsing to describe relations between sentences and semantic representations. |
9 | AMR Parsing as Sequence-to-Graph Transduction | Sheng Zhang, Xutai Ma, Kevin Duh, Benjamin Van Durme, | We propose an attention-based model that treats AMR parsing as sequence-to-graph transduction. |
10 | Generating Logical Forms from Graph Representations of Text and Entities | Peter Shaw, Philip Massey, Angelica Chen, Francesco Piccinno, Yasemin Altun, | We present an approach that uses a Graph Neural Network (GNN) architecture to incorporate information about relevant entities and their relations during parsing. |
11 | Learning Compressed Sentence Representations for On-Device Text Processing | Dinghan Shen, Pengyu Cheng, Dhanasekar Sundararaman, Xinyuan Zhang, Qian Yang, Meng Tang, Asli Celikyilmaz, Lawrence Carin, | In this paper, we propose four different strategies to transform continuous and generic sentence embeddings into a binarized form, while preserving their rich semantic information. |
12 | The (Non-)Utility of Structural Features in BiLSTM-based Dependency Parsers | Agnieszka Falenska, Jonas Kuhn, | In this paper we aim to answer the question: How much structural context are the BiLSTM representations able to capture implicitly? |
13 | Automatic Generation of High Quality CCGbanks for Parser Domain Adaptation | Masashi Yoshikawa, Hiroshi Noji, Koji Mineshima, Daisuke Bekki, | We propose a new domain adaptation method for Combinatory Categorial Grammar (CCG) parsing, based on the idea of automatic generation of CCG corpora exploiting cheaper resources of dependency trees. |
14 | A Joint Named-Entity Recognizer for Heterogeneous Tag-sets Using a Tag Hierarchy | Genady Beryozkin, Yoel Drori, Oren Gilon, Tzvika Hartman, Idan Szpektor, | We propose to use the given tag hierarchy to jointly learn a neural network that shares its tagging layer among all tag-sets. |
15 | Massively Multilingual Transfer for NER | Afshin Rahimi, Yuan Li, Trevor Cohn, | We propose two techniques for modulating the transfer, suitable for zero-shot or few-shot learning, respectively. |
16 | Reliability-aware Dynamic Feature Composition for Name Tagging | Ying Lin, Liyuan Liu, Heng Ji, Dong Yu, Jiawei Han, | In this paper, we propose a novel reliability-aware name tagging model to tackle this issue. |
17 | Unsupervised Pivot Translation for Distant Languages | Yichong Leng, Xu Tan, Tao Qin, Xiang-Yang Li, Tie-Yan Liu, | In this work, we introduce unsupervised pivot translation for distant languages, which translates a language to a distant language through multiple hops, and the unsupervised translation on each hop is relatively easier than the original direct translation. |
18 | Bilingual Lexicon Induction with Semi-supervision in Non-Isometric Embedding Spaces | Barun Patra, Joel Ruben Antony Moniz, Sarthak Garg, Matthew R. Gormley, Graham Neubig, | We propose a technique to quantitatively estimate this assumption of the isometry between two embedding spaces and empirically show that this assumption weakens as the languages in question become increasingly etymologically distant. |
19 | An Effective Approach to Unsupervised Machine Translation | Mikel Artetxe, Gorka Labaka, Eneko Agirre, | In this paper, we identify and address several deficiencies of existing unsupervised SMT approaches by exploiting subword information, developing a theoretically well founded unsupervised tuning method, and incorporating a joint refinement procedure. |
20 | Effective Adversarial Regularization for Neural Machine Translation | Motoki Sato, Jun Suzuki, Shun Kiyono, | We aim to further leverage this promising methodology into more sophisticated and critical neural models in the natural language processing field, i.e., neural machine translation (NMT) models. |
21 | Revisiting Low-Resource Neural Machine Translation: A Case Study | Rico Sennrich, Biao Zhang, | In this paper, we re-assess the validity of these results, arguing that they are the result of lack of system adaptation to low-resource settings. |
22 | Domain Adaptive Inference for Neural Machine Translation | Danielle Saunders, Felix Stahlberg, Adrià de Gispert, Bill Byrne, | We investigate adaptive ensemble weighting for Neural Machine Translation, addressing the case of improving performance on a new and potentially unknown domain without sacrificing performance on the original domain. |
23 | Neural Relation Extraction for Knowledge Base Enrichment | Bayu Distiawan Trisedya, Gerhard Weikum, Jianzhong Qi, Rui Zhang, | This way, NED errors may cause extraction errors that affect the overall precision and recall. To address this problem, we propose an end-to-end relation extraction model for KB enrichment based on a neural encoder-decoder model. |
24 | Attention Guided Graph Convolutional Networks for Relation Extraction | Zhijiang Guo, Yan Zhang, Wei Lu, | In this work, we propose Attention Guided Graph Convolutional Networks (AGGCNs), a novel model which directly takes full dependency trees as inputs. |
25 | Spatial Aggregation Facilitates Discovery of Spatial Topics | Aniruddha Maiti, Slobodan Vucetic, | By looking at topic discovery through matrix factorization lenses we show that spatial aggregation allows low rank approximation of the original document-word matrix, in which spatially distinct topics are preserved and non-spatial topics are aggregated into a single topic. |
26 | Relation Embedding with Dihedral Group in Knowledge Graph | Canran Xu, Ruijiang Li, | To fill this gap, we propose a new model called DihEdral, named after the dihedral symmetry group. |
27 | Sequence Tagging with Contextual and Non-Contextual Subword Representations: A Multilingual Evaluation | Benjamin Heinzerling, Michael Strube, | In this work, we conduct an extensive evaluation comparing non-contextual subword embeddings, namely FastText and BPEmb, and a contextual representation method, namely BERT, on multilingual named entity recognition and part-of-speech tagging. |
28 | Augmenting Neural Networks with First-order Logic | Tao Li, Vivek Srikumar, | In this paper, we present a novel framework for introducing declarative knowledge to neural network architectures in order to guide training and prediction. |
29 | Self-Regulated Interactive Sequence-to-Sequence Learning | Julia Kreutzer, Stefan Riezler, | We show how self-regulation strategies that decide when to ask for which kind of feedback from a teacher (or from oneself) can be cast as a learning-to-learn problem leading to improved cost-aware sequence-to-sequence learning. |
30 | You Only Need Attention to Traverse Trees | Mahtab Ahmed, Muhammad Rifayat Samee, Robert E. Mercer, | To this end, we propose Tree Transformer, a model that captures phrase level syntax for constituency trees as well as word-level dependencies for dependency trees by doing recursive traversal only with attention. |
31 | Cross-Domain Generalization of Neural Constituency Parsers | Daniel Fried, Nikita Kitaev, Dan Klein, | We present three results about the generalization of neural parsers in a zero-shot setting: training on trees from one corpus and evaluating on out-of-domain corpora. |
32 | Adaptive Attention Span in Transformers | Sainbayar Sukhbaatar, Edouard Grave, Piotr Bojanowski, Armand Joulin, | We propose a novel self-attention mechanism that can learn its optimal attention span. |
33 | Neural News Recommendation with Long- and Short-term User Representations | Mingxiao An, Fangzhao Wu, Chuhan Wu, Kun Zhang, Zheng Liu, Xing Xie, | In this paper, we propose a neural news recommendation approach which can learn both long- and short-term user representations. |
34 | Automatic Domain Adaptation Outperforms Manual Domain Adaptation for Predicting Financial Outcomes | Marina Sedinkina, Nikolas Breitkopf, Hinrich Schütze, | In this paper, we automatically create sentiment dictionaries for predicting financial outcomes. |
35 | Manipulating the Difficulty of C-Tests | Ji-Ung Lee, Erik Schwan, Christian M. Meyer, | We propose two novel manipulation strategies for increasing and decreasing the difficulty of C-tests automatically. |
36 | Towards Unsupervised Text Classification Leveraging Experts and Word Embeddings | Zied Haj-Yahia, Adrien Sieg, Léa A. Deleris, | In this work, we explore an unsupervised approach to classify documents into categories simply described by a label. |
37 | Neural Text Simplification of Clinical Letters with a Domain Specific Phrase Table | Matthew Shardlow, Raheel Nawaz, | This work uses neural text simplification methods to automatically improve the understandability of clinical letters for patients. |
38 | What You Say and How You Say It Matters: Predicting Stock Volatility Using Verbal and Vocal Cues | Yu Qin, Yi Yang, | We propose a multimodal deep regression model (MDRM) that jointly model CEO’s verbal (from text) and vocal (from audio) information in a conference call. |
39 | Detecting Concealed Information in Text and Speech | Shengli Hu, | In this work, we explore acoustic-prosodic and linguistic indicators of information concealment by collecting a unique corpus of professionals practicing for oral exams while concealing information. |
40 | Evidence-based Trustworthiness | Yi Zhang, Zachary Ives, Dan Roth, | Our key contribution is to develop a family of probabilistic models that jointly estimate the trustworthiness of sources, and the credibility of claims they assert. |
41 | Disentangled Representation Learning for Non-Parallel Text Style Transfer | Vineet John, Lili Mou, Hareesh Bahuleyan, Olga Vechtomova, | We propose a simple yet effective approach, which incorporates auxiliary multi-task and adversarial objectives, for style prediction and bag-of-words prediction, respectively. |
42 | Cross-Sentence Grammatical Error Correction | Shamil Chollampatt, Weiqi Wang, Hwee Tou Ng, | In this paper, we address this serious limitation of existing approaches and improve strong neural encoder-decoder models by appropriately modeling wider contexts. |
43 | This Email Could Save Your Life: Introducing the Task of Email Subject Line Generation | Rui Zhang, Joel Tetreault, | In this paper, we propose and study the task of *email subject line generation*: automatically generating an email subject line from the email body. |
44 | Time-Out: Temporal Referencing for Robust Modeling of Lexical Semantic Change | Haim Dubossarsky, Simon Hengchen, Nina Tahmasebi, Dominik Schlechtweg, | We show that, trained on a diachronic corpus, the skip-gram with negative sampling architecture with temporal referencing outperforms alignment models on a synthetic task as well as a manual testset. |
45 | Adversarial Attention Modeling for Multi-dimensional Emotion Regression | Suyang Zhu, Shoushan Li, Guodong Zhou, | In this paper, we propose a neural network-based approach, namely Adversarial Attention Network, to the task of multi-dimensional emotion regression, which automatically rates multiple emotion dimension scores for an input text. |
46 | Divide, Conquer and Combine: Hierarchical Feature Fusion Network with Local and Global Perspectives for Multimodal Affective Computing | Sijie Mai, Haifeng Hu, Songlong Xing, | We propose a general strategy named 'divide, conquer and combine' for multimodal fusion. |
47 | Modeling Financial Analysts’ Decision Making via the Pragmatics and Semantics of Earnings Calls | Katherine Keith, Amanda Stent, | In this paper, we examine analysts’ decision making behavior as it pertains to the language content of earnings calls. |
48 | An Interactive Multi-Task Learning Network for End-to-End Aspect-Based Sentiment Analysis | Ruidan He, Wee Sun Lee, Hwee Tou Ng, Daniel Dahlmeier, | In this paper, we propose an interactive multi-task learning network (IMN) which is able to jointly learn multiple related tasks simultaneously at both the token level as well as the document level. |
49 | Decompositional Argument Mining: A General Purpose Approach for Argument Graph Construction | Debela Gemechu, Chris Reed, | This work presents an approach that decomposes propositions into four functional components and identifies the patterns linking those components to determine argument structure. |
50 | MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations | Soujanya Poria, Devamanyu Hazarika, Navonil Majumder, Gautam Naik, Erik Cambria, Rada Mihalcea, | Thus, we propose the Multimodal EmotionLines Dataset (MELD), an extension and enhancement of EmotionLines. |
51 | Open-Domain Targeted Sentiment Analysis via Span-Based Extraction and Classification | Minghao Hu, Yuxing Peng, Zhen Huang, Dongsheng Li, Yiwei Lv, | To address these problems, we propose a span-based extract-then-classify framework, where multiple opinion targets are directly extracted from the sentence under the supervision of target span boundaries, and corresponding polarities are then classified using their span representations. |
52 | Transfer Capsule Network for Aspect Level Sentiment Classification | Zhuang Chen, Tieyun Qian, | In this paper, we propose a Transfer Capsule Network (TransCap) model for transferring document-level knowledge to aspect-level sentiment classification. |
53 | Progressive Self-Supervised Attention Learning for Aspect-Level Sentiment Analysis | Jialong Tang, Ziyao Lu, Jinsong Su, Yubin Ge, Linfeng Song, Le Sun, Jiebo Luo, | In this paper, we propose a progressive self-supervised attention learning approach for neural ASC models, which automatically mines useful attention supervision information from a training corpus to refine attention mechanisms. |
54 | Classification and Clustering of Arguments with Contextualized Word Embeddings | Nils Reimers, Benjamin Schiller, Tilman Beck, Johannes Daxenberger, Christian Stab, Iryna Gurevych, | For the first time, we show how to leverage the power of contextualized word embeddings to classify and cluster topic-dependent arguments, achieving impressive results on both tasks and across multiple datasets. |
55 | Sentiment Tagging with Partial Labels using Modular Architectures | Xiao Zhang, Dan Goldwasser, | In this paper we focus on a popular class of learning problems, sequence prediction applied to several sentiment analysis tasks, and suggest a modular learning approach in which different sub-tasks are learned using separate functional modules, combined to perform the final task while sharing information. |
56 | DOER: Dual Cross-Shared RNN for Aspect Term-Polarity Co-Extraction | Huaishao Luo, Tianrui Li, Bing Liu, Junbo Zhang, | In this paper, we treat these two tasks as two sequence labeling problems and propose a novel Dual crOss-sharEd RNN framework (DOER) to generate all aspect term-polarity pairs of the input sentence simultaneously. |
57 | A Corpus for Modeling User and Language Effects in Argumentation on Online Debating | Esin Durmus, Claire Cardie, | This paper presents a dataset of 78,376 debates generated over a 10-year period along with surprisingly comprehensive participant profiles. |
58 | Topic Tensor Network for Implicit Discourse Relation Recognition in Chinese | Sheng Xu, Peifeng Li, Fang Kong, Qiaoming Zhu, Guodong Zhou, | In this paper, we propose a topic tensor network to recognize Chinese implicit discourse relations with both sentence-level and topic-level representations. |
59 | Learning from Omission | Bill McDowell, Noah Goodman, | Here, we explore whether pragmatic reasoning during training can improve the quality of learned meanings. |
60 | Multi-Task Learning for Coherence Modeling | Youmna Farag, Helen Yannakoudakis, | We propose a hierarchical neural network trained in a multi-task fashion that learns to predict a document-level coherence score (at the network’s top layers) along with word-level grammatical roles (at the bottom layers), taking advantage of inductive transfer between the two tasks. |
61 | Data Programming for Learning Discourse Structure | Sonia Badene, Kate Thompson, Jean-Pierre Lorré, Nicholas Asher, | This paper investigates the advantages and limits of data programming for the task of learning discourse structure. |
62 | Evaluating Discourse in Structured Text Representations | Elisa Ferracane, Greg Durrett, Junyi Jessy Li, Katrin Erk, | We examine this model in detail, and evaluate on additional discourse-relevant tasks and datasets, in order to assess whether the structured attention improves performance on the end task and whether it captures a text’s discourse structure. |
63 | Know What You Don’t Know: Modeling a Pragmatic Speaker that Refers to Objects of Unknown Categories | Sina Zarrieß, David Schlangen, | We combine these lines of research and model zero-shot reference games, where a speaker needs to successfully refer to a novel object in an image. |
64 | End-to-end Deep Reinforcement Learning Based Coreference Resolution | Hongliang Fei, Xu Li, Dingcheng Li, Ping Li, | In this paper, we introduce an end-to-end reinforcement learning based coreference resolution model to directly optimize coreference evaluation metrics. |
65 | Implicit Discourse Relation Identification for Open-domain Dialogues | Mingyu Derek Ma, Kevin Bowden, Jiaqi Wu, Wen Cui, Marilyn Walker, | In this paper, we designed a novel discourse relation identification pipeline specifically tuned for open-domain dialogue systems. |
66 | Coreference Resolution with Entity Equalization | Ben Kantor, Amir Globerson, | Here we provide a simple and effective approach for achieving this, via an “Entity Equalization” mechanism. |
67 | A Cross-Domain Transferable Neural Coherence Model | Peng Xu, Hamidreza Saghir, Jin Sung Kang, Teng Long, Avishek Joey Bose, Yanshuai Cao, Jackie Chi Kit Cheung, | In this work, we propose a local discriminative neural model with a much smaller negative sampling space that can efficiently learn against incorrect orderings. |
68 | MOROCO: The Moldavian and Romanian Dialectal Corpus | Andrei Butnaru, Radu Tudor Ionescu, | In this work, we introduce the MOldavian and ROmanian Dialectal COrpus (MOROCO), which is freely available for download at https://github.com/butnaruandrei/MOROCO. |
69 | Just “OneSeC” for Producing Multilingual Sense-Annotated Data | Bianca Scarlini, Tommaso Pasini, Roberto Navigli, | In this paper we formulate the assumption of One Sense per Wikipedia Category and present OneSeC, a language-independent method for the automatic extraction of hundreds of thousands of sentences in which a target word is tagged with its meaning. |
70 | How to (Properly) Evaluate Cross-Lingual Word Embeddings: On Strong Baselines, Comparative Analyses, and Some Misconceptions | Goran Glavaš, Robert Litschko, Sebastian Ruder, Ivan Vulić, | In this work, we take the first step towards a comprehensive evaluation of CLE models: we thoroughly evaluate both supervised and unsupervised CLE models, for a large number of language pairs, on BLI and three downstream tasks, providing new insights concerning the ability of cutting-edge CLE models to support cross-lingual NLP. |
71 | SP-10K: A Large-scale Evaluation Set for Selectional Preference Acquisition | Hongming Zhang, Hantian Ding, Yangqiu Song, | To provide a better evaluation method for SP models, we introduce SP-10K, a large-scale evaluation set that provides human ratings for the plausibility of 10,000 SP pairs over five SP relations, covering 2,500 most frequent verbs, nouns, and adjectives in American English. |
72 | A Wind of Change: Detecting and Evaluating Lexical Semantic Change across Times and Domains | Dominik Schlechtweg, Anna Hätty, Marco Del Tredici, Sabine Schulte im Walde, | We perform an interdisciplinary large-scale evaluation for detecting lexical semantic divergences in a diachronic and in a synchronic task: semantic sense changes across time, and semantic sense changes across domains. |
73 | Errudite: Scalable, Reproducible, and Testable Error Analysis | Tongshuang Wu, Marco Tulio Ribeiro, Jeffrey Heer, Daniel Weld, | This paper codifies model and task agnostic principles for informative error analysis, and presents Errudite, an interactive tool for better supporting this process. |
74 | DocRED: A Large-Scale Document-Level Relation Extraction Dataset | Yuan Yao, Deming Ye, Peng Li, Xu Han, Yankai Lin, Zhenghao Liu, Zhiyuan Liu, Lixin Huang, Jie Zhou, Maosong Sun, | In order to accelerate the research on document-level RE, we introduce DocRED, a new dataset constructed from Wikipedia and Wikidata with three features: (1) DocRED annotates both named entities and relations, and is the largest human-annotated dataset for document-level RE from plain text; (2) DocRED requires reading multiple sentences in a document to extract entities and infer their relations by synthesizing all information of the document; (3) along with the human-annotated data, we also offer large-scale distantly supervised data, which enables DocRED to be adopted for both supervised and weakly supervised scenarios. |
75 | ChID: A Large-scale Chinese IDiom Dataset for Cloze Test | Chujie Zheng, Minlie Huang, Aixin Sun, | In this paper we propose a large-scale Chinese cloze test dataset ChID, which studies the comprehension of idiom, a unique language phenomenon in Chinese. |
76 | Automatic Evaluation of Local Topic Quality | Jeffrey Lund, Piper Armstrong, Wilson Fearn, Stephen Cowley, Courtni Byun, Jordan Boyd-Graber, Kevin Seppi, | We propose a task designed to elicit human judgments of token-level topic assignments. |
77 | Crowdsourcing and Aggregating Nested Markable Annotations | Chris Madge, Juntao Yu, Jon Chamberlain, Udo Kruschwitz, Silviu Paun, Massimo Poesio, | In this paper, we present a method for identifying markables for coreference annotation that combines high-performance automatic markable detectors with checking with a Game-With-A-Purpose (GWAP) and aggregation using a Bayesian annotation model. |
78 | Transferable Multi-Domain State Generator for Task-Oriented Dialogue Systems | Chien-Sheng Wu, Andrea Madotto, Ehsan Hosseini-Asl, Caiming Xiong, Richard Socher, Pascale Fung, | In this paper, we propose a Transferable Dialogue State Generator (TRADE) that generates dialogue states from utterances using copy mechanism, facilitating transfer when predicting (domain, slot, value) triplets not encountered during training. |
79 | Multi-Task Networks with Universe, Group, and Task Feature Learning | Shiva Pentyala, Mengwen Liu, Markus Dreyer, | We present methods for multi-task learning that take advantage of natural groupings of related tasks. |
80 | Constrained Decoding for Neural NLG from Compositional Representations in Task-Oriented Dialogue | Anusha Balakrishnan, Jinfeng Rao, Kartikeya Upasani, Michael White, Rajen Subba, | In this paper, we (1) propose using tree-structured semantic representations, like those used in traditional rule-based NLG systems, for better discourse-level structuring and sentence-level planning; (2) introduce a challenging dataset using this representation for the weather domain; (3) introduce a constrained decoding approach for Seq2Seq models that leverages this representation to improve semantic correctness; and (4) demonstrate promising results on our dataset and the E2E dataset. |
81 | OpenDialKG: Explainable Conversational Reasoning with Attention-based Walks over Knowledge Graphs | Seungwhan Moon, Pararth Shah, Anuj Kumar, Rajen Subba, | We study a conversational reasoning model that strategically traverses through a large-scale common fact knowledge graph (KG) to introduce engaging and contextually diverse entities and attributes. |
82 | Coupling Retrieval and Meta-Learning for Context-Dependent Semantic Parsing | Daya Guo, Duyu Tang, Nan Duan, Ming Zhou, Jian Yin, | In this paper, we present an approach to incorporate retrieved datapoints as supporting evidence for context-dependent semantic parsing, such as generating source code conditioned on the class environment. |
83 | Knowledge-aware Pronoun Coreference Resolution | Hongming Zhang, Yan Song, Yangqiu Song, Dong Yu, | In this paper, we explore how to leverage different types of knowledge to better resolve pronoun coreference with a neural model. |
84 | Don’t Take the Premise for Granted: Mitigating Artifacts in Natural Language Inference | Yonatan Belinkov, Adam Poliak, Stuart Shieber, Benjamin Van Durme, Alexander Rush, | We propose two probabilistic methods to build models that are more robust to such biases and better transfer across datasets. |
85 | GEAR: Graph-based Evidence Aggregating and Reasoning for Fact Verification | Jie Zhou, Xu Han, Cheng Yang, Zhiyuan Liu, Lifeng Wang, Changcheng Li, Maosong Sun, | To alleviate this issue, we propose a graph-based evidence aggregating and reasoning (GEAR) framework which enables information to transfer on a fully-connected evidence graph and then utilizes different aggregators to collect multi-evidence information. |
86 | SherLIiC: A Typed Event-Focused Lexical Inference Benchmark for Evaluating Natural Language Inference | Martin Schmitt, Hinrich Schütze, | We present SherLIiC, a testbed for lexical inference in context (LIiC), consisting of 3985 manually annotated inference rule candidates (InfCands), accompanied by (i) ~960k unlabeled InfCands, and (ii) ~190k typed textual relations between Freebase entities extracted from the large entity-linked corpus ClueWeb09. |
87 | Extracting Symptoms and their Status from Clinical Conversations | Nan Du, Kai Chen, Anjuli Kannan, Linh Tran, Yuhui Chen, Izhak Shafran, | This paper describes novel models tailored for a new application, that of extracting the symptoms mentioned in clinical conversations along with their status. |
88 | What Makes a Good Counselor? Learning to Distinguish between High-quality and Low-quality Counseling Conversations | Verónica Pérez-Rosas, Xinyi Wu, Kenneth Resnicow, Rada Mihalcea, | In this paper, we explore several linguistic aspects of the collaboration process occurring during counseling conversations. |
89 | Finding Your Voice: The Linguistic Development of Mental Health Counselors | Justine Zhang, Robert Filbin, Christine Morrison, Jaclyn Weiser, Cristian Danescu-Niculescu-Mizil, | In this work, we develop a computational framework to quantify the extent to which individuals change their linguistic behavior with experience and to study the nature of this evolution. |
90 | Towards Automating Healthcare Question Answering in a Noisy Multilingual Low-Resource Setting | Jeanne E. Daniel, Willie Brink, Ryan Eloff, Charles Copley, | We discuss ongoing work into automating a multilingual digital helpdesk service available via text messaging to pregnant and breastfeeding mothers in South Africa. |
91 | Joint Entity Extraction and Assertion Detection for Clinical Text | Parminder Bhatia, Busra Celikkaya, Mohammed Khalilia, | We consider this as a multi-task problem and present a novel end-to-end neural model to jointly extract entities and negations. |
92 | HEAD-QA: A Healthcare Dataset for Complex Reasoning | David Vilares, Carlos Gómez-Rodríguez, | We present HEAD-QA, a multi-choice question answering testbed to encourage research on complex reasoning. |
93 | Are You Convinced? Choosing the More Convincing Evidence with a Siamese Network | Martin Gleize, Eyal Shnarch, Leshem Choshen, Lena Dankin, Guy Moshkowich, Ranit Aharonov, Noam Slonim, | In this paper, we present a new data set, IBM-EviConv, of pairs of evidence labeled for convincingness, designed to be more challenging than existing alternatives. |
94 | From Surrogacy to Adoption; From Bitcoin to Cryptocurrency: Debate Topic Expansion | Roy Bar-Haim, Dalia Krieger, Orith Toledo-Ronen, Lilach Edelstein, Yonatan Bilu, Alon Halfon, Yoav Katz, Amir Menczel, Ranit Aharonov, Noam Slonim, | We present algorithms for finding both consistent and contrastive expansions and demonstrate their effectiveness empirically. |
95 | Multimodal and Multi-view Models for Emotion Recognition | Gustavo Aguilar, Viktor Rozgic, Weiran Wang, Chao Wang, | To address this challenge, we study the problem of efficiently combining acoustic and lexical modalities during training while still providing a deployable acoustic model that does not require lexical inputs. |
96 | Emotion-Cause Pair Extraction: A New Task to Emotion Analysis in Texts | Rui Xia, Zixiang Ding, | In this work, we propose a new task: emotion-cause pair extraction (ECPE), which aims to extract the potential pairs of emotions and corresponding causes in a document. |
97 | Argument Invention from First Principles | Yonatan Bilu, Ariel Gera, Daniel Hershcovich, Benjamin Sznajder, Dan Lahav, Guy Moshkowich, Anael Malet, Assaf Gavron, Noam Slonim, | In this work we aim to explicitly define a taxonomy of such principled recurring arguments, and, given a controversial topic, to automatically identify which of these arguments are relevant to the topic. |
98 | Improving the Similarity Measure of Determinantal Point Processes for Extractive Multi-Document Summarization | Sangwoo Cho, Logan Lebanoff, Hassan Foroosh, Fei Liu, | In this paper we seek to strengthen a DPP-based method for extractive multi-document summarization by presenting a novel similarity measure inspired by capsule networks. |
99 | Global Optimization under Length Constraint for Neural Text Summarization | Takuya Makino, Tomoya Iwakura, Hiroya Takamura, Manabu Okumura, | We propose a global optimization method under length constraint (GOLC) for neural text summarization models. |
100 | Searching for Effective Neural Extractive Summarization: What Works and What’s Next | Ming Zhong, Pengfei Liu, Danqing Wang, Xipeng Qiu, Xuanjing Huang, | In this paper, we seek to better understand how neural extractive summarization systems could benefit from different types of model architectures, transferable knowledge and learning schemas. |
101 | A Simple Theoretical Model of Importance for Summarization | Maxime Peyrard, | To this end, we propose simple but rigorous definitions of several concepts that were previously used only intuitively in summarization: Redundancy, Relevance, and Informativeness. |
102 | Multi-News: A Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model | Alexander Fabbri, Irene Li, Tianwei She, Suyi Li, Dragomir Radev, | In this paper, we introduce Multi-News, the first large-scale MDS news dataset. |
103 | Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency | Shuhuai Ren, Yihe Deng, Kun He, Wanxiang Che, | Based on the synonyms substitution strategy, we introduce a new word replacement order determined by both the word saliency and the classification probability, and propose a greedy algorithm called probability weighted word saliency (PWWS) for text adversarial attack. |
104 | Heuristic Authorship Obfuscation | Janek Bevendorff, Martin Potthast, Matthias Hagen, Benno Stein, | We deal with the adversary task, called authorship obfuscation: preventing verification by altering a to-be-obfuscated text. |
105 | Text Categorization by Learning Predominant Sense of Words as Auxiliary Task | Kazuya Shimura, Jiyi Li, Fumiyo Fukumoto, | This paper follows the assumption and presents a method for text categorization by leveraging the predominant sense of words depending on the domain, i.e., domain-specific senses. |
106 | DeepSentiPeer: Harnessing Sentiment in Review Texts to Recommend Peer Review Decisions | Tirthankar Ghosal, Rajeev Verma, Asif Ekbal, Pushpak Bhattacharyya, | Here in this work, we investigate the role of reviewer sentiment embedded within peer review texts to predict the peer review outcome. |
107 | Gated Embeddings in End-to-End Speech Recognition for Conversational-Context Fusion | Suyoun Kim, Siddharth Dalmia, Florian Metze, | We present a novel conversational-context aware end-to-end speech recognizer based on a gated neural network that incorporates conversational-context/word/speech embeddings. |
108 | Figurative Usage Detection of Symptom Words to Improve Personal Health Mention Detection | Adith Iyer, Aditya Joshi, Sarvnaz Karimi, Ross Sparks, Cecile Paris, | To do so, we present two methods: a pipeline-based approach and a feature augmentation-based approach. |
109 | Complex Word Identification as a Sequence Labelling Task | Sian Gooding, Ekaterina Kochmar, | In this paper, we present a novel approach to CWI based on sequence modelling. |
110 | Neural News Recommendation with Topic-Aware News Representation | Chuhan Wu, Fangzhao Wu, Mingxiao An, Yongfeng Huang, Xing Xie, | In this paper, we propose a neural news recommendation approach with topic-aware news representations. |
111 | Poetry to Prose Conversion in Sanskrit as a Linearisation Task: A Case for Low-Resource Languages | Amrith Krishna, Vishnu Sharma, Bishal Santra, Aishik Chakraborty, Pavankumar Satuluri, Pawan Goyal, | Kāvya guru, the approach we propose, essentially consists of a pipeline of two pretraining steps followed by a seq2seq model. |
112 | Learning Emphasis Selection for Written Text in Visual Media from Crowd-Sourced Label Distributions | Amirreza Shirani, Franck Dernoncourt, Paul Asente, Nedim Lipka, Seokhwan Kim, Jose Echevarria, Thamar Solorio, | We propose a model that employs end-to-end label distribution learning (LDL) on crowd-sourced data and predicts a selection distribution, capturing the inter-subjectivity (common-sense) in the audience as well as the ambiguity of the input. |
113 | Rumor Detection by Exploiting User Credibility Information, Attention and Multi-task Learning | Quanzhi Li, Qiong Zhang, Luo Si, | In this study, we propose a new multi-task learning approach for rumor detection and stance classification tasks. |
114 | Context-specific Language Modeling for Human Trafficking Detection from Online Advertisements | Saeideh Shahrokh Esfahani, Michael J. Cafarella, Maziyar Baran Pouyan, Gregory DeAngelo, Elena Eneva, Andy E. Fano, | Here, we present an approach using natural language processing to identify trafficking ads on these websites. |
115 | Self-Attentional Models for Lattice Inputs | Matthias Sperber, Graham Neubig, Ngoc-Quan Pham, Alex Waibel, | To extend such models to handle lattices, we introduce probabilistic reachability masks that incorporate lattice structure into the model and support lattice scores if available. |
116 | When a Good Translation is Wrong in Context: Context-Aware Machine Translation Improves on Deixis, Ellipsis, and Lexical Cohesion | Elena Voita, Rico Sennrich, Ivan Titov, | We create test sets targeting these phenomena, and introduce a model that is suitable for this scenario and demonstrates major gains over a context-agnostic baseline on our new benchmarks without sacrificing performance as measured with BLEU. |
117 | A Compact and Language-Sensitive Multilingual Translation Method | Yining Wang, Long Zhou, Jiajun Zhang, Feifei Zhai, Jingfang Xu, Chengqing Zong, | In this paper, we propose a compact and language-sensitive method for multilingual translation. |
118 | Unsupervised Parallel Sentence Extraction with Parallel Segment Detection Helps Machine Translation | Viktor Hangya, Alexander Fraser, | We detect continuous parallel segments in sentence pair candidates and rely on them when mining parallel sentences. |
119 | Unsupervised Bilingual Word Embedding Agreement for Unsupervised Neural Machine Translation | Haipeng Sun, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao, | Thus, we propose two methods that train UNMT with UBWE agreement. |
120 | Effective Cross-lingual Transfer of Neural Machine Translation Models without Shared Vocabularies | Yunsu Kim, Yingbo Gao, Hermann Ney, | This paper shows effective techniques to transfer a pretrained NMT model to a new, unrelated language without shared vocabularies. |
121 | Improved Zero-shot Neural Machine Translation via Ignoring Spurious Correlations | Jiatao Gu, Yong Wang, Kyunghyun Cho, Victor O.K. Li, | In this work, we address the degeneracy problem due to capturing spurious correlations by quantitatively analyzing the mutual information between language IDs of the source and decoded sentences. |
122 | Syntactically Supervised Transformers for Faster Neural Machine Translation | Nader Akoury, Kalpesh Krishna, Mohit Iyyer, | In this work, we propose the syntactically supervised Transformer (SynST), which first autoregressively predicts a chunked parse tree before generating all of the target tokens in one shot conditioned on the predicted parse. |
123 | Dynamically Composing Domain-Data Selection with Clean-Data Selection by “Co-Curricular Learning” for Neural Machine Translation | Wei Wang, Isaac Caswell, Ciprian Chelba, | This paper introduces a “co-curricular learning” method to compose dynamic domain-data selection with dynamic clean-data selection, for transfer learning across both capabilities. |
124 | On the Word Alignment from Neural Machine Translation | Xintong Li, Guanlin Li, Lemao Liu, Max Meng, Shuming Shi, | This paper thereby proposes two methods to induce word alignment which are general and agnostic to specific NMT models. |
125 | Imitation Learning for Non-Autoregressive Neural Machine Translation | Bingzhen Wei, Mingxuan Wang, Hao Zhou, Junyang Lin, Xu Sun, | In this paper, we propose an imitation learning framework for non-autoregressive machine translation, which still enjoys the fast translation speed but gives comparable translation performance compared to its auto-regressive counterpart. |
126 | Monotonic Infinite Lookback Attention for Simultaneous Machine Translation | Naveen Arivazhagan, Colin Cherry, Wolfgang Macherey, Chung-Cheng Chiu, Semih Yavuz, Ruoming Pang, Wei Li, Colin Raffel, | We present the first simultaneous translation system to learn an adaptive schedule jointly with a neural machine translation (NMT) model that attends over all source tokens read thus far. |
127 | Global Textual Relation Embedding for Relational Understanding | Zhiyu Chen, Hanwen Zha, Honglei Liu, Wenhu Chen, Xifeng Yan, Yu Su, | In this work, we investigate how to learn a general-purpose embedding of textual relations, defined as the shortest dependency path between entities. |
128 | Graph Neural Networks with Generated Parameters for Relation Extraction | Hao Zhu, Yankai Lin, Zhiyuan Liu, Jie Fu, Tat-Seng Chua, Maosong Sun, | In this paper, we propose a novel graph neural network with generated parameters (GP-GNNs). |
129 | Entity-Relation Extraction as Multi-Turn Question Answering | Xiaoya Li, Fan Yin, Zijun Sun, Xiayu Li, Arianna Yuan, Duo Chai, Mingxin Zhou, Jiwei Li, | In this paper, we propose a new paradigm for the task of entity-relation extraction. |
130 | Exploiting Entity BIO Tag Embeddings and Multi-task Learning for Relation Extraction with Imbalanced Data | Wei Ye, Bo Li, Rui Xie, Zhonghao Sheng, Long Chen, Shikun Zhang, | To mitigate this problem, we propose a multi-task architecture which jointly trains a model to perform relation identification with cross-entropy loss and relation classification with ranking loss. |
131 | Joint Type Inference on Entities and Relations via Graph Convolutional Networks | Changzhi Sun, Yeyun Gong, Yuanbin Wu, Ming Gong, Daxin Jiang, Man Lan, Shiliang Sun, Nan Duan, | To tackle the joint type inference task, we propose a novel graph convolutional network (GCN) running on an entity-relation bipartite graph. |
132 | Extracting Multiple-Relations in One-Pass with Pre-Trained Transformers | Haoyu Wang, Ming Tan, Mo Yu, Shiyu Chang, Dakuo Wang, Kun Xu, Xiaoxiao Guo, Saloni Potdar, | In this work, we focus on the task of multiple relation extractions by encoding the paragraph only once. |
133 | Unsupervised Information Extraction: Regularizing Discriminative Approaches with Relation Distribution Losses | Étienne Simon, Vincent Guigue, Benjamin Piwowarski, | To overcome this limitation, we introduce a skewness loss which encourages the classifier to predict a relation with confidence given a sentence, and a distribution distance loss enforcing that all relations are predicted on average. |
134 | Fine-tuning Pre-Trained Transformer Language Models to Distantly Supervised Relation Extraction | Christoph Alt, Marc Hübner, Leonhard Hennig, | To address this gap, we utilize a pre-trained language model, the OpenAI Generative Pre-trained Transformer (GPT) (Radford et al., 2018). |
135 | ARNOR: Attention Regularization based Noise Reduction for Distant Supervision Relation Classification | Wei Jia, Dai Dai, Xinyan Xiao, Hua Wu, | In this paper, we propose ARNOR, a novel Attention Regularization based NOise Reduction framework for distant supervision relation classification. |
136 | GraphRel: Modeling Text as Relational Graphs for Joint Entity and Relation Extraction | Tsu-Jui Fu, Peng-Hsuan Li, Wei-Yun Ma, | In this paper, we present GraphRel, an end-to-end relation extraction model which uses graph convolutional networks (GCNs) to jointly learn named entities and relations. |
137 | DIAG-NRE: A Neural Pattern Diagnosis Framework for Distantly Supervised Neural Relation Extraction | Shun Zheng, Xu Han, Yankai Lin, Peilin Yu, Lu Chen, Ling Huang, Zhiyuan Liu, Wei Xu, | To ease the labor-intensive workload of pattern writing and enable the quick generalization to new relation types, we propose a neural pattern diagnosis framework, DIAG-NRE, that can automatically summarize and refine high-quality relational patterns from noise data with human experts in the loop. |
138 | Multi-grained Named Entity Recognition | Congying Xia, Chenwei Zhang, Tao Yang, Yaliang Li, Nan Du, Xian Wu, Wei Fan, Fenglong Ma, Philip Yu, | This paper presents a novel framework, MGNER, for Multi-Grained Named Entity Recognition where multiple entities or entity mentions in a sentence could be non-overlapping or totally nested. |
139 | ERNIE: Enhanced Language Representation with Informative Entities | Zhengyan Zhang, Xu Han, Zhiyuan Liu, Xin Jiang, Maosong Sun, Qun Liu, | In this paper, we utilize both large-scale textual corpora and KGs to train an enhanced language representation model (ERNIE), which can take full advantage of lexical, syntactic, and knowledge information simultaneously. |
140 | Multi-Channel Graph Neural Network for Entity Alignment | Yixin Cao, Zhiyuan Liu, Chengjiang Li, Zhiyuan Liu, Juanzi Li, Tat-Seng Chua, | In this paper, we propose a novel Multi-channel Graph Neural Network model (MuGNN) to learn alignment-oriented knowledge graph (KG) embeddings by robustly encoding two KGs via multiple channels. |
141 | A Neural Multi-digraph Model for Chinese NER with Gazetteers | Ruixue Ding, Pengjun Xie, Xiaoyan Zhang, Wei Lu, Linlin Li, Luo Si, | To automatically learn how to incorporate multiple gazetteers into an NER system, we propose a novel approach based on graph neural networks with a multi-digraph structure that captures the information that the gazetteers offer. |
142 | Improved Language Modeling by Decoding the Past | Siddhartha Brahma, | We propose a new regularization method based on decoding the last token in the context using the predicted distribution of the next token. |
143 | Training Hybrid Language Models by Marginalizing over Segmentations | Edouard Grave, Sainbayar Sukhbaatar, Piotr Bojanowski, Armand Joulin, | In this paper, we study the problem of hybrid language modeling, that is using models which can predict both characters and larger units such as character ngrams or words. |
144 | Improving Neural Language Models by Segmenting, Attending, and Predicting the Future | Hongyin Luo, Lan Jiang, Yonatan Belinkov, James Glass, | In this work, we propose a method that improves language modeling by learning to align the given context and the following phrase. |
145 | Lightweight and Efficient Neural Natural Language Processing with Quaternion Networks | Yi Tay, Aston Zhang, Anh Tuan Luu, Jinfeng Rao, Shuai Zhang, Shuohang Wang, Jie Fu, Siu Cheung Hui, | This paper proposes a series of lightweight and memory efficient neural architectures for a potpourri of natural language processing (NLP) tasks. |
146 | Sparse Sequence-to-Sequence Models | Ben Peters, Vlad Niculae, André F. T. Martins, | In this paper, we propose sparse sequence-to-sequence models, rooted in a new family of α-entmax transformations, which includes softmax and sparsemax as particular cases, and is sparse for any α > 1. |
147 | On the Robustness of Self-Attentive Models | Yu-Lun Hsieh, Minhao Cheng, Da-Cheng Juan, Wei Wei, Wen-Lian Hsu, Cho-Jui Hsieh, | Specifically, we investigate the attention and feature extraction mechanisms of state-of-the-art recurrent neural networks and self-attentive architectures for sentiment analysis, entailment and machine translation under adversarial attacks. |
148 | Exact Hard Monotonic Attention for Character-Level Transduction | Shijie Wu, Ryan Cotterell, | In this work, we ask the following question: Is monotonicity really a helpful inductive bias in these tasks? |
149 | A Lightweight Recurrent Network for Sequence Modeling | Biao Zhang, Rico Sennrich, | In this paper, we propose a lightweight recurrent network, or LRN. |
150 | Towards Scalable and Reliable Capsule Networks for Challenging NLP Applications | Wei Zhao, Haiyun Peng, Steffen Eger, Erik Cambria, Min Yang, | In this paper, we introduce: (i) an agreement score to evaluate the performance of routing processes at instance-level; (ii) an adaptive optimizer to enhance the reliability of routing; (iii) capsule compression and partial routing to improve the scalability of capsule networks. |
151 | Soft Representation Learning for Sparse Transfer | Haeju Park, Jinyoung Yeo, Gengyu Wang, Seung-won Hwang, | Our contribution is using adversarial training across tasks to “soft-code” shared and private spaces, preventing the shared space from becoming too sparse. |
152 | Learning Representations from Imperfect Time Series Data via Tensor Rank Regularization | Paul Pu Liang, Zhun Liu, Yao-Hung Hubert Tsai, Qibin Zhao, Ruslan Salakhutdinov, Louis-Philippe Morency, | To address these concerns, we present a regularization method based on tensor rank minimization. |
153 | Towards Lossless Encoding of Sentences | Gabriele Prato, Mathieu Duchesneau, Sarath Chandar, Alain Tapp, | In this work, we propose a near lossless method for encoding long sequences of texts as well as all of their sub-sequences into feature rich representations. |
154 | Open Vocabulary Learning for Neural Chinese Pinyin IME | Zhuosheng Zhang, Yafang Huang, Hai Zhao, | To alleviate such inconveniences, we propose a neural P2C conversion model augmented by an online updated vocabulary with a sampling mechanism to support open vocabulary learning during IME working. |
155 | Using LSTMs to Assess the Obligatoriness of Phonological Distinctive Features for Phonotactic Learning | Nicole Mirea, Klinton Bicknell, | To ascertain the importance of phonetic information in the form of phonological distinctive features for the purpose of segment-level phonotactic acquisition, we compare the performance of two recurrent neural network models of phonotactic learning: one that has access to distinctive features at the start of the learning process, and one that does not. |
156 | Better Character Language Modeling through Morphology | Terra Blevins, Luke Zettlemoyer, | We incorporate morphological supervision into character language models (CLMs) via multitasking and show that this addition improves bits-per-character (BPC) performance across 24 languages, even when the morphology data and language modeling data are disjoint. |
157 | Historical Text Normalization with Delayed Rewards | Simon Flachs, Marcel Bollmann, Anders Søgaard, | Policy gradient training enables direct optimization for exact matches, and while the small datasets in historical text normalization make from-scratch reinforcement learning prohibitive, we show that policy gradient fine-tuning leads to significant improvements across the board. |
158 | Stochastic Tokenization with a Language Model for Neural Text Classification | Tatsuya Hiraoka, Hiroyuki Shindo, Yuji Matsumoto, | In this paper, we propose a method to simultaneously learn tokenization and text classification to address these problems. |
159 | Mitigating Gender Bias in Natural Language Processing: Literature Review | Tony Sun, Andrew Gaut, Shirlyn Tang, Yuxin Huang, Mai ElSherief, Jieyu Zhao, Diba Mirza, Elizabeth Belding, Kai-Wei Chang, William Yang Wang, | In this paper, we review contemporary studies on recognizing and mitigating gender bias in NLP. |
160 | Gender-preserving Debiasing for Pre-trained Word Embeddings | Masahiro Kaneko, Danushka Bollegala, | Taking gender-bias as a working example, we propose a debiasing method that preserves non-discriminative gender-related information, while removing stereotypical discriminative gender biases from pre-trained word embeddings. |
161 | Counterfactual Data Augmentation for Mitigating Gender Stereotypes in Languages with Rich Morphology | Ran Zmigrod, Sebastian J. Mielke, Hanna Wallach, Ryan Cotterell, | We present a novel approach for converting between masculine-inflected and feminine-inflected sentences in such languages. |
162 | A Transparent Framework for Evaluating Unintended Demographic Bias in Word Embeddings | Chris Sweeney, Maryam Najafian, | In this work, we present a transparent framework and metric for evaluating discrimination across protected groups with respect to their word embedding bias. |
163 | The Risk of Racial Bias in Hate Speech Detection | Maarten Sap, Dallas Card, Saadia Gabriel, Yejin Choi, Noah A. Smith, | We investigate how annotators’ insensitivity to differences in dialect can lead to racial bias in automatic hate speech detection models, potentially amplifying harm against minority populations. |
164 | Evaluating Gender Bias in Machine Translation | Gabriel Stanovsky, Noah A. Smith, Luke Zettlemoyer, | We present the first challenge set and evaluation protocol for the analysis of gender bias in machine translation (MT). |
165 | LSTMEmbed: Learning Word and Sense Representations from a Large Semantically Annotated Corpus with Long Short-Term Memories | Ignacio Iacobacci, Roberto Navigli, | In this paper we explore the capabilities of a bidirectional LSTM model to learn representations of word senses from semantically annotated corpora. |
166 | Understanding Undesirable Word Embedding Associations | Kawin Ethayarajh, David Duvenaud, Graeme Hirst, | We show that for any embedding model that implicitly does matrix factorization, debiasing vectors post hoc using subspace projection (Bolukbasi et al., 2016) is, under certain conditions, equivalent to training on an unbiased corpus. |
167 | Unsupervised Discovery of Gendered Language through Latent-Variable Modeling | Alexander Miserlis Hoyle, Lawrence Wolf-Sonkin, Hanna Wallach, Isabelle Augenstein, Ryan Cotterell, | To that end, we introduce a generative latent-variable model that jointly represents adjective (or verb) choice, with its sentiment, given the natural gender of a head (or dependent) noun. |
168 | Topic Sensitive Attention on Generic Corpora Corrects Sense Bias in Pretrained Embeddings | Vihari Piratla, Sunita Sarawagi, Soumen Chakrabarti, | Given a small corpus D_T pertaining to a limited set of focused topics, our goal is to train embeddings that accurately capture the sense of words in the topic in spite of the limited size of D_T. |
169 | SphereRE: Distinguishing Lexical Relations with Hyperspherical Relation Embeddings | Chengyu Wang, Xiaofeng He, Aoying Zhou, | In this work, we present a neural representation learning model to distinguish lexical relations among term pairs based on Hyperspherical Relation Embeddings (SphereRE). |
170 | Multilingual Factor Analysis | Francisco Vargas, Kamen Brestnichki, Alex Papadopoulos Korfiatis, Nils Hammerla, | In this work we approach the task of learning multilingual word representations in an offline manner by fitting a generative latent variable model to a multilingual dictionary. |
171 | Meaning to Form: Measuring Systematicity as Information | Tiago Pimentel, Arya D. McCarthy, Damian Blasi, Brian Roark, Ryan Cotterell, | In this work, we offer a holistic quantification of the systematicity of the sign using mutual information and recurrent neural networks. |
172 | Learning Morphosyntactic Analyzers from the Bible via Iterative Annotation Projection across 26 Languages | Garrett Nicolai, David Yarowsky, | In this paper, we address both issues simultaneously: leveraging the high accuracy of English taggers and parsers, we project morphological information onto translations of the Bible in 26 varied test languages. |
173 | Adversarial Multitask Learning for Joint Multi-Feature and Multi-Dialect Morphological Modeling | Nasser Zalmout, Nizar Habash, | In this paper we explore the use of multitask learning and adversarial training to address morphological richness and dialectal variations in the context of full morphological tagging. |
174 | Neural Machine Translation with Reordering Embeddings | Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, | In this paper, we propose a reordering mechanism to learn the reordering embedding of a word based on its contextual information. |
175 | Neural Fuzzy Repair: Integrating Fuzzy Matches into Neural Machine Translation | Bram Bulte, Arda Tezcan, | We present a simple yet powerful data augmentation method for boosting Neural Machine Translation (NMT) performance by leveraging information retrieved from a Translation Memory (TM). |
176 | Learning Deep Transformer Models for Machine Translation | Qiang Wang, Bei Li, Tong Xiao, Jingbo Zhu, Changliang Li, Derek F. Wong, Lidia S. Chao, | Two strands of research are promising to improve models of this kind: the first uses wide networks (a.k.a. Transformer-Big) and has been the de facto standard for development of the Transformer system, and the other uses deeper language representation but faces the difficulty arising from learning deep networks. Here, we continue the line of research on the latter. |
177 | Generating Diverse Translations with Sentence Codes | Raphael Shu, Hideki Nakayama, Kyunghyun Cho, | In this work, we attempt to obtain diverse translations by using sentence codes to condition the sentence generation. |
178 | Self-Supervised Neural Machine Translation | Dana Ruiter, Cristina España-Bonet, Josef van Genabith, | We present a simple new method where an emergent NMT system is used for simultaneously selecting training data and learning internal NMT representations. |
179 | Exploring Phoneme-Level Speech Representations for End-to-End Speech Translation | Elizabeth Salesky, Matthias Sperber, Alan W Black, | We show that a naive method to create compressed phoneme-like speech representations is far more effective and efficient for translation than traditional frame-level speech features. |
180 | Visually Grounded Neural Syntax Acquisition | Haoyue Shi, Jiayuan Mao, Kevin Gimpel, Karen Livescu, | We present the Visually Grounded Neural Syntax Learner (VG-NSL), an approach for learning syntactic representations and structures without any explicit supervision. |
181 | Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation | Vihan Jain, Gabriel Magalhaes, Alexander Ku, Ashish Vaswani, Eugene Ie, Jason Baldridge, | Here, we highlight shortcomings of current metrics for the Room-to-Room dataset (Anderson et al., 2018b) and propose a new metric, Coverage weighted by Length Score (CLS). |
182 | Expressing Visual Relationships via Language | Hao Tan, Franck Dernoncourt, Zhe Lin, Trung Bui, Mohit Bansal, | To push forward the research in this direction, we first introduce a new language-guided image editing dataset that contains a large number of real image pairs with corresponding editing instructions. We then propose a new relational speaker model based on an encoder-decoder architecture with static relational attention and sequential multi-head attention. |
183 | Weakly-Supervised Spatio-Temporally Grounding Natural Sentence in Video | Zhenfang Chen, Lin Ma, Wenhan Luo, Kwan-Yee Kenneth Wong, | In this paper, we address a novel task, namely weakly-supervised spatio-temporally grounding natural sentence in video. |
184 | The PhotoBook Dataset: Building Common Ground through Visually-Grounded Dialogue | Janosch Haber, Tim Baumgärtner, Ece Takmaz, Lieke Gelderloos, Elia Bruni, Raquel Fernández, | This paper introduces the PhotoBook dataset, a large-scale collection of visually-grounded, task-oriented dialogues in English designed to investigate shared dialogue history accumulating during conversation. |
185 | Continual and Multi-Task Architecture Search | Ramakanth Pasunuru, Mohit Bansal, | In our work, we first introduce a novel continual architecture search (CAS) approach, so as to continually evolve the model parameters during the sequential training of several tasks, without losing performance on previously learned tasks (via block-sparsity and orthogonality constraints), thus enabling life-long learning. Next, we explore a multi-task architecture search (MAS) approach over ENAS for finding a unified, single cell structure that performs well across multiple tasks (via joint controller rewards), and hence allows more generalizable transfer of the cell structure knowledge to an unseen new task. |
186 | Semi-supervised Stochastic Multi-Domain Learning using Variational Inference | Yitong Li, Timothy Baldwin, Trevor Cohn, | In this paper we propose a method to distill the important domain signal as part of a multi-domain learning system, using a latent variable model in which parts of a neural model are stochastically gated based on the inferred domain. |
187 | Boosting Entity Linking Performance by Leveraging Unlabeled Documents | Phong Le, Ivan Titov, | In contrast, we propose an approach which exploits only naturally occurring information: unlabeled documents and Wikipedia. |
188 | Pre-Learning Environment Representations for Data-Efficient Neural Instruction Following | David Gaddy, Dan Klein, | We consider the problem of learning to map from natural language instructions to state transitions (actions) in a data-efficient manner. |
189 | Reinforced Training Data Selection for Domain Adaptation | Miaofeng Liu, Yan Song, Hongbin Zou, Tong Zhang, | To make TDS self-adapted to data and task, and to combine it with model training, in this paper, we propose a reinforcement learning (RL) framework that synchronously searches for training instances relevant to the target domain and learns better representations for them. |
190 | Generating Long and Informative Reviews with Aspect-Aware Coarse-to-Fine Decoding | Junyi Li, Wayne Xin Zhao, Ji-Rong Wen, Yang Song, | In this paper, we propose a novel review generation model by characterizing an elaborately designed aspect-aware coarse-to-fine generation process. |
191 | PaperRobot: Incremental Draft Generation of Scientific Ideas | Qingyun Wang, Lifu Huang, Zhiying Jiang, Kevin Knight, Heng Ji, Mohit Bansal, Yi Luan, | We present a PaperRobot who performs as an automatic research assistant by (1) conducting deep understanding of a large collection of human-written papers in a target domain and constructing comprehensive background knowledge graphs (KGs); (2) creating new ideas by predicting links from the background KGs, by combining graph attention and contextual text attention; (3) incrementally writing some key elements of a new paper based on memory-attention networks: from the input title along with predicted related entities to generate a paper abstract, from the abstract to generate conclusion and future work, and finally from future work to generate a title for a follow-on paper. |
192 | Rhetorically Controlled Encoder-Decoder for Modern Chinese Poetry Generation | Zhiqiang Liu, Zuohui Fu, Jie Cao, Gerard de Melo, Yik-Cheung Tam, Cheng Niu, Jie Zhou, | In this paper, we propose a rhetorically controlled encoder-decoder for modern Chinese poetry generation. |
193 | Enhancing Topic-to-Essay Generation with External Commonsense Knowledge | Pengcheng Yang, Lei Li, Fuli Luo, Tianyu Liu, Xu Sun, | Towards filling this gap, we propose to integrate commonsense from the external knowledge base into the generator through dynamic memory mechanism. |
194 | Towards Fine-grained Text Sentiment Transfer | Fuli Luo, Peng Li, Pengcheng Yang, Jie Zhou, Yutong Tan, Baobao Chang, Zhifang Sui, Xu Sun, | In this paper, we focus on the task of fine-grained text sentiment transfer (FGST). |
195 | Data-to-text Generation with Entity Modeling | Ratish Puduppully, Li Dong, Mirella Lapata, | In this work we propose an entity-centric neural architecture for data-to-text generation. |
196 | Ensuring Readability and Data-fidelity using Head-modifier Templates in Deep Type Description Generation | Jiangjie Chen, Ao Wang, Haiyun Jiang, Suo Feng, Chenguang Li, Yanghua Xiao, | To solve these problems, we propose a head-modifier template based method to ensure the readability and data fidelity of generated type descriptions. |
197 | Key Fact as Pivot: A Two-Stage Model for Low Resource Table-to-Text Generation | Shuming Ma, Pengcheng Yang, Tianyu Liu, Peng Li, Jie Zhou, Xu Sun, | In this work, we consider the scenario of low resource table-to-text generation, where only limited parallel data is available. |
198 | Unsupervised Neural Text Simplification | Sai Surya, Abhijit Mishra, Anirban Laha, Parag Jain, Karthik Sankaranarayanan, | The paper presents a first attempt towards unsupervised neural text simplification that relies only on unlabeled text corpora. |
199 | Syntax-Infused Variational Autoencoder for Text Generation | Xinyuan Zhang, Yi Yang, Siyang Yuan, Dinghan Shen, Lawrence Carin, | We present a syntax-infused variational autoencoder (SIVAE), that integrates sentences with their syntactic trees to improve the grammar of generated sentences. |
200 | Towards Generating Long and Coherent Text with Multi-Level Latent Variable Models | Dinghan Shen, Asli Celikyilmaz, Yizhe Zhang, Liqun Chen, Xin Wang, Jianfeng Gao, Lawrence Carin, | In this paper, we propose to leverage several multi-level structures to learn a VAE model for generating long and coherent text. |
201 | Jointly Learning Semantic Parser and Natural Language Generator via Dual Information Maximization | Hai Ye, Wenjie Li, Lu Wang, | In this paper, we model the duality of these two tasks via a joint learning framework, and demonstrate its effectiveness of boosting the performance on both tasks. |
202 | Learning to Select, Track, and Generate for Data-to-Text | Hayate Iso, Yui Uehara, Tatsuya Ishigaki, Hiroshi Noji, Eiji Aramaki, Ichiro Kobayashi, Yusuke Miyao, Naoaki Okazaki, Hiroya Takamura, | We propose a data-to-text generation model with two modules, one for tracking and the other for text generation. |
203 | Reinforced Dynamic Reasoning for Conversational Question Generation | Boyuan Pan, Hao Li, Ziyu Yao, Deng Cai, Huan Sun, | Towards that end, we propose a new approach named Reinforced Dynamic Reasoning network, which is based on the general encoder-decoder framework but incorporates a reasoning procedure in a dynamic manner to better understand what has been asked and what to ask next about the passage. |
204 | TalkSumm: A Dataset and Scalable Annotation Method for Scientific Paper Summarization Based on Conference Talks | Guy Lev, Michal Shmueli-Scheuer, Jonathan Herzig, Achiya Jerbi, David Konopnicki, | In this paper, we propose a novel method that automatically generates summaries for scientific papers, by utilizing videos of talks at scientific conferences. |
205 | Improving Abstractive Document Summarization with Salient Information Modeling | Yongjian You, Weijia Jia, Tianyi Liu, Wenmian Yang, | To tackle the above difficulties, we propose a Transformer-based encoder-decoder framework with two novel extensions for abstractive document summarization. |
206 | Unsupervised Neural Single-Document Summarization of Reviews via Learning Latent Discourse Structure and its Ranking | Masaru Isonuma, Junichiro Mori, Ichiro Sakata, | This paper focuses on the end-to-end abstractive summarization of a single product review without supervision. |
207 | BiSET: Bi-directional Selective Encoding with Template for Abstractive Summarization | Kai Wang, Xiaojun Quan, Rui Wang, | In this paper, we propose a novel Bi-directional Selective Encoding with Template (BiSET) model, which leverages template discovered from training data to softly select key information from each source article to guide its summarization process. |
208 | Neural Keyphrase Generation via Reinforcement Learning with Adaptive Rewards | Hou Pong Chan, Wang Chen, Lu Wang, Irwin King, | To address this problem, we propose a reinforcement learning (RL) approach for keyphrase generation, with an adaptive reward function that encourages a model to generate both sufficient and accurate keyphrases. |
209 | Scoring Sentence Singletons and Pairs for Abstractive Summarization | Logan Lebanoff, Kaiqiang Song, Franck Dernoncourt, Doo Soon Kim, Seokhwan Kim, Walter Chang, Fei Liu, | This paper attempts to bridge the gap by ranking sentence singletons and pairs together in a unified space. |
210 | Keep Meeting Summaries on Topic: Abstractive Multi-Modal Meeting Summarization | Manling Li, Lingyu Zhang, Heng Ji, Richard J. Radke, | Specifically, we propose a multi-modal hierarchical attention across three levels: segment, utterance and word. |
211 | Adversarial Domain Adaptation Using Artificial Titles for Abstractive Title Generation | Francine Chen, Yan-Ying Chen, | This paper examines techniques for adapting from a labeled source domain to an unlabeled target domain in the context of an encoder-decoder model for text generation. |
212 | BIGPATENT: A Large-Scale Dataset for Abstractive and Coherent Summarization | Eva Sharma, Chen Li, Lu Wang, | In this work, we present a novel dataset, BIGPATENT, consisting of 1.3 million records of U.S. patent documents along with human written abstractive summaries. |
213 | Ranking Generated Summaries by Correctness: An Interesting but Challenging Application for Natural Language Inference | Tobias Falke, Leonardo F. R. Ribeiro, Prasetya Ajie Utama, Ido Dagan, Iryna Gurevych, | In this paper, we evaluate summaries produced by state-of-the-art models via crowdsourcing and show that such errors occur frequently, in particular with more abstractive models. |
214 | Self-Supervised Learning for Contextualized Extractive Summarization | Hong Wang, Xin Wang, Wenhan Xiong, Mo Yu, Xiaoxiao Guo, Shiyu Chang, William Yang Wang, | In this paper, we aim to improve this task by introducing three auxiliary pre-training tasks that learn to capture the document-level context in a self-supervised fashion. |
215 | On the Summarization of Consumer Health Questions | Asma Ben Abacha, Dina Demner-Fushman, | In this paper, we study neural abstractive models for medical question summarization. |
216 | Unsupervised Rewriter for Multi-Sentence Compression | Yang Zhao, Xiaoyu Shen, Wei Bi, Akiko Aizawa, | To tackle the above-mentioned issues, we present a neural rewriter for multi-sentence compression that does not need any parallel corpus. |
217 | Inferential Machine Comprehension: Answering Questions by Recursively Deducing the Evidence Chain from Text | Jianxing Yu, Zhengjun Zha, Jian Yin, | This paper focuses on the topic of inferential machine comprehension, which aims to fully understand the meanings of given text to answer generic questions, especially the ones that need reasoning skills. |
218 | Token-level Dynamic Self-Attention Network for Multi-Passage Reading Comprehension | Yimeng Zhuang, Huadong Wang, | In this paper, we introduce the Dynamic Self-attention Network (DynSAN) for multi-passage reading comprehension task, which processes cross-passage information at token-level and meanwhile avoids substantial computational costs. |
219 | Explicit Utilization of General Knowledge in Machine Reading Comprehension | Chao Wang, Hui Jiang, | To bridge the gap between Machine Reading Comprehension (MRC) models and human beings, which is mainly reflected in the hunger for data and the robustness to noise, in this paper, we explore how to integrate the neural networks of MRC models with the general knowledge of human beings. |
220 | Multi-style Generative Reading Comprehension | Kyosuke Nishida, Itsumi Saito, Kosuke Nishida, Kazutoshi Shinoda, Atsushi Otsuka, Hisako Asano, Junji Tomita, | We propose a multi-style abstractive summarization model for question answering, called Masque. |
221 | Retrieve, Read, Rerank: Towards End-to-End Multi-Document Reading Comprehension | Minghao Hu, Yuxing Peng, Zhen Huang, Dongsheng Li, | In this work, we present RE³QA, a unified question answering model that combines context retrieving, reading comprehension, and answer reranking to predict the final answer. |
222 | Multi-Hop Paragraph Retrieval for Open-Domain Question Answering | Yair Feldman, Ran El-Yaniv, | We present a method for retrieving multiple supporting paragraphs, nested amidst a large knowledge base, which contain the necessary evidence to answer a given question. |
223 | E3: Entailment-driven Extracting and Editing for Conversational Machine Reading | Victor Zhong, Luke Zettlemoyer, | We present a new conversational machine reading model that jointly extracts a set of decision rules from the procedural text while reasoning about which are entailed by the conversational history and which still need to be edited to create questions for the user. |
224 | Generating Question-Answer Hierarchies | Kalpesh Krishna, Mohit Iyyer, | In this paper, we present SQUASH (Specificity-controlled Question-Answer Hierarchies), a novel and challenging text generation task that converts an input document into a hierarchy of question-answer pairs. |
225 | Answering while Summarizing: Multi-task Learning for Multi-hop QA with Evidence Extraction | Kosuke Nishida, Kyosuke Nishida, Masaaki Nagata, Atsushi Otsuka, Itsumi Saito, Hisako Asano, Junji Tomita, | This study focuses on the task of explainable multi-hop QA, which requires the system to return the answer with evidence sentences by reasoning and gathering disjoint pieces of the reference texts. |
226 | Enhancing Pre-Trained Language Representations with Rich Knowledge for Machine Reading Comprehension | An Yang, Quan Wang, Jing Liu, Kai Liu, Yajuan Lyu, Hua Wu, Qiaoqiao She, Sujian Li, | In this work, we investigate the potential of leveraging external knowledge bases (KBs) to further improve BERT for MRC. |
227 | XQA: A Cross-lingual Open-domain Question Answering Dataset | Jiahua Liu, Yankai Lin, Zhiyuan Liu, Maosong Sun, | In this paper, we construct a novel dataset XQA for cross-lingual OpenQA research. |
228 | Compound Probabilistic Context-Free Grammars for Grammar Induction | Yoon Kim, Chris Dyer, Alexander Rush, | We study a formalization of the grammar induction problem that models sentences as being generated by a compound probabilistic context free grammar. |
229 | Semi-supervised Domain Adaptation for Dependency Parsing | Zhenghua Li, Xue Peng, Min Zhang, Rui Wang, Luo Si, | We propose a simple domain embedding approach to merge the source- and target-domain training data, which is shown to be more effective than both direct corpus concatenation and multi-task learning. |
230 | Head-Driven Phrase Structure Grammar Parsing on Penn Treebank | Junru Zhou, Hai Zhao, | This paper makes the first attempt to formulate a simplified HPSG by integrating constituent and dependency formal representations into head-driven phrase structure. |
231 | Distantly Supervised Named Entity Recognition using Positive-Unlabeled Learning | Minlong Peng, Xiaoyu Xing, Qi Zhang, Jinlan Fu, Xuanjing Huang, | In this work, we explore the way to perform named entity recognition (NER) using only unlabeled data and named entity dictionaries. |
232 | Multi-Task Semantic Dependency Parsing with Policy Gradient for Learning Easy-First Strategies | Shuhei Kurita, Anders Søgaard, | We propose a new iterative predicate selection (IPS) algorithm for SDP. |
233 | GCDT: A Global Context Enhanced Deep Transition Architecture for Sequence Labeling | Yijin Liu, Fandong Meng, Jinchao Zhang, Jinan Xu, Yufeng Chen, Jie Zhou, | In this paper, we try to address these issues, and thus propose a Global Context enhanced Deep Transition architecture for sequence labeling named GCDT. |
234 | Unsupervised Learning of PCFGs with Normalizing Flow | Lifeng Jin, Finale Doshi-Velez, Timothy Miller, Lane Schwartz, William Schuler, | This paper describes a neural PCFG inducer which employs context embeddings (Peters et al., 2018) in a normalizing flow model (Dinh et al., 2015) to extend PCFG induction to use semantic and morphological information. |
235 | Variance of Average Surprisal: A Better Predictor for Quality of Grammar from Unsupervised PCFG Induction | Lifeng Jin, William Schuler, | In order to find a better indicator for quality of induced grammars, this paper correlates several linguistically- and psycholinguistically-motivated predictors to parsing accuracy on a large multilingual grammar induction evaluation data set. |
236 | Cross-Domain NER using Cross-Domain Language Modeling | Chen Jia, Xiaobo Liang, Yue Zhang, | To address this issue, we consider using cross-domain LM as a bridge across domains for NER domain adaptation, performing cross-domain and cross-task knowledge transfer by designing a novel parameter generation network. |
237 | Graph-based Dependency Parsing with Graph Neural Networks | Tao Ji, Yuanbin Wu, Man Lan, | We investigate the problem of efficiently incorporating high-order features into neural graph-based dependency parsing. |
238 | Wide-Coverage Neural A* Parsing for Minimalist Grammars | John Torr, Milos Stanojevic, Mark Steedman, Shay B. Cohen, | This paper presents the first ever application of this formalism to the task of realistic wide-coverage parsing. |
239 | Multi-Modal Sarcasm Detection in Twitter with Hierarchical Fusion Model | Yitao Cai, Huiyu Cai, Xiaojun Wan, | In this paper, we focus on multi-modal sarcasm detection for tweets consisting of texts and images in Twitter. We create a multi-modal sarcasm detection dataset based on Twitter. |
240 | Topic-Aware Neural Keyphrase Generation for Social Media Language | Yue Wang, Jing Li, Hou Pong Chan, Irwin King, Michael R. Lyu, Shuming Shi, | To facilitate automatic language understanding, we study keyphrase prediction, distilling salient information from massive posts. |
241 | #YouToo? Detection of Personal Recollections of Sexual Harassment on Social Media | Arijit Ghosh Chowdhury, Ramit Sawhney, Rajiv Ratn Shah, Debanjan Mahata, | This work attempts to aggregate such experiences of sexual abuse to facilitate a better understanding of social media constructs and to bring about social change. |
242 | Multi-task Pairwise Neural Ranking for Hashtag Segmentation | Mounica Maddela, Wei Xu, Daniel Preoţiuc-Pietro, | We build a dataset of 12,594 hashtags split into individual segments and propose a set of approaches for hashtag segmentation by framing it as a pairwise ranking problem between candidate segmentations. |
243 | Entity-Centric Contextual Affective Analysis | Anjalie Field, Yulia Tsvetkov, | We show how contextualized word embeddings can be used to capture affect dimensions in portrayals of people. |
244 | Sentence-Level Evidence Embedding for Claim Verification with Hierarchical Attention Networks | Jing Ma, Wei Gao, Shafiq Joty, Kam-Fai Wong, | In this paper, we propose a novel end-to-end hierarchical attention network focusing on learning to represent coherent evidence as well as their semantic relatedness with the claim. |
245 | Predicting Human Activities from User-Generated Content | Steven Wilson, Rada Mihalcea, | In this paper, we explore the task of predicting human activities from user-generated content. We collect a dataset containing instances of social media users writing about a range of everyday activities. |
246 | You Write like You Eat: Stylistic Variation as a Predictor of Social Stratification | Angelo Basile, Albert Gatt, Malvina Nissim, | Inspired by Labov’s seminal work on stylistic variation as a function of social stratification, we develop and compare neural models that predict a person’s presumed socio-economic status, obtained through distant supervision, from their writing style on social media. |
247 | Encoding Social Information with Graph Convolutional Networks for Political Perspective Detection in News Media | Chang Li, Dan Goldwasser, | In this paper, we highlight the importance of contextualizing social information, capturing how this information is disseminated in social networks. |
248 | Fine-Grained Spoiler Detection from Large-Scale Review Corpora | Mengting Wan, Rishabh Misra, Ndapa Nakashole, Julian McAuley, | This paper presents computational approaches for automatically detecting critical plot twists in reviews of media products. First, we created a large-scale book review dataset that includes fine-grained spoiler annotations at the sentence-level, as well as book and (anonymized) user information. |
249 | Celebrity Profiling | Matti Wiegmann, Benno Stein, Martin Potthast, | With this paper we introduce the Webis Celebrity Corpus 2019. |
250 | Dataset Creation for Ranking Constructive News Comments | Soichiro Fujita, Hayato Kobayashi, Manabu Okumura, | In this paper, we address directly evaluating the quality of comments on the basis of “constructiveness,” separately from user feedback. To this end, we create a new dataset including 100K+ Japanese comments with constructiveness scores (C-scores). |
251 | Enhancing Air Quality Prediction with Social Media and Natural Language Processing | Jyun-Yu Jiang, Xue Sun, Wei Wang, Sean Young, | In this paper, we propose to exploit social media and natural language processing techniques to enhance air quality prediction. |
252 | Twitter Homophily: Network Based Prediction of User’s Occupation | Jiaqi Pan, Rishabh Bhardwaj, Wei Lu, Hai Leong Chieu, Xinghao Pan, Ni Yi Puay, | In this paper, we investigate the importance of social network information compared to content information in the prediction of a Twitter user’s occupational class. |
253 | Domain Adaptive Dialog Generation via Meta Learning | Kun Qian, Zhou Yu, | We propose a domain adaptive dialog generation method based on meta-learning (DAML). |
254 | Strategies for Structuring Story Generation | Angela Fan, Mike Lewis, Yann Dauphin, | We explore coarse-to-fine models for creating narrative texts of several hundred words, and introduce new models which decompose stories by abstracting over actions and entities. |
255 | Argument Generation with Retrieval, Planning, and Realization | Xinyu Hua, Zhe Hu, Lu Wang, | In this paper, we study the specific problem of counter-argument generation, and present a novel framework, CANDELA. |
256 | A Simple Recipe towards Reducing Hallucination in Neural Surface Realisation | Feng Nie, Jin-Ge Yao, Jinpeng Wang, Rong Pan, Chin-Yew Lin, | To mitigate this issue, we propose to integrate a language understanding module for data refinement with self-training iterations to effectively induce strong equivalence between the input data and the paired text. |
257 | Cross-Modal Commentator: Automatic Machine Commenting Based on Cross-Modal Information | Pengcheng Yang, Zhihan Zhang, Fuli Luo, Lei Li, Chengyang Huang, Xu Sun, | To remedy this, we propose a new task: cross-modal automatic commenting (CMAC), which aims to make comments by integrating multiple modal contents. |
258 | A Working Memory Model for Task-oriented Dialog Response Generation | Xiuyi Chen, Jiaming Xu, Bo Xu, | Inspired by the psychological studies on working memory, we propose a working memory model (WMM2Seq) for dialog response generation. |
259 | Cognitive Graph for Multi-Hop Reading Comprehension at Scale | Ming Ding, Chang Zhou, Qibin Chen, Hongxia Yang, Jie Tang, | We propose a new CogQA framework for multi-hop reading comprehension question answering in web-scale documents. |
260 | Multi-hop Reading Comprehension across Multiple Documents by Reasoning over Heterogeneous Graphs | Ming Tu, Guangtao Wang, Jing Huang, Yun Tang, Xiaodong He, Bowen Zhou, | In this paper, we propose a new model to tackle the multi-hop RC problem. |
261 | Explore, Propose, and Assemble: An Interpretable Model for Multi-Hop Reading Comprehension | Yichen Jiang, Nitish Joshi, Yen-Chun Chen, Mohit Bansal, | To achieve this, we propose an interpretable 3-module system called Explore-Propose-Assemble reader (EPAr). |
262 | Avoiding Reasoning Shortcuts: Adversarial Evaluation, Training, and Model Development for Multi-Hop QA | Yichen Jiang, Mohit Bansal, | In this paper, we show that in the multi-hop HotpotQA (Yang et al., 2018) dataset, the examples often contain reasoning shortcuts through which models can directly locate the answer by word-matching the question with a sentence in the context. |
263 | Exploiting Explicit Paths for Multi-hop Reading Comprehension | Souvik Kundu, Tushar Khot, Ashish Sabharwal, Peter Clark, | We propose a novel, path-based reasoning approach for the multi-hop reading comprehension task where a system needs to combine facts from multiple passages to answer a question. |
264 | Sentence Mover’s Similarity: Automatic Evaluation for Multi-Sentence Texts | Elizabeth Clark, Asli Celikyilmaz, Noah A. Smith, | We introduce methods based on sentence mover’s similarity; our automatic metrics evaluate text in a continuous space using word and sentence embeddings. |
265 | Analysis of Automatic Annotation Suggestions for Hard Discourse-Level Tasks in Expert Domains | Claudia Schulz, Christian M. Meyer, Jan Kiesewetter, Michael Sailer, Elisabeth Bauer, Martin R. Fischer, Frank Fischer, Iryna Gurevych, | To speed up and ease annotations, we investigate the viability of automatically generated annotation suggestions for such tasks. |
266 | Deep Dominance – How to Properly Compare Deep Neural Models | Rotem Dror, Segev Shlomov, Roi Reichart, | In this paper, we propose to adapt to this problem a recently proposed test for the Almost Stochastic Dominance relation between two distributions. |
267 | We Need to Talk about Standard Splits | Kyle Gorman, Steven Bedrick, | We argue that randomly generated splits should be used in system evaluation. |
268 | Aiming beyond the Obvious: Identifying Non-Obvious Cases in Semantic Similarity Datasets | Nicole Peinelt, Maria Liakata, Dong Nguyen, | This paper proposes to distinguish obvious from non-obvious text pairs based on superficial lexical overlap and ground-truth labels. |
269 | Putting Evaluation in Context: Contextual Embeddings Improve Machine Translation Evaluation | Nitika Mathur, Timothy Baldwin, Trevor Cohn, | We propose a simple unsupervised metric, and additional supervised metrics which rely on contextual word embeddings to encode the translation and reference sentences. |
270 | Joint Effects of Context and User History for Predicting Online Conversation Re-entries | Xingshan Zeng, Jing Li, Lu Wang, Kam-Fai Wong, | Specifically, we propose a neural framework with three main layers, each modeling context, user history, and interactions between them, to explore how the conversation context and user chatting history jointly result in their re-entry behavior. |
271 | CONAN – COunter NArratives through Nichesourcing: a Multilingual Dataset of Responses to Fight Online Hate Speech | Yi-Ling Chung, Elizaveta Kuzmenko, Serra Sinem Tekiroglu, Marco Guerini, | In this paper, we describe the creation of the first large-scale, multilingual, expert-based dataset of hate-speech/counter-narrative pairs. |
272 | Categorizing and Inferring the Relationship between the Text and Image of Twitter Posts | Alakananda Vempala, Daniel Preoţiuc-Pietro, | We show that by combining the text and image information, we can build a machine learning approach that accurately distinguishes between the relationship types. |
273 | Who Sides with Whom? Towards Computational Construction of Discourse Networks for Political Debates | Sebastian Padó, Andre Blessing, Nico Blokker, Erenay Dayanik, Sebastian Haunss, Jonas Kuhn, | This paper presents three contributions towards this goal: (a) a requirements analysis, linking the task to knowledge base population; (b) an annotated pilot corpus of migration claims based on German newspaper reports; (c) initial modeling results. |
274 | Analyzing Linguistic Differences between Owner and Staff Attributed Tweets | Daniel Preoţiuc-Pietro, Rita Devlin Marier, | In this study, we challenge this assumption and study the linguistic differences between posts signed by the account owner or attributed to their staff. |
275 | Exploring Author Context for Detecting Intended vs Perceived Sarcasm | Silviu Oprea, Walid Magdy, | We define author context as the embedded representation of their historical posts on Twitter and suggest neural models that extract these representations. |
276 | Open Domain Event Extraction Using Neural Latent Variable Models | Xiao Liu, Heyan Huang, Yue Zhang, | We consider open domain event extraction, the task of extracting unconstrained types of events from news clusters. |
277 | Multi-Level Matching and Aggregation Network for Few-Shot Relation Classification | Zhi-Xiu Ye, Zhen-Hua Ling, | This paper presents a multi-level matching and aggregation network (MLMAN) for few-shot relation classification. |
278 | Quantifying Similarity between Relations with Fact Distribution | Weize Chen, Hao Zhu, Xu Han, Zhiyuan Liu, Maosong Sun, | We introduce a conceptually simple and effective method to quantify the similarity between relations in knowledge bases. |
279 | Matching the Blanks: Distributional Similarity for Relation Learning | Livio Baldini Soares, Nicholas FitzGerald, Jeffrey Ling, Tom Kwiatkowski, | In this paper, we build on extensions of Harris’ distributional hypothesis to relations, as well as recent advances in learning text representations (specifically, BERT), to build task agnostic relation representations solely from entity-linked text. |
280 | Fine-Grained Temporal Relation Extraction | Siddharth Vashishtha, Benjamin Van Durme, Aaron Steven White, | We present a novel semantic framework for modeling temporal relations and event durations that maps pairs of events to real-valued scales. |
281 | FIESTA: Fast IdEntification of State-of-The-Art models using adaptive bandit algorithms | Henry Moss, Andrew Moore, David Leslie, Paul Rayson, | We present FIESTA, a model selection approach that significantly reduces the computational resources required to reliably identify state-of-the-art performance from large collections of candidate models. |
282 | Is Attention Interpretable? | Sofia Serrano, Noah A. Smith, | We conclude that while attention noisily predicts input components’ overall importance to a model, it is by no means a fail-safe indicator. |
283 | Correlating Neural and Symbolic Representations of Language | Grzegorz Chrupała, Afra Alishahi, | Here we present two methods based on Representational Similarity Analysis (RSA) and Tree Kernels (TK) which allow us to directly quantify how strongly the information encoded in neural activation patterns corresponds to information represented by symbolic structures such as syntax trees. |
284 | Interpretable Neural Predictions with Differentiable Binary Variables | Joost Bastings, Wilker Aziz, Ivan Titov, | We propose a latent model that mixes discrete and continuous behaviour allowing at the same time for binary selections and gradient-based training without REINFORCE. |
285 | Transformer-XL: Attentive Language Models beyond a Fixed-Length Context | Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc Le, Ruslan Salakhutdinov, | We propose a novel neural architecture Transformer-XL that enables learning dependency beyond a fixed length without disrupting temporal coherence. |
286 | Domain Adaptation of Neural Machine Translation by Lexicon Induction | Junjie Hu, Mengzhou Xia, Graham Neubig, Jaime Carbonell, | To remedy this problem, we propose an unsupervised adaptation method which fine-tunes a pre-trained out-of-domain NMT model using a pseudo-in-domain corpus. |
287 | Reference Network for Neural Machine Translation | Han Fu, Chenghao Liu, Jianling Sun, | In this paper, we propose a Reference Network to incorporate referring process into translation decoding of NMT. |
288 | Retrieving Sequential Information for Non-Autoregressive Neural Machine Translation | Chenze Shao, Yang Feng, Jinchao Zhang, Fandong Meng, Xilin Chen, Jie Zhou, | In this paper, we propose two approaches to retrieve the target sequential information for NAT to enhance its translation ability while preserving the fast-decoding property. |
289 | STACL: Simultaneous Translation with Implicit Anticipation and Controllable Latency using Prefix-to-Prefix Framework | Mingbo Ma, Liang Huang, Hao Xiong, Renjie Zheng, Kaibo Liu, Baigong Zheng, Chuanqiang Zhang, Zhongjun He, Hairong Liu, Xing Li, Hua Wu, Haifeng Wang, | Within this framework, we present a very simple yet surprisingly effective “wait-k” policy trained to generate the target sentence concurrently with the source sentence, but always k words behind. |
290 | Look Harder: A Neural Machine Translation Model with Hard Attention | Sathish Reddy Indurthi, Insoo Chung, Sangha Kim, | In this work, we propose a hard-attention based NMT model which selects a subset of source tokens for each target token to effectively handle long sequence translation. |
291 | Robust Neural Machine Translation with Joint Textual and Phonetic Embedding | Hairong Liu, Mingbo Ma, Liang Huang, Hao Xiong, Zhongjun He, | We propose to improve the robustness of NMT to homophone noises by 1) jointly embedding both textual and phonetic information of source sentences, and 2) augmenting the training dataset with homophone noises. |
292 | A Simple and Effective Approach to Automatic Post-Editing with Transfer Learning | Gonçalo M. Correia, André F. T. Martins, | In this paper, we propose an alternative where we fine-tune pre-trained BERT models on both the encoder and decoder of an APE system, exploring several parameter sharing strategies. |
293 | Translating Translationese: A Two-Step Approach to Unsupervised Machine Translation | Nima Pourdamghani, Nada Aldarrab, Marjan Ghazvininejad, Kevin Knight, Jonathan May, | In this work we explore this intuition by breaking translation into a two-step process: generating a rough gloss by means of a dictionary and then “translating” the resulting pseudo-translation, or “Translationese”, into a fully fluent translation. |
294 | Training Neural Machine Translation to Apply Terminology Constraints | Georgiana Dinu, Prashant Mathur, Marcello Federico, Yaser Al-Onaizan, | This paper proposes a novel method to inject custom terminology into neural machine translation at run time. |
295 | Leveraging Local and Global Patterns for Self-Attention Networks | Mingzhou Xu, Derek F. Wong, Baosong Yang, Yue Zhang, Lidia S. Chao, | To address this argument, we propose a hybrid attention mechanism to dynamically leverage both of the local and global information. |
296 | Sentence-Level Agreement for Neural Machine Translation | Mingming Yang, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Min Zhang, Tiejun Zhao, | In this paper, we propose a sentence-level agreement module to directly minimize the difference between the representation of source and target sentence. |
297 | Multilingual Unsupervised NMT using Shared Encoder and Language-Specific Decoders | Sukanta Sen, Kamal Kumar Gupta, Asif Ekbal, Pushpak Bhattacharyya, | In this paper, we propose a multilingual unsupervised NMT scheme which jointly trains multiple languages with a shared encoder and multiple decoders. |
298 | Lattice-Based Transformer Encoder for Neural Machine Translation | Fengshun Xiao, Jiangtong Li, Hai Zhao, Rui Wang, Kehai Chen, | We propose two methods: 1) lattice positional encoding and 2) lattice-aware self-attention. |
299 | Multi-Source Cross-Lingual Model Transfer: Learning What to Share | Xilun Chen, Ahmed Hassan Awadallah, Hany Hassan, Wei Wang, Claire Cardie, | In this work, we focus on the multilingual transfer setting where training data in multiple source languages is leveraged to further boost target language performance. |
300 | Unsupervised Multilingual Word Embedding with Limited Resources using Neural Language Models | Takashi Wada, Tomoharu Iwata, Yuji Matsumoto, | To overcome this problem, we propose a new unsupervised multilingual embedding method that does not rely on such assumption and performs well under resource-poor scenarios, namely when only a small amount of monolingual data (i.e., 50k sentences) are available, or when the domains of monolingual data are different across languages. |
301 | Choosing Transfer Languages for Cross-Lingual Learning | Yu-Hsiang Lin, Chian-Yu Chen, Jean Lee, Zirui Li, Yuyan Zhang, Mengzhou Xia, Shruti Rijhwani, Junxian He, Zhisong Zhang, Xuezhe Ma, Antonios Anastasopoulos, Patrick Littell, Graham Neubig, | In this paper, we consider this task of automatically selecting optimal transfer languages as a ranking problem, and build models that consider the aforementioned features to perform this prediction. |
302 | CogNet: A Large-Scale Cognate Database | Khuyagbaatar Batsuren, Gabor Bella, Fausto Giunchiglia, | This paper introduces CogNet, a new, large-scale lexical database that provides cognates -words of common origin and meaning- across languages. |
303 | Neural Decipherment via Minimum-Cost Flow: From Ugaritic to Linear B | Jiaming Luo, Yuan Cao, Regina Barzilay, | In this paper we propose a novel neural approach for automatic decipherment of lost languages. |
304 | Cross-lingual Knowledge Graph Alignment via Graph Matching Neural Network | Kun Xu, Liwei Wang, Mo Yu, Yansong Feng, Yan Song, Zhiguo Wang, Dong Yu, | In this paper, we introduce the topic entity graph, a local sub-graph of an entity, to represent entities with their contextual information in KG. |
305 | Zero-Shot Cross-Lingual Abstractive Sentence Summarization through Teaching Generation and Attention | Xiangyu Duan, Mingming Yin, Min Zhang, Boxing Chen, Weihua Luo, | We propose to solve this zero-shot problem by using resource-rich monolingual ASSUM system to teach zero-shot cross-lingual ASSUM system on both summary word generation and attention. |
306 | Improving Low-Resource Cross-lingual Document Retrieval by Reranking with Deep Bilingual Representations | Rui Zhang, Caitlin Westerfield, Sungrok Shim, Garrett Bingham, Alexander Fabbri, William Hu, Neha Verma, Dragomir Radev, | In this paper, we propose to boost low-resource cross-lingual document retrieval performance with deep bilingual query-document representations. |
307 | Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization | Mozhi Zhang, Keyulu Xu, Ken-ichi Kawarabayashi, Stefanie Jegelka, Jordan Boyd-Graber, | For non-isomorphic pairs, our method (Iterative Normalization) transforms monolingual embeddings to make orthogonal alignment easier by simultaneously enforcing that (1) individual word vectors are unit length, and (2) each language’s average vector is zero. |
308 | MAAM: A Morphology-Aware Alignment Model for Unsupervised Bilingual Lexicon Induction | Pengcheng Yang, Fuli Luo, Peng Chen, Tianyu Liu, Xu Sun, | To tackle this challenge, we propose a morphology-aware alignment model for the UBLI task. |
309 | Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings | Mikel Artetxe, Holger Schwenk, | In this paper, we propose a new method for this task based on multilingual sentence embeddings. |
310 | JW300: A Wide-Coverage Parallel Corpus for Low-Resource Languages | Željko Agić, Ivan Vulić, | In this paper, we present the resource and showcase its utility in experiments with cross-lingual word embedding induction and multi-source part-of-speech projection. |
311 | Cross-Lingual Syntactic Transfer through Unsupervised Adaptation of Invertible Projections | Junxian He, Zhisong Zhang, Taylor Berg-Kirkpatrick, Graham Neubig, | In this paper, we focus on methods for cross-lingual transfer to distant languages and propose to learn a generative model with a structured prior that utilizes labeled source data and unlabeled target data jointly. |
312 | Unsupervised Joint Training of Bilingual Word Embeddings | Benjamin Marie, Atsushi Fujita, | In this work, we propose a new approach that trains unsupervised BWE jointly on synthetic parallel data generated through unsupervised machine translation. |
313 | Inferring Concept Hierarchies from Text Corpora via Hyperbolic Embeddings | Matthew Le, Stephen Roller, Laetitia Papaxanthos, Douwe Kiela, Maximilian Nickel, | For this purpose, we propose a new method combining hyperbolic embeddings and Hearst patterns. |
314 | Is Word Segmentation Necessary for Deep Learning of Chinese Representations? | Xiaoya Li, Yuxian Meng, Xiaofei Sun, Qinghong Han, Arianna Yuan, Jiwei Li, | In this paper, we ask the fundamental question of whether Chinese word segmentation (CWS) is necessary for deep learning-based Chinese Natural Language Processing. |
315 | Towards Understanding Linear Word Analogies | Kawin Ethayarajh, David Duvenaud, Graeme Hirst, | We provide novel justification for the addition of SGNS word vectors by showing that it automatically down-weights the more frequent word, as weighting schemes do ad hoc. |
316 | On the Compositionality Prediction of Noun Phrases using Poincaré Embeddings | Abhik Jana, Dima Puzyrev, Alexander Panchenko, Pawan Goyal, Chris Biemann, Animesh Mukherjee, | We introduce a novel technique to blend hierarchical information with distributional information for predicting compositionality. |
317 | Robust Representation Learning of Biomedical Names | Minh C. Phan, Aixin Sun, Yi Tay, | This paper proposes a new framework for learning robust representations of biomedical names and terms. |
318 | Relational Word Embeddings | Jose Camacho-Collados, Luis Espinosa Anke, Steven Schockaert, | As an alternative, in this paper we propose to encode relational knowledge in a separate word embedding, which is aimed to be complementary to a given standard word embedding. |
319 | Unraveling Antonym’s Word Vectors through a Siamese-like Network | Mathias Etcheverry, Dina Wonsever, | We present an approach to unravel antonymy and synonymy from word vectors based on a siamese network inspired approach. |
320 | Incorporating Syntactic and Semantic Information in Word Embeddings using Graph Convolutional Networks | Shikhar Vashishth, Manik Bhandari, Prateek Yadav, Piyush Rai, Chiranjib Bhattacharyya, Partha Talukdar, | In this paper, we overcome this problem by proposing SynGCN, a flexible Graph Convolution based method for learning word embeddings. |
321 | Word and Document Embedding with vMF-Mixture Priors on Context Word Vectors | Shoaib Jameel, Steven Schockaert, | Our hypothesis in this paper is that embedding models can be improved by explicitly imposing a cluster structure on the set of context word vectors. |
322 | Delta Embedding Learning | Xiao Zhang, Ji Wu, Dejing Dou, | We propose a novel learning technique called Delta Embedding Learning, which can be applied to general NLP tasks to improve performance by optimized tuning of the word embeddings. |
323 | Annotation and Automatic Classification of Aspectual Categories | Markus Egg, Helena Prepens, Will Roberts, | We present the first annotated resource for the aspectual classification of German verb tokens in their clausal context. |
324 | Putting Words in Context: LSTM Language Models and Lexical Ambiguity | Laura Aina, Kristina Gulordava, Gemma Boleda, | We investigate how an LSTM language model deals with lexical ambiguity in English, designing a method to probe its hidden representations for lexical and contextual information about words. |
325 | Making Fast Graph-based Algorithms with Graph Metric Embeddings | Andrey Kutuzov, Mohammad Dorgham, Oleksiy Oliynyk, Chris Biemann, Alexander Panchenko, | We introduce a simple yet efficient and effective approach for learning graph embeddings. |
326 | Embedding Imputation with Grounded Language Information | Ziyi Yang, Chenguang Zhu, Vin Sachidananda, Eric Darve, | In this paper, we propose an approach for embedding imputation which uses grounded information in the form of a knowledge graph. |
327 | The Effectiveness of Simple Hybrid Systems for Hypernym Discovery | William Held, Nizar Habash, | This paper evaluates the contribution of both paradigms to hybrid success by evaluating the benefits of hybrid treatment of baseline models from each paradigm. |
328 | BERT-based Lexical Substitution | Wangchunshu Zhou, Tao Ge, Ke Xu, Furu Wei, Ming Zhou, | To address these issues, we propose an end-to-end BERT-based lexical substitution approach which can propose and validate substitute candidates without using any annotated data or manually curated resources. |
329 | Exploring Numeracy in Word Embeddings | Aakanksha Naik, Abhilasha Ravichander, Carolyn Rose, Eduard Hovy, | In this work, we show that existing embedding models are inadequate at constructing representations that capture salient aspects of mathematical meaning for numbers, which is important for language understanding. |
330 | HighRES: Highlight-based Reference-less Evaluation of Summarization | Hardy Hardy, Shashi Narayan, Andreas Vlachos, | To address this issue, we propose a novel approach for manual evaluation, Highlight-based Reference-less Evaluation of Summarization (HighRES), in which summaries are assessed by multiple annotators against the source document via manually highlighted salient content in the latter. |
331 | EditNTS: An Neural Programmer-Interpreter Model for Sentence Simplification through Explicit Editing | Yue Dong, Zichao Li, Mehdi Rezagholizadeh, Jackie Chi Kit Cheung, | We present the first sentence simplification model that learns explicit edit operations (ADD, DELETE, and KEEP) via a neural programmer-interpreter approach. |
332 | Decomposable Neural Paraphrase Generation | Zichao Li, Xin Jiang, Lifeng Shang, Qun Liu, | This paper presents Decomposable Neural Paraphrase Generator (DNPG), a Transformer-based model that can learn and generate paraphrases of a sentence at different levels of granularity in a disentangled way. |
333 | Transforming Complex Sentences into a Semantic Hierarchy | Christina Niklaus, Matthias Cetto, André Freitas, Siegfried Handschuh, | We present an approach for recursively splitting and rephrasing complex English sentences into a novel semantic hierarchy of simplified sentences, with each of them presenting a more regular structure that may facilitate a wide variety of artificial intelligence tasks, such as machine translation (MT) or information extraction (IE). |
334 | Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference | Tom McCoy, Ellie Pavlick, Tal Linzen, | A machine learning system can score well on a given test set by relying on heuristics that are effective for frequent example types but break down in more challenging cases. We study this issue within natural language inference (NLI), the task of determining whether one sentence entails another. |
335 | Zero-Shot Entity Linking by Reading Entity Descriptions | Lajanugen Logeswaran, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, Jacob Devlin, Honglak Lee, | We present the zero-shot entity linking task, where mentions must be linked to unseen entities without in-domain labeled data. |
336 | Dual Adversarial Neural Transfer for Low-Resource Named Entity Recognition | Joey Tianyi Zhou, Hao Zhang, Di Jin, Hongyuan Zhu, Meng Fang, Rick Siow Mong Goh, Kenneth Kwok, | We propose a new neural transfer method termed Dual Adversarial Transfer Network (DATNet) for addressing low-resource Named Entity Recognition (NER). |
337 | Scalable Syntax-Aware Language Models Using Knowledge Distillation | Adhiguna Kuncoro, Chris Dyer, Laura Rimell, Stephen Clark, Phil Blunsom, | To answer this question, we introduce an efficient knowledge distillation (KD) technique that transfers knowledge from a syntactic language model trained on a small corpus to an LSTM language model, hence enabling the LSTM to develop a more structurally sensitive representation of the larger training data it learns from. |
338 | An Imitation Learning Approach to Unsupervised Parsing | Bowen Li, Lili Mou, Frank Keller, | In our work, we propose an imitation learning approach to unsupervised parsing, where we transfer the syntactic knowledge induced by PRPN to a Tree-LSTM model with discrete parsing actions. |
339 | Women’s Syntactic Resilience and Men’s Grammatical Luck: Gender-Bias in Part-of-Speech Tagging and Dependency Parsing | Aparna Garimella, Carmen Banea, Dirk Hovy, Rada Mihalcea, | To address this, we annotate the Wall Street Journal part of the Penn Treebank with the gender information of the articles’ authors, and build taggers and parsers trained on this data that show performance differences in text written by men and women. |
340 | Multilingual Constituency Parsing with Self-Attention and Pre-Training | Nikita Kitaev, Steven Cao, Dan Klein, | We show that constituency parsing benefits from unsupervised pre-training across a variety of languages and a range of pre-training conditions. |
341 | A Multilingual BPE Embedding Space for Universal Sentiment Lexicon Induction | Mengjie Zhao, Hinrich Schütze, | We present a new method for sentiment lexicon induction that is designed to be applicable to the entire range of typological diversity of the world’s languages. |
342 | Tree Communication Models for Sentiment Analysis | Yuan Zhang, Yue Zhang, | In this paper, we propose a tree communication model using graph convolutional neural network and graph recurrent neural network, which allows rich information exchange between phrases in a constituent tree. |
343 | Improved Sentiment Detection via Label Transfer from Monolingual to Synthetic Code-Switched Text | Bidisha Samanta, Niloy Ganguly, Soumen Chakrabarti, | We present an effective technique for synthesizing labeled code-switched text from labeled monolingual text, which is relatively readily available. |
344 | Exploring Sequence-to-Sequence Learning in Aspect Term Extraction | Dehong Ma, Sujian Li, Fangzhao Wu, Xing Xie, Houfeng Wang, | To tackle these problems, we first explore to formalize ATE as a sequence-to-sequence (Seq2Seq) learning task where the source sequence and target sequence are composed of words and labels respectively. At the same time, to make Seq2Seq learning suit to ATE where labels correspond to words one by one, we design the gated unit networks to incorporate corresponding word representation into the decoder, and position-aware attention to pay more attention to the adjacent words of a target word. |
345 | Aspect Sentiment Classification Towards Question-Answering with Reinforced Bidirectional Attention Network | Jingjing Wang, Changlong Sun, Shoushan Li, Xiaozhong Liu, Luo Si, Min Zhang, Guodong Zhou, | This paper extends the research to interactive reviews and proposes a new research task, namely Aspect Sentiment Classification towards Question-Answering (ASC-QA), for real-world applications. |
346 | ELI5: Long Form Question Answering | Angela Fan, Yacine Jernite, Ethan Perez, David Grangier, Jason Weston, Michael Auli, | We introduce the first large-scale corpus for long form question answering, a task requiring elaborate and in-depth answers to open-ended questions. |
347 | Textbook Question Answering with Multi-modal Context Graph Understanding and Self-supervised Open-set Comprehension | Daesik Kim, Seonhoon Kim, Nojun Kwak, | In this work, we introduce a novel algorithm for solving the textbook question answering (TQA) task which describes more realistic QA problems compared to other recent tasks. |
348 | Generating Question Relevant Captions to Aid Visual Question Answering | Jialin Wu, Zeyuan Hu, Raymond Mooney, | We present a novel approach to better VQA performance that exploits this connection by jointly generating captions that are targeted to help answer a specific visual question. |
349 | Multi-grained Attention with Object-level Grounding for Visual Question Answering | Pingping Huang, Jianhui Huang, Yuqing Guo, Min Qiao, Yong Zhu, | To address this problem, this paper proposes a multi-grained attention method. |
350 | Psycholinguistics Meets Continual Learning: Measuring Catastrophic Forgetting in Visual Question Answering | Claudio Greco, Barbara Plank, Raquel Fernández, Raffaella Bernardi, | We study the issue of catastrophic forgetting in the context of neural multimodal approaches to Visual Question Answering (VQA). |
351 | Improving Visual Question Answering by Referring to Generated Paragraph Captions | Hyounghun Kim, Mohit Bansal, | Hence, we propose a combined Visual and Textual Question Answering (VTQA) model which takes as input a paragraph caption as well as the corresponding image, and answers the given question based on both inputs. |
352 | Shared-Private Bilingual Word Embeddings for Neural Machine Translation | Xuebo Liu, Derek F. Wong, Yang Liu, Lidia S. Chao, Tong Xiao, Jingbo Zhu, | In this paper, we propose shared-private bilingual word embeddings, which give a closer relationship between the source and target embeddings, and which also reduce the number of model parameters. |
353 | Literary Event Detection | Matthew Sims, Jong Ho Park, David Bamman, | In this work we present a new dataset of literary events-events that are depicted as taking place within the imagined space of a novel. |
354 | Assessing the Ability of Self-Attention Networks to Learn Word Order | Baosong Yang, Longyue Wang, Derek F. Wong, Lidia S. Chao, Zhaopeng Tu, | To this end, we propose a novel word reordering detection task to quantify how well the word order information is learned by SAN and RNN. |
355 | Energy and Policy Considerations for Deep Learning in NLP | Emma Strubell, Ananya Ganesh, Andrew McCallum, | In this paper we bring this issue to the attention of NLP researchers by quantifying the approximate financial and environmental costs of training a variety of recently successful neural network models for NLP. |
356 | What Does BERT Learn about the Structure of Language? | Ganesh Jawahar, Benoît Sagot, Djamé Seddah, | In this work, we provide novel support for this claim by performing a series of experiments to unpack the elements of English language structure learned by BERT. |
357 | A Just and Comprehensive Strategy for Using NLP to Address Online Abuse | David Jurgens, Libby Hemphill, Eshwar Chandrasekharan, | In this position paper, we argue that the community needs to make three substantive changes: (1) expanding our scope of problems to tackle both more subtle and more serious forms of abuse, (2) developing proactive technologies that counter or inhibit abuse before it harms, and (3) reframing our effort within a framework of justice to promote healthy communities. |
358 | Learning from Dialogue after Deployment: Feed Yourself, Chatbot! | Braden Hancock, Antoine Bordes, Pierre-Emmanuel Mazare, Jason Weston, | In this work, we propose the self-feeding chatbot, a dialogue agent with the ability to extract new training examples from the conversations it participates in. |
359 | Generating Responses with a Specific Emotion in Dialog | Zhenqiao Song, Xiaoqing Zheng, Lu Liu, Mu Xu, Xuanjing Huang, | We propose an emotional dialogue system (EmoDS) that can generate the meaningful responses with a coherent structure for a post, and meanwhile express the desired emotion explicitly or implicitly within a unified framework. |
360 | Semantically Conditioned Dialog Response Generation via Hierarchical Disentangled Self-Attention | Wenhu Chen, Jianshu Chen, Pengda Qin, Xifeng Yan, William Yang Wang, | To alleviate such scalability issue, we exploit the structure of dialog acts to build a multi-layer hierarchical graph, where each act is represented as a root-to-leaf route on the graph. |
361 | Incremental Learning from Scratch for Task-Oriented Dialogue Systems | Weikang Wang, Jiajun Zhang, Qian Li, Mei-Yuh Hwang, Chengqing Zong, Zhifei Li, | To address this problem, we propose a novel incremental learning framework to design task-oriented dialogue systems, or for short Incremental Dialogue System (IDS), without pre-defining the exhaustive list of user needs. To evaluate our method, we propose a new dataset which simulates unanticipated user needs in the deployment stage. |
362 | ReCoSa: Detecting the Relevant Contexts with Self-Attention for Multi-turn Dialogue Generation | Hainan Zhang, Yanyan Lan, Liang Pang, Jiafeng Guo, Xueqi Cheng, | In this paper, we propose a new model, named ReCoSa, to tackle this problem. |
363 | Dialogue Natural Language Inference | Sean Welleck, Jason Weston, Arthur Szlam, Kyunghyun Cho, | In this paper, we frame the consistency of dialogue agents as natural language inference (NLI) and create a new natural language inference dataset called Dialogue NLI. |
364 | Budgeted Policy Learning for Task-Oriented Dialogue Systems | Zhirui Zhang, Xiujun Li, Jianfeng Gao, Enhong Chen, | This paper presents a new approach that extends Deep Dyna-Q (DDQ) by incorporating a Budget-Conscious Scheduling (BCS) to best utilize a fixed, small amount of user interactions (budget) for learning task-oriented dialogue agents. |
365 | Comparison of Diverse Decoding Methods from Conditional Language Models | Daphne Ippolito, Reno Kriz, Joao Sedoc, Maria Kustikova, Chris Callison-Burch, | In this work, we perform an extensive survey of decoding-time strategies for generating diverse outputs from a conditional language model. |
366 | Retrieval-Enhanced Adversarial Training for Neural Response Generation | Qingfu Zhu, Lei Cui, Wei-Nan Zhang, Furu Wei, Ting Liu, | In this paper, we propose a Retrieval-Enhanced Adversarial Training (REAT) method for neural response generation. |
367 | Vocabulary Pyramid Network: Multi-Pass Encoding and Decoding with Multi-Level Vocabularies for Response Generation | Cao Liu, Shizhu He, Kang Liu, Jun Zhao, | To tackle the above two problems, we present a Vocabulary Pyramid Network (VPN) which is able to incorporate multi-pass encoding and decoding with multi-level vocabularies into response generation. |
368 | On-device Structured and Context Partitioned Projection Networks | Sujith Ravi, Zornitsa Kozareva, | To address this challenge, we propose an on-device neural network SGNN++ which dynamically learns compact projection vectors from raw text using structured and context-dependent partition projections. |
369 | Proactive Human-Machine Conversation with Explicit Conversation Goal | Wenquan Wu, Zhen Guo, Xiangyang Zhou, Hua Wu, Xiyuan Zhang, Rongzhong Lian, Haifeng Wang, | In this paper, we take a radical step towards building a human-like conversational agent: endowing it with the ability of proactively leading the conversation (introducing a new topic or maintaining the current topic). |
370 | Learning a Matching Model with Co-teaching for Multi-turn Response Selection in Retrieval-based Dialogue Systems | Jiazhan Feng, Chongyang Tao, Wei Wu, Yansong Feng, Dongyan Zhao, Rui Yan, | To learn a robust matching model from noisy training data, we propose a general co-teaching framework with three specific teaching strategies that cover both teaching with loss functions and teaching with data curriculum. |
371 | Learning to Abstract for Memory-augmented Conversational Response Generation | Zhiliang Tian, Wei Bi, Xiaopeng Li, Nevin L. Zhang, | In this work, we propose a memory-augmented generative model, which learns to abstract from the training corpus and saves the useful information to the memory to assist the response generation. |
372 | Are Training Samples Correlated? Learning to Generate Dialogue Responses with Multiple References | Lisong Qiu, Juntao Li, Wei Bi, Dongyan Zhao, Rui Yan, | In this paper, we propose to utilize the multiple references by considering the correlation of different valid responses and modeling the 1-to-n mapping with a novel two-step generation architecture. |
373 | Pretraining Methods for Dialog Context Representation Learning | Shikib Mehri, Evgeniia Razumovskaia, Tiancheng Zhao, Maxine Eskenazi, | Two novel methods of pretraining dialog context encoders are proposed, and a total of four methods are examined. |
374 | A Large-Scale Corpus for Conversation Disentanglement | Jonathan K. Kummerfeld, Sai R. Gouravajhala, Joseph J. Peper, Vignesh Athreya, Chulaka Gunasekara, Jatin Ganhotra, Siva Sankalp Patel, Lazaros C Polymenakos, Walter Lasecki, | We created a new dataset of 77,563 messages manually annotated with reply-structure graphs that both disentangle conversations and define internal conversation structure. |
375 | Self-Supervised Dialogue Learning | Jiawei Wu, Xin Wang, William Yang Wang, | Therefore, in this paper, we introduce a self-supervised learning task, inconsistent order detection, to explicitly capture the flow of conversation in dialogues. |
376 | Are we there yet? Encoder-decoder neural networks as cognitive models of English past tense inflection | Maria Corkery, Yevgen Matusevych, Sharon Goldwater, | Recently, however, Kirov and Cotterell (2018) showed that modern encoder-decoder (ED) models overcome many of these flaws. They also presented evidence that ED models demonstrate humanlike performance in a nonce-word task. Here, we look more closely at the behaviour of their model in this task. |
377 | A Spreading Activation Framework for Tracking Conceptual Complexity of Texts | Ioana Hulpuș, Sanja Štajner, Heiner Stuckenschmidt, | We propose an unsupervised approach for assessing conceptual complexity of texts, based on spreading activation. |
378 | End-to-End Sequential Metaphor Identification Inspired by Linguistic Theories | Rui Mao, Chenghua Lin, Frank Guerin, | We experiment with two DNN models which are inspired by two human metaphor identification procedures. By testing on three public datasets, we find that our models achieve state-of-the-art performance in end-to-end metaphor identification. |
379 | Diachronic Sense Modeling with Deep Contextualized Word Embeddings: An Ecological View | Renfen Hu, Shen Li, Shichen Liang, | To address this issue, this paper proposes a sense representation and tracking framework based on deep contextualized embeddings, aiming at answering not only what and when, but also how the word meaning changes. |
380 | Miss Tools and Mr Fruit: Emergent Communication in Agents Learning about Object Affordances | Diane Bouchacourt, Marco Baroni, | We propose here a new task capturing crucial aspects of the human environment, such as natural object affordances, and of human conversation, such as full symmetry among the participants. |
381 | CNNs found to jump around more skillfully than RNNs: Compositional Generalization in Seq2seq Convolutional Networks | Roberto Dessì, Marco Baroni, | We test here a convolutional network (CNN) on these tasks, reporting hugely improved performance with respect to RNNs. |
382 | Uncovering Probabilistic Implications in Typological Knowledge Bases | Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, Isabelle Augenstein, | In this paper, we present a computational model which successfully identifies known universals, including Greenberg universals, but also uncovers new ones, worthy of further linguistic investigation. |
383 | Is Word Segmentation Child’s Play in All Languages? | Georgia R. Loukatou, Steven Moran, Damian Blasi, Sabine Stoll, Alejandrina Cristia, | We report on the stability in performance of 11 conceptually diverse algorithms on a selection of 8 typologically distinct languages. The results constitute evidence that some segmentation algorithms are cross-linguistically valid, and thus could be considered as potential strategies employed by all infants. |
384 | On the Distribution of Deep Clausal Embeddings: A Large Cross-linguistic Study | Damian Blasi, Ryan Cotterell, Lawrence Wolf-Sonkin, Sabine Stoll, Balthasar Bickel, Marco Baroni, | We introduce here a collection of large, dependency-parsed written corpora in 17 languages, that allow us, for the first time, to capture clausal embedding through dependency graphs and assess their distribution. |
385 | Attention-based Conditioning Methods for External Knowledge Integration | Katerina Margatina, Christos Baziotis, Alexandros Potamianos, | In this paper, we present a novel approach for incorporating external knowledge in Recurrent Neural Networks (RNNs). |
386 | The KnowRef Coreference Corpus: Removing Gender and Number Cues for Difficult Pronominal Anaphora Resolution | Ali Emami, Paul Trichelair, Adam Trischler, Kaheer Suleman, Hannes Schulz, Jackie Chi Kit Cheung, | We introduce a new benchmark for coreference resolution and NLI, KnowRef, that targets common-sense understanding and world knowledge. We present a corpus of over 8,000 annotated text passages with ambiguous pronominal anaphora. |
387 | StRE: Self Attentive Edit Quality Prediction in Wikipedia | Soumya Sarkar, Bhanu Prakash Reddy, Sandipan Sikdar, Animesh Mukherjee, | In this paper we propose Self Attentive Revision Encoder (StRE) which leverages orthographic similarity of lexical units toward predicting the quality of new edits. |
388 | How Large Are Lions? Inducing Distributions over Quantitative Attributes | Yanai Elazar, Abhijit Mahabal, Deepak Ramachandran, Tania Bedrax-Weiss, Dan Roth, | We propose an unsupervised method for collecting quantitative information from large amounts of web data, and use it to create a new, very large resource consisting of distributions over physical quantities associated with objects, adjectives, and verbs which we call Distributions over Quantitative (DoQ). |
389 | Fine-Grained Sentence Functions for Short-Text Conversation | Wei Bi, Jun Gao, Xiaojiang Liu, Shuming Shi, | In this work, we collect a new Short-Text Conversation dataset with manually annotated SEntence FUNctions (STC-Sefun). |
390 | Give Me More Feedback II: Annotating Thesis Strength and Related Attributes in Student Essays | Zixuan Ke, Hrishikesh Inamdar, Hui Lin, Vincent Ng, | To facilitate advances in this area, we design a scoring rubric for scoring a core, yet unexplored dimension of persuasive essay quality, thesis strength, and annotate a corpus of essays with thesis strength scores. |
391 | Crowdsourcing and Validating Event-focused Emotion Corpora for German and English | Enrica Troiano, Sebastian Padó, Roman Klinger, | In this paper, we fill this gap for German by constructing deISEAR, a corpus designed in analogy to the well-established English ISEAR emotion dataset. |
392 | Pay Attention when you Pay the Bills. A Multilingual Corpus with Dependency-based and Semantic Annotation of Collocations. | Marcos Garcia, Marcos García Salido, Susana Sotelo, Estela Mosqueira, Margarita Alonso-Ramos, | This paper presents a new multilingual corpus with semantic annotation of collocations in English, Portuguese, and Spanish. |
393 | Does it Make Sense? And Why? A Pilot Study for Sense Making and Explanation | Cunxiang Wang, Shuailong Liang, Yue Zhang, Xiaonan Li, Tian Gao, | In this paper, we release a benchmark to directly test whether a system can differentiate natural language statements that make sense from those that do not make sense. |
394 | Large Dataset and Language Model Fun-Tuning for Humor Recognition | Vladislav Blinov, Valeria Bolotova-Baranova, Pavel Braslavski, | We collected a dataset of jokes and funny dialogues in Russian from various online resources and complemented them carefully with unfunny texts with similar lexical properties. |
395 | Towards Language Agnostic Universal Representations | Armen Aghajanyan, Xia Song, Saurabh Tiwary, | In this work, we present a method to decouple the language from the problem by learning language agnostic representations and therefore allowing training a model in one language and applying to a different one in a zero shot fashion. |
396 | Leveraging Meta Information in Short Text Aggregation | He Zhao, Lan Du, Guanfeng Liu, Wray Buntine, | To deal with the insufficiency, we propose a generative model that aggregates short texts into clusters by leveraging the associated meta information. |
397 | Exploiting Invertible Decoders for Unsupervised Sentence Representation Learning | Shuai Tang, Virginia R. de Sa, | In order to utilise the decoder after learning, we present two types of decoding functions whose inverse can be easily derived without expensive inverse calculation. |
398 | Self-Attentive, Multi-Context One-Class Classification for Unsupervised Anomaly Detection on Text | Lukas Ruff, Yury Zemlyanskiy, Robert Vandermeulen, Thomas Schnake, Marius Kloft, | In this paper we introduce a new anomaly detection method-Context Vector Data Description (CVDD)-which builds upon word embedding models to learn multiple sentence representations that capture multiple semantic contexts via the self-attention mechanism. |
399 | Hubless Nearest Neighbor Search for Bilingual Lexicon Induction | Jiaji Huang, Qiang Qiu, Kenneth Church, | This work proposes a new method, Hubless Nearest Neighbor (HNN), to mitigate hubness. |
400 | Distant Learning for Entity Linking with Automatic Noise Detection | Phong Le, Ivan Titov, | As the learning signal is weak and our surrogate labels are noisy, we introduce a noise detection component in our model: it lets the model detect and disregard examples which are likely to be noisy. |
401 | Learning How to Active Learn by Dreaming | Thuy-Trang Vu, Ming Liu, Dinh Phung, Gholamreza Haffari, | We introduce a new sample-efficient method that learns the AL policy directly on the target domain of interest by using wake and dream cycles. |
402 | Few-Shot Representation Learning for Out-Of-Vocabulary Words | Ziniu Hu, Ting Chen, Kai-Wei Chang, Yizhou Sun, | In this paper, we formulate the learning of OOV embedding as a few-shot regression problem by fitting a representation function to predict an oracle embedding vector (defined as embedding trained with abundant observations) based on limited contexts. |
403 | Neural Temporality Adaptation for Document Classification: Diachronic Word Embeddings and Domain Adaptation Models | Xiaolei Huang, Michael J. Paul, | This paper describes two complementary ways to adapt classifiers to shifts across time. |
404 | Learning Transferable Feature Representations Using Neural Networks | Himanshu Sharad Bhatt, Shourya Roy, Arun Rajkumar, Sriranjani Ramakrishnan, | We present a novel neural network architecture to simultaneously learn a two-part representation which is based on the principle of segregating source specific representation from the common representation. |
405 | Bayes Test of Precision, Recall, and F1 Measure for Comparison of Two Natural Language Processing Models | Ruibo Wang, Jihong Li, | In this study, we propose to use a block-regularized 3×2 CV (3×2 BCV) in model comparison because it could regularize the difference in certain frequency distributions over linguistic units between training and validation sets and yield stable estimators of P, R, and F1. |
406 | TIGS: An Inference Algorithm for Text Infilling with Gradient Search | Dayiheng Liu, Jie Fu, Pengfei Liu, Jiancheng Lv, | In this paper, we propose an iterative inference algorithm based on gradient search, which could be the first inference algorithm that can be broadly applied to any neural sequence generative models for text infilling tasks. |
407 | Keeping Notes: Conditional Natural Language Generation with a Scratchpad Encoder | Ryan Benmalek, Madian Khabsa, Suma Desu, Claire Cardie, Michele Banko, | We introduce the Scratchpad Mechanism, a novel addition to the sequence-to-sequence (seq2seq) neural network architecture and demonstrate its effectiveness in improving the overall fluency of seq2seq models for natural language generation tasks. |
408 | Using Automatically Extracted Minimum Spans to Disentangle Coreference Evaluation from Boundary Detection | Nafise Sadat Moosavi, Leo Born, Massimo Poesio, Michael Strube, | In this paper, we propose the MINA algorithm for automatically extracting minimum spans to benefit from minimum span evaluation in all corpora. |
409 | Revisiting Joint Modeling of Cross-document Entity and Event Coreference Resolution | Shany Barhom, Vered Shwartz, Alon Eirew, Michael Bugert, Nils Reimers, Ido Dagan, | We propose a neural architecture for cross-document coreference resolution. |
410 | A Unified Linear-Time Framework for Sentence-Level Discourse Parsing | Xiang Lin, Shafiq Joty, Prathyusha Jwalapuram, M Saiful Bari, | We propose an efficient neural framework for sentence-level discourse analysis in accordance with Rhetorical Structure Theory (RST). |
411 | Employing the Correspondence of Relations and Connectives to Identify Implicit Discourse Relations via Label Embeddings | Linh The Nguyen, Linh Van Ngo, Khoat Than, Thien Huu Nguyen, | In this work, we explore this property in a multi-task learning framework for IDRR in which the relations and the connectives are simultaneously predicted, and the mapping is leveraged to transfer knowledge between the two prediction tasks via the embeddings of relations and connectives. |
412 | Do You Know That Florence Is Packed with Visitors? Evaluating State-of-the-art Models of Speaker Commitment | Nanjiang Jiang, Marie-Catherine de Marneffe, | Here, we explore the hypothesis that linguistic deficits drive the error patterns of existing speaker commitment models by analyzing the linguistic correlates of model error on a challenging naturalistic dataset. |
413 | Multi-Relational Script Learning for Discourse Relations | I-Ta Lee, Dan Goldwasser, | In this paper, we suggest to view learning event embedding as a multi-relational problem, which allows us to capture different aspects of event pairs. |
414 | Open-Domain Why-Question Answering with Adversarial Learning to Encode Answer Texts | Jong-Hoon Oh, Kazuma Kadowaki, Julien Kloetzer, Ryu Iida, Kentaro Torisawa, | In this paper, we propose a method for why-question answering (why-QA) that uses an adversarial learning framework. |
415 | Learning to Ask Unanswerable Questions for Machine Reading Comprehension | Haichao Zhu, Li Dong, Furu Wei, Wenhui Wang, Bing Qin, Ting Liu, | In this work, we propose a data augmentation technique by automatically generating relevant unanswerable questions according to an answerable question paired with its corresponding paragraph that contains the answer. |
416 | Compositional Questions Do Not Necessitate Multi-hop Reasoning | Sewon Min, Eric Wallace, Sameer Singh, Matt Gardner, Hannaneh Hajishirzi, Luke Zettlemoyer, | We introduce a single-hop BERT-based RC model that achieves 67 F1, comparable to state-of-the-art multi-hop models. |
417 | Improving Question Answering over Incomplete KBs with Knowledge-Aware Reader | Wenhan Xiong, Mo Yu, Shiyu Chang, Xiaoxiao Guo, William Yang Wang, | We propose a new end-to-end question answering model, which learns to aggregate answer evidence from an incomplete knowledge base (KB) and a set of retrieved text snippets. Under the assumptions that structured data is easier to query and the acquired knowledge can help the understanding of unstructured text, our model first accumulates knowledge of KB entities from a question-related KB sub-graph; then reformulates the question in the latent space and reads the text with the accumulated entity knowledge at hand. |
418 | AdaNSP: Uncertainty-driven Adaptive Decoding in Neural Semantic Parsing | Xiang Zhang, Shizhu He, Kang Liu, Jun Zhao, | We instead propose an adaptive decoding method to avoid such intermediate representations. |
419 | The Language of Legal and Illegal Activity on the Darknet | Leshem Choshen, Dan Eldad, Daniel Hershcovich, Elior Sulem, Omri Abend, | This paper tackles this gap and performs an in-depth investigation of the characteristics of legal and illegal text in the Darknet, comparing it to a clear net website with similar content as a control condition. |
420 | Eliciting Knowledge from Experts: Automatic Transcript Parsing for Cognitive Task Analysis | Junyi Du, He Jiang, Jiaming Shen, Xiang Ren, | In this paper, we propose a weakly-supervised information extraction framework for automated CTA transcript parsing. |
421 | Course Concept Expansion in MOOCs with External Knowledge and Interactive Game | Jifan Yu, Chenyu Wang, Gan Luo, Lei Hou, Juanzi Li, Zhiyuan Liu, Jie Tang, | In this paper, we first build a novel boundary during searching for new concepts via external knowledge base and then utilize heterogeneous features to verify the high-quality results. In addition, to involve human efforts in our model, we design an interactive optimization mechanism based on a game. |
422 | Towards Near-imperceptible Steganographic Text | Falcon Dai, Zheng Cai, | We show that the imperceptibility of several existing linguistic steganographic systems (Fang et al., 2017; Yang et al., 2018) relies on implicit assumptions on statistical behaviors of fluent text. |
423 | Inter-sentence Relation Extraction with Document-level Graph Convolutional Neural Network | Sunil Kumar Sahu, Fenia Christopoulou, Makoto Miwa, Sophia Ananiadou, | We present a novel inter-sentence relation extraction model that builds a labelled edge graph convolutional neural network model on a document-level graph. |
424 | Neural Legal Judgment Prediction in English | Ilias Chalkidis, Ion Androutsopoulos, Nikolaos Aletras, | As a side-product, we propose a hierarchical version of BERT, which bypasses BERT’s length limitation. We release a new English legal judgment prediction dataset, containing cases from the European Court of Human Rights. |
425 | Robust Neural Machine Translation with Doubly Adversarial Inputs | Yong Cheng, Lu Jiang, Wolfgang Macherey, | We propose an approach to improving the robustness of NMT models, which consists of two parts: (1) attack the translation model with adversarial source examples; (2) defend the translation model with adversarial target inputs to improve its robustness against the adversarial source inputs. |
426 | Bridging the Gap between Training and Inference for Neural Machine Translation | Wen Zhang, Yang Feng, Fandong Meng, Di You, Qun Liu, | In this paper, we address these issues by sampling context words not only from the ground truth sequence but also from the predicted sequence by the model during training, where the predicted sequence is selected with a sentence-level optimum. |
427 | Beyond BLEU: Training Neural Machine Translation with Semantic Similarity | John Wieting, Taylor Berg-Kirkpatrick, Kevin Gimpel, Graham Neubig, | In this paper, we introduce an alternative reward function for optimizing NMT systems that is based on recent work in semantic similarity. |
428 | AutoML Strategy Based on Grammatical Evolution: A Case Study about Knowledge Discovery from Text | Suilan Estevez-Velarde, Yoan Gutiérrez, Andrés Montoyo, Yudivián Almeida-Cruz, | This paper proposes a novel AutoML strategy based on probabilistic grammatical evolution, which is evaluated on the health domain by facing the knowledge discovery challenge in Spanish text documents. |
429 | Distilling Discrimination and Generalization Knowledge for Event Detection via Delta-Representation Learning | Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun, | To address this problem, this paper proposes a Delta-learning approach to distill discrimination and generalization knowledge by effectively decoupling, incrementally learning and adaptively fusing event representation. |
430 | Chinese Relation Extraction with Multi-Grained Information and External Linguistic Knowledge | Ziran Li, Ning Ding, Zhiyuan Liu, Haitao Zheng, Ying Shen, | To address the issues, we propose a multi-grained lattice framework (MG lattice) for Chinese relation extraction to take advantage of multi-grained language information and external linguistic knowledge. |
431 | A2N: Attending to Neighbors for Knowledge Graph Inference | Trapit Bansal, Da-Cheng Juan, Sujith Ravi, Andrew McCallum, | We thus propose a novel attention-based method to learn query-dependent representation of entities which adaptively combines the relevant graph neighborhood of an entity leading to more accurate KG completion. |
432 | Graph based Neural Networks for Event Factuality Prediction using Syntactic and Semantic Structures | Amir Pouran Ben Veyseh, Thien Huu Nguyen, Dejing Dou, | In this work, we introduce a novel graph-based neural network for EFP that can integrate the semantic and syntactic information more effectively. |
433 | Embedding Time Expressions for Deep Temporal Ordering Models | Tanya Goyal, Greg Durrett, | In this paper, we introduce a framework to infuse temporal awareness into such models by learning a pre-trained model to embed timexes. We generate synthetic data consisting of pairs of timexes, then train a character LSTM to learn embeddings and classify the timexes’ temporal relation. |
434 | Episodic Memory Reader: Learning What to Remember for Question Answering from Streaming Data | Moonsu Han, Minki Kang, Hyunwoo Jung, Sung Ju Hwang, | To tackle this problem, we propose a novel end-to-end deep network model for reading comprehension, which we refer to as Episodic Memory Reader (EMR) that sequentially reads the input contexts into an external memory, while replacing memories that are less important for answering unseen questions. |
435 | Selection Bias Explorations and Debias Methods for Natural Language Sentence Matching Datasets | Guanhua Zhang, Bing Bai, Jian Liang, Kun Bai, Shiyu Chang, Mo Yu, Conghui Zhu, Tiejun Zhao, | In this paper, we investigate the problem of selection bias on six NLSM datasets and find that four out of them are significantly biased. |
436 | Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index | Minjoon Seo, Jinhyuk Lee, Tom Kwiatkowski, Ankur Parikh, Ali Farhadi, Hannaneh Hajishirzi, | In this paper, we introduce query-agnostic indexable representations of document phrases that can drastically speed up open-domain QA. |
437 | Language Modeling with Shared Grammar | Yuyu Zhang, Le Song, | In this work, we propose neural variational language model (NVLM), which enables the sharing of grammar knowledge among different corpora. |
438 | Zero-Shot Semantic Parsing for Instructions | Ofer Givoli, Roi Reichart, | We introduce a new training algorithm that aims to train a semantic parser on examples from a set of source domains, so that it can effectively parse instructions from an unknown target domain. |
439 | Can You Tell Me How to Get Past Sesame Street? Sentence-Level Pretraining Beyond Language Modeling | Alex Wang, Jan Hula, Patrick Xia, Raghavendra Pappagari, R. Thomas McCoy, Roma Patel, Najoung Kim, Ian Tenney, Yinghui Huang, Katherin Yu, Shuning Jin, Berlin Chen, Benjamin Van Durme, Edouard Grave, Ellie Pavlick, Samuel R. Bowman, | We conduct the first large-scale systematic study of candidate pretraining tasks, comparing 19 different tasks both as alternatives and complements to language modeling. |
440 | Complex Question Decomposition for Semantic Parsing | Haoyu Zhang, Jingjing Cai, Jianjun Xu, Ji Wang, | In this work, we focus on complex question semantic parsing and propose a novel Hierarchical Semantic Parsing (HSP) method, which utilizes the decompositionality of complex questions for semantic parsing. |
441 | Multi-Task Deep Neural Networks for Natural Language Understanding | Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, | In this paper, we present a Multi-Task Deep Neural Network (MT-DNN) for learning representations across multiple natural language understanding (NLU) tasks. |
442 | DisSent: Learning Sentence Representations from Explicit Discourse Relations | Allen Nie, Erin Bennett, Noah Goodman, | We show that with dependency parsing and rule-based rubrics, we can curate a high quality sentence relation task by leveraging explicit discourse relations. |
443 | SParC: Cross-Domain Semantic Parsing in Context | Tao Yu, Rui Zhang, Michihiro Yasunaga, Yi Chern Tan, Xi Victoria Lin, Suyi Li, Heyang Er, Irene Li, Bo Pang, Tao Chen, Emily Ji, Shreya Dixit, David Proctor, Sungrok Shim, Jonathan Kraft, Vincent Zhang, Caiming Xiong, Richard Socher, Dragomir Radev, | We present SParC, a dataset for cross-domain Semantic Parsing in Context that consists of 4,298 coherent question sequences (12k+ individual questions annotated with SQL queries). |
444 | Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation | Jiaqi Guo, Zecheng Zhan, Yan Gao, Yan Xiao, Jian-Guang Lou, Ting Liu, Dongmei Zhang, | We present a neural approach called IRNet for complex and cross-domain Text-to-SQL. |
445 | EigenSent: Spectral sentence embeddings using higher-order Dynamic Mode Decomposition | Subhradeep Kayal, George Tsatsaronis, | In this work, we experiment with spectral methods of signal representation and summarization as mechanisms for constructing such word-sequence embeddings in an unsupervised fashion. |
446 | SemBleu: A Robust Metric for AMR Parsing Evaluation | Linfeng Song, Daniel Gildea, | We propose SEMBLEU, a robust metric that extends BLEU (Papineni et al., 2002) to AMRs. |
447 | Reranking for Neural Semantic Parsing | Pengcheng Yin, Graham Neubig, | This paper presents a simple approach to quickly iterate and improve the performance of an existing neural semantic parser by reranking an n-best list of predicted MRs, using features that are designed to fix observed problems with baseline models. |
448 | Representing Schema Structure with Graph Neural Networks for Text-to-SQL Parsing | Ben Bogin, Jonathan Berant, Matt Gardner, | In this paper, we present an encoder-decoder semantic parser, where the structure of the DB schema is encoded with a graph neural network, and this representation is later used at both encoding and decoding time. |
449 | Human vs. Muppet: A Conservative Estimate of Human Performance on the GLUE Benchmark | Nikita Nangia, Samuel R. Bowman, | Given the fast pace of progress, however, the headroom we observe is quite limited. |
450 | Compositional Semantic Parsing across Graphbanks | Matthias Lindemann, Jonas Groschwitz, Alexander Koller, | We present a compositional neural semantic parser which achieves, for the first time, competitive accuracies across a diverse range of graphbanks. |
451 | Rewarding Smatch: Transition-Based AMR Parsing with Reinforcement Learning | Tahira Naseem, Abhishek Shah, Hui Wan, Radu Florian, Salim Roukos, Miguel Ballesteros, | Our work involves enriching the Stack-LSTM transition-based AMR parser (Ballesteros and Al-Onaizan, 2017) by augmenting training with Policy Learning and rewarding the Smatch score of sampled graphs. |
452 | BERT Rediscovers the Classical NLP Pipeline | Ian Tenney, Dipanjan Das, Ellie Pavlick, | We focus on one such model, BERT, and aim to quantify where linguistic information is captured within the network. |
453 | Simple and Effective Paraphrastic Similarity from Parallel Translations | John Wieting, Kevin Gimpel, Graham Neubig, Taylor Berg-Kirkpatrick, | We present a model and methodology for learning paraphrastic sentence embeddings directly from bitext, removing the time-consuming intermediate step of creating paraphrase corpora. |
454 | Second-Order Semantic Dependency Parsing with End-to-End Neural Networks | Xinyu Wang, Jingxian Huang, Kewei Tu, | In this paper, we propose a second-order semantic dependency parser, which takes into consideration not only individual dependency edges but also interactions between pairs of edges. |
455 | Towards Multimodal Sarcasm Detection (An _Obviously_ Perfect Paper) | Santiago Castro, Devamanyu Hazarika, Verónica Pérez-Rosas, Roger Zimmermann, Rada Mihalcea, Soujanya Poria, | In this paper, we argue that incorporating multimodal cues can improve the automatic classification of sarcasm. |
456 | Determining Relative Argument Specificity and Stance for Complex Argumentative Structures | Esin Durmus, Faisal Ladhak, Claire Cardie, | In this paper, we tackle these tasks in the context of complex arguments on a diverse set of topics. |
457 | Latent Variable Sentiment Grammar | Liwen Zhang, Kewei Tu, Yue Zhang, | To this end, we investigate two formalisms with deep sentiment representations that capture sentiment subtype expressions by latent variables and Gaussian mixture vectors, respectively. |
458 | An Investigation of Transfer Learning-Based Sentiment Analysis in Japanese | Enkhbold Bataa, Joshua Wu, | In this work we focus on Japanese and show the potential use of transfer learning techniques in text classification. |
459 | Probing Neural Network Comprehension of Natural Language Arguments | Timothy Niven, Hung-Yu Kao, | We are surprised to find that BERT’s peak performance of 77% on the Argument Reasoning Comprehension Task reaches just three points below the average untrained human baseline. However, we show that this result is entirely accounted for by exploitation of spurious statistical cues in the dataset. |
460 | Recognising Agreement and Disagreement between Stances with Reason Comparing Networks | Chang Xu, Cecile Paris, Surya Nepal, Ross Sparks, | We propose a reason comparing network (RCN) to leverage reason information for stance comparison. |
461 | Toward Comprehensive Understanding of a Sentiment Based on Human Motives | Naoki Otani, Eduard Hovy, | Our work considers human motives as the driver for human sentiments and addresses the problem of motive detection as the first step. |
462 | Context-aware Embedding for Targeted Aspect-based Sentiment Analysis | Bin Liang, Jiachen Du, Ruifeng Xu, Binyang Li, Hejiao Huang, | To address this problem, we propose a novel method to refine the embeddings of targets and aspects. |
463 | Yes, we can! Mining Arguments in 50 Years of US Presidential Campaign Debates | Shohreh Haddadan, Elena Cabrio, Serena Villata, | As existing research lacks solid empirical investigation of the typology of argument components in political debates, we fill this gap by proposing an Argument Mining approach to political debates. |
464 | An Empirical Study of Span Representations in Argumentation Structure Parsing | Tatsuki Kuribayashi, Hiroki Ouchi, Naoya Inoue, Paul Reisert, Toshinori Miyoshi, Jun Suzuki, Kentaro Inui, | This study investigates (i) span representation originally developed for other NLP tasks and (ii) a simple task-dependent extension for ASP. |
465 | Simple and Effective Text Matching with Richer Alignment Features | Runqi Yang, Jianhai Zhang, Xing Gao, Feng Ji, Haiqing Chen, | In this paper, we present a fast and strong neural approach for general purpose text matching applications. |
466 | Learning Attention-based Embeddings for Relation Prediction in Knowledge Graphs | Deepak Nathani, Jatin Chauhan, Charu Sharma, Manohar Kaul, | To this effect, our paper proposes a novel attention-based feature embedding that captures both entity and relation features in any given entity’s neighborhood. |
467 | Neural Network Alignment for Sentential Paraphrases | Jessica Ouyang, Kathy McKeown, | We present a monolingual alignment system for long, sentence- or clause-level alignments, and demonstrate that systems designed for word- or short phrase-based alignment are ill-suited for these longer alignments. |
468 | Duality of Link Prediction and Entailment Graph Induction | Mohammad Javad Hosseini, Shay B. Cohen, Mark Johnson, Mark Steedman, | In this paper, we show that these two problems are actually complementary. |
469 | A Cross-Sentence Latent Variable Model for Semi-Supervised Text Sequence Matching | Jihun Choi, Taeuk Kim, Sang-goo Lee, | We present a latent variable model for predicting the relationship between a pair of text sequences. |
470 | COMET: Commonsense Transformers for Automatic Knowledge Graph Construction | Antoine Bosselut, Hannah Rashkin, Maarten Sap, Chaitanya Malaviya, Asli Celikyilmaz, Yejin Choi, | We present the first comprehensive study on automatic knowledge base construction for two prevalent commonsense knowledge graphs: ATOMIC (Sap et al., 2019) and ConceptNet (Speer et al., 2017). |
471 | Detecting Subevents using Discourse and Narrative Features | Mohammed Aldawsari, Mark Finlayson, | We present a supervised model for automatically identifying when one event is a subevent of another. |
472 | HellaSwag: Can a Machine Really Finish Your Sentence? | Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, Yejin Choi, | In this paper, we show that commonsense inference still proves difficult for even state-of-the-art models, by presenting HellaSwag, a new challenge dataset. |
473 | Unified Semantic Parsing with Weak Supervision | Priyanka Agrawal, Ayushi Dalmia, Parag Jain, Abhishek Bansal, Ashish Mittal, Karthik Sankaranarayanan, | To overcome this, we propose a novel framework to build a unified multi-domain enabled semantic parser trained only with weak supervision (denotations). |
474 | Every Child Should Have Parents: A Taxonomy Refinement Algorithm Based on Hyperbolic Term Embeddings | Rami Aly, Shantanu Acharya, Alexander Ossa, Arne Köhn, Chris Biemann, Alexander Panchenko, | We introduce the use of Poincaré embeddings to improve existing state-of-the-art approaches to domain-specific taxonomy induction from text as a signal for both relocating wrong hyponym terms within a (pre-induced) taxonomy as well as for attaching disconnected terms in a taxonomy. |
475 | Learning to Rank for Plausible Plausibility | Zhongyang Li, Tongfei Chen, Benjamin Van Durme, | We suggest this loss is intuitively wrong when applied to plausibility tasks, where the prompt by design is neither categorically entailed nor contradictory given the context. |
476 | Generalized Tuning of Distributional Word Vectors for Monolingual and Cross-Lingual Lexical Entailment | Goran Glavaš, Ivan Vulić, | In this work, we propose a simple and effective method for fine-tuning distributional word vectors for LE. |
477 | Attention Is (not) All You Need for Commonsense Reasoning | Tassilo Klein, Moin Nabi, | In this paper, we describe a simple re-implementation of BERT for commonsense reasoning. |
478 | A Surprisingly Robust Trick for the Winograd Schema Challenge | Vid Kocijan, Ana-Maria Cretu, Oana-Maria Camburu, Yordan Yordanov, Thomas Lukasiewicz, | In this paper, we show that the performance of three language models on WSC273 consistently and robustly improves when fine-tuned on a similar pronoun disambiguation problem dataset (denoted WSCR). |
479 | Coherent Comments Generation for Chinese Articles with a Graph-to-Sequence Model | Wei Li, Jingjing Xu, Yancheng He, ShengLi Yan, Yunfang Wu, Xu Sun, | In this paper, we propose to generate comments with a graph-to-sequence model that models the input news as a topic interaction graph. |
480 | Interconnected Question Generation with Coreference Alignment and Conversation Flow Modeling | Yifan Gao, Piji Li, Irwin King, Michael R. Lyu, | We propose an end-to-end neural model with coreference alignment and conversation flow modeling. |
481 | Cross-Lingual Training for Automatic Question Generation | Vishwajeet Kumar, Nitish Joshi, Arijit Mukherjee, Ganesh Ramakrishnan, Preethi Jyothi, | We propose a cross-lingual QG model which uses the following training regime: (i) Unsupervised pretraining of language models in both primary and secondary languages and (ii) joint supervised training for QG in both languages. |
482 | A Hierarchical Reinforced Sequence Operation Method for Unsupervised Text Style Transfer | Chen Wu, Xuancheng Ren, Fuli Luo, Xu Sun, | To address these challenges, we propose a hierarchical reinforced sequence operation method, named Point-Then-Operate (PTO), which consists of a high-level agent that proposes operation positions and a low-level agent that alters the sentence. |
483 | Handling Divergent Reference Texts when Evaluating Table-to-Text Generation | Bhuwan Dhingra, Manaal Faruqui, Ankur Parikh, Ming-Wei Chang, Dipanjan Das, William Cohen, | We propose a new metric, PARENT, which aligns n-grams from the reference and generated texts to the semi-structured data before computing their precision and recall. |
484 | Unsupervised Question Answering by Cloze Translation | Patrick Lewis, Ludovic Denoyer, Sebastian Riedel, | In this work, we explore to what extent high quality training data is actually required for Extractive QA, and investigate the possibility of unsupervised Extractive QA. |
485 | MultiQA: An Empirical Investigation of Generalization and Transfer in Reading Comprehension | Alon Talmor, Jonathan Berant, | In this paper, we conduct such an investigation over ten RC datasets, training on one or more source RC datasets, and evaluating generalization, as well as transfer to a target RC dataset. |
486 | Simple and Effective Curriculum Pointer-Generator Networks for Reading Comprehension over Long Narratives | Yi Tay, Shuohang Wang, Anh Tuan Luu, Jie Fu, Minh C. Phan, Xingdi Yuan, Jinfeng Rao, Siu Cheung Hui, Aston Zhang, | We propose a curriculum learning (CL) based Pointer-Generator framework for reading/sampling over large documents, enabling diverse training of the neural model based on the notion of alternating contextual difficulty. |
487 | Explain Yourself! Leveraging Language Models for Commonsense Reasoning | Nazneen Fatema Rajani, Bryan McCann, Caiming Xiong, Richard Socher, | We collect human explanations for commonsense reasoning in the form of natural language sequences and highlighted annotations in a new dataset called Common Sense Explanations (CoS-E). We use CoS-E to train language models to automatically generate explanations that can be used during training and inference in a novel Commonsense Auto-Generated Explanation (CAGE) framework. |
488 | Interpretable Question Answering on Knowledge Bases and Text | Alona Sydorova, Nina Poerner, Benjamin Roth, | In this work, we address the interpretability of ML based question answering (QA) models on a combination of knowledge bases (KB) and text documents. |
489 | A Resource-Free Evaluation Metric for Cross-Lingual Word Embeddings Based on Graph Modularity | Yoshinari Fujinuma, Jordan Boyd-Graber, Michael J. Paul, | We measure this characteristic using modularity, a network measurement that measures the strength of clusters in a graph. |
490 | Multilingual and Cross-Lingual Graded Lexical Entailment | Ivan Vulić, Simone Paolo Ponzetto, Goran Glavaš, | In this paper, we present the first work on cross-lingual generalisation of the GR-LE relation. |
491 | What Kind of Language Is Hard to Language-Model? | Sebastian J. Mielke, Ryan Cotterell, Kyle Gorman, Brian Roark, Jason Eisner, | Methodologically, we introduce a new paired-sample multiplicative mixed-effects model to obtain language difficulty coefficients from at-least-pairwise parallel corpora. |
492 | Analyzing the Limitations of Cross-lingual Word Embedding Mappings | Aitor Ormazabal, Mikel Artetxe, Gorka Labaka, Aitor Soroa, Eneko Agirre, | We thus conclude that current mapping methods do have strong limitations, calling for further research to jointly learn cross-lingual embeddings with a weaker cross-lingual signal. |
493 | How Multilingual is Multilingual BERT? | Telmo Pires, Eva Schlinger, Dan Garrette, | In this paper, we show that Multilingual BERT (M-BERT), released by Devlin et al. (2018) as a single language model pre-trained from monolingual corpora in 104 languages, is surprisingly good at zero-shot cross-lingual model transfer, in which task-specific annotations in one language are used to fine-tune the model for evaluation in another language. |
494 | Bilingual Lexicon Induction through Unsupervised Machine Translation | Mikel Artetxe, Gorka Labaka, Eneko Agirre, | In this paper, we propose an alternative approach to this problem that builds on the recent work on unsupervised machine translation. |
495 | Automatically Identifying Complaints in Social Media | Daniel Preoţiuc-Pietro, Mihaela Gaman, Nikolaos Aletras, | In this paper, we introduce the first systematic analysis of complaints in computational linguistics. |
496 | TWEETQA: A Social Media Focused Question Answering Dataset | Wenhan Xiong, Jiawei Wu, Hong Wang, Vivek Kulkarni, Mo Yu, Shiyu Chang, Xiaoxiao Guo, William Yang Wang, | While previous datasets have concentrated on question answering (QA) for formal text like news and Wikipedia, we present the first large-scale dataset for QA over social media data. |
497 | Asking the Crowd: Question Analysis, Evaluation and Generation for Open Discussion on Online Forums | Zi Chai, Xinyu Xing, Xiaojun Wan, Bo Huang, | In this paper, we take the first step on teaching machines to ask open-answered questions from real-world news for open discussion (openQG). |
498 | Tree LSTMs with Convolution Units to Predict Stance and Rumor Veracity in Social Media Conversations | Sumeet Kumar, Kathleen Carley, | In this research, we propose a new way to represent social-media conversations as binarized constituency trees that allows comparing features in source-posts and their replies effectively. |
499 | HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization | Xingxing Zhang, Furu Wei, Ming Zhou, | Inspired by the recent work on pre-training transformer sentence encoders (Devlin et al., 2018), we propose HIBERT (as shorthand for HIerarchical Bidirectional Encoder Representations from Transformers) for document encoding and a method to pre-train it using unlabeled data. |
500 | Hierarchical Transformers for Multi-Document Summarization | Yang Liu, Mirella Lapata, | In this paper, we develop a neural summarization model which can effectively process multiple input documents and distill Transformer architecture with the ability to encode documents in a hierarchical manner. |
501 | Abstractive Text Summarization Based on Deep Learning and Semantic Content Generalization | Panagiotis Kouris, Georgios Alexandridis, Andreas Stafylopatis, | This work proposes a novel framework for enhancing abstractive text summarization based on the combination of deep learning techniques along with semantic data transformations. |
502 | Studying Summarization Evaluation Metrics in the Appropriate Scoring Range | Maxime Peyrard, | We show that, surprisingly, evaluation metrics which behave similarly on these datasets (average-scoring range) strongly disagree in the higher-scoring range in which current systems now operate. |
503 | Simple Unsupervised Summarization by Contextual Matching | Jiawei Zhou, Alexander Rush, | We propose an unsupervised method for sentence summarization using only language modeling. |
504 | Generating Summaries with Topic Templates and Structured Convolutional Decoders | Laura Perez-Beltrachini, Yang Liu, Mirella Lapata, | In this paper we propose a structured convolutional decoder that is guided by the content structure of target summaries. |
505 | Morphological Irregularity Correlates with Frequency | Shijie Wu, Ryan Cotterell, Timothy O’Donnell, | We present a study of morphological irregularity. |
506 | Like a Baby: Visually Situated Neural Language Acquisition | Alexander Ororbia, Ankur Mali, Matthew Kelly, David Reitter, | We examine the benefits of visual context in training neural language models to perform next-word prediction. |
507 | Relating Simple Sentence Representations in Deep Neural Networks and the Brain | Sharmistha Jat, Hao Tang, Partha Talukdar, Tom Mitchell, | We investigate these questions using sentences with simple syntax and semantics (e.g., The bone was eaten by the dog.) |
508 | Modeling Affirmative and Negated Action Processing in the Brain with Lexical and Compositional Semantic Models | Vesna Djokic, Jean Maillard, Luana Bulat, Ekaterina Shutova, | In this paper, we apply lexical and compositional semantic models to decode fMRI patterns associated with negated and affirmative sentences containing hand-action verbs. |
509 | Word-order Biases in Deep-agent Emergent Communication | Rahma Chaabouni, Eugene Kharitonov, Alessandro Lazaric, Emmanuel Dupoux, Marco Baroni, | We aim here to uncover which biases such models display with respect to “natural” word-order constraints. |
510 | NNE: A Dataset for Nested Named Entity Recognition in English Newswire | Nicky Ringland, Xiang Dai, Ben Hachey, Sarvnaz Karimi, Cecile Paris, James R. Curran, | We describe NNE, a fine-grained, nested named entity dataset over the full Wall Street Journal portion of the Penn Treebank (PTB). |
511 | Sequence-to-Nuggets: Nested Entity Mention Detection via Anchor-Region Networks | Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun, | In this paper, we propose to resolve this problem by modeling and leveraging the head-driven phrase structures of entity mentions, i.e., although a mention can nest other mentions, they will not share the same head word. |
512 | Improving Textual Network Embedding with Global Attention via Optimal Transport | Liqun Chen, Guoyin Wang, Chenyang Tao, Dinghan Shen, Pengyu Cheng, Xinyuan Zhang, Wenlin Wang, Yizhe Zhang, Lawrence Carin, | This work focuses on learning context-aware network embeddings augmented with text data. |
513 | Identification of Tasks, Datasets, Evaluation Metrics, and Numeric Scores for Scientific Leaderboards Construction | Yufang Hou, Charles Jochim, Martin Gleize, Francesca Bonin, Debasis Ganguly, | In this paper we build two datasets and develop a framework (TDMS-IE) aimed at automatically extracting task, dataset, metric and score from NLP papers, towards the automatic construction of leaderboards. |
514 | Scaling up Open Tagging from Tens to Thousands: Comprehension Empowered Attribute Value Extraction from Product Title | Huimin Xu, Wenting Wang, Xin Mao, Xinyu Jiang, Man Lan, | In this work, we propose a novel approach to support value extraction scaling up to thousands of attributes without losing performance: (1) We propose to regard attribute as a query and adopt only one global set of BIO tags for any attributes to reduce the burden of attribute tag or model explosion; (2) We explicitly model the semantic representations for attribute and title, and develop an attention mechanism to capture the interactive semantic relations in-between to enforce our framework to be attribute comprehensive. |
515 | Incorporating Linguistic Constraints into Keyphrase Generation | Jing Zhao, Yuxiang Zhang, | In this paper, we propose the parallel Seq2Seq network with the coverage attention to alleviate the overlapping phrase problem. |
516 | A Unified Multi-task Adversarial Learning Framework for Pharmacovigilance Mining | Shweta Yadav, Asif Ekbal, Sriparna Saha, Pushpak Bhattacharyya, | In this paper, we propose a neural network inspired multi- task learning framework that can simultaneously extract ADRs from various sources. |
517 | Quantity Tagger: A Latent-Variable Sequence Labeling Approach to Solving Addition-Subtraction Word Problems | Yanyan Zou, Wei Lu, | This work presents a novel approach, Quantity Tagger, that automatically discovers such hidden relations by tagging each quantity with a sign corresponding to one type of mathematical operation. |
518 | A Deep Reinforced Sequence-to-Set Model for Multi-Label Classification | Pengcheng Yang, Fuli Luo, Shuming Ma, Junyang Lin, Xu Sun, | To remedy this, we propose a simple but effective sequence-to-set model. |
519 | Joint Slot Filling and Intent Detection via Capsule Neural Networks | Chenwei Zhang, Yaliang Li, Nan Du, Wei Fan, Philip Yu, | To exploit the semantic hierarchy for effective modeling, we propose a capsule-based neural network model which accomplishes slot filling and intent detection via a dynamic routing-by-agreement schema. |
520 | Neural Aspect and Opinion Term Extraction with Mined Rules as Weak Supervision | Hongliang Dai, Yangqiu Song, | To alleviate this problem, we first propose an algorithm to automatically mine extraction rules from existing training examples based on dependency parsing results. The mined rules are then applied to label a large amount of auxiliary data. Finally, we study training procedures to train a neural model which can learn from both the data automatically labeled by the rules and a small amount of data accurately annotated by humans. |
521 | Cost-sensitive Regularization for Label Confusion-aware Event Detection | Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun, | To address this label confusion problem, this paper proposes cost-sensitive regularization, which can force the training procedure to concentrate more on optimizing confusing type pairs. |
522 | Exploring Pre-trained Language Models for Event Extraction and Generation | Sen Yang, Dawei Feng, Linbo Qiao, Zhigang Kan, Dongsheng Li, | To promote event extraction, we first propose an event extraction model to overcome the roles overlap problem by separating the argument prediction in terms of roles. Moreover, to address the problem of insufficient training data, we propose a method to automatically generate labeled data by editing prototypes and screen out generated samples by ranking the quality. |
523 | Improving Open Information Extraction via Iterative Rank-Aware Learning | Zhengbao Jiang, Pengcheng Yin, Graham Neubig, | We propose an additional binary classification loss to calibrate the likelihood to make it more globally comparable, and an iterative learning process, where extractions generated by the open IE model are incrementally included as training samples to help the model learn from trial and error. |
524 | Towards Improving Neural Named Entity Recognition with Gazetteers | Tianyu Liu, Jin-Ge Yao, Chin-Yew Lin, | In this work, we show that properly utilizing external gazetteers could benefit segmental neural NER models. |
525 | Span-Level Model for Relation Extraction | Kalpit Dixit, Yaser Al-Onaizan, | To address these concerns, we present a model which directly models all possible spans and performs joint entity mention detection and relation extraction. |
526 | Enhancing Unsupervised Generative Dependency Parser with Contextual Information | Wenjuan Han, Yong Jiang, Kewei Tu, | In this paper, we propose a novel probabilistic model called discriminative neural dependency model with valence (D-NDMV) that generates a sentence and its parse from a continuous latent representation, which encodes global contextual information of the generated sentence. |
527 | Neural Architectures for Nested NER through Linearization | Jana Straková, Milan Straka, Jan Hajič, | We propose two neural network architectures for nested named entity recognition (NER), a setting in which named entities may overlap and also be labeled with more than one label. |
528 | Online Infix Probability Computation for Probabilistic Finite Automata | Marco Cognetta, Yo-Sub Han, Soon Chan Kwon, | We develop an asymptotic improvement of that algorithm and solve the open problem of computing the infix probabilities of PFAs from streaming data, which is crucial when processing queries online and is the ultimate goal of the incremental approach. |
529 | How to Best Use Syntax in Semantic Role Labelling | Yufei Wang, Mark Johnson, Stephen Wan, Yifang Sun, Wei Wang, | We evaluate three different ways of encoding syntactic parses and three different ways of injecting them into a state-of-the-art neural ELMo-based SRL sequence labelling model. |
530 | PTB Graph Parsing with Tree Approximation | Yoshihide Kato, Shigeki Matsubara, | This paper proposes a method that approximates PTB graph-structured representations by trees. |
531 | Sequence Labeling Parsing by Learning across Representations | Michalina Strzyz, David Vilares, Carlos Gómez-Rodríguez, | We use parsing as sequence labeling as a common framework to learn across constituency and dependency syntactic abstractions. To do so, we cast the problem as multitask learning (MTL). |
532 | A Prism Module for Semantic Disentanglement in Name Entity Recognition | Kun Liu, Shen Li, Daqi Zheng, Zhengdong Lu, Sheng Gao, Si Li, | To solve this problem, we propose a prism module to disentangle the semantic aspects of words and reduce noise at the input layer of a model. |
533 | Label-Agnostic Sequence Labeling by Copying Nearest Neighbors | Sam Wiseman, Karl Stratos, | We show we can perform accurate sequence labeling by explicitly (and only) copying labels from retrieved neighbors. |
534 | Towards Empathetic Open-domain Conversation Models: A New Benchmark and Dataset | Hannah Rashkin, Eric Michael Smith, Margaret Li, Y-Lan Boureau, | This work proposes a new benchmark for empathetic dialogue generation and EmpatheticDialogues, a novel dataset of 25k conversations grounded in emotional situations. |
535 | Know More about Each Other: Evolving Dialogue Strategy via Compound Assessment | Siqi Bao, Huang He, Fan Wang, Rongzhong Lian, Hua Wu, | In this paper, a novel Generation-Evaluation framework is developed for multi-turn conversations with the objective of letting both participants know more about each other. |
536 | Training Neural Response Selection for Task-Oriented Dialogue Systems | Matthew Henderson, Ivan Vulić, Daniela Gerz, Iñigo Casanueva, Paweł Budzianowski, Sam Coope, Georgios Spithourakis, Tsung-Hsien Wen, Nikola Mrkšić, Pei-Hao Su, | Inspired by the recent success of pretraining in language modelling, we propose an effective method for deploying response selection in task-oriented dialogue. |
537 | Collaborative Dialogue in Minecraft | Anjali Narayan-Chen, Prashant Jayannavar, Julia Hockenmaier, | We present the Minecraft Dialogue Corpus, a collection of 509 conversations and game logs. |
538 | Neural Response Generation with Meta-words | Can Xu, Wei Wu, Chongyang Tao, Huang Hu, Matt Schuerman, Ying Wang, | We present open domain dialogue generation with meta-words. |
539 | Conversing by Reading: Contentful Neural Conversation with On-demand Machine Reading | Lianhui Qin, Michel Galley, Chris Brockett, Xiaodong Liu, Xiang Gao, Bill Dolan, Yejin Choi, Jianfeng Gao, | We present a new end-to-end approach to contentful neural conversation that jointly models response generation and on-demand machine reading. To support further research on knowledge-grounded conversation, we introduce a new large-scale conversation dataset grounded in external web pages (2.8M turns, 7.4M sentences of grounding). |
540 | Ordinal and Attribute Aware Response Generation in a Multimodal Dialogue System | Hardik Chauhan, Mauajama Firdaus, Asif Ekbal, Pushpak Bhattacharyya, | In this paper, we propose a novel position and attribute aware attention mechanism to learn enhanced image representation conditioned on the user utterance. |
541 | Memory Consolidation for Contextual Spoken Language Understanding with Dialogue Logistic Inference | He Bai, Yu Zhou, Jiajun Zhang, Chengqing Zong, | In this paper, we propose a new dialogue logistic inference (DLI) task to consolidate the context memory jointly with SLU in the multi-task framework. |
542 | Personalizing Dialogue Agents via Meta-Learning | Andrea Madotto, Zhaojiang Lin, Chien-Sheng Wu, Pascale Fung, | In this paper, we propose to extend Model-Agnostic Meta-Learning (MAML) (Finn et al., 2017) to personalized dialogue learning without using any persona descriptions. |
543 | Reading Turn by Turn: Hierarchical Attention Architecture for Spoken Dialogue Comprehension | Zhengyuan Liu, Nancy Chen, | Therefore, in this work, we propose a hierarchical attention neural network architecture, combining turn-level and word-level attention mechanisms, to improve spoken dialogue comprehension performance. |
544 | A Novel Bi-directional Interrelated Model for Joint Intent Detection and Slot Filling | Haihong E, Peiqing Niu, Zhongfu Chen, Meina Song, | In this paper, we propose a novel bi-directional interrelated model for joint intent detection and slot filling. |
545 | Dual Supervised Learning for Natural Language Understanding and Generation | Shang-Yu Su, Chao-Wei Huang, Yun-Nung Chen, | This paper proposes a novel learning framework for natural language understanding and generation on top of dual supervised learning, providing a way to exploit the duality. |
546 | SUMBT: Slot-Utterance Matching for Universal and Scalable Belief Tracking | Hwaran Lee, Jinsik Lee, Tae-Yoon Kim, | In this paper, we propose a new approach to universal and scalable belief tracker, called slot-utterance matching belief tracker (SUMBT). |
547 | Robust Zero-Shot Cross-Domain Slot Filling with Example Values | Darsh Shah, Raghav Gupta, Amir Fayazi, Dilek Hakkani-Tur, | We propose utilizing both the slot description and a small number of examples of slot values, which may be easily available, to learn semantic representations of slots which are transferable across domains and robust to misaligned schemas. |
548 | Deep Unknown Intent Detection with Margin Loss | Ting-En Lin, Hua Xu, | In this paper, we present a two-stage method for detecting unknown intents. |
549 | Modeling Semantic Relationship in Multi-turn Conversations with Hierarchical Latent Variables | Lei Shen, Yang Feng, Haolan Zhan, | To address this problem, we propose a Conversational Semantic Relationship RNN (CSRR) model to construct the dependency explicitly. |
550 | Rationally Reappraising ATIS-based Dialogue Systems | Jingcheng Niu, Gerald Penn, | This paper presents a detailed account of these shortcomings, our proposed repairs, our rule-based grammar and the neural slot-filling architectures associated with ATIS. |
551 | Learning Latent Trees with Stochastic Perturbations and Differentiable Dynamic Programming | Caio Corro, Ivan Titov, | We treat projective dependency trees as latent variables in our probabilistic model and induce them in such a way as to be beneficial for a downstream task, without relying on any direct tree supervision. |
552 | Neural-based Chinese Idiom Recommendation for Enhancing Elegance in Essay Writing | Yuanchao Liu, Bo Pang, Bingquan Liu, | In this study, we address the problem of idiom recommendation by leveraging a neural machine translation framework, in which we suppose that idioms are written with one pseudo target language. |
553 | Better Exploiting Latent Variables in Text Modeling | Canasai Kruengkrai, | We show that sampling latent variables multiple times at a gradient step helps in improving a variational autoencoder and propose a simple and effective method to better exploit these latent variables through hidden state averaging. |
554 | Misleading Failures of Partial-input Baselines | Shi Feng, Eric Wallace, Jordan Boyd-Graber, | We first design artificial datasets to illustrate how the trivial patterns that are only visible in the full input can evade any partial-input baseline. Next, we identify such artifacts in the SNLI dataset: a hypothesis-only model augmented with trivial patterns in the premise can solve 15% of previously-thought "hard" examples. |
555 | Soft Contextual Data Augmentation for Neural Machine Translation | Fei Gao, Jinhua Zhu, Lijun Wu, Yingce Xia, Tao Qin, Xueqi Cheng, Wengang Zhou, Tie-Yan Liu, | In this paper, we present a novel data augmentation method for neural machine translation. Different from previous augmentation methods that randomly drop, swap or replace words with other words in a sentence, we softly augment a randomly chosen word in a sentence by its contextual mixture of multiple related words. |
556 | Reversing Gradients in Adversarial Domain Adaptation for Question Deduplication and Textual Entailment Tasks | Anush Kamath, Sparsh Gupta, Vitor Carvalho, | Here we investigate the use of gradient reversal on adversarial domain adaptation to explicitly learn both shared and unshared (domain specific) representations between two textual domains. |
557 | Towards Integration of Statistical Hypothesis Tests into Deep Neural Networks | Ahmad Aghaebrahimian, Mark Cieliebak, | We report our ongoing work about a new deep architecture working in tandem with a statistical test procedure for jointly training texts and their label descriptions for multi-label and multi-class classification tasks. |
558 | Depth Growing for Neural Machine Translation | Lijun Wu, Yiren Wang, Yingce Xia, Fei Tian, Fei Gao, Tao Qin, Jianhuang Lai, Tie-Yan Liu, | In this work, we propose an effective two-stage approach with three specially designed components to construct deeper NMT models, which result in significant improvements over the strong Transformer baselines on WMT14 English→German and English→French translation tasks. |
559 | Generating Fluent Adversarial Examples for Natural Languages | Huangzhao Zhang, Hao Zhou, Ning Miao, Lei Li, | In this paper, we propose MHA, which addresses both problems by performing Metropolis-Hastings sampling, whose proposal is designed with the guidance of gradients. |
560 | Towards Explainable NLP: A Generative Explanation Framework for Text Classification | Hui Liu, Qingyu Yin, William Yang Wang, | To solve this problem, we propose a novel generative explanation framework that learns to make classification decisions and generate fine-grained explanations at the same time. |
561 | Combating Adversarial Misspellings with Robust Word Recognition | Danish Pruthi, Bhuwan Dhingra, Zachary C. Lipton, | To combat adversarial spelling mistakes, we propose placing a word recognition model in front of the downstream classifier. |
562 | An Empirical Investigation of Structured Output Modeling for Graph-based Neural Dependency Parsing | Zhisong Zhang, Xuezhe Ma, Eduard Hovy, | In this paper, we investigate the aspect of structured output modeling for the state-of-the-art graph-based neural dependency parser (Dozat and Manning, 2017). |
563 | Observing Dialogue in Therapy: Categorizing and Forecasting Behavioral Codes | Jie Cao, Michael Tanana, Zac Imel, Eric Poitras, David Atkins, Vivek Srikumar, | In this paper, we study modeling behavioral codes used to assess a psychotherapy treatment style called Motivational Interviewing (MI), which is effective for addressing substance abuse and related problems. |
564 | Multimodal Transformer Networks for End-to-End Video-Grounded Dialogue Systems | Hung Le, Doyen Sahoo, Nancy Chen, Steven Hoi, | To overcome this, we propose Multimodal Transformer Networks (MTN) to encode videos and incorporate information from different modalities. |
565 | Target-Guided Open-Domain Conversation | Jianheng Tang, Tiancheng Zhao, Chenyan Xiong, Xiaodan Liang, Eric Xing, Zhiting Hu, | We propose a structured approach that introduces coarse-grained keywords to control the intended content of system responses. |
566 | Persuasion for Good: Towards a Personalized Persuasive Dialogue System for Social Good | Xuewei Wang, Weiyan Shi, Richard Kim, Yoojung Oh, Sijia Yang, Jingwen Zhang, Zhou Yu, | We designed an online persuasion task where one participant was asked to persuade the other to donate to a specific charity. We collected a large dataset with 1,017 dialogues and annotated emerging persuasion strategies from a subset. |
567 | Improving Neural Conversational Models with Entropy-Based Data Filtering | Richárd Csáky, Patrik Purgai, Gábor Recski, | While previous methods for improving the quality of open-domain response generation focused on either the underlying model or the training objective, we present a method of filtering dialog datasets by removing generic utterances from training data using a simple entropy-based approach that does not require human supervision. |
568 | Zero-shot Word Sense Disambiguation using Sense Definition Embeddings | Sawan Kumar, Sharmistha Jat, Karan Saxena, Partha Talukdar, | To overcome this challenge, we propose Extended WSD Incorporating Sense Embeddings (EWISE), a supervised model to perform WSD by predicting over a continuous sense embedding space as opposed to a discrete label space. |
569 | Language Modelling Makes Sense: Propagating Representations through WordNet for Full-Coverage Word Sense Disambiguation | Daniel Loureiro, Alípio Jorge, | In this work, we show that contextual embeddings can be used to achieve unprecedented gains in Word Sense Disambiguation (WSD) tasks. |
570 | Word2Sense: Sparse Interpretable Word Embeddings | Abhishek Panigrahi, Harsha Vardhan Simhadri, Chiranjib Bhattacharyya, | We present an unsupervised method to generate Word2Sense word embeddings that are interpretable – each dimension of the embedding space corresponds to a fine-grained sense, and the non-negative value of the embedding along the j-th dimension represents the relevance of the j-th sense to the word. |
571 | Modeling Semantic Compositionality with Sememe Knowledge | Fanchao Qi, Junjie Huang, Chenghao Yang, Zhiyuan Liu, Xiao Chen, Qun Liu, Maosong Sun, | In this paper, we verify the effectiveness of sememes, the minimum semantic units of human languages, in modeling SC by a confirmatory experiment. |
572 | Predicting Humorousness and Metaphor Novelty with Gaussian Process Preference Learning | Edwin Simpson, Erik-Lân Do Dinh, Tristan Miller, Iryna Gurevych, | We introduce a Bayesian approach for predicting humorousness and metaphor novelty using Gaussian process preference learning (GPPL), which achieves a Spearman's ρ of 0.56 against gold using word embeddings and linguistic features. |
573 | Empirical Linguistic Study of Sentence Embeddings | Katarzyna Krasnowska-Kieraś, Alina Wróblewska, | The purpose of the research is to answer the question whether linguistic information is retained in vector representations of sentences. |
574 | Probing for Semantic Classes: Diagnosing the Meaning Content of Word Embeddings | Yadollah Yaghoobzadeh, Katharina Kann, T. J. Hazen, Eneko Agirre, Hinrich Schütze, | We present a large dataset based on manual Wikipedia annotations and word senses, where word senses from different words are related by semantic classes. |
575 | Deep Neural Model Inspection and Comparison via Functional Neuron Pathways | James Fiacco, Samridhi Choudhary, Carolyn Rose, | We introduce a general method for the interpretation and comparison of neural models. |
576 | Collocation Classification with Unsupervised Relation Vectors | Luis Espinosa Anke, Steven Schockaert, Leo Wanner, | In this paper, we explore to which extent the current distributional landscape based on word embeddings provides a suitable basis for classification of collocations, i.e., pairs of words between which idiosyncratic lexical relations hold. |
577 | Corpus-based Check-up for Thesaurus | Natalia Loukachevitch, | In this paper we discuss the usefulness of applying a checking procedure to existing thesauri. |
578 | Confusionset-guided Pointer Networks for Chinese Spelling Check | Dingmin Wang, Yi Tay, Li Zhong, | This paper proposes Confusionset-guided Pointer Networks for Chinese Spell Check (CSC) task. |
579 | Generalized Data Augmentation for Low-Resource Translation | Mengzhou Xia, Xiang Kong, Antonios Anastasopoulos, Graham Neubig, | In this paper, we propose a general framework of data augmentation for low-resource machine translation not only using target-side monolingual data, but also by pivoting through a related high-resource language. |
580 | Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned | Elena Voita, David Talbot, Fedor Moiseev, Rico Sennrich, Ivan Titov, | In this work we evaluate the contribution made by individual attention heads to the overall performance of the model and analyze the roles played by them in the encoder. |
581 | Better OOV Translation with Bilingual Terminology Mining | Matthias Huck, Viktor Hangya, Alexander Fraser, | We improve the translation of OOVs in NMT using easy-to-obtain monolingual data. |
582 | Simultaneous Translation with Flexible Policy via Restricted Imitation Learning | Baigong Zheng, Renjie Zheng, Mingbo Ma, Liang Huang, | We propose a much simpler single model that adds a “delay” token to the target vocabulary, and design a restricted dynamic oracle to greatly simplify training. |
583 | Target Conditioned Sampling: Optimizing Data Selection for Multilingual Neural Machine Translation | Xinyi Wang, Graham Neubig, | In this paper, we seek to construct a sampling distribution over all multilingual data, so that it minimizes the training loss of the low-resource language. |
584 | Adversarial Learning of Privacy-Preserving Text Representations for De-Identification of Medical Records | Max Friedrich, Arne Köhn, Gregor Wiedemann, Chris Biemann, | We introduce a method to create privacy-preserving shareable representations of medical text (i.e. they contain no PHI) that does not require expensive manual pseudonymization. |
585 | Merge and Label: A Novel Neural Network Architecture for Nested NER | Joseph Fisher, Andreas Vlachos, | In this paper we introduce a novel neural network architecture that first merges tokens and/or entities into entities forming nested structures, and then labels each of them independently. |
586 | Low-resource Deep Entity Resolution with Transfer and Active Learning | Jungo Kasai, Kun Qian, Sairam Gurajada, Yunyao Li, Lucian Popa, | In this paper, we develop a deep learning-based method that targets low-resource settings for ER through a novel combination of transfer learning and active learning. |
587 | A Semi-Markov Structured Support Vector Machine Model for High-Precision Named Entity Recognition | Ravneet Arora, Chen-Tse Tsai, Ketevan Tsereteli, Prabhanjan Kambadur, Yi Yang, | In this paper, we propose a neural semi-Markov structured support vector machine model that controls the precision-recall trade-off by assigning weights to different types of errors in the loss-augmented inference during training. |
588 | Using Human Attention to Extract Keyphrase from Microblog Post | Yingyi Zhang, Chengzhi Zhang, | Thus, this paper aims to integrate human attention into keyphrase extraction models. |
589 | Model-Agnostic Meta-Learning for Relation Classification with Limited Supervision | Abiola Obamuyide, Andreas Vlachos, | In this paper we frame the task of supervised relation classification as an instance of meta-learning. |
590 | Variational Pretraining for Semi-supervised Text Classification | Suchin Gururangan, Tam Dang, Dallas Card, Noah A. Smith, | We introduce VAMPIRE, a lightweight pretraining framework for effective text classification when data and computing resources are limited. |
591 | Task Refinement Learning for Improved Accuracy and Stability of Unsupervised Domain Adaptation | Yftah Ziser, Roi Reichart, | In this paper we propose a Task Refinement Learning (TRL) approach, in order to solve these problems. |
592 | Optimal Transport-based Alignment of Learned Character Representations for String Similarity | Derek Tam, Nicholas Monath, Ari Kobren, Aaron Traylor, Rajarshi Das, Andrew McCallum, | In this work, we present STANCE, a learned model for computing the similarity of two strings. |
593 | The Referential Reader: A Recurrent Entity Network for Anaphora Resolution | Fei Liu, Luke Zettlemoyer, Jacob Eisenstein, | We present a new architecture for storing and accessing entity mentions during online text processing. |
594 | Interpolated Spectral NGram Language Models | Ariadna Quattoni, Xavier Carreras, | In this work we employ a technique for scaling up spectral learning, and use interpolated predictions that are optimized to maximize perplexity. |
595 | BAM! Born-Again Multi-Task Networks for Natural Language Understanding | Kevin Clark, Minh-Thang Luong, Urvashi Khandelwal, Christopher D. Manning, Quoc V. Le, | To help address this, we propose using knowledge distillation where single-task models teach a multi-task model. |
596 | Curate and Generate: A Corpus and Method for Joint Control of Semantics and Style in Neural NLG | Shereen Oraby, Vrindavan Harrison, Abteen Ebrahimi, Marilyn Walker, | We present YelpNLG, a corpus of 300,000 rich, parallel meaning representations and highly stylistically varied reference texts spanning different restaurant attributes, and describe a novel methodology that can be scalably reused to generate NLG datasets for other domains. |
597 | Automated Chess Commentator Powered by Neural Chess Engine | Hongyu Zang, Zhiwei Yu, Xiaojun Wan, | In this paper, we explore a new approach for automated chess commentary generation, which aims to generate chess commentary texts in different categories (e.g., description, comparison, planning, etc.). |
598 | Barack’s Wife Hillary: Using Knowledge Graphs for Fact-Aware Language Modeling | Robert Logan, Nelson F. Liu, Matthew E. Peters, Matt Gardner, Sameer Singh, | To address this, we introduce the knowledge graph language model (KGLM), a neural language model with mechanisms for selecting and copying facts from a knowledge graph that are relevant to the context. |
599 | Controllable Paraphrase Generation with a Syntactic Exemplar | Mingda Chen, Qingming Tang, Sam Wiseman, Kevin Gimpel, | In this work, we propose a novel task, where the syntax of a generated sentence is controlled rather by a sentential exemplar. To evaluate quantitatively with standard metrics, we create a novel dataset with human annotations. |
600 | Towards Comprehensive Description Generation from Factual Attribute-value Tables | Tianyu Liu, Fuli Luo, Pengcheng Yang, Wei Wu, Baobao Chang, Zhifang Sui, | To relieve these problems, we first propose force attention (FA) method to encourage the generator to pay more attention to the uncovered attributes to avoid potential key attributes missing. Furthermore, we propose reinforcement learning for information richness to generate more informative as well as more loyal descriptions for tables. |
601 | Style Transformer: Unpaired Text Style Transfer without Disentangled Latent Representation | Ning Dai, Jianze Liang, Xipeng Qiu, Xuanjing Huang, | In this paper, we propose the Style Transformer, which makes no assumption about the latent representation of source sentence and equips the power of attention mechanism in Transformer to achieve better style transfer and better content preservation. |
602 | Generating Sentences from Disentangled Syntactic and Semantic Spaces | Yu Bao, Hao Zhou, Shujian Huang, Lei Li, Lili Mou, Olga Vechtomova, Xin-yu Dai, Jiajun Chen, | In this paper, we propose to generate sentences from disentangled syntactic and semantic spaces. |
603 | Learning to Control the Fine-grained Sentiment for Story Ending Generation | Fuli Luo, Damai Dai, Pengcheng Yang, Tianyu Liu, Baobao Chang, Zhifang Sui, Xu Sun, | Therefore, we propose a generic and novel framework which consists of a sentiment analyzer and a sentimental generator, respectively addressing the two challenges. |
604 | Self-Attention Architectures for Answer-Agnostic Neural Question Generation | Thomas Scialom, Benjamin Piwowarski, Jacopo Staiano, | We explore how Transformers can be adapted to the task of Neural Question Generation without constraining the model to focus on a specific answer passage. |
605 | Unsupervised Paraphrasing without Translation | Aurko Roy, David Grangier, | This work proposes to learn paraphrasing models only from a monolingual corpus. |
606 | Storyboarding of Recipes: Grounded Contextual Generation | Khyathi Chandu, Eric Nyberg, Alan W Black, | We introduce a dataset for sequential procedural (how-to) text generation from images in the cooking domain. |
607 | Negative Lexically Constrained Decoding for Paraphrase Generation | Tomoyuki Kajiwara, | To solve this problem, we propose a neural model for paraphrase generation that first identifies words in the source sentence that should be paraphrased. |
608 | Large-Scale Transfer Learning for Natural Language Generation | Sergey Golovanov, Rauf Kurbanov, Sergey Nikolenko, Kyryl Truskovskyi, Alexander Tselousov, Thomas Wolf, | We focus in particular on open-domain dialog as a typical high-entropy generation task, presenting and comparing different architectures for adapting pretrained models with state-of-the-art results. |
609 | Automatic Grammatical Error Correction for Sequence-to-sequence Text Generation: An Empirical Study | Tao Ge, Xingxing Zhang, Furu Wei, Ming Zhou, | In this paper, we present a preliminary empirical study on whether and how much automatic grammatical error correction can help improve seq2seq text generation. |
610 | Improving the Robustness of Question Answering Systems to Question Paraphrasing | Wee Chung Gan, Hwee Tou Ng, | Using a neural paraphrasing model trained to generate multiple paraphrased questions for a given source question and a set of paraphrase suggestions, we propose a data augmentation approach that requires no human intervention to re-train the models for improved robustness to question paraphrasing. |
611 | RankQA: Neural Question Answering with Answer Re-Ranking | Bernhard Kratzwald, Anna Eigenmann, Stefan Feuerriegel, | In contrast, this work proposes RankQA: RankQA extends the conventional two-stage process in neural QA with a third stage that performs an additional answer re-ranking. |
612 | Latent Retrieval for Weakly Supervised Open Domain Question Answering | Kenton Lee, Ming-Wei Chang, Kristina Toutanova, | We show for the first time that it is possible to jointly learn the retriever and reader from question-answer string pairs and without any IR system. |
613 | Multi-hop Reading Comprehension through Question Decomposition and Rescoring | Sewon Min, Victor Zhong, Luke Zettlemoyer, Hannaneh Hajishirzi, | We propose a system for multi-hop RC that decomposes a compositional question into simpler sub-questions that can be answered by off-the-shelf single-hop RC models. |
614 | Combining Knowledge Hunting and Neural Language Models to Solve the Winograd Schema Challenge | Ashok Prakash, Arpit Sharma, Arindam Mitra, Chitta Baral, | In this work, we build upon the language model based methods and augment them with a commonsense knowledge hunting (using automatic extraction from text) module and an explicit reasoning module. |
615 | Careful Selection of Knowledge to Solve Open Book Question Answering | Pratyay Banerjee, Kuntal Kumar Pal, Arindam Mitra, Chitta Baral, | In this paper we address QA with respect to the OpenBookQA dataset and combine state-of-the-art language models with abductive information retrieval (IR), information gain based re-ranking, passage selection and weighted scoring to achieve 72.0% accuracy, an 11.6% improvement over the current state of the art. |
616 | Learning Representation Mapping for Relation Detection in Knowledge Base Question Answering | Peng Wu, Shujian Huang, Rongxiang Weng, Zaixiang Zheng, Jianbing Zhang, Xiaohui Yan, Jiajun Chen, | In this paper, we propose a simple mapping method, named representation adapter, to learn the representation mapping for both seen and unseen relations based on previously learned relation embedding. |
617 | Dynamically Fused Graph Network for Multi-hop Reasoning | Lin Qiu, Yunxuan Xiao, Yanru Qu, Hao Zhou, Lei Li, Weinan Zhang, Yong Yu, | In this paper, we propose Dynamically Fused Graph Network (DFGN), a novel method to answer those questions requiring multiple scattered evidence and reasoning over them. |
618 | NLProlog: Reasoning with Weak Unification for Question Answering in Natural Language | Leon Weber, Pasquale Minervini, Jannes Münchmeyer, Ulf Leser, Tim Rocktäschel, | In this paper, we describe a model combining neural networks with logic programming in a novel manner for solving multi-hop reasoning tasks over natural language. |
619 | Modeling Intra-Relation in Math Word Problems with Different Functional Multi-Head Attentions | Jierui Li, Lei Wang, Jipeng Zhang, Yan Wang, Bing Tian Dai, Dongxiang Zhang, | To utilize the merits of deep learning models with simultaneous consideration of MWPs’ specific features, we propose a group attention mechanism to extract global features, quantity-related features, quantity-pair features and question-related features in MWPs respectively. |
620 | Synthetic QA Corpora Generation with Roundtrip Consistency | Chris Alberti, Daniel Andor, Emily Pitler, Jacob Devlin, Michael Collins, | We introduce a novel method of generating synthetic question answering corpora by combining models of question generation and answer extraction, and by filtering the results to ensure roundtrip consistency. |
621 | Are Red Roses Red? Evaluating Consistency of Question-Answering Models | Marco Tulio Ribeiro, Carlos Guestrin, Sameer Singh, | We propose a method to automatically extract such implications for instances from two QA datasets, VQA and SQuAD, which we then use to evaluate the consistency of models. |
622 | MC^2: Multi-perspective Convolutional Cube for Conversational Machine Reading Comprehension | Xuanyu Zhang, | To comprehend context profoundly and efficiently from different perspectives, we propose a novel neural network model, Multi-perspective Convolutional Cube (MC^2). |
623 | Reducing Word Omission Errors in Neural Machine Translation: A Contrastive Learning Approach | Zonghan Yang, Yong Cheng, Yang Liu, Maosong Sun, | In this work, we propose a contrastive learning approach to reducing word omission errors in NMT. |
624 | Exploiting Sentential Context for Neural Machine Translation | Xing Wang, Zhaopeng Tu, Longyue Wang, Shuming Shi, | In this work, we present novel approaches to exploit sentential context for neural machine translation (NMT). |
625 | Wetin dey with these comments? Modeling Sociolinguistic Factors Affecting Code-switching Behavior in Nigerian Online Discussions | Innocent Ndubuisi-Obi, Sayan Ghosh, David Jurgens, | We introduce a new corpus of 330K articles and accompanying 389K comments labeled for code switching behavior. |
626 | Accelerating Sparse Matrix Operations in Neural Networks on Graphics Processing Units | Arturo Argueta, David Chiang, | We present two new GPU algorithms: one at the input layer, for multiplying a matrix by a few-hot vector (generalizing the more common operation of multiplication by a one-hot vector) and one at the output layer, for a fused softmax and top-N selection (commonly used in beam search). |
627 | An Automated Framework for Fast Cognate Detection and Bayesian Phylogenetic Inference in Computational Historical Linguistics | Taraka Rama, Johann-Mattis List, | We present a fully automated workflow for phylogenetic reconstruction on large datasets, consisting of two novel methods, one for fast detection of cognates and one for fast Bayesian phylogenetic inference. |
628 | Sentence Centrality Revisited for Unsupervised Summarization | Hao Zheng, Mirella Lapata, | In this paper we develop an unsupervised approach arguing that it is unrealistic to expect large-scale and high-quality training data to be available or created for different types of summaries, domains, or languages. |
629 | Discourse Representation Parsing for Sentences and Documents | Jiangming Liu, Shay B. Cohen, Mirella Lapata, | We introduce a novel semantic parsing task based on Discourse Representation Theory (DRT; Kamp and Reyle 1993). |
630 | Inducing Document Structure for Aspect-based Summarization | Lea Frermann, Alexandre Klementiev, | We tackle the task of aspect-based summarization, where, given a document and a target aspect, our models generate a summary centered around the aspect. |
631 | Incorporating Priors with Feature Attribution on Text Classification | Frederick Liu, Besim Avci, | Our approach integrates feature attributions into the objective function to allow machine learning practitioners to incorporate priors in model building. |
632 | Matching Article Pairs with Graphical Decomposition and Convolutions | Bang Liu, Di Niu, Haojie Wei, Jinghong Lin, Yancheng He, Kunfeng Lai, Yu Xu, | To model article pairs, we propose the Concept Interaction Graph to represent an article as a graph of concepts. To facilitate the evaluation of long article matching, we have created two datasets, each consisting of about 30K pairs of breaking news articles covering diverse topics in the open domain. |
633 | Hierarchical Transfer Learning for Multi-label Text Classification | Siddhartha Banerjee, Cem Akkaya, Francisco Perez-Sorrosal, Kostas Tsioutsiouliklis, | We propose a novel transfer learning based strategy, HTrans, where binary classifiers at lower levels in the hierarchy are initialized using parameters of the parent classifier and fine-tuned on the child category classification task. |
634 | Bias Analysis and Mitigation in the Evaluation of Authorship Verification | Janek Bevendorff, Matthias Hagen, Benno Stein, Martin Potthast, | In this paper we review, theoretically and practically, the authorship verification task and conclude that the underlying experiment design cannot guarantee pushing forward the state of the art; in fact, it allows for top benchmarking with a surprisingly straightforward approach. We pinpoint these sources in the evaluation chain and present a refined authorship corpus as an effective countermeasure. |
635 | Numeracy-600K: Learning Numeracy for Detecting Exaggerated Information in Market Comments | Chung-Chi Chen, Hen-Hsen Huang, Hiroya Takamura, Hsin-Hsi Chen, | In this paper, we attempt to answer the question of whether neural network models can learn numeracy, which is the ability to predict the magnitude of a numeral at some specific position in a text description. |
636 | Large-Scale Multi-Label Text Classification on EU Legislation | Ilias Chalkidis, Emmanouil Fergadiotis, Prodromos Malakasiotis, Ion Androutsopoulos, | We consider Large-Scale Multi-Label Text Classification (LMTC) in the legal domain. |
637 | Why Didn’t You Listen to Me? Comparing User Control of Human-in-the-Loop Topic Models | Varun Kumar, Alison Smith-Renner, Leah Findlater, Kevin Seppi, Jordan Boyd-Graber, | Users should have a sense of control in HLTM systems, so we propose a control metric to measure whether refinement operations’ results match users’ expectations. |
638 | Encouraging Paragraph Embeddings to Remember Sentence Identity Improves Classification | Tu Vu, Mohit Iyyer, | In this paper, we investigate a state-of-the-art paragraph embedding method proposed by Zhang et al. (2017) and discover that it cannot reliably tell whether a given sentence occurs in the input paragraph or not. |
639 | A Multi-Task Architecture on Relevance-based Neural Query Translation | Sheikh Muhammad Sarwar, Hamed Bonab, James Allan, | We describe a multi-task learning approach to train a Neural Machine Translation (NMT) model with a Relevance-based Auxiliary Task (RAT) for search query translation. |
640 | Topic Modeling with Wasserstein Autoencoders | Feng Nan, Ran Ding, Ramesh Nallapati, Bing Xiang, | We propose a novel neural topic model in the Wasserstein autoencoders (WAE) framework. |
641 | Dense Procedure Captioning in Narrated Instructional Videos | Botian Shi, Lei Ji, Yaobo Liang, Nan Duan, Peng Chen, Zhendong Niu, Ming Zhou, | Motivated by video dense captioning, we propose a model to generate procedure captions from narrated instructional videos, which are sequences of step-wise clips with descriptions. |
642 | Latent Variable Model for Multi-modal Translation | Iacer Calixto, Miguel Rios, Wilker Aziz, | In this work, we propose to model the interaction between visual and textual features for multi-modal neural machine translation (MMT) through a latent variable model. |
643 | Identifying Visible Actions in Lifestyle Vlogs | Oana Ignat, Laura Burdick, Jia Deng, Rada Mihalcea, | We construct a dataset with crowdsourced manual annotations of visible actions, and introduce a multimodal algorithm that leverages information derived from visual and linguistic clues to automatically infer which actions are visible in a video. |
644 | A Corpus for Reasoning about Natural Language Grounded in Photographs | Alane Suhr, Stephanie Zhou, Ally Zhang, Iris Zhang, Huajun Bai, Yoav Artzi, | We introduce a new dataset for joint reasoning about natural language and images, with a focus on semantic diversity, compositionality, and visual reasoning challenges. |
645 | Learning to Discover, Ground and Use Words with Segmental Neural Language Models | Kazuya Kawakami, Chris Dyer, Phil Blunsom, | We propose a segmental neural language model that combines the generalization power of neural networks with the ability to discover word-like units that are latent in unsegmented character sequences. |
646 | What Should I Ask? Using Conversationally Informative Rewards for Goal-oriented Visual Dialog. | Pushkar Shukla, Carlos Elmadjian, Richika Sharan, Vivek Kulkarni, Matthew Turk, William Yang Wang, | In this work, we focus on the task of goal-oriented visual dialogue, aiming to automatically generate a series of questions about an image with a single objective. |
647 | Symbolic Inductive Bias for Visually Grounded Learning of Spoken Language | Grzegorz Chrupała, | We propose to use multitask learning to exploit existing transcribed speech within the end-to-end setting. |
648 | Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog | Zhe Gan, Yu Cheng, Ahmed Kholy, Linjie Li, Jingjing Liu, Jianfeng Gao, | This paper presents a new model for visual dialog, Recurrent Dual Attention Network (ReDAN), using multi-step reasoning to answer a series of questions about an image. |
649 | Lattice Transformer for Speech Translation | Pei Zhang, Niyu Ge, Boxing Chen, Kai Fan, | The goal of this work is to extend the attention mechanism of the transformer to naturally consume the lattice in addition to the traditional sequential input. |
650 | Informative Image Captioning with External Sources of Information | Sanqiang Zhao, Piyush Sharma, Tomer Levinboim, Radu Soricut, | We introduce a multimodal, multi-encoder model based on Transformer that ingests both image features and multiple sources of entity labels. |
651 | CoDraw: Collaborative Drawing as a Testbed for Grounded Goal-driven Communication | Jin-Hwa Kim, Nikita Kitaev, Xinlei Chen, Marcus Rohrbach, Byoung-Tak Zhang, Yuandong Tian, Dhruv Batra, Devi Parikh, | In this work, we propose a goal-driven collaborative task that combines language, perception, and action. We collect the CoDraw dataset of ~10K dialogs consisting of ~138K messages exchanged between human players. |
652 | Bridging by Word: Image Grounded Vocabulary Construction for Visual Captioning | Zhihao Fan, Zhongyu Wei, Siyuan Wang, Xuanjing Huang, | To tackle this problem, we propose to construct an image-grounded vocabulary, based on which, captions are generated with limitation and guidance. |
653 | Distilling Translations with Visual Awareness | Julia Ive, Pranava Madhyastha, Lucia Specia, | We propose a translate-and-refine approach to this problem where images are only used by a second stage decoder. |
654 | VIFIDEL: Evaluating the Visual Fidelity of Image Descriptions | Pranava Madhyastha, Josiah Wang, Lucia Specia, | We propose a novel image-aware metric for this task: VIFIDEL. |
655 | Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation | Ronghang Hu, Daniel Fried, Anna Rohrbach, Dan Klein, Trevor Darrell, Kate Saenko, | To better use all the available modalities, we propose to decompose the grounding procedure into a set of expert models with access to different modalities (including object detections) and ensemble them at prediction time, improving the performance of state-of-the-art models on the VLN task. |
656 | Multimodal Transformer for Unaligned Multimodal Language Sequences | Yao-Hung Hubert Tsai, Shaojie Bai, Paul Pu Liang, J. Zico Kolter, Louis-Philippe Morency, Ruslan Salakhutdinov, | In this paper, we introduce the Multimodal Transformer (MulT) to generically address the above issues in an end-to-end manner without explicitly aligning the data. |
657 | Show, Describe and Conclude: On Exploiting the Structure Information of Chest X-ray Reports | Baoyu Jing, Zeya Wang, Eric Xing, | In this work, we propose a novel framework which exploits the structure information between and within report sections for generating CXR imaging reports. |
658 | Visual Story Post-Editing | Ting-Yao Hsu, Chieh-Yang Huang, Yen-Chia Hsu, Ting-Hao Huang, | We introduce the first dataset for human edits of machine-generated visual stories and explore how these collected edits may be used for the visual story post-editing task. |
659 | Multimodal Abstractive Summarization for How2 Videos | Shruti Palaskar, Jindřich Libovický, Spandana Gella, Florian Metze, | In this paper, we study abstractive summarization for open-domain videos. |
660 | Learning to Relate from Captions and Bounding Boxes | Sarthak Garg, Joel Ruben Antony Moniz, Anshu Aviral, Priyatham Bollimpalli, | In this work, we propose a novel approach that predicts the relationships between various entities in an image in a weakly supervised manner by relying on image captions and object bounding box annotations as the sole source of supervision. |