Paper Digest: NAACL 2019 Highlights
Download NAACL-2019-Industry-Paper-Digests.pdf – highlights of all 28 NAACL 2019 industry track papers (PDF file size is ~0.2 MB).
The North American Chapter of the Association for Computational Linguistics (NAACL) is one of the top natural language processing conferences in the world. In 2019, it was held in Minneapolis, MN. There were ~2,000 paper submissions, of which 424 were accepted. In addition, 28 industry papers were also accepted.
To help the AI community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights to quickly get the main idea of each paper.
We thank all authors for writing these interesting papers, and readers for reading our digests. If you do not want to miss any interesting AI paper, you are welcome to sign up for our free paper digest service to get new paper updates customized to your own interests on a daily basis.
Paper Digest Team
team@paperdigest.org
TABLE 1: NAACL 2019 Papers
No. | Title | Authors | Highlight |
---|---|---|---|
1 | Entity Recognition at First Sight: Improving NER with Eye Movement Information | Nora Hollenstein, Ce Zhang, | In this work, we leverage eye movement features from three corpora with recorded gaze information to augment a state-of-the-art neural model for named entity recognition (NER) with gaze embeddings. |
2 | The emergence of number and syntax units in LSTM language models | Yair Lakretz, Germán Kruszewski, Théo Desbordes, Dieuwke Hupkes, Stanislas Dehaene, Marco Baroni, | We present here a detailed study of the inner mechanics of number tracking in LSTMs at the single neuron level. |
3 | Neural Self-Training through Spaced Repetition | Hadi Amiri, | In this work, we tackle the above challenges by introducing a new data sampling technique based on spaced repetition that dynamically samples informative and diverse unlabeled instances with respect to individual learner and instance characteristics. |
4 | Neural language models as psycholinguistic subjects: Representations of syntactic state | Richard Futrell, Ethan Wilcox, Takashi Morita, Peng Qian, Miguel Ballesteros, Roger Levy, | We investigate the extent to which the behavior of neural network language models reflects incremental representations of syntactic state. |
5 | Understanding language-elicited EEG data by predicting it from a fine-tuned language model | Dan Schwartz, Tom Mitchell, | We take a step towards better understanding the ERPs by finetuning a language model to predict them. |
6 | Pre-training on high-resource speech recognition improves low-resource speech-to-text translation | Sameer Bansal, Herman Kamper, Karen Livescu, Adam Lopez, Sharon Goldwater, | We present a simple approach to improve direct speech-to-text translation (ST) when the source language is low-resource: we pre-train the model on a high-resource automatic speech recognition (ASR) task, and then fine-tune its parameters for ST. We demonstrate that our approach is effective by pre-training on 300 hours of English ASR data to improve Spanish-English ST from 10.8 to 20.2 BLEU when only 20 hours of Spanish-English ST training data are available. |
7 | Measuring the perceptual availability of phonological features during language acquisition using unsupervised binary stochastic autoencoders | Cory Shain, Micha Elsner, | In this paper, we deploy binary stochastic neural autoencoder networks as models of infant language learning in two typologically unrelated languages (Xitsonga and English). |
8 | Giving Attention to the Unexpected: Using Prosody Innovations in Disfluency Detection | Vicky Zayats, Mari Ostendorf, | This paper introduces a new approach to extracting acoustic-prosodic cues using text-based distributional prediction of acoustic cues to derive vector z-score features (innovations). |
9 | Massively Multilingual Adversarial Speech Recognition | Oliver Adams, Matthew Wiesner, Shinji Watanabe, David Yarowsky, | We report on adaptation of multilingual end-to-end speech recognition models trained on as many as 100 languages. |
10 | Lost in Interpretation: Predicting Untranslated Terminology in Simultaneous Interpretation | Nikolai Vogler, Craig Stewart, Graham Neubig, | In this paper, we propose a task of predicting which terminology simultaneous interpreters will leave untranslated, and examine methods that perform this task using supervised sequence taggers. |
11 | AudioCaps: Generating Captions for Audios in The Wild | Chris Dongjoo Kim, Byeongchang Kim, Hyunmin Lee, Gunhee Kim, | We explore the problem of Audio Captioning: generating natural language description for any kind of audio in the wild, which has been surprisingly unexplored in previous research. |
12 | “President Vows to Cut <Taxes> Hair”: Dataset and Analysis of Creative Text Editing for Humorous Headlines | Nabil Hossain, John Krumm, Michael Gamon, | We introduce, release, and analyze a new dataset, called Humicroedit, for research in computational humor. |
13 | Answer-based Adversarial Training for Generating Clarification Questions | Sudha Rao, Hal Daumé III, | We present an approach for generating clarification questions with the goal of eliciting new information that would make the given textual context more complete. |
14 | Improving Grammatical Error Correction via Pre-Training a Copy-Augmented Architecture with Unlabeled Data | Wei Zhao, Liang Wang, Kewei Shen, Ruoyu Jia, Jingming Liu, | In this paper, we propose a copy-augmented architecture for the GEC task by copying the unchanged words from the source sentence to the target sentence. |
15 | Topic-Guided Variational Auto-Encoder for Text Generation | Wenlin Wang, Zhe Gan, Hongteng Xu, Ruiyi Zhang, Guoyin Wang, Dinghan Shen, Changyou Chen, Lawrence Carin, | We propose a topic-guided variational auto-encoder (TGVAE) model for text generation. |
16 | Implementation of a Chomsky-Schützenberger n-best parser for weighted multiple context-free grammars | Thomas Ruprecht, Tobias Denkinger, | We provide the first implementation of Chomsky-Schützenberger parsing. |
17 | Phylogenic Multi-Lingual Dependency Parsing | Mathieu Dehouck, Pascal Denis, | In this paper, drawing inspiration from multi-task learning, we make use of the phylogenetic tree to guide the learning of multi-lingual dependency parsers leveraging languages structural similarities. |
18 | Discontinuous Constituency Parsing with a Stack-Free Transition System and a Dynamic Oracle | Maximin Coavoux, Shay B. Cohen, | We introduce a novel transition system for discontinuous constituency parsing. |
19 | How Bad are PoS Tagger in Cross-Corpora Settings? Evaluating Annotation Divergence in the UD Project. | Guillaume Wisniewski, François Yvon, | How Bad are PoS Tagger in Cross-Corpora Settings? Evaluating Annotation Divergence in the UD Project. |
20 | CCG Parsing Algorithm with Incremental Tree Rotation | Miloš Stanojević, Mark Steedman, | We propose a new incremental parsing algorithm for CCG following the same revealing tradition of work but having a purely syntactic approach that does not depend on access to a distinct level of semantic representation. |
21 | Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing | Hao Fu, Chunyuan Li, Xiaodong Liu, Jianfeng Gao, Asli Celikyilmaz, Lawrence Carin, | In this paper we study different scheduling schemes for $\beta$, and show that KL vanishing is caused by the lack of good latent codes when training the decoder at the beginning of optimization. |
22 | Recurrent models and lower bounds for projective syntactic decoding | Natalie Schluter, | We show how recurrent models can carry out projective maximum spanning tree decoding. |
23 | Evaluating Composition Models for Verb Phrase Elliptical Sentence Embeddings | Gijs Wijnholds, Mehrnoosh Sadrzadeh, | In this paper, we develop different models for embedding VP-elliptical sentences. |
24 | Neural Finite-State Transducers: Beyond Rational Relations | Chu-Cheng Lin, Hao Zhu, Matthew R. Gormley, Jason Eisner, | We present training and inference algorithms for locally and globally normalized variants of NFSTs. |
25 | Riemannian Normalizing Flow on Variational Wasserstein Autoencoder for Text Modeling | Prince Zizhuang Wang, William Yang Wang, | To address this problem, we introduce an improved Variational Wasserstein Autoencoder (WAE) with Riemannian Normalizing Flow (RNF) for text modeling. |
26 | A Study of Incorrect Paraphrases in Crowdsourced User Utterances | Mohammad-Ali Yaghoub-Zadeh-Fard, Boualem Benatallah, Moshe Chai Barukh, Shayan Zamanirad, | In this paper, we investigate common crowdsourced paraphrasing issues, and propose an annotated dataset called Para-Quality, for detecting the quality issues. |
27 | ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters | Abdalghani Abujabal, Rishiraj Saha Roy, Mohamed Yahya, Gerhard Weikum, | We introduce ComQA, a large dataset of real user questions that exhibit different challenging aspects such as compositionality, temporal reasoning, and comparisons. |
28 | FreebaseQA: A New Factoid QA Data Set Matching Trivia-Style Question-Answer Pairs with Freebase | Kelvin Jiang, Dekun Wu, Hui Jiang, | In this paper, we present a new data set, named FreebaseQA, for open-domain factoid question answering (QA) tasks over structured knowledge bases, like Freebase. |
29 | Simple Question Answering with Subgraph Ranking and Joint-Scoring | Wenbo Zhao, Tagyoung Chung, Anuj Goyal, Angeliki Metallinou, | Motivated by this, we present a unified framework to describe and analyze existing approaches. |
30 | Learning to Attend On Essential Terms: An Enhanced Retriever-Reader Model for Open-domain Question Answering | Jianmo Ni, Chenguang Zhu, Weizhu Chen, Julian McAuley, | In this paper we propose a retriever-reader model that learns to attend on essential terms during the question answering process. |
31 | UHop: An Unrestricted-Hop Relation Extraction Framework for Knowledge-Based Question Answering | Zi-Yuan Chen, Chih-Hung Chang, Yi-Pei Chen, Jijnasa Nayak, Lun-Wei Ku, | In this paper, we propose UHop, an unrestricted-hop framework which relaxes this restriction by use of a transition-based search framework to replace the relation-chain-based search one. |
32 | BAG: Bi-directional Attention Entity Graph Convolutional Network for Multi-hop Reasoning Question Answering | Yu Cao, Meng Fang, Dacheng Tao, | We propose a Bi-directional Attention Entity Graph Convolutional Network (BAG), leveraging relationships between nodes in an entity graph and attention information between a query and the entity graph, to solve this task. |
33 | Vector of Locally-Aggregated Word Embeddings (VLAWE): A Novel Document-level Representation | Radu Tudor Ionescu, Andrei Butnaru, | In this paper, we propose a novel representation for text documents based on aggregating word embedding vectors into document embeddings. |
34 | Multi-task Learning for Multi-modal Emotion Recognition and Sentiment Analysis | Md Shad Akhtar, Dushyant Chauhan, Deepanway Ghosal, Soujanya Poria, Asif Ekbal, Pushpak Bhattacharyya, | In this paper, we present a deep multi-task learning framework that jointly performs both sentiment and emotion analysis. |
35 | Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence | Chi Sun, Luyao Huang, Xipeng Qiu, | In this paper, we construct an auxiliary sentence from the aspect and convert ABSA to a sentence-pair classification task, such as question answering (QA) and natural language inference (NLI). |
36 | A Variational Approach to Weakly Supervised Document-Level Multi-Aspect Sentiment Classification | Ziqian Zeng, Wenxuan Zhou, Xin Liu, Yangqiu Song, | In this paper, we propose a variational approach to weakly supervised document-level multi-aspect sentiment classification. |
37 | HiGRU: Hierarchical Gated Recurrent Units for Utterance-Level Emotion Recognition | Wenxiang Jiao, Haiqin Yang, Irwin King, Michael R. Lyu, | In this paper, we address three challenges in utterance-level emotion recognition in dialogue systems: (1) the same word can deliver different emotions in different contexts; (2) some emotions are rarely seen in general dialogues; (3) long-range contextual information is hard to be effectively captured. |
38 | Learning Interpretable Negation Rules via Weak Supervision at Document Level: A Reinforcement Learning Approach | Nicolas Pröllochs, Stefan Feuerriegel, Dirk Neumann, | To the best of our knowledge, our work presents the first approach that eliminates the need for word-level negation labels, replacing them instead with document-level sentiment annotations. |
39 | Simplified Neural Unsupervised Domain Adaptation | Timothy Miller, | In this work, we show that it is possible to improve on existing neural domain adaptation algorithms by 1) jointly training the representation learner with the task learner; and 2) removing the need for heuristically-selected “pivot features.” |
40 | Learning Bilingual Sentiment-Specific Word Embeddings without Cross-lingual Supervision | Yanlin Feng, Xiaojun Wan, | In this work, we propose UBiSE (Unsupervised Bilingual Sentiment Embeddings), which learns sentiment-specific word representations for two languages in a common space without any cross-lingual supervision. |
41 | ReWE: Regressing Word Embeddings for Regularization of Neural Machine Translation Systems | Inigo Jauregi Unanue, Ehsan Zare Borzeshi, Nazanin Esmaili, Massimo Piccardi, | To mollify this problem, we propose regressing word embeddings (ReWE) as a new regularization technique in a system that is jointly trained to predict the next word in the translation (categorical value) and its word embedding (continuous value). |
42 | Lost in Machine Translation: A Method to Reduce Meaning Loss | Reuben Cohn-Gordon, Noah Goodman, | Building on Bayesian models of informative utterance production, we present a method to define a less ambiguous translation system in terms of an underlying pre-trained neural sequence-to-sequence model. |
43 | Bi-Directional Differentiable Input Reconstruction for Low-Resource Neural Machine Translation | Xing Niu, Weijia Xu, Marine Carpuat, | We aim to better exploit the limited amounts of parallel text available in low-resource settings by introducing a differentiable reconstruction loss for neural machine translation (NMT). |
44 | Code-Switching for Enhancing NMT with Pre-Specified Translation | Kai Song, Yue Zhang, Heng Yu, Weihua Luo, Kun Wang, Min Zhang, | We investigate a data augmentation method, making code-switched training data by replacing source phrases with their target translations. |
45 | Aligning Vector-spaces with Noisy Supervised Lexicon | Noa Yehezkel Lubin, Jacob Goldberger, Yoav Goldberg, | We propose a model that accounts for noisy pairs. |
46 | Understanding and Improving Hidden Representations for Neural Machine Translation | Guanlin Li, Lemao Liu, Xintong Li, Conghui Zhu, Tiejun Zhao, Shuming Shi, | Towards understanding for performance improvement, we first artificially construct a sequence of nested relative tasks and measure the feature generalization ability of the learned hidden representation over these tasks. |
47 | Content Differences in Syntactic and Semantic Representation | Daniel Hershcovich, Omri Abend, Ari Rappoport, | We target this gap, and take Universal Dependencies (UD) and UCCA as a test case. |
48 | Attentive Mimicking: Better Word Embeddings by Attending to Informative Contexts | Timo Schick, Hinrich Schütze, | In this paper, we introduce attentive mimicking: the mimicking model is given access not only to a word’s surface form, but also to all available contexts and learns to attend to the most informative and reliable contexts for computing an embedding. |
49 | Evaluating Style Transfer for Text | Remi Mir, Bjarke Felbo, Nick Obradovich, Iyad Rahwan, | We propose a set of metrics for automated evaluation and demonstrate that they are more strongly correlated and in agreement with human judgment: direction-corrected Earth Mover’s Distance, Word Mover’s Distance on style-masked texts, and adversarial classification for the respective aspects. |
50 | Big BiRD: A Large, Fine-Grained, Bigram Relatedness Dataset for Examining Semantic Composition | Shima Asaadi, Saif Mohammad, Svetlana Kiritchenko, | In this paper, we describe how we created a large, fine-grained, bigram relatedness dataset (BiRD), using a comparative annotation technique called Best-Worst Scaling. |
51 | Outlier Detection for Improved Data Quality and Diversity in Dialog Systems | Stefan Larson, Anish Mahendran, Andrew Lee, Jonathan K. Kummerfeld, Parker Hill, Michael A. Laurenzano, Johann Hauswald, Lingjia Tang, Jason Mars, | We introduce a simple and effective technique for detecting both erroneous and unique samples in a corpus of short texts using neural sentence embeddings combined with distance-based outlier detection. |
52 | Asking the Right Question: Inferring Advice-Seeking Intentions from Personal Narratives | Liye Fu, Jonathan P. Chang, Cristian Danescu-Niculescu-Mizil, | To test the capabilities of NLP systems to recover such intuition, we introduce the new task of inferring what is the advice-seeking goal behind a personal narrative. |
53 | Seeing Things from a Different Angle: Discovering Diverse Perspectives about Claims | Sihao Chen, Daniel Khashabi, Wenpeng Yin, Chris Callison-Burch, Dan Roth, | Inherently, this is a natural language understanding task, and we propose to address it as such. |
54 | IMHO Fine-Tuning Improves Claim Detection | Tuhin Chakrabarty, Christopher Hidey, Kathy McKeown, | We propose to alleviate this problem by fine-tuning a language model using a Reddit corpus of 5.5 million opinionated claims. |
55 | Joint Multiple Intent Detection and Slot Labeling for Goal-Oriented Dialog | Rashmi Gangadharaiah, Balakrishnan Narayanaswamy, | We investigate an attention-based neural network model that performs multi-label classification for identifying multiple intents and produces labels for both intents and slot-labels at the token-level. |
56 | CITE: A Corpus of Image-Text Discourse Relations | Malihe Alikhani, Sreyasi Nag Chowdhury, Gerard de Melo, Matthew Stone, | This paper presents a novel crowd-sourced resource for multimodal discourse: our resource characterizes inferences in image-text contexts in the domain of cooking recipes in the form of coherence relations. |
57 | Improving Dialogue State Tracking by Discerning the Relevant Context | Sanuj Sharma, Prafulla Kumar Choubey, Ruihong Huang, | We, therefore, propose a novel framework for DST that identifies relevant historical context by referring to the past utterances where a particular slot-value changes and uses that together with weighted system utterance to identify the relevant context. |
58 | CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog | Satwik Kottur, José M. F. Moura, Devi Parikh, Dhruv Batra, Marcus Rohrbach, | We develop CLEVR-Dialog, a large diagnostic dataset for studying multi-round reasoning in visual dialog. |
59 | Learning Outside the Box: Discourse-level Features Improve Metaphor Identification | Jesse Mu, Helen Yannakoudakis, Ekaterina Shutova, | Inspired by pragmatic accounts of metaphor, we argue that broader discourse features are crucial for better metaphor identification. |
60 | Detection of Abusive Language: the Problem of Biased Datasets | Michael Wiegand, Josef Ruppenhofer, Thomas Kleinbauer, | We discuss the impact of data bias on abusive language detection. |
61 | Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them | Hila Gonen, Yoav Goldberg, | Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them. |
62 | Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings | Thomas Manzini, Lim Yao Chong, Alan W. Black, Yulia Tsvetkov, | In this work, we propose a method to debias word embeddings in multiclass settings such as race and religion, extending the work of (Bolukbasi et al., 2016) from the binary setting, such as binary gender. |
63 | On Measuring Social Biases in Sentence Encoders | Chandler May, Alex Wang, Shikha Bordia, Samuel R. Bowman, Rachel Rudinger, | Accordingly, we extend the Word Embedding Association Test to measure bias in sentence encoders. |
64 | Gender Bias in Contextualized Word Embeddings | Jieyu Zhao, Tianlu Wang, Mark Yatskar, Ryan Cotterell, Vicente Ordonez, Kai-Wei Chang, | In this paper, we quantify, analyze and mitigate gender bias exhibited in ELMo’s contextualized word vectors. |
65 | Combining Sentiment Lexica with a Multi-View Variational Autoencoder | Alexander Miserlis Hoyle, Lawrence Wolf-Sonkin, Hanna Wallach, Ryan Cotterell, Isabelle Augenstein, | We introduce a generative model of sentiment lexica to combine disparate scales into a common latent representation. |
66 | Enhancing Opinion Role Labeling with Semantic-Aware Word Representations from Semantic Role Labeling | Meishan Zhang, Peili Liang, Guohong Fu, | In this work, we propose a simple and novel method to enhance ORL by utilizing SRL, presenting semantic-aware word representations which are learned from SRL. |
67 | Frowning Frodo, Wincing Leia, and a Seriously Great Friendship: Learning to Classify Emotional Relationships of Fictional Characters | Evgeny Kim, Roman Klinger, | In this paper, we combine these aspects into a unified framework to classify emotional relationships of fictional characters. |
68 | Generalizing Unmasking for Short Texts | Janek Bevendorff, Benno Stein, Matthias Hagen, Martin Potthast, | In this paper, we present a generalized unmasking approach which allows for authorship verification of texts as short as four printed pages with very high precision at an adjustable recall tradeoff. |
69 | Adversarial Training for Satire Detection: Controlling for Confounding Variables | Robert McHardy, Heike Adel, Roman Klinger, | We therefore propose a novel model for satire detection with an adversarial component to control for the confounding variable of publication source. |
70 | Keyphrase Generation: A Text Summarization Struggle | Erion Çano, Ondřej Bojar, | In this paper, we explore the possibility of considering the keyphrase string as an abstractive summary of the title and the abstract. First, we collect, process and release a large dataset of scientific paper metadata that contains 2.2 million records. |
71 | SEQ^3: Differentiable Sequence-to-Sequence-to-Sequence Autoencoder for Unsupervised Abstractive Sentence Compression | Christos Baziotis, Ion Androutsopoulos, Ioannis Konstas, Alexandros Potamianos, | We present a sequence-to-sequence-to-sequence autoencoder (SEQ^3), consisting of two chained encoder-decoder pairs, with words used as a sequence of discrete latent variables. |
72 | Crowdsourcing Lightweight Pyramids for Manual Summary Evaluation | Ori Shapira, David Gabay, Yang Gao, Hadar Ronen, Ramakanth Pasunuru, Mohit Bansal, Yael Amsterdamer, Ido Dagan, | We revisit the Pyramid approach, proposing a lightweight sampling-based version that is crowdsourcable. |
73 | Serial Recall Effects in Neural Language Modeling | Hassan Hajipoor, Hadi Amiri, Maseud Rahgozar, Farhad Oroumchian, | In this research, we investigate neural language models in the context of these serial recall effects. |
74 | Fast Concept Mention Grouping for Concept Map-based Multi-Document Summarization | Tobias Falke, Iryna Gurevych, | In this paper, we propose two alternative grouping techniques based on locality sensitive hashing, approximate nearest neighbor search and a fast clustering algorithm. |
75 | Syntax-aware Neural Semantic Role Labeling with Supertags | Jungo Kasai, Dan Friedman, Robert Frank, Dragomir Radev, Owen Rambow, | We introduce a new syntax-aware model for dependency-based semantic role labeling that outperforms syntax-agnostic models for English and Spanish. |
76 | Left-to-Right Dependency Parsing with Pointer Networks | Daniel Fernández-González, Carlos Gómez-Rodríguez, | We propose a novel transition-based algorithm that straightforwardly parses sentences from left to right by building n attachments, with n being the length of the input sentence. |
77 | Viable Dependency Parsing as Sequence Labeling | Michalina Strzyz, David Vilares, Carlos Gómez-Rodríguez, | We show instead that with a conventional BILSTM-based model it is possible to obtain fast and accurate parsers. |
78 | Pooled Contextualized Embeddings for Named Entity Recognition | Alan Akbik, Tanja Bergmann, Roland Vollgraf, | To address this drawback, we propose a method in which we dynamically aggregate contextualized embeddings of each unique string that we encounter. |
79 | Better Modeling of Incomplete Annotations for Named Entity Recognition | Zhanming Jie, Pengjun Xie, Wei Lu, Ruixue Ding, Linlin Li, | We highlight several pitfalls associated with learning under such a setup in the context of NER and identify limitations associated with existing approaches, proposing a novel yet easy-to-implement approach for recognizing named entities with incomplete data annotations. |
80 | Event Detection without Triggers | Shulin Liu, Yang Li, Feng Zhang, Tao Yang, Xinpeng Zhou, | In this work, we propose a novel framework dubbed as Type-aware Bias Neural Network with Attention Mechanisms (TBNNAM), which encodes the representation of a sentence based on target event types. |
81 | Sub-event detection from twitter streams as a sequence labeling problem | Giannis Bekoulis, Johannes Deleu, Thomas Demeester, Chris Develder, | This paper introduces improved methods for sub-event detection in social media streams, by applying neural sequence models not only on the level of individual posts, but also directly on the stream level. |
82 | GraphIE: A Graph-Based Framework for Information Extraction | Yujie Qian, Enrico Santus, Zhijing Jin, Jiang Guo, Regina Barzilay, | In this paper, we introduce GraphIE, a framework that operates over a graph representing a broad set of dependencies between textual units (i.e. words or sentences). |
83 | OpenKI: Integrating Open Information Extraction and Knowledge Bases with Relation Inference | Dongxu Zhang, Subhabrata Mukherjee, Colin Lockard, Luna Dong, Andrew McCallum, | In this paper, we consider advancing web-scale knowledge extraction and alignment by integrating OpenIE extractions in the form of (subject, predicate, object) triples with Knowledge Bases (KB). |
84 | Imposing Label-Relational Inductive Bias for Extremely Fine-Grained Entity Typing | Wenhan Xiong, Jiawei Wu, Deren Lei, Mo Yu, Shiyu Chang, Xiaoxiao Guo, William Yang Wang, | To model the underlying label correlations without access to manually annotated label structures, we introduce a novel label-relational inductive bias, represented by a graph propagation layer that effectively encodes both global label co-occurrence statistics and word-level similarities. |
85 | Improving Event Coreference Resolution by Learning Argument Compatibility from Unlabeled Data | Yin Jou Huang, Jing Lu, Sadao Kurohashi, Vincent Ng, | In this work, we propose a transfer learning framework for event coreference resolution that utilizes a large amount of unlabeled data to learn argument compatibility of event mentions. |
86 | Sentence Embedding Alignment for Lifelong Relation Extraction | Hong Wang, Wenhan Xiong, Mo Yu, Xiaoxiao Guo, Shiyu Chang, William Yang Wang, | Specifically, we utilize an explicit alignment model to mitigate the sentence embedding distortion of learned model when training on new data and new relations. |
87 | Description-Based Zero-shot Fine-Grained Entity Typing | Rasha Obeidat, Xiaoli Fern, Hamed Shahbazi, Prasad Tadepalli, | This work proposes a zero-shot entity typing approach that utilizes the type description available from Wikipedia to build a distributed semantic representation of the types. |
88 | Adversarial Decomposition of Text Representation | Alexey Romanov, Anna Rumshisky, Anna Rogers, David Donahue, | In this paper, we present a method for adversarial decomposition of text representation. |
89 | PoMo: Generating Entity-Specific Post-Modifiers in Context | Jun Seok Kang, Robert Logan, Zewei Chu, Yang Chen, Dheeru Dua, Kevin Gimpel, Sameer Singh, Niranjan Balasubramanian, | We introduce entity post-modifier generation as an instance of a collaborative writing task. |
90 | Improved Lexically Constrained Decoding for Translation and Monolingual Rewriting | J. Edward Hu, Huda Khayrallah, Ryan Culkin, Patrick Xia, Tongfei Chen, Matt Post, Benjamin Van Durme, | We describe vectorized dynamic beam allocation, which extends work in lexically-constrained decoding to work with batching, leading to a five-fold improvement in throughput when working with positive constraints. |
91 | Courteously Yours: Inducing courteous behavior in Customer Care responses using Reinforced Pointer Generator Network | Hitesh Golchha, Mauajama Firdaus, Asif Ekbal, Pushpak Bhattacharyya, | In this paper, we propose an effective deep learning framework for inducing courteous behavior in customer care responses. |
92 | How to Avoid Sentences Spelling Boring? Towards a Neural Approach to Unsupervised Metaphor Generation | Zhiwei Yu, Xiaojun Wan, | In order to create novel metaphors, we propose a neural approach to metaphor generation and explore the shared inferential structure of a metaphorical usage and a literal usage of a verb. |
93 | Incorporating Context and External Knowledge for Pronoun Coreference Resolution | Hongming Zhang, Yan Song, Yangqiu Song, | In this paper, we propose a two-layer model for pronoun coreference resolution that leverages both context and external knowledge, where a knowledge attention mechanism is designed to ensure the model leveraging the appropriate source of external knowledge based on different context. |
94 | Unsupervised Deep Structured Semantic Models for Commonsense Reasoning | Shuohang Wang, Sheng Zhang, Yelong Shen, Xiaodong Liu, Jingjing Liu, Jianfeng Gao, Jing Jiang, | We propose two neural network models based on the Deep Structured Semantic Models (DSSM) framework to tackle two classic commonsense reasoning tasks, Winograd Schema challenges (WSC) and Pronoun Disambiguation (PDP). |
95 | Recovering dropped pronouns in Chinese conversations via modeling their referents | Jingxuan Yang, Jianzhuo Tong, Si Li, Sheng Gao, Jun Guo, Nianwen Xue, | In this work, we present a novel end-to-end neural network model to recover dropped pronouns in conversational data. |
96 | The problem with probabilistic DAG automata for semantic graphs | Ieva Vasiljeva, Sorcha Gilroy, Adam Lopez, | We show that some DAG automata cannot be made into useful probabilistic models by the nearly universal strategy of assigning weights to transitions. |
97 | A Systematic Study of Leveraging Subword Information for Learning Word Representations | Yi Zhu, Ivan Vulić, Anna Korhonen, | In this work, we deliver such a study focusing on the variation of two crucial components required for subword-level integration into word representation models: 1) segmentation of words into subword units, and 2) subword composition functions to obtain final word representations. |
98 | Better Word Embeddings by Disentangling Contextual n-Gram Information | Prakhar Gupta, Matteo Pagliardini, Martin Jaggi, | In this paper, we show how training word embeddings jointly with bigram and even trigram embeddings, results in improved unigram embeddings. |
99 | Integration of Knowledge Graph Embedding Into Topic Modeling with Hierarchical Dirichlet Process | Dingcheng Li, Siamak Zamani, Jingyuan Zhang, Ping Li, | In this paper, we develop *topic modeling with knowledge graph embedding* (TMKGE), a Bayesian nonparametric model to employ knowledge graph (KG) embedding in the context of topic modeling, for extracting more coherent topics. |
100 | Correlation Coefficients and Semantic Textual Similarity | Vitalii Zhelezniak, Aleksandar Savkov, April Shen, Nils Hammerla, | In this work, we illustrate that for all common word vectors, cosine similarity is essentially equivalent to the Pearson correlation coefficient, which provides some justification for its use. |
101 | Generating Token-Level Explanations for Natural Language Inference | James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Arpit Mittal, | In this paper, we show that it is possible to generate token-level explanations for NLI without the need for training data explicitly annotated for this purpose. |
102 | Strong Baselines for Complex Word Identification across Multiple Languages | Pierre Finnimore, Elisabeth Fritzsch, Daniel King, Alison Sneyd, Aneeq Ur Rehman, Fernando Alva-Manchego, Andreas Vlachos, | In this paper, we present monolingual and cross-lingual CWI models that perform as well as (or better than) most models submitted to the latest CWI Shared Task. |
103 | Adaptive Convolution for Multi-Relational Learning | Xiaotian Jiang, Quan Wang, Bin Wang, | In this work we introduce ConvR, an adaptive convolutional network designed to maximize entity-relation interactions in a convolutional fashion. |
104 | Graph Pattern Entity Ranking Model for Knowledge Graph Completion | Takuma Ebisu, Ryutaro Ichise, | In this paper, we utilize graph patterns in a knowledge graph to overcome such problems. |
105 | Adversarial Training for Weakly Supervised Event Detection | Xiaozhi Wang, Xu Han, Zhiyuan Liu, Maosong Sun, Peng Li, | To address these issues, we build a large event-related candidate set with good coverage and then apply an adversarial training mechanism to iteratively identify those informative instances from the candidate set and filter out those noisy ones. |
106 | A Submodular Feature-Aware Framework for Label Subset Selection in Extreme Classification Problems | Elham J. Barezi, Ian D. Wood, Pascale Fung, Hamid R. Rabiee, | We propose a submodular maximization framework with linear cost to find informative labels which are most relevant to other labels yet least redundant with each other. |
107 | Relation Extraction with Temporal Reasoning Based on Memory Augmented Distant Supervision | Jianhao Yan, Lin He, Ruqin Huang, Jian Li, Ying Liu, | This paper formulates the problem of relation extraction with temporal reasoning and proposes a solution to predict whether two given entities participate in a relation at a given time spot. For this purpose, we construct a dataset called WIKI-TIME which additionally includes the valid period of a certain relation of two entities in the knowledge base. |
108 | Integrating Semantic Knowledge to Tackle Zero-shot Text Classification | Jingqing Zhang, Piyawat Lertvittayakumjorn, Yike Guo, | In this paper, we propose a two-phase framework together with data augmentation and feature augmentation to solve this problem. |
109 | Word-Node2Vec: Improving Word Embedding with Document-Level Non-Local Word Co-occurrences | Procheta Sen, Debasis Ganguly, Gareth Jones, | In this paper, we propose a graph-based word embedding method, named word-node2vec. |
110 | Cross-Topic Distributional Semantic Representations Via Unsupervised Mappings | Eleftheria Briakou, Nikos Athanasiou, Alexandros Potamianos, | In this work, we propose a DSM that learns multiple distributional representations of a word based on different topics. |
111 | What just happened? Evaluating retrofitted distributional word vectors | Dmetri Hayes, | We propose root-mean-square error (RMSE) as an alternative evaluation metric, and demonstrate that correlation measures and RMSE sometimes yield opposite conclusions concerning the efficacy of retrofitting. |
112 | Linguistic Knowledge and Transferability of Contextual Representations | Nelson F. Liu, Matt Gardner, Yonatan Belinkov, Matthew E. Peters, Noah A. Smith, | To shed light on the linguistic knowledge they capture, we study the representations produced by several recent pretrained contextualizers (variants of ELMo, the OpenAI transformer language model, and BERT) with a suite of sixteen diverse probing tasks. |
113 | Mutual Information Maximization for Simple and Accurate Part-Of-Speech Induction | Karl Stratos, | We focus on two training objectives that are amenable to stochastic gradient descent (SGD): a novel generalization of the classical Brown clustering objective and a recently proposed variational lower bound. |
114 | Unsupervised Recurrent Neural Network Grammars | Yoon Kim, Alexander Rush, Lei Yu, Adhiguna Kuncoro, Chris Dyer, Gábor Melis, | In this work, we experiment with unsupervised learning of RNNGs. |
115 | Cooperative Learning of Disjoint Syntax and Semantics | Serhii Havrylov, Germán Kruszewski, Armand Joulin, | In this work, we present a recursive model inspired by Choi et al. (2018) that reaches near perfect accuracy on this task. |
116 | Unsupervised Latent Tree Induction with Deep Inside-Outside Recursive Auto-Encoders | Andrew Drozdov, Patrick Verga, Mohit Yadav, Mohit Iyyer, Andrew McCallum, | We introduce the deep inside-outside recursive autoencoder (DIORA), a fully-unsupervised method for discovering syntax that simultaneously learns representations for constituents within the induced tree. |
117 | Knowledge-Augmented Language Model and Its Application to Unsupervised Named-Entity Recognition | Angli Liu, Jingfei Du, Veselin Stoyanov, | Our work demonstrates that named entities (and possibly other types of world knowledge) can be modeled successfully using predictive learning and training on large corpora of text without any additional information. |
118 | Syntax-Enhanced Neural Machine Translation with Syntax-Aware Word Representations | Meishan Zhang, Zhenghua Li, Guohong Fu, Min Zhang, | In this work, we propose a novel method to integrate source-side syntax implicitly for NMT. |
119 | Competence-based Curriculum Learning for Neural Machine Translation | Emmanouil Antonios Platanios, Otilia Stretcu, Graham Neubig, Barnabas Poczos, Tom Mitchell, | In this paper, we propose a curriculum learning framework for NMT that reduces training time, reduces the need for specialized heuristics or large batch sizes, and results in overall better performance. |
120 | Extract and Edit: An Alternative to Back-Translation for Unsupervised Neural Machine Translation | Jiawei Wu, Xin Wang, William Yang Wang, | To avoid this fundamental issue, we propose an alternative but more effective approach, extract-edit, to extract and then edit real sentences from the target monolingual corpora. |
121 | Consistency by Agreement in Zero-Shot Neural Machine Translation | Maruan Al-Shedivat, Ankur Parikh, | In this paper, we focus on zero-shot generalization, a challenging setup that tests models on translation directions they have not been optimized for at training time. |
122 | Modeling Recurrence for Transformer | Jie Hao, Xing Wang, Baosong Yang, Longyue Wang, Jinfeng Zhang, Zhaopeng Tu, | In response to this problem, we propose to directly model recurrence for Transformer with an additional recurrence encoder. |
123 | Rethinking Action Spaces for Reinforcement Learning in End-to-end Dialog Agents with Latent Variable Models | Tiancheng Zhao, Kaige Xie, Maxine Eskenazi, | This paper proposes a novel latent action framework that treats the action spaces of an end-to-end dialog agent as latent variables and develops unsupervised methods in order to induce its own action space from the data. |
124 | Skeleton-to-Response: Dialogue Generation Guided by Retrieval Memory | Deng Cai, Yan Wang, Wei Bi, Zhaopeng Tu, Xiaojiang Liu, Wai Lam, Shuming Shi, | In this paper, we propose a new framework which exploits retrieval results via a skeleton-to-response paradigm. |
125 | Jointly Optimizing Diversity and Relevance in Neural Response Generation | Xiang Gao, Sungjin Lee, Yizhe Zhang, Chris Brockett, Michel Galley, Jianfeng Gao, Bill Dolan, | In this paper, we propose a SpaceFusion model to jointly optimize diversity and relevance that essentially fuses the latent space of a sequence-to-sequence model and that of an autoencoder model by leveraging novel regularization terms. |
126 | Disentangling Language and Knowledge in Task-Oriented Dialogs | Dinesh Raghu, Nikhil Gupta, Mausam, | We propose an encoder-decoder architecture (BoSsNet) with a novel Bag-of-Sequences (BoSs) memory, which facilitates the disentangled learning of the response’s language model and its knowledge incorporation. |
127 | Tensorized Self-Attention: Efficiently Modeling Pairwise and Global Dependencies Together | Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Chengqi Zhang, | In this paper, we propose a novel attention mechanism called “Multi-mask Tensorized Self-Attention” (MTSA), which is as fast and as memory-efficient as a CNN, but significantly outperforms previous CNN-/RNN-/attention-based models. |
128 | WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations | Mohammad Taher Pilehvar, Jose Camacho-Collados, | In this paper we show that existing models have surpassed the performance ceiling of the standard evaluation dataset for the purpose, i.e., Stanford Contextual Word Similarity, and highlight its shortcomings. |
129 | Does My Rebuttal Matter? Insights from a Major NLP Conference | Yang Gao, Steffen Eger, Ilia Kuznetsov, Iryna Gurevych, Yusuke Miyao, | Aiming to fill this gap, we present a corpus that contains over 4k reviews and 1.2k author responses from ACL-2018. |
130 | Casting Light on Invisible Cities: Computationally Engaging with Literary Criticism | Shufan Wang, Mohit Iyyer, | While most previous work focuses on “distant reading” by algorithmically discovering high-level patterns from large collections of literary works, here we sharpen the focus of our methods to a single literary theory about Italo Calvino’s postmodern novel Invisible Cities, which consists of 55 short descriptions of imaginary cities. |
131 | PAWS: Paraphrase Adversaries from Word Scrambling | Yuan Zhang, Jason Baldridge, Luheng He, | This paper introduces PAWS (Paraphrase Adversaries from Word Scrambling), a new dataset with 108,463 well-formed paraphrase and non-paraphrase pairs with high lexical overlap. |
132 | Cross-Corpora Evaluation and Analysis of Grammatical Error Correction Models — Is Single-Corpus Evaluation Enough? | Masato Mita, Tomoya Mizumoto, Masahiro Kaneko, Ryo Nagata, Kentaro Inui, | This study explores the necessity of performing cross-corpora evaluation for grammatical error correction (GEC) models. |
133 | Star-Transformer | Qipeng Guo, Xipeng Qiu, Pengfei Liu, Yunfan Shao, Xiangyang Xue, Zheng Zhang, | In this paper, we present Star-Transformer, a lightweight alternative by careful sparsification. |
134 | Adaptation of Hierarchical Structured Models for Speech Act Recognition in Asynchronous Conversation | Tasnim Mohiuddin, Thanh-Tung Nguyen, Shafiq Joty, | In this paper, we propose methods to effectively leverage abundant unlabeled conversational data and the available labeled data from synchronous domains. |
135 | From legal to technical concept: Towards an automated classification of German political Twitter postings as criminal offenses | Frederike Zufall, Tobias Horsmann, Torsten Zesch, | In this article, we analyze which Twitter posts could actually be deemed offenses under German criminal law. |
136 | Joint Multi-Label Attention Networks for Social Text Annotation | Hang Dong, Wei Wang, Kaizhu Huang, Frans Coenen, | We propose a novel attention network for document annotation with user-generated tags. |
137 | Multi-Channel Convolutional Neural Network for Twitter Emotion and Sentiment Recognition | Jumayel Islam, Robert E. Mercer, Lu Xiao, | In this paper, we propose a novel use of a multi-channel convolutional neural architecture which can effectively use different emotion and sentiment indicators such as hashtags, emoticons and emojis that are present in the tweets and improve the performance of emotion and sentiment identification. |
138 | Detecting Cybersecurity Events from Noisy Short Text | Semih Yagcioglu, Mehmet Saygin Seyfioglu, Begum Citamak, Batuhan Bardak, Seren Guldamlasioglu, Azmi Yuksel, Emin Islam Tatli, | In this study, we propose a method that leverages both domain-specific word embeddings and task-specific features to detect cyber security events from tweets. We collected a new dataset of cyber security related tweets from Twitter and manually annotated a subset of 2K of them. |
139 | White-to-Black: Efficient Distillation of Black-Box Adversarial Attacks | Yotam Gil, Yoav Chai, Or Gorodissky, Jonathan Berant, | In this work, we show that the knowledge implicit in the optimization procedure can be distilled into another more efficient neural network. |
140 | Analyzing the Perceived Severity of Cybersecurity Threats Reported on Social Media | Shi Zong, Alan Ritter, Graham Mueller, Evan Wright, | In this paper, we investigate methods to analyze the severity of cybersecurity threats based on the language that is used to describe them online. |
141 | Fake News Detection using Deep Markov Random Fields | Duc Minh Nguyen, Tien Huu Do, Robert Calderbank, Nikos Deligiannis, | To overcome this limitation, we develop a graph-theoretic method that inherits the power of deep learning while at the same time utilizing the correlations among the articles. |
142 | Issue Framing in Online Discussion Fora | Mareike Hartmann, Tallulah Jansen, Isabelle Augenstein, Anders Søgaard, | In this paper, we introduce a new issue frame annotated corpus of online discussions. |
143 | Vector of Locally Aggregated Embeddings for Text Representation | Hadi Amiri, Mitra Mohtarami, | We present Vector of Locally Aggregated Embeddings (VLAE) for effective and, ultimately, lossless representation of textual content. |
144 | Predicting the Type and Target of Offensive Posts in Social Media | Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar, | In contrast, here we target several different kinds of offensive content. |
145 | Biomedical Event Extraction based on Knowledge-driven Tree-LSTM | Diya Li, Lifu Huang, Heng Ji, Jiawei Han, | To better encode contextual information and external background knowledge, we propose a novel knowledge base (KB)-driven tree-structured long short-term memory networks (Tree-LSTM) framework, incorporating two new types of features: (1) dependency structures to capture wide contexts; (2) entity properties (types and category descriptions) from external ontologies via entity linking. |
146 | Detecting cognitive impairments by agreeing on interpretations of linguistic features | Zining Zhu, Jekaterina Novikova, Frank Rudzicz, | In this paper, we take a third approach, proposing Consensus Networks (CNs), a framework to classify after reaching agreements between modalities. |
147 | Relation Extraction using Explicit Context Conditioning | Gaurav Singh, Parminder Bhatia, | We refer to such indirect relations as second-order relations, and describe an efficient implementation for computing them. |
148 | Conversation Model Fine-Tuning for Classifying Client Utterances in Counseling Dialogues | Sungjoon Park, Donghyun Kim, Alice Oh, | With proper anonymization, we collect counselor-client dialogues, define meaningful categories of client utterances with professional counselors, and develop a novel neural network model for classifying the client utterances. |
149 | Using Similarity Measures to Select Pretraining Data for NER | Xiang Dai, Sarvnaz Karimi, Ben Hachey, Cecile Paris, | We propose three cost-effective measures to quantify different aspects of similarity between source pretraining and target task data. |
150 | Predicting Annotation Difficulty to Improve Task Routing and Model Performance for Biomedical Information Extraction | Yinfei Yang, Oshin Agarwal, Chris Tar, Byron C. Wallace, Ani Nenkova, | In this paper we demonstrate that directly modeling instance difficulty can be used to improve model performance and to route instances to appropriate annotators. |
151 | Detecting Depression in Social Media using Fine-Grained Emotions | Mario Ezra Aragon, Adrian Pastor Lopez Monroy, Luis Carlos Gonzalez Gurrola, Manuel Montes-y-Gomez, | We propose a new representation called Bag of Sub-Emotions (BoSE), which represents social media documents by a set of fine-grained emotions automatically generated using a lexical resource of emotions and subword embeddings. |
152 | A Silver Standard Corpus of Human Phenotype-Gene Relations | Diana Sousa, Andre Lamurias, Francisco M Couto, | This paper presents the Phenotype-Gene Relations (PGR) corpus, a silver standard corpus of human phenotype and gene annotations and their relations. We generated this corpus using Named-Entity Recognition tools, whose results were partially evaluated by eight curators, obtaining a precision of 87.01%. |
153 | Improving Lemmatization of Non-Standard Languages with Joint Learning | Enrique Manjavacas, Ákos Kádár, Mike Kestemont, | In the present paper we aim to improve lemmatization performance on a set of non-standard historical languages in which the difficulty is increased by an additional aspect (iii): spelling variation due to lacking orthographic standards. Finally, to encourage future work on processing of non-standard varieties, we release the dataset of non-standard languages underlying the present study, which is based on openly accessible sources. |
154 | One Size Does Not Fit All: Comparing NMT Representations of Different Granularities | Nadir Durrani, Fahim Dalvi, Hassan Sajjad, Yonatan Belinkov, Preslav Nakov, | We found that while representations derived from subwords are slightly better for modeling syntax, character-based representations are superior for modeling morphology and are also more robust to noisy input. |
155 | A Simple Joint Model for Improved Contextual Neural Lemmatization | Chaitanya Malaviya, Shijie Wu, Ryan Cotterell, | We present a simple joint neural model for lemmatization and morphological tagging that achieves state-of-the-art results on 20 languages from the Universal Dependencies corpora. |
156 | A Probabilistic Generative Model of Linguistic Typology | Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, Isabelle Augenstein, | By modelling all languages and features within the same architecture, we show how structural similarities between languages can be exploited to predict typological features with near-perfect accuracy, outperforming several baselines on the task of predicting held-out features. |
157 | Quantifying the morphosyntactic content of Brown Clusters | Manuel Ciosici, Leon Derczynski, Ira Assent, | We show that increases in Average Mutual Information, the clustering algorithms’ optimization goal, are highly correlated with improvements in encoding of morphosyntactic information. |
158 | Analyzing Bayesian Crosslingual Transfer in Topic Models | Shudong Hao, Michael J. Paul, | We introduce a theoretical analysis of crosslingual transfer in probabilistic topic models. |
159 | Recursive Subtree Composition in LSTM-Based Dependency Parsing | Miryam de Lhoneux, Miguel Ballesteros, Joakim Nivre, | We investigate the impact of adding a tree layer on top of a sequential model by recursively composing subtree representations (composition) in a transition-based parser that uses features extracted by a BiLSTM. |
160 | Cross-lingual CCG Induction | Kilian Evang, | We propose an alternative making use of cross-lingual learning: an existing source-language parser is used together with a parallel corpus to induce a grammar and parsing model for a target language. |
161 | Density Matching for Bilingual Word Embedding | Chunting Zhou, Xuezhe Ma, Di Wang, Graham Neubig, | In this paper, we propose an approach that instead expresses the two monolingual embedding spaces as probability densities defined by a Gaussian mixture model, and matches the two densities using a method called normalizing flow. |
162 | Cross-Lingual Alignment of Contextual Word Embeddings, with Applications to Zero-shot Dependency Parsing | Tal Schuster, Ori Ram, Regina Barzilay, Amir Globerson, | We introduce a novel method for multilingual transfer that utilizes deep contextual embeddings, pretrained in an unsupervised fashion. |
163 | Early Rumour Detection | Kaimin Zhou, Chang Shu, Binyang Li, Jey Han Lau, | To address this, we present a novel methodology for early rumour detection. |
164 | Microblog Hashtag Generation via Encoding Conversation Contexts | Yue Wang, Jing Li, Irwin King, Michael R. Lyu, Shuming Shi, | Different from previous work considering hashtags to be inseparable, our work is the first effort to annotate hashtags with a novel sequence generation framework via viewing the hashtag as a short sequence of words. |
165 | Text Processing Like Humans Do: Visually Attacking and Shielding NLP Systems | Steffen Eger, Gözde Gül Şahin, Andreas Rücklé, Ji-Ung Lee, Claudia Schulz, Mohsen Mesgar, Krishnkant Swarnkar, Edwin Simpson, Iryna Gurevych, | We consider this as a new type of adversarial attack in NLP, a setting to which humans are very robust, as our experiments with both simple and more difficult visual perturbations demonstrate. |
166 | Something’s Brewing! Early Prediction of Controversy-causing Posts from Discussion Features | Jack Hessel, Lillian Lee, | Using data from several different communities on reddit.com, we predict the ultimate controversiality of posts, leveraging features drawn from both the textual content and the tree structure of the early comments that initiate the discussion. |
167 | No Permanent Friends or Enemies: Tracking Relationships between Nations from News | Xiaochuang Han, Eunsol Choi, Chenhao Tan, | In this work, we explore unsupervised neural models to infer relations between nations from news articles. |
168 | Improving Human Text Comprehension through Semi-Markov CRF-based Neural Section Title Generation | Sebastian Gehrmann, Steven Layne, Franck Dernoncourt, | In particular, we present an extractive pipeline for section title generation by first selecting the most salient sentence and then applying deletion-based compression. |
169 | Unifying Human and Statistical Evaluation for Natural Language Generation | Tatsunori Hashimoto, Hugh Zhang, Percy Liang, | In this paper, we propose a unified framework which evaluates both diversity and quality, based on the optimal error rate of predicting whether a sentence is human- or machine-generated. |
170 | What makes a good conversation? How controllable attributes affect human judgments | Abigail See, Stephen Roller, Douwe Kiela, Jason Weston, | In this work, we examine two controllable neural text generation methods, conditional training and weighted decoding, in order to control four important attributes for chit-chat dialogue: repetition, specificity, response-relatedness and question-asking. |
171 | An Empirical Investigation of Global and Local Normalization for Recurrent Neural Sequence Models Using a Continuous Relaxation to Beam Search | Kartik Goyal, Chris Dyer, Taylor Berg-Kirkpatrick, | In this paper, we attempt to shed light on this problem through an empirical study. |
172 | Pun Generation with Surprise | He He, Nanyun Peng, Percy Liang, | In this paper, we propose an unsupervised approach to pun generation based on lots of raw (unhumorous) text and a surprisal principle. |
173 | Single Document Summarization as Tree Induction | Yang Liu, Ivan Titov, Mirella Lapata, | In this paper, we conceptualize single-document extractive summarization as a tree induction problem. |
174 | Fixed That for You: Generating Contrastive Claims with Semantic Edits | Christopher Hidey, Kathy McKeown, | To generate contrastive claims, we create a corpus of Reddit comment pairs self-labeled by posters using the acronym FTFY (fixed that for you). |
175 | Box of Lies: Multimodal Deception Detection in Dialogues | Felix Soldner, Verónica Pérez-Rosas, Rada Mihalcea, | In this paper, we address the task of detecting multimodal deceptive cues during conversational dialogues. We introduce a multimodal dataset containing deceptive conversations between participants playing the Box of Lies game from The Tonight Show Starring Jimmy Fallon, in which they try to guess whether an object description provided by their opponent is deceptive or not. |
176 | A Crowdsourced Corpus of Multiple Judgments and Disagreement on Anaphoric Interpretation | Massimo Poesio, Jon Chamberlain, Silviu Paun, Juntao Yu, Alexandra Uma, Udo Kruschwitz, | We present a corpus of anaphoric information (coreference) crowdsourced through a game-with-a-purpose. |
177 | A Streamlined Method for Sourcing Discourse-level Argumentation Annotations from the Crowd | Tristan Miller, Maria Sukhareva, Iryna Gurevych, | We present a method that breaks down a popular but relatively complex discourse-level argument annotation scheme into a simpler, iterative procedure that can be applied even by untrained annotators. |
178 | Unsupervised Dialog Structure Learning | Weiyan Shi, Tiancheng Zhao, Zhou Yu, | We propose to extract dialog structures using a modified VRNN model with discrete latent vectors. |
179 | Modeling Document-level Causal Structures for Event Causal Relation Identification | Lei Gao, Prafulla Kumar Choubey, Ruihong Huang, | We aim to comprehensively identify all the event causal relations in a document, both within a sentence and across sentences, which is important for reconstructing pivotal event structures. |
180 | Hierarchical User and Item Representation with Three-Tier Attention for Recommendation | Chuhan Wu, Fangzhao Wu, Junxin Liu, Yongfeng Huang, | In this paper, we propose a hierarchical user and item representation model with three-tier attention to learn user and item representations from reviews for recommendation. |
181 | Text Similarity Estimation Based on Word Embeddings and Matrix Norms for Targeted Marketing | Tim vor der Brück, Marc Pouly, | Motivated by an industrial application from the domain of youth marketing, where this approach produced only mediocre results, we propose an alternative way of combining the word vectors using matrix norms. |
182 | Glocal: Incorporating Global Information in Local Convolution for Keyphrase Extraction | Animesh Prasad, Min-Yen Kan, | We address this shortcoming by allowing the proper incorporation of global information into the GCN family of models through the use of scaled node weights. |
183 | A Study of Latent Structured Prediction Approaches to Passage Reranking | Iryna Haponchyk, Alessandro Moschitti, | In this paper, we propose a structured output approach which regards rankings as latent variables. |
184 | Combining Distant and Direct Supervision for Neural Relation Extraction | Iz Beltagy, Kyle Lo, Waleed Ammar, | We improve such models by combining the distant supervision data with additional directly supervised data, which we use as supervision for the attention weights. |
185 | Tweet Stance Detection Using an Attention based Neural Ensemble Model | Umme Aymun Siddiqua, Abu Nowshed Chy, Masaki Aono, | In this paper, we propose a neural ensemble model that adopts the strengths of these two LSTM variants to learn better long-term dependencies, where each module is coupled with an attention mechanism that amplifies the contribution of important elements in the final representation. |
186 | Word Embedding-Based Automatic MT Evaluation Metric using Word Position Information | Hiroshi Echizen’ya, Kenji Araki, Eduard Hovy, | We propose a new automatic evaluation metric for machine translation. |
187 | Learning to Stop in Structured Prediction for Neural Machine Translation | Mingbo Ma, Renjie Zheng, Liang Huang, | We propose a novel ranking method which enables an optimal beam search stopping criterion. |
188 | Learning Unsupervised Multilingual Word Embeddings with Incremental Multilingual Hubs | Geert Heyman, Bregt Verreet, Ivan Vulić, Marie-Francine Moens, | In this work, we propose a new robust framework for learning unsupervised multilingual word embeddings that mitigates the instability issues. |
189 | Curriculum Learning for Domain Adaptation in Neural Machine Translation | Xuan Zhang, Pamela Shapiro, Gaurav Kumar, Paul McNamee, Marine Carpuat, Kevin Duh, | We introduce a curriculum learning approach to adapt generic neural machine translation models to a specific domain. |
190 | Improving Robustness of Machine Translation with Synthetic Noise | Vaibhav Vaibhav, Sumeet Singh, Craig Stewart, Graham Neubig, | In this paper we propose methods to enhance the robustness of MT systems by emulating naturally occurring noise in otherwise clean data. |
191 | Non-Parametric Adaptation for Neural Machine Translation | Ankur Bapna, Orhan Firat, | We propose a novel n-gram level retrieval approach that relies on local phrase level similarities, allowing us to retrieve neighbors that are useful for translation even when overall sentence similarity is low. |
192 | Online Distilling from Checkpoints for Neural Machine Translation | Hao-Ran Wei, Shujian Huang, Ran Wang, Xin-Yu Dai, Jiajun Chen, | In contrast, we propose an online knowledge distillation method. |
193 | Value-based Search in Execution Space for Mapping Instructions to Programs | Dor Muhlgay, Jonathan Herzig, Jonathan Berant, | In this work, we propose a search algorithm that uses the target world state, known at training time, to train a critic network that predicts the expected reward of every search state. |
194 | VQD: Visual Query Detection In Natural Scenes | Manoj Acharya, Karan Jariwala, Christopher Kanan, | We propose a new visual grounding task called Visual Query Detection (VQD). |
195 | Improving Natural Language Interaction with Robots Using Advice | Nikhil Mehta, Dan Goldwasser, | In this paper we take the first step towards increasing the bandwidth of this interaction, and suggest a protocol for including advice, high-level observations about the task, which can help constrain the agent’s prediction. |
196 | Generating Knowledge Graph Paths from Textual Definitions using Sequence-to-Sequence Models | Victor Prokhorov, Mohammad Taher Pilehvar, Nigel Collier, | We present a novel method for mapping unrestricted text to knowledge graph entities by framing the task as a sequence-to-sequence problem. |
197 | Shifting the Baseline: Single Modality Performance on Visual Navigation & QA | Jesse Thomason, Daniel Gordon, Yonatan Bisk, | We present unimodal ablations on three recent datasets in visual navigation and QA, seeing an up to 29% absolute gain in performance over published baselines. |
198 | ExCL: Extractive Clip Localization Using Natural Language Descriptions | Soham Ghosh, Anuva Agarwal, Zarana Parekh, Alexander Hauptmann, | In order to select the most relevant video clip corresponding to the given text description, we propose a novel extractive approach that predicts the start and end frames by leveraging cross-modal interactions between the text and video – this removes the need to retrieve and re-rank multiple proposal segments. |
199 | Detecting dementia in Mandarin Chinese using transfer learning from a parallel corpus | Bai Li, Yi-Te Hsu, Frank Rudzicz, | We propose a method to learn a correspondence between independently engineered lexicosyntactic features in two languages, using a large parallel corpus of out-of-domain movie dialogue data. |
200 | Cross-lingual Visual Verb Sense Disambiguation | Spandana Gella, Desmond Elliott, Frank Keller, | We extend this line of work to the more challenging task of cross-lingual verb sense disambiguation, introducing the MultiSense dataset of 9,504 images annotated with English, German, and Spanish verbs. |
201 | Subword-Level Language Identification for Intra-Word Code-Switching | Manuel Mager, Özlem Çetinoğlu, Katharina Kann, | In this paper, we extend the language identification task to the subword-level, such that it includes splitting mixed words while tagging each part with a language ID. |
202 | MuST-C: a Multilingual Speech Translation Corpus | Mattia A. Di Gangi, Roldano Cattoni, Luisa Bentivogli, Matteo Negri, Marco Turchi, | To fill this gap, we created MuST-C, a multilingual speech translation corpus whose size and quality will facilitate the training of end-to-end systems for SLT from English into 8 languages. |
203 | Contextualization of Morphological Inflection | Ekaterina Vylomova, Ryan Cotterell, Trevor Cohn, Timothy Baldwin, Jason Eisner, | In this paper, we isolate the task of predicting a fully inflected sentence from its partially lemmatized version. |
204 | A Robust Abstractive System for Cross-Lingual Summarization | Jessica Ouyang, Boya Song, Kathy McKeown, | We present a robust neural abstractive summarization system for cross-lingual summarization. We construct summarization corpora for documents automatically translated from three low-resource languages, Somali, Swahili, and Tagalog, using machine translation and the New York Times summarization corpus. |
205 | Improving Neural Machine Translation with Neural Syntactic Distance | Chunpeng Ma, Akihiro Tamura, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao, | We propose five strategies to improve NMT with NSD. |
206 | Measuring Immediate Adaptation Performance for Neural Machine Translation | Patrick Simianer, Joern Wuebker, John DeNero, | To this end, we propose new metrics that directly evaluate immediate adaptation performance for machine translation. |
207 | Differentiable Sampling with Flexible Reference Word Order for Neural Machine Translation | Weijia Xu, Xing Niu, Marine Carpuat, | Our new differentiable sampling algorithm addresses this issue by optimizing the probability that the reference can be aligned with the sampled output, based on a soft alignment predicted by the model itself. |
208 | Reinforcement Learning based Curriculum Optimization for Neural Machine Translation | Gaurav Kumar, George Foster, Colin Cherry, Maxim Krikun, | We consider the problem of making efficient use of heterogeneous training data in neural machine translation (NMT). |
209 | Overcoming Catastrophic Forgetting During Domain Adaptation of Neural Machine Translation | Brian Thompson, Jeremy Gwinnup, Huda Khayrallah, Kevin Duh, Philipp Koehn, | In this work, we interpret the drop in general-domain performance as catastrophic forgetting of general-domain knowledge. |
210 | Short-Term Meaning Shift: A Distributional Exploration | Marco Del Tredici, Raquel Fernández, Gemma Boleda, | We present the first exploration of meaning shift over short periods of time in online communities using distributional representations. We create a small annotated dataset and use it to assess the performance of a standard model for meaning shift detection on short-term meaning shift. |
211 | Detecting Derogatory Compounds — An Unsupervised Approach | Michael Wiegand, Maximilian Wolf, Josef Ruppenhofer, | We propose an unsupervised classification approach that incorporates linguistic properties of compounds. |
212 | Personalized Neural Embeddings for Collaborative Filtering with Text | Guangneng Hu, | We develop a Personalized Neural Embedding (PNE) framework to exploit both interactions and words seamlessly. |
213 | An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models | Alexandra Chronopoulou, Christos Baziotis, Alexandros Potamianos, | In this paper we present a conceptually simple and effective transfer learning approach that addresses the problem of catastrophic forgetting. |
214 | Incorporating Emoji Descriptions Improves Tweet Classification | Abhishek Singh, Eduardo Blanco, Wei Jin, | In this paper, we present a simple strategy to process emojis: replace them with their natural language description and use pretrained word embeddings as normally done with standard words. |
215 | Modeling Personal Biases in Language Use by Inducing Personalized Word Embeddings | Daisuke Oba, Naoki Yoshinaga, Shoetsu Sato, Satoshi Akasaki, Masashi Toyoda, | In this study, we propose a method of modeling such personal biases in word meanings (hereafter, semantic variations) with personalized word embeddings obtained by solving a task on subjective text while regarding words used by different individuals as different words. |
216 | Multi-Task Ordinal Regression for Jointly Predicting the Trustworthiness and the Leading Political Ideology of News Media | Ramy Baly, Georgi Karadzhov, Abdelrhman Saleh, James Glass, Preslav Nakov, | In particular, we propose a multi-task ordinal regression framework that models the two problems jointly. |
217 | Joint Detection and Location of English Puns | Yanyan Zou, Wei Lu, | This paper presents an approach that addresses pun detection and pun location jointly from a sequence labeling perspective. |
218 | Harry Potter and the Action Prediction Challenge from Natural Language | David Vilares, Carlos Gómez-Rodríguez, | We explore the challenge of action prediction from textual descriptions of scenes, a testbed to approximate whether text inference can be used to predict upcoming actions. |
219 | Argument Mining for Understanding Peer Reviews | Xinyu Hua, Mitko Nikolov, Nikhil Badugu, Lu Wang, | In this work, we study the content and structure of peer reviews under the argument mining framework, through automatically detecting (1) the argumentative propositions put forward by reviewers, and (2) their types (e.g., evaluating the work or making suggestions for improvement). |
220 | An annotated dataset of literary entities | David Bamman, Sejal Popat, Sheng Shen, | We present a new dataset comprising 210,532 tokens evenly drawn from 100 different English-language literary texts annotated for ACE entity categories (person, location, geo-political entity, facility, organization, and vehicle). We present empirical results demonstrating the performance of nested entity recognition models in this domain; training natively on in-domain literary data yields an improvement of over 20 absolute points in F-score (from 45.7 to 68.3), and mitigates a disparate impact in performance for male and female entities present in models trained on news data. |
221 | Abusive Language Detection with Graph Convolutional Networks | Pushkar Mishra, Marco Del Tredici, Helen Yannakoudakis, Ekaterina Shutova, | In contrast, working with graph convolutional networks (GCNs), we present the first approach that captures not only the structure of online communities but also the linguistic behavior of the users within them. |
222 | On the Importance of Distinguishing Word Meaning Representations: A Case Study on Reverse Dictionary Mapping | Mohammad Taher Pilehvar, | Through a set of experiments on a state-of-the-art reverse dictionary system based on neural networks, we show that a simple adjustment aimed at addressing the meaning conflation deficiency can lead to substantial improvements. |
223 | Factorising AMR generation through syntax | Kris Cao, Stephen Clark, | We show that decomposing the generation process this way leads to state-of-the-art single model performance generating from AMR without additional unlabelled data. |
224 | A Crowdsourced Frame Disambiguation Corpus with Ambiguity | Anca Dumitrache, Lora Aroyo, Chris Welty, | We present a resource for the task of FrameNet semantic frame disambiguation of over 5,000 word-sentence pairs from the Wikipedia corpus. |
225 | Inoculation by Fine-Tuning: A Method for Analyzing Challenge Datasets | Nelson F. Liu, Roy Schwartz, Noah A. Smith, | We introduce inoculation by fine-tuning, a new analysis method for studying challenge datasets by exposing models (the metaphorical patient) to a small amount of data from the challenge dataset (a metaphorical pathogen) and assessing how well they can adapt. |
226 | A Capsule Network-based Embedding Model for Knowledge Graph Completion and Search Personalization | Dai Quoc Nguyen, Thanh Vu, Tu Dinh Nguyen, Dat Quoc Nguyen, Dinh Phung, | In this paper, we introduce an embedding model, named CapsE, exploring a capsule network to model relationship triples (subject, relation, object). |
227 | Partial Or Complete, That’s The Question | Qiang Ning, Hangfeng He, Chuchu Fan, Dan Roth, | This paper questions this common perception, motivated by the fact that structures consist of interdependent sets of variables. |
228 | Sequential Attention with Keyword Mask Model for Community-based Question Answering | Jianxin Yang, Wenge Rong, Libin Shi, Zhang Xiong, | We propose a Sequential Attention with Keyword Mask model (SAKM) for CQA to imitate human reading behavior. |
229 | Simple Attention-Based Representation Learning for Ranking Short Social Media Posts | Peng Shi, Jinfeng Rao, Jimmy Lin, | This paper explores the problem of ranking short social media posts with respect to user queries using neural networks. |
230 | AttentiveChecker: A Bi-Directional Attention Flow Mechanism for Fact Verification | Santosh Tokala, Vishal G, Avirup Saha, Niloy Ganguly, | In this paper, we present a completely task-agnostic pipelined system, AttentiveChecker, consisting of three homogeneous Bi-Directional Attention Flow (BIDAF) networks, which are multi-layer hierarchical networks that represent the context at different levels of granularity. |
231 | Practical, Efficient, and Customizable Active Learning for Named Entity Recognition in the Digital Humanities | Alexander Erdmann, David Joseph Wrisley, Benjamin Allen, Christopher Brown, Sophie Cohen-Bodénès, Micha Elsner, Yukun Feng, Brian Joseph, Béatrice Joyeux-Prunel, Marie-Catherine de Marneffe, | Thus, we propose an active learning solution for named entity recognition, attempting to maximize a custom model’s improvement per additional unit of manual annotation. |
232 | Doc2hash: Learning Discrete Latent variables for Documents Retrieval | Yifei Zhang, Hao Zhu, | In this paper, we propose a method, Doc2hash, that solves the gradient flow problem of the discrete stochastic layer by using continuous relaxation on priors, and trains the generative model in an end-to-end manner to generate hash codes. |
233 | Evaluating Text GANs as Language Models | Guy Tevet, Gavriel Habib, Vered Shwartz, Jonathan Berant, | In this work, we propose to approximate the distribution of text generated by a GAN, which permits evaluating them with traditional probability-based LM metrics. |
234 | Latent Code and Text-based Generative Adversarial Networks for Soft-text Generation | Md Akmal Haidar, Mehdi Rezagholizadeh, Alan Do Omri, Ahmad Rashid, | In this work, we introduce a novel text-based approach called Soft-GAN to effectively exploit GAN setup for text generation. |
235 | Neural Text Generation from Rich Semantic Representations | Valerie Hajdik, Jan Buys, Michael Wayne Goodman, Emily M. Bender, | We propose neural models to generate high-quality text from structured representations based on Minimal Recursion Semantics (MRS). |
236 | Step-by-Step: Separating Planning from Realization in Neural Data-to-Text Generation | Amit Moryossef, Yoav Goldberg, Ido Dagan, | For training a plan-to-text generator, we present a method for matching reference texts to their corresponding text plans. |
237 | Evaluating Rewards for Question Generation Models | Tom Hosking, Sebastian Riedel, | We therefore optimise directly for various objectives beyond simply replicating the ground truth questions, including a novel approach using an adversarial discriminator that seeks to generate questions that are indistinguishable from real examples. |
238 | Text Generation from Knowledge Graphs with Graph Transformers | Rik Koncel-Kedziorski, Dhanush Bekal, Yi Luan, Mirella Lapata, Hannaneh Hajishirzi, | In this work, we address the problem of generating coherent multi-sentence texts from the output of an information extraction system, and in particular a knowledge graph. |
239 | Open Information Extraction from Question-Answer Pairs | Nikita Bhutani, Yoshihiko Suhara, Wang-Chiew Tan, Alon Halevy, H. V. Jagadish, | We describe NeurON, a system for extracting tuples from question-answer pairs. |
240 | Question Answering by Reasoning Across Documents with Graph Convolutional Networks | Nicola De Cao, Wilker Aziz, Ivan Titov, | We introduce a neural model which integrates and reasons relying on information spread within documents and across multiple documents. |
241 | A Qualitative Comparison of CoQA, SQuAD 2.0 and QuAC | Mark Yatskar, | We compare three new datasets for question answering: SQuAD 2.0, QuAC, and CoQA, along several of their new features: (1) unanswerable questions, (2) multi-turn interactions, and (3) abstractive answers. We show that the datasets provide complementary coverage of the first two aspects, but weak coverage of the third. Because of the datasets’ structural similarity, a single extractive model can be easily adapted to any of the datasets, and we show improved baseline results on both SQuAD 2.0 and CoQA. |
242 | BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis | Hu Xu, Bing Liu, Lei Shu, Philip Yu, | We call this problem Review Reading Comprehension (RRC). In this work, we first build an RRC dataset called ReviewRC based on a popular benchmark for aspect-based sentiment analysis. |
243 | Old is Gold: Linguistic Driven Approach for Entity and Relation Linking of Short Text | Ahmad Sakor, Isaiah Onando Mulang’, Kuldeep Singh, Saeedeh Shekarpour, Maria Esther Vidal, Jens Lehmann, Sören Auer, | In this work, we present the Falcon approach which effectively maps entities and relations within a short text to its mentions of a background knowledge graph. |
244 | Be Consistent! Improving Procedural Text Comprehension using Label Consistency | Xinya Du, Bhavana Dalvi, Niket Tandon, Antoine Bosselut, Wen-tau Yih, Peter Clark, Claire Cardie, | We present a new learning framework that leverages label consistency during training, allowing consistency bias to be built into the model. |
245 | MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms | Aida Amini, Saadia Gabriel, Shanchuan Lin, Rik Koncel-Kedziorski, Yejin Choi, Hannaneh Hajishirzi, | We introduce a large-scale dataset of math word problems and an interpretable neural math problem solver by learning to map problems to their operation programs. |
246 | DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs | Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, Matt Gardner, | We introduce a new reading comprehension benchmark, DROP, which requires Discrete Reasoning Over the content of Paragraphs. |
247 | An Encoding Strategy Based Word-Character LSTM for Chinese NER | Wei Liu, Tongge Xu, Qinghua Xu, Jiayu Song, Yueran Zu, | In this paper, we propose a novel word-character LSTM (WC-LSTM) model to add word information into the start or the end character of the word, alleviating the influence of word segmentation errors while obtaining the word boundary information. |
248 | Highly Effective Arabic Diacritization using Sequence to Sequence Modeling | Hamdy Mubarak, Ahmed Abdelali, Hassan Sajjad, Younes Samih, Kareem Darwish, | In this work, we present a unified character level sequence-to-sequence deep learning model that recovers both types of diacritics without the use of explicit feature engineering. |
249 | SC-LSTM: Learning Task-Specific Representations in Multi-Task Learning for Sequence Labeling | Peng Lu, Ting Bai, Philippe Langlais, | In order to do so, we propose a new LSTM cell which contains both shared parameters that can learn from all tasks, and task-specific parameters that can learn task-specific information. |
250 | Learning to Denoise Distantly-Labeled Data for Entity Typing | Yasumasa Onoe, Greg Durrett, | We investigate this approach on the ultra-fine entity typing task of Choi et al. (2018). |
251 | A Simple and Robust Approach to Detecting Subject-Verb Agreement Errors | Simon Flachs, Ophélie Lacroix, Marek Rei, Helen Yannakoudakis, Anders Søgaard, | We observe that rule-based error generation is less sensitive to syntactic parsing errors and irregularities than error detection and explore a simple, yet efficient approach to getting the best of both worlds: We train neural sequential labelers on the combination of large volumes of silver standard data, obtained through rule-based error generation, and gold standard data. |
252 | A Grounded Unsupervised Universal Part-of-Speech Tagger for Low-Resource Languages | Ronald Cardenas, Ying Lin, Heng Ji, Jonathan May, | In this work, we describe an approach for low-resource unsupervised POS tagging that yields fully grounded output and requires no labeled training data. |
253 | On Difficulties of Cross-Lingual Transfer with Order Differences: A Case Study on Dependency Parsing | Wasi Ahmad, Zhisong Zhang, Xuezhe Ma, Eduard Hovy, Kai-Wei Chang, Nanyun Peng, | In this paper, we investigate cross-lingual transfer and posit that an order-agnostic model will perform better when transferring to distant foreign languages. |
254 | A Multi-Task Approach for Disentangling Syntax and Semantics in Sentence Representations | Mingda Chen, Qingming Tang, Sam Wiseman, Kevin Gimpel, | We propose a generative model for a sentence that uses two latent variables, with one intended to represent the syntax of the sentence and the other to represent its semantics. |
255 | Self-Discriminative Learning for Unsupervised Document Embedding | Hong-You Chen, Chin-Hua Hu, Leila Wehbe, Shou-de Lin, | Unlike most previous work which learn the embedding based on self-prediction of the surface of text, we explicitly exploit the inter-document information and directly model the relations of documents in embedding space with a discriminative network and a novel objective. |
256 | Adaptive Convolution for Text Classification | Byung-Ju Choi, Jun-Hyung Park, SangKeun Lee, | In this paper, we present an adaptive convolution for text classification to give flexibility to convolutional neural networks (CNNs). |
257 | Zero-Shot Cross-Lingual Opinion Target Extraction | Soufian Jebbara, Philipp Cimiano, | In this work, we address the lack of available annotated data for specific languages by proposing a zero-shot cross-lingual approach for the extraction of opinion target expressions. |
258 | Adversarial Category Alignment Network for Cross-domain Sentiment Classification | Xiaoye Qu, Zhikang Zou, Yu Cheng, Yang Yang, Pan Zhou, | In this work, we propose an adversarial category alignment network (ACAN), which attempts to enhance category consistency between the source domain and the target domain. |
259 | Target-oriented Opinion Words Extraction with Target-fused Neural Sequence Labeling | Zhifang Fan, Zhen Wu, Xin-Yu Dai, Shujian Huang, Jiajun Chen, | In this paper, we propose a novel sequence labeling subtask for ABSA named TOWE (Target-oriented Opinion Words Extraction), which aims at extracting the corresponding opinion words for a given opinion target. We build four datasets for TOWE based on several popular ABSA benchmarks from laptop and restaurant reviews. |
260 | Abstractive Summarization of Reddit Posts with Multi-level Memory Networks | Byeongchang Kim, Hyunwoo Kim, Gunhee Kim, | We address the problem of abstractive summarization in two directions: proposing a novel dataset and a new model. First, we collect Reddit TIFU dataset, consisting of 120K posts from the online discussion forum Reddit. |
261 | Automatic learner summary assessment for reading comprehension | Menglin Xia, Ekaterina Kochmar, Ted Briscoe, | We present a summarization task for evaluating non-native reading comprehension and propose three novel approaches to automatically assess the learner summaries. |
262 | Data-efficient Neural Text Compression with Interactive Learning | Avinesh P.V.S, Christian M. Meyer, | In this paper, we propose a novel interactive setup to neural text compression that enables transferring a model to new domains and compression tasks with minimal human supervision. |
263 | Text Generation with Exemplar-based Adaptive Decoding | Hao Peng, Ankur Parikh, Manaal Faruqui, Bhuwan Dhingra, Dipanjan Das, | We propose a novel conditioned text generation model. |
264 | Guiding Extractive Summarization with Question-Answering Rewards | Kristjan Arumae, Fei Liu, | In this paper we describe a novel framework to guide a supervised, extractive summarization system with question-answering rewards. |
265 | Beyond task success: A closer look at jointly learning to see, ask, and GuessWhat | Ravi Shekhar, Aashish Venkatesh, Tim Baumgärtner, Elia Bruni, Barbara Plank, Raffaella Bernardi, Raquel Fernández, | We propose a grounded dialogue state encoder which addresses a foundational issue on how to integrate visual grounding with dialogue system components. |
266 | The World in My Mind: Visual Dialog with Adversarial Multi-modal Feature Encoding | Yiqun Yao, Jiaming Xu, Bo Xu, | In this paper, we propose a novel Adversarial Multi-modal Feature Encoding (AMFE) framework for effective and robust auxiliary training of visual dialog systems. |
267 | Strong and Simple Baselines for Multimodal Utterance Embeddings | Paul Pu Liang, Yao Chong Lim, Yao-Hung Hubert Tsai, Ruslan Salakhutdinov, Louis-Philippe Morency, | In this paper, we propose two simple but strong baselines to learn embeddings of multimodal utterances. |
268 | Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout | Hao Tan, Licheng Yu, Mohit Bansal, | In this paper, we present a generalizable navigational agent. |
269 | Towards Content Transfer through Grounded Text Generation | Shrimai Prabhumoye, Chris Quirk, Michel Galley, | This paper introduces the notion of Content Transfer for long-form text generation, where the task is to generate a next sentence in a document that both fits its context and is grounded in a content-rich external textual source such as a news story. As another contribution of this paper, we release a benchmark dataset of 640k Wikipedia referenced sentences paired with the source articles to encourage exploration of this new task. |
270 | Improving Machine Reading Comprehension with General Reading Strategies | Kai Sun, Dian Yu, Dong Yu, Claire Cardie, | Inspired by reading strategies identified in cognitive science, and given limited computational resources – just a pre-trained model and a fixed number of training instances – we propose three general strategies aimed to improve non-extractive machine reading comprehension (MRC): (i) BACK AND FORTH READING that considers both the original and reverse order of an input sequence, (ii) HIGHLIGHTING, which adds a trainable embedding to the text embedding of tokens that are relevant to the question and candidate answers, and (iii) SELF-ASSESSMENT that generates practice questions and candidate answers directly from the text in an unsupervised manner. |
271 | Multi-task Learning with Sample Re-weighting for Machine Reading Comprehension | Yichong Xu, Xiaodong Liu, Yelong Shen, Jingjing Liu, Jianfeng Gao, | We propose a multi-task learning framework to learn a joint Machine Reading Comprehension (MRC) model that can be applied to a wide range of MRC tasks in different domains. |
272 | Semantically-Aligned Equation Generation for Solving and Reasoning Math Word Problems | Ting-Rui Chiang, Yun-Nung Chen, | Motivated by the intuition about how humans generate equations from problem texts, this paper presents a neural approach to automatically solve math word problems by operating symbols according to their semantic meanings in texts. |
273 | Iterative Search for Weakly Supervised Semantic Parsing | Pradeep Dasigi, Matt Gardner, Shikhar Murty, Luke Zettlemoyer, Eduard Hovy, | We propose a novel iterative training algorithm that alternates between searching for consistent logical forms and maximizing the marginal likelihood of the retrieved ones. |
274 | Alignment over Heterogeneous Embeddings for Question Answering | Vikas Yadav, Steven Bethard, Mihai Surdeanu, | We propose a simple, fast, and mostly-unsupervised approach for non-factoid question answering (QA) called Alignment over Heterogeneous Embeddings (AHE). |
275 | Bridging the Gap: Attending to Discontinuity in Identification of Multiword Expressions | Omid Rohanian, Shiva Taslimipoor, Samaneh Kouchaki, Le An Ha, Ruslan Mitkov, | We introduce a new method to tag Multiword Expressions (MWEs) using a linguistically interpretable language-independent deep learning architecture. |
276 | Incorporating Word Attention into Character-Based Word Segmentation | Shohei Higashiyama, Masao Utiyama, Eiichiro Sumita, Masao Ideuchi, Yoshiaki Oida, Yohei Sakamoto, Isaac Okada, | We propose a character-based model utilizing word information to leverage the advantages of both types of models. |
277 | VCWE: Visual Character-Enhanced Word Embeddings | Chi Sun, Xipeng Qiu, Xuanjing Huang, | In this paper, we propose a model to learn Chinese word embeddings via three-level composition: (1) a convolutional neural network to extract the intra-character compositionality from the visual shape of a character; (2) a recurrent neural network with self-attention to compose character representation into word embeddings; (3) the Skip-Gram framework to capture non-compositionality directly from the contextual information. |
278 | Subword Encoding in Lattice LSTM for Chinese Word Segmentation | Jie Yang, Yue Zhang, Shuailong Liang, | We investigate subword information for Chinese word segmentation, by integrating subword embeddings trained using byte-pair encoding into a Lattice LSTM (LaLSTM) network over a character sequence. |
279 | Improving Cross-Domain Chinese Word Segmentation with Word Embeddings | Yuxiao Ye, Weikang Li, Yue Zhang, Likun Qiu, Jian Sun, | In this paper, we propose a semi-supervised word-based approach to improving cross-domain CWS given a baseline segmenter. |
280 | Neural Semi-Markov Conditional Random Fields for Robust Character-Based Part-of-Speech Tagging | Apostolos Kemos, Heike Adel, Hinrich Schütze, | In this paper, we propose to eliminate the need for tokenizers with an end-to-end character-level semi-Markov conditional random field. |
281 | Shrinking Japanese Morphological Analyzers With Neural Networks and Semi-supervised Learning | Arseny Tolmachev, Daisuke Kawahara, Sadao Kurohashi, | We propose a compact alternative to these cumbersome approaches which do not rely on any externally provided n-gram or word representations. |
282 | Neural Constituency Parsing of Speech Transcripts | Paria Jamshid Lou, Yufei Wang, Mark Johnson, | This paper studies the performance of a neural self-attentive parser on transcribed speech. |
283 | Acoustic-to-Word Models with Conversational Context Information | Suyoun Kim, Florian Metze, | In this work, we present a direct acoustic-to-word, end-to-end speech recognition model capable of utilizing the conversational context to better process long conversations. |
284 | A Dynamic Speaker Model for Conversational Interactions | Hao Cheng, Hao Fang, Mari Ostendorf, | In this work, we introduce a neural model for learning a dynamically updated speaker embedding in a conversational context. |
285 | Fluent Translations from Disfluent Speech in End-to-End Speech Translation | Elizabeth Salesky, Matthias Sperber, Alexander Waibel, | We use a sequence-to-sequence model to translate from noisy, disfluent speech to fluent text with disfluencies removed, using the recently collected copy-edited references for the Fisher Spanish-English dataset. |
286 | Relation Classification Using Segment-Level Attention-based CNN and Dependency-based RNN | Van-Hien Tran, Van-Thuy Phi, Hiroyuki Shindo, Yuji Matsumoto, | In this paper, we propose a new model effectively combining Segment-level Attention-based Convolutional Neural Networks (SACNNs) and Dependency-based Recurrent Neural Networks (DepRNNs). |
287 | Document-Level Event Factuality Identification via Adversarial Neural Network | Zhong Qian, Peifeng Li, Qiaoming Zhu, Guodong Zhou, | Document-Level Event Factuality Identification via Adversarial Neural Network |
288 | Distant Supervision Relation Extraction with Intra-Bag and Inter-Bag Attentions | Zhi-Xiu Ye, Zhen-Hua Ling, | This paper presents a neural relation extraction method to deal with the noisy training data generated by distant supervision. |
289 | Ranking-Based Autoencoder for Extreme Multi-label Classification | Bingyu Wang, Li Chen, Wei Sun, Kechen Qin, Kefeng Li, Hui Zhou, | In this paper, we propose a deep learning XML method, with a word-vector-based self-attention, followed by a ranking-based AutoEncoder architecture. |
290 | Posterior-regularized REINFORCE for Instance Selection in Distant Supervision | Qi Zhang, Siliang Tang, Xiang Ren, Fei Wu, Shiliang Pu, Yueting Zhuang, | This paper provides a new way to improve the efficiency of the REINFORCE training process. |
291 | Scalable Collapsed Inference for High-Dimensional Topic Models | Rashidul Islam, James Foulds, | In this paper, we develop an online inference algorithm for topic models which leverages stochasticity to scale well in the number of documents, sparsity to scale well in the number of topics, and which operates in the collapsed representation of the topic model for improved accuracy and run-time performance. |
292 | An Integrated Approach for Keyphrase Generation via Exploring the Power of Retrieval and Extraction | Wang Chen, Hou Pong Chan, Piji Li, Lidong Bing, Irwin King, | In this paper, we present a novel integrated approach for keyphrase generation (KG). |
293 | Predicting Malware Attributes from Cybersecurity Texts | Arpita Roy, Youngja Park, Shimei Pan, | In this paper, we propose a novel feature learning method to leverage diverse knowledge sources such as small amount of human annotations, unlabeled text and specifications about malware attribute labels. |
294 | Improving Distantly-supervised Entity Typing with Compact Latent Space Clustering | Bo Chen, Xiaotao Gu, Yufeng Hu, Siliang Tang, Guoping Hu, Yueting Zhuang, Xiang Ren, | In this work, we propose to regularize distantly supervised models with Compact Latent Space Clustering (CLSC) to bypass this problem while still effectively utilizing noisy data. |
295 | Modelling Instance-Level Annotator Reliability for Natural Language Labelling Tasks | Maolin Li, Arvid Fahlström Myrman, Tingting Mu, Sophia Ananiadou, | In this paper, we propose an unsupervised model which can handle both binary and multi-class labels. |
296 | Review-Driven Multi-Label Music Style Classification by Exploiting Style Correlations | Guangxiang Zhao, Jingjing Xu, Qi Zeng, Xuancheng Ren, Xu Sun, | This paper explores a new natural language processing task, review-driven multi-label music style classification. |
297 | Fact Discovery from Knowledge Base via Facet Decomposition | Zihao Fu, Yankai Lin, Zhiyuan Liu, Wai Lam, | To tackle this new problem, we propose a novel framework that decomposes the discovery problem into several facet discovery components. |
298 | A Richer-but-Smarter Shortest Dependency Path with Attentive Augmentation for Relation Extraction | Duy-Cat Can, Hoang-Quynh Le, Quang-Thuy Ha, Nigel Collier, | In this work, we propose a novel model that combines the advantages of these two approaches. |
299 | Bidirectional Attentive Memory Networks for Question Answering over Knowledge Bases | Yu Chen, Lingfei Wu, Mohammed J Zaki, | In this work, we propose to directly model the two-way flow of interactions between the questions and the KB via a novel Bidirectional Attentive Memory Network, called BAMnet. |
300 | BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions | Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, Kristina Toutanova, | In this paper we study yes/no questions that are naturally occurring – meaning that they are generated in unprompted and unconstrained settings. We build a reading comprehension dataset, BoolQ, of such questions, and show that they are unexpectedly challenging. |
301 | Enhancing Key-Value Memory Neural Networks for Knowledge Based Question Answering | Kun Xu, Yuxuan Lai, Yansong Feng, Zhiguo Wang, | In this paper, we propose a novel mechanism to enable conventional KV-MemNNs models to perform interpretable reasoning for complex questions. |
302 | Repurposing Entailment for Multi-Hop Question Answering Tasks | Harsh Trivedi, Heeyoung Kwon, Tushar Khot, Ashish Sabharwal, Niranjan Balasubramanian, | We introduce Multee, a general architecture that can effectively use entailment models for multi-hop QA tasks. |
303 | GenderQuant: Quantifying Mention-Level Genderedness | Ananya, Nitya Parthasarthi, Sameer Singh, | In this paper, we use existing NLP pipelines to automatically annotate gender of mentions in the text. |
304 | Analyzing Polarization in Social Media: Method and Application to Tweets on 21 Mass Shootings | Dorottya Demszky, Nikhil Garg, Rob Voigt, James Zou, Jesse Shapiro, Matthew Gentzkow, Dan Jurafsky, | We provide an NLP framework to uncover four linguistic dimensions of political polarization in social media: topic choice, framing, affect and illocutionary force. |
305 | Learning to Decipher Hate Symbols | Jing Qian, Mai ElSherief, Elizabeth Belding, William Yang Wang, | In this paper, we propose a novel task of deciphering hate symbols. To do this, we leverage the Urban Dictionary and collect a new, symbol-rich Twitter corpus of hate speech. |
306 | Long-tail Relation Extraction via Knowledge Graph Embeddings and Graph Convolution Networks | Ningyu Zhang, Shumin Deng, Zhanlin Sun, Guanying Wang, Xi Chen, Wei Zhang, Huajun Chen, | We propose a distantly supervised relation extraction approach for long-tailed, imbalanced data, which is prevalent in real-world settings. |
307 | GAN Driven Semi-distant Supervision for Relation Extraction | Pengshuai Li, Xinsong Zhang, Weijia Jia, Hai Zhao, | To address this issue, we propose a novel semi-distant supervision approach for relation extraction by constructing a small accurate dataset and properly leveraging numerous instances without relation labels. |
308 | A general framework for information extraction using dynamic span graphs | Yi Luan, Dave Wadden, Luheng He, Amy Shah, Mari Ostendorf, Hannaneh Hajishirzi, | We introduce a general framework for several information extraction tasks that share span representations using dynamically constructed span graphs. |
309 | OpenCeres: When Open Information Extraction Meets the Semi-Structured Web | Colin Lockard, Prashant Shiralkar, Xin Luna Dong, | In this paper, we define the problem of OpenIE from semi-structured websites to extract such facts, and present an approach for solving it. We also introduce a labeled evaluation dataset to motivate research in this area. |
310 | Structured Minimally Supervised Learning for Neural Relation Extraction | Fan Bai, Alan Ritter, | We present an approach to minimally supervised relation extraction that combines the benefits of learned representations and structured learning, and accurately predicts sentence-level relation mentions given only proposition-level supervision from a KB. |
311 | Neural Machine Translation of Text from Non-Native Speakers | Antonios Anastasopoulos, Alison Lui, Toan Q. Nguyen, David Chiang, | In this paper, we show that augmenting training data with sentences containing artificially-introduced grammatical errors can make the system more robust to such errors. We also present a set of Spanish translations of the JFLEG grammar error correction corpus, which allows for testing NMT robustness to real grammatical errors. |
312 | Improving Domain Adaptation Translation with Domain Invariant and Specific Information | Shuhao Gu, Yang Feng, Qun Liu, | In this paper, we propose a method to explicitly model the two kinds of information in the encoder-decoder framework so as to exploit out-of-domain data in in-domain training. |
313 | Selective Attention for Context-aware Neural Machine Translation | Sameen Maruf, André F. T. Martins, Gholamreza Haffari, | To this end, we propose a novel and scalable top-down approach to hierarchical attention for context-aware NMT which uses sparse attention to selectively focus on relevant sentences in the document context and then attends to key words in those sentences. |
314 | On Evaluation of Adversarial Perturbations for Sequence-to-Sequence Models | Paul Michel, Xian Li, Graham Neubig, Juan Pino, | Using the example of untargeted attacks on machine translation (MT), we propose a new evaluation framework for adversarial attacks on seq2seq models that takes the semantic equivalence of the pre- and post-perturbation input into account. |
315 | Accelerated Reinforcement Learning for Sentence Generation by Vocabulary Prediction | Kazuma Hashimoto, Yoshimasa Tsuruoka, | To improve the efficiency of reinforcement learning, we present a novel approach for reducing the action space based on dynamic vocabulary prediction. |
316 | Mitigating Uncertainty in Document Classification | Xuchao Zhang, Fanglan Chen, ChangTien Lu, Naren Ramakrishnan, | In this paper, we propose a novel neural-network-based model that applies a new dropout-entropy method for uncertainty measurement. |
317 | Complexity-Weighted Loss and Diverse Reranking for Sentence Simplification | Reno Kriz, Joao Sedoc, Marianna Apidianaki, Carolina Zheng, Gaurav Kumar, Eleni Miltsakaki, Chris Callison-Burch, | We aim to alleviate this issue through two main techniques: a complexity-weighted loss function, and generating a large set of diverse candidate simplifications at test time, which we rerank to promote fluency, adequacy, and simplicity. |
318 | Predicting Helpful Posts in Open-Ended Discussion Forums: A Neural Architecture | Kishaloy Halder, Min-Yen Kan, Kazunari Sugiyama, | In this paper, we address the task of identifying helpful posts in a forum thread to help users comprehend long running discussion threads, which often contain repetitive or irrelevant posts. |
319 | Text Classification with Few Examples using Controlled Generalization | Abhijit Mahabal, Jason Baldridge, Burcu Karagol Ayan, Vincent Perot, Dan Roth, | This produces task-specific semantic vectors; here, we show that a feed-forward network over these vectors is especially effective in low-data scenarios, compared to existing state-of-the-art methods. |
320 | Reinforcement Learning Based Text Style Transfer without Parallel Training Corpus | Hongyu Gong, Suma Bhat, Lingfei Wu, JinJun Xiong, Wen-mei Hwu, | In this paper, we address this challenge by using a reinforcement-learning-based generator-evaluator architecture. |
321 | Adapting RNN Sequence Prediction Model to Multi-label Set Prediction | Kechen Qin, Cheng Li, Virgil Pavlu, Javed Aslam, | We present an adaptation of RNN sequence models to the problem of multi-label classification for text, where the target is a set of labels, not a sequence. |
322 | Customizing Grapheme-to-Phoneme System for Non-Trivial Transcription Problems in Bangla Language | Sudipta Saha Shubha, Nafis Sadeq, Shafayat Ahmed, Md. Nahidul Islam, Muhammad Abdullah Adnan, Md. Yasin Ali Khan, Mohammad Zuberul Islam, | As the performance of data-driven approaches to G2P conversion depends largely on the pronunciation lexicon on which the system is trained, in this paper we develop an improved training lexicon for Bangla by identifying and categorizing the critical cases in the language and including them in the training lexicon, yielding a robust G2P conversion system. |
323 | Connecting Language and Knowledge with Heterogeneous Representations for Neural Relation Extraction | Peng Xu, Denilson Barbosa, | We help close the gap with a framework that unifies the learning of RE and KBE models leading to significant improvements over the state-of-the-art in RE. |
324 | Segmentation-free compositional n-gram embedding | Geewook Kim, Kazuki Fukui, Hidetoshi Shimodaira, | We propose a new type of representation learning method that models words, phrases and sentences seamlessly. |
325 | Exploiting Noisy Data in Distant Supervision Relation Classification | Kaijia Yang, Liang He, Xin-Yu Dai, Shujian Huang, Jiajun Chen, | Different from previous works that underutilize noisy data which inherently characterize the property of classification, in this paper, we propose RCEND, a novel framework to enhance Relation Classification by Exploiting Noisy Data. |
326 | Misspelling Oblivious Word Embeddings | Aleksandra Piktus, Necati Bora Edizel, Piotr Bojanowski, Edouard Grave, Rui Ferreira, Fabrizio Silvestri, | In this paper we present a method to learn word embeddings that are resilient to misspellings. |
327 | Learning Relational Representations by Analogy using Hierarchical Siamese Networks | Gaetano Rossiello, Alfio Gliozzo, Robert Farrell, Nicolas Fauceglia, Michael Glass, | We address relation extraction as an analogy problem by proposing a novel approach to learn representations of relations expressed by their textual mentions. Following this idea, we collect a large set of analogous pairs by matching triples in knowledge bases with web-scale corpora through distant supervision. |
328 | An Effective Label Noise Model for DNN Text Classification | Ishan Jindal, Daniel Pressel, Brian Lester, Matthew Nokleby, | In this paper, we propose an approach to training deep networks that is robust to label noise. |
329 | Understanding Learning Dynamics Of Language Models with SVCCA | Naomi Saphra, Adam Lopez, | We present the first study on the learning dynamics of neural language models, using a simple and flexible analysis method called Singular Vector Canonical Correlation Analysis (SVCCA), which enables us to compare learned representations across time and across models, without the need to evaluate directly on annotated data. |
330 | Using Large Corpus N-gram Statistics to Improve Recurrent Neural Language Models | Yiben Yang, Ji-Ping Wang, Doug Downey, | We explore a technique that uses large corpus n-gram statistics as a regularizer for training a neural network LM on a smaller corpus. |
331 | Continual Learning for Sentence Representations Using Conceptors | Tianlin Liu, Lyle Ungar, Joao Sedoc, | In this paper, we consider a continual learning scenario for sentence representations: Given a sequence of corpora, we aim to optimize the sentence encoder with respect to the new corpus while maintaining its accuracy on the old corpora. |
332 | Relation Discovery with Out-of-Relation Knowledge Base as Supervision | Yan Liang, Xin Liu, Jianwen Zhang, Yangqiu Song, | In this paper, we study the problem of using out-of-relation knowledge bases to supervise the discovery of unseen relations, where out-of-relation means that the relations to be discovered from the text corpus and those in the knowledge bases do not overlap. We construct a set of constraints between entity pairs based on the knowledge base embedding and then incorporate these constraints into relation discovery via a variational auto-encoder based algorithm. |
333 | Corpora Generation for Grammatical Error Correction | Jared Lichtarge, Chris Alberti, Shankar Kumar, Noam Shazeer, Niki Parmar, Simon Tong, | We describe two approaches for generating large parallel datasets for GEC using publicly available Wikipedia data. |
334 | Structural Supervision Improves Learning of Non-Local Grammatical Dependencies | Ethan Wilcox, Peng Qian, Richard Futrell, Miguel Ballesteros, Roger Levy, | Here we investigate whether supervision with hierarchical structure enhances learning of a range of grammatical dependencies, a question that has previously been addressed only for subject-verb agreement. |
335 | Benchmarking Approximate Inference Methods for Neural Structured Prediction | Lifu Tu, Kevin Gimpel, | In this paper, we compare these two families of inference methods on three sequence labeling datasets. |
336 | Evaluating and Enhancing the Robustness of Dialogue Systems: A Case Study on a Negotiation Agent | Minhao Cheng, Wei Wei, Cho-Jui Hsieh, | In this paper, we develop algorithms to evaluate the robustness of a dialogue agent by carefully designed attacks using adversarial agents. |
337 | Investigating Robustness and Interpretability of Link Prediction via Adversarial Modifications | Pouya Pezeshkpour, Yifan Tian, Sameer Singh, | In this paper, we propose adversarial modifications for link prediction models: identifying the fact to add into or remove from the knowledge graph that changes the prediction for a target fact after the model is retrained. |
338 | Analysis Methods in Neural Language Processing: A Survey | Yonatan Belinkov, James Glass, | In this survey paper, we review analysis methods in neural language processing, categorize them according to prominent research trends, highlight existing limitations, and point to potential directions for future work. |
339 | Transferable Neural Projection Representations | Chinnadhurai Sankar, Sujith Ravi, Zornitsa Kozareva, | In this paper, we propose a skip-gram based architecture coupled with Locality-Sensitive Hashing (LSH) projections to learn efficient dynamically computable representations. |
340 | Semantic Role Labeling with Associated Memory Network | Chaoyu Guan, Yuhao Cheng, Hai Zhao, | This paper proposes a novel syntax-agnostic SRL model enhanced by the proposed associated memory network (AMN), which makes use of inter-sentence attention of label-known associated sentences as a kind of memory to further enhance dependency-based SRL. |
341 | Better, Faster, Stronger Sequence Tagging Constituent Parsers | David Vilares, Mostafa Abdou, Anders Søgaard, | In this work, we address the following weaknesses of such constituent parsers: (a) high error rates around closing brackets of long constituents, (b) large label sets, leading to sparsity, and (c) error propagation arising from greedy decoding. |
342 | CAN-NER: Convolutional Attention Network for Chinese Named Entity Recognition | Yuying Zhu, Guoxin Wang, | In this paper, we investigate a Convolutional Attention Network called CAN for Chinese NER, which consists of a character-based convolutional neural network (CNN) with local-attention layer and a gated recurrent unit (GRU) with global self-attention layer to capture the information from adjacent characters and sentence contexts. |
343 | Decomposed Local Models for Coordinate Structure Parsing | Hiroki Teranishi, Hiroyuki Shindo, Yuji Matsumoto, | We propose a simple and accurate model for coordination boundary identification. |
344 | Multi-Task Learning for Japanese Predicate Argument Structure Analysis | Hikaru Omori, Mamoru Komachi, | To address this problem, we present a multi-task learning method for PASA and ENASA. |
345 | Domain adaptation for part-of-speech tagging of noisy user-generated text | Luisa März, Dietrich Trautmann, Benjamin Roth, | We propose an architecture that trains an out-of-domain model on a large newswire corpus, and transfers those weights by using them as a prior for a model trained on the target domain (a data-set of German Tweets) for which there is very little annotations available. |
346 | Neural Chinese Address Parsing | Hao Li, Wei Lu, Pengjun Xie, Linlin Li, | This paper introduces a new task – Chinese address parsing – the task of mapping Chinese addresses into semantically meaningful chunks. We create and publicly release a new dataset consisting of 15K Chinese addresses, and conduct extensive experiments on the dataset to investigate the model effectiveness and robustness. |
347 | Learning Hierarchical Discourse-level Structure for Fake News Detection | Hamid Karimi, Jiliang Tang, | To address these challenges, we propose Hierarchical Discourse-level Structure for Fake news detection. |
348 | DiscoFuse: A Large-Scale Dataset for Discourse-Based Sentence Fusion | Mor Geva, Eric Malmi, Idan Szpektor, Jonathan Berant, | In this paper, we propose a method for automatically-generating fusion examples from raw text and present DiscoFuse, a large scale dataset for discourse-based sentence fusion. |
349 | Linguistically-Informed Specificity and Semantic Plausibility for Dialogue Generation | Wei-Jen Ko, Greg Durrett, Junyi Jessy Li, | In this work, we examine whether specificity is solely a frequency-related notion and find that more linguistically-driven specificity measures are better suited to improving response informativeness. |
350 | Learning to Describe Unknown Phrases with Local and Global Contexts | Shonosuke Ishiwatari, Hiroaki Hayashi, Naoki Yoshinaga, Graham Neubig, Shoetsu Sato, Masashi Toyoda, Masaru Kitsuregawa, | To solve this task, we propose a neural description model that consists of two context encoders and a description decoder. |
351 | Mining Discourse Markers for Unsupervised Sentence Representation Learning | Damien Sileo, Tim Van de Cruys, Camille Pradel, Philippe Muller, | In the present work, we propose a method to automatically discover sentence pairs with relevant discourse markers, and apply it to massive amounts of data. |
352 | How Large a Vocabulary Does Text Classification Need? A Variational Approach to Vocabulary Selection | Wenhu Chen, Yu Su, Yilin Shen, Zhiyu Chen, Xifeng Yan, William Yang Wang, | In this paper, we provide a more sophisticated variational vocabulary dropout (VVD) based on variational dropout to perform vocabulary selection, which can intelligently select the subset of the vocabulary to achieve the required performance. |
353 | Subword-based Compact Reconstruction of Word Embeddings | Shota Sasaki, Jun Suzuki, Kentaro Inui, | In this paper, we propose a method of reconstructing pre-trained word embeddings using subword information that can effectively represent a large number of subword embeddings in a considerably small fixed space. |
354 | Bayesian Learning for Neural Dependency Parsing | Ehsan Shareghi, Yingzhen Li, Yi Zhu, Roi Reichart, Anna Korhonen, | We demonstrate that in the small data regime, where uncertainty around parameter estimation and model prediction matters the most, Bayesian neural modeling is very effective. |
355 | AutoSeM: Automatic Task Selection and Mixing in Multi-Task Learning | Han Guo, Ramakanth Pasunuru, Mohit Bansal, | To address these issues, we present AutoSeM, a two-stage MTL pipeline, where the first stage automatically selects the most useful auxiliary tasks via a Beta-Bernoulli multi-armed bandit with Thompson Sampling, and the second stage learns the training mixing ratio of these selected auxiliary tasks via a Gaussian Process based Bayesian optimization framework. |
356 | Studying the Inductive Biases of RNNs with Synthetic Variations of Natural Languages | Shauli Ravfogel, Yoav Goldberg, Tal Linzen, | We propose a paradigm that addresses these issues: we create synthetic versions of English, which differ from English in one or more typological parameters, and generate corpora for those languages based on a parsed English corpus. |
357 | Attention is not Explanation | Sarthak Jain, Byron C. Wallace, | In this work we perform extensive experiments across a variety of NLP tasks that aim to assess the degree to which attention weights provide meaningful “explanations” for predictions. |
358 | Playing Text-Adventure Games with Graph-Based Deep Reinforcement Learning | Prithviraj Ammanabrolu, Mark Riedl, | We present a deep reinforcement learning architecture that represents the game state as a knowledge graph which is learned during exploration. |
359 | Information Aggregation for Multi-Head Attention with Routing-by-Agreement | Jian Li, Baosong Yang, Zi-Yi Dou, Xing Wang, Michael R. Lyu, Zhaopeng Tu, | In this work, we propose to improve the information aggregation for multi-head attention with a more powerful routing-by-agreement algorithm. |
360 | Context Dependent Semantic Parsing over Temporally Structured Data | Charles Chen, Razvan Bunescu, | We describe a new semantic parsing setting that allows users to query the system using both natural language questions and actions within a graphical user interface. |
361 | Structural Scaffolds for Citation Intent Classification in Scientific Publications | Arman Cohan, Waleed Ammar, Madeleine van Zuylen, Field Cady, | We propose structural scaffolds, a multitask model to incorporate structural information of scientific papers into citations for effective classification of citation intents. In addition, we introduce a new dataset of citation intents (SciCite) which is more than five times larger and covers multiple scientific domains compared with existing datasets. |
362 | pair2vec: Compositional Word-Pair Embeddings for Cross-Sentence Inference | Mandar Joshi, Eunsol Choi, Omer Levy, Daniel Weld, Luke Zettlemoyer, | This paper proposes new methods for learning and using embeddings of word pairs that implicitly represent background knowledge about such relationships. |
363 | Submodular Optimization-based Diverse Paraphrasing and its Effectiveness in Data Augmentation | Ashutosh Kumar, Satwik Bhattamishra, Manik Bhandari, Partha Talukdar, | In this work, we focus on the task of obtaining highly diverse paraphrases while not compromising on paraphrasing quality. |
364 | Let’s Make Your Request More Persuasive: Modeling Persuasive Strategies via Semi-Supervised Neural Nets on Crowdfunding Platforms | Diyi Yang, Jiaao Chen, Zichao Yang, Dan Jurafsky, Eduard Hovy, | Building on theories of persuasion, we propose a neural network to quantify persuasiveness and identify the persuasive strategies in advocacy requests. |
365 | Recursive Routing Networks: Learning to Compose Modules for Language Understanding | Ignacio Cases, Clemens Rosenbaum, Matthew Riemer, Atticus Geiger, Tim Klinger, Alex Tamkin, Olivia Li, Sandhini Agarwal, Joshua D. Greene, Dan Jurafsky, Christopher Potts, Lauri Karttunen, | We introduce Recursive Routing Networks (RRNs), which are modular, adaptable models that learn effectively in diverse environments. To show that RRNs can learn to specialize to more fine-grained semantic distinctions, we introduce a new corpus of NLI examples involving implicative predicates, and show that the model components become fine-tuned to the inferential signatures that are characteristic of these predicates. |
366 | Structural Neural Encoders for AMR-to-text Generation | Marco Damonte, Shay B. Cohen, | We investigate the extent to which reentrancies (nodes with multiple parents) have an impact on AMR-to-text generation by comparing graph encoders to tree encoders, where reentrancies are not preserved. |
367 | Multilingual prediction of Alzheimer’s disease through domain adaptation and concept-based language modelling | Kathleen C. Fraser, Nicklas Linz, Bai Li, Kristina Lundholm Fors, Frank Rudzicz, Alexandra Konig, Jan Alexandersson, Philippe Robert, Dimitrios Kokkinakis, | Here, we compare several methods of domain adaptation to augment a small French dataset of picture descriptions (n = 57) with a much larger English dataset (n = 550), for the task of automatically distinguishing participants with dementia from controls. |
368 | Ranking and Selecting Multi-Hop Knowledge Paths to Better Predict Human Needs | Debjit Paul, Anette Frank, | We present a novel method to extract, rank, filter and select multi-hop relation paths from a commonsense knowledge resource to interpret the expression of sentiment in terms of their underlying human needs. |
369 | NLP Whack-A-Mole: Challenges in Cross-Domain Temporal Expression Extraction | Amy Olex, Luke Maffey, Bridget McInnes, | Here we explore parsing issues that arose when running our system, a tool built on Newswire text, on clinical notes in the THYME corpus. |
370 | Document-Level N-ary Relation Extraction with Multiscale Representation Learning | Robin Jia, Cliff Wong, Hoifung Poon, | In this paper, we propose a novel multiscale neural architecture for document-level n-ary relation extraction. |
371 | Inferring Which Medical Treatments Work from Reports of Clinical Trials | Eric Lehman, Jay DeYoung, Regina Barzilay, Byron C. Wallace, | In this paper, we present a new task and corpus for making this unstructured published scientific evidence actionable. We present a new corpus for this task comprising 10,000+ prompts coupled with full-text articles describing RCTs. |
372 | Decay-Function-Free Time-Aware Attention to Context and Speaker Indicator for Spoken Language Understanding | Jonggu Kim, Jong-Hyeok Lee, | To capture salient contextual information for spoken language understanding (SLU) of a dialogue, we propose time-aware models that automatically learn the latent time-decay function of the history without a manual time-decay function. |
373 | Dialogue Act Classification with Context-Aware Self-Attention | Vipul Raheja, Joel Tetreault, | This paper presents an approach to dialogue act classification based on context-aware self-attention. |
374 | Affect-Driven Dialog Generation | Pierre Colombo, Wojciech Witon, Ashutosh Modi, James Kennedy, Mubbasir Kapadia, | In this paper, we present an affect-driven dialog system, which generates emotional responses in a controlled manner using a continuous representation of emotions. |
375 | Multi-Level Memory for Task Oriented Dialogs | Revanth Gangi Reddy, Danish Contractor, Dinesh Raghu, Sachindra Joshi, | In this paper we relax the strong assumptions made by existing architectures and separate memories used for modeling dialog context and KB results. |
376 | Topic Spotting using Hierarchical Networks with Self Attention | Pooja Chitkara, Ashutosh Modi, Pravalika Avvaru, Sepehr Janghorbani, Mubbasir Kapadia, | We propose a hierarchical model with self attention for topic spotting. |
377 | Top-Down Structurally-Constrained Neural Response Generation with Lexicalized Probabilistic Context-Free Grammar | Wenchao Du, Alan W. Black, | We applied our model to the task of dialog response generation, and found that it significantly improves over a sequence-to-sequence baseline in terms of diversity and relevance. |
378 | What do Entity-Centric Models Learn? Insights from Entity Linking in Multi-Party Dialogue | Laura Aina, Carina Silberer, Ionut-Teodor Sorodoc, Matthijs Westera, Gemma Boleda, | In this paper we analyze the behavior of two recently proposed entity-centric models in a referential task, Entity Linking in Multi-party Dialogue (SemEval 2018 Task 4). |
379 | Continuous Learning for Large-scale Personalized Domain Classification | Han Li, Jihwan Lee, Sidharth Mudgal, Ruhi Sarikaya, Young-Bum Kim, | In this paper, we propose CoNDA, a neural-based approach for continuous domain adaption with normalization and regularization. |
380 | Cross-lingual Transfer Learning for Multilingual Task Oriented Dialog | Sebastian Schuster, Sonal Gupta, Rushin Shah, Mike Lewis, | In this paper, we present a new data set of 57k annotated utterances in English (43k), Spanish (8.6k) and Thai (5k) across the domains weather, alarm, and reminder. |
381 | Evaluating Coherence in Dialogue Systems using Entailment | Nouha Dziri, Ehsan Kamalloo, Kory Mathewson, Osmar Zaiane, | In this paper, we present interpretable metrics for evaluating topic coherence by making use of distributed sentence representations. |
382 | On Knowledge distillation from complex networks for response prediction | Siddhartha Arora, Mitesh M. Khapra, Harish G. Ramaswamy, | In order to overcome this, we use standard simple models which do not capture all pairwise interactions, but learn to emulate certain characteristics of a complex teacher network. |
383 | Cross-lingual Multi-Level Adversarial Transfer to Enhance Low-Resource Name Tagging | Lifu Huang, Heng Ji, Jonathan May, | We focus on improving name tagging for low-resource languages using annotations from related languages. |
384 | Unsupervised Extraction of Partial Translations for Neural Machine Translation | Benjamin Marie, Atsushi Fujita, | In this work, we assume that new translation knowledge can be extracted from monolingual data, without relying at all on existing parallel data. |
385 | Low-Resource Syntactic Transfer with Unsupervised Source Reordering | Mohammad Sadegh Rasooli, Michael Collins, | We describe a cross-lingual transfer method for dependency parsing that takes into account the problem of word order differences between source and target languages. |
386 | Revisiting Adversarial Autoencoder for Unsupervised Word Translation with Cycle Consistency and Improved Training | Tasnim Mohiuddin, Shafiq Joty, | In this work, we revisit adversarial autoencoder for unsupervised word translation and propose two novel extensions to it that yield more stable training and improved results. |
387 | Addressing word-order Divergence in Multilingual Neural Machine Translation for extremely Low Resource Languages | Rudra Murthy, Anoop Kunchukuttan, Pushpak Bhattacharyya, | To bridge this divergence, we propose to pre-order the assisting language sentences to match the word order of the source language and train the parent model. |
388 | Massively Multilingual Neural Machine Translation | Roee Aharoni, Melvin Johnson, Orhan Firat, | We perform extensive experiments in training massively multilingual NMT models, involving up to 103 distinct languages and 204 translation directions simultaneously. |
389 | A Large-Scale Comparison of Historical Text Normalization Systems | Marcel Bollmann, | This paper presents the largest study of historical text normalization done so far. |
390 | Combining Discourse Markers and Cross-lingual Embeddings for Synonym–Antonym Classification | Michael Roth, Shyam Upadhyay, | In this work, we improve the transfer by exploiting monolingual information, expressed in the form of co-occurrences with discourse markers that convey contrast. |
391 | Context-Aware Cross-Lingual Mapping | Hanan Aldarmaki, Mona Diab, | In this paper, we propose an alternative to word-level mapping that better reflects sentence-level cross-lingual similarity. |
392 | Polyglot Contextual Representations Improve Crosslingual Transfer | Phoebe Mulcaire, Jungo Kasai, Noah A. Smith, | We introduce Rosita, a method to produce multilingual contextual word representations by training a single language model on text from multiple languages. |
393 | Typological Features for Multilingual Delexicalised Dependency Parsing | Manon Scholivet, Franck Dary, Alexis Nasr, Benoit Favre, Carlos Ramisch, | Our work investigates the use of high-level language descriptions in the form of typological features for multilingual dependency parsing. |
394 | Recommendations for Datasets for Source Code Summarization | Alexander LeClair, Collin McMillan, | In this paper, we make recommendations for these standards from experimental results. We release a dataset based on prior work of over 2.1m pairs of Java methods and one sentence method descriptions from over 28k Java projects. |
395 | Question Answering as an Automatic Evaluation Metric for News Article Summarization | Matan Eyal, Tal Baumel, Michael Elhadad, | We present an alternative, extrinsic, evaluation metric for this task, Answering Performance for Evaluation of Summaries. |
396 | Understanding the Behaviour of Neural Abstractive Summarizers using Contrastive Examples | Krtin Kumar, Jackie Chi Kit Cheung, | We investigate how they achieve this performance with respect to human-written gold-standard abstracts, and whether the systems are able to understand deeper syntactic and semantic structures. We generate a set of contrastive summaries which are perturbed, deficient versions of human-written summaries, and test whether existing neural summarizers score them more highly than the human-written summaries. |
397 | Jointly Extracting and Compressing Documents with Summary State Representations | Afonso Mendes, Shashi Narayan, Sebastião Miranda, Zita Marinho, André F. T. Martins, Shay B. Cohen, | We present a new neural model for text summarization that first extracts sentences from a document and then compresses them. |
398 | News Article Teaser Tweets and How to Generate Them | Sanjeev Kumar Karn, Mark Buckley, Ulli Waltinger, Hinrich Schütze, | In this work, we define the task of teaser generation and provide an evaluation benchmark and baseline systems for the process of generating teasers. |
399 | Cross-referencing Using Fine-grained Topic Modeling | Jeffrey Lund, Piper Armstrong, Wilson Fearn, Stephen Cowley, Emily Hales, Kevin Seppi, | We develop a topic-based system for automatically producing candidate cross-references which can be easily verified by human annotators. |
400 | Conversation Initiation by Diverse News Contents Introduction | Satoshi Akasaki, Nobuhiro Kaji, | In this paper, we consider the system as a conversation initiator and propose a novel task of generating the initial utterance in open-domain non-task-oriented conversation. To address the lack of training data for this task, we constructed a novel large-scale dataset through crowd-sourcing. |
401 | Positional Encoding to Control Output Sequence Length | Sho Takase, Naoaki Okazaki, | In this paper, we propose a simple but effective extension of a sinusoidal positional encoding (Vaswani et al., 2017) so that a neural encoder-decoder model preserves the length constraint. |
402 | The Lower The Simpler: Simplifying Hierarchical Recurrent Models | Chao Wang, Hui Jiang, | To improve the training efficiency of hierarchical recurrent models without compromising their performance, we propose a strategy named “the lower the simpler”, which is to simplify the baseline models by making the lower layers simpler than the upper layers. |
403 | Using Natural Language Relations between Answer Choices for Machine Comprehension | Rajkumar Pujari, Dan Goldwasser, | In this paper, we propose a method to leverage the natural language relations between the answer choices, such as entailment and contradiction, to improve the performance of machine comprehension. |
404 | Saliency Learning: Teaching the Model Where to Pay Attention | Reza Ghaeini, Xiaoli Fern, Hamed Shahbazi, Prasad Tadepalli, | In this paper, we aim to teach the model to make the right prediction for the right reason by providing explanation training and ensuring the alignment of the model’s explanation with the ground truth explanation. |
405 | Understanding Dataset Design Choices for Multi-hop Reasoning | Jifan Chen, Greg Durrett, | In this paper, we investigate two recently proposed datasets, WikiHop and HotpotQA. |
406 | Neural Grammatical Error Correction with Finite State Transducers | Felix Stahlberg, Christopher Bryant, Bill Byrne, | We show how to improve LM-GEC by applying modelling techniques based on finite state transducers. |
407 | Convolutional Self-Attention Networks | Baosong Yang, Longyue Wang, Derek F. Wong, Lidia S. Chao, Zhaopeng Tu, | In this work, we propose novel convolutional self-attention networks, which offer SANs the abilities to 1) strengthen dependencies among neighboring elements, and 2) model the interaction between features extracted by multiple attention heads. |
408 | Rethinking Complex Neural Network Architectures for Document Classification | Ashutosh Adhikari, Achyudh Ram, Raphael Tang, Jimmy Lin, | Our work provides an open-source platform and the foundation for future work in document classification. |
409 | Pre-trained language model representations for language generation | Sergey Edunov, Alexei Baevski, Michael Auli, | In this paper, we examine different strategies to integrate pre-trained representations into sequence to sequence models and apply it to neural machine translation and abstractive summarization. |
410 | Pragmatically Informative Text Generation | Sheng Shen, Daniel Fried, Jacob Andreas, Dan Klein, | We consider two pragmatic modeling methods for text generation: one where pragmatics is imposed by information preservation, and another where pragmatics is imposed by explicit modeling of distractors. |
411 | Stochastic Wasserstein Autoencoder for Probabilistic Sentence Generation | Hareesh Bahuleyan, Lili Mou, Hao Zhou, Olga Vechtomova, | In this paper, we propose to use the Wasserstein autoencoder (WAE) for probabilistic sentence generation, where the encoder could be either stochastic or deterministic. |
412 | Benchmarking Hierarchical Script Knowledge | Yonatan Bisk, Jan Buys, Karl Pichotta, Yejin Choi, | In this paper, we introduce KidsCook, a parallel script corpus, as well as a cloze task which matches video captions with missing procedural details. |
413 | A large-scale study of the effects of word frequency and predictability in naturalistic reading | Cory Shain, | This paper examines the generalizability of this finding to more realistic conditions of sentence processing by studying effects of frequency and predictability in three large-scale naturalistic reading corpora. |
414 | Augmenting word2vec with latent Dirichlet allocation within a clinical application | Akshay Budhkar, Frank Rudzicz, | This paper presents three hybrid models that directly combine latent Dirichlet allocation and word embedding for distinguishing between speakers with and without Alzheimer’s disease from transcripts of picture descriptions. |
415 | On the Idiosyncrasies of the Mandarin Chinese Classifier System | Shijia Liu, Hongyuan Mei, Adina Williams, Ryan Cotterell, | In this paper, we introduce an information-theoretic approach to measuring idiosyncrasy; we examine how much the uncertainty in Mandarin Chinese classifiers can be reduced by knowing semantic information about the nouns that the classifiers modify. |
416 | Joint Learning of Pre-Trained and Random Units for Domain Adaptation in Part-of-Speech Tagging | Sara Meftah, Youssef Tamaazousti, Nasredine Semmar, Hassane Essafi, Fatiha Sadat, | In this paper, we propose to augment the target-network with normalised, weighted and randomly initialised units that beget a better adaptation while maintaining the valuable source knowledge. |
417 | Show Some Love to Your n-grams: A Bit of Progress and Stronger n-gram Language Modeling Baselines | Ehsan Shareghi, Daniela Gerz, Ivan Vulić, Anna Korhonen, | In this paper, we examine the recent progress in n-gram literature, running experiments on 50 languages covering all morphological language families. |
418 | Training Data Augmentation for Context-Sensitive Neural Lemmatizer Using Inflection Tables and Raw Text | Toms Bergmanis, Sharon Goldwater, | To combine the efficiency of type-based learning with the benefits of context, we propose a way to train a context-sensitive lemmatizer with little or no labeled corpus data, using inflection tables from the UniMorph project and raw text examples from Wikipedia that provide sentence contexts for the unambiguous UniMorph examples. |
419 | A Structural Probe for Finding Syntax in Word Representations | John Hewitt, Christopher D. Manning, | In this work, we propose a structural probe, which evaluates whether syntax trees are embedded in a linear transformation of a neural network’s word representation space. |
420 | CNM: An Interpretable Complex-valued Network for Matching | Qiuchi Li, Benyou Wang, Massimo Melucci, | This paper seeks to model human language by the mathematical framework of quantum physics. |
421 | CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge | Alon Talmor, Jonathan Herzig, Nicholas Lourie, Jonathan Berant, | To investigate question answering with prior knowledge, we present CommonsenseQA: a challenging new dataset for commonsense question answering. |
422 | Probing the Need for Visual Context in Multimodal Machine Translation | Ozan Caglayan, Pranava Madhyastha, Lucia Specia, Loïc Barrault, | In this paper we probe the contribution of the visual modality to state-of-the-art MMT models by conducting a systematic analysis where we partially deprive the models from source-side textual context. |
423 | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, | We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. |
424 | What’s in a Name? Reducing Bias in Bios without Access to Protected Attributes | Alexey Romanov, Maria De-Arteaga, Hanna Wallach, Jennifer Chayes, Christian Borgs, Alexandra Chouldechova, Sahin Geyik, Krishnaram Kenthapadi, Anna Rumshisky, Adam Kalai, | In the context of mitigating bias in occupation classification, we propose a method for discouraging correlation between the predicted probability of an individual’s true occupation and a word embedding of their name. |
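For readers curious how a length-aware positional encoding (paper #401, Takase & Okazaki) can control output length in practice, here is a minimal sketch: the standard sinusoidal encoding (Vaswani et al., 2017) is computed over the number of remaining tokens rather than the absolute position, so the decoder always sees how far it is from the length limit. Function names are illustrative, not taken from the paper's code.

```python
import math

def sinusoidal_pe(pos, d_model):
    """Standard sinusoidal positional encoding (Vaswani et al., 2017)."""
    return [
        math.sin(pos / 10000 ** (i / d_model)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / d_model))
        for i in range(d_model)
    ]

def length_difference_pe(pos, target_len, d_model):
    """Length-difference variant: encode the remaining length
    (target_len - pos) instead of the absolute position, so the
    model is always aware of how many tokens it may still emit."""
    return sinusoidal_pe(target_len - pos, d_model)
```

Because only the input to the sine/cosine functions changes, this drop-in substitution requires no architectural modification to a standard encoder-decoder model.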
TABLE 2: NAACL 2019 Industry Track Papers
Title | Authors | Highlight | |
---|---|---|---|
1 | Enabling Real-time Neural IME with Incremental Vocabulary Selection | Jiali Yao, Raphael Shu, Xinjian Li, Katsutoshi Ohtsuki, Hideki Nakayama, | In this work, we articulate the bottleneck of neural IME decoding to be the heavy softmax computation over a large vocabulary. |
2 | Locale-agnostic Universal Domain Classification Model in Spoken Language Understanding | Jihwan Lee, Ruhi Sarikaya, Young-Bum Kim, | In this paper, we introduce an approach for leveraging available data across multiple locales sharing the same language to 1) improve domain classification model accuracy in Spoken Language Understanding and user experience even if new locales do not have sufficient data and 2) reduce the cost of scaling the domain classifier to a large number of locales. |
3 | Practical Semantic Parsing for Spoken Language Understanding | Marco Damonte, Rahul Goel, Tagyoung Chung, | We build a transfer learning framework for executable semantic parsing. |
4 | Fast Prototyping a Dialogue Comprehension System for Nurse-Patient Conversations on Symptom Monitoring | Zhengyuan Liu, Hazel Lim, Nur Farah Ain Suhaimi, Shao Chuen Tong, Sharon Ong, Angela Ng, Sheldon Lee, Michael R. Macdonald, Savitha Ramasamy, Pavitra Krishnaswamy, Wai Leng Chow, Nancy F. Chen, | In this work, we investigate fast prototyping of a dialogue comprehension system by leveraging minimal nurse-to-patient conversations. |
5 | Graph Convolution for Multimodal Information Extraction from Visually Rich Documents | Xiaojing Liu, Feiyu Gao, Qiong Zhang, Huasha Zhao, | In this paper, we introduce a graph convolution based model to combine textual and visual information presented in VRDs. |
6 | Diversifying Reply Suggestions Using a Matching-Conditional Variational Autoencoder | Budhaditya Deb, Peter Bailey, Milad Shokouhi, | We propose a constrained-sampling approach to make the variational inference in M-CVAE efficient for our production system. |
7 | Goal-Oriented End-to-End Conversational Models with Profile Features in a Real-World Setting | Yichao Lu, Manisha Srivastava, Jared Kramer, Heba Elfardy, Andrea Kahn, Song Wang, Vikas Bhardwaj, | We present real-world results for two issue types in the customer service domain. |
8 | Detecting Customer Complaint Escalation with Recurrent Neural Networks and Manually-Engineered Features | Wei Yang, Luchen Tan, Chunwei Lu, Anqi Cui, Han Li, Xi Chen, Kun Xiong, Muzi Wang, Ming Li, Jian Pei, Jimmy Lin, | We describe a hybrid model that tackles this challenge by integrating recurrent neural networks with manually-engineered features. |
9 | Multi-Modal Generative Adversarial Network for Short Product Title Generation in Mobile E-Commerce | Jianguo Zhang, Pengcheng Zou, Zhao Li, Yao Wan, Xiuming Pan, Yu Gong, Philip S. Yu, | In this paper, we propose a Multi-Modal Generative Adversarial Network (MM-GAN) for short product title generation in E-Commerce, which innovatively incorporates image information and attribute tags from the product, as well as textual information from original long titles. |
10 | A Case Study on Neural Headline Generation for Editing Support | Kazuma Murao, Ken Kobayashi, Hayato Kobayashi, Taichi Yatsuka, Takeshi Masuyama, Tatsuru Higurashi, Yoshimune Tabuchi, | In this paper, we describe a practical use case of neural headline generation in a news aggregator, where dozens of professional editors constantly select important news articles and manually create their headlines, which are much shorter than the original headlines. |
11 | Neural Lexicons for Slot Tagging in Spoken Language Understanding | Kyle Williams, | We develop models that encode lexicon information as neural features for use in a long short-term memory (LSTM) neural network. |
12 | Active Learning for New Domains in Natural Language Understanding | Stanislav Peshterliev, John Kearney, Abhyuday Jagannatha, Imre Kiss, Spyros Matsoukas, | We propose an algorithm called Majority-CRF that uses an ensemble of classification models to guide the selection of relevant utterances, as well as a sequence labeling model to help prioritize informative examples. |
13 | Scaling Multi-Domain Dialogue State Tracking via Query Reformulation | Pushpendre Rastogi, Arpit Gupta, Tongfei Chen, Mathias Lambert, | We present a novel approach to dialogue state tracking and referring expression resolution tasks. |
14 | Are the Tools up to the Task? an Evaluation of Commercial Dialog Tools in Developing Conversational Enterprise-grade Dialog Systems | Marie Meteer, Meghan Hickey, Carmi Rothberg, David Nahamoo, Ellen Eide Kislal, | In this paper, we provide both quantitative and qualitative results in three main areas: natural language understanding, dialog, and text generation. |
15 | Development and Deployment of a Large-Scale Dialog-based Intelligent Tutoring System | Shazia Afzal, Tejas Dhamecha, Nirmal Mukhi, Renuka Sindhgatta, Smit Marvaniya, Matthew Ventura, Jessica Yarbro, | In this paper, we describe and reflect on the design, methods, decisions and assessments that led to the successful deployment of our AI driven DBT currently being used by several hundreds of college level students for practice and self-regulated study in diverse subjects like Sociology, Communications, and American Government. |
16 | Learning When Not to Answer: a Ternary Reward Structure for Reinforcement Learning Based Question Answering | Fréderic Godin, Anjishnu Kumar, Arpit Mittal, | In this paper, we investigate the challenges of using reinforcement learning agents for question-answering over knowledge graphs for real-world applications. |
17 | Extraction of Message Sequence Charts from Software Use-Case Descriptions | Girish Palshikar, Nitin Ramrakhiyani, Sangameshwar Patil, Sachin Pawar, Swapnil Hingmire, Vasudeva Varma, Pushpak Bhattacharyya, | In this paper, we describe a linguistic knowledge-based approach to extract MSCs from use-cases. |
18 | Improving Knowledge Base Construction from Robust Infobox Extraction | Boya Peng, Yejin Huh, Xiao Ling, Michele Banko, | This paper presents a robust approach that tackles all three challenges. |
19 | A k-Nearest Neighbor Approach towards Multi-level Sequence Labeling | Yue Chen, John Chen, | In this paper we present a new method for intent recognition for complex dialog management in low resource situations. |
20 | Train One Get One Free: Partially Supervised Neural Network for Bug Report Duplicate Detection and Clustering | Lahari Poddar, Leonardo Neves, William Brendel, Luis Marujo, Sergey Tulyakov, Pradeep Karuturi, | This paper proposes a neural architecture that can jointly (1) detect if two bug reports are duplicates, and (2) aggregate them into latent topics. |
21 | Robust Semantic Parsing with Adversarial Learning for Domain Generalization | Gabriel Marzinotto, Geraldine Damnati, Frederic Bechet, Benoit Favre, | We propose to perform Semantic Parsing with a domain classification adversarial task, covering various use-cases with or without explicit knowledge of the domain. |
22 | TOI-CNN: a Solution of Information Extraction on Chinese Insurance Policy | Lin Sun, Kai Zhang, Fule Ji, Zhenhua Yang, | This paper introduces the problem of Element Tagging on Insurance Policy (ETIP). We have collected a large Chinese insurance contract dataset and labeled the critical elements of seven categories to test the performance of the proposed method. |
23 | Cross-lingual Transfer Learning for Japanese Named Entity Recognition | Andrew Johnson, Penny Karanasou, Judith Gaspers, Dietrich Klakow, | This work explores cross-lingual transfer learning (TL) for named entity recognition, focusing on bootstrapping Japanese from English. |
24 | Neural Text Normalization with Subword Units | Courtney Mansfield, Ming Sun, Yuzong Liu, Ankur Gandhe, Bjorn Hoffmeister, | In this paper, we frame TN as a machine translation task and tackle it with sequence-to-sequence (seq2seq) models. |
25 | Audio De-identification – a New Entity Recognition Task | Ido Cohn, Itay Laish, Genady Beryozkin, Gang Li, Izhak Shafran, Idan Szpektor, Tzvika Hartman, Avinatan Hassidim, Yossi Matias, | To this end, we define the task of audio de-ID, in which audio spans with entity mentions should be detected. Finally, we introduce a novel metric for audio de-ID and a new evaluation benchmark consisting of a large labeled segment of the Switchboard and Fisher audio datasets and detail our pipeline’s results on it. |
26 | In Other News: a Bi-style Text-to-speech Model for Synthesizing Newscaster Voice with Limited Data | Nishant Prateek, Mateusz Łajszczak, Roberto Barra-Chicote, Thomas Drugman, Jaime Lorenzo-Trueba, Thomas Merritt, Srikanth Ronanki, Trevor Wood, | In this paper, different styles of speech are analysed based on prosodic variations; from this, a model is proposed to synthesise speech in the style of a newscaster with just a few hours of supplementary data. We pose the problem of synthesising in a target style using limited data as that of creating a bi-style model that can synthesise both neutral-style and newscaster-style speech via a one-hot vector which factorises the two styles. |
27 | Generate, Filter, and Rank: Grammaticality Classification for Production-Ready NLG Systems | Ashwini Challa, Kartikeya Upasani, Anusha Balakrishnan, Rajen Subba, | We propose the use of a generate, filter, and rank framework, in which candidate responses are first filtered to eliminate unacceptable responses, and then ranked to select the best response. We release a grammatical classification and semantic correctness classification dataset for the weather domain that consists of responses generated by 3 data-driven NLG systems. |
28 | Content-based Dwell Time Engagement Prediction Model for News Articles | Heidar Davoudi, Aijun An, Gordon Edall, | In this paper, we propose a novel content-based approach based on a deep neural network architecture for predicting article dwell times. |