Paper Digest: EMNLP 2018 Highlights
The Conference on Empirical Methods in Natural Language Processing (EMNLP) is one of the top natural language processing conferences in the world. In 2018, it was held in Brussels, Belgium. There were 1,376 long paper submissions, of which 351 were accepted, and 855 short paper submissions, of which 198 were accepted.
To help the AI community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights/summaries to quickly get the main idea of each paper.
We thank all authors for writing these interesting papers, and readers for reading our digests. If you do not want to miss any interesting AI paper, you are welcome to sign up for our free Paper Digest service to get new paper updates, customized to your own interests, on a daily basis.
Paper Digest Team
team@paperdigest.org
TABLE 1: EMNLP 2018 Papers
No. | Title | Authors | Highlight
---|---|---|---
1 | Privacy-preserving Neural Representations of Text | Maximin Coavoux, Shashi Narayan, Shay B. Cohen | This article deals with adversarial attacks towards deep learning systems for Natural Language Processing (NLP), in the context of privacy protection. |
2 | Adversarial Removal of Demographic Attributes from Text Data | Yanai Elazar, Yoav Goldberg | We explore several techniques to improve the effectiveness of the adversarial component. |
3 | DeClarE: Debunking Fake News and False Claims using Evidence-Aware Deep Learning | Kashyap Popat, Subhabrata Mukherjee, Andrew Yates, Gerhard Weikum | Research on automated fact-checking has proposed methods based on supervised learning, but these approaches do not consider external evidence apart from labeled training instances. |
4 | It’s going to be okay: Measuring Access to Support in Online Communities | Zijian Wang, David Jurgens | We apply these methods to create a new massive corpus of 102M online interactions with gender-labeled users, each rated by degree of supportiveness. |
5 | Detecting Gang-Involved Escalation on Social Media Using Context | Serina Chang, Ruiqi Zhong, Ethan Adams, Fei-Tzin Lee, Siddharth Varia, Desmond Patton, William Frey, Chris Kedzie, Kathy McKeown | In this paper, we present a novel system for detecting Aggression and Loss in social media. |
6 | Reasoning about Actions and State Changes by Injecting Commonsense Knowledge | Niket Tandon, Bhavana Dalvi, Joel Grus, Wen-tau Yih, Antoine Bosselut, Peter Clark | In this paper, we show how the predicted effects of actions in the context of a paragraph can be improved in two ways: (1) by incorporating global, commonsense constraints (e.g., a non-existent entity cannot be destroyed), and (2) by biasing reading with preferences from large-scale corpora (e.g., trees rarely move). |
7 | Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation | Adam Poliak, Aparajita Haldar, Rachel Rudinger, J. Edward Hu, Ellie Pavlick, Aaron Steven White, Benjamin Van Durme | We present a large-scale collection of diverse natural language inference (NLI) datasets that help provide insight into how well a sentence representation captures distinct types of reasoning. |
8 | Textual Analogy Parsing: What’s Shared and What’s Compared among Analogous Facts | Matthew Lamm, Arun Chaganty, Christopher D. Manning, Dan Jurafsky, Percy Liang | In this paper, we propose the task of Textual Analogy Parsing (TAP) to model this higher-order meaning. We present a new dataset for TAP, baselines, and a model that successfully uses an ILP to enforce the structural constraints of the problem. |
9 | SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference | Rowan Zellers, Yonatan Bisk, Roy Schwartz, Yejin Choi | In this paper, we introduce the task of grounded commonsense inference, unifying natural language inference and commonsense reasoning. |
10 | TwoWingOS: A Two-Wing Optimization Strategy for Evidential Claim Verification | Wenpeng Yin, Dan Roth | We propose to consider these two aspects jointly. |
11 | Associative Multichannel Autoencoder for Multimodal Word Representation | Shaonan Wang, Jiajun Zhang, Chengqing Zong | In this paper we address the problem of learning multimodal word representations by integrating textual, visual and auditory inputs. |
12 | Game-Based Video-Context Dialogue | Ramakanth Pasunuru, Mohit Bansal | To move closer towards such multimodal conversational skills and visually-situated applications, we introduce a new video-context, many-speaker dialogue dataset based on live-broadcast soccer game videos and chats from Twitch.tv. |
13 | simNet: Stepwise Image-Topic Merging Network for Generating Detailed and Comprehensive Image Captions | Fenglin Liu, Xuancheng Ren, Yuanxin Liu, Houfeng Wang, Xu Sun | In this paper, we propose the Stepwise Image-Topic Merging Network (simNet) that makes use of the two kinds of attention at the same time. |
14 | Multimodal Language Analysis with Recurrent Multistage Fusion | Paul Pu Liang, Ziyin Liu, AmirAli Bagher Zadeh, Louis-Philippe Morency | In this paper, we propose the Recurrent Multistage Fusion Network (RMFN) which decomposes the fusion problem into multiple stages, each of them focused on a subset of multimodal signals for specialized, effective fusion. |
15 | Temporally Grounding Natural Sentence in Video | Jingyuan Chen, Xinpeng Chen, Lin Ma, Zequn Jie, Tat-Seng Chua | We introduce an effective and efficient method that grounds (i.e., localizes) natural sentences in long, untrimmed video sequences. |
16 | PreCo: A Large-scale Dataset in Preschool Vocabulary for Coreference Resolution | Hong Chen, Zhenhua Fan, Hao Lu, Alan Yuille, Shu Rong | We introduce PreCo, a large-scale English dataset for coreference resolution. To strengthen the training-test overlap, we collect a large corpus of 38K documents and 12.5M words which are mostly from the vocabulary of English-speaking preschoolers. |
17 | Adversarial Transfer Learning for Chinese Named Entity Recognition with Self-Attention Mechanism | Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao, Shengping Liu | In this paper, we propose a novel adversarial transfer learning framework to make full use of task-shared boundary information and to prevent interference from the task-specific features of Chinese word segmentation (CWS). |
18 | Using Linguistic Features to Improve the Generalization Capability of Neural Coreference Resolvers | Nafise Sadat Moosavi, Michael Strube | In this paper, we investigate the role of linguistic features in building more generalizable coreference resolvers. |
19 | Neural Segmental Hypergraphs for Overlapping Mention Recognition | Bailin Wang, Wei Lu | In this work, we propose a novel segmental hypergraph representation to model overlapping entity mentions that are prevalent in many practical datasets. |
20 | Variational Sequential Labelers for Semi-Supervised Learning | Mingda Chen, Qingming Tang, Karen Livescu, Kevin Gimpel | We introduce a family of multitask variational methods for semi-supervised sequence labeling. |
21 | Joint Representation Learning of Cross-lingual Words and Entities via Attentive Distant Supervision | Yixin Cao, Lei Hou, Juanzi Li, Zhiyuan Liu, Chengjiang Li, Xu Chen, Tiansi Dong | In this paper, we propose a novel method for joint representation learning of cross-lingual words and entities. |
22 | Deep Pivot-Based Modeling for Cross-language Cross-domain Transfer with Minimal Guidance | Yftah Ziser, Roi Reichart | In this work we consider this problem, and propose a framework that builds on pivot-based learning, structure-aware Deep Neural Networks (particularly LSTMs and CNNs) and bilingual word embeddings, with the goal of training a model on labeled data from one (language, domain) pair so that it can be effectively applied to another (language, domain) pair. |
23 | Multi-lingual Common Semantic Space Construction via Cluster-consistent Word Embedding | Lifu Huang, Kyunghyun Cho, Boliang Zhang, Heng Ji, Kevin Knight | Beyond word alignment, we introduce multiple cluster-level alignments and enforce the word clusters to be consistently distributed across multiple languages. |
24 | Unsupervised Multilingual Word Embeddings | Xilun Chen, Claire Cardie | To address this shortcoming, we propose a fully unsupervised framework for learning MWEs that directly exploits the relations between all language pairs. |
25 | CLUSE: Cross-Lingual Unsupervised Sense Embeddings | Ta-Chung Chi, Yun-Nung Chen | This paper proposes a modularized sense induction and representation learning model that jointly learns bilingual sense embeddings that align well in the vector space, where the cross-lingual signal in the English-Chinese parallel corpus is exploited to capture the collocation and distributed characteristics in the language pair. |
26 | Adversarial Propagation and Zero-Shot Cross-Lingual Transfer of Word Vector Specialization | Edoardo Maria Ponti, Ivan Vulić, Goran Glavaš, Nikola Mrkšić, Anna Korhonen | We propose a novel approach to specializing the full distributional vocabulary. |
27 | Improving Cross-Lingual Word Embeddings by Meeting in the Middle | Yerai Doval, Jose Camacho-Collados, Luis Espinosa-Anke, Steven Schockaert | In this work, we propose to apply an additional transformation after the initial alignment step, which moves cross-lingual synonyms towards a middle point between them. |
28 | WikiAtomicEdits: A Multilingual Corpus of Wikipedia Edits for Modeling Language and Discourse | Manaal Faruqui, Ellie Pavlick, Ian Tenney, Dipanjan Das | We release a corpus of 43 million atomic edits across 8 languages, and use the collected data to show that the language generated during editing differs from the language that we observe in standard corpora, and that models trained on edits encode different aspects of semantics and discourse than models trained on raw text. |
29 | On the Relation between Linguistic Typology and (Limitations of) Multilingual Language Modeling | Daniela Gerz, Ivan Vulić, Edoardo Maria Ponti, Roi Reichart, Anna Korhonen | In this work, we analyse the implications of this variation on the language modeling (LM) task. |
30 | A Fast, Compact, Accurate Model for Language Identification of Codemixed Text | Yuan Zhang, Jason Riesa, Daniel Gillick, Anton Bakalov, Jason Baldridge, David Weiss | We address fine-grained multilingual language identification: providing a language code for every token in a sentence, including codemixed text containing multiple languages. |
31 | Personalized Microblog Sentiment Classification via Adversarial Cross-lingual Multi-task Learning | Weichao Wang, Shi Feng, Wei Gao, Daling Wang, Yifei Zhang | Based on this observation, in this paper we propose a novel user-attention-based Convolutional Neural Network (CNN) model with adversarial cross-lingual learning framework. |
32 | Cross-lingual Knowledge Graph Alignment via Graph Convolutional Networks | Zhichun Wang, Qingsong Lv, Xiaohan Lan, Yu Zhang | In this paper, we propose a novel approach for cross-lingual KG alignment via graph convolutional networks (GCNs). |
33 | Cross-lingual Lexical Sememe Prediction | Fanchao Qi, Yankai Lin, Maosong Sun, Hao Zhu, Ruobing Xie, Zhiyuan Liu | We propose a novel framework to model correlations between sememes and multi-lingual words in low-dimensional semantic space for sememe prediction. |
34 | Neural Cross-Lingual Named Entity Recognition with Minimal Resources | Jiateng Xie, Zhilin Yang, Graham Neubig, Noah A. Smith, Jaime Carbonell | To improve mapping of lexical items across languages, we propose a method that finds translations based on bilingual word embeddings. |
35 | A Stable and Effective Learning Strategy for Trainable Greedy Decoding | Yun Chen, Victor O.K. Li, Kyunghyun Cho, Samuel Bowman | In this paper, we propose a flexible new method that allows us to reap nearly the full benefits of beam search with nearly no additional computational cost. |
36 | Addressing Troublesome Words in Neural Machine Translation | Yang Zhao, Jiajun Zhang, Zhongjun He, Chengqing Zong, Hua Wu | To address this problem, we propose a novel memory-enhanced NMT method. |
37 | Top-down Tree Structured Decoding with Syntactic Connections for Neural Machine Translation and Parsing | Jetic Gū, Hassan S. Shavarani, Anoop Sarkar | We compare our NMT model with sequential and state-of-the-art syntax-based NMT models and show that our model produces more fluent translations with better reordering. |
38 | XL-NBT: A Cross-lingual Neural Belief Tracking Framework | Wenhu Chen, Jianshu Chen, Yu Su, Xin Wang, Dong Yu, Xifeng Yan, William Yang Wang | We specifically discuss two types of common parallel resources: bilingual corpus and bilingual dictionary, and design different transfer learning strategies accordingly. |
39 | Contextual Parameter Generation for Universal Neural Machine Translation | Emmanouil Antonios Platanios, Mrinmaya Sachan, Graham Neubig, Tom Mitchell | Our approach requires no changes to the model architecture of a standard NMT system, but instead introduces a new component, the contextual parameter generator (CPG), that generates the parameters of the system (e.g., weights in a neural network). |
40 | Back-Translation Sampling by Targeting Difficult Words in Neural Machine Translation | Marzieh Fadaee, Christof Monz | In this work, we explore different aspects of back-translation, and show that words with high prediction loss during training benefit most from the addition of synthetic data. |
41 | Multi-Domain Neural Machine Translation with Word-Level Domain Context Discrimination | Jiali Zeng, Jinsong Su, Huating Wen, Yang Liu, Jun Xie, Yongjing Yin, Jianqiang Zhao | Based on this intuition, in this paper we focus on distinguishing and exploiting word-level domain contexts for multi-domain NMT. |
42 | A Discriminative Latent-Variable Model for Bilingual Lexicon Induction | Sebastian Ruder, Ryan Cotterell, Yova Kementchedjhieva, Anders Søgaard | We introduce a novel discriminative latent-variable model for the task of bilingual lexicon induction. |
43 | Non-Adversarial Unsupervised Word Translation | Yedid Hoshen, Lior Wolf | In this paper, we make the observation that two sufficiently similar distributions can be aligned correctly with iterative matching methods. |
44 | Semi-Autoregressive Neural Machine Translation | Chunqi Wang, Ji Zhang, Haiqing Chen | In this paper, we propose a novel model for fast sequence generation – the semi-autoregressive Transformer (SAT). |
45 | Understanding Back-Translation at Scale | Sergey Edunov, Myle Ott, Michael Auli, David Grangier | This work broadens the understanding of back-translation and investigates a number of methods to generate synthetic source sentences. |
46 | Bootstrapping Transliteration with Constrained Discovery for Low-Resource Languages | Shyam Upadhyay, Jordan Kodner, Dan Roth | In this work, we present a bootstrapping algorithm that uses constrained discovery to improve generation, and can be used with as few as 500 training examples, which we show can be sourced from annotators in a matter of hours. |
47 | NORMA: Neighborhood Sensitive Maps for Multilingual Word Embeddings | Ndapa Nakashole | With the goal of capturing such differences, we propose a method for learning neighborhood sensitive maps, NORMA. |
48 | Adaptive Multi-pass Decoder for Neural Machine Translation | Xinwei Geng, Xiaocheng Feng, Bing Qin, Ting Liu | In this paper, we propose a novel architecture called adaptive multi-pass decoder, which introduces a flexible multi-pass polishing mechanism to extend the capacity of NMT via reinforcement learning. |
49 | Improving the Transformer Translation Model with Document-Level Context | Jiacheng Zhang, Huanbo Luan, Maosong Sun, Feifei Zhai, Jingfang Xu, Min Zhang, Yang Liu | In this work, we extend the Transformer model with a new context encoder to represent document-level context, which is then incorporated into the original encoder and decoder. |
50 | MTNT: A Testbed for Machine Translation of Noisy Text | Paul Michel, Graham Neubig | In this paper, we propose a benchmark dataset for Machine Translation of Noisy Text (MTNT), consisting of noisy comments on Reddit (www.reddit.com) and professionally sourced translations. |
51 | SimpleQuestions Nearly Solved: A New Upperbound and Baseline Approach | Michael Petrochuk, Luke Zettlemoyer | In this paper, we present new evidence that this benchmark can be nearly solved by standard methods. |
52 | Phrase-Indexed Question Answering: A New Challenge for Scalable Document Comprehension | Minjoon Seo, Tom Kwiatkowski, Ankur Parikh, Ali Farhadi, Hannaneh Hajishirzi | It additionally leads to a significant scalability advantage since the encoding of the answer candidate phrases in the document can be pre-computed and indexed offline for efficient retrieval. |
53 | Ranking Paragraphs for Improving Answer Recall in Open-Domain Question Answering | Jinhyuk Lee, Seongjun Yun, Hyunjae Kim, Miyoung Ko, Jaewoo Kang | In this paper, we introduce Paragraph Ranker which ranks paragraphs of retrieved documents for a higher answer recall with less noise. |
54 | Cut to the Chase: A Context Zoom-in Network for Reading Comprehension | Sathish Reddy Indurthi, Seunghak Yu, Seohyun Back, Heriberto Cuayáhuitl | We present a novel neural-based architecture that is capable of extracting relevant regions based on a given question-document pair and generating a well-formed answer. |
55 | Adaptive Document Retrieval for Deep Question Answering | Bernhard Kratzwald, Stefan Feuerriegel | As a remedy, we propose an adaptive document retrieval model. |
56 | Why is unsupervised alignment of English embeddings from different algorithms so hard? | Mareike Hartmann, Yova Kementchedjhieva, Anders Søgaard | This paper presents a challenge to the community: Generative adversarial networks (GANs) can perfectly align independent English word embeddings induced using the same algorithm, based on distributional information alone, but fail to do so for two different embedding algorithms. |
57 | Quantifying Context Overlap for Training Word Embeddings | Yimeng Zhuang, Jinghui Xie, Yinhe Zheng, Xuan Zhu | In this paper, a metric is designed to estimate second-order co-occurrence relations based on context overlap. |
58 | Neural Latent Relational Analysis to Capture Lexical Semantic Relations in a Vector Space | Koki Washio, Tsuneaki Kato | In this paper, we propose a novel model of this pattern-based approach, neural latent relational analysis (NLRA). |
59 | Generalizing Word Embeddings using Bag of Subwords | Jinman Zhao, Sidharth Mudgal, Yingyu Liang | We propose a subword-level word vector generation model that views words as bags of character $n$-grams. |
60 | Neural Metaphor Detection in Context | Ge Gao, Eunsol Choi, Yejin Choi, Luke Zettlemoyer | We present end-to-end neural models for detecting metaphorical word use in context. |
61 | Distant Supervision from Disparate Sources for Low-Resource Part-of-Speech Tagging | Barbara Plank, Željko Agić | The model exploits annotation projection, instance selection, tag dictionaries, morphological lexicons, and distributed representations, all in a uniform framework. |
62 | Unsupervised Bilingual Lexicon Induction via Latent Variable Models | Zi-Yi Dou, Zhi-Hao Zhou, Shujian Huang | With the recent advances in generative models, we propose a novel approach which builds cross-lingual dictionaries via latent variable models and adversarial training with no parallel corpora. |
63 | Learning Unsupervised Word Translations Without Adversaries | Tanmoy Mukherjee, Makoto Yamada, Timothy Hospedales | We present a statistical dependency-based approach to bilingual dictionary induction that is unsupervised – no seed dictionary or parallel corpora required; and introduces no adversary – therefore being much easier to train. |
64 | Adversarial Training for Multi-task and Multi-lingual Joint Modeling of Utterance Intent Classification | Ryo Masumura, Yusuke Shinohara, Ryuichiro Higashinaka, Yushi Aono | This paper proposes an adversarial training method for the multi-task and multi-lingual joint modeling needed for utterance intent classification. |
65 | Surprisingly Easy Hard-Attention for Sequence to Sequence Learning | Shiv Shankar, Siddhant Garg, Sunita Sarawagi | In this paper we show that a simple beam approximation of the joint distribution between attention and output is an easy, accurate, and efficient attention mechanism for sequence to sequence learning. |
66 | Joint Learning for Emotion Classification and Emotion Cause Detection | Ying Chen, Wenjun Hou, Xiyao Cheng, Shoushan Li | We present a neural network-based joint approach for emotion classification and emotion cause detection, which attempts to capture mutual benefits across the two sub-tasks of emotion analysis. |
67 | Exploring Optimism and Pessimism in Twitter Using Deep Learning | Cornelia Caragea, Liviu P. Dinu, Bogdan Dumitru | In this paper, we explore a range of deep learning models to predict optimism and pessimism in Twitter at both tweet and user level and show that these models substantially outperform traditional machine learning classifiers used in prior work. |
68 | Predicting News Headline Popularity with Syntactic and Semantic Knowledge Using Multi-Task Learning | Sotiris Lamprinidis, Daniel Hardt, Dirk Hovy | We model each of these factors in a multi-task GRU network to predict headline popularity. |
69 | Hybrid Neural Attention for Agreement/Disagreement Inference in Online Debates | Di Chen, Jiachen Du, Lidong Bing, Ruifeng Xu | To alleviate this problem, this paper proposes a hybrid neural attention model which combines self- and cross-attention mechanisms to locate the salient parts of the textual context and of the interaction between users. |
70 | Increasing In-Class Similarity by Retrofitting Embeddings with Demographic Information | Dirk Hovy, Tommaso Fornaciari | We use homophily cues to retrofit text-based author representations with non-linguistic information, and introduce a trade-off parameter. |
71 | A Syntactically Constrained Bidirectional-Asynchronous Approach for Emotional Conversation Generation | Jingyuan Li, Xiao Sun | In this paper, a syntactically constrained bidirectional-asynchronous approach for emotional conversation generation (E-SCBA) is proposed to address this issue. |
72 | Auto-Dialabel: Labeling Dialogue Data with Unsupervised Learning | Chen Shi, Qi Chen, Lei Sha, Sujian Li, Xu Sun, Houfeng Wang, Lintao Zhang | In this paper, we instead propose our framework auto-dialabel to automatically cluster the dialogue intents and slots. In this framework, we collect a set of context features, leverage an autoencoder for feature assembly, and adapt a dynamic hierarchical clustering method for intent and slot labeling. |
73 | Extending Neural Generative Conversational Model using External Knowledge Sources | Prasanna Parthasarathi, Joelle Pineau | This work proposes an architecture to incorporate unstructured knowledge sources to enhance the next utterance prediction in chit-chat type of generative dialogue models. |
74 | Modeling Temporality of Human Intentions by Domain Adaptation | Xiaolei Huang, Lixing Liu, Kate Carey, Joshua Woolley, Stefan Scherer, Brian Borsari | This paper proposes a method that models the temporal factor by applying domain adaptation to clinical dialogue corpora from Motivational Interviewing (MI). |
75 | An Auto-Encoder Matching Model for Learning Utterance-Level Semantic Dependency in Dialogue Generation | Liangchen Luo, Jingjing Xu, Junyang Lin, Qi Zeng, Xu Sun | To address this problem, we propose an Auto-Encoder Matching (AEM) model to learn such dependency. |
76 | A Dataset for Document Grounded Conversations | Kangyan Zhou, Shrimai Prabhumoye, Alan W Black | This paper introduces a document grounded dataset for conversations. |
77 | Out-of-domain Detection based on Generative Adversarial Network | Seonghan Ryu, Sangjun Koo, Hwanjo Yu, Gary Geunbae Lee | The main goal of this paper is to develop out-of-domain (OOD) detection for dialog systems. |
78 | Listening Comprehension over Argumentative Content | Shachar Mirkin, Guy Moshkowich, Matan Orbach, Lili Kotlerman, Yoav Kantor, Tamar Lavee, Michal Jacovi, Yonatan Bilu, Ranit Aharonov, Noam Slonim | This paper presents a task for machine listening comprehension in the argumentation domain and a corresponding dataset in English. All data used in this work is freely available for research. |
79 | Using active learning to expand training data for implicit discourse relation recognition | Yang Xu, Yu Hong, Huibin Ruan, Jianmin Yao, Min Zhang, Guodong Zhou | In this paper, we follow Rutherford and Xue (2015) to expand the training data set using the corpus of explicitly-related arguments, by arbitrarily dropping the overtly presented discourse connectives. |
80 | Learning To Split and Rephrase From Wikipedia Edit History | Jan A. Botha, Manaal Faruqui, John Alex, Jason Baldridge, Dipanjan Das | Learning To Split and Rephrase From Wikipedia Edit History |
81 | BLEU is Not Suitable for the Evaluation of Text Simplification | Elior Sulem, Omri Abend, Ari Rappoport | In this paper we show that BLEU is not suitable for the evaluation of sentence splitting, the major structural simplification operation. |
82 | S2SPMN: A Simple and Effective Framework for Response Generation with Relevant Information | Jiaxin Pei, Chenliang Li | In this paper, we propose Sequence to Sequence with Prototype Memory Network (S2SPMN) to exploit the relevant information provided by the large dialogue corpus to enhance response generation. |
83 | Improving Reinforcement Learning Based Image Captioning with Natural Language Prior | Tszhang Guo, Shiyu Chang, Mo Yu, Kun Bai | Recently, Reinforcement Learning (RL) approaches have demonstrated advanced performance in image captioning by directly optimizing the metric used for testing. |
84 | Training for Diversity in Image Paragraph Captioning | Luke Melas-Kyriazi, Alexander Rush, George Han | In this work, we consider applying sequence-level training for this task. |
85 | A Graph-theoretic Summary Evaluation for ROUGE | Elaheh ShafieiBavani, Mohammad Ebrahimi, Raymond Wong, Fang Chen | We propose a graph-based approach adopted into ROUGE to evaluate summaries based on both lexical and semantic similarities. |
86 | Guided Neural Language Generation for Abstractive Summarization using Abstract Meaning Representation | Hardy Hardy, Andreas Vlachos | In this paper, we extend previous work on abstractive summarization using Abstract Meaning Representation (AMR) with a neural language generation stage which we guide using the source document. |
87 | Evaluating Multiple System Summary Lengths: A Case Study | Ori Shapira, David Gabay, Hadar Ronen, Judit Bar-Ilan, Yael Amsterdamer, Ani Nenkova, Ido Dagan | In this paper, we raise the research question of whether reference summaries of a single length can be used to reliably evaluate system summaries of multiple lengths. |
88 | Neural Latent Extractive Document Summarization | Xingxing Zhang, Mirella Lapata, Furu Wei, Ming Zhou | We propose a latent variable extractive model, where sentences are viewed as latent variables and sentences with activated variables are used to infer gold summaries. |
89 | On the Abstractiveness of Neural Document Summarization | Fangfang Zhang, Jin-ge Yao, Rui Yan | These findings suggest the possibility for future efforts towards more efficient systems that could better utilize the vocabulary in the original document. |
90 | Automatic Essay Scoring Incorporating Rating Schema via Reinforcement Learning | Yucheng Wang, Zhongyu Wei, Yaqian Zhou, Xuanjing Huang | In order to address this issue, we propose a reinforcement learning framework for essay scoring that incorporates quadratic weighted kappa as guidance to optimize the scoring system. |
91 | Identifying Well-formed Natural Language Questions | Manaal Faruqui, Dipanjan Das | Here, we introduce a new task of identifying a well-formed natural language question. We construct and release a dataset of 25,100 publicly available questions classified into well-formed and non-wellformed categories and report an accuracy of 70.7% on the test set. |
92 | Self-Governing Neural Networks for On-Device Short Text Classification | Sujith Ravi, Zornitsa Kozareva | We propose on-device Self-Governing Neural Networks (SGNNs), which learn compact projection vectors with locality-sensitive hashing. |
93 | HFT-CNN: Learning Hierarchical Category Structure for Multi-label Short Text Categorization | Kazuya Shimura, Jiyi Li, Fumiyo Fukumoto | We propose an approach which can effectively utilize the data in the upper levels to contribute to the categorization in the lower levels by applying the Convolutional Neural Network (CNN) with a fine-tuning technique. |
94 | A Hierarchical Neural Attention-based Text Classifier | Koustuv Sinha, Yue Dong, Jackie Chi Kit Cheung, Derek Ruths | In this work, we use external knowledge in the form of topic category taxonomies to aid the classification by introducing a deep hierarchical neural attention-based classifier. |
95 | Labeled Anchors and a Scalable, Transparent, and Interactive Classifier | Jeffrey Lund, Stephen Cowley, Wilson Fearn, Emily Hales, Kevin Seppi | We propose Labeled Anchors, an interactive and supervised topic model based on the anchor words algorithm (Arora et al., 2013). |
96 | Coherence-Aware Neural Topic Modeling | Ran Ding, Ramesh Nallapati, Bing Xiang | In this work, under a neural variational inference framework, we propose methods to incorporate a topic coherence objective into the training process. |
97 | Utilizing Character and Word Embeddings for Text Normalization with Sequence-to-Sequence Models | Daniel Watson, Nasser Zalmout, Nizar Habash | To address these challenges, we use a sequence-to-sequence model with character-based attention, which in addition to its self-learned character embeddings, uses word embeddings pre-trained with an approach that also models subword information. |
98 | Topic Intrusion for Automatic Topic Model Evaluation | Shraey Bhatia, Jey Han Lau, Timothy Baldwin | In this paper, we explore the topic intrusion task – the task of guessing an outlier topic given a document and a few topics – and propose a method to automate it. |
99 | Supervised and Unsupervised Methods for Robust Separation of Section Titles and Prose Text in Web Documents | Abhijith Athreya Mysore Gopinath, Shomir Wilson, Norman Sadeh | To remedy this, we present a flexible system for automatically extracting the hierarchical section titles and prose organization of web documents irrespective of differences in HTML representation. |
100 | SwitchOut: an Efficient Data Augmentation Algorithm for Neural Machine Translation | Xinyi Wang, Hieu Pham, Zihang Dai, Graham Neubig | In this work, we examine methods for data augmentation for text-based tasks such as neural machine translation (NMT). |
101 | Improving Unsupervised Word-by-Word Translation with Language Model and Denoising Autoencoder | Yunsu Kim, Jiahui Geng, Hermann Ney | In this paper, we propose simple yet effective methods to improve word-by-word translation of cross-lingual embeddings, using only monolingual corpora but without any back-translation. |
102 | Decipherment of Substitution Ciphers with Neural Language Models | Nishant Kambhatla, Anahita Mansouri Bigvand, Anoop Sarkar | We propose a beam search algorithm that scores the entire candidate plaintext at each step of the decipherment using a neural language model. |
103 | Rapid Adaptation of Neural Machine Translation to New Languages | Graham Neubig, Junjie Hu | We propose methods based on starting with massively multilingual “seed models”, which can be trained ahead-of-time, and then continuing training on data related to the LRL. |
104 | Compact Personalized Models for Neural Machine Translation | Joern Wuebker, Patrick Simianer, John DeNero | We propose and compare methods for gradient-based domain adaptation of self-attentive neural machine translation models. |
106 | Supervised Domain Enablement Attention for Personalized Domain Classification | Joo-Kyung Kim, Young-Bum Kim | In this paper, we propose a supervised enablement attention mechanism, which utilizes sigmoid activation for the attention weighting so that the attention can be computed with more expressive power without the weight sum constraint of softmax attention. |
107 | A Deep Neural Network Sentence Level Classification Method with Context Information | Xingyi Song, Johann Petrak, Angus Roberts | We present a new method for sentence classification, Context-LSTM-CNN, that makes use of potentially large contexts. |
108 | Towards Dynamic Computation Graphs via Sparse Latent Structure | Vlad Niculae, André F. T. Martins, Claire Cardie | Using the recently proposed SparseMAP inference, which retrieves a sparse distribution over latent structures, we propose a novel approach for end-to-end learning of latent structure predictors jointly with a downstream predictor. |
109 | Convolutional Neural Networks with Recurrent Neural Filters | Yi Yang | In this work, we model convolution filters with RNNs that naturally capture compositionality and long-term dependencies in language. |
110 | Exploiting Rich Syntactic Information for Semantic Parsing with Graph-to-Sequence Model | Kun Xu, Lingfei Wu, Zhiguo Wang, Mo Yu, Liwei Chen, Vadim Sheinin | Exploiting Rich Syntactic Information for Semantic Parsing with Graph-to-Sequence Model |
111 | Retrieval-Based Neural Code Generation | Shirley Anugrah Hayati, Raphael Olivier, Pravalika Avvaru, Pengcheng Yin, Anthony Tomasic, Graham Neubig | We introduce RECODE, a method based on subtree retrieval that makes it possible to explicitly reference existing code examples within a neural code generation model. |
112 | SQL-to-Text Generation with Graph-to-Sequence Model | Kun Xu, Lingfei Wu, Zhiguo Wang, Yansong Feng, Vadim Sheinin | In this paper, we propose a graph-to-sequence model to encode the global structure information into node embeddings. |
113 | Generating Syntactic Paraphrases | Emilie Colin, Claire Gardent | We study the automatic generation of syntactic paraphrases using four different models for generation: data-to-text generation, text-to-text generation, text reduction, and text expansion. We derive training data for each of these tasks from the WebNLG dataset and show (i) that conditioning generation on syntactic constraints effectively permits the generation of syntactically distinct paraphrases for the same input, and (ii) that exploiting different types of input (data, text, or data+text) further increases the number of distinct paraphrases that can be generated for a given input. |
114 | Neural-Davidsonian Semantic Proto-role Labeling | Rachel Rudinger, Adam Teichert, Ryan Culkin, Sheng Zhang, Benjamin Van Durme | We present a model for semantic proto-role labeling (SPRL) using an adapted bidirectional LSTM encoding strategy that we call NeuralDavidsonian: predicate-argument structure is represented as pairs of hidden states corresponding to predicate and argument head tokens of the input sequence. |
115 | Conversational Decision-Making Model for Predicting the King’s Decision in the Annals of the Joseon Dynasty | JinYeong Bak, Alice Oh | As a dataset, we introduce conversational meeting records from a historical corpus, and develop a hierarchical RNN structure with attention and pre-trained speaker embeddings in the form of a Conversational Decision-Making Model (CDMM). |
116 | Toward Fast and Accurate Neural Discourse Segmentation | Yizhong Wang, Sujian Li, Jingfeng Yang | In this paper, we propose an end-to-end neural segmenter based on BiLSTM-CRF framework. |
117 | A Dataset for Telling the Stories of Social Media Videos | Spandana Gella, Mike Lewis, Marcus Rohrbach | To learn and evaluate such models, we introduce VideoStory, a new large-scale dataset for video description, as a new challenge for multi-sentence video description. |
118 | Cascaded Mutual Modulation for Visual Reasoning | Yiqun Yao, Jiaming Xu, Feng Wang, Bo Xu | We propose CMM: Cascaded Mutual Modulation as a novel end-to-end visual reasoning model. |
119 | How agents see things: On visual representations in an emergent language game | Diane Bouchacourt, Marco Baroni | In this paper, we consider the referential games of Lazaridou et al. (2017), and investigate the representations the agents develop during their evolving interaction. |
120 | Attention-Based Capsule Networks with Dynamic Routing for Relation Extraction | Ningyu Zhang, Shumin Deng, Zhanling Sun, Xi Chen, Wei Zhang, Huajun Chen | In this paper, we explore the capsule networks used for relation extraction in a multi-instance multi-label learning framework and propose a novel neural approach based on capsule networks with attention mechanisms. |
121 | Put It Back: Entity Typing with Language Model Enhancement | Ji Xin, Hao Zhu, Xu Han, Zhiyuan Liu, Maosong Sun | To address this issue, we propose entity typing with language model enhancement. |
122 | Event Detection with Neural Networks: A Rigorous Empirical Evaluation | Walker Orr, Prasad Tadepalli, Xiaoli Fern | In this paper we present a novel GRU-based model that combines syntactic information along with temporal structure through an attention mechanism. |
123 | PubSE: A Hierarchical Model for Publication Extraction from Academic Homepages | Yiqing Zhang, Jianzhong Qi, Rui Zhang, Chuandong Yin | To capture the positional and structural diversity, we propose an end-to-end hierarchical model named PubSE based on Bi-LSTM-CRF. |
124 | A Neural Transition-based Model for Nested Mention Recognition | Bailin Wang, Wei Lu, Yu Wang, Hongxia Jin | This paper introduces a scalable transition-based method to model the nested structure of mentions. |
125 | Genre Separation Network with Adversarial Training for Cross-genre Relation Extraction | Ge Shi, Chong Feng, Lifu Huang, Boliang Zhang, Heng Ji, Lejian Liao, Heyan Huang | Relation extraction suffers from a dramatic performance decrease when a model trained on one genre is directly applied to a new genre, due to the distinct feature distributions. |
126 | Effective Use of Context in Noisy Entity Linking | David Mueller, Greg Durrett | We investigate several techniques for using these cues in the context of noisy entity linking on short texts. |
127 | Exploiting Contextual Information via Dynamic Memory Network for Event Detection | Shaobo Liu, Rui Cheng, Xiaoming Yu, Xueqi Cheng | In light of the multi-hop mechanism of the DMN to model the context, we propose the trigger detection dynamic memory network (TD-DMN) to tackle the event detection problem. |
128 | Do explanations make VQA models more predictable to a human? | Arjun Chandrasekaran, Viraj Prabhu, Deshraj Yadav, Prithvijit Chattopadhyay, Devi Parikh | In this work, we analyze if existing explanations indeed make a VQA model – its responses as well as failures – more predictable to a human. |
129 | Facts That Matter | Marco Ponza, Luciano Del Corro, Gerhard Weikum | This work introduces fact salience: The task of generating a machine-readable representation of the most prominent information in a text document as a set of facts. |
130 | Entity Tracking Improves Cloze-style Reading Comprehension | Luong Hoang, Sam Wiseman, Alexander Rush | Recent work has improved on modeling for reading comprehension tasks with simple approaches such as the Attention Sum-Reader; however, automatic systems still significantly trail human performance. |
131 | Adversarial Domain Adaptation for Duplicate Question Detection | Darsh Shah, Tao Lei, Alessandro Moschitti, Salvatore Romeo, Preslav Nakov | We address the problem of detecting duplicate questions in forums, which is an important step towards automating the process of answering new questions. |
132 | Translating a Math Word Problem to an Expression Tree | Lei Wang, Yan Wang, Deng Cai, Dongxiang Zhang, Xiaojiang Liu | In this paper, by considering the uniqueness of the expression tree, we propose an equation normalization method to normalize the duplicated equations. |
133 | Semantic Linking in Convolutional Neural Networks for Answer Sentence Selection | Massimo Nicosia, Alessandro Moschitti | In this paper, instead of focusing on architecture engineering, we take advantage of small amounts of labelled data that model semantic phenomena in text to encode matching features directly in the word representations. |
134 | A dataset and baselines for sequential open-domain question answering | Ahmed Elgohary, Chen Zhao, Jordan Boyd-Graber | We present QBLink, a new dataset of fully human-authored questions. |
135 | Improving the results of string kernels in sentiment analysis and Arabic dialect identification by adapting them to your test set | Radu Tudor Ionescu, Andrei M. Butnaru | In this paper, we apply two simple yet effective transductive learning approaches to further improve the results of string kernels. |
136 | Parameterized Convolutional Neural Networks for Aspect Level Sentiment Classification | Binxuan Huang, Kathleen Carley | We introduce a novel parameterized convolutional neural network for aspect level sentiment classification. |
137 | Improving Multi-label Emotion Classification via Sentiment Classification with Dual Attention Transfer Network | Jianfei Yu, Luís Marujo, Jing Jiang, Pradeep Karuturi, William Brendel | In this paper, we target at improving the performance of multi-label emotion classification with the help of sentiment classification. |
138 | Learning Sentiment Memories for Sentiment Modification without Parallel Data | Yi Zhang, Jingjing Xu, Pengcheng Yang, Xu Sun | In this paper, motivated by the fact that the non-emotional context (e.g., “staff”) provides strong cues for the occurrence of emotional words (e.g., “friendly”), we propose a novel method that automatically extracts appropriate sentiment information from learned sentiment memories according to the specific context. |
139 | Joint Aspect and Polarity Classification for Aspect-based Sentiment Analysis with End-to-End Neural Networks | Martin Schmitt, Simon Steinheber, Konrad Schreiber, Benjamin Roth | In this work, we propose a new model for aspect-based sentiment analysis. |
140 | Representing Social Media Users for Sarcasm Detection | Y. Alex Kolchinski, Christopher Potts | We explore two methods for representing authors in the context of textual sarcasm detection: a Bayesian approach that directly represents authors’ propensities to be sarcastic, and a dense embedding approach that can learn interactions between the author and the text. |
141 | Syntactical Analysis of the Weaknesses of Sentiment Analyzers | Rohil Verma, Samuel Kim, David Walter | We describe a particular syntactic phenomenon that these analyzers fail to understand but that any ideal sentiment analyzer must handle. We also provide a set of 150 test sentences that any ideal sentiment analyzer must be able to understand. |
142 | Is Nike female? Exploring the role of sound symbolism in predicting brand name gender | Sridhar Moorthy, Ruth Pogacar, Samin Khan, Yang Xu | We present a simple computational approach that uses sound symbolism to address this open issue. |
143 | Improving Large-Scale Fact-Checking using Decomposable Attention Models and Lexical Tagging | Nayeon Lee, Chien-Sheng Wu, Pascale Fung | In this paper, we extend an existing pipeline approach to better tackle this problem. |
144 | Harnessing Popularity in Social Media for Extractive Summarization of Online Conversations | Ryuji Kano, Yasuhide Miura, Motoki Taniguchi, Yan-Ying Chen, Francine Chen, Tomoko Ohkuma | We propose a Disjunctive model that computes the contributions of content and context separately. For evaluation, we build a dataset where the informativeness of comments is annotated. |
145 | Identifying Locus of Control in Social Media Language | Masoud Rouhizadeh, Kokil Jaidka, Laura Smith, H. Andrew Schwartz, Anneke Buffone, Lyle Ungar | We present rich insights into these linguistic aspects and find that while the language signaling control is easy to identify, it is more challenging to label whether it is internally or externally controlled, with lexical features outperforming syntactic features at the task. |
146 | Somm: Into the Model | Shengli Hu | We propose and train corresponding machine learning models that match these skills, and compare algorithmic results with real data collected from a large group of wine professionals. |
147 | Fine-Grained Emotion Detection in Health-Related Online Posts | Hamed Khanpour, Cornelia Caragea | Fine-Grained Emotion Detection in Health-Related Online Posts |
148 | The Remarkable Benefit of User-Level Aggregation for Lexical-based Population-Level Predictions | Salvatore Giorgi, Daniel Preoţiuc-Pietro, Anneke Buffone, Daniel Rieman, Lyle Ungar, H. Andrew Schwartz | This paper describes a simple yet effective method for building community-level models using Twitter language aggregated by user. We make our aggregated and anonymized community-level data, derived from 37 billion tweets (over 1 billion of which were mapped to counties), available for research. |
149 | Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement | Jason Lee, Elman Mansimov, Kyunghyun Cho | We propose a conditional non-autoregressive neural sequence model based on iterative refinement. |
150 | Large Margin Neural Language Model | Jiaji Huang, Yi Li, Wei Ping, Liang Huang | We propose a large margin criterion for training neural language models. |
151 | Targeted Syntactic Evaluation of Language Models | Rebecca Marvin, Tal Linzen | We present a data set for evaluating the grammaticality of the predictions of a language model. |
152 | Rational Recurrences | Hao Peng, Roy Schwartz, Sam Thomson, Noah A. Smith | In this work, we show that some recurrent neural networks also share this connection to WFSAs. |
153 | Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling | Liyuan Liu, Xiang Ren, Jingbo Shang, Xiaotao Gu, Jian Peng, Jiawei Han | Here we propose to compress bulky LMs while preserving useful information with regard to a specific task. |
154 | Automatic Event Salience Identification | Zhengzhong Liu, Chenyan Xiong, Teruko Mitamura, Eduard Hovy | This paper empirically studies Event Salience and proposes two salience detection models based on discourse relations. |
155 | Temporal Information Extraction by Predicting Relative Time-lines | Artuur Leeuwenberg, Marie-Francine Moens | In this paper, we propose a new method to construct a linear time-line from a set of (extracted) temporal relations. |
156 | Jointly Multiple Events Extraction via Attention-based Graph Information Aggregation | Xiao Liu, Zhunchen Luo, Heyan Huang | In this paper, we propose a novel Jointly Multiple Events Extraction (JMEE) framework to jointly extract multiple event triggers and arguments by introducing syntactic shortcut arcs to enhance information flow and attention-based graph convolution networks to model graph information. |
157 | RESIDE: Improving Distantly-Supervised Neural Relation Extraction using Side Information | Shikhar Vashishth, Rishabh Joshi, Sai Suman Prayaga, Chiranjib Bhattacharyya, Partha Talukdar | In this paper, we propose RESIDE, a distantly-supervised neural relation extraction method which utilizes additional side information from KBs for improved relation extraction. |
158 | Collective Event Detection via a Hierarchical and Bias Tagging Networks with Gated Multi-level Attention Mechanisms | Yubo Chen, Hang Yang, Kang Liu, Jun Zhao, Yantao Jia | This paper proposes a novel framework dubbed as Hierarchical and Bias Tagging Networks with Gated Multi-level Attention Mechanisms (HBTNGMA) to solve the two problems simultaneously. |
159 | Valency-Augmented Dependency Parsing | Tianze Shi, Lillian Lee | We present a complete, automated, and efficient approach for utilizing valency analysis in making dependency parsing decisions. |
160 | Unsupervised Learning of Syntactic Structure with Invertible Neural Projections | Junxian He, Graham Neubig, Taylor Berg-Kirkpatrick | In this work, we propose a novel generative model that jointly learns discrete syntactic structure and continuous word representations in an unsupervised fashion by cascading an invertible neural network with a structured generative prior. |
161 | Dynamic Oracles for Top-Down and In-Order Shift-Reduce Constituent Parsing | Daniel Fernández-González, Carlos Gómez-Rodríguez | We introduce novel dynamic oracles for training two of the most accurate known shift-reduce algorithms for constituent parsing: the top-down and in-order transition-based parsers. |
162 | Constituent Parsing as Sequence Labeling | Carlos Gómez-Rodríguez, David Vilares | We introduce a method to reduce constituent parsing to sequence labeling. We then use the PTB and CTB treebanks as testbeds and propose a set of fast baselines. |
163 | Synthetic Data Made to Order: The Case of Parsing | Dingquan Wang, Jason Eisner | We show how to (stochastically) permute the constituents of an existing dependency treebank so that its surface part-of-speech statistics approximately match those of the target language. |
164 | Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions | Qing Li, Jianlong Fu, Dongfei Yu, Tao Mei, Jiebo Luo | In this work, we propose to break up the end-to-end VQA into two steps: explaining and reasoning, in an attempt towards a more explainable VQA by shedding light on the intermediate results between these two steps. |
165 | Learning a Policy for Opportunistic Active Learning | Aishwarya Padmakumar, Peter Stone, Raymond Mooney | In this work, we use reinforcement learning for such an object retrieval task, to learn a policy that effectively trades off task completion with model improvement that would benefit future tasks. |
166 | RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes | Semih Yagcioglu, Aykut Erdem, Erkut Erdem, Nazli Ikizler-Cinbis | In this work, we introduce RecipeQA, a dataset for multimodal comprehension of cooking recipes. |
167 | TVQA: Localized, Compositional Video Question Answering | Jie Lei, Licheng Yu, Mohit Bansal, Tamara Berg | In this paper, we present TVQA, a large-scale video QA dataset based on 6 popular TV shows. |
168 | Localizing Moments in Video with Temporal Language | Lisa Anne Hendricks, Oliver Wang, Eli Shechtman, Josef Sivic, Trevor Darrell, Bryan Russell | We propose a new model that explicitly reasons about different temporal segments in a video, and shows that temporal context is important for localizing phrases which include temporal language. |
169 | Card-660: Cambridge Rare Word Dataset – a Reliable Benchmark for Infrequent Word Representation Models | Mohammad Taher Pilehvar, Dimitri Kartsaklis, Victor Prokhorov, Nigel Collier | We show in this paper that the only existing benchmark (the Stanford Rare Word dataset) suffers from low-confidence annotations and limited vocabulary; hence, it does not constitute a solid comparison framework. In order to fill this evaluation gap, we propose Cambridge Rare word Dataset (Card-660), an expert-annotated word similarity dataset which provides a highly reliable, yet challenging, benchmark for rare word representation techniques. |
170 | Leveraging Gloss Knowledge in Neural Word Sense Disambiguation by Hierarchical Co-Attention | Fuli Luo, Tianyu Liu, Zexue He, Qiaolin Xia, Zhifang Sui, Baobao Chang | In this paper, we find that the learning for the context and gloss representation can benefit from each other. |
171 | Weeding out Conventionalized Metaphors: A Corpus of Novel Metaphor Annotations | Erik-Lân Do Dinh, Hannah Wieland, Iryna Gurevych | The goal of this paper is to alleviate this situation by introducing a crowdsourced novel metaphor annotation layer for an existing metaphor corpus. |
172 | Streaming word similarity mining on the cheap | Olof Görnerup, Daniel Gillblad | In this paper, we propose a fast and lightweight method for estimating similarities from streams by explicitly counting second-order co-occurrences. |
173 | Memory, Show the Way: Memory Based Few Shot Word Representation Learning | Jingyuan Sun, Shaonan Wang, Chengqing Zong | In this paper, we propose Mem2Vec, a memory based embedding learning method capable of acquiring high quality word representations from fairly limited context. |
174 | Disambiguated skip-gram model | Karol Grzegorczyk, Marcin Kurdziel | We present disambiguated skip-gram: a neural-probabilistic model for learning multi-sense distributed representations of words. |
175 | Picking Apart Story Salads | Su Wang, Eric Holgate, Greg Durrett, Katrin Erk | To make this task accessible to neural models, we introduce Story Salads, mixtures of multiple documents that can be generated at scale. |
176 | Dynamic Meta-Embeddings for Improved Sentence Representations | Douwe Kiela, Changhan Wang, Kyunghyun Cho | To that end, we introduce dynamic meta-embeddings, a simple yet effective method for the supervised learning of embedding ensembles, which leads to state-of-the-art performance within the same model class on a variety of tasks. |
177 | A Probabilistic Model for Joint Learning of Word Embeddings from Texts and Images | Melissa Ailem, Bowen Zhang, Aurelien Bellet, Pascal Denis, Fei Sha | In this work, we propose a novel probabilistic model to formalize how linguistic and perceptual inputs can work in concert to explain the observed word-context pairs in a text corpus. |
178 | Transfer and Multi-Task Learning for Noun–Noun Compound Interpretation | Murhaf Fares, Stephan Oepen, Erik Velldal | In this paper, we empirically evaluate the utility of transfer and multi-task learning on a challenging semantic classification task: semantic interpretation of noun-noun compounds. |
179 | Dissecting Contextual Word Embeddings: Architecture and Representation | Matthew Peters, Mark Neumann, Luke Zettlemoyer, Wen-tau Yih | In this paper, we present a detailed empirical study of how the choice of neural architecture (e.g. LSTM, CNN, or self attention) influences both end task accuracy and qualitative properties of the representations that are learned. |
180 | Preposition Sense Disambiguation and Representation | Hongyu Gong, Jiaqi Mu, Suma Bhat, Pramod Viswanath | In this paper we match each preposition’s left and right contexts, and their interplay, to the geometry of the word vectors to the left and right of the preposition. |
181 | Auto-Encoding Dictionary Definitions into Consistent Word Embeddings | Tom Bosc, Pascal Vincent | This paper presents a simple model that learns to compute word embeddings by processing dictionary definitions and trying to reconstruct them. |
182 | Spot the Odd Man Out: Exploring the Associative Power of Lexical Resources | Gabriel Stanovsky, Mark Hopkins | We propose Odd-Man-Out, a novel task which aims to test different properties of word representations. |
183 | Neural Multitask Learning for Simile Recognition | Lizhen Liu, Xiao Hu, Wei Song, Ruiji Fu, Ting Liu, Guoping Hu | We propose a neural network framework for jointly optimizing three tasks: simile sentence classification, simile component extraction and language modeling. We construct an annotated corpus for this research, which consists of 11.3k sentences that contain a comparator. |
184 | Structured Alignment Networks for Matching Sentences | Yang Liu, Matt Gardner, Mirella Lapata | We introduce a model of structured alignments between sentences, showing how to compare two sentences by matching their latent structures. |
185 | Compare, Compress and Propagate: Enhancing Neural Architectures with Alignment Factorization for Natural Language Inference | Yi Tay, Anh Tuan Luu, Siu Cheung Hui | This paper presents a new deep learning architecture for Natural Language Inference (NLI). |
186 | Convolutional Interaction Network for Natural Language Inference | Jingjing Gong, Xipeng Qiu, Xinchi Chen, Dong Liang, Xuanjing Huang | In this paper, we propose the Convolutional Interaction Network (CIN), a general model to capture the interaction between two sentences, which can be an alternative to the attention mechanism for NLI. |
187 | Lessons from Natural Language Inference in the Clinical Domain | Alexey Romanov, Chaitanya Shivade | We present strategies to: 1) leverage transfer learning using datasets from the open domain (e.g., SNLI), and 2) incorporate domain knowledge from external data and lexical sources (e.g., medical terminologies). |
188 | Question Generation from SQL Queries Improves Neural Semantic Parsing | Daya Guo, Yibo Sun, Duyu Tang, Nan Duan, Jian Yin, Hong Chi, James Cao, Peng Chen, Ming Zhou | In this paper, we study how to learn a semantic parser of state-of-the-art accuracy with less supervised training data. |
189 | SemRegex: A Semantics-Based Approach for Generating Regular Expressions from Natural Language Specifications | Zexuan Zhong, Jiaqi Guo, Wei Yang, Jian Peng, Tao Xie, Jian-Guang Lou, Ting Liu, Dongmei Zhang | Recent research proposes syntax-based approaches to address the problem of generating programs from natural language specifications. |
190 | Decoupling Structure and Lexicon for Zero-Shot Semantic Parsing | Jonathan Herzig, Jonathan Berant | In this paper, we introduce a zero-shot approach to semantic parsing that can parse utterances in unseen domains while only being trained on examples in other source domains. |
191 | A Span Selection Model for Semantic Role Labeling | Hiroki Ouchi, Hiroyuki Shindo, Yuji Matsumoto | We present a simple and accurate span-based model for semantic role labeling (SRL). |
192 | Mapping Language to Code in Programmatic Context | Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Luke Zettlemoyer | To study this phenomenon, we introduce the task of generating class member functions given English documentation and the programmatic context provided by the rest of the class. |
193 | SyntaxSQLNet: Syntax Tree Networks for Complex and Cross-Domain Text-to-SQL Task | Tao Yu, Michihiro Yasunaga, Kai Yang, Rui Zhang, Dongxu Wang, Zifan Li, Dragomir Radev | In this paper we propose SyntaxSQLNet, a syntax tree network to address the complex and cross-domain text-to-SQL generation task. |
194 | Cross-lingual Decompositional Semantic Parsing | Sheng Zhang, Xutai Ma, Rachel Rudinger, Kevin Duh, Benjamin Van Durme | We introduce the task of cross-lingual decompositional semantic parsing: mapping content provided in a source language into a decompositional semantic analysis based on a target language. |
195 | Learning to Learn Semantic Parsers from Natural Language Supervision | Igor Labutov, Bishan Yang, Tom Mitchell | Inspired by this observation, we propose a learning algorithm for training semantic parsers from supervision (feedback) expressed in natural language. We construct a novel dataset of natural language feedback in a conversational setting, and show that our method is effective at learning a semantic parser from such natural language supervision. |
196 | DeepCx: A transition-based approach for shallow semantic parsing with complex constructional triggers | Jesse Dunietz, Jaime Carbonell, Lori Levin | This paper introduces the surface construction labeling (SCL) task, which expands the coverage of Shallow Semantic Parsing (SSP) to include frames triggered by complex constructions. |
197 | What It Takes to Achieve 100% Condition Accuracy on WikiSQL | Semih Yavuz, Izzeddin Gur, Yu Su, Xifeng Yan | In this paper, we ask two questions, the first being: “Why is the accuracy still low for such simple queries?” |
198 | Better Transition-Based AMR Parsing with a Refined Search Space | Zhijiang Guo, Wei Lu | This paper introduces a simple yet effective transition-based system for Abstract Meaning Representation (AMR) parsing. |
199 | Heuristically Informed Unsupervised Idiom Usage Recognition | Changsheng Liu, Rebecca Hwa | This paper proposes an unsupervised learning method for recognizing the intended usages of idioms. |
200 | Coming to Your Senses: on Controls and Evaluation Sets in Polysemy Research | Haim Dubossarsky, Eitan Grossman, Daphna Weinshall | Theoretical analysis shows that this may on its own be beneficial for the estimation of word similarity, by reducing the bias in the estimation of the cosine distance. Furthermore, we provide empirical data and an analytic discussion that may account for the previously reported improved performance. |
201 | Predicting Semantic Relations using Global Graph Properties | Yuval Pinter, Jacob Eisenstein | In this paper, we combine global and local properties of semantic graphs through the framework of Max-Margin Markov Graph Models (M3GM), a novel extension of Exponential Random Graph Model (ERGM) that scales to large multi-relational graphs. |
202 | Learning Scalar Adjective Intensity from Paraphrases | Anne Cocos, Veronica Wharton, Ellie Pavlick, Marianna Apidianaki, Chris Callison-Burch | We propose a new paraphrase-based method to automatically learn the relative intensity relation that holds between a pair of scalar adjectives. |
203 | Pointwise HSIC: A Linear-Time Kernelized Co-occurrence Norm for Sparse Linguistic Expressions | Sho Yokoi, Sosuke Kobayashi, Kenji Fukumizu, Jun Suzuki, Kentaro Inui | In this paper, we propose a new kernel-based co-occurrence measure that can be applied to sparse linguistic expressions (e.g., sentences) with a very short learning time, as an alternative to pointwise mutual information (PMI). |
204 | Neural Related Work Summarization with a Joint Context-driven Attention Mechanism | Yongzhen Wang, Xiaozhong Liu, Zheng Gao | In this paper, we develop a neural data-driven summarizer by leveraging the seq2seq paradigm, in which a joint context-driven attention mechanism is proposed to measure the contextual relevance within full texts and a heterogeneous bibliography graph simultaneously. |
205 | Improving Neural Abstractive Document Summarization with Explicit Information Selection Modeling | Wei Li, Xinyan Xiao, Yajuan Lyu, Yuanzhuo Wang | In this paper, we propose to extend the basic neural encoding-decoding framework with an information selection layer to explicitly model and optimize the information selection process in abstractive document summarization. |
206 | Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization | Shashi Narayan, Shay B. Cohen, Mirella Lapata | We propose a novel abstractive model which is conditioned on the article’s topics and based entirely on convolutional neural networks. |
207 | Improving Abstraction in Text Summarization | Wojciech Kryściński, Romain Paulus, Caiming Xiong, Richard Socher | We propose two techniques to improve the level of abstraction of generated summaries. |
208 | Content Selection in Deep Learning Models of Summarization | Chris Kedzie, Kathleen McKeown, Hal Daumé III | We find that many sophisticated features of state of the art extractive summarizers do not improve performance over simpler models. |
209 | Improved Semantic-Aware Network Embedding with Fine-Grained Word Alignment | Dinghan Shen, Xinyuan Zhang, Ricardo Henao, Lawrence Carin | In this paper, we propose to incorporate semantic features into network embeddings by matching important words between text sequences for all pairs of vertices. |
210 | Learning Context-Sensitive Convolutional Filters for Text Processing | Dinghan Shen, Martin Renqiang Min, Yitong Li, Lawrence Carin | In this paper, we consider an approach of using a small meta network to learn context-sensitive convolutional filters for text processing. |
211 | Deep Relevance Ranking Using Enhanced Document-Query Interactions | Ryan McDonald, George Brokos, Ion Androutsopoulos | We explore several new models for document relevance ranking, building upon the Deep Relevance Matching Model (DRMM) of Guo et al. (2016). |
212 | Learning Neural Representation for CLIR with Adversarial Framework | Bo Li, Ping Cheng | In this paper, we follow the success of neural representation in natural language processing (NLP) and develop a novel text representation model based on adversarial learning, which seeks a task-specific embedding space for CLIR. |
213 | AD3: Attentive Deep Document Dater | Swayambhu Nath Ray, Shib Sankar Dasgupta, Partha Talukdar | In this paper, we propose Attentive Deep Document Dater (AD3), an attention-based neural document dating system which utilizes both context and temporal information in documents in a flexible and principled manner. |
214 | Gromov-Wasserstein Alignment of Word Embedding Spaces | David Alvarez-Melis, Tommi Jaakkola | In this paper, we cast the correspondence problem directly as an optimal transport (OT) problem, building on the idea that word embeddings arise from metric recovery algorithms. |
215 | Deep Probabilistic Logic: A Unifying Framework for Indirect Supervision | Hai Wang, Hoifung Poon | In this paper, we propose deep probabilistic logic (DPL) as a general framework for indirect supervision, by composing probabilistic logic with deep learning. |
216 | Deriving Machine Attention from Human Rationales | Yujia Bao, Shiyu Chang, Mo Yu, Regina Barzilay | In this paper, we demonstrate that even in the low-resource scenario, attention can be learned effectively. |
217 | Semi-Supervised Sequence Modeling with Cross-View Training | Kevin Clark, Minh-Thang Luong, Christopher D. Manning, Quoc Le | We therefore propose Cross-View Training (CVT), a semi-supervised learning algorithm that improves the representations of a Bi-LSTM sentence encoder using a mix of labeled and unlabeled data. |
218 | A Probabilistic Annotation Model for Crowdsourcing Coreference | Silviu Paun, Jon Chamberlain, Udo Kruschwitz, Juntao Yu, Massimo Poesio | This paper addresses one crucial hurdle on the way to make this possible, by introducing a new model of annotation for aggregating crowdsourced anaphoric annotations. |
219 | A Deterministic Algorithm for Bridging Anaphora Resolution | Yufang Hou | In this paper, we create new word vectors by combining embeddings_PP with GloVe. |
220 | A Knowledge Hunting Framework for Common Sense Reasoning | Ali Emami, Noelia De La Cruz, Adam Trischler, Kaheer Suleman, Jackie Chi Kit Cheung | We introduce an automatic system that achieves state-of-the-art results on the Winograd Schema Challenge (WSC), a common sense reasoning task that requires diverse, complex forms of inference and knowledge. |
221 | Mapping Text to Knowledge Graph Entities using Multi-Sense LSTMs | Dimitri Kartsaklis, Mohammad Taher Pilehvar, Nigel Collier | The ideas of this work are demonstrated on large-scale text-to-entity mapping and entity classification tasks, with state of the art results. |
222 | Differentiating Concepts and Instances for Knowledge Graph Embedding | Xin Lv, Lei Hou, Juanzi Li, Zhiyuan Liu | In this paper, we propose a novel knowledge graph embedding model named TransC by differentiating concepts and instances. |
223 | One-Shot Relational Learning for Knowledge Graphs | Wenhan Xiong, Mo Yu, Shiyu Chang, Xiaoxiao Guo, William Yang Wang | In this work, we aim at predicting new facts under a challenging setting where only one training instance is available. |
224 | Regular Expression Guided Entity Mention Mining from Noisy Web Data | Shanshan Zhang, Lihong He, Slobodan Vucetic, Eduard Dragut | Rather than abandoning REs as a go-to approach for entity detection, this paper explores ways to combine the expressive power of REs, the ability of deep learning to learn from large data, and a human-in-the-loop approach into a new integrated framework for entity identification from web data. |
225 | HyTE: Hyperplane-based Temporally aware Knowledge Graph Embedding | Shib Sankar Dasgupta, Swayambhu Nath Ray, Partha Talukdar | In this paper, we propose HyTE, a temporally aware KG embedding method which explicitly incorporates time in the entity-relation space by associating each timestamp with a corresponding hyperplane. |
226 | Neural Adaptation Layers for Cross-domain Named Entity Recognition | Bill Yuchen Lin, Wei Lu | In this paper, we empirically investigate effective methods for conveniently adapting an existing, well-trained neural NER model for a new domain. |
227 | Entity Linking within a Social Media Platform: A Case Study on Yelp | Hongliang Dai, Yangqiu Song, Liwei Qiu, Rijia Liu | In this paper, we study a new entity linking problem where both the entity mentions and the target entities are within the same social media platform. |
228 | Annotation of a Large Clinical Entity Corpus | Pinal Patel, Disha Davey, Vishal Panchal, Parth Pathak | In this paper, we have described in detail the annotation guidelines, annotation process and our approaches in creating a CER (clinical entity recognition) corpus of 5,160 clinical documents from forty different clinical specialities. |
229 | Visual Supervision in Bootstrapped Information Extraction | Matthew Berger, Ajay Nagesh, Joshua Levine, Mihai Surdeanu, Helen Zhang | To enable effective data annotation in a scatterplot, we have developed an embedding-based bootstrapping model that learns the distributional similarity of entities through the patterns that match them in a large data corpus, while being discriminative with respect to human-labeled and machine-promoted entities. |
230 | Learning Named Entity Tagger using Domain-Specific Dictionary | Jingbo Shang, Liyuan Liu, Xiaotao Gu, Xiang Ren, Teng Ren, Jiawei Han | Here we propose two neural models to suit noisy distant supervision from the dictionary. |
231 | Zero-Shot Open Entity Typing as Type-Compatible Grounding | Ben Zhou, Daniel Khashabi, Chen-Tse Tsai, Dan Roth | In this work we propose a zero-shot entity typing approach that requires no annotated data and can flexibly identify newly defined types. |
232 | Attention-Guided Answer Distillation for Machine Reading Comprehension | Minghao Hu, Yuxing Peng, Furu Wei, Zhen Huang, Dongsheng Li, Nan Yang, Ming Zhou | This paper tackles these problems by leveraging knowledge distillation, which aims to transfer knowledge from an ensemble model to a single model. |
233 | Interpretation of Natural Language Rules in Conversational Machine Reading | Marzieh Saeidi, Max Bartolo, Patrick Lewis, Sameer Singh, Tim Rocktäschel, Mike Sheldon, Guillaume Bouchard, Sebastian Riedel | In this paper, we formalise this task and develop a crowd-sourcing strategy to collect 37k task instances based on real-world rules and crowd-generated questions and scenarios. |
234 | A State-transition Framework to Answer Complex Questions over Knowledge Base | Sen Hu, Lei Zou, Xinbo Zhang | To address that, in this paper, we propose a State Transition-based approach to translate a complex natural language question N to a semantic query graph (SQG), which is used to match the underlying knowledge graph to find the answers to question N. |
235 | A Multi-answer Multi-task Framework for Real-world Machine Reading Comprehension | Jiahua Liu, Wan Wei, Maosong Sun, Hao Chen, Yantao Du, Dekang Lin | We propose a multi-answer multi-task framework, in which different loss functions are used for multiple reference answers. |
236 | Logician and Orator: Learning from the Duality between Language and Knowledge in Open Domain | Mingming Sun, Xu Li, Ping Li | We propose the task of Open-Domain Information Narration (OIN) as the reverse task of Open Information Extraction (OIE), to implement the dual structure between language and knowledge in the open domain. |
237 | MemoReader: Large-Scale Reading Comprehension through Neural Memory Controller | Seohyun Back, Seunghak Yu, Sathish Reddy Indurthi, Jihie Kim, Jaegul Choo | In this paper, we propose a novel deep neural network architecture to handle a long-range dependency in RC tasks. |
238 | Multi-Granular Sequence Encoding via Dilated Compositional Units for Reading Comprehension | Yi Tay, Anh Tuan Luu, Siu Cheung Hui | This paper presents a new compositional encoder for reading comprehension (RC). |
239 | Neural Compositional Denotational Semantics for Question Answering | Nitish Gupta, Mike Lewis | We introduce an end-to-end differentiable model for interpreting questions about a knowledge graph (KG), which is inspired by formal approaches to semantics. |
240 | Cross-Pair Text Representations for Answer Sentence Selection | Kateryna Tymoshenko, Alessandro Moschitti | In this paper, we compute scalar products between vectors representing similarity between members of different pairs, in place of simply using a single vector for each pair. |
241 | QuAC: Question Answering in Context | Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen-tau Yih, Yejin Choi, Percy Liang, Luke Zettlemoyer | We present QuAC, a dataset for Question Answering in Context that contains 14K information-seeking QA dialogs (100K questions in total). |
242 | Knowledge Base Question Answering via Encoding of Complex Query Graphs | Kangqi Luo, Fengli Lin, Xusheng Luo, Kenny Zhu | In this work, we encode such complex query structure into a uniform vector representation, and thus successfully capture the interactions between individual semantic components within a complex question. |
243 | Neural Relation Extraction via Inner-Sentence Noise Reduction and Transfer Learning | Tianyi Liu, Xinsong Zhang, Wanhao Zhou, Weijia Jia | To mitigate this problem, we propose a novel word-level distant supervised approach for relation extraction. |
244 | Graph Convolution over Pruned Dependency Trees Improves Relation Extraction | Yuhao Zhang, Peng Qi, Christopher D. Manning | We propose an extension of graph convolutional networks that is tailored for relation extraction, which pools information over arbitrary dependency structures efficiently in parallel. |
245 | Multi-Level Structured Self-Attentions for Distantly Supervised Relation Extraction | Jinhua Du, Jingguang Han, Andy Way, Dadong Wan | To alleviate this issue, we propose a novel multi-level structured (2-D matrix) self-attention mechanism for DS-RE in a multi-instance learning (MIL) framework using bidirectional recurrent neural networks (BiRNN). |
246 | N-ary Relation Extraction using Graph-State LSTM | Linfeng Song, Yue Zhang, Zhiguo Wang, Daniel Gildea | We propose a graph-state LSTM model, which uses a parallel state to model each word, recurrently enriching state values via message passing. |
247 | Hierarchical Relation Extraction with Coarse-to-Fine Grained Attention | Xu Han, Pengfei Yu, Zhiyuan Liu, Maosong Sun, Peng Li | In this paper, we aim to incorporate the hierarchical information of relations for distantly supervised relation extraction and propose a novel hierarchical attention scheme. |
248 | Label-Free Distant Supervision for Relation Extraction via Knowledge Graph Embedding | Guanying Wang, Wen Zhang, Ruoxu Wang, Yalin Zhou, Xi Chen, Wei Zhang, Hai Zhu, Huajun Chen | This paper proposes a label-free distant supervision method, which makes no use of the relation labels under this inadequate assumption, but only uses the prior knowledge derived from the KG to supervise the learning of the classifier directly and softly. |
249 | Extracting Entities and Relations with Joint Minimum Risk Training | Changzhi Sun, Yuanbin Wu, Man Lan, Shiliang Sun, Wenting Wang, Kuang-Chih Lee, Kewen Wu | Unlike prior efforts, we propose a new lightweight joint learning paradigm based on minimum risk training (MRT). |
250 | Large-scale Exploration of Neural Relation Classification Architectures | Hoang-Quynh Le, Duy-Cat Can, Sinh T. Vu, Thanh Hai Dang, Mohammad Taher Pilehvar, Nigel Collier | In this work, we present a systematic large-scale analysis of neural relation classification architectures on six benchmark datasets with widely varying characteristics. |
251 | Possessors Change Over Time: A Case Study with Artworks | Dhivya Chinnappa, Eduardo Blanco | This paper presents a corpus and experimental results to extract possession relations over time. |
252 | Using Lexical Alignment and Referring Ability to Address Data Sparsity in Situated Dialog Reference Resolution | Todd Shore, Gabriel Skantze | Using Lexical Alignment and Referring Ability to Address Data Sparsity in Situated Dialog Reference Resolution |
253 | Subgoal Discovery for Hierarchical Dialogue Policy Learning | Da Tang, Xiujun Li, Jianfeng Gao, Chong Wang, Lihong Li, Tony Jebara | In this paper, we propose a divide-and-conquer approach that discovers and exploits the hidden structure of the task to enable efficient policy learning. |
254 | Supervised Clustering of Questions into Intents for Dialog System Applications | Iryna Haponchyk, Antonio Uva, Seunghak Yu, Olga Uryupina, Alessandro Moschitti | In this paper, we propose a model for automatically clustering questions into user intents to help the design tasks. |
255 | Towards Exploiting Background Knowledge for Building Conversation Systems | Nikita Moghe, Siddhartha Arora, Suman Banerjee, Mitesh M. Khapra | To facilitate the development of such natural conversation models which mimic the human process of conversing, we create a new dataset containing movie chats wherein each response is explicitly generated by copying and/or modifying sentences from unstructured background knowledge such as plots, comments and reviews about the movie. |
256 | Decoupling Strategy and Generation in Negotiation Dialogues | He He, Derek Chen, Anusha Balakrishnan, Percy Liang | In this paper, we propose a modular approach based on coarse dialogue acts (e.g., propose(price=50)) that decouples strategy and generation. We test our approach on the recently proposed DEALORNODEAL game, and we also collect a richer dataset based on real items on Craigslist. |
257 | Large-scale Cloze Test Dataset Created by Teachers | Qizhe Xie, Guokun Lai, Zihang Dai, Eduard Hovy | In this paper, we propose the first large-scale human-created cloze test dataset CLOTH, containing questions used in middle-school and high-school language exams. |
258 | emrQA: A Large Corpus for Question Answering on Electronic Medical Records | Anusri Pampari, Preethi Raghavan, Jennifer Liang, Jian Peng | We propose a novel methodology to generate domain-specific large-scale question answering (QA) datasets by re-purposing existing annotations for other NLP tasks. |
259 | HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering | Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William Cohen, Ruslan Salakhutdinov, Christopher D. Manning | We introduce HotpotQA, a new dataset with 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the questions are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; (3) we provide sentence-level supporting facts required for reasoning, allowing QA systems to reason with strong supervision and explain the predictions; (4) we offer a new type of factoid comparison questions to test QA systems’ ability to extract relevant facts and perform necessary comparison. |
260 | Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering | Todor Mihaylov, Peter Clark, Tushar Khot, Ashish Sabharwal | We present a new kind of question answering dataset, OpenBookQA, modeled after open book exams for assessing human understanding of a subject. |
261 | Evaluating Theory of Mind in Question Answering | Aida Nematzadeh, Kaylee Burns, Erin Grant, Alison Gopnik, Tom Griffiths | We propose a new dataset for evaluating question answering models with respect to their capacity to reason about beliefs. |
262 | A Unified Syntax-aware Framework for Semantic Role Labeling | Zuchao Li, Shexia He, Jiaxun Cai, Zhuosheng Zhang, Hai Zhao, Gongshen Liu, Linlin Li, Luo Si | To comprehensively explore the role of syntax in the SRL task, we extend existing models and propose a unified framework to investigate more effective and more diverse ways of incorporating syntax into sequential neural networks. |
263 | Semantics as a Foreign Language | Gabriel Stanovsky, Ido Dagan | We propose a novel approach to semantic dependency parsing (SDP) by casting the task as an instance of multi-lingual machine translation, where each semantic representation is a different foreign dialect. |
264 | An AMR Aligner Tuned by Transition-based Parser | Yijia Liu, Wanxiang Che, Bo Zheng, Bing Qin, Ting Liu | In this paper, we propose a new rich resource enhanced AMR aligner which produces multiple alignments and a new transition system for AMR parsing along with its oracle parser. |
265 | Dependency-based Hybrid Trees for Semantic Parsing | Zhanming Jie, Wei Lu | We propose a novel dependency-based hybrid tree model for semantic parsing, which converts natural language utterances into machine-interpretable meaning representations. |
266 | Policy Shaping and Generalized Update Equations for Semantic Parsing from Denotations | Dipendra Misra, Ming-Wei Chang, Xiaodong He, Wen-tau Yih | We propose effective and general solutions to each of them. |
267 | Sentence Compression for Arbitrary Languages via Multilingual Pivoting | Jonathan Mallinson, Rico Sennrich, Mirella Lapata | In this paper we advocate the use of bilingual corpora which are abundantly available for training sentence compression models. |
268 | Unsupervised Cross-lingual Transfer of Word Embedding Spaces | Ruochen Xu, Yiming Yang, Naoki Otani, Yuexin Wu | This paper proposes an unsupervised learning approach that does not require any cross-lingual labeled data. |
269 | XNLI: Evaluating Cross-lingual Sentence Representations | Alexis Conneau, Ruty Rinott, Guillaume Lample, Adina Williams, Samuel Bowman, Holger Schwenk, Veselin Stoyanov | In this work, we construct an evaluation set for XLU by extending the development and test sets of the Multi-Genre Natural Language Inference Corpus (MultiNLI) to 14 languages, including low-resource languages such as Swahili and Urdu. |
270 | Joint Multilingual Supervision for Cross-lingual Entity Linking | Shyam Upadhyay, Nitish Gupta, Dan Roth | We address this challenge by developing the first XEL approach that combines supervision from multiple languages jointly. |
271 | Fine-grained Coordinated Cross-lingual Text Stream Alignment for Endless Language Knowledge Acquisition | Tao Ge, Qing Dou, Heng Ji, Lei Cui, Baobao Chang, Zhifang Sui, Furu Wei, Ming Zhou | This paper proposes to study fine-grained coordinated cross-lingual text stream alignment through a novel information network decipherment paradigm. |
272 | WECA: A WordNet-Encoded Collocation-Attention Network for Homographic Pun Recognition | Yufeng Diao, Hongfei Lin, Di Wu, Liang Yang, Kan Xu, Zhihao Yang, Jian Wang, Shaowu Zhang, Bo Xu, Dongyu Zhang | We demonstrate the effectiveness of the model and its capability of selecting qualitatively informative words. |
273 | A Hybrid Approach to Automatic Corpus Generation for Chinese Spelling Check | Dingmin Wang, Yan Song, Jing Li, Jialong Han, Haisong Zhang | In this paper, we propose a novel approach of constructing CSC corpus with automatically generated spelling errors, which are either visually or phonologically resembled characters, corresponding to the OCR- and ASR-based methods, respectively. |
274 | Neural Quality Estimation of Grammatical Error Correction | Shamil Chollampatt, Hwee Tou Ng | We propose the first neural approach to automatic quality estimation of GEC output sentences that does not employ any hand-crafted features. |
275 | Transferring from Formal Newswire Domain with Hypernet for Twitter POS Tagging | Tao Gui, Qi Zhang, Jingjing Gong, Minlong Peng, Di Liang, Keyu Ding, Xuanjing Huang | To achieve this task, in this work, we propose a hypernetwork-based method to generate different parameters to separately model contexts with different expression styles. |
276 | Free as in Free Word Order: An Energy Based Model for Word Segmentation and Morphological Tagging in Sanskrit | Amrith Krishna, Bishal Santra, Sasi Prasanth Bandaru, Gaurav Sahu, Vishnu Dutt Sharma, Pavankumar Satuluri, Pawan Goyal | We propose a structured prediction framework that jointly solves the word segmentation and morphological tagging tasks in Sanskrit. |
277 | A Challenge Set and Methods for Noun-Verb Ambiguity | Ali Elkahky, Kellie Webster, Daniel Andor, Emily Pitler | This paper creates a new dataset of over 30,000 naturally-occurring non-trivial examples of noun-verb ambiguity. |
278 | What do character-level models learn about morphology? The case of dependency parsing | Clara Vania, Andreas Grivas, Adam Lopez | When parsing morphologically-rich languages with neural models, it is beneficial to model input at the character level, and it has been claimed that this is because character-level models learn morphology. |
279 | Learning Better Internal Structure of Words for Sequence Labeling | Yingwei Xin, Ethan Hart, Vibhuti Mahajan, Jean-David Ruvini | We evaluate our proposed model on six sequence labeling datasets, including named entity recognition, part-of-speech tagging, and syntactic chunking. |
280 | ICON: Interactive Conversational Memory Network for Multimodal Emotion Detection | Devamanyu Hazarika, Soujanya Poria, Rada Mihalcea, Erik Cambria, Roger Zimmermann | To this end, we propose Interactive COnversational memory Network (ICON), a multimodal emotion detection framework that extracts multimodal features from conversational videos and hierarchically models the self- and inter-speaker emotional influences into global memories. |
281 | Discriminative Learning of Open-Vocabulary Object Retrieval and Localization by Negative Phrase Augmentation | Ryota Hinami, Shin’ichi Satoh | In this paper, we address the problem of open-vocabulary object retrieval and localization, where the target object is specified by a textual query (e.g., a word or phrase). |
282 | Grounding Semantic Roles in Images | Carina Silberer, Manfred Pinkal | We render candidate participants as image regions of objects, and train a model which learns to ground roles in the regions which depict the corresponding participant. |
283 | Commonsense Justification for Action Explanation | Shaohua Yang, Qiaozi Gao, Sari Sadiya, Joyce Chai | In particular, we have developed an approach based on the generative Conditional Variational Autoencoder (CVAE) that models object relations/attributes of the world as latent variables and jointly learns a performer that predicts actions and an explainer that gathers commonsense evidence to justify the action. |
284 | Learning Personas from Dialogue with Attentive Memory Networks | Eric Chu, Prashanth Vijayaraghavan, Deb Roy | We introduce neural models to learn persona embeddings in a supervised character trope classification task. |
285 | Grounding language acquisition by training semantic parsers using captioned videos | Candace Ross, Andrei Barbu, Yevgeni Berzak, Battushig Myanganbayar, Boris Katz | We develop a semantic parser that is trained in a grounded setting using pairs of videos captioned with sentences. For this task, we collected a new dataset for grounded language acquisition. |
286 | Translating Navigation Instructions in Natural Language to a High-Level Plan for Behavioral Robot Navigation | Xiaoxue Zang, Ashwini Pokle, Marynel Vázquez, Kevin Chen, Juan Carlos Niebles, Alvaro Soto, Silvio Savarese | We propose an end-to-end deep learning model for translating free-form natural language instructions to a high-level plan for behavioral robot navigation. |
287 | Mapping Instructions to Actions in 3D Environments with Visual Goal Prediction | Dipendra Misra, Andrew Bennett, Valts Blukis, Eyvind Niklasson, Max Shatkhin, Yoav Artzi | We propose to decompose instruction execution to goal prediction and action generation. To evaluate our approach, we introduce two benchmarks for instruction following: LANI, a navigation task; and CHAI, where an agent executes household instructions. |
288 | Deconvolutional Time Series Regression: A Technique for Modeling Temporally Diffuse Effects | Cory Shain, William Schuler | This paper proposes a new statistical model that borrows from digital signal processing by recasting the predictors and response as convolutionally-related signals, using recent advances in machine learning to fit latent impulse response functions (IRFs) of arbitrary shape. |
289 | Is this Sentence Difficult? Do you Agree? | Dominique Brunato, Lorenzo De Mattei, Felice Dell’Orletta, Benedetta Iavarone, Giulia Venturi | In this paper, we present a crowdsourcing-based approach to model the human perception of sentence complexity. We collect a large corpus of sentences rated with judgments of complexity for two typologically-different languages, Italian and English. |
290 | Neural Transition Based Parsing of Web Queries: An Entity Based Approach | Rivka Malca, Roi Reichart | We propose a new BiLSTM query parser that: (1) Explicitly accounts for the unique grammar of web queries; and (2) Utilizes named entity (NE) information from a BiLSTM NE tagger, that can be jointly trained with the parser. |
291 | An Investigation of the Interactions Between Pre-Trained Word Embeddings, Character Models and POS Tags in Dependency Parsing | Aaron Smith, Miryam de Lhoneux, Sara Stymne, Joakim Nivre | We provide a comprehensive analysis of the interactions between pre-trained word embeddings, character models and POS tags in a transition-based dependency parser. |
292 | Depth-bounding is effective: Improvements and evaluation of unsupervised PCFG induction | Lifeng Jin, Finale Doshi-Velez, Timothy Miller, William Schuler, Lane Schwartz | Modern depth-bounded grammar inducers have been shown to be more accurate than early unbounded PCFG inducers, but this technique has never been compared against unbounded induction within the same system, in part because most previous depth-bounding models are built around sequence models, the complexity of which grows exponentially with the maximum allowed depth. |
293 | Incremental Computation of Infix Probabilities for Probabilistic Finite Automata | Marco Cognetta, Yo-Sub Han, Soon Chan Kwon | We tackle this problem and suggest a method that computes infix probabilities incrementally for probabilistic finite automata by representing all the probabilities of matching strings as a series of transition matrix calculations. |
294 | Syntax Encoding with Application in Authorship Attribution | Richong Zhang, Zhiyuan Hu, Hongyu Guo, Yongyi Mao | We propose a novel strategy to encode the syntax parse tree of a sentence into a learnable distributed representation. |
295 | Sanskrit Word Segmentation Using Character-level Recurrent and Convolutional Neural Networks | Oliver Hellwig, Sebastian Nehrdich | The paper introduces end-to-end neural network models that tokenize Sanskrit by jointly splitting compounds and resolving phonetic merges (Sandhi). |
296 | Session-level Language Modeling for Conversational Speech | Wayne Xiong, Lingfeng Wu, Jun Zhang, Andreas Stolcke | We propose to generalize language models for conversational speech recognition to allow them to operate across utterance boundaries and speaker changes, thereby capturing conversation-level phenomena such as adjacency pairs, lexical entrainment, and topical coherence. |
297 | Towards Less Generic Responses in Neural Conversation Models: A Statistical Re-weighting Method | Yahui Liu, Wei Bi, Jun Gao, Xiaojiang Liu, Jian Yao, Shuming Shi | Inspired by this observation, we introduce a statistical re-weighting method that assigns different weights for the multiple responses of the same query, and trains the common neural generation model with the weights. |
298 | Training Millions of Personalized Dialogue Agents | Pierre-Emmanuel Mazaré, Samuel Humeau, Martin Raison, Antoine Bordes | In this paper we introduce a new dataset providing 5 million personas and 700 million persona-based dialogues. |
299 | Towards Universal Dialogue State Tracking | Liliang Ren, Kaige Xie, Lu Chen, Kai Yu | To tackle these challenges, we propose StateNet, a universal dialogue state tracker. |
300 | Semantic Parsing for Task Oriented Dialog using Hierarchical Representations | Sonal Gupta, Rushin Shah, Mrinal Mohit, Anuj Kumar, Mike Lewis | We propose a hierarchical annotation scheme for semantic parsing that allows the representation of compositional queries, and can be efficiently and accurately parsed by standard constituency parsing models. We release a dataset of 44k annotated queries (http://fb.me/semanticparsingdialog), and show that parsing models outperform sequence-to-sequence approaches on this dataset. |
301 | The glass ceiling in NLP | Natalie Schluter | In this paper, we provide empirical evidence, based on a rigorously studied mathematical model for bi-populated networks, that a glass ceiling within the field of NLP has developed since the mid 2000s. |
302 | Reducing Gender Bias in Abusive Language Detection | Ji Ho Park, Jamin Shin, Pascale Fung | In this work, we measure them on models trained with different datasets, while analyzing the effect of different pre-trained word embeddings and model architectures. |
303 | SafeCity: Understanding Diverse Forms of Sexual Harassment Personal Stories | Sweta Karlekar, Mohit Bansal | In order to push forward the fight against such harassment and abuse, we present the task of automatically categorizing and analyzing various forms of sexual harassment, based on stories shared on the online forum SafeCity. |
304 | Learning multiview embeddings for assessing dementia | Chloé Pou-Prom, Frank Rudzicz | In this work, we leverage the multiview nature of a small AD dataset, DementiaBank, to learn an embedding that captures different modes of cognitive impairment. |
305 | WikiConv: A Corpus of the Complete Conversational History of a Large Online Collaborative Community | Yiqing Hua, Cristian Danescu-Niculescu-Mizil, Dario Taraborelli, Nithum Thain, Jeffery Sorensen, Lucas Dixon | We present a corpus that encompasses the complete history of conversations between contributors to Wikipedia, one of the largest online collaborative communities. |
306 | Marginal Likelihood Training of BiLSTM-CRF for Biomedical Named Entity Recognition from Disjoint Label Sets | Nathan Greenberg, Trapit Bansal, Patrick Verga, Andrew McCallum | This paper presents a method for training a single CRF extractor from multiple datasets with disjoint or partially overlapping sets of entity types. |
307 | Adversarial training for multi-context joint entity and relation extraction | Giannis Bekoulis, Johannes Deleu, Thomas Demeester, Chris Develder | We show how to use AT for the tasks of entity recognition and relation extraction. |
308 | Structured Multi-Label Biomedical Text Tagging via Attentive Neural Tree Decoding | Gaurav Singh, James Thomas, Iain Marshall, John Shawe-Taylor, Byron C. Wallace | We propose a model for tagging unstructured texts with an arbitrary number of terms drawn from a tree-structured vocabulary (i.e., an ontology). |
309 | Deep Exhaustive Model for Nested Named Entity Recognition | Mohammad Golam Sohrab, Makoto Miwa | We propose a simple deep neural model for nested named entity recognition (NER). |
310 | Evaluating the Utility of Hand-crafted Features in Sequence Labelling | Minghao Wu, Fei Liu, Trevor Cohn | In this work, we test this claim by proposing a new method for exploiting handcrafted features as part of a novel hybrid learning approach, incorporating a feature auto-encoder loss component. |
311 | Improved Dependency Parsing using Implicit Word Connections Learned from Unlabeled Data | Wenhui Wang, Baobao Chang, Mairgup Mansur | In this paper, we propose to implicitly capture word connections from unlabeled data by a word ordering model with self-attention mechanism. |
312 | A Framework for Understanding the Role of Morphology in Universal Dependency Parsing | Mathieu Dehouck, Pascal Denis | This paper presents a simple framework for characterizing morphological complexity and how it encodes syntactic information. |
313 | The Lazy Encoder: A Fine-Grained Analysis of the Role of Morphology in Neural Machine Translation | Arianna Bisazza, Clara Tump | To shed more light into the role played by linguistic structure in the process of neural machine translation, we perform a fine-grained analysis of how various source-side morphological features are captured at different levels of the NMT encoder while varying the target language. |
314 | Imitation Learning for Neural Morphological String Transduction | Peter Makarov, Simon Clematide | We employ imitation learning to train a neural transition-based string transducer for morphological tasks such as inflection generation and lemmatization. |
315 | An Encoder-Decoder Approach to the Paradigm Cell Filling Problem | Miikka Silfverberg, Mans Hulden | We also publish a new dataset for this task and code implementing the system described in this paper. |
316 | Generating Natural Language Adversarial Examples | Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani Srivastava, Kai-Wei Chang | Given these challenges, we use a black-box population-based optimization algorithm to generate semantically and syntactically similar adversarial examples that fool well-trained sentiment analysis and textual entailment models with success rates of 97% and 70%, respectively. |
317 | Multi-Head Attention with Disagreement Regularization | Jian Li, Zhaopeng Tu, Baosong Yang, Michael R. Lyu, Tong Zhang | In this work, we introduce a disagreement regularization to explicitly encourage the diversity among multiple attention heads. |
318 | Deep Bayesian Active Learning for Natural Language Processing: Results of a Large-Scale Empirical Study | Aditya Siddhant, Zachary C. Lipton | Thus, given a new task, we have no opportunity to compare models and acquisition functions. |
319 | Bayesian Compression for Natural Language Processing | Nadezhda Chirkova, Ekaterina Lobacheva, Dmitry Vetrov | We propose a Bayesian sparsification technique for RNNs which allows compressing the RNN dozens or hundreds of times without time-consuming hyperparameter tuning. |
320 | Multimodal neural pronunciation modeling for spoken languages with logographic origin | Minh Nguyen, Gia H. Ngo, Nancy Chen | In this work, we propose a multimodal approach to predict the pronunciation of Cantonese logographic characters, using neural networks with a geometric representation of logographs and pronunciation of cognates in historically related languages. |
321 | Chinese Pinyin Aided IME, Input What You Have Not Keystroked Yet | Yafang Huang, Hai Zhao | This paper introduces, for the first time, a sequence-to-sequence model with a gated-attention mechanism for the core task in IMEs. |
322 | Estimating Marginal Probabilities of n-grams for Recurrent Neural Language Models | Thanapon Noraset, Doug Downey, Lidong Bing | We introduce a simple method of altering the RNNLM training to make the model more accurate at marginal estimation. |
323 | How to represent a word and predict it, too: Improving tied architectures for language modelling | Kristina Gulordava, Laura Aina, Gemma Boleda | We propose a simple modification to these architectures that decouples the hidden state from the word embedding prediction. |
324 | The Importance of Generation Order in Language Modeling | Nicolas Ford, Daniel Duckworth, Mohammad Norouzi, George Dahl | This paper studies the influence of token generation order on model quality via a novel two-pass language model that produces partially-filled sentence “templates” and then fills in missing tokens. |
325 | Document-Level Neural Machine Translation with Hierarchical Attention Networks | Lesly Miculicich, Dhananjay Ram, Nikolaos Pappas, James Henderson | For this purpose, we propose a hierarchical attention model to capture the context in a structured and dynamic manner. |
326 | Three Strategies to Improve One-to-Many Multilingual Translation | Yining Wang, Jiajun Zhang, Feifei Zhai, Jingfang Xu, Chengqing Zong | In this work, we introduce three strategies to improve one-to-many multilingual translation by balancing the shared and unique features. |
327 | Multi-Source Syntactic Neural Machine Translation | Anna Currey, Kenneth Heafield | We introduce a novel multi-source technique for incorporating source syntax into neural machine translation using linearized parses. |
328 | Fixing Translation Divergences in Parallel Corpora for Neural MT | MinhQuang Pham, Josep Crego, Jean Senellart, François Yvon | This paper describes an unsupervised method for detecting translation divergences in parallel sentences. |
329 | Adversarial Evaluation of Multimodal Machine Translation | Desmond Elliott | We present an adversarial evaluation to directly examine the utility of the image data in this task. |
330 | Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion | Armand Joulin, Piotr Bojanowski, Tomas Mikolov, Hervé Jégou, Edouard Grave | In this paper, we propose a unified formulation that directly optimizes a retrieval criterion in an end-to-end fashion. |
331 | Learning When to Concentrate or Divert Attention: Self-Adaptive Attention Temperature for Neural Machine Translation | Junyang Lin, Xu Sun, Xuancheng Ren, Muyu Li, Qi Su | Therefore, we propose a new model with a mechanism called Self-Adaptive Control of Temperature (SACT) to control the softness of attention by means of an attention temperature. |
332 | Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation | Nikolay Bogoychev, Kenneth Heafield, Alham Fikri Aji, Marcin Junczys-Dowmunt | In order to achieve further speedup, we introduce a technique that delays gradient updates, effectively increasing the mini-batch size. |
333 | Learning to Jointly Translate and Predict Dropped Pronouns with a Shared Reconstruction Mechanism | Longyue Wang, Zhaopeng Tu, Andy Way, Qun Liu | In this work, we improve the original model from two perspectives. |
334 | Getting Gender Right in Neural Machine Translation | Eva Vanmassenhove, Christian Hardmeier, Andy Way | Our contribution is two-fold: (1) the compilation of large datasets with speaker information for 20 language pairs, and (2) a simple set of experiments that incorporate gender information into NMT for multiple language pairs. |
335 | Towards Two-Dimensional Sequence to Sequence Model in Neural Machine Translation | Parnia Bahar, Christopher Brix, Hermann Ney | This work investigates an alternative model for neural machine translation (NMT) and proposes a novel architecture, where we employ a multi-dimensional long short-term memory (MDLSTM) for translation modelling. |
336 | End-to-End Non-Autoregressive Neural Machine Translation with Connectionist Temporal Classification | Jindřich Libovický, Jindřich Helcl | We present a novel non-autoregressive architecture based on connectionist temporal classification and evaluate it on the task of neural machine translation. |
337 | Prediction Improves Simultaneous Neural Machine Translation | Ashkan Alinejad, Maryam Siahbani, Anoop Sarkar | We propose a new general-purpose prediction action which predicts future words in the input to improve quality and minimize delay in simultaneous translation. |
338 | Training Deeper Neural Machine Translation Models with Transparent Attention | Ankur Bapna, Mia Chen, Orhan Firat, Yuan Cao, Yonghui Wu | In this work we attempt to train significantly (2-3x) deeper Transformer and Bi-RNN encoders for machine translation. |
339 | Context and Copying in Neural Machine Translation | Rebecca Knowles, Philipp Koehn | In this work, we show that they learn to copy words based on both the context in which the words appear as well as features of the words themselves. |
340 | Encoding Gated Translation Memory into Neural Machine Translation | Qian Cao, Deyi Xiong | In this paper, we propose a novel method to combine the strengths of both TM and neural machine translation (NMT) for high-quality translation. |
341 | Automatic Post-Editing of Machine Translation: A Neural Programmer-Interpreter Approach | Thuy-Trang Vu, Gholamreza Haffari | In this paper, we present a neural programmer-interpreter approach to this task, resembling the way that humans perform post-editing using discrete edit operations, which we refer to as programs. |
342 | Breaking the Beam Search Curse: A Study of (Re-)Scoring Methods and Stopping Criteria for Neural Machine Translation | Yilin Yang, Liang Huang, Mingbo Ma | We explain why this happens, and propose several methods to address this problem. |
343 | Multi-Multi-View Learning: Multilingual and Multi-Representation Entity Typing | Yadollah Yaghoobzadeh, Hinrich Schütze | We employ multiview learning to increase the accuracy and coverage of entity type information in KBs. |
344 | Word Embeddings for Code-Mixed Language Processing | Adithya Pratapa, Monojit Choudhury, Sunayana Sitaram | We compare three existing bilingual word embedding approaches, and a novel approach of training skip-grams on synthetic code-mixed text generated through linguistic models of code-mixing, on two tasks – sentiment analysis and POS tagging for code-mixed text. |
345 | On the Strength of Character Language Models for Multilingual Named Entity Recognition | Xiaodong Yu, Stephen Mayhew, Mark Sammons, Dan Roth | This paper analyzes the capabilities of corpus-agnostic Character-level Language Models (CLMs) in the binary task of distinguishing name tokens from non-name tokens. |
346 | Code-switched Language Models Using Dual RNNs and Same-Source Pretraining | Saurabh Garg, Tanmay Parekh, Preethi Jyothi | We propose two techniques that significantly improve these LMs: 1) a novel recurrent neural network unit with dual components that focus on each language in the code-switched text separately, and 2) pretraining the LM using synthetic text from a generative model estimated using the training data. |
347 | Part-of-Speech Tagging for Code-Switched, Transliterated Texts without Explicit Language Identification | Kelsey Ball, Dan Garrette | In this paper, we present a novel model architecture that is trained exclusively on monolingual resources, but can be applied to unseen code-switched text at inference time. |
348 | Zero-shot User Intent Detection via Capsule Neural Networks | Congying Xia, Chenwei Zhang, Xiaohui Yan, Yi Chang, Philip Yu | We propose two capsule-based architectures: IntentCapsNet that extracts semantic features from utterances and aggregates them to discriminate existing intents, and IntentCapsNet-ZSL which gives IntentCapsNet the zero-shot learning ability to discriminate emerging intents via knowledge transfer from existing intents. |
349 | Hierarchical Neural Networks for Sequential Sentence Classification in Medical Scientific Abstracts | Di Jin, Peter Szolovits | In this work, we present a hierarchical sequential labeling network to make use of the contextual information within surrounding sentences to help classify the current sentence. |
350 | Investigating Capsule Networks with Dynamic Routing for Text Classification | Min Yang, Wei Zhao, Jianbo Ye, Zeyang Lei, Zhou Zhao, Soufei Zhang | In this study, we explore capsule networks with dynamic routing for text classification. |
351 | Topic Memory Networks for Short Text Classification | Jichuan Zeng, Jing Li, Yan Song, Cuiyun Gao, Michael R. Lyu, Irwin King | To address this issue, we propose topic memory networks for short text classification with a novel topic memory mechanism to encode latent topic representations indicative of class labels. |
352 | Few-Shot and Zero-Shot Multi-Label Learning for Structured Label Spaces | Anthony Rios, Ramakanth Kavuluru | In this paper, we perform a fine-grained evaluation to understand how state-of-the-art methods perform on infrequent labels. |
353 | Automatic Poetry Generation with Mutual Reinforcement Learning | Xiaoyuan Yi, Maosong Sun, Ruoyu Li, Wenhao Li | In addition, inspired by writing theories, we propose a novel mutual reinforcement learning schema. |
354 | Variational Autoregressive Decoder for Neural Response Generation | Jiachen Du, Wenjie Li, Yulan He, Ruifeng Xu, Lidong Bing, Xuan Wang | To solve this problem, we propose a novel model that sequentially introduces a series of latent variables to condition the generation of each word in the response sequence. |
355 | Integrating Transformer and Paraphrase Rules for Sentence Simplification | Sanqiang Zhao, Rui Meng, Daqing He, Andi Saptono, Bambang Parmanto | In this paper, we explore a novel model based on a multi-layer and multi-head attention architecture and we propose two innovative approaches to integrate the Simple PPDB (A Paraphrase Database for Simplification), an external paraphrase knowledge base for simplification that covers a wide range of real-world simplification rules. |
356 | Learning Neural Templates for Text Generation | Sam Wiseman, Stuart Shieber, Alexander Rush | This work proposes a neural generation system using a hidden semi-Markov model (HSMM) decoder, which learns latent, discrete templates jointly with learning to generate. |
357 | Multi-Reference Training with Pseudo-References for Neural Translation and Text Generation | Renjie Zheng, Mingbo Ma, Liang Huang | Multi-Reference Training with Pseudo-References for Neural Translation and Text Generation |
358 | Knowledge Graph Embedding with Hierarchical Relation Structure | Zhao Zhang, Fuzhen Zhuang, Meng Qu, Fen Lin, Qing He | To this end, in this paper, we extend the existing KGE models TransE, TransH, and DistMult to learn knowledge representations by leveraging the information from the HRS. |
359 | Embedding Multimodal Relational Data for Knowledge Base Completion | Pouya Pezeshkpour, Liyan Chen, Sameer Singh | In this paper, we propose multimodal knowledge base embeddings (MKBE) that use different neural encoders for this variety of observed data, and combine them with existing relational models to learn embeddings of the entities and multimodal data. |
360 | Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction | Yi Luan, Luheng He, Mari Ostendorf, Hannaneh Hajishirzi | We introduce a multi-task setup of identifying entities, relations, and coreference clusters in scientific articles. |
361 | Playing 20 Question Game with Policy-Based Reinforcement Learning | Huang Hu, Xianchao Wu, Bingfeng Luo, Chongyang Tao, Can Xu, Wei Wu, Zhan Chen | In this paper, we propose a novel policy-based Reinforcement Learning (RL) method, which enables the questioner agent to learn the optimal policy of question selection through continuous interactions with users. |
362 | Multi-Hop Knowledge Graph Reasoning with Reward Shaping | Xi Victoria Lin, Richard Socher, Caiming Xiong | We propose two modeling advances to address both issues: (1) we reduce the impact of false negative supervision by adopting a pretrained one-hop embedding model to estimate the reward of unobserved facts; (2) we counter the sensitivity to spurious paths of on-policy RL by forcing the agent to explore a diverse set of paths using randomly generated edge masks. |
363 | Neural Transductive Learning and Beyond: Morphological Generation in the Minimal-Resource Setting | Katharina Kann, Hinrich Schütze | We propose two new methods for the minimal-resource setting: (i) Paradigm transduction: Since we assume only a few paradigms are available for training, neural seq2seq models are able to capture relationships between paradigm cells, but are tied to the idiosyncrasies of the training set. |
364 | Implicational Universals in Stochastic Constraint-Based Phonology | Giorgio Magri | This paper focuses on the most basic implicational universals in phonological theory, called T-orders after Anttila and Andrus (2006). |
365 | Explaining Character-Aware Neural Networks for Word-Level Prediction: Do They Discover Linguistic Rules? | Fréderic Godin, Kris Demuynck, Joni Dambre, Wesley De Neve, Thomas Demeester | In this paper, we investigate which character-level patterns neural networks learn and if those patterns coincide with manually-defined word segmentations and annotations. |
366 | Adapting Word Embeddings to New Languages with Morphological and Phonological Subword Representations | Aditi Chaudhary, Chunting Zhou, Lori Levin, Graham Neubig, David R. Mortensen, Jaime Carbonell | We present two approaches for improving generalization to low-resourced languages by adapting continuous word representations using linguistically motivated subword units: phonemes, morphemes and graphemes. |
367 | A Computational Exploration of Exaggeration | Enrica Troiano, Carlo Strapparava, Gözde Özbal, Serra Sinem Tekiroğlu | This paper presents a first computational approach to this figure of speech. |
368 | Building Context-aware Clause Representations for Situation Entity Type Classification | Zeyu Dai, Ruihong Huang | Specifically, we propose a hierarchical recurrent neural network model to read a whole paragraph at a time and jointly learn representations for all the clauses in the paragraph by extensively modeling context influences and inter-dependencies of clauses. |
369 | Hierarchical Dirichlet Gaussian Marked Hawkes Process for Narrative Reconstruction in Continuous Time Domain | Yeon Seonwoo, Alice Oh, Sungjoon Park | In this paper, we propose the Hierarchical Dirichlet Gaussian Marked Hawkes process (HD-GMHP) for reconstructing the narratives and thread structures of news articles and discussion posts. |
370 | Investigating the Role of Argumentation in the Rhetorical Analysis of Scientific Publications with Neural Multi-Task Learning Models | Anne Lauscher, Goran Glavaš, Simone Paolo Ponzetto, Kai Eckert | Acknowledging the argumentative nature of scientific text, in this work we investigate the link between the argumentative structure of scientific publications and rhetorical aspects such as discourse categories or citation contexts. |
371 | Neural Ranking Models for Temporal Dependency Structure Parsing | Yuchen Zhang, Nianwen Xue | We design and build the first neural temporal dependency parser. |
372 | Causal Explanation Analysis on Social Media | Youngseo Son, Nipun Bayas, H. Andrew Schwartz | Here, we explore automating causal explanation analysis, building on discourse parsing, and presenting two novel subtasks: causality detection (determining whether a causal explanation exists at all) and causal explanation identification (identifying the specific phrase that is the explanation). |
373 | LRMM: Learning to Recommend with Missing Modalities | Cheng Wang, Mathias Niepert, Hui Li | In this paper, we propose LRMM, a novel framework that mitigates not only the problem of missing modalities but also more generally the cold-start problem of recommender systems. |
374 | Content Explorer: Recommending Novel Entities for a Document Writer | Michal Lukasik, Richard Zens | In this paper, we formulate the problem of recommending topics to a writer. |
375 | A Genre-Aware Attention Model to Improve the Likability Prediction of Books | Suraj Maharjan, Manuel Montes, Fabio A. González, Thamar Solorio | We propose a novel multimodal neural architecture that incorporates genre supervision to assign weights to individual feature types. |
376 | Thread Popularity Prediction and Tracking with a Permutation-invariant Model | Hou Pong Chan, Irwin King | In this work, we propose a novel approach to tackle this problem. |
377 | IARM: Inter-Aspect Relation Modeling with Memory Networks in Aspect-Based Sentiment Analysis | Navonil Majumder, Soujanya Poria, Alexander Gelbukh, Md. Shad Akhtar, Erik Cambria, Asif Ekbal | In this paper, we present a novel approach of incorporating the neighboring aspects related information into the sentiment classification of the target aspect using memory networks. |
378 | Limbic: Author-Based Sentiment Aspect Modeling Regularized with Word Embeddings and Discourse Relations | Zhe Zhang, Munindar Singh | We propose Limbic, an unsupervised probabilistic model that addresses the problem of discovering aspects and sentiments and associating them with authors of opinionated texts. |
379 | An Interpretable Neural Network with Topical Information for Relevant Emotion Ranking | Yang Yang, Deyu Zhou, Yulan He | We proposed a novel interpretable neural network approach for relevant emotion ranking. |
380 | Multi-grained Attention Network for Aspect-Level Sentiment Classification | Feifan Fan, Yansong Feng, Dongyan Zhao | We propose a novel multi-grained attention network (MGAN) model for aspect level sentiment classification. |
381 | Attentive Gated Lexicon Reader with Contrastive Contextual Co-Attention for Sentiment Classification | Yi Tay, Anh Tuan Luu, Siu Cheung Hui, Jian Su | This paper proposes a new neural architecture that exploits readily available sentiment lexicon resources. |
382 | Contextual Inter-modal Attention for Multi-modal Sentiment Analysis | Deepanway Ghosal, Md Shad Akhtar, Dushyant Chauhan, Soujanya Poria, Asif Ekbal, Pushpak Bhattacharyya | In this paper, we propose a recurrent neural network based multi-modal attention framework that leverages the contextual information for utterance-level sentiment prediction. |
383 | Adaptive Semi-supervised Learning for Cross-domain Sentiment Classification | Ruidan He, Wee Sun Lee, Hwee Tou Ng, Daniel Dahlmeier | We consider the cross-domain sentiment classification problem, where a sentiment classifier is to be learned from a source domain and to be generalized to a target domain. |
384 | ExtRA: Extracting Prominent Review Aspects from Customer Feedback | Zhiyi Luo, Shanshan Huang, Frank F. Xu, Bill Yuchen Lin, Hanyuan Shi, Kenny Zhu | In this paper, we propose a novel framework for extracting the most prominent aspects of a given product type from textual reviews. |
385 | Cross-Lingual Cross-Platform Rumor Verification Pivoting on Multimedia Content | Weiming Wen, Songwen Su, Zhou Yu | We introduce a new feature set, cross-lingual cross-platform features, which leverage the semantic similarity between the rumors and the external information. |
386 | Extractive Adversarial Networks: High-Recall Explanations for Identifying Personal Attacks in Social Media Posts | Samuel Carton, Qiaozhu Mei, Paul Resnick | We introduce an adversarial method for producing high-recall explanations of neural text classifier decisions. We develop a validation set of human-annotated personal attacks to evaluate the impact of these changes. |
387 | Automatic Detection of Vague Words and Sentences in Privacy Policies | Logan Lebanoff, Fei Liu | In this paper, we seek to identify vague content in privacy policies. We construct the first corpus of human-annotated vague words and sentences and present empirical studies on automatic vagueness detection. |
388 | Multi-view Models for Political Ideology Detection of News Articles | Vivek Kulkarni, Junting Ye, Steve Skiena, William Yang Wang | Drawing inspiration from recent advances in neural inference, we propose a novel attention based multi-view model to leverage cues from all of the above views to identify the ideology evinced by a news article. |
389 | Predicting Factuality of Reporting and Bias of News Media Sources | Ramy Baly, Georgi Karadzhov, Dimitar Alexandrov, James Glass, Preslav Nakov | We present a study on predicting the factuality of reporting and bias of news media. |
390 | Legal Judgment Prediction via Topological Learning | Haoxi Zhong, Zhipeng Guo, Cunchao Tu, Chaojun Xiao, Zhiyuan Liu, Maosong Sun | We conduct experiments on several real-world large-scale datasets of criminal cases in the civil law system. |
391 | Hierarchical CVAE for Fine-Grained Hate Speech Classification | Jing Qian, Mai ElSherief, Elizabeth Belding, William Yang Wang | In this paper, we propose a novel method on a fine-grained hate speech classification task, which focuses on differentiating among 40 hate groups of 13 different hate group categories. |
392 | Residualized Factor Adaptation for Community Social Media Prediction Tasks | Mohammadzaman Zamani, H. Andrew Schwartz, Veronica Lynn, Salvatore Giorgi, Niranjan Balasubramanian | In this paper, we present residualized factor adaptation, a novel approach to community prediction tasks which both (a) effectively integrates community attributes, as well as (b) adapts linguistic features to community attributes (factors). |
393 | Framing and Agenda-setting in Russian News: a Computational Analysis of Intricate Political Strategies | Anjalie Field, Doron Kliger, Shuly Wintner, Jennifer Pan, Dan Jurafsky, Yulia Tsvetkov | We introduce embedding-based methods for cross-lingually projecting English frames to Russian, and discover that these articles emphasize U.S. moral failings and threats to the U.S. |
394 | Identifying the sentiment styles of YouTube’s vloggers | Bennett Kleinberg, Maximilian Mozes, Isabelle van der Vegt | This paper examined the continuous sentiment styles employed in 27,333 vlogs using a dynamic intra-textual approach to sentiment analysis. |
395 | Native Language Identification with User Generated Content | Gili Goldin, Ella Rabinovich, Shuly Wintner | We provide a detailed analysis of the features that sheds light on differences between native and nonnative speakers, and among nonnative speakers with different backgrounds. |
396 | Beyond Error Propagation in Neural Machine Translation: Characteristics of Language Also Matter | Lijun Wu, Xu Tan, Di He, Fei Tian, Tao Qin, Jianhuang Lai, Tie-Yan Liu | In this paper, we conduct a series of analyses to deeply understand this problem and get several interesting findings. |
397 | A Study of Reinforcement Learning for Neural Machine Translation | Lijun Wu, Fei Tian, Tao Qin, Jianhuang Lai, Tie-Yan Liu | In this paper, taking several large-scale translation tasks as testbeds, we conduct a systematic study on how to train better NMT models using reinforcement learning. |
398 | Meta-Learning for Low-Resource Neural Machine Translation | Jiatao Gu, Yong Wang, Yun Chen, Victor O. K. Li, Kyunghyun Cho | In this paper, we propose to extend the recently introduced model-agnostic meta-learning algorithm (MAML; Finn et al., 2017) for low-resource neural machine translation (NMT). |
399 | Unsupervised Statistical Machine Translation | Mikel Artetxe, Gorka Labaka, Eneko Agirre | In this paper, we propose an alternative approach based on phrase-based Statistical Machine Translation (SMT) that significantly closes the gap with supervised systems. |
400 | A Visual Attention Grounding Neural Model for Multimodal Machine Translation | Mingyang Zhou, Runxiang Cheng, Yong Jae Lee, Zhou Yu | We introduce a novel multimodal machine translation model that utilizes parallel visual and textual information. We also collected a new multilingual multimodal product description dataset to simulate a real-world international online shopping scenario. |
401 | Sentiment Classification towards Question-Answering with Hierarchical Matching Network | Chenlin Shen, Changlong Sun, Jingjing Wang, Yangyang Kang, Shoushan Li, Xiaozhong Liu, Luo Si, Min Zhang, Guodong Zhou | In this study, we propose a novel task/method to address QA sentiment analysis. In particular, we create a high-quality annotated corpus with specially-designed annotation guidelines for QA-style sentiment classification. |
402 | Cross-topic Argument Mining from Heterogeneous Sources | Christian Stab, Tristan Miller, Benjamin Schiller, Pranav Rai, Iryna Gurevych | In this paper, we propose a new sentential annotation scheme that is reliably applicable by crowd workers to arbitrary Web texts. |
403 | Summarizing Opinions: Aspect Extraction Meets Sentiment Prediction and They Are Both Weakly Supervised | Stefanos Angelidis, Mirella Lapata | We present a neural framework for opinion summarization from online product reviews which is knowledge-lean and only requires light supervision (e.g., in the form of product domain labels and user-provided ratings). We introduce an opinion summarization dataset that includes a training set of product reviews from six diverse domains and human-annotated development and test sets with gold standard aspect annotations, salience labels, and opinion summaries. |
404 | CARER: Contextualized Affect Representations for Emotion Recognition | Elvis Saravia, Hsien-Chi Toby Liu, Yen-Hao Huang, Junlin Wu, Yi-Shin Chen | We propose a semi-supervised, graph-based algorithm to produce rich structural descriptors which serve as the building blocks for constructing contextualized affect representations from text. |
405 | Noise Contrastive Estimation and Negative Sampling for Conditional Models: Consistency and Statistical Efficiency | Zhuang Ma, Michael Collins | In particular, we analyze two variants of NCE for conditional models: one based on a classification objective, the other based on a ranking objective. |
406 | CaLcs: Continuously Approximating Longest Common Subsequence for Sequence Level Optimization | Semih Yavuz, Chung-Cheng Chiu, Patrick Nguyen, Yonghui Wu | In this paper, we present an alternative direction towards mitigating this problem by introducing a new objective (CaLcs) based on a differentiable surrogate of longest common subsequence (LCS) measure that captures sequence-level structure similarity. |
407 | Pathologies of Neural Models Make Interpretations Difficult | Shi Feng, Eric Wallace, Alvin Grissom II, Mohit Iyyer, Pedro Rodriguez, Jordan Boyd-Graber | To understand the limitations of these methods, we use input reduction, which iteratively removes the least important word from the input. |
408 | Phrase-level Self-Attention Networks for Universal Sentence Encoding | Wei Wu, Houfeng Wang, Tianyu Liu, Shuming Ma | To this end, we propose Phrase-level Self-Attention Networks (PSAN) that perform self-attention across words inside a phrase to capture context dependencies at the phrase level, and use the gated memory updating mechanism to refine each word’s representation hierarchically with longer-term context dependencies captured in a larger phrase. |
409 | BanditSum: Extractive Summarization as a Contextual Bandit | Yue Dong, Yikang Shen, Eric Crawford, Herke van Hoof, Jackie Chi Kit Cheung | In this work, we propose a novel method for training neural networks to perform single-document extractive summarization without heuristically-generated extractive labels. |
410 | A Word-Complexity Lexicon and A Neural Readability Ranking Model for Lexical Simplification | Mounica Maddela, Wei Xu | We create a human-rated word-complexity lexicon of 15,000 English words and propose a novel neural readability ranking model with a Gaussian-based feature vectorization layer that utilizes these human ratings to measure the complexity of any given word or phrase. |
411 | Learning Latent Semantic Annotations for Grounding Natural Language to Structured Data | Guanghui Qin, Jin-Ge Yao, Xuening Wang, Jinpeng Wang, Chin-Yew Lin | In this paper, we attempt at learning explicit latent semantic annotations from paired structured tables and texts, establishing correspondences between various types of values and texts. |
412 | Syntactic Scaffolds for Semantic Structures | Swabha Swayamdipta, Sam Thomson, Kenton Lee, Luke Zettlemoyer, Chris Dyer, Noah A. Smith | We introduce the syntactic scaffold, an approach to incorporating syntactic information into semantic tasks. |
413 | Hierarchical Quantized Representations for Script Generation | Noah Weber, Leena Shekhar, Niranjan Balasubramanian, Nathanael Chambers | To capture this type of information, we propose an autoencoder model with a latent space defined by a hierarchy of categorical variables. |
414 | Semantic Role Labeling for Learner Chinese: the Importance of Syntactic Parsing and L2-L1 Parallel Data | Zi Lin, Yuguang Duan, Yuanyuan Zhao, Weiwei Sun, Xiaojun Wan | We find two non-obvious facts: 1) the L1-sentence-trained systems perform rather badly on the L2 data; 2) the performance drop from the L1 data to the L2 data of the two parser-based systems is much smaller, indicating the importance of syntactic parsing in SRL for interlanguages. |
415 | A Teacher-Student Framework for Maintainable Dialog Manager | Weikang Wang, Jiajun Zhang, Han Zhang, Mei-Yuh Hwang, Chengqing Zong, Zhifei Li | To address this issue, we propose a practical teacher-student framework to extend RL-based dialog systems without retraining from scratch. |
416 | Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning | Shang-Yu Su, Xiujun Li, Jianfeng Gao, Jingjing Liu, Yun-Nung Chen | This paper presents a Discriminative Deep Dyna-Q (D3Q) approach to improving the effectiveness and robustness of Deep Dyna-Q (DDQ), a recently proposed framework that extends the Dyna-Q algorithm to integrate planning for task-completion dialogue policy learning. |
417 | A Self-Attentive Model with Gate Mechanism for Spoken Language Understanding | Changliang Li, Liang Li, Ji Qi | In this work, we propose a novel self-attentive model with a gate mechanism to fully utilize the semantic correlation between slot and intent. |
418 | Learning End-to-End Goal-Oriented Dialog with Multiple Answers | Janarthanan Rajendran, Jatin Ganhotra, Satinder Singh, Lazaros Polymenakos | In this work, we focus on this problem in the goal-oriented dialog setting where there are different paths to reach a goal. |
419 | AirDialogue: An Environment for Goal-Oriented Dialogue Research | Wei Wei, Quoc Le, Andrew Dai, Jia Li | We present AirDialogue, a large dataset that contains 301,427 goal-oriented conversations. |
420 | QuaSE: Sequence Editing under Quantifiable Guidance | Yi Liao, Lidong Bing, Piji Li, Shuming Shi, Wai Lam, Tong Zhang | In this paper, the proposed framework contains two latent factors, namely an outcome factor and a content factor, disentangled from the input sentence to allow convenient editing that changes the outcome while keeping the content. |
421 | Paraphrase Generation with Deep Reinforcement Learning | Zichao Li, Xin Jiang, Lifeng Shang, Hang Li | In this paper, we present a deep reinforcement learning approach to paraphrase generation. |
422 | Operation-guided Neural Networks for High Fidelity Data-To-Text Generation | Feng Nie, Jinpeng Wang, Jin-Ge Yao, Rong Pan, Chin-Yew Lin | In this paper, we attempt to improve the fidelity of neural data-to-text generation by utilizing pre-executed symbolic operations. |
423 | Generating Classical Chinese Poems via Conditional Variational Autoencoder and Adversarial Training | Juntao Li, Yan Song, Haisong Zhang, Dongmin Chen, Shuming Shi, Dongyan Zhao, Rui Yan | Towards filling the gap, in this paper, we propose a conditional variational autoencoder with adversarial training for classical Chinese poem generation, where the autoencoder part generates poems with novel terms and a discriminator is applied to adversarially learn their thematic consistency with their titles. |
424 | Paragraph-level Neural Question Generation with Maxout Pointer and Gated Self-attention Networks | Yao Zhao, Xiaochuan Ni, Yuanyuan Ding, Qifa Ke | In this paper, we propose a maxout pointer mechanism with a gated self-attention encoder to address the challenges of processing long text inputs for question generation. |
425 | Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task | Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, Dragomir Radev | We present Spider, a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 college students. |
426 | Unsupervised Natural Language Generation with Denoising Autoencoders | Markus Freitag, Scott Roy | In our approach, we interpret the structured data as a corrupt representation of the desired output and use a denoising auto-encoder to reconstruct the sentence. |
427 | Answer-focused and Position-aware Neural Question Generation | Xingwu Sun, Jing Liu, Yajuan Lyu, Wei He, Yanjun Ma, Shi Wang | In this paper, we focus on the problem of question generation (QG). |
428 | Diversity-Promoting GAN: A Cross-Entropy Based Generative Adversarial Network for Diversified Text Generation | Jingjing Xu, Xuancheng Ren, Junyang Lin, Xu Sun | To tackle this problem, we propose a new text generation model, called Diversity-Promoting Generative Adversarial Network (DP-GAN). |
429 | Towards a Better Metric for Evaluating Question Generation Systems | Preksha Nema, Mitesh M. Khapra | In this work, we show that current automatic evaluation metrics based on n-gram similarity do not always correlate well with human judgments about the answerability of a question. |
430 | Stylistic Chinese Poetry Generation via Unsupervised Style Disentanglement | Cheng Yang, Maosong Sun, Xiaoyuan Yi, Wenhao Li | We propose a novel model which requires no supervised style labeling by incorporating mutual information, a concept in information theory, into modeling. |
431 | Generating More Interesting Responses in Neural Conversation Models with Distributional Constraints | Ashutosh Baheti, Alan Ritter, Jiwei Li, Bill Dolan | To address this challenge, we propose a simple yet effective approach for incorporating side information in the form of distributional constraints over the generated responses. |
432 | Better Conversations by Modeling, Filtering, and Optimizing for Coherence and Diversity | Xinnuo Xu, Ondřej Dušek, Ioannis Konstas, Verena Rieser | This paper improves neural conversation models by modeling coherence, filtering training data for coherence, and optimizing generation for coherence and diversity. |
433 | Incorporating Background Knowledge into Video Description Generation | Spencer Whitehead, Heng Ji, Mohit Bansal, Shih-Fu Chang, Clare Voss | We collect a news video dataset for generating enriched descriptions that include important background knowledge, such as named entities and related events, allowing the user to fully understand the video content. The model learns to incorporate entities found in topically related documents into the description via an entity pointer network, and the generation procedure is guided by the event and entity types from those documents through a knowledge gate, a gating mechanism added to the model’s decoder that takes a one-hot vector of these types. |
434 | Multimodal Differential Network for Visual Question Generation | Badri Narayana Patro, Sandeep Kumar, Vinod Kumar Kurmi, Vinay Namboodiri | In this paper, we propose the use of exemplars for obtaining the relevant context. |
435 | Entity-aware Image Caption Generation | Di Lu, Spencer Whitehead, Lifu Huang, Heng Ji, Shih-Fu Chang | In this paper we propose a new task which aims to generate informative image captions, given images and hashtags as input. |
436 | Learning to Describe Differences Between Pairs of Similar Images | Harsh Jhamtani, Taylor Berg-Kirkpatrick | In this paper, we introduce the task of automatically generating text to describe the differences between two similar images. We collect a new dataset by crowd-sourcing difference descriptions for pairs of image frames extracted from video-surveillance footage. |
437 | Object Hallucination in Image Captioning | Anna Rohrbach, Lisa Anne Hendricks, Kaylee Burns, Trevor Darrell, Kate Saenko | In this work, we propose a new image relevance metric to evaluate current models with veridical visual labels and assess their rate of object hallucination. |
438 | Abstractive Text-Image Summarization Using Multi-Modal Attentional Hierarchical RNN | Jingqiang Chen, Hai Zhuge | This paper proposes an abstractive text-image summarization model using the attentional hierarchical Encoder-Decoder model to summarize a text document and its accompanying images simultaneously, and then to align the sentences and images in summaries. |
439 | Keyphrase Generation with Correlation Constraints | Jun Chen, Xiaoming Zhang, Yu Wu, Zhao Yan, Zhoujun Li | In this paper, we study automatic keyphrase generation. |
440 | Closed-Book Training to Improve Summarization Encoder Memory | Yichen Jiang, Mohit Bansal | In this paper, we aim to improve the memorization capabilities of the encoder of a pointer-generator model by adding an additional ‘closed-book’ decoder without attention and pointer mechanisms. |
441 | Improving Neural Abstractive Document Summarization with Structural Regularization | Wei Li, Xinyan Xiao, Yajuan Lyu, Yuanzhuo Wang | In this paper, we propose to leverage the structural information of both documents and multi-sentence summaries to improve the document summarization performance. |
442 | Iterative Document Representation Learning Towards Summarization with Polishing | Xiuying Chen, Shen Gao, Chongyang Tao, Yan Song, Dongyan Zhao, Rui Yan | In this paper, we introduce Iterative Text Summarization (ITS), an iteration-based model for supervised extractive text summarization, inspired by the observation that it is often necessary for a human to read an article multiple times in order to fully understand and summarize its contents. |
443 | Bottom-Up Abstractive Summarization | Sebastian Gehrmann, Yuntian Deng, Alexander Rush | This work explores the use of data-efficient content selectors to over-determine phrases in a source document that should be part of the summary. |
444 | Controlling Length in Abstractive Summarization Using a Convolutional Neural Network | Yizhu Liu, Zhiyi Luo, Kenny Zhu | In this paper, we propose an approach to constrain the summary length by extending a convolutional sequence to sequence model. |
445 | APRIL: Interactively Learning to Summarise by Combining Active Preference Learning and Reinforcement Learning | Yang Gao, Christian M. Meyer, Iryna Gurevych | We propose a method to perform automatic document summarisation without using reference summaries. |
446 | Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization | Logan Lebanoff, Kaiqiang Song, Fei Liu | In this paper, we present an initial investigation into a novel adaptation method. |
447 | Semi-Supervised Learning for Neural Keyphrase Generation | Hai Ye, Lu Wang | In this paper, we propose semi-supervised keyphrase generation methods by leveraging both labeled data and large-scale unlabeled samples for learning. |
448 | MSMO: Multimodal Summarization with Multimodal Output | Junnan Zhu, Haoran Li, Tianshang Liu, Yu Zhou, Jiajun Zhang, Chengqing Zong | In this paper, we propose a novel task, multimodal summarization with multimodal output (MSMO). To handle this task, we first collect a large-scale dataset for MSMO research. |
449 | Frustratingly Easy Model Ensemble for Abstractive Summarization | Hayato Kobayashi | In this paper, we propose an alternative, simple but effective unsupervised ensemble method, post-ensemble, that combines multiple models by selecting a majority-like output in post-processing. |
450 | Automatic Pyramid Evaluation Exploiting EDU-based Extractive Reference Summaries | Tsutomu Hirao, Hidetaka Kamigaito, Masaaki Nagata | This paper tackles automation of the pyramid method, a reliable manual evaluation framework. |
451 | Learning to Encode Text as Human-Readable Summaries using Generative Adversarial Networks | Yaushian Wang, Hung-Yi Lee | In this paper, we propose training an auto-encoder that encodes input text into human-readable sentences, and unpaired abstractive summarization is thereby achieved. |
452 | Joint Multitask Learning for Community Question Answering Using Task-Specific Embeddings | Shafiq Joty, Lluís Màrquez, Preslav Nakov | We address jointly two important tasks for Question Answering in community forums: given a new question, (i) find related existing questions, and (ii) find relevant answers to this new question. |
453 | What Makes Reading Comprehension Questions Easier? | Saku Sugawara, Kentaro Inui, Satoshi Sekine, Akiko Aizawa | In this work, we investigate what makes questions easier across 12 recent MRC datasets with three question styles (answer extraction, description, and multiple choice). |
454 | Commonsense for Generative Multi-Hop Question Answering Tasks | Lisa Bauer, Yicheng Wang, Mohit Bansal | Reading comprehension QA tasks have seen a recent surge in popularity, yet most works have focused on fact-finding extractive QA. |
455 | Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text | Haitian Sun, Bhuwan Dhingra, Manzil Zaheer, Kathryn Mazaitis, Ruslan Salakhutdinov, William Cohen | In this paper we look at a more practical setting, namely QA over the combination of a KB and entity-linked text, which is appropriate when an incomplete KB is available with a large text corpus. |
456 | A Nil-Aware Answer Extraction Framework for Question Answering | Souvik Kundu, Hwee Tou Ng | In this paper, we focus on developing QA systems that can extract an answer for a question if and only if the associated passage contains an answer. |
457 | Exploiting Deep Representations for Neural Machine Translation | Zi-Yi Dou, Zhaopeng Tu, Xing Wang, Shuming Shi, Tong Zhang | In this work, we propose to simultaneously expose all of these signals with layer aggregation and multi-layer attention mechanisms. |
458 | Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures | Gongbo Tang, Mathias Müller, Annette Rios, Rico Sennrich | We hypothesize that the strong performance of CNNs and self-attentional networks could also be due to their ability to extract semantic features from the source text, and we evaluate RNNs, CNNs and self-attention networks on two tasks: subject-verb agreement (where capturing long-range dependencies is required) and word sense disambiguation (where semantic feature extraction is required). |
459 | Simplifying Neural Machine Translation with Addition-Subtraction Twin-Gated Recurrent Networks | Biao Zhang, Deyi Xiong, Jinsong Su, Qian Lin, Huiji Zhang | In this paper, we propose an addition-subtraction twin-gated recurrent network (ATR) to simplify neural machine translation. |
460 | Speeding Up Neural Machine Translation Decoding by Cube Pruning | Wen Zhang, Liang Huang, Yang Feng, Lei Shen, Qun Liu | We apply cube pruning, a popular technique for speeding up dynamic programming, to neural machine translation in order to speed up decoding. |
461 | Revisiting Character-Based Neural Machine Translation with Capacity and Compression | Colin Cherry, George Foster, Ankur Bapna, Orhan Firat, Wolfgang Macherey | In this paper, we show that the modeling problem can be solved by standard sequence-to-sequence architectures of sufficient depth, and that deep models operating at the character level outperform identical models operating over word fragments. |
462 | A Skeleton-Based Model for Promoting Coherence Among Sentences in Narrative Story Generation | Jingjing Xu, Xuancheng Ren, Yi Zhang, Qi Zeng, Xiaoyan Cai, Xu Sun | To address this problem, we propose a skeleton-based model to promote the coherence of generated stories. |
463 | NEXUS Network: Connecting the Preceding and the Following in Dialogue Generation | Xiaoyu Shen, Hui Su, Wenjie Li, Dietrich Klakow | In this paper, we argue that a good response should smoothly connect both the preceding dialogue history and the following conversations. |
464 | A Neural Local Coherence Model for Text Quality Assessment | Mohsen Mesgar, Michael Strube | We propose a local coherence model that captures the flow of what semantically connects adjacent sentences in a text. |
465 | Deep Attentive Sentence Ordering Network | Baiyun Cui, Yingming Li, Ming Chen, Zhongfei Zhang | In this paper, we propose a novel deep attentive sentence ordering network (referred to as ATTOrderNet) which integrates a self-attention mechanism with LSTMs in the encoding of input sentences. |
466 | Getting to “Hearer-old”: Charting Referring Expressions Across Time | Ieva Staliūnaitė, Hannah Rohde, Bonnie Webber, Annie Louis | This paper presents the first study of how expressions that refer to the same entity develop over time. |
467 | Making “fetch” happen: The influence of social and linguistic context on nonstandard word growth and decline | Ian Stewart, Jacob Eisenstein | This paper analyzes how social and linguistic context influence the growth and decline of nonstandard words in online communities. |
468 | Analyzing Correlated Evolution of Multiple Features Using Latent Representations | Yugo Murawaki | Here we propose latent representation-based analysis in which (1) a sequence of discrete surface features is projected to a sequence of independent binary variables and (2) phylogenetic inference is performed on the latent space. |
469 | Capturing Regional Variation with Distributed Place Representations and Geographic Retrofitting | Dirk Hovy, Christoph Purschke | We use Doc2Vec on a corpus of 16.8M anonymous online posts in the German-speaking area to learn continuous document representations of cities. |
470 | Characterizing Interactions and Relationships between People | Farzana Rashid, Eduardo Blanco | This paper presents a set of dimensions to characterize the association between two people. We introduce and analyze a new corpus, and present experimental results showing that the task can be automated. |
471 | Why Swear? Analyzing and Inferring the Intentions of Vulgar Expressions | Eric Holgate, Isabel Cachola, Daniel Preoţiuc-Pietro, Junyi Jessy Li | We introduce a novel data set of 7,800 tweets from users with known demographic traits where all instances of vulgar words are annotated with one of the six categories of vulgar word use. |
472 | Is it Time to Swish? Comparing Deep Learning Activation Functions Across NLP tasks | Steffen Eger, Paul Youssef, Iryna Gurevych | We find that a largely unknown activation function performs most stably across all tasks, the so-called penalized tanh function. |
473 | Hard Non-Monotonic Attention for Character-Level Transduction | Shijie Wu, Pamela Shapiro, Ryan Cotterell | In this work, we introduce an exact, polynomial-time algorithm for marginalizing over the exponential number of non-monotonic alignments between two strings, showing that hard attention models can be viewed as neural reparameterizations of the classical IBM Model 1. |
474 | Speed Reading: Learning to Read ForBackward via Shuttle | Tsu-Jui Fu, Wei-Yun Ma | We present LSTM-Shuttle, which applies human speed reading techniques to natural language processing tasks for accurate and efficient comprehension. |
475 | Modeling Localness for Self-Attention Networks | Baosong Yang, Zhaopeng Tu, Derek F. Wong, Fandong Meng, Lidia S. Chao, Tong Zhang | In this work, we propose to model localness for self-attention networks, which enhances their ability to capture useful local context. |
476 | Chargrid: Towards Understanding 2D Documents | Anoop R Katti, Christian Reisswig, Cordula Guder, Sebastian Brarda, Steffen Bickel, Johannes Höhne, Jean Baptiste Faddoul | Based on this representation, we present a generic document understanding pipeline for structured documents. |
477 | Simple Recurrent Units for Highly Parallelizable Recurrence | Tao Lei, Yu Zhang, Sida I. Wang, Hui Dai, Yoav Artzi | In this work, we propose the Simple Recurrent Unit (SRU), a light recurrent unit that balances model capacity and scalability. |
478 | NPRF: A Neural Pseudo Relevance Feedback Framework for Ad-hoc Information Retrieval | Canjia Li, Yingfei Sun, Ben He, Le Wang, Kai Hui, Andrew Yates, Le Sun, Jungang Xu | To bridge this gap, we propose an end-to-end neural PRF framework that can be used with existing neural IR models by embedding different neural models as building blocks. |
479 | Co-Stack Residual Affinity Networks with Multi-level Attention Refinement for Matching Text Sequences | Yi Tay, Anh Tuan Luu, Siu Cheung Hui | This paper proposes Co-Stack Residual Affinity Networks (CSRAN), a new and universal neural architecture for this problem. |
480 | Spherical Latent Spaces for Stable Variational Autoencoders | Jiacheng Xu, Greg Durrett | In this work, we experiment with another choice of latent distribution, namely the von Mises-Fisher (vMF) distribution, which places mass on the surface of the unit hypersphere. |
481 | Learning Universal Sentence Representations with Mean-Max Attention Autoencoder | Minghua Zhang, Yunfang Wu, Weikang Li, Wei Li | In this paper, we propose a mean-max attention autoencoder (mean-max AAE) within the encoder-decoder framework. |
482 | Word Mover’s Embedding: From Word2Vec to Document Embedding | Lingfei Wu, Ian En-Hsu Yen, Kun Xu, Fangli Xu, Avinash Balakrishnan, Pin-Yu Chen, Pradeep Ravikumar, Michael J. Witbrock | In this paper, we propose the Word Mover’s Embedding (WME), a novel approach to building an unsupervised document (sentence) embedding from pre-trained word embeddings. |
483 | Multilingual Clustering of Streaming News | Sebastião Miranda, Artūrs Znotiņš, Shay B. Cohen, Guntis Barzdins | To this end, we describe a novel method for clustering an incoming stream of multilingual documents into monolingual and crosslingual clusters. |
484 | Multi-Task Label Embedding for Text Classification | Honglun Zhang, Liqiang Xiao, Wenqing Chen, Yongkun Wang, Yaohui Jin | In this paper, we propose Multi-Task Label Embedding to convert labels in text classification into semantic vectors, thereby turning the original tasks into vector matching tasks. |
485 | Semantic-Unit-Based Dilated Convolution for Multi-Label Text Classification | Junyang Lin, Qi Su, Pengcheng Yang, Shuming Ma, Xu Sun | We propose a novel model for multi-label text classification, which is based on sequence-to-sequence learning. |
486 | MCapsNet: Capsule Network for Text with Multi-Task Learning | Liqiang Xiao, Honglun Zhang, Wenqing Chen, Yongkun Wang, Yaohui Jin | This paper investigates the performance of capsule network for text, and proposes a capsule-based multi-task learning architecture, which is unified, simple and effective. |
487 | Uncertainty-aware generative models for inferring document class prevalence | Katherine Keith, Brendan O’Connor | In this work, we present (1) a generative probabilistic modeling approach to prevalence estimation, and (2) the construction and evaluation of prevalence confidence intervals; in particular, we demonstrate that an off-the-shelf discriminative classifier can be given a generative re-interpretation, by backing out an implicit individual-level likelihood function, which can be used to conduct fast and simple group-level Bayesian inference. |
488 | Challenges of Using Text Classifiers for Causal Inference | Zach Wood-Doughty, Ilya Shpitser, Mark Dredze | We demonstrate how to conduct causal analyses using text classifiers on simulated and Yelp data, and discuss the opportunities and challenges of future work that uses text data in causal inference. |
489 | Direct Output Connection for a High-Rank Language Model | Sho Takase, Jun Suzuki, Masaaki Nagata | This paper proposes a state-of-the-art recurrent neural network (RNN) language model that combines probability distributions computed not only from a final RNN layer but also middle layers. |
490 | Disfluency Detection using Auto-Correlational Neural Networks | Paria Jamshid Lou, Peter Anderson, Mark Johnson | As an alternative, this paper proposes a simple yet effective model for automatic disfluency detection, called an auto-correlational neural network (ACNN). |
491 | Pyramidal Recurrent Unit for Language Modeling | Sachin Mehta, Rik Koncel-Kedziorski, Mohammad Rastegari, Hannaneh Hajishirzi | We introduce the Pyramidal Recurrent Unit (PRU), which enables learning representations in high dimensional space with more generalization power and fewer parameters. |
492 | On Tree-Based Neural Sentence Modeling | Haoyue Shi, Hao Zhou, Jiaze Chen, Lei Li | Though trivial trees contain no syntactic information, those encoders get competitive or even better results on all of the ten downstream tasks we investigated. |
493 | Language Modeling with Sparse Product of Sememe Experts | Yihong Gu, Jun Yan, Hao Zhu, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Fen Lin, Leyu Lin | In this paper, we argue that words are atomic language units but not necessarily atomic semantic units. |
494 | Siamese Network-Based Supervised Topic Modeling | Minghui Huang, Yanghui Rao, Yuwei Liu, Haoran Xie, Fu Lee Wang | In this study, we propose a supervised topic model based on the Siamese network, which can trade off label-specific word distributions with document-specific label distributions in a uniform framework. |
495 | GraphBTM: Graph Enhanced Autoencoded Variational Inference for Biterm Topic Model | Qile Zhu, Zheng Feng, Xiaolin Li | In this paper, we propose a novel way called GraphBTM to represent biterms as graphs and design a Graph Convolutional Networks (GCNs) with residual connections to extract transitive features from biterms. We also propose a dataset called All News extracted from 15 news publishers, in which documents are much longer than 20 Newsgroups. |
496 | Modeling Online Discourse with Coupled Distributed Topics | Akshay Srivatsan, Zachary Wojtowicz, Taylor Berg-Kirkpatrick | In this paper, we propose a deep, globally normalized topic model that incorporates structural relationships connecting documents in socially generated corpora, such as online forums. |
497 | Learning Disentangled Representations of Texts with Application to Biomedical Abstracts | Sarthak Jain, Edward Banner, Jan-Willem van de Meent, Iain J. Marshall, Byron C. Wallace | We propose a method for learning disentangled representations of texts that code for distinct and complementary aspects, with the aim of affording efficient model transfer and interpretability. |
498 | Multi-Source Domain Adaptation with Mixture of Experts | Jiang Guo, Darsh Shah, Regina Barzilay | We propose a mixture-of-experts approach for unsupervised domain adaptation from multiple sources. |
499 | A Neural Model of Adaptation in Reading | Marten van Schijndel, Tal Linzen | We provide further support to this claim by showing that the addition of a simple adaptation mechanism to a neural language model improves our predictions of human reading times compared to a non-adaptive model. |
500 | Understanding Deep Learning Performance through an Examination of Test Set Difficulty: A Psychometric Case Study | John Lalor, Hao Wu, Tsendsuren Munkhdalai, Hong Yu | In this work we examine the impact of test set question difficulty to determine whether there is a relationship between difficulty and performance. |
501 | Lexicosyntactic Inference in Neural Models | Aaron Steven White, Rachel Rudinger, Kyle Rawlins, Benjamin Van Durme | We investigate neural models’ ability to capture lexicosyntactic inferences: inferences triggered by the interaction of lexical and syntactic information. We take the task of event factuality prediction as a case study and build a factuality judgment dataset for all English clause-embedding verbs in various syntactic contexts. |
502 | Dual Fixed-Size Ordinally Forgetting Encoding (FOFE) for Competitive Neural Language Models | Sedtawut Watcharawittayakul, Mingbin Xu, Hui Jiang | In this paper, we propose a new approach to employing the fixed-size ordinally-forgetting encoding (FOFE) (Zhang et al., 2015b) in neural language modelling, called dual-FOFE. |
503 | The Importance of Being Recurrent for Modeling Hierarchical Structure | Ke Tran, Arianna Bisazza, Christof Monz | In this work, we compare the two architectures, recurrent versus non-recurrent, with respect to their ability to model hierarchical structure, and find that recurrency is indeed important for this purpose. |
504 | Joint Learning for Targeted Sentiment Analysis | Dehong Ma, Sujian Li, Houfeng Wang | In this paper, we carefully design the hierarchical stack bidirectional gated recurrent units (HSBi-GRU) model to learn abstract features for both tasks, and we propose a HSBi-GRU based joint model which allows the target label to influence the sentiment label. |
505 | Revisiting the Importance of Encoding Logic Rules in Sentiment Classification | Kalpesh Krishna, Preethi Jyothi, Mohit Iyyer | We analyze the performance of different sentiment classification models on syntactically complex inputs like A-but-B sentences. |
506 | A Co-Attention Neural Network Model for Emotion Cause Analysis with Emotional Context Awareness | Xiangju Li, Kaisong Song, Shi Feng, Daling Wang, Yifei Zhang | Therefore, we propose a co-attention neural network model for emotion cause analysis with emotional context awareness. |
507 | Modeling Empathy and Distress in Reaction to News Stories | Sven Buechel, Anneke Buffone, Barry Slaff, Lyle Ungar, João Sedoc | In contrast, this contribution presents the first publicly available gold standard for empathy prediction. |
508 | Interpretable Emoji Prediction via Label-Wise Attention LSTMs | Francesco Barbieri, Luis Espinosa-Anke, Jose Camacho-Collados, Steven Schockaert, Horacio Saggion | In this paper we propose a label-wise attention mechanism with which we attempt to better understand the nuances underlying emoji prediction. |
509 | A Tree-based Decoder for Neural Machine Translation | Xinyi Wang, Hieu Pham, Pengcheng Yin, Graham Neubig | In this paper, we (1) propose an NMT model that can naturally generate the topology of an arbitrary tree structure on the target side, and (2) experiment with various target tree structures. |
510 | Greedy Search with Probabilistic N-gram Matching for Neural Machine Translation | Chenze Shao, Xilin Chen, Yang Feng | On these grounds, we present a method with a differentiable sequence-level training objective based on probabilistic n-gram matching which can avoid the reinforcement framework. |
511 | Exploring Recombination for Efficient Decoding of Neural Machine Translation | Zhisong Zhang, Rui Wang, Masao Utiyama, Eiichiro Sumita, Hai Zhao | In this work, we introduce recombination in NMT decoding based on the concept of the “equivalence” of partial hypotheses. |
512 | Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation | Samuel Läubli, Rico Sennrich, Martin Volk | Recent research suggests that neural machine translation achieves parity with professional human translation on the WMT Chinese-English news translation task. |
513 | Automatic Reference-Based Evaluation of Pronoun Translation Misses the Point | Liane Guillou, Christian Hardmeier | We compare the performance of the APT and AutoPRF metrics for pronoun translation against a manually annotated dataset comprising human judgements as to the correctness of translations of the PROTEST test suite. |
514 | FewRel: A Large-Scale Supervised Few-Shot Relation Classification Dataset with State-of-the-Art Evaluation | Xu Han, Hao Zhu, Pengfei Yu, Ziyun Wang, Yuan Yao, Zhiyuan Liu, Maosong Sun | We present a Few-Shot Relation Classification Dataset (FewRel), consisting of 70,000 sentences on 100 relations derived from Wikipedia and annotated by crowdworkers. |
515 | A strong baseline for question relevancy ranking | Ana Gonzalez, Isabelle Augenstein, Anders Søgaard | We present a strong baseline for question relevancy ranking by training a simple multi-task feed-forward network on a bag of 14 distance measures for the input question pair. |
516 | Learning Sequence Encoders for Temporal Knowledge Graph Completion | Alberto García-Durán, Sebastijan Dumančić, Mathias Niepert | In this work we consider temporal knowledge graphs where relations between entities may only hold for a time interval or a specific point in time. |
517 | Similar but not the Same: Word Sense Disambiguation Improves Event Detection via Neural Representation Matching | Weiyi Lu, Thien Huu Nguyen | In this work, we propose a method to transfer the knowledge learned on WSD to ED by matching the neural representations learned for the two tasks. |
518 | Learning Word Representations with Cross-Sentence Dependency for End-to-End Co-reference Resolution | Hongyin Luo, Jim Glass | In this work, we present a word embedding model that learns cross-sentence dependency for improving end-to-end co-reference resolution (E2E-CR). |
519 | Word Relation Autoencoder for Unseen Hypernym Extraction Using Word Embeddings | Hong-You Chen, Cheng-Syuan Lee, Keng-Te Liao, Shou-De Lin | We propose a word relation autoencoder (WRAE) model to address the challenge. |
520 | Refining Pretrained Word Embeddings Using Layer-wise Relevance Propagation | Akira Utsumi | In this paper, we propose a simple method for refining pretrained word embeddings using layer-wise relevance propagation. |
521 | Learning Gender-Neutral Word Embeddings | Jieyu Zhao, Yichao Zhou, Zeyu Li, Wei Wang, Kai-Wei Chang | To address this concern, in this paper, we propose a novel training procedure for learning gender-neutral word embeddings. |
522 | Learning Concept Abstractness Using Weak Supervision | Ella Rabinovich, Benjamin Sznajder, Artem Spector, Ilya Shnayderman, Ranit Aharonov, David Konopnicki, Noam Slonim | We introduce a weakly supervised approach for inferring the property of abstractness of words and expressions in the complete absence of labeled data. |
523 | Word Sense Induction with Neural biLM and Symmetric Patterns | Asaf Amrami, Yoav Goldberg | An established method for Word Sense Induction (WSI) uses a language model to predict probable substitutes for target words, and induces senses by clustering these resulting substitute vectors. |
524 | InferLite: Simple Universal Sentence Representations from Natural Language Inference Data | Jamie Kiros, William Chan | In order to better understand the components that lead to effective representations, we propose a lightweight version of InferSent, called InferLite, that does not use any recurrent layers and operates on a collection of pre-trained word embeddings. |
525 | Similarity-Based Reconstruction Loss for Meaning Representation | Olga Kovaleva, Anna Rumshisky, Alexey Romanov | Using an autoencoder framework, we propose and evaluate several loss functions that can be used as an alternative to the commonly used cross-entropy reconstruction loss. |
526 | What can we learn from Semantic Tagging? | Mostafa Abdou, Artur Kulmizev, Vinit Ravishankar, Lasha Abzianidze, Johan Bos | We investigate the effects of multi-task learning using the recently introduced task of semantic tagging. |
527 | Conditional Word Embedding and Hypothesis Testing via Bayes-by-Backprop | Rujun Han, Michael Gill, Arthur Spirling, Kyunghyun Cho | We address these concerns with a model that incorporates document covariates to estimate conditional word embedding distributions. |
528 | Classifying Referential and Non-referential It Using Gaze | Victoria Yaneva, Le An Ha, Richard Evans, Ruslan Mitkov | In this paper we use eye-tracking data to learn how humans perform this disambiguation and use this knowledge to improve the automatic classification of it. |
529 | State-of-the-art Chinese Word Segmentation with Bi-LSTMs | Ji Ma, Kuzman Ganchev, David Weiss | Surprisingly, we find that a bidirectional LSTM model, when combined with standard deep learning techniques and best practices, can achieve better accuracy on many of the popular datasets as compared to models based on more complex neural-network architectures. |
530 | Sanskrit Sandhi Splitting using seq2(seq)2 | Rahul Aralikatte, Neelamadhav Gantayat, Naveen Panwar, Anush Sankaran, Senthil Mani | In this research, we propose a novel deep learning architecture called Double Decoder RNN (DD-RNN), which (i) predicts the location of the split(s) with 95% accuracy, and (ii) predicts the constituent words (learning the Sandhi splitting rules) with 79.5% accuracy, outperforming the state of the art by 20%. |
531 | Unsupervised Neural Word Segmentation for Chinese via Segmental Language Modeling | Zhiqing Sun, Zhi-Hong Deng | In this paper, we propose the segmental language models (SLMs) for CWS. |
532 | LemmaTag: Jointly Tagging and Lemmatizing for Morphologically Rich Languages with BRNNs | Daniel Kondratyuk, Tomáš Gavenčiak, Milan Straka, Jan Hajič | We present LemmaTag, a featureless neural network architecture that jointly generates part-of-speech tags and lemmas for sentences by using bidirectional RNNs with character-level and word-level embeddings. |
533 | Recovering Missing Characters in Old Hawaiian Writing | Brendan Shillingford, Oiwi Parker Jones | We introduce two related methods to help solve this transliteration problem automatically. |
534 | When data permutations are pathological: the case of neural natural language inference | Natalie Schluter, Daniel Varab | In this paper, we illustrate this scenario for a trending NLP task: Natural Language Inference (NLI). |
535 | Bridging Knowledge Gaps in Neural Entailment via Symbolic Models | Dongyeop Kang, Tushar Khot, Ashish Sabharwal, Peter Clark | To facilitate this lookup, we propose a fact-level decomposition of the hypothesis and verify the resulting sub-facts against both the textual premise and the structured KB. |
536 | The BQ Corpus: A Large-scale Domain-specific Chinese Corpus For Sentence Semantic Equivalence Identification | Jing Chen, Qingcai Chen, Xin Liu, Haijun Yang, Daohe Lu, Buzhou Tang | This paper introduces the Bank Question (BQ) corpus, a Chinese corpus for sentence semantic equivalence identification (SSEI). |
537 | Interpreting Recurrent and Attention-Based Neural Models: a Case Study on Natural Language Inference | Reza Ghaeini, Xiaoli Fern, Prasad Tadepalli | In this paper, we take a step toward explaining such deep learning based models through a case study on a popular neural model for NLI. |
538 | Towards Semi-Supervised Learning for Deep Semantic Role Labeling | Sanket Vaibhav Mehta, Jay Yoon Lee, Jaime Carbonell | The paper proposes a semi-supervised semantic role labeling method that outperforms the state-of-the-art in limited SRL training corpora. |
539 | Identifying Domain Adjacent Instances for Semantic Parsers | James Ferguson, Janara Christensen, Edward Li, Edgar Gonzàlez | This approach improves the performance of a downstream semantic parser run on in-domain and domain-adjacent instances. |
540 | Mapping natural language commands to web elements | Panupong Pasupat, Tian-Shun Jiang, Evan Liu, Kelvin Guu, Percy Liang | We propose a new task for grounding language in this environment: given a natural language command (e.g., “click on the second article”), choose the correct element on the web page (e.g., a hyperlink or text box). We collected a dataset of over 50,000 commands that capture various phenomena such as functional references (e.g. “find who made this site”), relational reasoning (e.g. “article by john”), and visual reasoning (e.g. “top-most article”). |
541 | Wronging a Right: Generating Better Errors to Improve Grammatical Error Detection | Sudhanshu Kasewa, Pontus Stenetorp, Sebastian Riedel | Initial work on inducing errors in this way using statistical machine translation has shown promise; we investigate cheaply constructing synthetic samples, given a small corpus of human-annotated data, using an off-the-shelf attentive sequence-to-sequence model and a straightforward post-processing procedure. |
542 | Modeling Input Uncertainty in Neural Network Dependency Parsing | Rob van der Goot, Gertjan van Noord | In this paper, we investigate whether these new neural approaches provide similar functionality as lexical normalization, or whether they are complementary. |
543 | Parameter sharing between dependency parsers for related languages | Miryam de Lhoneux, Johannes Bjerva, Isabelle Augenstein, Anders Søgaard | We present an evaluation of 27 different parameter sharing strategies across 10 languages, representing five pairs of related languages, each pair from a different language family. |
544 | Grammar Induction with Neural Language Models: An Unusual Replication | Phu Mon Htut, Kyunghyun Cho, Samuel Bowman | In a recent paper, Shen et al. (2018) introduce such a model and report near-state-of-the-art results on the target task of language modeling, and the first strong latent tree learning result on constituency parsing. |
545 | Data Augmentation via Dependency Tree Morphing for Low-Resource Languages | Gözde Gül Şahin, Mark Steedman | We present two simple text augmentation techniques using dependency trees, inspired by image processing. |
546 | How Much Reading Does Reading Comprehension Require? A Critical Investigation of Popular Benchmarks | Divyansh Kaushik, Zachary C. Lipton | In this paper, we establish sensible baselines for the bAbI, SQuAD, CBT, CNN, and Who-did-What datasets, finding that question- and passage-only models often perform surprisingly well. |
547 | MultiWOZ – A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling | Paweł Budzianowski, Tsung-Hsien Wen, Bo-Hsiang Tseng, Iñigo Casanueva, Stefan Ultes, Osman Ramadan, Milica Gašić | Even though machine learning has become the major scene in the dialogue research community, the real breakthrough has been blocked by the scale of data available. To address this fundamental obstacle, we introduce the Multi-Domain Wizard-of-Oz dataset (MultiWOZ), a fully-labeled collection of human-human written conversations spanning over multiple domains and topics. At a size of 10k dialogues, it is at least one order of magnitude larger than all previous annotated task-oriented corpora. The proposed data-collection pipeline is entirely based on crowd-sourcing, without the need of hiring professional annotators. The contribution of this work, apart from the open-sourced dataset, is two-fold: firstly, a detailed description of the data collection procedure along with a summary of data structure and analysis is provided; secondly, a set of benchmark results for belief tracking, dialogue act and response generation is reported, which shows the usability of the data and sets a baseline for future studies. |
548 | Linguistically-Informed Self-Attention for Semantic Role Labeling | Emma Strubell, Patrick Verga, Daniel Andor, David Weiss, Andrew McCallum | In this work, we present linguistically-informed self-attention (LISA): a neural network model that combines multi-head self-attention with multi-task learning across dependency parsing, part-of-speech tagging, predicate detection and SRL. |
549 | Phrase-Based & Neural Unsupervised Machine Translation | Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, Marc’Aurelio Ranzato | We propose two model variants, a neural and a phrase-based model. |