Paper Digest: EMNLP 2019 Highlights
Download EMNLP-2019-Paper-Digests.pdf – highlights of all ~680 EMNLP-2019 papers.
The Conference on Empirical Methods in Natural Language Processing (EMNLP) is one of the top natural language processing conferences in the world. In 2019, it was held in Hong Kong, China. There were 1,813 long paper submissions, of which 465 were accepted, and 1,063 short paper submissions, of which 218 were accepted. A large number of these papers also published their code (code download link).
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights/summaries to quickly grasp the main idea of each paper.
If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to receive updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to receive new conference digests.
Paper Digest Team
team@paperdigest.org
TABLE 1: EMNLP 2019 Long/Short Papers
# | Title | Authors | Highlight |
---|---|---|---|
1 | Attending to Future Tokens for Bidirectional Sequence Generation | Carolin Lawrence, Bhushan Kotnis, Mathias Niepert | We propose to make the sequence generation process bidirectional by employing special placeholder tokens. |
2 | Attention is not not Explanation | Sarah Wiegreffe, Yuval Pinter | We propose four alternative tests to determine when/whether attention can be used as explanation: a simple uniform-weights baseline; a variance calibration based on multiple random seed runs; a diagnostic framework using frozen weights from pretrained models; and an end-to-end adversarial attention training protocol. |
3 | Practical Obstacles to Deploying Active Learning | David Lowell, Zachary C. Lipton, Byron C. Wallace | In this paper, we show that while AL may provide benefits when used with specific models and for particular domains, the benefits of current approaches do not generalize reliably across models and tasks. |
4 | Transfer Learning Between Related Tasks Using Expected Label Proportions | Matan Ben Noach, Yoav Goldberg | We propose a novel application of the XR framework for transfer learning between related tasks, where knowing the labels of task A provides an estimation of the label proportion of task B. |
5 | Knowledge Enhanced Contextual Word Representations | Matthew E. Peters, Mark Neumann, Robert Logan, Roy Schwartz, Vidur Joshi, Sameer Singh, Noah A. Smith | We propose a general method to embed multiple knowledge bases (KBs) into large scale models, and thereby enhance their representations with structured, human-curated knowledge. |
6 | How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings | Kawin Ethayarajh | This suggests that upper layers of contextualizing models produce more context-specific representations, much like how upper layers of LSTMs produce more task-specific representations. |
7 | Room to Glo: A Systematic Comparison of Semantic Change Detection Approaches with Word Embeddings | Philippa Shoemark, Farhana Ferdousi Liza, Dong Nguyen, Scott Hale, Barbara McGillivray | We propose a new evaluation framework for semantic change detection and find that (i) using the whole time series is preferable over only comparing between the first and last time points; (ii) independently trained and aligned embeddings perform better than continuously trained embeddings for long time periods; and (iii) that the reference point for comparison matters. |
8 | Correlations between Word Vector Sets | Vitalii Zhelezniak, April Shen, Daniel Busbridge, Aleksandar Savkov, Nils Hammerla | Just as cosine similarity is used to compare individual word vectors, we introduce a novel application of centered kernel alignment (CKA) as a natural generalisation of squared cosine similarity for sets of word vectors. |
9 | Game Theory Meets Embeddings: a Unified Framework for Word Sense Disambiguation | Rocco Tripodi, Roberto Navigli | Game-theoretic models, thanks to their intrinsic ability to exploit contextual information, have been shown to be particularly suited for the Word Sense Disambiguation task. |
10 | Guided Dialog Policy Learning: Reward Estimation for Multi-Domain Task-Oriented Dialog | Ryuichi Takanobu, Hanlin Zhu, Minlie Huang | To this end, we propose Guided Dialog Policy Learning, a novel algorithm based on Adversarial Inverse Reinforcement Learning for joint reward estimation and policy optimization in multi-domain task-oriented dialog. |
11 | Multi-hop Selector Network for Multi-turn Response Selection in Retrieval-based Chatbots | Chunyuan Yuan, Wei Zhou, Mingming Li, Shangwen Lv, Fuqing Zhu, Jizhong Han, Songlin Hu | In this paper, we analyze the side effect of using too many context utterances and propose a multi-hop selector network (MSN) to alleviate the problem. |
12 | MoEL: Mixture of Empathetic Listeners | Zhaojiang Lin, Andrea Madotto, Jamin Shin, Peng Xu, Pascale Fung | In this paper, we propose a novel end-to-end approach for modeling empathy in dialogue systems: Mixture of Empathetic Listeners (MoEL). |
13 | Entity-Consistent End-to-end Task-Oriented Dialogue System with KB Retriever | Libo Qin, Yijia Liu, Wanxiang Che, Haoyang Wen, Yangming Li, Ting Liu | In this paper, we propose a novel framework which queries the KB in two steps to improve the consistency of generated entities. |
14 | Building Task-Oriented Visual Dialog Systems Through Alternative Optimization Between Dialog Policy and Language Generation | Mingyang Zhou, Josh Arnold, Zhou Yu | This paper proposes a novel framework that alternately trains an RL policy for image guessing and a supervised seq2seq model to improve dialog generation quality. |
15 | DialogueGCN: A Graph Convolutional Neural Network for Emotion Recognition in Conversation | Deepanway Ghosal, Navonil Majumder, Soujanya Poria, Niyati Chhaya, Alexander Gelbukh | In this paper, we present Dialogue Graph Convolutional Network (DialogueGCN), a graph neural network based approach to ERC. |
16 | Knowledge-Enriched Transformer for Emotion Detection in Textual Conversations | Peixiang Zhong, Di Wang, Chunyan Miao | In this paper, we address these challenges by proposing a Knowledge-Enriched Transformer (KET), where contextual utterances are interpreted using hierarchical self-attention and external commonsense knowledge is dynamically leveraged using a context-aware affective graph attention mechanism. |
17 | Interpretable Relevant Emotion Ranking with Event-Driven Attention | Yang Yang, Deyu Zhou, Yulan He, Meng Zhang | In this paper, we propose a novel interpretable relevant emotion ranking model in which event information is incorporated into a deep learning architecture using event-driven attention. |
18 | Justifying Recommendations using Distantly-Labeled Reviews and Fine-Grained Aspects | Jianmo Ni, Jiacheng Li, Julian McAuley | We seek to introduce new datasets and methods to address the recommendation justification task. |
19 | Using Customer Service Dialogues for Satisfaction Analysis with Context-Assisted Multiple Instance Learning | Kaisong Song, Lidong Bing, Wei Gao, Jun Lin, Lujun Zhao, Jiancheng Wang, Changlong Sun, Xiaozhong Liu, Qiong Zhang | In this paper, we conduct a pilot study on the task of service satisfaction analysis (SSA) based on multi-turn CS dialogues. We construct two CS dialogue datasets from a top E-commerce platform. |
20 | Leveraging Dependency Forest for Neural Medical Relation Extraction | Linfeng Song, Yue Zhang, Daniel Gildea, Mo Yu, Zhiguo Wang, Jinsong Su | We investigate a method to alleviate this problem by utilizing dependency forests. |
21 | Open Relation Extraction: Relational Knowledge Transfer from Supervised Data to Unsupervised Data | Ruidong Wu, Yuan Yao, Xu Han, Ruobing Xie, Zhiyuan Liu, Fen Lin, Leyu Lin, Maosong Sun | To address this issue, we propose Relational Siamese Networks (RSNs) to learn similarity metrics of relations from labeled data of pre-defined relations, and then transfer the relational knowledge to identify novel relations in unlabeled data. |
22 | Improving Relation Extraction with Knowledge-attention | Pengfei Li, Kezhi Mao, Xuefeng Yang, Qi Li | We propose a novel knowledge-attention encoder which incorporates prior knowledge from external lexical resources into deep neural networks for relation extraction task. |
23 | Jointly Learning Entity and Relation Representations for Entity Alignment | Yuting Wu, Xiao Liu, Yansong Feng, Zheng Wang, Dongyan Zhao | This paper presents a novel joint learning framework for entity alignment. |
24 | Tackling Long-Tailed Relations and Uncommon Entities in Knowledge Graph Completion | Zihao Wang, Kwunping Lai, Piji Li, Lidong Bing, Wai Lam | Therefore, we propose a meta-learning framework that aims at handling infrequent relations with few-shot learning and uncommon entities by using textual descriptions. |
25 | Low-Resource Name Tagging Learned with Weakly Labeled Data | Yixin Cao, Zikun Hu, Tat-seng Chua, Zhiyuan Liu, Heng Ji | In this paper, we propose a novel neural model for name tagging solely based on weakly labeled (WL) data, so that it can be applied in any low-resource settings. |
26 | Learning Dynamic Context Augmentation for Global Entity Linking | Xiyuan Yang, Xiaotao Gu, Sheng Lin, Siliang Tang, Yueting Zhuang, Fei Wu, Zhigang Chen, Guoping Hu, Xiang Ren | In this paper, we propose a simple yet effective solution, called Dynamic Context Augmentation (DCA), for collective EL, which requires only one pass through the mentions in a document. |
27 | Open Event Extraction from Online Text using a Generative Adversarial Network | Rui Wang, Deyu Zhou, Yulan He | To address these limitations, we propose an event extraction model based on Generative Adversarial Nets, called Adversarial-neural Event Model (AEM). |
28 | Learning to Bootstrap for Entity Set Expansion | Lingyong Yan, Xianpei Han, Le Sun, Ben He | To address the above two problems, we propose a novel bootstrapping method combining the Monte Carlo Tree Search (MCTS) algorithm with a deep similarity network, which can efficiently estimate delayed feedback for pattern evaluation and adaptively score entities given sparse supervision signals. |
29 | Multi-Input Multi-Output Sequence Labeling for Joint Extraction of Fact and Condition Tuples from Scientific Text | Tianwen Jiang, Tong Zhao, Bing Qin, Ting Liu, Nitesh Chawla, Meng Jiang | In this work, we propose a new sequence labeling framework (as well as a new tag schema) to jointly extract the fact and condition tuples from statement sentences. |
30 | Cross-lingual Structure Transfer for Relation and Event Extraction | Ananya Subburathinam, Di Lu, Heng Ji, Jonathan May, Shih-Fu Chang, Avirup Sil, Clare Voss | We investigate the suitability of cross-lingual structure transfer techniques for these tasks. |
31 | Uncover the Ground-Truth Relations in Distant Supervision: A Neural Expectation-Maximization Framework | Junfan Chen, Richong Zhang, Yongyi Mao, Hongyu Guo, Jie Xu | To cope with this challenge, we propose a novel label-denoising framework that combines a neural network with probabilistic modelling, which naturally takes into account the noisy labels during learning. |
32 | Doc2EDAG: An End-to-End Document-level Framework for Chinese Financial Event Extraction | Shun Zheng, Wei Cao, Wei Xu, Jiang Bian | To address these challenges, we propose a novel end-to-end model, Doc2EDAG, which can generate an entity-based directed acyclic graph to fulfill the document-level EE (DEE) effectively. To demonstrate the effectiveness of Doc2EDAG, we build a large-scale real-world dataset consisting of Chinese financial announcements with the challenges mentioned above. |
33 | Event Detection with Trigger-Aware Lattice Neural Network | Ning Ding, Ziran Li, Zhiyuan Liu, Haitao Zheng, Zibo Lin | To address the two issues simultaneously, we propose the Trigger-aware Lattice Neural Network (TLNN). |
34 | A Boundary-aware Neural Model for Nested Named Entity Recognition | Changmeng Zheng, Yi Cai, Jingyun Xu, Ho-fung Leung, Guandong Xu | We propose a boundary-aware neural model for nested NER which leverages entity boundaries to predict entity categorical labels. |
35 | Learning the Extraction Order of Multiple Relational Facts in a Sentence with Reinforcement Learning | Xiangrong Zeng, Shizhu He, Daojian Zeng, Kang Liu, Shengping Liu, Jun Zhao | In this paper we argue that the extraction order is important in this task. |
36 | CaRe: Open Knowledge Graph Embeddings | Swapnil Gupta, Sreyash Kenkre, Partha Talukdar | We fill this gap in the paper and propose Canonicalization-infused Representations (CaRe) for OpenKGs. |
37 | Self-Attention Enhanced CNNs and Collaborative Curriculum Learning for Distantly Supervised Relation Extraction | Yuyun Huang, Jinhua Du | In this paper, we propose a novel model that employs a collaborative curriculum learning framework to reduce the effects of mislabelled data. |
38 | Neural Cross-Lingual Relation Extraction Based on Bilingual Word Embedding Mapping | Jian Ni, Radu Florian | In this paper, we propose a new approach for cross-lingual RE model transfer based on bilingual word embedding mapping. |
39 | Leveraging 2-hop Distant Supervision from Table Entity Pairs for Relation Extraction | Xiang Deng, Huan Sun | In this paper, we introduce a new strategy named 2-hop DS to enhance distantly supervised RE, based on the observation that there exist a large number of relational tables on the Web which contain entity pairs that share common relations. |
40 | EntEval: A Holistic Evaluation Benchmark for Entity Representations | Mingda Chen, Zewei Chu, Yang Chen, Karl Stratos, Kevin Gimpel | In this work, we propose EntEval: a test suite of diverse tasks that require nontrivial understanding of entities including entity typing, entity similarity, entity relation prediction, and entity disambiguation. |
41 | Joint Event and Temporal Relation Extraction with Shared Representations and Structured Prediction | Rujun Han, Qiang Ning, Nanyun Peng | We propose a joint event and temporal relation extraction model with shared representation learning and structured prediction. |
42 | Hierarchical Text Classification with Reinforced Label Assignment | Yuning Mao, Jingjing Tian, Jiawei Han, Xiang Ren | To solve the mismatch between training and inference as well as modeling label dependencies in a more principled way, we formulate HTC as a Markov decision process and propose to learn a Label Assignment Policy via deep reinforcement learning to determine where to place an object and when to stop the assignment process. |
43 | Investigating Capsule Network and Semantic Feature on Hyperplanes for Text Classification | Chunning Du, Haifeng Sun, Jingyu Wang, Qi Qi, Jianxin Liao, Chun Wang, Bing Ma | Therefore, we propose to use capsule networks to construct the vectorized representation of semantics and utilize hyperplanes to decompose each capsule to acquire the specific senses. |
44 | Label-Specific Document Representation for Multi-Label Text Classification | Lin Xiao, Xin Huang, Boli Chen, Liping Jing | In this paper, we propose a Label-Specific Attention Network (LSAN) to learn a label-specific document representation. |
45 | Hierarchical Attention Prototypical Networks for Few-Shot Text Classification | Shengli Sun, Qingfeng Sun, Kevin Zhou, Tengchao Lv | In this work, we propose a hierarchical attention prototypical networks (HAPN) for few-shot text classification. |
46 | Many Faces of Feature Importance: Comparing Built-in and Post-hoc Feature Importance in Text Classification | Vivian Lai, Zheng Cai, Chenhao Tan | In this work, we systematically compare feature importance from built-in mechanisms in a model such as attention values and post-hoc methods that approximate model behavior such as LIME. |
47 | Enhancing Local Feature Extraction with Global Representation for Neural Text Classification | Guocheng Niu, Hengru Xu, Bolei He, Xinyan Xiao, Hua Wu, Sheng Gao | This paper proposes a novel Encoder1-Encoder2 architecture, where global information is incorporated into the procedure of local feature extraction from scratch. |
48 | Latent-Variable Generative Models for Data-Efficient Text Classification | Xiaoan Ding, Kevin Gimpel | In this paper, we improve generative text classifiers by introducing discrete latent variables into the generative story, and explore several graphical model configurations. |
49 | PaRe: A Paper-Reviewer Matching Approach Using a Common Topic Space | Omer Anjum, Hongyu Gong, Suma Bhat, Wen-Mei Hwu, JinJun Xiong | Our approach, the common topic model, jointly models the topics common to the submission and the reviewer’s profile while relying on abstract topic vectors. |
50 | Linking artificial and human neural representations of language | Jon Gauthier, Roger Levy | What information from an act of sentence understanding is robustly represented in the human brain? We investigate this question by comparing sentence encoding models on a brain decoding task, where the sentence that an experimental participant has seen must be predicted from the fMRI signal evoked by the sentence. |
51 | Neural Text Summarization: A Critical Evaluation | Wojciech Kryscinski, Nitish Shirish Keskar, Bryan McCann, Caiming Xiong, Richard Socher | We critically evaluate key ingredients of the current research setup: datasets, evaluation metrics, and models, and highlight three primary shortcomings: 1) automatically collected datasets leave the task underconstrained and may contain noise detrimental to training and evaluation, 2) current evaluation protocol is weakly correlated with human judgment and does not account for important characteristics such as factual correctness, 3) models overfit to layout biases of current datasets and offer limited diversity in their outputs. |
52 | Neural data-to-text generation: A comparison between pipeline and end-to-end architectures | Thiago Castro Ferreira, Chris van der Lee, Emiel van Miltenburg, Emiel Krahmer | This study introduces a systematic comparison between neural pipeline and end-to-end data-to-text approaches for the generation of text from RDF triples. |
53 | MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance | Wei Zhao, Maxime Peyrard, Fei Liu, Yang Gao, Christian M. Meyer, Steffen Eger | In this paper we investigate strategies to encode system and reference texts to devise a metric that shows a high correlation with human judgment of text quality. |
54 | Select and Attend: Towards Controllable Content Selection in Text Generation | Xiaoyu Shen, Jun Suzuki, Kentaro Inui, Hui Su, Dietrich Klakow, Satoshi Sekine | This paper tackles this problem by decoupling content selection from the decoder. |
55 | Sentence-Level Content Planning and Style Specification for Neural Text Generation | Xinyu Hua, Lu Wang | To address these issues, we present an end-to-end trained two-step generation model, where a sentence-level content planner first decides on the keyphrases to cover as well as a desired language style, followed by a surface realization decoder that generates relevant and coherent text. |
56 | Translate and Label! An Encoder-Decoder Approach for Cross-lingual Semantic Role Labeling | Angel Daza, Anette Frank | We propose a Cross-lingual Encoder-Decoder model that simultaneously translates and generates sentences with Semantic Role Labeling annotations in a resource-poor target language. |
57 | Syntax-Enhanced Self-Attention-Based Semantic Role Labeling | Yue Zhang, Rui Wang, Luo Si | We present different approaches to encoding syntactic information derived from dependency trees of varying quality and representations; we propose a syntax-enhanced self-attention model and compare it with two other strong baseline methods; and we conduct experiments with newly published deep contextualized word representations as well. |
58 | VerbAtlas: a Novel Large-Scale Verbal Semantic Resource and Its Application to Semantic Role Labeling | Andrea Di Fabio, Simone Conia, Roberto Navigli | We present VerbAtlas, a new, hand-crafted lexical-semantic resource whose goal is to bring together all verbal synsets from WordNet into semantically-coherent frames. |
59 | Parameter-free Sentence Embedding via Orthogonal Basis | Ziyi Yang, Chenguang Zhu, Weizhu Chen | We propose a simple and robust non-parameterized approach for building sentence representations. |
60 | Evaluation Benchmarks and Learning Criteria for Discourse-Aware Sentence Representations | Mingda Chen, Zewei Chu, Kevin Gimpel | We benchmark sentence encoders pretrained with our proposed training objectives, as well as other popular pretrained sentence encoders on DiscoEval and other sentence evaluation tasks. |
61 | Extracting Possessions from Social Media: Images Complement Language | Dhivya Chinnappa, Srikala Murugan, Eduardo Blanco | This paper describes a new dataset and experiments to determine whether authors of tweets possess the objects they tweet about. |
62 | Learning to Speak and Act in a Fantasy Text Adventure Game | Jack Urbanek, Angela Fan, Siddharth Karamcheti, Saachi Jain, Samuel Humeau, Emily Dinan, Tim Rocktäschel, Douwe Kiela, Arthur Szlam, Jason Weston | We introduce a large-scale crowdsourced text adventure game as a research platform for studying grounded dialogue. |
63 | Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning | Khanh Nguyen, Hal Daumé III | We develop “Help, Anna!” (HANNA), an interactive photo-realistic simulator in which an agent fulfills object-finding tasks by requesting and interpreting natural language-and-vision assistance. |
64 | Incorporating Visual Semantics into Sentence Representations within a Grounded Space | Patrick Bordes, Eloi Zablocki, Laure Soulier, Benjamin Piwowarski, Patrick Gallinari | To overcome this limitation, we propose to transfer visual information to textual representations by learning an intermediate representation space: the grounded space. |
65 | Neural Naturalist: Generating Fine-Grained Image Comparisons | Maxwell Forbes, Christine Kaeser-Chen, Piyush Sharma, Serge Belongie | We introduce the new Birds-to-Words dataset of 41k sentences describing fine-grained differences between photographs of birds. We propose a new model called Neural Naturalist that uses a joint image encoding and comparative module to generate comparative language, and evaluate the results with humans who must use the descriptions to distinguish real images. |
66 | Fine-Grained Evaluation for Entity Linking | Henry Rosales-Méndez, Aidan Hogan, Barbara Poblete | We propose a fuzzy recall metric to address the lack of consensus and conclude with fine-grained evaluation results comparing a selection of online EL systems. |
67 | Supervising Unsupervised Open Information Extraction Models | Arpita Roy, Youngja Park, Taesung Lee, Shimei Pan | We propose a novel supervised open information extraction (Open IE) framework that leverages an ensemble of unsupervised Open IE systems and a small amount of labeled data to improve system performance. |
68 | Neural Cross-Lingual Event Detection with Minimal Parallel Resources | Jian Liu, Yubo Chen, Kang Liu, Jun Zhao | In this paper, we propose a new method for cross-lingual ED, demonstrating a minimal dependency on parallel resources. |
69 | KnowledgeNet: A Benchmark Dataset for Knowledge Base Population | Filipe Mesquita, Matteo Cannaviccio, Jordan Schmidek, Paramita Mirza, Denilson Barbosa | KnowledgeNet is a benchmark dataset for the task of automatically populating a knowledge base (Wikidata) with facts expressed in natural language text on the web. |
70 | Effective Use of Transformer Networks for Entity Tracking | Aditya Gupta, Greg Durrett | In this paper, we explore the use of pre-trained transformer networks for entity tracking tasks in procedural text. |
71 | Explicit Cross-lingual Pre-training for Unsupervised Machine Translation | Shuo Ren, Yu Wu, Shujie Liu, Ming Zhou, Shuai Ma | In this paper, we propose a novel cross-lingual pre-training method for unsupervised machine translation by incorporating explicit cross-lingual training signals. |
72 | Latent Part-of-Speech Sequences for Neural Machine Translation | Xuewen Yang, Yingru Liu, Dongliang Xie, Xin Wang, Niranjan Balasubramanian | In this work, we introduce a new latent variable model, LaSyn, that captures the co-dependence between syntax and semantics, while allowing for effective and efficient inference over the latent space. |
73 | Improving Back-Translation with Uncertainty-based Confidence Estimation | Shuo Wang, Yang Liu, Chao Wang, Huanbo Luan, Maosong Sun | In this work, we propose to quantify the confidence of NMT model predictions based on model uncertainty. |
74 | Towards Linear Time Neural Machine Translation with Capsule Networks | Mingxuan Wang | To the best of our knowledge, this is the first work in which capsule networks have been empirically investigated for sequence-to-sequence problems. |
75 | Modeling Multi-mapping Relations for Precise Cross-lingual Entity Alignment | Xiaofei Shi, Yanghua Xiao | To solve this issue, we propose a new embedding-based framework. |
76 | Supervised and Nonlinear Alignment of Two Embedding Spaces for Dictionary Induction in Low Resourced Languages | Masud Moshtaghi | In this study, we first describe the general requirements for the success of these techniques and then present a noise tolerant piecewise linear technique to learn a non-linear mapping between two monolingual word embedding vector spaces. |
77 | Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT | Shijie Wu, Mark Dredze | This paper explores the broader cross-lingual potential of mBERT (multilingual) as a zero-shot language transfer model on 5 NLP tasks covering a total of 39 languages from various language families: NLI, document classification, NER, POS tagging, and dependency parsing. |
78 | Iterative Dual Domain Adaptation for Neural Machine Translation | Jiali Zeng, Yang Liu, Jinsong Su, Yubing Ge, Yaojie Lu, Yongjing Yin, Jiebo Luo | In this paper, we argue that such a strategy fails to fully extract the domain-shared translation knowledge, and repeatedly utilizing corpora of different domains can lead to better distillation of domain-shared translation knowledge. |
79 | Multi-agent Learning for Neural Machine Translation | Tianchi Bi, Hao Xiong, Zhongjun He, Hua Wu, Haifeng Wang | In this paper, we extend the training framework to the multi-agent scenario by introducing diverse agents in an interactive updating process. |
80 | Pivot-based Transfer Learning for Neural Machine Translation between Non-English Languages | Yunsu Kim, Petre Petrov, Pavel Petrushkov, Shahram Khadivi, Hermann Ney | We propose three methods to increase the relation among source, pivot, and target languages in the pre-training: 1) step-wise training of a single model for different language pairs, 2) additional adapter component to smoothly connect pre-trained encoder and decoder, and 3) cross-lingual encoder training via autoencoding of the pivot language. |
81 | Context-Aware Monolingual Repair for Neural Machine Translation | Elena Voita, Rico Sennrich, Ivan Titov | We propose a monolingual DocRepair model to correct inconsistencies between sentence-level translations. |
82 | Multi-Granularity Self-Attention for Neural Machine Translation | Jie Hao, Xing Wang, Shuming Shi, Jinfeng Zhang, Zhaopeng Tu | In this work, we present multi-granularity self-attention (Mg-Sa): a neural network that combines multi-head self-attention and phrase modeling. |
83 | Improving Deep Transformer with Depth-Scaled Initialization and Merged Attention | Biao Zhang, Ivan Titov, Rico Sennrich | We propose depth-scaled initialization (DS-Init), which decreases parameter variance at the initialization stage, and reduces output variance of residual connections so as to ease gradient back-propagation through normalization layers. |
84 | A Discriminative Neural Model for Cross-Lingual Word Alignment | Elias Stengel-Eskin, Tzu-ray Su, Matt Post, Benjamin Van Durme | We introduce a novel discriminative word alignment model, which we integrate into a Transformer-based machine translation model. |
85 | One Model to Learn Both: Zero Pronoun Prediction and Translation | Longyue Wang, Zhaopeng Tu, Xing Wang, Shuming Shi | In this paper, we propose a unified and discourse-aware ZP translation approach for neural MT models. |
86 | Dynamic Past and Future for Neural Machine Translation | Zaixiang Zheng, Shujian Huang, Zhaopeng Tu, Xin-Yu Dai, Jiajun Chen | In this paper, we propose to model the dynamic principles by explicitly separating source words into groups of translated and untranslated contents through parts-to-wholes assignment. |
87 | Revisit Automatic Error Detection for Wrong and Missing Translation — A Supervised Approach | Wenqiang Lei, Weiwen Xu, Ai Ti Aw, Yuanxin Xiang, Tat-Seng Chua | To have a closer study of these issues and accelerate model development, we propose automatically detecting adequacy errors in MT hypotheses for MT model evaluation. |
88 | Towards Understanding Neural Machine Translation with Word Importance | Shilin He, Zhaopeng Tu, Xing Wang, Longyue Wang, Michael Lyu, Shuming Shi | In this work, we propose to address this gap by focusing on understanding the input-output behavior of NMT models. |
89 | Multilingual Neural Machine Translation with Language Clustering | Xu Tan, Jiale Chen, Di He, Yingce Xia, Tao QIN, Tie-Yan Liu | In this work, we develop a framework that clusters languages into different groups and trains one multilingual model for each cluster. |
90 | Don’t Forget the Long Tail! A Comprehensive Analysis of Morphological Generalization in Bilingual Lexicon Induction | Paula Czarnowska, Sebastian Ruder, Edouard Grave, Ryan Cotterell, Ann Copestake | In this work, we investigate whether state-of-the-art bilingual lexicon inducers are capable of learning this kind of generalization. |
91 | Pushing the Limits of Low-Resource Morphological Inflection | Antonios Anastasopoulos, Graham Neubig | In response, we propose a battery of improvements that greatly improve performance under such low-resource conditions. |
92 | Cross-Lingual Dependency Parsing Using Code-Mixed TreeBank | Meishan Zhang, Yue Zhang, Guohong Fu | To address this problem, we investigate syntactic transfer by code mixing, translating only confident words in a source treebank. |
93 | Hierarchical Pointer Net Parsing | Linlin Liu, Xiang Lin, Shafiq Joty, Simeng Han, Lidong Bing | In this paper, we propose hierarchical pointer network parsers, and apply them to dependency and sentence-level discourse parsing tasks. |
94 | Semi-Supervised Semantic Role Labeling with Cross-View Training | Rui Cai, Mirella Lapata | We propose an end-to-end SRL model and demonstrate it can effectively leverage unlabeled data under the cross-view training modeling paradigm. |
95 | Low-Resource Sequence Labeling via Unsupervised Multilingual Contextualized Representations | Zuyi Bao, Rui Huang, Chen Li, Kenny Zhu | In this work, we propose a Multilingual Language Model with deep semantic Alignment (MLMA) to generate language-independent representations for cross-lingual sequence labeling. |
96 | A Lexicon-Based Graph Neural Network for Chinese NER | Tao Gui, Yicheng Zou, Qi Zhang, Minlong Peng, Jinlan Fu, Zhongyu Wei, Xuanjing Huang | In this work, we try to alleviate this problem by introducing a lexicon-based graph neural network with global semantics, in which lexicon knowledge is used to connect characters to capture the local composition, while a global relay node can capture global sentence semantics and long-range dependency. |
97 | CM-Net: A Novel Collaborative Memory Network for Spoken Language Understanding | Yijin Liu, Fandong Meng, Jinchao Zhang, Jie Zhou, Yufeng Chen, Jinan Xu | To address this issue, in this paper we propose a novel Collaborative Memory Network (CM-Net) based on the well-designed block, named CM-block. |
98 | Tree Transformer: Integrating Tree Structures into Self-Attention | Yaushian Wang, Hung-Yi Lee, Yun-Nung Chen | This paper proposes Tree Transformer, which adds an extra constraint to attention heads of the bidirectional Transformer encoder in order to encourage the attention heads to follow tree structures. |
99 | Semantic Role Labeling with Iterative Structure Refinement | Chunchuan Lyu, Shay B. Cohen, Ivan Titov | We model interactions between argument labeling decisions through iterative refinement. |
100 | Entity Projection via Machine Translation for Cross-Lingual NER | Alankar Jain, Bhargavi Paranjape, Zachary C. Lipton | We propose a system that improves over prior entity-projection methods by: (a) leveraging machine translation systems twice: first for translating sentences and subsequently for translating entities; (b) matching entities based on orthographic and phonetic similarity; and (c) identifying matches based on distributional statistics derived from the dataset. |
101 | A Bayesian Approach for Sequence Tagging with Crowds | Edwin D. Simpson, Iryna Gurevych | To address this, we propose a Bayesian method for aggregating sequence tags that reduces errors by modelling sequential dependencies between the annotations as well as the ground-truth labels. |
102 | A systematic comparison of methods for low-resource dependency parsing on genuinely low-resource languages | Clara Vania, Yova Kementchedjhieva, Anders Søgaard, Adam Lopez | We systematically compare a set of simple strategies for improving low-resource parsers: data augmentation, which has not been tested before; cross-lingual training; and transliteration. |
103 | Target Language-Aware Constrained Inference for Cross-lingual Dependency Parsing | Tao Meng, Nanyun Peng, Kai-Wei Chang | In this paper, we show that weak supervisions of linguistic knowledge for the target languages can improve a cross-lingual graph-based dependency parser substantially. |
104 | Look-up and Adapt: A One-shot Semantic Parser | Zhichu Lu, Forough Arabshahi, Igor Labutov, Tom Mitchell | In this paper, we propose a semantic parser that generalizes to out-of-domain examples by learning a general strategy for parsing an unseen utterance through adapting the logical forms of seen utterances, instead of learning to generate a logical form from scratch. |
105 | Similarity Based Auxiliary Classifier for Named Entity Recognition | Shiyuan Xiao, Yuanxin Ouyang, Wenge Rong, Jianxin Yang, Zhang Xiong | Inspired by previous work in which a multi-task strategy is used to solve segmentation problems, we design a similarity based auxiliary classifier (SAC), which can distinguish entity words from non-entity words. |
106 | Variable beam search for generative neural parsing and its relevance for the analysis of neuro-imaging signal | Benoit Crabbé, Murielle Fabre, Christophe Pallier | This paper describes a method of variable beam size inference for Recurrent Neural Network Grammars (RNNG) by drawing inspiration from sequential Monte-Carlo methods such as particle filtering. |
107 | Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets | Mor Geva, Yoav Goldberg, Jonathan Berant | In this paper, we perform a series of experiments showing these concerns are evident in three recent NLP datasets. |
108 | Robust Text Classifier on Test-Time Budgets | Md Rizwan Parvez, Tolga Bolukbasi, Kai-Wei Chang, Venkatesh Saligrama | To this end, we propose a data aggregation method to train the classifier, allowing it to achieve competitive performance on fractured sentences. |
109 | Commonsense Knowledge Mining from Pretrained Models | Joe Davison, Joshua Feldman, Alexander Rush | In this work, we develop a method for generating commonsense knowledge using a large, pre-trained bidirectional language model. |
110 | RNN Architecture Learning with Sparse Regularization | Jesse Dodge, Roy Schwartz, Hao Peng, Noah A. Smith | We present a structure learning method for learning sparse, parameter-efficient NLP models. |
111 | Analytical Methods for Interpretable Ultradense Word Embeddings | Philipp Dufter, Hinrich Schütze | In this work, we investigate three methods for making word spaces interpretable by rotation: Densifier (Rothe et al., 2016), linear SVMs and DensRay, a new method we propose. |
112 | Investigating Meta-Learning Algorithms for Low-Resource Natural Language Understanding Tasks | Zi-Yi Dou, Keyi Yu, Antonios Anastasopoulos | Inspired by the recent success of optimization-based meta-learning algorithms, in this paper, we explore the model-agnostic meta-learning algorithm (MAML) and its variants for low-resource NLU tasks. |
113 | Retrofitting Contextualized Word Embeddings with Paraphrases | Weijia Shi, Muhao Chen, Pei Zhou, Kai-Wei Chang | To address this issue, we propose a post-processing approach to retrofit the embedding with paraphrases. |
114 | Incorporating Contextual and Syntactic Structures Improves Semantic Similarity Modeling | Linqing Liu, Wei Yang, Jinfeng Rao, Raphael Tang, Jimmy Lin | However, such structure priors have not been well exploited in previous work for semantic modeling. To examine their effectiveness, we start with the Pairwise Word Interaction Model, one of the best models according to a recent reproducibility study, then introduce components for modeling context and structure using multi-layer BiLSTMs and TreeLSTMs. |
115 | Neural Linguistic Steganography | Zachary Ziegler, Yuntian Deng, Alexander Rush | We propose a steganography technique based on arithmetic coding with large-scale neural language models. |
116 | The Feasibility of Embedding Based Automatic Evaluation for Single Document Summarization | Simeng Sun, Ani Nenkova | Here we present a suite of experiments on using distributed representations for evaluating summarizers, both in reference-based and in reference-free setting. |
117 | Attention Optimization for Abstractive Document Summarization | Min Gui, Junfeng Tian, Rui Wang, Zhenglu Yang | We propose an attention refinement unit paired with a local variance loss to impose supervision on the attention model at each decoding step, and we also propose a global variance loss to optimize the attention distributions of all decoding steps from a global perspective. |
118 | Rewarding Coreference Resolvers for Being Consistent with World Knowledge | Rahul Aralikatte, Heather Lent, Ana Valeria Gonzalez, Daniel Herschcovich, Chen Qiu, Anders Sandholm, Michael Ringaard, Anders Søgaard | We show how to improve coreference resolvers by forwarding their input to a relation extraction system and rewarding the resolvers for producing triples that are found in knowledge bases. |
119 | An Empirical Study of Incorporating Pseudo Data into Grammatical Error Correction | Shun Kiyono, Jun Suzuki, Masato Mita, Tomoya Mizumoto, Kentaro Inui | In this study, these choices are investigated through extensive experiments, and state-of-the-art performance is achieved on the CoNLL-2014 test set (F0.5=65.0) and the official test set of the BEA-2019 shared task (F0.5=70.2) without making any modifications to the model architecture. |
120 | A Multilingual Topic Model for Learning Weighted Topic Links Across Corpora with Low Comparability | Weiwei Yang, Jordan Boyd-Graber, Philip Resnik | We introduce a new model that does not rely on this assumption, particularly useful in important low-resource language scenarios. |
121 | Measure Country-Level Socio-Economic Indicators with Streaming News: An Empirical Study | Bonan Min, Xiaoxi Zhao | In this paper, we propose Event-Centric Indicator Measure (ECIM), a novel approach to measure socio-economic indicators with events. |
122 | Towards Extracting Medical Family History from Natural Language Interactions: A New Dataset and Baselines | Mahmoud Azab, Stephane Dadian, Vivi Nastase, Larry An, Rada Mihalcea | We introduce a new dataset consisting of natural language interactions annotated with medical family histories, obtained during interactions with a genetic counselor and through crowdsourcing, following a questionnaire created by experts in the domain. |
123 | Multi-task Learning for Natural Language Generation in Task-Oriented Dialogue | Chenguang Zhu, Michael Zeng, Xuedong Huang | In this paper, we propose a novel multi-task learning framework, NLG-LM, for natural language generation. |
124 | Dirichlet Latent Variable Hierarchical Recurrent Encoder-Decoder in Dialogue Generation | Min Zeng, Yisen Wang, Yuan Luo | To address the issues, we propose to use the Dirichlet distribution with flexible structures to characterize the latent variables in place of the traditional Gaussian distribution, called Dirichlet Latent Variable Hierarchical Recurrent Encoder-Decoder model (Dir-VHRED). |
125 | Semi-Supervised Bootstrapping of Dialogue State Trackers for Task-Oriented Modelling | Bo-Hsiang Tseng, Marek Rei, Paweł Budzianowski, Richard Turner, Bill Byrne, Anna Korhonen | In this paper, we investigate semi-supervised learning methods that are able to reduce the amount of required intermediate labelling. |
126 | A Progressive Model to Enable Continual Learning for Semantic Slot Filling | Yilin Shen, Xiangyu Zeng, Hongxia Jin | In this paper, we introduce a novel progressive slot filling model, ProgModel. |
127 | CASA-NLU: Context-Aware Self-Attentive Natural Language Understanding for Task-Oriented Chatbots | Arshit Gupta, Peng Zhang, Garima Lalwani, Mona Diab | In this work, we propose a context-aware self-attentive NLU (CASA-NLU) model that uses multiple signals over a variable context window, such as previous intents, slots, dialog acts and utterances, in addition to the current user utterance. |
128 | Sampling Matters! An Empirical Study of Negative Sampling Strategies for Learning of Matching Models in Retrieval-based Dialogue Systems | Jia Li, Chongyang Tao, Wei Wu, Yansong Feng, Dongyan Zhao, Rui Yan | We study how to sample negative examples to automatically construct a training set for effective model learning in retrieval-based dialogue systems. |
129 | Zero-shot Cross-lingual Dialogue Systems with Transferable Latent Variables | Zihan Liu, Jamin Shin, Yan Xu, Genta Indra Winata, Peng Xu, Andrea Madotto, Pascale Fung | Hence, we propose a zero-shot adaptation of task-oriented dialogue system to low-resource languages. |
130 | Modeling Multi-Action Policy for Task-Oriented Dialogues | Lei Shu, Hu Xu, Bing Liu, Piero Molino | In this paper, we compare the performance of several models on the task of predicting multiple acts for each turn. |
131 | An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction | Stefan Larson, Anish Mahendran, Joseph J. Peper, Christopher Clarke, Andrew Lee, Parker Hill, Jonathan K. Kummerfeld, Kevin Leach, Michael A. Laurenzano, Lingjia Tang, Jason Mars | We introduce a new dataset that includes queries that are out-of-scope—i.e., queries that do not fall into any of the system’s supported intents. |
132 | Automatically Learning Data Augmentation Policies for Dialogue Tasks | Tong Niu, Mohit Bansal | In our work, we adapt AutoAugment to automatically discover effective perturbation policies for natural language processing (NLP) tasks such as dialogue generation. |
133 | uniblock: Scoring and Filtering Corpus with Unicode Block Information | Yingbo Gao, Weiyue Wang, Hermann Ney | In this paper, we introduce a simple statistical method, uniblock, to overcome this problem. |
134 | Multilingual word translation using auxiliary languages | Hagai Taitelbaum, Gal Chechik, Jacob Goldberger | In this study we propose a multilingual translation procedure that uses all the learned mappings to translate a word from one language to another. |
135 | Towards Better Modeling Hierarchical Structure for Self-Attention with Ordered Neurons | Jie Hao, Xing Wang, Shuming Shi, Jinfeng Zhang, Zhaopeng Tu | With the belief that modeling hierarchical structure captures an essential complementarity between SANs and RNNs, we propose to further enhance the strength of hybrid models with an advanced variant of RNNs — Ordered Neurons LSTM (ON-LSTM), which introduces a syntax-oriented inductive bias to perform tree-like composition. |
136 | Vecalign: Improved Sentence Alignment in Linear Time and Space | Brian Thompson, Philipp Koehn | We introduce Vecalign, a novel bilingual sentence alignment method which is linear in time and space with respect to the number of sentences being aligned and which requires only bilingual sentence embeddings. |
137 | Simpler and Faster Learning of Adaptive Policies for Simultaneous Translation | Baigong Zheng, Renjie Zheng, Mingbo Ma, Liang Huang | To combine the merits of both approaches, we propose a simple supervised-learning framework to learn an adaptive policy from oracle READ/WRITE sequences generated from parallel text. |
138 | Adversarial Learning with Contextual Embeddings for Zero-resource Cross-lingual Classification and NER | Phillip Keung, Yichao Lu, Vikas Bhardwaj | We improve upon multilingual BERT’s zero-resource cross-lingual performance via adversarial learning. |
139 | Recurrent Positional Embedding for Neural Machine Translation | Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita | To address this issue, this work proposes a recurrent positional embedding approach based on word vectors. |
140 | Machine Translation for Machines: the Sentiment Classification Use Case | Amirhossein Tebbifakhr, Luisa Bentivogli, Matteo Negri, Marco Turchi | We propose a neural machine translation (NMT) approach that, instead of pursuing adequacy and fluency (“human-oriented” quality criteria), aims to generate translations that are best suited as input to a natural language processing component designed for a specific downstream task (a “machine-oriented” criterion). |
141 | Investigating the Effectiveness of BPE: The Power of Shorter Sequences | Matthias Gallé | We link BPE to the broader family of dictionary-based compression algorithms and compare it with other members of this family. |
142 | HABLex: Human Annotated Bilingual Lexicons for Experiments in Machine Translation | Brian Thompson, Rebecca Knowles, Xuan Zhang, Huda Khayrallah, Kevin Duh, Philipp Koehn | In this work, we present the HABLex dataset, designed to test methods for bilingual lexicon integration into neural machine translation. |
143 | Handling Syntactic Divergence in Low-resource Machine Translation | Chunting Zhou, Xuezhe Ma, Junjie Hu, Graham Neubig | In this paper, we propose a simple yet effective solution, whereby target-language sentences are re-ordered to match the order of the source and used as an additional source of training-time supervision. |
144 | Speculative Beam Search for Simultaneous Translation | Renjie Zheng, Mingbo Ma, Baigong Zheng, Liang Huang | To address this challenge, we propose a new speculative beam search algorithm that hallucinates several steps into the future in order to reach a more accurate decision by implicitly benefiting from a target language model. |
145 | Self-Attention with Structural Position Representations | Xing Wang, Zhaopeng Tu, Longyue Wang, Shuming Shi | In this work, we propose to augment SANs with structural position representations to model the latent structure of the input sentence, which is complementary to the standard sequential positional representations. |
146 | Exploiting Multilingualism through Multistage Fine-Tuning for Low-Resource Neural Machine Translation | Raj Dabre, Atsushi Fujita, Chenhui Chu | This paper highlights the impressive utility of multi-parallel corpora for transfer learning in a one-to-many low-resource neural machine translation (NMT) setting. |
147 | Unsupervised Domain Adaptation for Neural Machine Translation with Domain-Aware Feature Embeddings | Zi-Yi Dou, Junjie Hu, Antonios Anastasopoulos, Graham Neubig | In this work, we propose an approach that adapts models with domain-aware feature embeddings, which are learned via an auxiliary language modeling task. |
148 | A Regularization-based Framework for Bilingual Grammar Induction | Yong Jiang, Wenjuan Han, Kewei Tu | We propose three regularization methods that encourage similarity between model parameters, dependency edge scores, and parse trees respectively. |
149 | Encoders Help You Disambiguate Word Senses in Neural Machine Translation | Gongbo Tang, Rico Sennrich, Joakim Nivre | In this paper, we explore the ability of NMT encoders and decoders to disambiguate word senses by evaluating hidden states and investigating the distributions of self-attention. |
150 | Korean Morphological Analysis with Tied Sequence-to-Sequence Multi-Task Model | Hyun-Je Song, Seong-Bae Park | This paper formulates Korean morphological analysis as a combination of the tasks and presents a tied sequence-to-sequence multi-task model for training the two tasks simultaneously without any explicit regularization. |
151 | Efficient Convolutional Neural Networks for Diacritic Restoration | Sawsan Alqahtani, Ajay Mishra, Mona Diab | As diacritic restoration benefits from both previous as well as subsequent timesteps, we further apply and evaluate a variant of TCN, Acausal TCN (A-TCN), which incorporates context from both directions (previous and future) rather than strictly incorporating previous context as in the case of TCN. |
152 | Improving Generative Visual Dialog by Answering Diverse Questions | Vishvak Murahari, Prithvijit Chattopadhyay, Dhruv Batra, Devi Parikh, Abhishek Das | To improve this, we devise a simple auxiliary objective that incentivizes Q-Bot to ask diverse questions, thus reducing repetitions and in turn enabling A-Bot to explore a larger state space during RL, i.e., be exposed to more visual concepts to talk about, and varied questions to answer. |
153 | Cross-lingual Transfer Learning with Data Selection for Large-Scale Spoken Language Understanding | Quynh Do, Judith Gaspers | In this paper, we address this question and propose a simple but effective language model based source-language data selection method for cross-lingual transfer learning in large-scale spoken language understanding. |
154 | Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations | Po-Yao Huang, Xiaojun Chang, Alexander Hauptmann | With the aim of promoting and understanding the multilingual version of image search, we leverage visual object detection and propose a model with diverse multi-head attention to learn grounded multilingual multimodal representations. |
155 | Decoupled Box Proposal and Featurization with Ultrafine-Grained Semantic Labels Improve Image Captioning and Visual Question Answering | Soravit Changpinyo, Bo Pang, Piyush Sharma, Radu Soricut | In this paper, we examine the effect of decoupling box proposal and featurization for downstream tasks. |
156 | REO-Relevance, Extraness, Omission: A Fine-grained Evaluation for Image Captioning | Ming Jiang, Junjie Hu, Qiuyuan Huang, Lei Zhang, Jana Diesner, Jianfeng Gao | In this study, we present a fine-grained evaluation method REO for automatically measuring the performance of image captioning systems. |
157 | WSLLN: Weakly Supervised Natural Language Localization Networks | Mingfei Gao, Larry Davis, Richard Socher, Caiming Xiong | We propose weakly supervised language localization networks (WSLLN) to detect events in long, untrimmed videos given language queries. |
158 | Grounding learning of modifier dynamics: An application to color naming | Xudong Han, Philip Schulz, Trevor Cohn | We present a model of color modifiers that, compared with previous additive models in RGB space, learns more complex transformations. |
159 | Robust Navigation with Language Pretraining and Stochastic Sampling | Xiujun Li, Chunyuan Li, Qiaolin Xia, Yonatan Bisk, Asli Celikyilmaz, Jianfeng Gao, Noah A. Smith, Yejin Choi | In this paper, we report two simple but highly effective methods to address these challenges and lead to a new state-of-the-art performance. |
160 | Towards Making a Dependency Parser See | Michalina Strzyz, David Vilares, Carlos Gómez-Rodríguez | We explore whether it is possible to leverage eye-tracking data in an RNN dependency parser (for English) when such information is only available during training – i.e. no aggregated or token-level gaze features are used at inference time. |
161 | Unsupervised Labeled Parsing with Deep Inside-Outside Recursive Autoencoders | Andrew Drozdov, Patrick Verga, Yi-Pei Chen, Mohit Iyyer, Andrew McCallum | In this work, we show that we can effectively recover these types of labels using the learned phrase vectors from deep inside-outside recursive autoencoders (DIORA). |
162 | Dependency Parsing for Spoken Dialog Systems | Sam Davidson, Dian Yu, Zhou Yu | Therefore, we propose the Spoken Conversation Universal Dependencies (SCUD) annotation scheme that extends the Universal Dependencies (UD) (Nivre et al., 2016) guidelines to spoken human-machine dialogs. |
163 | Span-based Hierarchical Semantic Parsing for Task-Oriented Dialog | Panupong Pasupat, Sonal Gupta, Karishma Mandyam, Rushin Shah, Mike Lewis, Luke Zettlemoyer | We propose a semantic parser for parsing compositional utterances into Task Oriented Parse (TOP), a tree representation that has intents and slots as labels of nesting tree nodes. |
164 | Enhancing Context Modeling with a Query-Guided Capsule Network for Document-level Translation | Zhengxin Yang, Jinchao Zhang, Fandong Meng, Shuhao Gu, Yang Feng, Jie Zhou | To address this problem, we propose a query-guided capsule network to cluster context information into different perspectives with which the target translation may be concerned. |
165 | Simple, Scalable Adaptation for Neural Machine Translation | Ankur Bapna, Orhan Firat | We propose a simple yet efficient approach for adaptation in NMT. |
166 | Controlling Text Complexity in Neural Machine Translation | Sweta Agrawal, Marine Carpuat | This work introduces a machine translation task where the output is aimed at audiences of different levels of target language proficiency. |
167 | Investigating Multilingual NMT Representations at Scale | Sneha Kudugunta, Ankur Bapna, Isaac Caswell, Orhan Firat | In this work, we attempt to understand massively multilingual NMT representations (with 103 languages) using Singular Value Canonical Correlation Analysis (SVCCA), a representation similarity framework that allows us to compare representations across different languages, layers and models. |
168 | Hierarchical Modeling of Global Context for Document-Level Neural Machine Translation | Xin Tan, Longyin Zhang, Deyi Xiong, Guodong Zhou | In this paper, we propose a hierarchical model to learn the global context for document-level neural machine translation (NMT). |
169 | Cross-Lingual Machine Reading Comprehension | Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Shijin Wang, Guoping Hu | In this paper, we propose Cross-Lingual Machine Reading Comprehension (CLMRC) task for the languages other than English. |
170 | A Multi-Type Multi-Span Network for Reading Comprehension that Requires Discrete Reasoning | Minghao Hu, Yuxing Peng, Zhen Huang, Dongsheng Li | In this paper, we introduce the Multi-Type Multi-Span Network (MTMSN), a neural reading comprehension model that combines a multi-type answer predictor designed to support various answer types (e.g., span, count, negation, and arithmetic expression) with a multi-span extraction method for dynamically producing one or multiple text spans. |
171 | Neural Duplicate Question Detection without Labeled Training Data | Andreas Rücklé, Nafise Sadat Moosavi, Iryna Gurevych | In this work, we propose two novel methods—weak supervision using the title and body of a question, and the automatic generation of duplicate questions—and show that both can achieve improved performances even though they do not require any labeled data. |
172 | Asking Clarification Questions in Knowledge-Based Question Answering | Jingjing Xu, Yuechen Wang, Duyu Tang, Nan Duan, Pengcheng Yang, Qi Zeng, Ming Zhou, Xu Sun | In this paper, we construct a new clarification dataset, CLAQUA, with nearly 40K open-domain examples. |
173 | Multi-View Domain Adapted Sentence Embeddings for Low-Resource Unsupervised Duplicate Question Detection | Nina Poerner, Hinrich Schütze | We address the problem of Duplicate Question Detection (DQD) in low-resource domain-specific Community Question Answering forums. |
174 | Multi-label Categorization of Accounts of Sexism using a Neural Framework | Pulkit Parikh, Harika Abburi, Pinkesh Badjatiya, Radhika Krishnan, Niyati Chhaya, Manish Gupta, Vasudeva Varma | We develop a neural solution for this multi-label classification that can combine sentence representations obtained using models such as BERT with distributional and linguistic word embeddings using a flexible, hierarchical architecture involving recurrent components and optional convolutional ones. |
175 | The Trumpiest Trump? Identifying a Subject’s Most Characteristic Tweets | Charuta Pethe, Steve Skiena | We quantify the extent to which a given short text is characteristic of a specific person, using a dataset of tweets from fifteen celebrities. Such analysis is useful for generating excerpts of high-volume Twitter profiles, and understanding how representativeness relates to tweet popularity. |
176 | Finding Microaggressions in the Wild: A Case for Locating Elusive Phenomena in Social Media Posts | Luke Breitfeller, Emily Ahn, David Jurgens, Yulia Tsvetkov | In this paper, we devise a general but nuanced, computationally operationalizable typology of microaggressions, based on a small subset of available data. |
177 | Reinforced Product Metadata Selection for Helpfulness Assessment of Customer Reviews | Miao Fan, Chao Feng, Mingming Sun, Ping Li | To address this problem, we propose a novel framework composed of two mutual-benefit modules. |
178 | Learning Invariant Representations of Social Media Users | Nicholas Andrews, Marcus Bishop | In this paper, we propose a novel procedure to learn a mapping from short episodes of user activity on social media to a vector space in which the distance between points captures the similarity of the corresponding users’ invariant features. |
179 | (Male, Bachelor) and (Female, Ph.D) have different connotations: Parallelly Annotated Stylistic Language Dataset with Multiple Personas | Dongyeop Kang, Varun Gangal, Eduard Hovy | We release PASTEL, the parallel and annotated stylistic language dataset, that contains ~41K parallel sentences (8.3K parallel stories) annotated across different personas. |
180 | Movie Plot Analysis via Turning Point Identification | Pinelopi Papalampidi, Frank Keller, Mirella Lapata | We propose the task of turning point identification in movies as a means of analyzing their narrative structure. We introduce a dataset consisting of screenplays and plot synopses annotated with turning points and present an end-to-end neural network model that identifies turning points in plot synopses and projects them onto scenes in screenplays. |
181 | Latent Suicide Risk Detection on Microblog via Suicide-Oriented Word Embeddings and Layered Attention | Lei Cao, Huijun Zhang, Ling Feng, Zihan Wei, Xin Wang, Ningyun Li, Xiaohao He | Motivated by the hidden “tree holes” phenomenon on microblogs, where people at suicide risk tend to disclose their true inner feelings and thoughts in the microblog spaces of authors who have died by suicide, we explore the use of tree holes to enhance microblog-based suicide risk detection from two perspectives. A large-scale, well-labelled suicide dataset is also reported in the paper. |
182 | Deep Ordinal Regression for Pledge Specificity Prediction | Shivashankar Subramanian, Trevor Cohn, Timothy Baldwin | In this paper we collate a novel dataset of manifestos from eleven Australian federal election cycles, with over 12,000 sentences annotated with specificity (e.g., rhetorical vs detailed pledge) on a fine-grained scale. We propose deep ordinal regression approaches for specificity prediction, under both supervised and semi-supervised settings, and provide empirical results demonstrating the effectiveness of the proposed techniques over several baseline approaches. |
183 | Data-Efficient Goal-Oriented Conversation with Dialogue Knowledge Transfer Networks | Igor Shalyminov, Sungjin Lee, Arash Eshghi, Oliver Lemon | In this paper, we present the Dialogue Knowledge Transfer Network (DiKTNet), a state-of-the-art approach to goal-oriented dialogue generation which only uses a few example dialogues (i.e. few-shot learning), none of which has to be annotated. |
184 | Multi-Granularity Representations of Dialog | Shikib Mehri, Maxine Eskenazi | This paper introduces a novel training procedure which explicitly learns multiple representations of language at several levels of granularity. |
185 | Are You for Real? Detecting Identity Fraud via Dialogue Interactions | Weikang Wang, Jiajun Zhang, Qian Li, Chengqing Zong, Zhifei Li | In this paper, we focus on identity fraud detection in loan applications and propose to solve this problem with a novel interactive dialogue system which consists of two modules. |
186 | Hierarchy Response Learning for Neural Conversation Generation | Bo Zhang, Xiaoming Zhang | Unlike past work that has focused on diversifying the output at the word or discourse level with a flat model to alleviate this problem, we propose a hierarchical generation model to capture different levels of diversity using conditional variational autoencoders. |
187 | Knowledge Aware Conversation Generation with Explainable Reasoning over Augmented Graphs | Zhibin Liu, Zheng-Yu Niu, Hua Wu, Haifeng Wang | To address this challenge, we propose a knowledge-aware chatting machine with three components: an augmented knowledge graph with both triples and texts, a knowledge selector, and a knowledge-aware response generator. |
188 | Adaptive Parameterization for Neural Dialogue Generation | Hengyi Cai, Hongshen Chen, Cheng Zhang, Yonghao Song, Xiaofang Zhao, Dawei Yin | In this work, we propose an Adaptive Neural Dialogue generation model, AdaND, which manages various conversations with conversation-specific parameterization. |
189 | Towards Knowledge-Based Recommender Dialog System | Qibin Chen, Junyang Lin, Yichang Zhang, Ming Ding, Yukuo Cen, Hongxia Yang, Jie Tang | In this paper, we propose a novel end-to-end framework called KBRD, which stands for Knowledge-Based Recommender Dialog System. |
190 | Structuring Latent Spaces for Stylized Response Generation | Xiang Gao, Yizhe Zhang, Sungjin Lee, Michel Galley, Chris Brockett, Jianfeng Gao, Bill Dolan | We propose StyleFusion, which bridges conversation modeling and non-parallel style transfer by sharing a structured latent space. |
191 | Improving Open-Domain Dialogue Systems via Multi-Turn Incomplete Utterance Restoration | Zhufeng Pan, Kun Bai, Yan Wang, Lianqiang Zhou, Xiaojiang Liu | To facilitate the study of incomplete utterance restoration for open-domain dialogue systems, a large-scale multi-turn dataset Restoration-200K is collected and manually labeled with the explicit relation between an utterance and its context. We also propose a “pick-and-combine” model to restore the incomplete utterance from its context. |
192 | Unsupervised Context Rewriting for Open Domain Conversation | Kun Zhou, Kai Zhang, Yu Wu, Shujie Liu, Jingsong Yu | This paper proposes an explicit context rewriting method, which rewrites the last utterance by considering context history. |
193 | Dually Interactive Matching Network for Personalized Response Selection in Retrieval-Based Chatbots | Jia-Chen Gu, Zhen-Hua Ling, Xiaodan Zhu, Quan Liu | This paper proposes a dually interactive matching network (DIM) for presenting the personalities of dialogue agents in retrieval-based chatbots. |
194 | DyKgChat: Benchmarking Dialogue Generation Grounding on Dynamic Knowledge Graphs | Yi-Lin Tuan, Yun-Nung Chen, Hung-yi Lee | This paper proposes a new task of applying dynamic knowledge graphs in neural conversation models and presents a novel TV series conversation corpus (DyKgChat) for the task. |
195 | Retrieval-guided Dialogue Response Generation via a Matching-to-Generation Framework | Deng Cai, Yan Wang, Wei Bi, Zhaopeng Tu, Xiaojiang Liu, Shuming Shi | This paper presents a novel framework in which skeleton extraction is performed by an interpretable matching model and the subsequent skeleton-guided response generation is accomplished by a separately trained generator. |
196 | Scalable and Accurate Dialogue State Tracking via Hierarchical Sequence Generation | Liliang Ren, Jianmo Ni, Julian McAuley | In this paper, we investigate how to approach DST using a generation framework without the pre-defined ontology list. |
197 | Low-Resource Response Generation with Template Prior | Ze Yang, Wei Wu, Jian Yang, Can Xu, Zhoujun Li | Since paired data alone are not enough to train a neural generation model, we consider leveraging large-scale unpaired data, which are much easier to obtain, and propose response generation with both paired and unpaired data. |
198 | A Discrete CVAE for Response Generation on Short-Text Conversation | Jun Gao, Wei Bi, Xiaojiang Liu, Junhui Li, Guodong Zhou, Shuming Shi | In this paper, we introduce a discrete latent variable with an explicit semantic meaning to improve the CVAE on short-text conversation. |
199 | Who Is Speaking to Whom? Learning to Identify Utterance Addressee in Multi-Party Conversations | Ran Le, Wenpeng Hu, Mingyue Shang, Zhenjun You, Lidong Bing, Dongyan Zhao, Rui Yan | In this paper, we aim to tackle the challenge of identifying all the missing addressees in a conversation session. |
200 | A Semi-Supervised Stable Variational Network for Promoting Replier-Consistency in Dialogue Generation | Jinxin Chang, Ruifang He, Longbiao Wang, Xiangyu Zhao, Ting Yang, Ruifang Wang | However, the information sampled from the latent space usually becomes useless due to the KL divergence vanishing issue, and the highly abstractive global variables easily dilute the personal features of the replier, leading to non-replier-specific responses. Therefore, a novel Semi-Supervised Stable Variational Network (SSVN) is proposed to address these issues. |
201 | Modeling Personalization in Continuous Space for Response Generation via Augmented Wasserstein Autoencoders | Zhangming Chan, Juntao Li, Xiaopeng Yang, Xiuying Chen, Wenpeng Hu, Dongyan Zhao, Rui Yan | In this work, we improve the WAE for response generation. |
202 | Variational Hierarchical User-based Conversation Model | JinYeong Bak, Alice Oh | To overcome this limitation, we propose a new model with a stochastic variable designed to capture the speaker information and deliver it to the conversational context. To test whether our model generates more appropriate conversation responses, we build a new conversation corpus containing approximately 27,000 speakers and 770,000 conversations. |
203 | Recommendation as a Communication Game: Self-Supervised Bot-Play for Goal-oriented Dialogue | Dongyeop Kang, Anusha Balakrishnan, Pararth Shah, Paul Crook, Y-Lan Boureau, Jason Weston | In this work, we collect a goal-driven recommendation dialogue dataset (GoRecDial), which consists of 9,125 dialogue games and 81,260 conversation turns between pairs of human workers recommending movies to each other. We leverage the dataset to develop an end-to-end dialogue system that can simultaneously converse and recommend. |
204 | CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases | Tao Yu, Rui Zhang, Heyang Er, Suyi Li, Eric Xue, Bo Pang, Xi Victoria Lin, Yi Chern Tan, Tianze Shi, Zihan Li, Youxuan Jiang, Michihiro Yasunaga, Sungrok Shim, Tao Chen, Alexander Fabbri, Zifan Li, Luyao Chen, Yuwen Zhang, Shreya Dixit, Vincent Zhang, Caiming Xiong, Richard Socher, Walter Lasecki, Dragomir Radev | We present CoSQL, a corpus for building cross-domain, general-purpose database (DB) querying dialogue systems. |
205 | A Practical Dialogue-Act-Driven Conversation Model for Multi-Turn Response Selection | Harshit Kumar, Arvind Agarwal, Sachindra Joshi | This paper proposes an end-to-end multi-task model for conversation modeling, which is optimized for two tasks, dialogue act prediction and response selection, with the latter being the task of interest. |
206 | How to Build User Simulators to Train RL-based Dialog Systems | Weiyan Shi, Kun Qian, Xuewei Wang, Zhou Yu | We propose a method of standardizing user simulator building that can be used by the community to fairly compare dialog system quality using the same set of user simulators. |
207 | Low-Rank HOCA: Efficient High-Order Cross-Modal Attention for Video Captioning | Tao Jin, Siyu Huang, Yingming Li, Zhongfei Zhang | Motivated by this, we propose a video captioning model with High-Order Cross-Modal Attention (HOCA) where the attention weights are calculated based on the high-order correlation tensor to capture the frame-level cross-modal interaction of different modalities sufficiently. |
208 | Image Captioning with Very Scarce Supervised Data: Adversarial Semi-Supervised Learning Approach | Dong-Jin Kim, Jinsoo Choi, Tae-Hyun Oh, In So Kweon | In this paper, we develop a novel data-efficient semi-supervised framework for training an image captioning model. To evaluate, we construct the scarcely-paired COCO dataset, a modified version of the MS COCO caption dataset. |
209 | Dual Attention Networks for Visual Reference Resolution in Visual Dialog | Gi-Cheon Kang, Jaeseo Lim, Byoung-Tak Zhang | In this paper, we propose Dual Attention Networks (DAN) for visual reference resolution in VisDial. |
210 | Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents | Jack Hessel, Lillian Lee, David Mimno | We present algorithms that discover image-sentence relationships without relying on explicit multimodal annotation in training. |
211 | UR-FUNNY: A Multimodal Language Dataset for Understanding Humor | Md Kamrul Hasan, Wasifur Rahman, AmirAli Bagher Zadeh, Jianyuan Zhong, Md Iftekhar Tanveer, Louis-Philippe Morency, Mohammed (Ehsan) Hoque | This paper presents a diverse multimodal dataset, called UR-FUNNY, to open the door to understanding multimodal language used in expressing humor. The dataset and accompanying studies present a framework for multimodal humor detection in the natural language processing community. |
212 | Partners in Crime: Multi-view Sequential Inference for Movie Understanding | Nikos Papasarantopoulos, Lea Frermann, Mirella Lapata, Shay B. Cohen | We describe an incremental neural architecture paired with a novel training objective for incremental inference. |
213 | Guiding the Flowing of Semantics: Interpretable Video Captioning via POS Tag | Xinyu Xiao, Lingfeng Wang, Bin Fan, Shiming Xiang, Chunhong Pan | To address these problems, we propose an Adaptive Semantic Guidance Network (ASGN), which instantiates the whole video semantics to different POS-aware semantics with the supervision of part-of-speech (POS) tags. |
214 | A Stack-Propagation Framework with Token-Level Intent Detection for Spoken Language Understanding | Libo Qin, Wanxiang Che, Yangming Li, Haoyang Wen, Ting Liu | In this paper, we propose a novel framework for SLU that better incorporates intent information, which in turn guides slot filling. |
215 | Talk2Car: Taking Control of Your Self-Driving Car | Thierry Deruyttere, Simon Vandenhende, Dusan Grujicic, Luc Van Gool, Marie-Francine Moens | Our work presents the Talk2Car dataset, the first object referral dataset that contains commands written in natural language for self-driving cars. |
216 | Fact-Checking Meets Fauxtography: Verifying Claims About Images | Dimitrina Zlatkova, Preslav Nakov, Ivan Koychev | In particular, we create a new dataset for this problem, and we explore a variety of features modeling the claim, the image, and the relationship between the claim and the image. |
217 | Video Dialog via Progressive Inference and Cross-Transformer | Weike Jin, Zhou Zhao, Mao Gu, Jun Xiao, Furu Wei, Yueting Zhuang | In this paper, we introduce a novel progressive inference mechanism for video dialog, which progressively updates query information based on dialog history and video content until the agent thinks the information is sufficient and unambiguous. |
218 | Executing Instructions in Situated Collaborative Interactions | Alane Suhr, Claudia Yan, Jack Schluger, Stanley Yu, Hadi Khader, Marwa Mouallem, Iris Zhang, Yoav Artzi | We introduce a learning approach focused on recovery from cascading errors between instructions, and modeling methods to explicitly reason about instructions with multiple goals. |
219 | Fusion of Detected Objects in Text for Visual Question Answering | Chris Alberti, Jeffrey Ling, Michael Collins, David Reitter | To advance models of multimodal context, we introduce a simple yet powerful neural architecture for data that combines vision and natural language. |
220 | TIGEr: Text-to-Image Grounding for Image Caption Evaluation | Ming Jiang, Qiuyuan Huang, Lei Zhang, Xin Wang, Pengchuan Zhang, Zhe Gan, Jana Diesner, Jianfeng Gao | This paper presents a new metric called TIGEr for the automatic evaluation of image captioning systems. |
221 | Universal Adversarial Triggers for Attacking and Analyzing NLP | Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, Sameer Singh | We propose a gradient-guided search over tokens which finds short trigger sequences (e.g., one word for classification and four words for language modeling) that successfully trigger the target prediction. |
222 | To Annotate or Not? Predicting Performance Drop under Domain Shift | Hady Elsahar, Matthias Gallé | In this paper, we study the problem of predicting the performance drop of modern NLP models under domain-shift, in the absence of any target domain labels. |
223 | Adaptively Sparse Transformers | Gonçalo M. Correia, Vlad Niculae, André F. T. Martins | In this work, we introduce the adaptively sparse Transformer, wherein attention heads have flexible, context-dependent sparsity patterns. |
224 | Show Your Work: Improved Reporting of Experimental Results | Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, Noah A. Smith | In this paper, we demonstrate that test-set performance scores alone are insufficient for drawing accurate conclusions about which model performs best. |
225 | A Deep Factorization of Style and Structure in Fonts | Akshay Srivatsan, Jonathan Barron, Dan Klein, Taylor Berg-Kirkpatrick | We propose a deep factorization model for typographic analysis that disentangles content from style. |
226 | Cross-lingual Semantic Specialization via Lexical Relation Induction | Edoardo Maria Ponti, Ivan Vulić, Goran Glavaš, Roi Reichart, Anna Korhonen | To bridge this gap, we propose a novel method that transfers specialization from a resource-rich source language (English) to virtually any target language. |
227 | Modelling the interplay of metaphor and emotion through multitask learning | Verna Dankers, Marek Rei, Martha Lewis, Ekaterina Shutova | In this paper, we investigate the relationship between metaphor and emotion within a computational framework, by proposing the first joint model of these phenomena. |
228 | How well do NLI models capture verb veridicality? | Alexis Ross, Ellie Pavlick | We investigate whether a state-of-the-art natural language inference model (BERT) learns to make correct inferences about veridicality in verb-complement constructions. We introduce an NLI dataset for veridicality evaluation consisting of 1,500 sentence pairs, covering 137 unique verbs. |
229 | Modeling Color Terminology Across Thousands of Languages | Arya D. McCarthy, Winston Wu, Aaron Mueller, William Watson, David Yarowsky | This paper employs a set of diverse measures on massively cross-linguistic data to operationalize and critique the Berlin and Kay color term hypotheses. |
230 | Negative Focus Detection via Contextual Attention Mechanism | Longxiang Shen, Bowei Zou, Yu Hong, Guodong Zhou, Qiaoming Zhu, AiTi Aw | In particular, we introduce a framework which consists of a Bidirectional Long Short-Term Memory (BiLSTM) neural network and a Conditional Random Fields (CRF) layer to effectively encode the order information and the long-range context dependency in a sentence. |
231 | A Unified Neural Coherence Model | Han Cheol Moon, Tasnim Mohiuddin, Shafiq Joty, Chi Xu | In this paper, we propose a unified coherence model that incorporates sentence grammar, inter-sentence coherence relations, and global coherence patterns into a common neural framework. |
232 | Topic-Guided Coherence Modeling for Sentence Ordering by Preserving Global and Local Information | Byungkook Oh, Seungmin Seo, Cheolheon Shin, Eunju Jo, Kyong-Ho Lee | We propose a novel topic-guided coherence modeling (TGCM) for sentence ordering. |
233 | Neural Generative Rhetorical Structure Parsing | Amandla Mabona, Laura Rimell, Stephen Clark, Andreas Vlachos | In this paper, we present the first generative model for RST parsing. |
234 | Weak Supervision for Learning Discourse Structure | Sonia Badene, Kate Thompson, Jean-Pierre Lorré, Nicholas Asher | This paper provides a detailed comparison of a data programming approach with (i) off-the-shelf, state-of-the-art deep learning architectures that optimize their representations (BERT) and (ii) handcrafted-feature approaches previously used in the discourse analysis literature. |
235 | Predicting Discourse Structure using Distant Supervision from Sentiment | Patrick Huber, Giuseppe Carenini | We propose a novel approach that uses distant supervision on an auxiliary task (sentiment classification), to generate abundant data for RST-style discourse structure prediction. |
236 | The Myth of Double-Blind Review Revisited: ACL vs. EMNLP | Cornelia Caragea, Ana Uban, Liviu P. Dinu | We study this question on the ACL and EMNLP paper collections and present an analysis on how well deep learning techniques can infer the authors of a paper. |
237 | Uncover Sexual Harassment Patterns from Personal Stories by Joint Key Element Extraction and Categorization | Yingchi Liu, Quanzhi Li, Marika Cifor, Xiaozhong Liu, Qiong Zhang, Luo Si | In this study, we manually annotated those stories with labels in the dimensions of location, time, and harassers’ characteristics, and marked the key elements related to these dimensions. |
238 | Identifying Predictive Causal Factors from News Streams | Ananth Balashankar, Sunandan Chakraborty, Samuel Fraiberger, Lakshminarayanan Subramanian | We propose a new framework to uncover the relationship between news events and real world phenomena. |
239 | Training Data Augmentation for Detecting Adverse Drug Reactions in User-Generated Content | Sepideh Mesbah, Jie Yang, Robert-Jan Sips, Manuel Valle Torre, Christoph Lofi, Alessandro Bozzon, Geert-Jan Houben | In this paper, we introduce a data augmentation approach that leverages variational autoencoders to learn high-quality data distributions from a large unlabeled dataset, and subsequently, to automatically generate a large labeled training set from a small set of labeled samples. |
240 | Deep Reinforcement Learning-based Text Anonymization against Private-Attribute Inference | Ahmadreza Mosallanezhad, Ghazaleh Beigi, Huan Liu | In this paper, we study the problem of textual data anonymization and propose a novel Reinforcement Learning-based Text Anonymizer, RLTA, which addresses the problem of private-attribute leakage while preserving the utility of textual data. |
241 | Tree-structured Decoding for Solving Math Word Problems | Qianying Liu, Wenyv Guan, Sujian Li, Daisuke Kawahara | To address this problem, we propose a tree-structured decoding method that generates the abstract syntax tree of the equation in a top-down manner. |
242 | PullNet: Open Domain Question Answering with Iterative Retrieval on Knowledge Bases and Text | Haitian Sun, Tania Bedrax-Weiss, William Cohen | We describe PullNet, an integrated framework for (1) learning what to retrieve and (2) reasoning with this heterogeneous information to find the best answer. |
243 | Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning | Lifu Huang, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi | In this paper, we introduce Cosmos QA, a large-scale dataset of 35,600 problems that require commonsense-based reading comprehension, formulated as multiple-choice questions. |
244 | Finding Generalizable Evidence by Learning to Convince Q&A Models | Ethan Perez, Siddharth Karamcheti, Rob Fergus, Jason Weston, Douwe Kiela, Kyunghyun Cho | We propose a system that finds the strongest supporting evidence for a given answer to a question, using passage-based question-answering (QA) as a testbed. |
245 | Ranking and Sampling in Open-Domain Question Answering | Yanfu Xu, Zheng Lin, Yuanxin Liu, Rui Liu, Weiping Wang, Dan Meng | In this paper, we first introduce a ranking model leveraging the paragraph-question and the paragraph-paragraph relevance to compute a confidence score for each paragraph. Furthermore, based on the scores, we design a modified weighted sampling strategy for training to mitigate the influence of the noisy and distracting paragraphs. |
246 | A Non-commutative Bilinear Model for Answering Path Queries in Knowledge Graphs | Katsuhiko Hayashi, Masashi Shimbo | In this paper, we propose a new bilinear KGE model, called BlockHolE, based on block circulant matrices. |
247 | Generating Questions for Knowledge Bases via Incorporating Diversified Contexts and Answer-Aware Loss | Cao Liu, Kang Liu, Shizhu He, Zaiqing Nie, Jun Zhao | In this paper, we address the above two issues by incorporating diversified contexts and an answer-aware loss. |
248 | Multi-Task Learning for Conversational Question Answering over a Large-Scale Knowledge Base | Tao Shen, Xiubo Geng, Tao Qin, Daya Guo, Duyu Tang, Nan Duan, Guodong Long, Daxin Jiang | To tackle these issues, we propose an innovative multi-task learning framework where a pointer-equipped semantic parsing model is designed to resolve coreference in conversations, and naturally empower joint learning with a novel type-aware entity detection model. |
249 | BiPaR: A Bilingual Parallel Dataset for Multilingual and Cross-lingual Reading Comprehension on Novels | Yimin Jing, Deyi Xiong, Zhen Yan | This paper presents BiPaR, a bilingual parallel novel-style machine reading comprehension (MRC) dataset, developed to support multilingual and cross-lingual reading comprehension. |
250 | Language Models as Knowledge Bases? | Fabio Petroni, Tim Rocktäschel, Sebastian Riedel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, Alexander Miller | We present an in-depth analysis of the relational knowledge already present (without fine-tuning) in a wide range of state-of-the-art pretrained language models. |
251 | NumNet: Machine Reading Comprehension with Numerical Reasoning | Qiu Ran, Yankai Lin, Peng Li, Jie Zhou, Zhiyuan Liu | To address this issue, we propose a numerical MRC model named NumNet, which utilizes a numerically-aware graph neural network to capture comparison information and perform numerical reasoning over numbers in the question and passage. |
252 | Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks | Haoyang Huang, Yaobo Liang, Nan Duan, Ming Gong, Linjun Shou, Daxin Jiang, Ming Zhou | We present Unicoder, a universal language encoder that is insensitive to different languages. |
253 | Addressing Semantic Drift in Question Generation for Semi-Supervised Question Answering | Shiyue Zhang, Mohit Bansal | We propose two ways to generate synthetic QA pairs: generate new questions from existing articles or collect QA pairs from new articles. |
254 | Adversarial Domain Adaptation for Machine Reading Comprehension | Huazheng Wang, Zhe Gan, Xiaodong Liu, Jingjing Liu, Jianfeng Gao, Hongning Wang | In this paper, we focus on unsupervised domain adaptation for Machine Reading Comprehension (MRC), where the source domain has a large amount of labeled data, while only unlabeled passages are available in the target domain. |
255 | Incorporating External Knowledge into Machine Reading for Generative Question Answering | Bin Bi, Chen Wu, Ming Yan, Wei Wang, Jiangnan Xia, Chenliang Li | In this paper, we propose a new neural model, Knowledge-Enriched Answer Generator (KEAG), which is able to compose a natural answer by exploiting and aggregating evidence from all four information sources available: question, passage, vocabulary and knowledge. |
256 | Answering questions by learning to rank – Learning to rank by answering questions | George Sebastian Pirtoaca, Traian Rebedea, Stefan Ruseti | The contribution of this article is two-fold. First, it describes a method which can be used to semantically rank documents extracted from Wikipedia or similar natural language corpora. Second, we propose a model employing the semantic ranking that holds the first place in two of the most popular leaderboards for answering multiple-choice questions: ARC Easy and Challenge. |
257 | Discourse-Aware Semantic Self-Attention for Narrative Reading Comprehension | Todor Mihaylov, Anette Frank | In this work, we propose to use linguistic annotations as a basis for a Discourse-Aware Semantic Self-Attention encoder that we employ for reading comprehension on narrative texts. |
258 | Revealing the Importance of Semantic Retrieval for Machine Reading at Scale | Yixin Nie, Songhe Wang, Mohit Bansal | In this work, we give general guidelines on system design for MRS by proposing a simple yet effective pipeline system with special consideration on hierarchical semantic retrieval at both paragraph and sentence level, and their potential effects on the downstream task. |
259 | PubMedQA: A Dataset for Biomedical Research Question Answering | Qiao Jin, Bhuwan Dhingra, Zhengping Liu, William Cohen, Xinghua Lu | We introduce PubMedQA, a novel biomedical question answering (QA) dataset collected from PubMed abstracts. |
260 | Quick and (not so) Dirty: Unsupervised Selection of Justification Sentences for Multi-hop Question Answering | Vikas Yadav, Steven Bethard, Mihai Surdeanu | We propose an unsupervised strategy for the selection of justification sentences for multi-hop question answering (QA) that (a) maximizes the relevance of the selected sentences, (b) minimizes the overlap between the selected facts, and (c) maximizes the coverage of both question and answer. |
261 | Answering Complex Open-domain Questions Through Iterative Query Generation | Peng Qi, Xiaowen Lin, Leo Mehr, Zijian Wang, Christopher D. Manning | We present GoldEn (Gold Entity) Retriever, which iterates between reading context and retrieving more supporting documents to answer open-domain multi-hop questions. |
262 | NL2pSQL: Generating Pseudo-SQL Queries from Under-Specified Natural Language Questions | Fuxiang Chen, Seung-won Hwang, Jaegul Choo, Jung-Woo Ha, Sunghun Kim | Here we describe NL2pSQL, a new task of generating pseudo-SQL (pSQL) code from natural language questions over under-specified database issues. |
263 | Leveraging Frequent Query Substructures to Generate Formal Queries for Complex Question Answering | Jiwei Ding, Wei Hu, Qixin Xu, Yuzhong Qu | In this paper, we propose SubQG, a new query generation approach based on frequent query substructures, which helps rank the existing (but nonsignificant) query structures or build new query structures. |
264 | Incorporating Graph Attention Mechanism into Knowledge Graph Reasoning Based on Deep Reinforcement Learning | Heng Wang, Shuangyin Li, Rong Pan, Mingzhi Mao | In this paper, we present a deep reinforcement learning-based model named AttnPath, which incorporates an LSTM and a graph attention mechanism as the memory components. |
265 | Learning to Update Knowledge Graphs by Reading News | Jizhi Tang, Yansong Feng, Dongyan Zhao | In this paper, we propose a novel neural network method, GUpdater, to tackle these problems. |
266 | DIVINE: A Generative Adversarial Imitation Learning Framework for Knowledge Graph Reasoning | Ruiping Li, Xiang Cheng | To this end, in this paper, we present DIVINE, a novel plug-and-play framework based on generative adversarial imitation learning for enhancing existing RL-based methods. |
267 | Original Semantics-Oriented Attention and Deep Fusion Network for Sentence Matching | Mingtong Liu, Yujie Zhang, Jinan Xu, Yufeng Chen | In this paper, we present an original semantics-oriented attention and deep fusion network (OSOA-DFN) for sentence matching. |
268 | Representation Learning with Ordered Relation Paths for Knowledge Graph Completion | Yao Zhu, Hongzhi Liu, Zhonghai Wu, Yang Song, Tao Zhang | To solve these problems, we propose a novel KG completion method named OPTransE. |
269 | Collaborative Policy Learning for Open Knowledge Graph Reasoning | Cong Fu, Tong Chen, Meng Qu, Woojeong Jin, Xiang Ren | We propose a novel reinforcement learning framework to train two collaborative agents jointly, i.e., a multi-hop graph reasoner and a fact extractor. |
270 | Modeling Event Background for If-Then Commonsense Reasoning Using Context-aware Variational Autoencoder | Li Du, Xiao Ding, Ting Liu, Zhongyang Li | To address these issues, we propose a novel context-aware variational autoencoder that effectively learns event background information to guide If-Then reasoning. |
271 | Asynchronous Deep Interaction Network for Natural Language Inference | Di Liang, Fubao Zhang, Qi Zhang, Xuanjing Huang | In this paper, we propose an asynchronous deep interaction network (ADIN) to complete the task. |
272 | Keep Calm and Switch On! Preserving Sentiment and Fluency in Semantic Text Exchange | Steven Y. Feng, Aaron W. Li, Jesse Hoey | In this paper, we present a novel method for measurably adjusting the semantics of text while preserving its sentiment and fluency, a task we call semantic text exchange. |
273 | Query-focused Scenario Construction | Su Wang, Greg Durrett, Katrin Erk | The news coverage of events often contains not one but multiple incompatible accounts of what happened. We develop a query-based system that extracts compatible sets of events (scenarios) from such data, formulated as one-class clustering. |
274 | Semi-supervised Entity Alignment via Joint Knowledge Embedding Model and Cross-graph Model | Chengjiang Li, Yixin Cao, Lei Hou, Jiaxin Shi, Juanzi Li, Tat-Seng Chua | In this paper, we propose a semi-supervised entity alignment method by joint Knowledge Embedding model and Cross-Graph model (KECG). |
275 | Designing and Interpreting Probes with Control Tasks | John Hewitt, Percy Liang | In this paper, we propose control tasks, which associate word types with random outputs, to complement linguistic tasks. |
276 | Specializing Word Embeddings (for Parsing) by Information Bottleneck | Xiang Lisa Li, Jason Eisner | We propose a very fast variational information bottleneck (VIB) method to nonlinearly compress these embeddings, keeping only the information that helps a discriminative parser. |
277 | Deep Contextualized Word Embeddings in Transition-Based and Graph-Based Dependency Parsing – A Tale of Two Parsers Revisited | Artur Kulmizev, Miryam de Lhoneux, Johannes Gontrum, Elena Fano, Joakim Nivre | In this paper, we show that, even though some details of the picture have changed after the switch to neural networks and continuous representations, the basic trade-off between rich features and global optimization remains essentially the same. |
278 | Semantic graph parsing with recurrent neural network DAG grammars | Federico Fancellu, Sorcha Gilroy, Adam Lopez, Mirella Lapata | We present recurrent neural network DAG grammars, a graph-aware sequence model that generates only well-formed graphs while sidestepping many difficulties in graph prediction. |
279 | 75 Languages, 1 Model: Parsing Universal Dependencies Universally | Dan Kondratyuk, Milan Straka | We present UDify, a multilingual multi-task model capable of accurately predicting universal part-of-speech, morphological features, lemmas, and dependency trees simultaneously for all 124 Universal Dependencies treebanks across 75 languages. |
280 | Interactive Language Learning by Question Answering | Xingdi Yuan, Marc-Alexandre Côté, Jie Fu, Zhouhan Lin, Chris Pal, Yoshua Bengio, Adam Trischler | We propose and evaluate a set of baseline models for the QAit task that includes deep reinforcement learning agents. |
281 | What’s Missing: A Knowledge Gap Guided Approach for Multi-hop Question Answering | Tushar Khot, Ashish Sabharwal, Peter Clark | We propose jointly training a model to simultaneously fill this knowledge gap and compose it with the provided partial knowledge. |
282 | KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning | Bill Yuchen Lin, Xinyue Chen, Jamin Chen, Xiang Ren | In this paper, we propose a textual inference framework for answering commonsense questions, which effectively utilizes external, structured commonsense knowledge graphs to perform explainable inferences. |
283 | Learning with Limited Data for Multilingual Reading Comprehension | Kyungjae Lee, Sunghyun Park, Hojae Han, Jinyoung Yeo, Seung-won Hwang, Juho Lee | To address this challenge, we propose a weakly-supervised framework that quantifies such noises from automatically generated labels, to deemphasize or fix noisy data in training. |
284 | A Discrete Hard EM Approach for Weakly Supervised Question Answering | Sewon Min, Danqi Chen, Hannaneh Hajishirzi, Luke Zettlemoyer | In this paper, we show it is possible to convert such tasks into discrete latent variable learning problems with a precomputed, task-specific set of possible solutions (e.g. different mentions or equations) that contains one correct option. |
285 | Is the Red Square Big? MALeViC: Modeling Adjectives Leveraging Visual Contexts | Sandro Pezzelle, Raquel Fernández | This work aims at modeling how the meaning of gradable adjectives of size (‘big’, ‘small’) can be learned from visually-grounded contexts. In contrast with the standard computational approach that simplistically treats gradable adjectives as ‘fixed’ attributes, we pose the problem as relational: to be successful, a model has to consider the full visual context. |
286 | Investigating BERT’s Knowledge of Language: Five Analysis Methods with NPIs | Alex Warstadt, Yu Cao, Ioana Grosu, Wei Peng, Hagen Blix, Yining Nie, Anna Alsop, Shikha Bordia, Haokun Liu, Alicia Parrish, Sheng-Fu Wang, Jason Phang, Anhad Mohananey, Phu Mon Htut, Paloma Jeretic, Samuel R. Bowman | We explore five experimental methods inspired by prior work evaluating pretrained sentence representation models. We use a single linguistic phenomenon, negative polarity item (NPI) licensing, as a case study for our experiments. |
287 | Representation of Constituents in Neural Language Models: Coordination Phrase as a Case Study | Aixiu An, Peng Qian, Ethan Wilcox, Roger Levy | Here we investigate neural models’ ability to represent constituent-level features, using coordinated noun phrases as a case study. |
288 | Towards Zero-shot Language Modeling | Edoardo Maria Ponti, Ivan Vulić, Ryan Cotterell, Roi Reichart, Anna Korhonen | Can we construct a neural language model which is inductively biased towards learning human language? Motivated by this question, we aim at constructing an informative prior for held-out languages on the task of character-level, open-vocabulary language modelling. |
289 | What Gets Echoed? Understanding the “Pointers” in Explanations of Persuasive Arguments | David Atkinson, Kumar Bhargav Srinivasan, Chenhao Tan | We propose a novel word-level prediction task to investigate how explanations selectively reuse, or echo, information from what is being explained (henceforth, explanandum). |
290 | Modeling Frames in Argumentation | Yamen Ajjour, Milad Alshomary, Henning Wachsmuth, Benno Stein | We present a fully unsupervised approach to this task, which first removes topical information and then identifies frames using clustering. For evaluation purposes, we provide a corpus with 12,326 debate-portal arguments, organized along the frames of the debates’ topics. |
291 | AMPERSAND: Argument Mining for PERSuAsive oNline Discussions | Tuhin Chakrabarty, Christopher Hidey, Smaranda Muresan, Kathy McKeown, Alyssa Hwang | We propose a computational model for argument mining in online persuasive discussion forums that brings together the micro-level (argument as product) and macro-level (argument as process) models of argumentation. |
292 | Evaluating adversarial attacks against multiple fact verification systems | James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Arpit Mittal | We introduce two novel scoring metrics, attack potency and system resilience, which take into account the correctness of the adversarial instances, an aspect often ignored in adversarial evaluations. |
293 | Nonsense!: Quality Control via Two-Step Reason Selection for Annotating Local Acceptability and Related Attributes in News Editorials | Wonsuk Yang, Seungwon Yoon, Ada Carpenter, Jong Park | In this study, we present a simple but powerful quality control method using two-step reason selection. |
294 | Evaluating Pronominal Anaphora in Machine Translation: An Evaluation Measure and a Test Suite | Prathyusha Jwalapuram, Shafiq Joty, Irina Temnikova, Preslav Nakov | With this aim in mind, we contribute an extensive, targeted dataset that can be used as a test suite for pronoun translation, covering multiple source languages and different pronoun errors drawn from real system translations, for English. |
295 | A Regularization Approach for Incorporating Event Knowledge and Coreference Relations into Neural Discourse Parsing | Zeyu Dai, Ruihong Huang | Realizing that external knowledge and linguistic constraints may not always apply in understanding a particular context, we propose a regularization approach that tightly integrates these constraints with contexts for deriving word representations. |
296 | Weakly Supervised Multilingual Causality Extraction from Wikipedia | Chikara Hashimoto | We present a method for extracting causality knowledge from Wikipedia, such as Protectionism → Trade war, where the cause and effect entities correspond to Wikipedia articles. |
297 | Attribute-aware Sequence Network for Review Summarization | Junjie Li, Xuepeng Wang, Dawei Yin, Chengqing Zong | Therefore, we propose an Attribute-aware Sequence Network (ASN) to take the aforementioned users’ characteristics into account, which includes three modules: an attribute encoder encodes the attribute preferences over the words; an attribute-aware review encoder adopts an attribute-based selective mechanism to select the important information of a review; and an attribute-aware summary decoder incorporates attribute embedding and attribute-specific word-using habits into word prediction. |
298 | Extractive Summarization of Long Documents by Combining Global and Local Context | Wen Xiao, Giuseppe Carenini | In this paper, we propose a novel neural single-document extractive summarization model for long documents, incorporating both the global context of the whole document and the local context within the current topic. |
299 | Enhancing Neural Data-To-Text Generation Models with External Background Knowledge | Shuang Chen, Jinpeng Wang, Xiaocheng Feng, Feng Jiang, Bing Qin, Chin-Yew Lin | In this paper, we enhance neural data-to-text models with external knowledge in a simple but effective way to improve the fidelity of generated text. |
300 | Reading Like HER: Human Reading Inspired Extractive Summarization | Ling Luo, Xiang Ao, Yan Song, Feiyang Pan, Min Yang, Qing He | In this work, we re-examine the problem of extractive text summarization for long documents. |
301 | Contrastive Attention Mechanism for Abstractive Sentence Summarization | Xiangyu Duan, Hongfei Yu, Mingming Yin, Min Zhang, Weihua Luo, Yue Zhang | We propose a contrastive attention mechanism to extend the sequence-to-sequence framework for abstractive sentence summarization task, which aims to generate a brief summary of a given source sentence. |
302 | NCLS: Neural Cross-Lingual Summarization | Junnan Zhu, Qian Wang, Yining Wang, Yu Zhou, Jiajun Zhang, Shaonan Wang, Chengqing Zong | To handle this, we present, for the first time, an end-to-end CLS framework, which we refer to as Neural Cross-Lingual Summarization (NCLS). |
303 | Clickbait? Sensational Headline Generation with Auto-tuned Reinforcement Learning | Peng Xu, Chien-Sheng Wu, Andrea Madotto, Pascale Fung | In this paper, we propose a model that generates sensational headlines without labeled data. |
304 | Concept Pointer Network for Abstractive Summarization | Wenbo Wang, Yang Gao, Heyan Huang, Yuxiang Zhou | Inspired by the popular pointer generator sequence-to-sequence model, this paper presents a concept pointer network for improving these aspects of abstractive summarization. |
305 | Surface Realisation Using Full Delexicalisation | Anastasia Shimorina, Claire Gardent | We propose a modular approach to surface realisation which models each of these components separately, and evaluate our approach on the 10 languages covered by the SR’18 Surface Realisation Shared Task shallow track. |
306 | IMaT: Unsupervised Text Attribute Transfer via Iterative Matching and Translation | Zhijing Jin, Di Jin, Jonas Mueller, Nicholas Matthews, Enrico Santus | In contrast, we propose a simpler approach, Iterative Matching and Translation (IMaT), which: (1) constructs a pseudo-parallel corpus by aligning a subset of semantically similar sentences from the source and the target corpora; (2) applies a standard sequence-to-sequence model to learn the attribute transfer; (3) iteratively improves the learned transfer function by refining imperfections in the alignment. |
307 | Better Rewards Yield Better Summaries: Learning to Summarise Without References | Florian Böhm, Yang Gao, Christian M. Meyer, Ori Shapira, Ido Dagan, Iryna Gurevych | To find a better reward function that can guide RL to generate human-appealing summaries, we learn a reward function from human ratings on 2,500 summaries. |
308 | Mixture Content Selection for Diverse Sequence Generation | Jaemin Cho, Minjoon Seo, Hannaneh Hajishirzi | We present a method to explicitly separate diversification from generation using a general plug-and-play module (called SELECTOR) that wraps around and guides an existing encoder-decoder model. |
309 | An End-to-End Generative Architecture for Paraphrase Generation | Qian Yang, Zhouyuan Huo, Dinghan Shen, Yong Cheng, Wenlin Wang, Guoyin Wang, Lawrence Carin | To overcome these challenges, we propose the first end-to-end conditional generative architecture for generating paraphrases via adversarial training, which does not depend on extra linguistic information. |
310 | Table-to-Text Generation with Effective Hierarchical Encoder on Three Dimensions (Row, Column and Time) | Heng Gong, Xiaocheng Feng, Bing Qin, Ting Liu | To address the aforementioned problems, we not only model each table cell by considering other records in the same row, but also enrich the table’s representation by modeling each cell in the context of other cells in the same column and of historical (time-dimension) data. |
311 | Subtopic-driven Multi-Document Summarization | Xin Zheng, Aixin Sun, Jing Li, Karthik Muthuswamy | In this paper, we propose a summarization model called STDS. |
312 | Referring Expression Generation Using Entity Profiles | Meng Cao, Jackie Chi Kit Cheung | In this study, we address this in two ways. First, we propose task setups in which we specifically test a REG system’s ability to generalize to entities not seen during training. Second, we propose a profile-based deep neural network model, ProfileREG, which encodes both the local context and an external profile of the entity to generate reference realizations. |
313 | Exploring Diverse Expressions for Paraphrase Generation | Lihua Qian, Lin Qiu, Weinan Zhang, Xin Jiang, Yong Yu | In this paper, we propose a novel approach with two discriminators and multiple generators to generate a variety of different paraphrases. |
314 | Enhancing AMR-to-Text Generation with Dual Graph Representations | Leonardo F. R. Ribeiro, Claire Gardent, Iryna Gurevych | To address this difficulty, we propose a novel graph-to-sequence model that encodes different but complementary perspectives of the structural information contained in the AMR graph. |
315 | Keeping Consistency of Sentence Generation and Document Classification with Multi-Task Learning | Toru Nishino, Shotaro Misawa, Ryuji Kano, Tomoki Taniguchi, Yasuhide Miura, Tomoko Ohkuma | The purpose of our study is to generate multiple outputs consistently. |
316 | Toward a Task of Feedback Comment Generation for Writing Learning | Ryo Nagata | In this paper, we introduce a novel task called feedback comment generation — a task of automatically generating feedback comments such as a hint or an explanatory note for writing learning for non-native learners of English. |
317 | Improving Question Generation With to the Point Context | Jingjing Li, Yifan Gao, Lidong Bing, Irwin King, Michael R. Lyu | To address this issue, we propose a method to jointly model the unstructured sentence and the structured answer-relevant relation (extracted from the sentence in advance) for question generation. |
318 | Deep Copycat Networks for Text-to-Text Generation | Julia Ive, Pranava Madhyastha, Lucia Specia | We introduce Copycat, a transformer-based pointer network for such tasks which obtains competitive results in abstractive text summarisation and generates more abstractive summaries. |
319 | Towards Controllable and Personalized Review Generation | Pan Li, Alexander Tuzhilin | In this paper, we propose a novel model RevGAN that automatically generates controllable and personalized user reviews based on the arbitrarily given sentimental and stylistic information. |
320 | Answers Unite! Unsupervised Metrics for Reinforced Summarization Models | Thomas Scialom, Sylvain Lamprier, Benjamin Piwowarski, Jacopo Staiano | We thus explore and propose alternative evaluation measures: the reported human-evaluation analysis shows that the proposed metrics, based on Question Answering, favorably compare to ROUGE — with the additional property of not requiring reference summaries. |
321 | Long and Diverse Text Generation with Planning-based Hierarchical Variational Model | Zhihong Shao, Minlie Huang, Jiangtao Wen, Wenfei Xu, Xiaoyan Zhu | To address these issues, we propose a Planning-based Hierarchical Variational Model (PHVM). |
322 | “Transforming” Delete, Retrieve, Generate Approach for Controlled Text Style Transfer | Akhilesh Sudhakar, Bhargav Upadhyay, Arjun Maheswaran | In this work we introduce the Generative Style Transformer (GST) – a new approach to rewriting sentences to a target style in the absence of parallel style corpora. |
323 | An Entity-Driven Framework for Abstractive Summarization | Eva Sharma, Luyang Huang, Zhe Hu, Lu Wang | In this paper, we introduce SENECA, a novel System for ENtity-drivEn Coherent Abstractive summarization framework that leverages entity information to generate informative and coherent abstracts. |
324 | Neural Extractive Text Summarization with Syntactic Compression | Jiacheng Xu, Greg Durrett | In this work, we present a neural model for single-document summarization based on joint extraction and syntactic compression. |
325 | Domain Adaptive Text Style Transfer | Dianqi Li, Yizhe Zhang, Zhe Gan, Yu Cheng, Chris Brockett, Bill Dolan, Ming-Ting Sun | In this paper, we examine domain adaptation for text style transfer to leverage massively available data from other domains. |
326 | Let’s Ask Again: Refine Network for Automatic Question Generation | Preksha Nema, Akash Kumar Mohankumar, Mitesh M. Khapra, Balaji Vasan Srinivasan, Balaraman Ravindran | In this work, we focus on the task of Automatic Question Generation (AQG), where, given a passage and an answer, the task is to generate the corresponding question. |
327 | Earlier Isn’t Always Better: Sub-aspect Analysis on Corpus and System Biases in Summarization | Taehee Jung, Dongyeop Kang, Lucas Mentch, Eduard Hovy | Following in the spirit of the claim that summarization is a combination of sub-functions, we define three sub-aspects of summarization: position, importance, and diversity and conduct an extensive analysis of the biases of each sub-aspect with respect to the domain of nine different summarization corpora (e.g., news, academic papers, meeting minutes, movie script, books, posts). |
328 | Lost in Evaluation: Misleading Benchmarks for Bilingual Dictionary Induction | Yova Kementchedjhieva, Mareike Hartmann, Anders Søgaard | We study the composition and quality of the test sets for five diverse languages from this dataset, with concerning findings: (1) a quarter of the data consists of proper nouns, which can be hardly indicative of BDI performance, and (2) there are pervasive gaps in the gold-standard targets. |
329 | Towards Realistic Practices In Low-Resource Natural Language Processing: The Development Set | Katharina Kann, Kyunghyun Cho, Samuel R. Bowman | Here, we aim to answer the following questions: Does using a development set for early stopping in the low-resource setting influence results as compared to a more realistic alternative, where the number of training epochs is tuned on development languages? And does it lead to overestimation or underestimation of performance? |
330 | Synchronously Generating Two Languages with Interactive Decoding | Yining Wang, Jiajun Zhang, Long Zhou, Yuchen Liu, Chengqing Zong | In this paper, we introduce a novel interactive approach to translate a source language into two different languages simultaneously and interactively. |
331 | On NMT Search Errors and Model Errors: Cat Got Your Tongue? | Felix Stahlberg, Bill Byrne | We present an exact inference procedure for neural sequence models based on a combination of beam search and depth-first search. |
332 | “Going on a vacation” takes longer than “Going for a walk”: A Study of Temporal Commonsense Understanding | Ben Zhou, Daniel Khashabi, Qiang Ning, Dan Roth | This paper systematically studies this temporal commonsense problem. |
333 | QAInfomax: Learning Robust Question Answering System by Mutual Information Maximization | Yi-Ting Yeh, Yun-Nung Chen | To address this problem, we propose QAInfomax as a regularizer in reading comprehension systems by maximizing mutual information among passages, a question, and its answer. |
334 | Adapting Meta Knowledge Graph Information for Multi-Hop Reasoning over Few-Shot Relations | Xin Lv, Yuxian Gu, Xu Han, Lei Hou, Juanzi Li, Zhiyuan Liu | In this paper, we propose a meta-based multi-hop reasoning method (Meta-KGR), which adopts meta-learning to learn effective meta parameters from high-frequency relations that could quickly adapt to few-shot relations. |
335 | How Reasonable are Common-Sense Reasoning Tasks: A Case-Study on the Winograd Schema Challenge and SWAG | Paul Trichelair, Ali Emami, Adam Trischler, Kaheer Suleman, Jackie Chi Kit Cheung | The question we ask in this paper is whether improved performance on these benchmarks represents genuine progress towards common-sense-enabled systems. |
336 | Pun-GAN: Generative Adversarial Network for Pun Generation | Fuli Luo, Shunyao Li, Pengcheng Yang, Lei Li, Baobao Chang, Zhifang Sui, Xu Sun | In this paper, we focus on the task of generating a pun sentence given a pair of word senses. |
337 | Multi-Task Learning with Language Modeling for Question Generation | Wenjie Zhou, Minghua Zhang, Yunfang Wu | Based on the attention-based pointer generator model, we propose to incorporate an auxiliary task of language modeling to help question generation in a hierarchical multi-task learning structure. |
338 | Autoregressive Text Generation Beyond Feedback Loops | Florian Schmidt, Stephan Mandt, Thomas Hofmann | In this paper, we combine a latent state space model with a CRF observation model. |
339 | The Woman Worked as a Babysitter: On Biases in Language Generation | Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, Nanyun Peng | We present a systematic study of biases in natural language generation (NLG) by analyzing text generated from prompts that contain mentions of different demographic groups. |
340 | On the Importance of Delexicalization for Fact Verification | Sandeep Suntwal, Mithun Paul, Rebecca Sharp, Mihai Surdeanu | Here, we investigate the importance that a model assigns to various aspects of data while learning and making predictions, specifically, in a recognizing textual entailment (RTE) task. |
341 | Towards Debiasing Fact Verification Models | Tal Schuster, Darsh Shah, Yun Jie Serene Yeo, Daniel Roberto Filizzola Ortiz, Enrico Santus, Regina Barzilay | In this paper, we investigate the cause of this phenomenon, identifying strong cues for predicting labels solely based on the claim, without considering any evidence. |
342 | Recognizing Conflict Opinions in Aspect-level Sentiment Classification with Dual Attention Networks | Xingwei Tan, Yi Cai, Changxi Zhu | In this paper, we propose a multi-label classification model with dual attention mechanism to address these problems. |
343 | Investigating Dynamic Routing in Tree-Structured LSTM for Sentiment Analysis | Jin Wang, Liang-Chih Yu, K. Robert Lai, Xuejie Zhang | To overcome the bias problem, this study proposes a capsule tree-LSTM model, introducing a dynamic routing algorithm as an aggregation layer to build sentence representation by assigning different weights to nodes according to their contributions to prediction. |
344 | A Label Informative Wide & Deep Classifier for Patents and Papers | Muyao Niu, Jie Cai | In this paper, we provide a simple and effective baseline for classifying both patents and papers to the well-established Cooperative Patent Classification (CPC). |
345 | Text Level Graph Neural Network for Text Classification | Lianzhe Huang, Dehong Ma, Sujian Li, Xiaodong Zhang, Houfeng Wang | To tackle the problems, we propose a new GNN based model that builds graphs for each input text with global parameter sharing, instead of a single graph for the whole corpus. |
346 | Semantic Relatedness Based Re-ranker for Text Spotting | Ahmed Sabir, Francesc Moreno, Lluís Padró | Our goal is to improve the performance of vision systems by leveraging semantic information. |
347 | Delta-training: Simple Semi-Supervised Text Classification using Pretrained Word Embeddings | Hwiyeol Jo, Ceyda Cinarel | We propose a novel and simple method for semi-supervised text classification. |
348 | Visual Detection with Context for Document Layout Analysis | Carlos Soto, Shinjae Yoo | We present 1) a work-in-progress method to visually segment key regions of scientific articles using an object detection technique augmented with contextual features, and 2) a novel dataset of region-labeled articles. |
349 | Evaluating Topic Quality with Posterior Variability | Linzi Xing, Michael J. Paul, Giuseppe Carenini | We derive a novel measure of LDA topic quality using the variability of the posterior distributions. |
350 | Neural Topic Model with Reinforcement Learning | Lin Gui, Jia Leng, Gabriele Pergola, Yu Zhou, Ruifeng Xu, Yulan He | In this paper, we borrow the idea of reinforcement learning and incorporate topic coherence measures as reward signals to guide the learning of a VAE-based topic model. |
351 | Modelling Stopping Criteria for Search Results using Poisson Processes | Alison Sneyd, Mark Stevenson | In this work, a novel method for determining a stopping criterion is proposed that models the rate at which relevant documents occur using a Poisson process. |
352 | Cross-Domain Modeling of Sentence-Level Evidence for Document Retrieval | Zeynep Akkalyoncu Yilmaz, Wei Yang, Haotian Zhang, Jimmy Lin | This paper applies BERT to ad hoc document retrieval on news articles, which requires addressing two challenges: relevance judgments in existing test collections are typically provided only at the document level, and documents often exceed the length that BERT was designed to handle. |
353 | The Challenges of Optimizing Machine Translation for Low Resource Cross-Language Information Retrieval | Constantine Lignos, Daniel Cohen, Yen-Chieh Lien, Pratik Mehta, W. Bruce Croft, Scott Miller | In this paper, we examine the relationship between the performance of MT systems and both neural and term frequency-based IR models to identify how CLIR performance can be best predicted from MT quality. |
354 | Rotate King to get Queen: Word Relationships as Orthogonal Transformations in Embedding Space | Kawin Ethayarajh | We document an alternative way in which downstream models might learn these relationships: orthogonal and linear transformations. |
355 | GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge | Luyao Huang, Chi Sun, Xipeng Qiu, Xuanjing Huang | In this paper, we focus on how to better leverage gloss knowledge in a supervised neural WSD system. |
356 | Leveraging Adjective-Noun Phrasing Knowledge for Comparison Relation Prediction in Text-to-SQL | Haoyan Liu, Lei Fang, Qian Liu, Bei Chen, Jian-Guang Lou, Zhoujun Li | In this paper, we propose to leverage adjective-noun phrasing knowledge mined from the web to predict the comparison relations in text-to-SQL. |
357 | Bridging the Defined and the Defining: Exploiting Implicit Lexical Semantic Relations in Definition Modeling | Koki Washio, Satoshi Sekine, Tsuneaki Kato | In this paper, we propose definition modeling methods that use lexical semantic relations. |
358 | Don’t Just Scratch the Surface: Enhancing Word Representations for Korean with Hanja | Kang Min Yoo, Taeuk Kim, Sang-goo Lee | We propose a simple yet effective approach for improving Korean word representations using additional linguistic annotation (i.e., Hanja). |
359 | SyntagNet: Challenging Supervised Word Sense Disambiguation with Lexical-Semantic Combinations | Marco Maru, Federico Scozzafava, Federico Martelli, Roberto Navigli | This paper introduces SyntagNet, a novel resource consisting of manually disambiguated lexical-semantic combinations. |
360 | Hierarchical Meta-Embeddings for Code-Switching Named Entity Recognition | Genta Indra Winata, Zhaojiang Lin, Jamin Shin, Zihan Liu, Pascale Fung | Therefore, we propose Hierarchical Meta-Embeddings (HME) that learn to combine multiple monolingual word-level and subword-level embeddings to create language-agnostic lexical representations. |
361 | Fine-tune BERT with Sparse Self-Attention Mechanism | Baiyun Cui, Yingming Li, Ming Chen, Zhongfei Zhang | In this paper, we develop a novel Sparse Self-Attention Fine-tuning model (referred to as SSAF) which integrates sparsity into the self-attention mechanism to enhance the fine-tuning performance of BERT. |
362 | Feature-Dependent Confusion Matrices for Low-Resource NER Labeling with Noisy Labels | Lukas Lange, Michael A. Hedderich, Dietrich Klakow | We propose to cluster the training data using the input features and then compute different confusion matrices for each cluster. |
363 | A Multi-Pairwise Extension of Procrustes Analysis for Multilingual Word Translation | Hagai Taitelbaum, Gal Chechik, Jacob Goldberger | In this paper we present a novel approach to simultaneously representing multiple languages in a common space. |
364 | Out-of-Domain Detection for Low-Resource Text Classification Tasks | Ming Tan, Yang Yu, Haoyu Wang, Dakuo Wang, Saloni Potdar, Shiyu Chang, Mo Yu | In this work, we propose an OOD-resistant Prototypical Network to tackle this zero-shot OOD detection and few-shot ID classification task. |
365 | Harnessing Pre-Trained Neural Networks with Rules for Formality Style Transfer | Yunli Wang, Yu Wu, Lili Mou, Zhoujun Li, Wenhan Chao | We propose three fine-tuning methods in this paper and achieve a new state of the art on benchmark datasets. |
366 | Multiple Text Style Transfer by using Word-level Conditional Generative Adversarial Network with Two-Phase Training | Chih-Te Lai, Yi-Te Hong, Hong-You Chen, Chi-Jen Lu, Shou-De Lin | In this paper, we propose a new GAN model with a word-level conditional architecture and a two-phase training procedure. |
367 | Improved Differentiable Architecture Search for Language Modeling and Named Entity Recognition | Yufan Jiang, Chi Hu, Tong Xiao, Chunliang Zhang, Jingbo Zhu | In this paper, we study differentiable neural architecture search (NAS) methods for natural language processing. |
368 | Using Pairwise Occurrence Information to Improve Knowledge Graph Completion on Large-Scale Datasets | Esma Balkir, Masha Naslidnyk, Dave Palfrey, Arpit Mittal | In this paper we use occurrences of entity-relation pairs in the dataset to construct a joint learning model and to increase the quality of sampled negatives during training. |
369 | Single Training Dimension Selection for Word Embedding with PCA | Yu Wang | In this paper, we present a fast and reliable method based on PCA to select the number of dimensions for word embeddings. |
370 | A Surprisingly Effective Fix for Deep Latent Variable Modeling of Text | Bohan Li, Junxian He, Graham Neubig, Taylor Berg-Kirkpatrick, Yiming Yang | In this paper, we investigate a simple fix for posterior collapse which yields surprisingly effective results. |
371 | SciBERT: A Pretrained Language Model for Scientific Text | Iz Beltagy, Kyle Lo, Arman Cohan | We release SciBERT, a pretrained language model based on BERT (Devlin et al., 2018) to address the lack of high-quality, large-scale labeled scientific data. |
372 | Humor Detection: A Transformer Gets the Last Laugh | Orion Weller, Kevin Seppi | In this paper we extend that capability by proposing a new task: assessing whether or not a joke is humorous. |
373 | Combining Global Sparse Gradients with Local Gradients in Distributed Neural Network Training | Alham Fikri Aji, Kenneth Heafield, Nikolay Bogoychev | We restore gradient quality by combining the compressed global gradient with the node’s locally computed uncompressed gradient. |
374 | Small and Practical BERT Models for Sequence Labeling | Henry Tsai, Jason Riesa, Melvin Johnson, Naveen Arivazhagan, Xin Li, Amelia Archer | We propose a practical scheme to train a single multilingual sequence labeling model that yields state-of-the-art results and is small and fast enough to run on a single CPU. |
375 | Data Augmentation with Atomic Templates for Spoken Language Understanding | Zijian Zhao, Su Zhu, Kai Yu | In this work, we propose a data augmentation method with atomic templates for SLU, which involves minimal human effort. |
376 | PaLM: A Hybrid Parser and Language Model | Hao Peng, Roy Schwartz, Noah A. Smith | We present PaLM, a hybrid parser and neural language model. |
377 | A Pilot Study for Chinese SQL Semantic Parsing | Qingkai Min, Yuefeng Shi, Yue Zhang | We compare character- and word-based encoders for a semantic parser, and different embedding schemes. |
378 | Global Reasoning over Database Structures for Text-to-SQL Parsing | Ben Bogin, Matt Gardner, Jonathan Berant | In this work, we propose a semantic parser that globally reasons about the structure of the output query to make a more contextually-informed selection of database constants. |
379 | Transductive Learning of Neural Language Models for Syntactic and Semantic Analysis | Hiroki Ouchi, Jun Suzuki, Kentaro Inui | Here we conduct an empirical study of transductive learning for neural models and demonstrate its utility in syntactic and semantic tasks. |
380 | Efficient Sentence Embedding using Discrete Cosine Transform | Nada Almarwani, Hanan Aldarmaki, Mona Diab | As an efficient alternative, we propose the use of discrete cosine transform (DCT) to compress word sequences in an order-preserving manner. |
381 | A Search-based Neural Model for Biomedical Nested and Overlapping Event Detection | Kurt Junshean Espinosa, Makoto Miwa, Sophia Ananiadou | We tackle the nested and overlapping event detection task and propose a novel search-based neural network (SBNN) structured prediction model that treats the task as a search problem on a relation graph of trigger-argument structures. |
382 | PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification | Yinfei Yang, Yuan Zhang, Chris Tar, Jason Baldridge | We remedy this gap with PAWS-X, a new dataset of 23,659 human translated PAWS evaluation pairs in six typologically distinct languages: French, Spanish, German, Chinese, Japanese, and Korean. |
383 | Pretrained Language Models for Sequential Sentence Classification | Arman Cohan, Iz Beltagy, Daniel King, Bhavana Dalvi, Dan Weld | In this work, we show that pretrained language models, BERT (Devlin et al., 2018) in particular, can be used for this task to capture contextual dependencies without the need for hierarchical encoding or a CRF. |
384 | Emergent Linguistic Phenomena in Multi-Agent Communication Games | Laura Harding Graesser, Kyunghyun Cho, Douwe Kiela | We describe a multi-agent communication framework for examining high-level linguistic phenomena at the community level. |
385 | TalkDown: A Corpus for Condescension Detection in Context | Zijian Wang, Christopher Potts | To address this, we present TalkDown, a new labeled dataset of condescending linguistic acts in context. |
386 | Summary Cloze: A New Task for Content Selection in Topic-Focused Summarization | Daniel Deutsch, Dan Roth | In this work, we propose a new method for studying content selection in topic-focused summarization called the summary cloze task. |
387 | Text Summarization with Pretrained Encoders | Yang Liu, Mirella Lapata | In this paper, we showcase how BERT can be usefully applied in text summarization and propose a general framework for both extractive and abstractive models. |
388 | How to Write Summaries with Patterns? Learning towards Abstractive Summarization through Prototype Editing | Shen Gao, Xiuying Chen, Piji Li, Zhangming Chan, Dongyan Zhao, Rui Yan | To tackle these challenges, we design a model named Prototype Editing based Summary Generator (PESG). |
389 | BottleSum: Unsupervised and Self-supervised Sentence Summarization using the Information Bottleneck Principle | Peter West, Ari Holtzman, Jan Buys, Yejin Choi | In this paper, we propose a novel approach to unsupervised sentence summarization by mapping the Information Bottleneck principle to a conditional language modelling objective: given a sentence, our approach seeks a compressed sentence that can best predict the next sentence. |
390 | Improving Latent Alignment in Text Summarization by Generalizing the Pointer Generator | Xiaoyu Shen, Yang Zhao, Hui Su, Dietrich Klakow | In this paper, we address these problems by allowing the model to “edit” pointed tokens instead of always hard copying them. |
391 | Learning Semantic Parsers from Denotations with Latent Structured Alignments and Abstract Programs | Bailin Wang, Ivan Titov, Mirella Lapata | Our goal is to instill an inductive bias in the parser to help it distinguish between spurious and correct programs. |
392 | Broad-Coverage Semantic Parsing as Transduction | Sheng Zhang, Xutai Ma, Kevin Duh, Benjamin Van Durme | We unify different broad-coverage semantic parsing tasks into a transduction parsing paradigm, and propose an attention-based neural transducer that incrementally builds meaning representation via a sequence of semantic relations. |
393 | Core Semantic First: A Top-down Approach for AMR Parsing | Deng Cai, Wai Lam | We introduce a novel scheme for parsing a piece of text into its Abstract Meaning Representation (AMR): Graph Spanning based Parsing (GSP). |
394 | Don’t paraphrase, detect! Rapid and Effective Data Collection for Semantic Parsing | Jonathan Herzig, Jonathan Berant | In this paper, we thoroughly analyze two sources of mismatch in this process: the mismatch in logical form distribution and the mismatch in language distribution between the true and induced distributions. We quantify the effects of these mismatches, and propose a new data collection approach that mitigates them. |
395 | Improving Distantly-Supervised Relation Extraction with Joint Label Embedding | Linmei Hu, Luhao Zhang, Chuan Shi, Liqiang Nie, Weili Guan, Cheng Yang | In this paper, we propose a novel multi-layer attention-based model to improve relation extraction with joint label embedding. |
396 | Leverage Lexical Knowledge for Chinese Named Entity Recognition via Collaborative Graph Network | Dianbo Sui, Yubo Chen, Kang Liu, Jun Zhao, Shengping Liu | We present a Collaborative Graph Network to solve these challenges. |
397 | Looking Beyond Label Noise: Shifted Label Distribution Matters in Distantly Supervised Relation Extraction | Qinyuan Ye, Liyuan Liu, Maosen Zhang, Xiang Ren | In this paper, we study what limits the performance of DS-trained neural models, conduct thorough analyses, and identify a factor that can greatly influence performance: shifted label distribution. |
398 | Easy First Relation Extraction with Information Redundancy | Shuai Ma, Gang Wang, Yansong Feng, Jinpeng Huai | In this paper, we propose an easy-first approach for relation extraction that exploits the information redundancies embedded in the results produced by local sentence-level extractors, resolving conflicting decisions with domain and uniqueness constraints. |
399 | Dependency-Guided LSTM-CRF for Named Entity Recognition | Zhanming Jie, Wei Lu | In this work, we propose a simple yet effective dependency-guided LSTM-CRF model to encode the complete dependency trees and capture the above properties for the task of named entity recognition (NER). |
400 | Cross-Cultural Transfer Learning for Text Classification | Dor Ringel, Gal Lavee, Ido Guy, Kira Radinsky | In this work, we show that cross-cultural differences can be harnessed for natural language text classification. |
401 | Combining Unsupervised Pre-training and Annotator Rationales to Improve Low-shot Text Classification | Oren Melamud, Mihaela Bornea, Ken Barker | In this work, we combine these two approaches to improve low-shot text classification with two novel methods: a simple bag-of-words embedding approach; and a more complex context-aware method, based on the BERT model. |
402 | ProSeqo: Projection Sequence Networks for On-Device Text Classification | Zornitsa Kozareva, Sujith Ravi | We propose a novel on-device sequence model for text classification using recurrent projections. |
403 | Induction Networks for Few-Shot Text Classification | Ruiying Geng, Binhua Li, Yongbin Li, Xiaodan Zhu, Ping Jian, Jian Sun | In this paper, we propose a novel Induction Network to learn such a generalized class-wise representation, by innovatively leveraging the dynamic routing algorithm in meta-learning. |
404 | Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach | Wenpeng Yin, Jamaal Hay, Dan Roth | Our contributions include datasets that facilitate studying 0Shot-TC relative to conceptually different and diverse aspects: the “topic” aspect includes “sports” and “politics” as labels; the “emotion” aspect includes “joy” and “anger”; the “situation” aspect includes “medical assistance” and “water shortage”. |
405 | A Logic-Driven Framework for Consistency of Neural Models | Tao Li, Vivek Gupta, Maitrey Mehta, Vivek Srikumar | In this paper, we formalize such inconsistency as a generalization of prediction error. |
406 | Style Transfer for Texts: Retrain, Report Errors, Compare with Rewrites | Alexey Tikhonov, Viacheslav Shibaev, Aleksander Nagaev, Aigul Nugmanova, Ivan P. Yamshchikov | This paper shows that standard assessment methodology for style transfer has several significant problems. |
407 | Implicit Deep Latent Variable Models for Text Generation | Le Fang, Chunyuan Li, Jianfeng Gao, Wen Dong, Changyou Chen | In this paper, we advocate sample-based representations of variational distributions for natural language, leading to implicit latent features, which can provide flexible representation power compared with Gaussian-based posteriors. |
408 | Text Emotion Distribution Learning from Small Sample: A Meta-Learning Approach | Zhenjie Zhao, Xiaojuan Ma | In this paper, we propose a meta-learning approach to learn text emotion distributions from a small sample. |
409 | Judge the Judges: A Large-Scale Evaluation Study of Neural Language Models for Online Review Generation | Cristina Garbacea, Samuel Carton, Shiyan Yan, Qiaozhu Mei | We conduct a large-scale, systematic study to evaluate the existing evaluation methods for natural language generation in the context of generating online product reviews. |
410 | Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks | Nils Reimers, Iryna Gurevych | In this publication, we present Sentence-BERT (SBERT), a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity. |
411 | Learning Only from Relevant Keywords and Unlabeled Documents | Nontawat Charoenphakdee, Jongyeong Lee, Yiping Jin, Dittaya Wanvarie, Masashi Sugiyama | In this paper, we propose a theoretically guaranteed learning framework that is simple to implement and has flexible choices of models, e.g., linear models or neural networks. |
412 | Denoising based Sequence-to-Sequence Pre-training for Text Generation | Liang Wang, Wei Zhao, Ruoyu Jia, Sujian Li, Jingming Liu | This paper presents a new sequence-to-sequence (seq2seq) pre-training method PoDA (Pre-training of Denoising Autoencoders), which learns representations suitable for text generation tasks. |
413 | Dialog Intent Induction with Deep Multi-View Clustering | Hugh Perkins, Yi Yang | We introduce the dialog intent induction task and present a novel deep multi-view clustering approach to tackle the problem. |
414 | Nearly-Unsupervised Hashcode Representations for Biomedical Relation Extraction | Sahil Garg, Aram Galstyan, Greg Ver Steeg, Guillermo Cecchi | In this paper, we propose to optimize the hashcode representations in a nearly unsupervised manner, in which we only use data points, but not their class labels, for learning. |
415 | Auditing Deep Learning processes through Kernel-based Explanatory Models | Danilo Croce, Daniele Rossini, Roberto Basili | In this paper, we discuss the application of Layerwise Relevance Propagation over a linguistically motivated neural architecture, the Kernel-based Deep Architecture, in order to trace back connections between linguistic properties of input instances and system decisions. |
416 | Enhancing Variational Autoencoders with Mutual Information Neural Estimation for Text Generation | Dong Qian, William K. Cheung | In this paper, we propose to introduce a mutual information (MI) term between the input and its latent variable to regularize the objective of the VAE. |
417 | Sampling Bias in Deep Active Classification: An Empirical Study | Ameya Prabhu, Charles Dognin, Maneesh Singh | Based on the above, we propose a simple baseline for deep active text classification that outperforms the state of the art. |
418 | Don’t Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases | Christopher Clark, Mark Yatskar, Luke Zettlemoyer | In this paper, we show that if we have prior knowledge of such biases, we can train a model to be more robust to domain shift. |
419 | Achieving Verified Robustness to Symbol Substitutions via Interval Bound Propagation | Po-Sen Huang, Robert Stanforth, Johannes Welbl, Chris Dyer, Dani Yogatama, Sven Gowal, Krishnamurthy Dvijotham, Pushmeet Kohli | In this work, we approach the problem from the opposite direction: to formally verify a system’s robustness against a predefined class of adversarial attacks. |
420 | Rethinking Cooperative Rationalization: Introspective Extraction and Complement Control | Mo Yu, Shiyu Chang, Yang Zhang, Tommi Jaakkola | We introduce an introspective model which explicitly predicts and incorporates the outcome into the selection process. |
421 | Experimenting with Power Divergences for Language Modeling | Matthieu Labeau, Shay B. Cohen | In this paper, we experiment with several families (alpha, beta and gamma) of power divergences, generalized from the KL divergence, for learning language models with an objective different than standard MLE. |
422 | Hierarchically-Refined Label Attention Network for Sequence Labeling | Leyang Cui, Yue Zhang | For better representing label sequences, we investigate a hierarchically-refined label attention network, which explicitly leverages label embeddings and captures potential long-term label dependency by giving each word incrementally refined label distributions with hierarchical attention. |
423 | Certified Robustness to Adversarial Word Substitutions | Robin Jia, Aditi Raghunathan, Kerem Göksel, Percy Liang | We train the first models that are provably robust to all word substitutions in this family. |
424 | Visualizing and Understanding the Effectiveness of BERT | Yaru Hao, Li Dong, Furu Wei, Ke Xu | In this paper, we propose to visualize loss landscapes and optimization trajectories of fine-tuning BERT on specific datasets. |
425 | Topics to Avoid: Demoting Latent Confounds in Text Classification | Sachin Kumar, Shuly Wintner, Noah A. Smith, Yulia Tsvetkov | We propose a method that represents the latent topical confounds and a model that “unlearns” confounding features by predicting both the label of the input text and the confound; the two predictors are trained adversarially, in an alternating fashion, to learn a text representation that predicts the correct label but is less prone to using information about the confound. |
426 | Learning to Ask for Conversational Machine Learning | Shashank Srivastava, Igor Labutov, Tom Mitchell | We present a reinforcement learning framework, where the learner’s actions correspond to question types and the reward for asking a question is based on how the teacher’s response changes performance of the resulting machine learning model on the learning task. |
427 | Language Modeling for Code-Switching: Evaluation, Integration of Monolingual Data, and Discriminative Training | Hila Gonen, Yoav Goldberg | We tackle these three issues: we propose an ASR-motivated evaluation setup which is decoupled from an ASR system and the choice of vocabulary, and provide an evaluation dataset for English-Spanish code-switching. |
428 | Using Local Knowledge Graph Construction to Scale Seq2Seq Models to Multi-Document Inputs | Angela Fan, Claire Gardent, Chloé Braud, Antoine Bordes | We propose constructing a local graph structured knowledge base for each query, which compresses the web search information and reduces redundancy. |
429 | Fine-grained Knowledge Fusion for Sequence Labeling Domain Adaptation | Huiyun Yang, Shujian Huang, Xin-Yu Dai, Jiajun Chen | To take the multi-level domain relevance discrepancy into account, in this paper, we propose a fine-grained knowledge fusion model with the domain relevance modeling scheme to control the balance between learning from the target domain data and learning from the source domain model. |
430 | Exploiting Monolingual Data at Scale for Neural Machine Translation | Lijun Wu, Yiren Wang, Yingce Xia, Tao Qin, Jianhuang Lai, Tie-Yan Liu | In this work, we study how to use both the source-side and target-side monolingual data for NMT, and propose an effective strategy leveraging both of them. |
431 | Meta Relational Learning for Few-Shot Link Prediction in Knowledge Graphs | Mingyang Chen, Wen Zhang, Wei Zhang, Qiang Chen, Huajun Chen | In this work, we propose a Meta Relational Learning (MetaR) framework to do the common but challenging few-shot link prediction in KGs, namely predicting new triples about a relation by only observing a few associative triples. |
432 | Distributionally Robust Language Modeling | Yonatan Oren, Shiori Sagawa, Tatsunori Hashimoto, Percy Liang | To remedy this without the knowledge of the test distribution, we propose an approach which trains a model that performs well over a wide range of potential test distributions. |
433 | Unsupervised Domain Adaptation of Contextualized Embeddings for Sequence Labeling | Xiaochuang Han, Jacob Eisenstein | To address this scenario, we propose domain-adaptive fine-tuning, in which the contextualized embeddings are adapted by masked language modeling on text from the target domain. |
434 | Learning Latent Parameters without Human Response Patterns: Item Response Theory with Artificial Crowds | John P. Lalor, Hao Wu, Hong Yu | In this work we propose learning IRT models using RPs generated from artificial crowds of DNN models. |
435 | Parallel Iterative Edit Models for Local Sequence Transduction | Abhijeet Awasthi, Sunita Sarawagi, Rasna Goyal, Sabyasachi Ghosh, Vihari Piratla | We present a Parallel Iterative Edit (PIE) model for the problem of local sequence transduction arising in tasks like Grammatical error correction (GEC). |
436 | ARAML: A Stable Adversarial Training Framework for Text Generation | Pei Ke, Fei Huang, Minlie Huang, Xiaoyan Zhu | To tackle this problem, we propose a novel framework called Adversarial Reward Augmented Maximum Likelihood (ARAML). |
437 | FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow | Xuezhe Ma, Chunting Zhou, Xian Li, Graham Neubig, Eduard Hovy | In this paper, we propose a simple, efficient, and effective model for non-autoregressive sequence generation using latent variable models. |
438 | Compositional Generalization for Primitive Substitutions | Yuanpeng Li, Liang Zhao, Jianyu Wang, Joel Hestness | In this paper, we conduct fundamental research for encoding compositionality in neural networks. |
439 | WikiCREM: A Large Unsupervised Corpus for Coreference Resolution | Vid Kocijan, Oana-Maria Camburu, Ana-Maria Cretu, Yordan Yordanov, Phil Blunsom, Thomas Lukasiewicz | In this work, we introduce WikiCREM (Wikipedia CoREferences Masked), a large-scale yet accurate dataset of pronoun disambiguation instances. |
440 | Identifying and Explaining Discriminative Attributes | Armins Stepanjans, André Freitas | This paper describes an explicit word vector representation model (WVM) to support the identification of discriminative attributes. |
441 | Patient Knowledge Distillation for BERT Model Compression | Siqi Sun, Yu Cheng, Zhe Gan, Jingjing Liu | In order to alleviate this resource hunger in large-scale model training, we propose a Patient Knowledge Distillation approach to compress an original large model (teacher) into an equally-effective lightweight shallow network (student). |
442 | Neural Gaussian Copula for Variational Autoencoder | Prince Zizhuang Wang, William Yang Wang | We propose Gaussian Copula Variational Autoencoder (VAE) to avert this problem. |
443 | Transformer Dissection: An Unified Understanding for Transformer’s Attention via the Lens of Kernel | Yao-Hung Hubert Tsai, Shaojie Bai, Makoto Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov | In this paper, we present a new formulation of attention via the lens of the kernel. |
444 | Learning to Learn and Predict: A Meta-Learning Approach for Multi-Label Classification | Jiawei Wu, Wenhan Xiong, William Yang Wang | In this paper, we propose a meta-learning method to capture these complex label dependencies. |
445 | Revealing the Dark Secrets of BERT | Olga Kovaleva, Alexey Romanov, Anna Rogers, Anna Rumshisky | In the current work, we focus on the interpretation of self-attention, which is one of the fundamental underlying components of BERT. |
446 | Machine Translation With Weakly Paired Documents | Lijun Wu, Jinhua Zhu, Di He, Fei Gao, Tao Qin, Jianhuang Lai, Tie-Yan Liu | Observing that weakly paired bilingual documents are much easier to collect than bilingual sentences, e.g., from Wikipedia, news websites or books, in this paper, we investigate training translation models with weakly paired bilingual documents. |
447 | Countering Language Drift via Visual Grounding | Jason Lee, Kyunghyun Cho, Douwe Kiela | We recast translation as a multi-agent communication game and examine auxiliary training constraints for their effectiveness in mitigating language drift. |
448 | The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives | Elena Voita, Rico Sennrich, Ivan Titov | In this work, we use canonical correlation analysis and mutual information estimators to study how information flows across Transformer layers and observe that the choice of the objective determines this process. |
449 | Do We Really Need Fully Unsupervised Cross-Lingual Embeddings? | Ivan Vulić, Goran Glavaš, Roi Reichart, Anna Korhonen | In this paper, we question the ability of even the most robust unsupervised CLWE approaches to induce meaningful CLWEs in these more challenging settings. |
450 | Weakly-Supervised Concept-based Adversarial Learning for Cross-lingual Word Embeddings | Haozhou Wang, James Henderson, Paola Merlo | In this paper, we propose a weakly-supervised adversarial training method to overcome this limitation, based on the intuition that mapping across languages is better done at the concept level than at the word level. |
451 | Aligning Cross-Lingual Entities with Multi-Aspect Information | Hsiu-Wei Yang, Yanyan Zou, Peng Shi, Wei Lu, Jimmy Lin, Xu Sun | In this work, we investigate embedding-based approaches to encode entities from multilingual KGs into the same vector space, where equivalent entities are close to each other. |
452 | Contrastive Language Adaptation for Cross-Lingual Stance Detection | Mitra Mohtarami, James Glass, Preslav Nakov | In particular, we introduce a novel contrastive language adaptation approach applied to memory networks, which ensures accurate alignment of stances in the source and target languages, and can effectively deal with the challenge of limited labeled data in the target language. |
453 | Jointly Learning to Align and Translate with Transformer Models | Sarthak Garg, Stephan Peitz, Udhyakumar Nallasamy, Matthias Paulik | In this paper, we present an approach to train a Transformer model to produce both accurate translations and alignments. |
454 | Social IQa: Commonsense Reasoning about Social Interactions | Maarten Sap, Hannah Rashkin, Derek Chen, Ronan Le Bras, Yejin Choi | We introduce Social IQa, the first large-scale benchmark for commonsense reasoning about social situations. |
455 | Self-Assembling Modular Networks for Interpretable Multi-Hop Reasoning | Yichen Jiang, Mohit Bansal | In this work, we present an interpretable, controller-based Self-Assembling Neural Modular Network (Hu et al., 2017, 2018) for multi-hop reasoning, where we design four novel modules (Find, Relocate, Compare, NoOp) to perform unique types of language reasoning. |
456 | Posing Fair Generalization Tasks for Natural Language Inference | Atticus Geiger, Ignacio Cases, Lauri Karttunen, Christopher Potts | In this paper, we define and motivate a formal notion of fairness in this sense. |
457 | Everything Happens for a Reason: Discovering the Purpose of Actions in Procedural Text | Bhavana Dalvi, Niket Tandon, Antoine Bosselut, Wen-tau Yih, Peter Clark | We present our new model (XPAD) that biases effect predictions towards those that (1) explain more of the actions in the paragraph and (2) are more plausible with respect to background knowledge. |
458 | CLUTRR: A Diagnostic Benchmark for Inductive Reasoning from Text | Koustuv Sinha, Shagun Sodhani, Jin Dong, Joelle Pineau, William L. Hamilton | In this work, we introduce a diagnostic benchmark suite, named CLUTRR, to clarify some key issues related to the robustness and systematicity of NLU systems. |
459 | Taskmaster-1: Toward a Realistic and Diverse Dialog Dataset | Bill Byrne, Karthik Krishnamoorthi, Chinnadhurai Sankar, Arvind Neelakantan, Ben Goodrich, Daniel Duckworth, Semih Yavuz, Amit Dubey, Kyu-Young Kim, Andy Cedilnik | To help satisfy this elementary requirement, we introduce the initial release of the Taskmaster-1 dataset which includes 13,215 task-based dialogs comprising six domains. |
460 | Multi-Domain Goal-Oriented Dialogues (MultiDoGO): Strategies toward Curating and Annotating Large Scale Dialogue Data | Denis Peskov, Nancy Clarke, Jason Krone, Brigi Fodor, Yi Zhang, Adel Youssef, Mona Diab | In this paper, we present strategies toward curating and annotating large-scale goal-oriented dialogue data. We introduce the MultiDoGO dataset to overcome these limitations. |
461 | Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack | Emily Dinan, Samuel Humeau, Bharath Chintagunta, Jason Weston | In this work, we develop a training scheme for a model to become robust to such human attacks by an iterative build it, break it, fix it scheme with humans and models in the loop. |
462 | GECOR: An End-to-End Generative Ellipsis and Co-reference Resolution Model for Task-Oriented Dialogue | Jun Quan, Deyi Xiong, Bonnie Webber, Changjian Hu | In this paper, we treat the resolution of ellipsis and co-reference in dialogue as a problem of generating omitted or referred expressions from the dialogue context. |
463 | Task-Oriented Conversation Generation Using Heterogeneous Memory Networks | Zehao Lin, Xinjing Huang, Feng Ji, Haiqing Chen, Yin Zhang | In this paper, we propose novel and versatile external memory networks, called Heterogeneous Memory Networks (HMNs), to simultaneously utilize user utterances, dialogue history and background knowledge tuples. |
464 | Aspect-based Sentiment Classification with Aspect-specific Graph Convolutional Networks | Chen Zhang, Qiuchi Li, Dawei Song | To tackle this problem, we propose to build a Graph Convolutional Network (GCN) over the dependency tree of a sentence to exploit syntactical information and word dependencies. |
465 | Coupling Global and Local Context for Unsupervised Aspect Extraction | Ming Liao, Jing Li, Haisong Zhang, Lingzhi Wang, Xixin Wu, Kam-Fai Wong | We propose a novel neural model, capable of coupling global and local representation to discover aspect words. |
466 | Transferable End-to-End Aspect-based Sentiment Analysis with Selective Adversarial Learning | Zheng Li, Xin Li, Ying Wei, Lidong Bing, Yu Zhang, Qiang Yang | To resolve it, we propose a novel Selective Adversarial Learning (SAL) method to align the inferred correlation vectors that automatically capture their latent relations. |
467 | CAN: Constrained Attention Networks for Multi-Aspect Sentiment Analysis | Mengting Hu, Shiwan Zhao, Li Zhang, Keke Cai, Zhong Su, Renhong Cheng, Xiaowei Shen | In this paper, we propose constrained attention networks (CAN), a simple yet effective solution, to regularize the attention for multi-aspect sentiment analysis, which alleviates the drawback of the attention mechanism. |
468 | Leveraging Just a Few Keywords for Fine-Grained Aspect Detection Through Weakly Supervised Co-Training | Giannis Karamanolakis, Daniel Hsu, Luis Gravano | In this work, we consider weakly supervised approaches for training aspect classifiers that only require the user to provide a small set of seed words (i.e., weakly positive indicators) for the aspects of interest. |
469 | Integrating Text and Image: Determining Multimodal Document Intent in Instagram Posts | Julia Kruk, Jonah Lubin, Karan Sikka, Xiao Lin, Dan Jurafsky, Ajay Divakaran | Here we introduce a multimodal dataset of 1,299 Instagram posts labeled for three orthogonal taxonomies: the authorial intent behind the image-caption pair, the contextual relationship between the literal meanings of the image and caption, and the semiotic relationship between the signified meanings of the image and caption. |
470 | Neural Conversation Recommendation with Online Interaction Modeling | Xingshan Zeng, Jing Li, Lu Wang, Kam-Fai Wong | In this paper, we present a novel framework to automatically recommend conversations to users based on their prior conversation behaviors. |
471 | Different Absorption from the Same Sharing: Sifted Multi-task Learning for Fake News Detection | Lianwei Wu, Yuan Rao, Haolin Jin, Ambreen Nazir, Ling Sun | In this paper, we design a sifted multi-task learning method with a selected sharing layer for fake news detection. |
472 | Text-based inference of moral sentiment change | Jing Yi Xie, Renato Ferreira Pinto Junior, Graeme Hirst, Yang Xu | We present a text-based framework for investigating moral sentiment change of the public via longitudinal corpora. |
473 | Detecting Causal Language Use in Science Findings | Bei Yu, Yingya Li, Jun Wang | In this study, we first annotated a corpus of over 3,000 PubMed research conclusion sentences, then developed a BERT-based prediction model that classifies conclusion sentences into “no relationship”, “correlational”, “conditional causal”, and “direct causal” categories, achieving an accuracy of 0.90 and a macro-F1 of 0.88. We then applied the prediction model to measure the causal language use in the research conclusions of about 38,000 observational studies in PubMed. |
474 | Multilingual and Multi-Aspect Hate Speech Analysis | Nedjma Ousidhoum, Zizheng Lin, Hongming Zhang, Yangqiu Song, Dit-Yan Yeung | In this paper, we present a new multilingual multi-aspect hate speech analysis dataset and use it to test the current state-of-the-art multilingual multitask learning approaches. |
475 | MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims | Isabelle Augenstein, Christina Lioma, Dongsheng Wang, Lucas Chaves Lima, Casper Hansen, Christian Hansen, Jakob Grue Simonsen | We contribute the largest publicly available dataset of naturally occurring factual claims for the purpose of automatic claim verification. |
476 | A Deep Neural Information Fusion Architecture for Textual Network Embeddings | Zenan Xu, Qinliang Su, Xiaojun Quan, Weijia Zhang | In this paper, a deep neural architecture is proposed to effectively fuse the two kinds of information into one representation. |
477 | You Shall Know a User by the Company It Keeps: Dynamic Representations for Social Media Users in NLP | Marco Del Tredici, Diego Marcheggiani, Sabine Schulte im Walde, Raquel Fernández | We present a model based on Graph Attention Networks that captures this observation. |
478 | Adaptive Ensembling: Unsupervised Domain Adaptation for Political Document Analysis | Shrey Desai, Barea Sinno, Alex Rosenfeld, Junyi Jessy Li | To bridge this gap, we present adaptive ensembling, an unsupervised domain adaptation framework, equipped with a novel text classification model and time-aware training to ensure our methods work well with diachronic corpora. |
479 | Macrocosm: Social Media Persona Linking for Open Source Intelligence Applications | Graham Horwood, Ning Yu, Thomas Boggs, Changjiang Yang, Chad Holvenstot | This paper presents a multi-modal analysis of cross-contextual online social media (Macrocosm), a data-driven approach to detect similarities among user personas over six modalities: usernames, patterns-of-life, stylometry, semantic content, image content, and social network associations. |
480 | A Hierarchical Location Prediction Neural Network for Twitter User Geolocation | Binxuan Huang, Kathleen Carley | In this paper, we propose a hierarchical location prediction neural network for Twitter user geolocation. |
481 | Trouble on the Horizon: Forecasting the Derailment of Online Conversations as they Develop | Jonathan P. Chang, Cristian Danescu-Niculescu-Mizil | In this work we introduce a conversational forecasting model that learns an unsupervised representation of conversational dynamics and exploits it to predict future derailment as the conversation develops. |
482 | A Benchmark Dataset for Learning to Intervene in Online Hate Speech | Jing Qian, Anna Bethke, Yinyin Liu, Elizabeth Belding, William Yang Wang | In this paper, we propose a novel task of generative hate speech intervention, where the goal is to automatically generate responses to intervene during online conversations that contain hate speech. As a part of this work, we introduce two fully-labeled large-scale hate speech intervention datasets collected from Gab and Reddit. |
483 | Detecting and Reducing Bias in a High Stakes Domain | Ruiqi Zhong, Yanda Chen, Desmond Patton, Charlotte Selous, Kathy McKeown | To address the possibility of bias in this sensitive application, we developed an approach to systematically interpret the state of the art model. |
484 | CodeSwitch-Reddit: Exploration of Written Multilingual Discourse in Online Discussion Forums | Ella Rabinovich, Masih Sultani, Suzanne Stevenson | We introduce a novel, large, and diverse dataset of written code-switched productions, curated from topical threads of multiple bilingual communities on the Reddit discussion platform, and explore questions that were mainly addressed in the context of spoken language thus far. |
485 | Modeling Conversation Structure and Temporal Dynamics for Jointly Predicting Rumor Stance and Veracity | Penghui Wei, Nan Xu, Wenji Mao | In this paper, we propose a hierarchical multi-task learning framework for jointly predicting rumor stance and veracity on Twitter, which consists of two components. |
486 | Reconstructing Capsule Networks for Zero-shot Intent Classification | Han Liu, Xiaotong Zhang, Lu Fan, Xuandi Fu, Qimai Li, Xiao-Ming Wu, Albert Y.S. Lam | To overcome these limitations, we propose to reconstruct capsule networks for zero-shot intent classification. |
487 | Domain Adaptation for Person-Job Fit with Transferable Deep Global Match Network | Shuqing Bian, Wayne Xin Zhao, Yang Song, Tao Zhang, Ji-Rong Wen | We study the domain adaptation problem for person-job fit. |
488 | Heterogeneous Graph Attention Networks for Semi-supervised Short Text Classification | Hu Linmei, Tianchi Yang, Chuan Shi, Houye Ji, Xiaoli Li | In this paper, we propose a novel heterogeneous graph neural network based method for semi-supervised short text classification, leveraging full advantage of few labeled data and large unlabeled data through information propagation along the graph. |
489 | Comparing and Developing Tools to Measure the Readability of Domain-Specific Texts | Elissa Redmiles, Lisa Maszkiewicz, Emily Hwang, Dhruv Kuchhal, Everest Liu, Miraida Morales, Denis Peskov, Sudha Rao, Rock Stevens, Kristina Gligorić, Sean Kross, Michelle Mazurek, Hal Daumé III | In this work, we present a comparison of the validity of well-known readability measures and introduce a novel approach, Smart Cloze, which is designed to address shortcomings of existing measures. |
490 | News2vec: News Network Embedding with Subnode Information | Ye Ma, Lu Zong, Yikang Yang, Jionglong Su | With the aim of filling this gap, the News2vec model is proposed to allow the distributed representation of news taking into account its associated features. |
491 | Recursive Context-Aware Lexical Simplification | Sian Gooding, Ekaterina Kochmar | This paper presents a novel architecture for recursive context-aware lexical simplification, REC-LS, that is capable of (1) making use of the wider context when detecting the words in need of simplification and suggesting alternatives, and (2) taking previous simplification steps into account. |
492 | Leveraging Medical Literature for Section Prediction in Electronic Health Records | Sara Rosenthal, Ken Barker, Zhicheng Liang | We propose using sections from medical literature (e.g., textbooks, journals, web content) that contain content similar to that found in EHR sections. |
493 | Neural News Recommendation with Heterogeneous User Behavior | Chuhan Wu, Fangzhao Wu, Mingxiao An, Tao Qi, Jianqiang Huang, Yongfeng Huang, Xing Xie | In this paper, we propose a neural news recommendation approach which can exploit heterogeneous user behaviors. |
494 | Reviews Meet Graphs: Enhancing User and Item Representations for Recommendation with Hierarchical Attentive Graph Neural Network | Chuhan Wu, Fangzhao Wu, Tao Qi, Suyu Ge, Yongfeng Huang, Xing Xie | In this paper, we propose a neural recommendation approach which can utilize useful information from both review content and user-item graphs. |
495 | Event Representation Learning Enhanced with External Commonsense Knowledge | Xiao Ding, Kuo Liao, Ting Liu, Zhongyang Li, Junwen Duan | To address this issue, this paper proposes to leverage external commonsense knowledge about the intent and sentiment of the event. |
496 | Learning to Discriminate Perturbations for Blocking Adversarial Attacks in Text Classification | Yichao Zhou, Jyun-Yu Jiang, Kai-Wei Chang, Wei Wang | In this paper, we propose a novel framework, learning to discriminate perturbations (DISP), to identify and adjust malicious perturbations, thereby blocking adversarial attacks for text classification models. |
497 | A Neural Citation Count Prediction Model based on Peer Review Text | Siqing Li, Wayne Xin Zhao, Eddy Jing Yin, Ji-Rong Wen | In this paper, we take the initiative to utilize peer review data for the CCP task with a neural prediction model. |
498 | Connecting the Dots: Document-level Neural Relation Extraction with Edge-oriented Graphs | Fenia Christopoulou, Makoto Miwa, Sophia Ananiadou | We thus propose an edge-oriented graph neural model for document-level relation extraction. |
499 | Semi-supervised Text Style Transfer: Cross Projection in Latent Space | Mingyue Shang, Piji Li, Zhenxin Fu, Lidong Bing, Dongyan Zhao, Shuming Shi, Rui Yan | With these two types of training data, we introduce a projection function between the latent space of different styles and design two constraints to train it. |
500 | Question Answering for Privacy Policies: Combining Computational and Legal Perspectives | Abhilasha Ravichander, Alan W Black, Shomir Wilson, Thomas Norton, Norman Sadeh | We present PrivacyQA, a corpus consisting of 1750 questions about the privacy policies of mobile applications, and over 3500 expert annotations of relevant answers. |
501 | Stick to the Facts: Learning towards a Fidelity-oriented E-Commerce Product Description Generation | Zhangming Chan, Xiuying Chen, Yongliang Wang, Juntao Li, Zhiqiang Zhang, Kun Gai, Dongyan Zhao, Rui Yan | To bridge this gap, we propose a model named Fidelity-oriented Product Description Generator (FPDG). |
502 | Fine-Grained Entity Typing via Hierarchical Multi Graph Convolutional Networks | Hailong Jin, Lei Hou, Juanzi Li, Tiansi Dong | We convert this problem into the task of graph-based semi-supervised classification, and propose Hierarchical Multi Graph Convolutional Network (HMGCN), a novel Deep Learning architecture to tackle this problem. |
503 | Learning to Infer Entities, Properties and their Relations from Clinical Conversations | Nan Du, Mingqiu Wang, Linh Tran, Gang Lee, Izhak Shafran | We extend the SAT model to jointly infer not only entities and their properties but also relations between them. |
504 | Practical Correlated Topic Modeling and Analysis via the Rectified Anchor Word Algorithm | Moontae Lee, Sungjun Cho, David Bindel, David Mimno | This paper aims to solidify the foundations of spectral topic inference and provide a practical implementation for anchor-based topic modeling. |
505 | Modeling the Relationship between User Comments and Edits in Document Revision | Xuchao Zhang, Dheeraj Rajagopal, Michael Gamon, Sujay Kumar Jauhar, ChangTien Lu | Thus, in this paper we explore the relationship between comments and edits by defining two novel, related tasks: Comment Ranking and Edit Anchoring. |
506 | PRADO: Projection Attention Networks for Document Classification On-Device | Karthik Krishnamoorthi, Sujith Ravi, Zornitsa Kozareva | We propose a novel projection attention neural network PRADO that combines trainable projections with attention and convolutions. |
507 | Subword Language Model for Query Auto-Completion | Gyuwan Kim | We present how to utilize subword language models for the fast and accurate generation of query completion candidates. |
508 | Enhancing Dialogue Symptom Diagnosis with Global Attention and Symptom Graph | Xinzhu Lin, Xiahui He, Qin Chen, Huaixiao Tou, Zhongyu Wei, Ting Chen | In order to further enhance the performance of symptom diagnosis over dialogues, we propose a global attention mechanism to capture more symptom related information, and build a symptom graph to model the associations between symptoms rather than treating each symptom independently. |
509 | Counterfactual Story Reasoning and Generation | Lianhui Qin, Antoine Bosselut, Ari Holtzman, Chandra Bhagavatula, Elizabeth Clark, Yejin Choi | In this paper, we propose Counterfactual Story Rewriting: given an original story and an intervening counterfactual event, the task is to minimally revise the story to make it compatible with the given counterfactual event. |
510 | Encode, Tag, Realize: High-Precision Text Editing | Eric Malmi, Sebastian Krause, Sascha Rothe, Daniil Mirylenka, Aliaksei Severyn | To predict the edit operations, we propose a novel model, which combines a BERT encoder with an autoregressive Transformer decoder. |
511 | Answer-guided and Semantic Coherent Question Generation in Open-domain Conversation | Weichao Wang, Shi Feng, Daling Wang, Yifei Zhang | Thus, we devise two methods to further enhance the semantic coherence between post and question under the guidance of the answer. |
512 | Read, Attend and Comment: A Deep Architecture for Automatic News Comment Generation | Ze Yang, Can Xu, Wei Wu, Zhoujun Li | In this paper, we propose a “read-attend-comment” procedure for news comment generation and formalize the procedure with a reading network and a generation network. |
513 | A Topic Augmented Text Generation Model: Joint Learning of Semantics and Structural Features | Hongyin Tang, Miao Li, Beihong Jin | In this paper, we propose a text generation model that learns semantics and structural features simultaneously. |
514 | LXMERT: Learning Cross-Modality Encoder Representations from Transformers | Hao Tan, Mohit Bansal | We thus propose the LXMERT (Learning Cross-Modality Encoder Representations from Transformers) framework to learn these vision-and-language connections. |
515 | Phrase Grounding by Soft-Label Chain Conditional Random Field | Jiacheng Liu, Julia Hockenmaier | In this paper, we formulate phrase grounding as a sequence labeling task where we treat candidate regions as potential labels, and use neural chain Conditional Random Fields (CRFs) to model dependencies among regions for adjacent mentions. |
516 | What You See is What You Get: Visual Pronoun Coreference Resolution in Dialogues | Xintong Yu, Hongming Zhang, Yangqiu Song, Yan Song, Changshui Zhang | To tackle this challenge, in this paper, we formally define the task of visual-aware pronoun coreference resolution (PCR) and introduce VisPro, a large-scale dialogue PCR dataset, to investigate whether and how the visual information can help resolve pronouns in dialogues. |
517 | YouMakeup: A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension | Weiying Wang, Yongcheng Wang, Shizhe Chen, Qin Jin | In this work, we introduce “YouMakeup”, a large-scale multimodal instructional video dataset to support fine-grained semantic comprehension research in a specific domain. |
518 | DEBUG: A Dense Bottom-Up Grounding Approach for Natural Language Video Localization | Chujie Lu, Long Chen, Chilie Tan, Xiaolin Li, Jun Xiao | In this paper, we focus on natural language video localization: localizing (i.e., grounding) a natural language description in a long and untrimmed video sequence. |
519 | CrossWeigh: Training Named Entity Tagger from Imperfect Annotations | Zihan Wang, Jingbo Shang, Liyuan Liu, Lihao Lu, Jiacheng Liu, Jiawei Han | In this study, we dive deep into one of the widely-adopted NER benchmark datasets, CoNLL03 NER. |
520 | A Little Annotation does a Lot of Good: A Study in Bootstrapping Low-resource Named Entity Recognizers | Aditi Chaudhary, Jiateng Xie, Zaid Sheikh, Graham Neubig, Jaime Carbonell | In this paper, we ask the question: given this recent progress, and some amount of human annotation, what is the most effective method for efficiently creating high-quality entity recognizers in under-resourced languages? |
521 | Open Domain Web Keyphrase Extraction Beyond Language Modeling | Lee Xiong, Chuan Hu, Chenyan Xiong, Daniel Campos, Arnold Overwijk | To handle the variations of domain and content quality, we develop BLING-KPE, a neural keyphrase extraction model that goes beyond language understanding using visual presentations of documents and weak supervision from search queries. |
522 | TuckER: Tensor Factorization for Knowledge Graph Completion | Ivana Balazevic, Carl Allen, Timothy Hospedales | We propose TuckER, a relatively straightforward but powerful linear model based on Tucker decomposition of the binary tensor representation of knowledge graph triples. |
523 | Human-grounded Evaluations of Explanation Methods for Text Classification | Piyawat Lertvittayakumjorn, Francesca Toni | In this paper, we consider several model-agnostic and model-specific explanation methods for CNNs for text classification and conduct three human-grounded evaluations, focusing on different purposes of explanations: (1) revealing model behavior, (2) justifying model predictions, and (3) helping humans investigate uncertain predictions. |
524 | A Context-based Framework for Modeling the Role and Function of On-line Resource Citations in Scientific Literature | He Zhao, Zhunchen Luo, Chong Feng, Anqing Zheng, Xiaopeng Liu | In this paper, we propose a possible solution by using a multi-task framework to build the scientific resource classifier (SciResCLF) for jointly recognizing the role and function types. |
525 | Adversarial Reprogramming of Text Classification Neural Networks | Paarth Neekhara, Shehzeen Hussain, Shlomo Dubnov, Farinaz Koushanfar | In this work, we develop methods to repurpose text classification neural networks for alternate tasks without modifying the network architecture or parameters. |
526 | Document Hashing with Mixture-Prior Generative Models | Wei Dong, Qinliang Su, Dinghan Shen, Changyou Chen | In this paper, two mixture-prior generative models are proposed, with the objective of producing high-quality hashing codes for documents. |
527 | On Efficient Retrieval of Top Similarity Vectors | Shulong Tan, Zhixin Zhou, Zhaozhuo Xu, Ping Li | In this paper, we demonstrate an efficient method for searching vectors via a typical non-metric matching function: inner product. |
528 | Multiplex Word Embeddings for Selectional Preference Acquisition | Hongming Zhang, Jiaxin Bai, Yan Song, Kun Xu, Changlong Yu, Yangqiu Song, Wilfred Ng, Dong Yu | Therefore, in this paper, we propose a multiplex word embedding model, which can be easily extended according to various relations among words. |
529 | MulCode: A Multiplicative Multi-way Model for Compressing Neural Language Model | Yukun Ma, Patrick H. Chen, Cho-Jui Hsieh | To compress these embedding layers, we propose MulCode, a novel multi-way multiplicative neural compressor. |
530 | It’s All in the Name: Mitigating Gender Bias with Name-Based Counterfactual Data Substitution | Rowan Hall Maudslay, Hila Gonen, Ryan Cotterell, Simone Teufel | We propose two improvements to CDA: Counterfactual Data Substitution (CDS), a variant of CDA in which potentially biased text is randomly substituted to avoid duplication, and the Names Intervention, a novel name-pairing technique that vastly increases the number of words being treated. |
531 | Examining Gender Bias in Languages with Grammatical Gender | Pei Zhou, Weijia Shi, Jieyu Zhao, Kuan-Hao Huang, Muhao Chen, Ryan Cotterell, Kai-Wei Chang | In this paper, we propose new metrics for evaluating gender bias in word embeddings of these languages and further demonstrate evidence of gender bias in bilingual embeddings which align these languages with English. |
532 | Weakly Supervised Cross-lingual Semantic Relation Classification via Knowledge Distillation | Yogarshi Vyas, Marine Carpuat | We introduce a cross-lingual relation classifier trained only with English examples and a bilingual dictionary. |
533 | Improved Word Sense Disambiguation Using Pre-Trained Contextualized Word Representations | Christian Hadiwinoto, Hwee Tou Ng, Wee Chung Gan | In this paper, we explore different strategies of integrating pre-trained contextualized word representations and our best strategy achieves accuracies exceeding the best prior published accuracies by significant margins on multiple benchmark WSD datasets. |
534 | Do NLP Models Know Numbers? Probing Numeracy in Embeddings | Eric Wallace, Yizhong Wang, Sujian Li, Sameer Singh, Matt Gardner | We begin by investigating the numerical reasoning capabilities of a state-of-the-art question answering model on the DROP dataset. We find this model excels on questions that require numerical reasoning, i.e., it already captures numeracy. |
535 | A Split-and-Recombine Approach for Follow-up Query Analysis | Qian Liu, Bei Chen, Haoyan Liu, Jian-Guang LOU, Lei Fang, Bin Zhou, Dongmei Zhang | To leverage the advances in context-independent semantic parsing, we propose to perform follow-up query analysis, aiming to restate context-dependent natural language queries with contextual information. |
536 | Text2Math: End-to-end Parsing Text into Math Expressions | Yanyan Zou, Wei Lu | We propose Text2Math, a model for semantically parsing text into math expressions. |
537 | Editing-Based SQL Query Generation for Cross-Domain Context-Dependent Questions | Rui Zhang, Tao Yu, Heyang Er, Sungrok Shim, Eric Xue, Xi Victoria Lin, Tianze Shi, Caiming Xiong, Richard Socher, Dragomir Radev | Based on the observation that adjacent natural language questions are often linguistically dependent and their corresponding SQL queries tend to overlap, we utilize the interaction history by editing the previous predicted query to improve the generation quality. |
538 | Syntax-aware Multilingual Semantic Role Labeling | Shexia He, Zuchao Li, Hai Zhao | Unlike existing work, we propose a novel method guided by syntactic rules to prune arguments, which enables us to integrate syntax into a multilingual SRL model simply and effectively. |
539 | Cloze-driven Pretraining of Self-attention Networks | Alexei Baevski, Sergey Edunov, Yinhan Liu, Luke Zettlemoyer, Michael Auli | We present a new approach for pretraining a bi-directional transformer model that provides significant performance gains across a variety of language understanding problems. |
540 | Bridging the Gap between Relevance Matching and Semantic Matching for Short Text Similarity Modeling | Jinfeng Rao, Linqing Liu, Yi Tay, Wei Yang, Peng Shi, Jimmy Lin | To bridge this gap, we propose a novel model, HCAN (Hybrid Co-Attention Network), that comprises (1) a hybrid encoder module that includes ConvNet-based and LSTM-based encoders, (2) a relevance matching module that measures soft term matches with importance weighting at multiple granularities, and (3) a semantic matching module with co-attention mechanisms that capture context-aware semantic relatedness. |
541 | A Syntax-aware Multi-task Learning Framework for Chinese Semantic Role Labeling | Qingrong Xia, Zhenghua Li, Min Zhang | In this paper, we adopt a simple unified span-based model for both span-based and word-based Chinese SRL as a strong baseline. |
542 | Transfer Fine-Tuning: A BERT Case Study | Yuki Arase, Jun’ichi Tsujii | Herein, we propose to inject phrasal paraphrase relations into BERT in order to generate suitable representations for semantic equivalence assessment instead of increasing the model size. |
543 | Data-Anonymous Encoding for Text-to-SQL Generation | Zhen Dong, Shizhao Sun, Hongzhi Liu, Jian-Guang Lou, Dongmei Zhang | In this work, we propose a more efficient approach to handle table-related tokens before the semantic parser. |
544 | Capturing Argument Interaction in Semantic Role Labeling with Capsule Networks | Xinchi Chen, Chunchuan Lyu, Ivan Titov | We propose a new approach to modeling these interactions while maintaining efficient inference. |
545 | Learning Programmatic Idioms for Scalable Semantic Parsing | Srinivasan Iyer, Alvin Cheung, Luke Zettlemoyer | In this paper, we introduce an iterative method to extract code idioms from large source code corpora by repeatedly collapsing most-frequent depth-2 subtrees of their syntax trees, and train semantic parsers to apply these idioms during decoding. (A toy sketch of the collapsing step follows this table.) |
546 | JuICe: A Large Scale Distantly Supervised Dataset for Open Domain Context-based Code Generation | Rajas Agashe, Srinivasan Iyer, Luke Zettlemoyer | To study code generation conditioned on a long context history, we present JuICe, a corpus of 1.5 million examples with a curated test set of 3.7K instances based on online programming assignments. |
547 | Model-based Interactive Semantic Parsing: A Unified Framework and A Text-to-SQL Case Study | Ziyu Yao, Yu Su, Huan Sun, Wen-tau Yih | In this paper, we propose a new, unified formulation of the interactive semantic parsing problem, where the goal is to design a model-based intelligent agent. |
548 | Modeling Graph Structure in Transformer for Better AMR-to-Text Generation | Jie Zhu, Junhui Li, Muhua Zhu, Longhua Qian, Min Zhang, Guodong Zhou | In this paper we eliminate such a strong limitation and propose a novel structure-aware self-attention approach to better model the relations between indirectly connected concepts in the state-of-the-art seq2seq model, i.e. the Transformer. |
549 | Syntax-Aware Aspect Level Sentiment Classification with Graph Attention Networks | Binxuan Huang, Kathleen Carley | In this paper, we propose a novel target-dependent graph attention network (TD-GAT) for aspect level sentiment classification, which explicitly utilizes the dependency relationship among words. |
550 | Learning Explicit and Implicit Structures for Targeted Sentiment Analysis | Hao Li, Wei Lu | In this work, we argue that both types of information (implicit and explicit structural information) are crucial for building a successful targeted sentiment analysis model. |
551 | Capsule Network with Interactive Attention for Aspect-Level Sentiment Classification | Chunning Du, Haifeng Sun, Jingyu Wang, Qi Qi, Jianxin Liao, Tong Xu, Ming Liu | To solve this problem, we propose to utilize capsule network to construct vector-based feature representation and cluster features by an EM routing algorithm. |
552 | Emotion Detection with Neural Personal Discrimination | Xiabing Zhou, Zhongqing Wang, Shoushan Li, Guodong Zhou, Min Zhang | Accordingly, we propose a Neural Personal Discrimination (NPD) approach to address the above challenges by determining personal attributes from posts, and connecting relevant posts with similar attributes to jointly learn their emotions. |
553 | Specificity-Driven Cascading Approach for Unsupervised Sentiment Modification | Pengcheng Yang, Junyang Lin, Jingjing Xu, Jun Xie, Qi Su, Xu Sun | To remedy this, we propose a specificity-driven cascading approach in this work, which can effectively increase the specificity of the generated text and further improve content preservation. |
554 | LexicalAT: Lexical-Based Adversarial Reinforcement Training for Robust Sentiment Classification | Jingjing Xu, Liang Zhao, Hanqi Yan, Qi Zeng, Yun Liang, Xu Sun | In this work, we propose a novel adversarial training approach, LexicalAT, to improve the robustness of current classification models. |
555 | Leveraging Structural and Semantic Correspondence for Attribute-Oriented Aspect Sentiment Discovery | Zhe Zhang, Munindar Singh | We propose Trait, an unsupervised probabilistic model that discovers aspects and sentiments from text and associates them with different attributes. |
556 | From the Token to the Review: A Hierarchical Multimodal approach to Opinion Mining | Alexandre Garcia, Pierre Colombo, Florence d’Alché-Buc, Slim Essid, Chloé Clavel | In this work we aim to bridge the gap between fine-grained opinion models already developed for written language and coarse-grained models developed for spontaneous multimodal opinion mining. |
557 | Shallow Domain Adaptive Embeddings for Sentiment Analysis | Prathusha K Sarma, Yingyu Liang, William Sethares | This paper proposes a way to improve the performance of existing algorithms for text classification in domains with strong language semantics. |
558 | Domain-Invariant Feature Distillation for Cross-Domain Sentiment Classification | Mengting Hu, Yike Wu, Shiwan Zhao, Honglei Guo, Renhong Cheng, Zhong Su | In this paper, we focus on aspect-level cross-domain sentiment classification, and propose to distill the domain-invariant sentiment features with the help of an orthogonal domain-dependent task, i.e. aspect detection, which is built on the aspects varying widely in different domains. |
559 | A Novel Aspect-Guided Deep Transition Model for Aspect Based Sentiment Analysis | Yunlong Liang, Fandong Meng, Jinchao Zhang, Jinan Xu, Yufeng Chen, Jie Zhou | In this paper, we propose a novel Aspect-Guided Deep Transition model, named AGDT, which utilizes the given aspect to guide the sentence encoding from scratch with the specially-designed deep transition architecture. |
560 | Human-Like Decision Making: Document-level Aspect Sentiment Classification via Hierarchical Reinforcement Learning | Jingjing Wang, Changlong Sun, Shoushan Li, Jiancheng Wang, Luo Si, Min Zhang, Xiaozhong Liu, Guodong Zhou | In this paper, to simulate the steps by which humans analyze aspect sentiment in a document, we propose a new Hierarchical Reinforcement Learning (HRL) approach to DASC. |
561 | A Dataset of General-Purpose Rebuttal | Matan Orbach, Yonatan Bilu, Ariel Gera, Yoav Kantor, Lena Dankin, Tamar Lavee, Lili Kotlerman, Shachar Mirkin, Michal Jacovi, Ranit Aharonov, Noam Slonim | Here we present a novel task of producing a critical response to a long argumentative text, and suggest a method based on general rebuttal arguments to address it. |
562 | Rethinking Attribute Representation and Injection for Sentiment Classification | Reinald Kim Amplayo | The de facto standard method is to incorporate them as additional biases in the attention mechanism, and more performance gains are achieved by extending the model architecture. In this paper, we show that the above method is the least effective way to represent and inject attributes. |
563 | A Knowledge Regularized Hierarchical Approach for Emotion Cause Analysis | Chuang Fan, Hongyu Yan, Jiachen Du, Lin Gui, Lidong Bing, Min Yang, Ruifeng Xu, Ruibin Mao | In this paper, we propose a new method to extract emotion cause with a hierarchical neural model and knowledge-based regularizations, which aims to incorporate discourse context information and restrain the parameters by sentiment lexicon and common knowledge. |
564 | Automatic Argument Quality Assessment – New Datasets and Methods | Assaf Toledo, Shai Gretz, Edo Cohen-Karlik, Roni Friedman, Elad Venezian, Dan Lahav, Michal Jacovi, Ranit Aharonov, Noam Slonim | We explore the task of automatic assessment of argument quality. |
565 | Fine-Grained Analysis of Propaganda in News Article | Giovanni Da San Martino, Seunghak Yu, Alberto Barrón-Cedeño, Rostislav Petrov, Preslav Nakov | To overcome these limitations, we propose a novel task: performing fine-grained analysis of texts by detecting all fragments that contain propaganda techniques as well as their type. |
566 | Context-aware Interactive Attention for Multi-modal Sentiment and Emotion Analysis | Dushyant Singh Chauhan, Md Shad Akhtar, Asif Ekbal, Pushpak Bhattacharyya | In this paper, we introduce a recurrent neural network based approach for multi-modal sentiment and emotion analysis. The proposed model learns the inter-modal interaction among the participating modalities through an auto-encoder mechanism. |
567 | Sequential Learning of Convolutional Features for Effective Text Classification | Avinash Madasu, Vijjini Anvesh Rao | In this paper, we present an experimental study on the fundamental blocks of CNNs in text categorization. |
568 | The Role of Pragmatic and Discourse Context in Determining Argument Impact | Esin Durmus, Faisal Ladhak, Claire Cardie | This paper presents a new dataset to initiate the study of this aspect of argumentation: it consists of a diverse collection of arguments covering 741 controversial topics and comprising over 47,000 claims. |
569 | Aspect-Level Sentiment Analysis Via Convolution over Dependency Tree | Kai Sun, Richong Zhang, Samuel Mensah, Yongyi Mao, Xudong Liu | We propose a method based on neural networks to identify the sentiment polarity of opinion words expressed on a specific aspect of a sentence. |
570 | Understanding Data Augmentation in Neural Machine Translation: Two Perspectives towards Generalization | Guanlin Li, Lemao Liu, Guoping Huang, Conghui Zhu, Tiejun Zhao | Based on the observation, this paper makes an initial attempt to answer a fundamental question: what benefits, which are consistent across different methods and tasks, does DA in general obtain? |
571 | Simple and Effective Noisy Channel Modeling for Neural Machine Translation | Kyra Yee, Yann Dauphin, Michael Auli | We pursue an alternative approach based on standard sequence to sequence models which utilize the entire source. |
572 | MultiFiT: Efficient Multi-lingual Language Model Fine-tuning | Julian Eisenschlos, Sebastian Ruder, Piotr Czapla, Marcin Kardas, Sylvain Gugger, Jeremy Howard | We propose Multi-lingual language model Fine-Tuning (MultiFiT) to enable practitioners to train and fine-tune language models efficiently in their own language. |
573 | Hint-Based Training for Non-Autoregressive Machine Translation | Zhuohan Li, Zi Lin, Di He, Fei Tian, Tao Qin, Liwei Wang, Tie-Yan Liu | In this paper, we propose a novel approach to leveraging the hints from hidden states and word alignments to help the training of NART models. |
574 | Working Hard or Hardly Working: Challenges of Integrating Typology into Neural Dependency Parsers | Adam Fisch, Jiang Guo, Regina Barzilay | This paper explores the task of leveraging typology in the context of cross-lingual dependency parsing. |
575 | Cross-Lingual BERT Transformation for Zero-Shot Dependency Parsing | Yuxuan Wang, Wanxiang Che, Jiang Guo, Yijia Liu, Ting Liu | We propose Cross-Lingual BERT Transformation (CLBT), a simple and efficient approach to generate cross-lingual contextualized word embeddings based on publicly available pre-trained BERT models (Devlin et al., 2018). |
576 | Multilingual Grammar Induction with Continuous Language Identification | Wenjuan Han, Ge Wang, Yong Jiang, Kewei Tu | In this work, we propose a novel universal grammar induction approach that represents language identities with continuous vectors and employs a neural network to predict grammar parameters based on the representation. |
577 | Quantifying the Semantic Core of Gender Systems | Adina Williams, Damian Blasi, Lawrence Wolf-Sonkin, Hanna Wallach, Ryan Cotterell | In this work, we present the first large-scale investigation of the arbitrariness of gender assignment that uses canonical correlation analysis as a method for correlating the gender of inanimate nouns with their lexical semantic meaning. |
578 | Perturbation Sensitivity Analysis to Detect Unintended Model Biases | Vinodkumar Prabhakaran, Ben Hutchinson, Margaret Mitchell | Based on this idea, we propose a generic evaluation framework, Perturbation Sensitivity Analysis, which detects unintended model biases related to named entities, and requires no new annotations or corpora. |
579 | Automatically Inferring Gender Associations from Language | Serina Chang, Kathy McKeown | In this paper, we pose the question: do people talk about women and men in different ways? We introduce two datasets and a novel integration of approaches for automatically inferring gender associations from language, discovering coherent word clusters, and labeling the clusters for the semantic concepts they represent. |
580 | Reporting the Unreported: Event Extraction for Analyzing the Local Representation of Hate Crimes | Aida Mostafazadeh Davani, Leigh Yeh, Mohammad Atari, Brendan Kennedy, Gwenyth Portillo Wightman, Elaine Gonzalez, Natalie Delong, Rhea Bhatia, Arineh Mirinjian, Xiang Ren, Morteza Dehghani | Here, we first demonstrate that event extraction and multi-instance learning, applied to a corpus of local news articles, can be used to predict instances of hate crime. We then use the trained model to detect incidents of hate in cities for which the FBI lacks statistics. |
581 | Minimally Supervised Learning of Affective Events Using Discourse Relations | Jun Saito, Yugo Murawaki, Sadao Kurohashi | In this paper, we propose to propagate affective polarity using discourse relations. |
582 | Event Detection with Multi-Order Graph Convolution and Aggregated Attention | Haoran Yan, Xiaolong Jin, Xiangbin Meng, Jiafeng Guo, Xueqi Cheng | For this reason, this paper proposes a new method for event detection, which uses a dependency tree based graph convolution network with aggregative attention to explicitly model and aggregate multi-order syntactic representations in sentences. |
583 | Coverage of Information Extraction from Sentences and Paragraphs | Simon Razniewski, Nitisha Jain, Paramita Mirza, Gerhard Weikum | In this paper we discuss the importance of scalar implicatures in the context of textual information extraction. |
584 | HMEAE: Hierarchical Modular Event Argument Extraction | Xiaozhi Wang, Ziqi Wang, Xu Han, Zhiyuan Liu, Juanzi Li, Peng Li, Maosong Sun, Jie Zhou, Xiang Ren | In this paper, we propose a Hierarchical Modular Event Argument Extraction (HMEAE) model, to provide effective inductive bias from the concept hierarchy of event argument roles. |
585 | Entity, Relation, and Event Extraction with Contextualized Span Representations | David Wadden, Ulme Wennberg, Yi Luan, Hannaneh Hajishirzi | We examine the capabilities of a unified, multi-task framework for three information extraction tasks: named entity recognition, relation extraction, and event extraction. |
586 | Next Sentence Prediction helps Implicit Discourse Relation Classification within and across Domains | Wei Shi, Vera Demberg | We here show that this shortcoming can be effectively addressed by using the bidirectional encoder representation from transformers (BERT) proposed by Devlin et al. (2019), which was trained on a next-sentence prediction task and thus encodes a representation of likely next sentences. |
587 | Split or Merge: Which is Better for Unsupervised RST Parsing? | Naoki Kobayashi, Tsutomu Hirao, Kengo Nakamura, Hidetaka Kamigaito, Manabu Okumura, Masaaki Nagata | In this paper, we present two language-independent unsupervised RST parsing methods based on dynamic programming. |
588 | BERT for Coreference Resolution: Baselines and Analysis | Mandar Joshi, Omer Levy, Luke Zettlemoyer, Daniel Weld | We apply BERT to coreference resolution, achieving a new state of the art on the GAP (+11.5 F1) and OntoNotes (+3.9 F1) benchmarks. |
589 | Linguistic Versus Latent Relations for Modeling Coherent Flow in Paragraphs | Dongyeop Kang, Eduard Hovy | In order to produce a coherent flow of text, we explore two forms of intersentential relations in a paragraph: one is a human-created linguistic relation that forms a structure (e.g., discourse tree) and the other is a relation from latent representations learned from the sentences themselves. |
590 | Event Causality Recognition Exploiting Multiple Annotators’ Judgments and Background Knowledge | Kazuma Kadowaki, Ryu Iida, Kentaro Torisawa, Jong-Hoon Oh, Julien Kloetzer | We propose new BERT-based methods for recognizing event causality such as “smoke cigarettes” → “die of lung cancer” written in web texts. |
591 | What Part of the Neural Network Does This? Understanding LSTMs by Measuring and Dissecting Neurons | Ji Xin, Jimmy Lin, Yaoliang Yu | We find inspiration from biologists and study the affinity between individual neurons and labels, propose a novel metric to quantify the sensitivity of neurons to each label, and conduct experiments to show the validity of our proposed metric. |
592 | Quantity doesn’t buy quality syntax with neural language models | Marten van Schijndel, Aaron Mueller, Tal Linzen | We investigate to what extent these shortcomings can be mitigated by increasing the size of the network and the corpus on which it is trained. |
593 | Higher-order Comparisons of Sentence Encoder Representations | Mostafa Abdou, Artur Kulmizev, Felix Hill, Daniel M. Low, Anders Søgaard | We demonstrate the utility of RSA by establishing a previously unknown correspondence between widely-employed pretrained language encoders and human processing difficulty via eye-tracking data, showcasing its potential in the interpretability toolbox for neural models. |
594 | Text Genre and Training Data Size in Human-like Parsing | John Hale, Adhiguna Kuncoro, Keith Hall, Chris Dyer, Jonathan Brennan | Domain-specific training typically makes NLP systems work better. We show that this extends to cognitive modeling as well by relating the states of a neural phrase-structure parser to electrophysiological measures from human participants. |
595 | Feature2Vec: Distributional semantic modelling of human property knowledge | Steven Derby, Paul Miller, Barry Devereux | We propose a method for mapping human property knowledge onto a distributional semantic space, which adapts the word2vec architecture to the task of modelling concept features. |
596 | Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation | Arijit Ray, Karan Sikka, Ajay Divakaran, Stefan Lee, Giedrius Burachas | In this work, we introduce a dataset, ConVQA, and metrics that enable quantitative evaluation of consistency in VQA. Further, we propose a consistency-improving data augmentation module, a Consistency Teacher Module (CTM). |
597 | GeoSQA: A Benchmark for Scenario-based Question Answering in the Geography Domain at High School Level | Zixian Huang, Yulin Shen, Xiao Li, Yu’ang Wei, Gong Cheng, Lin Zhou, Xinyu Dai, Yuzhong Qu | In this paper, we introduce the GeoSQA dataset. |
598 | Revisiting the Evaluation of Theory of Mind through Question Answering | Matthew Le, Y-Lan Boureau, Maximilian Nickel | In this work, we revisit the evaluation of theory of mind through question answering. |
599 | Multi-passage BERT: A Globally Normalized BERT Model for Open-domain Question Answering | Zhiguo Wang, Patrick Ng, Xiaofei Ma, Ramesh Nallapati, Bing Xiang | To tackle this issue, we propose a multi-passage BERT model to globally normalize answer scores across all passages of the same question; this change enables our QA model to find better answers by utilizing more passages. (A minimal sketch of the global normalization follows this table.) |
600 | A Span-Extraction Dataset for Chinese Machine Reading Comprehension | Yiming Cui, Ting Liu, Wanxiang Che, Li Xiao, Zhipeng Chen, Wentao Ma, Shijin Wang, Guoping Hu | In this paper, we introduce a Span-Extraction dataset for Chinese machine reading comprehension to add language diversities in this area. |
601 | MICRON: Multigranular Interaction for Contextualizing RepresentatiON in Non-factoid Question Answering | Hojae Han, Seungtaek Choi, Haeju Park, Seung-won Hwang | Specifically, we propose MICRON: Multigranular Interaction for Contextualizing RepresentatiON, a novel approach which derives contextualized uni-gram representation from n-grams. |
602 | Machine Reading Comprehension Using Structural Knowledge Graph-aware Network | Delai Qiu, Yuanzhe Zhang, Xinwei Feng, Xiangwen Liao, Wenbin Jiang, Yajuan Lyu, Kang Liu, Jun Zhao | To this end, we propose a Structural Knowledge Graph-aware Network (SKG) model, constructing sub-graphs for entities in the machine comprehension context. |
603 | Answering Conversational Questions on Structured Data without Logical Forms | Thomas Mueller, Francesco Piccinno, Peter Shaw, Massimo Nicosia, Yasemin Altun | We present a novel approach to answering sequential questions based on structured objects such as knowledge bases or tables without using a logical form as an intermediate representation. |
604 | Improving Answer Selection and Answer Triggering using Hard Negatives | Sawan Kumar, Shweta Garg, Kartik Mehta, Nikhil Rasiwasia | In this paper, we establish the effectiveness of using hard negatives, coupled with a siamese network and a suitable loss function, for the tasks of answer selection and answer triggering. |
605 | Can You Unpack That? Learning to Rewrite Questions-in-Context | Ahmed Elgohary, Denis Peskov, Jordan Boyd-Graber | We introduce the task of question-in-context rewriting: given the context of a conversation’s history, rewrite a context-dependent question into a self-contained question with the same answer. |
606 | Quoref: A Reading Comprehension Dataset with Questions Requiring Coreferential Reasoning | Pradeep Dasigi, Nelson F. Liu, Ana Marasovic, Noah A. Smith, Matt Gardner | We present a new crowdsourced dataset containing more than 24K span-selection questions that require resolving coreference among entities in over 4.7K English paragraphs from Wikipedia. |
607 | Zero-shot Reading Comprehension by Cross-lingual Transfer Learning with Multi-lingual Language Representation Model | Tsung-Yuan Hsu, Chi-Liang Liu, Hung-yi Lee | In this paper, we systematically explore zero-shot cross-lingual transfer learning on reading comprehension tasks with language representation model pre-trained on multi-lingual corpus. |
608 | QuaRTz: An Open-Domain Dataset of Qualitative Relationship Questions | Oyvind Tafjord, Matt Gardner, Kevin Lin, Peter Clark | We introduce the first open-domain dataset, called QuaRTz, for reasoning about textual qualitative relationships. |
609 | Giving BERT a Calculator: Finding Operations and Arguments with Reading Comprehension | Daniel Andor, Luheng He, Kenton Lee, Emily Pitler | We enable a BERT-based reading comprehension model to perform lightweight numerical reasoning. |
610 | A Gated Self-attention Memory Network for Answer Selection | Tuan Lai, Quan Hung Tran, Trung Bui, Daisuke Kihara | In this work, we take a departure from the popular Compare-Aggregate architecture, and instead, propose a new gated self-attention memory network for the task. |
611 | Polly Want a Cracker: Analyzing Performance of Parroting on Paraphrase Generation Datasets | Hong-Ren Mao, Hung-Yi Lee | In this paper, we analyze datasets commonly used for paraphrase generation research, and show that simply parroting input sentences surpasses state-of-the-art models in the literature when evaluated on standard metrics. |
612 | Query-focused Sentence Compression in Linear Time | Abram Handler, Brendan O’Connor | This work introduces a new transition-based sentence compression technique developed for such settings. |
613 | Generating Personalized Recipes from Historical User Preferences | Bodhisattwa Prasad Majumder, Shuyang Li, Jianmo Ni, Julian McAuley | We propose a new task of personalized recipe generation to help these users: expanding a name and incomplete ingredient details into complete natural-text instructions aligned with the user’s historical preferences. |
614 | Generating Highly Relevant Questions | Jiazuo Qiu, Deyi Xiong | The neural seq2seq based question generation (QG) is prone to generating generic and undiversified questions that are poorly relevant to the given passage and target answer. In this paper, we propose two methods to address the issue. |
615 | Improving Neural Story Generation by Targeted Common Sense Grounding | Huanru Henry Mao, Bodhisattwa Prasad Majumder, Julian McAuley, Garrison Cottrell | We propose a simple multi-task learning scheme to achieve quantitatively better common sense reasoning in language models by leveraging auxiliary training signals from datasets designed to provide common sense grounding. |
616 | Abstract Text Summarization: A Low Resource Challenge | Shantipriya Parida, Petr Motlicek | We propose an iterative data augmentation approach which uses synthetic data along with the real summarization data for the German language. |
617 | Generating Modern Poetry Automatically in Finnish | Mika Hämäläinen, Khalid Alnajjar | We present a novel approach for generating poetry automatically for the morphologically rich Finnish language by using a genetic algorithm. |
618 | SUM-QE: a BERT-based Summary Quality Estimation Model | Stratos Xenouleas, Prodromos Malakasiotis, Marianna Apidianaki, Ion Androutsopoulos | We propose SUM-QE, a novel Quality Estimation model for summarization based on BERT. |
619 | An Empirical Comparison on Imitation Learning and Reinforcement Learning for Paraphrase Generation | Wanyu Du, Yangfeng Ji | In this work, we present an empirical study on how RL and IL can help boost the performance of generating paraphrases, with the pointer-generator as a base model. |
620 | Countering the Effects of Lead Bias in News Summarization via Multi-Stage Training and Auxiliary Losses | Matt Grenander, Yue Dong, Jackie Chi Kit Cheung, Annie Louis | We propose two techniques to make systems sensitive to the importance of content in different parts of the article. |
621 | Learning Rhyming Constraints using Structured Adversaries | Harsh Jhamtani, Sanket Vaibhav Mehta, Jaime Carbonell, Taylor Berg-Kirkpatrick | We propose an alternate approach that uses a structured discriminator to learn a poetry generator that directly captures rhyming constraints in a generative adversarial setup. |
622 | Question-type Driven Question Generation | Wenjie Zhou, Minghua Zhang, Yunfang Wu | We propose to automatically predict the question type based on the input answer and context. |
623 | Deep Reinforcement Learning with Distributional Semantic Rewards for Abstractive Summarization | Siyao Li, Deren Lei, Pengda Qin, William Yang Wang | In this paper, instead of Rouge-L, we explore the practicability of utilizing the distributional semantics to measure the matching degrees. |
624 | Clause-Wise and Recursive Decoding for Complex and Cross-Domain Text-to-SQL Generation | Dongjun Lee | In this paper, we propose a SQL clause-wise decoding neural architecture with a self-attention based database schema encoder to address the Spider task. |
625 | Do Nuclear Submarines Have Nuclear Captains? A Challenge Dataset for Commonsense Reasoning over Adjectives and Objects | James Mullenbach, Jonathan Gordon, Nanyun Peng, Jonathan May | To attack this challenge, we crowdsource a set of human judgments that answer the English-language question “Given a whole described by an adjective, does the adjective also describe a given part?” |
626 | Aggregating Bidirectional Encoder Representations Using MatchLSTM for Sequence Matching | Bo Shao, Yeyun Gong, Weizhen Qi, Nan Duan, Xiaola Lin | In this work, we propose an aggregation method to combine the Bidirectional Encoder Representations from Transformer (BERT) with a MatchLSTM layer for Sequence Matching. |
627 | What Does This Word Mean? Explaining Contextualized Embeddings with Natural Language Definition | Ting-Yun Chang, Yun-Nung Chen | To further investigate what contextualized word embeddings capture, this paper analyzes whether they can indicate the corresponding sense definitions and proposes a general framework that is capable of explaining word meanings given contextualized word embeddings for better interpretation. |
628 | Pre-Training BERT on Domain Resources for Short Answer Grading | Chul Sung, Tejas Dhamecha, Swarnadeep Saha, Tengfei Ma, Vinay Reddy, Rishi Arora | In this paper, we explore ways of improving the pre-trained contextual representations for the task of automatic short answer grading, a critical component of intelligent tutoring systems. |
629 | WIQA: A dataset for “What if…” reasoning over procedural text | Niket Tandon, Bhavana Dalvi, Keisuke Sakaguchi, Peter Clark, Antoine Bosselut | We introduce WIQA, the first large-scale dataset of “What if…” questions over procedural text. |
630 | Evaluating BERT for natural language inference: A case study on the CommitmentBank | Nanjiang Jiang, Marie-Catherine de Marneffe | We address this problem by recasting the CommitmentBank for NLI, which contains items involving reasoning over the extent to which a speaker is committed to complements of clause-embedding verbs under entailment-canceling environments (conditional, negation, modal and question). |
631 | Incorporating Domain Knowledge into Medical NLI using Knowledge Graphs | Soumya Sharma, Bishal Santra, Abhik Jana, Santosh Tokala, Niloy Ganguly, Pawan Goyal | In this paper, we explore how to incorporate structured domain knowledge, available in the form of a knowledge graph (UMLS), for the Medical NLI task. |
632 | The FLORES Evaluation Datasets for Low-Resource Machine Translation: Nepali–English and Sinhala–English | Francisco Guzmán, Peng-Jen Chen, Myle Ott, Juan Pino, Guillaume Lample, Philipp Koehn, Vishrav Chaudhary, Marc’Aurelio Ranzato | In this work, we introduce the FLORES evaluation datasets for Nepali-English and Sinhala-English, based on sentences translated from Wikipedia. |
633 | Mask-Predict: Parallel Decoding of Conditional Masked Language Models | Marjan Ghazvininejad, Omer Levy, Yinhan Liu, Luke Zettlemoyer | We, instead, use a masked language modeling objective to train a model to predict any subset of the target words, conditioned on both the input text and a partially masked target translation. (A schematic decoding sketch follows this table.) |
634 | Learning to Copy for Automatic Post-Editing | Xuancheng Huang, Yang Liu, Huanbo Luan, Jingfang Xu, Maosong Sun | In this work, we propose a new method for modeling copying for APE. |
635 | Exploring Human Gender Stereotypes with Word Association Test | Yupei Du, Yuanbin Wu, Man Lan | In this work, we utilize word association test, which contains rich types of word connections annotated by human participants, to explore how gender stereotypes spread within our minds. |
636 | A Modular Architecture for Unsupervised Sarcasm Generation | Abhijit Mishra, Tarun Tater, Karthik Sankaranarayanan | In this paper, we propose a novel framework for sarcasm generation; the system takes a literal negative opinion as input and translates it into a sarcastic version. |
637 | Generating Classical Chinese Poems from Vernacular Chinese | Zhichao Yang, Pengshan Cai, Yansong Feng, Fei Li, Weijiang Feng, Elena Suet-Ying Chiu, Hong Yu | In this paper, we propose a novel task of generating classical Chinese poems from vernacular Chinese, which allows users to have more control over the semantics of generated poems. |
638 | Set to Ordered Text: Generating Discharge Instructions from Medical Billing Codes | Litton J Kurisinkel, Nancy Chen | We present set to ordered text, a natural language generation task applied to automatically generating discharge instructions from admission ICD (International Classification of Diseases) codes. |
639 | Constraint-based Learning of Phonological Processes | Shraddha Barke, Rose Kunkel, Nadia Polikarpova, Eric Meinhardt, Eric Bakovic, Leon Bergen | We present an unsupervised approach to learning human-readable descriptions of phonological processes from collections of related utterances. |
640 | Detect Camouflaged Spam Content via StoneSkipping: Graph and Text Joint Embedding for Chinese Character Variation Representation | Zhuoren Jiang, Zhe Gao, Guoxiu He, Yangyang Kang, Changlong Sun, Qiong Zhang, Luo Si, Xiaozhong Liu | This paper proposes a novel framework to jointly model Chinese variational, semantic, and contextualized representations for the Chinese text spam detection task. |
641 | An Attentive Fine-Grained Entity Typing Model with Latent Type Representation | Ying Lin, Heng Ji | We propose a fine-grained entity typing model with a novel attention mechanism and a hybrid type classifier. |
642 | An Improved Neural Baseline for Temporal Relation Extraction | Qiang Ning, Sanjay Subramanian, Dan Roth | This paper proposes a new neural system that achieves about 10% absolute improvement in accuracy over the previous best system (25% error reduction) on two benchmark datasets. |
643 | Improving Fine-grained Entity Typing with Entity Linking | Hongliang Dai, Donghong Du, Xin Li, Yangqiu Song | In this paper, we use entity linking to help with the fine-grained entity type classification process. |
644 | Combining Spans into Entities: A Neural Two-Stage Approach for Recognizing Discontiguous Entities | Bailin Wang, Wei Lu | In this work, we propose a neural two-stage approach to recognizing discontiguous and overlapping entities by decomposing this problem into two subtasks: 1) it first detects all the overlapping spans that either form entities on their own or present as segments of discontiguous entities, based on the representation of segmental hypergraph, 2) next it learns to combine these segments into discontiguous entities with a classifier, which filters out other incorrect combinations of segments. |
645 | Cross-Sentence N-ary Relation Extraction using Lower-Arity Universal Schemas | Kosuke Akimoto, Takuya Hiraoka, Kunihiko Sadamasa, Mathias Niepert | In this paper, we propose a novel approach to cross-sentence n-ary relation extraction based on universal schemas. |
646 | Gazetteer-Enhanced Attentive Neural Networks for Named Entity Recognition | Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun, Bin Dong, Shanshan Jiang | To alleviate this problem, this paper proposes Gazetteer-Enhanced Attentive Neural Networks, which can enhance region-based NER by learning name knowledge of entity mentions from easily-obtainable gazetteers, rather than only from fully-annotated data. |
647 | “A Buster Keaton of Linguistics”: First Automated Approaches for the Extraction of Vossian Antonomasia | Michel Schwab, Robert Jäschke, Frank Fischer, Jannik Strötgen | In this paper, we propose a first method for the extraction of VAs that works completely automatically. |
648 | Multi-Task Learning for Chemical Named Entity Recognition with Chemical Compound Paraphrasing | Taiki Watanabe, Akihiro Tamura, Takashi Ninomiya, Takuya Makino, Tomoya Iwakura | We propose a method to improve named entity recognition (NER) for chemical compounds using multi-task learning by jointly training a chemical NER model and a chemical compound paraphrase model. |
649 | FewRel 2.0: Towards More Challenging Few-Shot Relation Classification | Tianyu Gao, Xu Han, Hao Zhu, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou | We present FewRel 2.0, a more challenging task to investigate two aspects of few-shot relation classification models: (1) Can they adapt to a new domain with only a handful of instances? |
650 | ner and pos when nothing is capitalized | Stephen Mayhew, Tatiana Tsygankova, Dan Roth | In this work, we perform a systematic analysis of solutions to this problem, modifying only the casing of the train or test data using lowercasing and truecasing methods. |
651 | CaRB: A Crowdsourced Benchmark for Open IE | Sangnie Bhardwaj, Samarth Aggarwal, Mausam Mausam | We contribute CaRB, an improved dataset and framework for testing Open IE systems. |
652 | Weakly Supervised Attention Networks for Entity Recognition | Barun Patra, Joel Ruben Antony Moniz | In this work, we aim to circumvent this requirement of word-level annotated data. |
653 | Revealing and Predicting Online Persuasion Strategy with Elementary Units | Gaku Morio, Ryo Egawa, Katsuhide Fujita | Our contributions are as follows: (1) annotating five types of EUs in a persuasive forum, the so-called ChangeMyView, (2) revealing both intuitive and non-intuitive strategic insights for the persuasion by analyzing 4612 annotated EUs, and (3) proposing baseline neural models that identify the EU boundary and type. |
654 | A Challenge Dataset and Effective Models for Aspect-Based Sentiment Analysis | Qingnan Jiang, Lei Chen, Ruifeng Xu, Xiang Ao, Min Yang | In this paper, we present a new large-scale Multi-Aspect Multi-Sentiment (MAMS) dataset, in which each sentence contains at least two different aspects with different sentiment polarities. |
655 | Learning with Noisy Labels for Sentence-level Sentiment Classification | Hao Wang, Bing Liu, Chaozhuo Li, Yan Yang, Tianrui Li | We propose a novel DNN model called NetAb (as shorthand for convolutional neural Networks with Ab-networks) to handle noisy labels during training. |
656 | DENS: A Dataset for Multi-class Emotion Analysis | Chen Liu, Muhammad Osama, Anderson De Andrade | We introduce a new dataset for multi-class emotion analysis from long-form narratives in English. |
657 | Multi-Task Stance Detection with Sentiment and Stance Lexicons | Yingjie Li, Cornelia Caragea | In this paper, we propose a multi-task framework that incorporates target-specific attention mechanism and at the same time takes sentiment classification as an auxiliary task. |
658 | A Robust Self-Learning Framework for Cross-Lingual Text Classification | Xin Dong, Gerard de Melo | In this paper, we present an elegantly simple robust self-learning framework to include unlabeled non-English samples in the fine-tuning process of pretrained multilingual representation models. |
659 | Learning to Flip the Sentiment of Reviews from Non-Parallel Corpora | Canasai Kruengkrai | We introduce a method for acquiring imperfectly aligned sentences from non-parallel corpora and propose a model that learns to minimize the sentiment and content losses in a fully end-to-end manner. |
660 | Label Embedding using Hierarchical Structure of Labels for Twitter Classification | Taro Miyazaki, Kiminobu Makino, Yuka Takei, Hiroki Okamoto, Jun Goto | Therefore, we propose a method that can consider the hierarchical structure of labels and label texts themselves. |
661 | Interpretable Word Embeddings via Informative Priors | Miriam Hurtado Bodell, Martin Arvidsson, Måns Magnusson | We propose the use of informative priors to create interpretable and domain-informed dimensions for probabilistic word embeddings. |
662 | Adversarial Removal of Demographic Attributes Revisited | Maria Barrett, Yova Kementchedjhieva, Yanai Elazar, Desmond Elliott, Anders Søgaard | We revisit their experiments and conduct a series of follow-up experiments showing that, in fact, the diagnostic classifier generalizes poorly to both new in-domain samples and new domains, indicating that it relies on correlations specific to their particular data sample. |
663 | A deep-learning framework to detect sarcasm targets | Jasabanta Patro, Srijan Bansal, Animesh Mukherjee | In this paper we propose a deep learning framework for sarcasm target detection in predefined sarcastic texts. |
664 | In Plain Sight: Media Bias Through the Lens of Factual Reporting | Lisa Fan, Marshall White, Eva Sharma, Ruisi Su, Prafulla Kumar Choubey, Ruihong Huang, Lu Wang | In this work, we investigate the effects of informational bias: factual content that can nevertheless be deployed to sway reader opinion. |
665 | Incorporating Label Dependencies in Multilabel Stance Detection | William Ferreira, Andreas Vlachos | In this paper, we address versions of the task in which an utterance can have multiple labels, thus corresponding to multilabel classification. |
666 | Investigating Sports Commentator Bias within a Large Corpus of American Football Broadcasts | Jack Merullo, Luke Yeh, Abram Handler, Alvin Grissom II, Brendan O’Connor, Mohit Iyyer | We identify major confounding factors for researchers examining racial bias in FOOTBALL, and perform a computational analysis that supports conclusions from prior social science studies. |
667 | Charge-Based Prison Term Prediction with Deep Gating Network | Huajie Chen, Deng Cai, Wei Dai, Zehui Dai, Yadong Ding | In this paper, we argue that charge-based prison term prediction (CPTP) not only better fits realistic needs, but also makes the total prison term prediction more accurate and interpretable. We collect the first large-scale structured data for CPTP and evaluate several competitive baselines. |
668 | Restoring ancient text using deep learning: a case study on Greek epigraphy | Yannis Assael, Thea Sommerschield, Jonathan Prag | This work presents Pythia, the first ancient text restoration model that recovers missing characters from a damaged text input using deep neural networks. |
669 | Embedding Lexical Features via Tensor Decomposition for Small Sample Humor Recognition | Zhenjie Zhao, Andrew Cattle, Evangelos Papalexakis, Xiaojuan Ma | We propose a novel tensor embedding method that can effectively extract lexical features for humor recognition. |
670 | EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks | Jason Wei, Kai Zou | We present EDA: easy data augmentation techniques for boosting performance on text classification tasks. (A minimal sketch of two EDA operations follows this table.) |
671 | Neural News Recommendation with Multi-Head Self-Attention | Chuhan Wu, Fangzhao Wu, Suyu Ge, Tao Qi, Yongfeng Huang, Xing Xie | In this paper, we propose a neural news recommendation approach with multi-head self-attention (NRMS). |
672 | What Matters for Neural Cross-Lingual Named Entity Recognition: An Empirical Analysis | Xiaolei Huang, Jonathan May, Nanyun Peng | In this paper, we first propose a simple and efficient neural architecture for cross-lingual NER. |
673 | Telling the Whole Story: A Manually Annotated Chinese Dataset for the Analysis of Humor in Jokes | Dongyu Zhang, Heting Zhang, Xikai Liu, Hongfei Lin, Feng Xia | We propose a novel annotation scheme to give scenarios of how humor arises in text. We therefore create a dataset on humor with 9,123 manually annotated jokes in Chinese. |
674 | Generating Natural Anagrams: Towards Language Generation Under Hard Combinatorial Constraints | Masaaki Nishino, Sho Takase, Tsutomu Hirao, Masaaki Nagata | In this paper, we show that simple depth-first search can yield natural anagrams when it is combined with modern neural language models. |
675 | STANCY: Stance Classification Based on Consistency Cues | Kashyap Popat, Subhabrata Mukherjee, Andrew Yates, Gerhard Weikum | In this work, we present a neural network model for stance classification leveraging BERT representations and augmenting them with a novel consistency constraint. |
676 | Cross-lingual intent classification in a low resource industrial setting | Talaat Khalil, Kornel Kiełczewski, Georgios Christos Chouliaras, Amina Keldibek, Maarten Versteegh | This paper explores different approaches to multilingual intent classification in a low resource setting. |
677 | SoftRegex: Generating Regex from Natural Language Descriptions using Softened Regex Equivalence | Jun-U Park, Sang-Ki Ko, Marco Cognetta, Yo-Sub Han | Since the regular expression equivalence problem is PSPACE-complete, we introduce the EQ_Reg model for computing the similarity of two regular expressions using deep neural networks. |
678 | Using Clinical Notes with Time Series Data for ICU Management | Swaraj Khadanga, Karan Aggarwal, Shafiq Joty, Jaideep Srivastava | We propose a method to model them jointly, achieving considerable improvement across benchmark tasks over baseline time-series model. |
679 | Spelling-Aware Construction of Macaronic Texts for Teaching Foreign-Language Vocabulary | Adithya Renduchintala, Philipp Koehn, Jason Eisner | We present a machine foreign-language teacher that modifies text in a student’s native language (L1) by replacing some word tokens with glosses in a foreign language (L2), in such a way that the student can acquire L2 vocabulary simply by reading the resulting macaronic text. |
680 | Towards Machine Reading for Interventions from Humanitarian-Assistance Program Literature | Bonan Min, Yee Seng Chan, Haoling Qiu, Joshua Fasching | In this paper, we developed a corpus annotated with interventions to foster research, and developed an information extraction system for extracting interventions and their location and time from text. |
681 | RUN through the Streets: A New Dataset and Baseline Models for Realistic Urban Navigation | Tzuf Paz-Argaman, Reut Tsarfaty | Here we introduce the Realistic Urban Navigation (RUN) task, aimed at interpreting NL navigation instructions based on a real, dense, urban map. Using Amazon Mechanical Turk, we collected a dataset of 2515 instructions aligned with actual routes over three regions of Manhattan. |
682 | Context-Aware Conversation Thread Detection in Multi-Party Chat | Ming Tan, Dakuo Wang, Yupeng Gao, Haoyu Wang, Saloni Potdar, Xiaoxiao Guo, Shiyu Chang, Mo Yu | In this work, we propose a novel Context-Aware Thread Detection (CATD) model that automatically disentangles these conversation threads. |
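For paper 522 (TuckER), the highlight names the core mechanism directly: a triple's plausibility is the contraction of a learned core tensor with the subject, relation, and object embeddings. Below is a minimal NumPy sketch of that scoring function only, using toy dimensions and random placeholders in place of learned parameters; the variable names are ours, not the paper's.

```python
import numpy as np

# Minimal sketch of a TuckER-style scoring function: the plausibility of a
# triple (s, r, o) is the core tensor W contracted with the subject, relation,
# and object embeddings. All values here are random toy placeholders; in the
# paper they are learned end-to-end.
d_e, d_r = 4, 3                          # entity / relation embedding sizes
rng = np.random.default_rng(0)
W = rng.normal(size=(d_e, d_r, d_e))     # core tensor of the Tucker decomposition
e_s = rng.normal(size=d_e)               # subject entity embedding
w_r = rng.normal(size=d_r)               # relation embedding
e_o = rng.normal(size=d_e)               # object entity embedding

# phi(s, r, o) = W x_1 e_s x_2 w_r x_3 e_o  (n-mode tensor products)
score = np.einsum("irj,i,r,j->", W, e_s, w_r, e_o)
print(float(score))                      # higher = more plausible triple
```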
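Paper 545 describes its idiom-extraction loop concretely: count depth-2 subtrees across a corpus of syntax trees, collapse the most frequent one into a single fused node, and repeat. The sketch below is a toy rendering of that loop under our own simplifying assumptions (trees as nested tuples, leaves as plain strings); it is not the authors' implementation.

```python
from collections import Counter

# Toy sketch of iterative depth-2 subtree collapsing. A tree is a nested
# tuple (label, child, child, ...); leaves are plain strings.

def depth2_subtrees(tree, counts):
    if isinstance(tree, str):
        return
    # a depth-2 fragment: this node's label plus its children's labels
    key = (tree[0], tuple(c if isinstance(c, str) else c[0] for c in tree[1:]))
    counts[key] += 1
    for child in tree[1:]:
        depth2_subtrees(child, counts)

def collapse(tree, key):
    if isinstance(tree, str):
        return tree
    label, kids = tree[0], [collapse(c, key) for c in tree[1:]]
    this = (label, tuple(c if isinstance(c, str) else c[0] for c in kids))
    if this == key:
        # fuse the matched fragment into one idiom node whose children are
        # the fragment's grandchildren (leaf children are kept as-is)
        fused_label = label + "(" + ",".join(this[1]) + ")"
        flat = []
        for c in kids:
            flat.extend([c] if isinstance(c, str) else list(c[1:]))
        return (fused_label, *flat)
    return (label, *kids)

def extract_idioms(trees, rounds=3):
    idioms = []
    for _ in range(rounds):
        counts = Counter()
        for t in trees:
            depth2_subtrees(t, counts)
        if not counts:
            break
        key, _ = counts.most_common(1)[0]  # most frequent depth-2 fragment
        idioms.append(key)
        trees = [collapse(t, key) for t in trees]
    return idioms

print(extract_idioms([("S", ("NP", "the", "cat"), ("VP", "sat"))], rounds=2))
```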
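Paper 599's key change fits in one line of math: instead of a per-passage softmax over candidate answer spans, apply a single softmax over the spans of every passage retrieved for the question, so scores become comparable across passages. A minimal sketch with made-up placeholder logits:

```python
import numpy as np

# Minimal sketch of global normalization across passages: one softmax over
# the candidate-span logits of ALL retrieved passages, rather than one
# softmax per passage. The logits below are illustrative placeholders.
span_logits = {
    "passage_1": np.array([2.1, 0.3]),        # candidate answer spans
    "passage_2": np.array([1.7, 1.5, 0.2]),
}

all_logits = np.concatenate(list(span_logits.values()))
global_probs = np.exp(all_logits - all_logits.max())
global_probs /= global_probs.sum()            # one distribution over all spans
best = int(np.argmax(global_probs))
print(best, float(global_probs[best]))
```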
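Paper 633 (Mask-Predict) decodes non-autoregressively: predict all target positions in parallel with a conditional masked LM, then repeatedly re-mask the lowest-confidence predictions and re-predict them, with the number of re-masked tokens decaying linearly over iterations. The sketch below is schematic; `model` is a hypothetical stand-in returning one (token, probability) pair per target position, and the schedule is simplified relative to the paper.

```python
def mask_predict(model, source, target_len, T=10, MASK="<MASK>"):
    """Schematic mask-predict loop; `model(source, tokens)` is a hypothetical
    conditional masked LM returning one (token, probability) per position."""
    tokens = [MASK] * target_len
    probs = [0.0] * target_len
    for t in range(1, T + 1):
        # fill every currently masked position in parallel
        for i, (tok, p) in enumerate(model(source, tokens)):
            if tokens[i] == MASK:
                tokens[i], probs[i] = tok, p
        n = target_len * (T - t) // T          # linearly decaying mask count
        if n == 0:
            break                              # final iteration: keep everything
        # re-mask the n lowest-confidence tokens and try them again
        for i in sorted(range(target_len), key=probs.__getitem__)[:n]:
            tokens[i] = MASK
    return tokens
```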
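Paper 670's EDA consists of four simple word-level operations: synonym replacement, random insertion, random swap, and random deletion. Below is a minimal sketch of the two operations that need no external resources (swap and deletion); the other two would additionally require a synonym source such as WordNet. Parameter names and defaults are ours, not the paper's.

```python
import random

# Minimal sketch of two of the four EDA operations: random swap and
# random deletion. Input is assumed to be a non-empty list of words.

def random_swap(words, n=1):
    words = words[:]
    for _ in range(n):
        if len(words) < 2:
            break
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p=0.1):
    # keep each word with probability 1 - p; never return an empty sentence
    kept = [w for w in words if random.random() > p]
    return kept if kept else [random.choice(words)]

sentence = "easy data augmentation boosts text classification".split()
print(" ".join(random_swap(sentence, n=2)))
print(" ".join(random_deletion(sentence, p=0.2)))
```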