Paper Digest: EMNLP 2023 Highlights
Note: EMNLP-2023 accepted more than 1,100 papers; this page includes only 300 of them, selected by our daily paper digest algorithm. Interested users can choose to read All 1,100 EMNLP-2023 papers on a separate page, which takes quite some time to load.
To search or review papers within EMNLP-2023 related to a specific topic, please use the search by venue (EMNLP-2023), review by venue (EMNLP-2023) and question answering by venue (EMNLP-2023) services. To browse papers by author, here is a list of all authors (EMNLP-2023). You may also like to explore our “Best Paper” Digest (EMNLP), which lists the most influential EMNLP papers since 1996.
Based in New York, Paper Digest is dedicated to producing high-quality text analysis results that people can actually use on a daily basis. Since 2018, we have been serving users across the world with a number of exclusive services to track, search, review and rewrite scientific literature.
You are welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.
Paper Digest Team
New York City, New York, 10017
team@paperdigest.org
TABLE 1: Paper Digest: EMNLP 2023 Highlights
No. | Paper | Author(s)
---|---|---
1 | Is ChatGPT A General-Purpose Natural Language Processing Task Solver? Highlight: In this work, we empirically analyze the zero-shot learning ability of ChatGPT by evaluating it on 20 popular NLP datasets covering 7 representative task categories. |
Chengwei Qin; Aston Zhang; Zhuosheng Zhang; Jiaao Chen; Michihiro Yasunaga; Diyi Yang; |
2 | Lifelong Sequence Generation with Dynamic Module Expansion and Adaptation Highlight: Inspired by the learning paradigm of humans, we propose Dynamic Module Expansion and Adaptation (DMEA), which enables the model to dynamically determine the architecture for acquiring new knowledge based on task correlation and select the most similar previous tasks to facilitate adaptation to new tasks. |
Chengwei Qin; Chen Chen; Shafiq Joty; |
3 | FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation Highlight: In this paper, we introduce FACTSCORE, a new evaluation that breaks a generation into a series of atomic facts and computes the percentage of atomic facts supported by a reliable knowledge source. (See the sketch after this entry.) |
Sewon Min; Kalpesh Krishna; Xinxi Lyu; Mike Lewis; Wen-tau Yih; Pang Koh; Mohit Iyyer; Luke Zettlemoyer; Hannaneh Hajishirzi; |
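A minimal sketch of the atomic-fact precision idea described in entry 3. `extract_atomic_facts` and `is_supported` are hypothetical placeholders (the paper implements them with an LLM-based fact splitter and retrieval-backed verification against a knowledge source); this illustrates the shape of the metric, not the authors' code.

```python
def extract_atomic_facts(generation: str) -> list[str]:
    # Placeholder: the paper uses an LLM to split text into atomic facts;
    # here each sentence naively counts as one fact.
    return [s.strip() for s in generation.split(".") if s.strip()]

def is_supported(fact: str, knowledge_source: set[str]) -> bool:
    # Placeholder: the paper retrieves passages and verifies each fact.
    return fact in knowledge_source

def fact_score(generation: str, knowledge_source: set[str]) -> float:
    # FActScore-style precision: fraction of atomic facts supported.
    facts = extract_atomic_facts(generation)
    if not facts:
        return 0.0
    return sum(is_supported(f, knowledge_source) for f in facts) / len(facts)
```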
4 | Automatic Prompt Optimization with “Gradient Descent” and Beam Search Highlight: Large Language Models (LLMs) have shown impressive performance as general-purpose agents, but their abilities remain highly dependent on prompts, which are hand-written with onerous trial-and-error effort. We propose a simple and nonparametric solution to this problem, Prompt Optimization with Textual Gradients (ProTeGi), which is inspired by numerical gradient descent to automatically improve prompts, assuming access to training data and an LLM API. (See the sketch after this entry.) |
Reid Pryzant; Dan Iter; Jerry Li; Yin Lee; Chenguang Zhu; Michael Zeng; |
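A sketch of the "textual gradient" loop that entry 4 describes: critique the current prompt on examples it gets wrong, edit the prompt accordingly, and keep the best candidates with beam search. `llm` (a text-in, text-out API call) and `accuracy` (a labeled-minibatch evaluator) are assumed, hypothetical helpers.

```python
def optimize_prompt(seed_prompt, minibatches, llm, accuracy,
                    beam_width=4, steps=3):
    beam = [seed_prompt]
    for _, batch in zip(range(steps), minibatches):
        candidates = list(beam)
        for p in beam:
            # "Textual gradient": a natural-language critique of the prompt,
            # grounded in the minibatch examples.
            critique = llm(f"Prompt: {p}\nExamples: {batch}\n"
                           "Explain how the prompt fails on these examples.")
            # Apply the gradient: edit the prompt to address the critique.
            candidates.append(llm(f"Prompt: {p}\nCritique: {critique}\n"
                                  "Rewrite the prompt to fix the critique."))
        # Beam search: keep the top candidates by minibatch accuracy.
        beam = sorted(candidates, key=lambda c: accuracy(c, batch),
                      reverse=True)[:beam_width]
    return beam[0]
```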
5 | Language Models with Rationality Highlight: This lack of interpretability is a growing impediment to widespread use of LLMs. To address this, our goals are to make model beliefs and their inferential relationships explicit, and to resolve inconsistencies that may exist, so that answers are supported by interpretable chains of reasoning drawn from a consistent network of beliefs. |
Nora Kassner; Oyvind Tafjord; Ashish Sabharwal; Kyle Richardson; Hinrich Schuetze; Peter Clark; |
6 | Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data Highlight: We propose a pipeline that can automatically generate a high-quality multi-turn chat corpus by leveraging ChatGPT to engage in a conversation with itself. |
Canwen Xu; Daya Guo; Nan Duan; Julian McAuley; |
7 | Reasoning with Language Model Is Planning with World Model Highlight: To overcome the limitations, we propose a new LLM reasoning framework, Reasoning via Planning (RAP). |
Shibo Hao; Yi Gu; Haodi Ma; Joshua Hong; Zhen Wang; Daisy Wang; Zhiting Hu; |
8 | Revisiting Machine Translation for Cross-lingual Classification Highlight: We show that, by using a stronger MT system and mitigating the mismatch between training on original text and running inference on machine translated text, translate-test can do substantially better than previously assumed. |
Mikel Artetxe; Vedanuj Goswami; Shruti Bhosale; Angela Fan; Luke Zettlemoyer; |
9 | API-Assisted Code Generation for Question Answering on Varied Table Structures Highlight: In response, this paper introduces a unified TableQA framework that: (1) provides a unified representation for structured tables as multi-index Pandas data frames, (2) uses Python as a powerful querying language, and (3) uses few-shot prompting to translate NL questions into Python programs, which are executable on Pandas data frames. (See the sketch after this entry.) |
Yihan Cao; Shuyi Chen; Ryan Liu; Zhiruo Wang; Daniel Fried; |
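The representation entry 9 builds on is easy to picture with a toy example: a hierarchical table becomes a multi-index pandas DataFrame, and the generated Python program is often a one-line query over it. The table and question here are invented for illustration.

```python
import pandas as pd

# A hierarchical toy table: revenue broken down by (region, year).
index = pd.MultiIndex.from_tuples(
    [("EU", 2021), ("EU", 2022), ("US", 2021), ("US", 2022)],
    names=["region", "year"],
)
table = pd.DataFrame({"revenue": [10, 12, 20, 25]}, index=index)

# "What was US revenue in 2022?" is the kind of question a few-shot-prompted
# model would answer by generating and executing a line like this:
answer = table.loc[("US", 2022), "revenue"]
print(answer)  # 25
```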
10 | Navigating The Grey Area: How Expressions of Uncertainty and Overconfidence Affect Language Models Highlight: The increased deployment of LMs for real-world tasks involving knowledge and facts makes it important to understand model epistemology: what LMs think they know, and how their attitudes toward that knowledge are affected by language use in their inputs. Here, we study an aspect of model epistemology: how epistemic markers of certainty, uncertainty, or evidentiality like “I’m sure it’s”, “I think it’s”, or “Wikipedia says it’s” affect models, and whether they contribute to model failures. |
Kaitlyn Zhou; Dan Jurafsky; Tatsunori Hashimoto; |
11 | SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models Highlight: In this work, we propose “SelfCheckGPT”, a simple sampling-based approach that can be used to fact-check the responses of black-box models in a zero-resource fashion, i.e., without an external database. (See the sketch after this entry.) |
Potsawee Manakul; Adian Liusie; Mark Gales; |
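The core intuition of entry 11 in a few lines: if a sentence is hallucinated, stochastic re-samples of the same prompt tend not to support it. `sample_response` stands in for a temperature>0 call to the black-box model, and token overlap stands in for the paper's stronger consistency scorers (BERTScore, NLI, QA); both substitutions are simplifications.

```python
def overlap(a: str, b: str) -> float:
    # Crude lexical support score between a sentence and a sampled response.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta), 1)

def hallucination_scores(sentences, prompt, sample_response, n_samples=5):
    # Draw stochastic samples from the same model; no external resources.
    samples = [sample_response(prompt) for _ in range(n_samples)]
    # Higher score = less consistent with the samples = more suspect.
    return [1.0 - max(overlap(s, ref) for ref in samples)
            for s in sentences]
```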
12 | C-STS: Conditional Semantic Textual Similarity Highlight: However, it is an inherently ambiguous task, with the sentence similarity depending on the specific aspect of interest. We resolve this ambiguity by proposing a novel task called conditional STS (C-STS) which measures similarity conditioned on an aspect elucidated in natural language (hereon, condition). |
Ameet Deshpande; Carlos Jimenez; Howard Chen; Vishvak Murahari; Victoria Graf; Tanmay Rajpurohit; Ashwin Kalyan; Danqi Chen; Karthik Narasimhan; |
13 | Transcending Scaling Laws with 0.1% Extra Compute Highlight: In this paper, we continue training a baseline language model, PaLM, with UL2R, introducing a new set of models at 8B, 62B, and 540B scale which we call U-PaLM. |
Yi Tay; Jason Wei; Hyung Chung; Vinh Tran; David So; Siamak Shakeri; Xavier Garcia; Steven Zheng; Jinfeng Rao; Aakanksha Chowdhery; Denny Zhou; Donald Metzler; Slav Petrov; Neil Houlsby; Quoc Le; Mostafa Dehghani; |
14 | RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation Highlight: We propose RepoCoder, a simple, generic, and effective framework to address the challenge. |
Fengji Zhang; Bei Chen; Yue Zhang; Jacky Keung; Jin Liu; Daoguang Zan; Yi Mao; Jian-Guang Lou; Weizhu Chen; |
15 | Active Retrieval Augmented Generation Highlight: In this work, we provide a generalized view of active retrieval augmented generation, methods that actively decide when and what to retrieve across the course of the generation. (See the sketch after this entry.) |
Zhengbao Jiang; Frank Xu; Luyu Gao; Zhiqing Sun; Qian Liu; Jane Dwivedi-Yu; Yiming Yang; Jamie Callan; Graham Neubig; |
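One common instantiation of the "when to retrieve" decision surveyed in entry 15 is a confidence trigger: retrieve only when the model is unsure about the sentence it is about to emit. The sketch below assumes hypothetical `generate_sentence` (returning a sentence plus its minimum token probability) and `retrieve` helpers; the trigger heuristic is illustrative, not the paper's single prescribed method.

```python
def active_rag(question, generate_sentence, retrieve,
               conf_threshold=0.6, max_sentences=10):
    evidence, output = [], []
    for _ in range(max_sentences):
        sent, min_token_prob = generate_sentence(question, evidence, output)
        if not sent:
            break
        if min_token_prob < conf_threshold:
            # Low confidence: use the tentative sentence as the search
            # query, then regenerate it with fresh evidence in context.
            evidence = retrieve(sent)
            sent, _ = generate_sentence(question, evidence, output)
        output.append(sent)
    return " ".join(output)
```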
16 | MEGA: Multilingual Evaluation of Generative AI Highlight: We present a thorough analysis of the performance of models across languages and tasks and discuss challenges in improving the performance of generative LLMs on low-resource languages. |
Kabir Ahuja; Harshita Diddee; Rishav Hada; Millicent Ochieng; Krithika Ramesh; Prachi Jain; Akshay Nambi; Tanuja Ganu; Sameer Segal; Mohamed Ahmed; Kalika Bali; Sunayana Sitaram; |
17 | CAPSTONE: Curriculum Sampling for Dense Retrieval with Document Expansion Highlight: In this paper, we propose a curriculum sampling strategy that utilizes pseudo queries during training and progressively enhances the relevance between the generated query and the real query. |
Xingwei He; Yeyun Gong; A-Long Jin; Hang Zhang; Anlei Dong; Jian Jiao; Siu Yiu; Nan Duan; |
18 | Document-Level Machine Translation with Large Language Models Highlight: The study focuses on three aspects: 1) Effects of Context-Aware Prompts, where we investigate the impact of different prompts on document-level translation quality and discourse phenomena; 2) Comparison of Translation Models, where we compare the translation performance of ChatGPT with commercial MT systems and advanced document-level MT methods; 3) Analysis of Discourse Modelling Abilities, where we further probe discourse knowledge encoded in LLMs and shed light on impacts of training techniques on discourse modeling. |
Longyue Wang; Chenyang Lyu; Tianbo Ji; Zhirui Zhang; Dian Yu; Shuming Shi; Zhaopeng Tu; |
19 | We’re Afraid Language Models Aren’t Modeling Ambiguity Highlight: We capture ambiguity in a sentence through its effect on entailment relations with another sentence, and collect AmbiEnt, a linguist-annotated benchmark of 1,645 examples with diverse kinds of ambiguity. |
Alisa Liu; Zhaofeng Wu; Julian Michael; Alane Suhr; Peter West; Alexander Koller; Swabha Swayamdipta; Noah Smith; Yejin Choi; |
20 | CoMPosT: Characterizing and Evaluating Caricature in LLM Simulations Highlight: Moreover, there is growing concern that these LLM simulations are flattened caricatures of the personas that they aim to simulate, failing to capture the multidimensionality of people and perpetuating stereotypes. To bridge these gaps, we present CoMPosT, a framework to characterize LLM simulations using four dimensions: Context, Model, Persona, and Topic. |
Myra Cheng; Tiziano Piccardi; Diyi Yang; |
21 | Answering Questions By Meta-Reasoning Over Multiple Chains of Thought Highlight: We introduce Multi-Chain Reasoning (MCR), an approach which prompts large language models to meta-reason over multiple chains of thought, rather than aggregate their answers. |
Ori Yoran; Tomer Wolfson; Ben Bogin; Uri Katz; Daniel Deutch; Jonathan Berant; |
22 | Reward-Augmented Decoding: Efficient Controlled Text Generation With A Unidirectional Reward Model Highlight: In this paper, we introduce Reward-Augmented Decoding (RAD), a text generation procedure that uses a small unidirectional reward model to encourage a language model to generate text that has certain properties. (See the sketch after this entry.) |
Haikang Deng; Colin Raffel; |
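A sketch of the decoding loop entry 22 describes: at each step, rescore the base LM's top-k continuations with a small reward model and sample from the reward-shifted distribution. `next_token_logits` (a token-to-logit map from the LM) and `reward` (the unidirectional reward model's score for a prefix) are assumed placeholders, and `beta` trades fluency against the target property.

```python
import math
import random

def rad_step(prefix, next_token_logits, reward, k=20, beta=2.0):
    # Top-k candidate tokens from the base language model.
    logits = next_token_logits(prefix)
    top = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # Shift each candidate's logit by beta * reward of the extended prefix.
    shifted = [(tok, lg + beta * reward(prefix + tok)) for tok, lg in top]
    # Sample from the renormalized, reward-shifted distribution.
    z = sum(math.exp(lg) for _, lg in shifted)
    r = random.random()
    for tok, lg in shifted:
        r -= math.exp(lg) / z
        if r <= 0:
            return tok
    return shifted[-1][0]
```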
23 | ReasoningLM: Enabling Structural Subgraph Reasoning in Pre-trained Language Models for Question Answering Over Knowledge Graph Highlight: Despite their effectiveness, due to the divergence in model architecture, the PLM and GNN are not closely integrated, limiting knowledge sharing and fine-grained feature interactions. To address this, we aim to simplify the above two-module approach and develop a more capable PLM that can directly support subgraph reasoning for KGQA, namely ReasoningLM. |
Jinhao Jiang; Kun Zhou; Xin Zhao; Yaliang Li; Ji-Rong Wen; |
24 | StructGPT: A General Framework for Large Language Model to Reason Over Structured Data Highlight: In this paper, we aim to improve the reasoning ability of large language models (LLMs) over structured data in a unified way. |
Jinhao Jiang; Kun Zhou; Zican Dong; Keming Ye; Xin Zhao; Ji-Rong Wen; |
25 | Contrastive Learning for Inference in Dialogue Highlight: In this paper, we analyze the behavior of the models based on the task difficulty defined by the semantic information gap, which distinguishes inductive and deductive reasoning. |
Etsuko Ishii; Yan Xu; Bryan Wilie; Ziwei Ji; Holy Lovenia; Willy Chung; Pascale Fung; |
26 | LM Vs LM: Detecting Factual Errors Via Cross Examination Highlight: Inspired by truth-seeking mechanisms in law, we propose a factuality evaluation framework for LMs that is based on cross-examination. |
Roi Cohen; May Hamri; Mor Geva; Amir Globerson; |
27 | Query2doc: Query Expansion with Large Language Models Highlight: This paper introduces a simple yet effective query expansion approach, denoted as query2doc, to improve both sparse and dense retrieval systems. (See the sketch after this entry.) |
Liang Wang; Nan Yang; Furu Wei; |
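Entry 27's recipe fits in one function: have an LLM write a short pseudo-document for the query and concatenate it with the query before retrieval. `llm` is a hypothetical completion call; repeating the query several times (as the paper does for sparse retrieval) keeps the short query from being drowned out by the longer pseudo-document.

```python
def query2doc(query: str, llm, n_query_copies: int = 5) -> str:
    # Generate a plausible passage that would answer the query ...
    pseudo_doc = llm(f"Write a short passage that answers: {query}")
    # ... then search with the query (boosted by repetition) plus the passage.
    return " ".join([query] * n_query_copies + [pseudo_doc])
```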
28 | XLM-V: Overcoming The Vocabulary Bottleneck in Multilingual Masked Language Models Highlight: In this paper, we introduce a new approach for scaling to very large multilingual vocabularies by de-emphasizing token sharing between languages with little lexical overlap and assigning vocabulary capacity to achieve sufficient coverage for each individual language. |
Davis Liang; Hila Gonen; Yuning Mao; Rui Hou; Naman Goyal; Marjan Ghazvininejad; Luke Zettlemoyer; Madian Khabsa; |
29 | WiCE: Real-World Entailment for Claims in Wikipedia Highlight: We propose WiCE, a new fine-grained textual entailment dataset built on natural claim and evidence pairs extracted from Wikipedia. |
Ryo Kamoi; Tanya Goyal; Juan Rodriguez; Greg Durrett; |
30 | TaskWeb: Selecting Better Source Tasks for Multi-task NLP Highlight: In this work, we investigate whether knowing task relationships via pairwise task transfer improves choosing one or more source tasks that help to learn a new target task. |
Joongwon Kim; Akari Asai; Gabriel Ilharco; Hannaneh Hajishirzi; |
31 | Query Rewriting in Retrieval-Augmented Large Language Models Highlight: This work introduces a new framework, Rewrite-Retrieve-Read, in place of the previous retrieve-then-read pipeline, approaching retrieval-augmented LLMs from the perspective of query rewriting. |
Xinbei Ma; Yeyun Gong; Pengcheng He; Hai Zhao; Nan Duan; |
32 | G-Eval: NLG Evaluation Using GPT-4 with Better Human Alignment Highlight: In this work, we present G-Eval, a framework of using large language models with chain-of-thoughts (CoT) and a form-filling paradigm, to assess the quality of NLG outputs. |
Yang Liu; Dan Iter; Yichong Xu; Shuohang Wang; Ruochen Xu; Chenguang Zhu; |
33 | TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models Highlight: Alternatively, large language models (LLMs) have recently shown promising results in directly evaluating generative tasks, but are too computationally expensive for practical use. Motivated by these limitations, we introduce TrueTeacher, a method for generating synthetic data by annotating diverse model-generated summaries using an LLM. |
Zorik Gekhman; Jonathan Herzig; Roee Aharoni; Chen Elkind; Idan Szpektor; |
34 | Poisoning Retrieval Corpora By Injecting Adversarial Passages Highlight: In this work, we propose a novel attack for dense retrieval systems in which a malicious user generates a small number of adversarial passages by perturbing discrete tokens to maximize similarity with a provided set of training queries. |
Zexuan Zhong; Ziqing Huang; Alexander Wettig; Danqi Chen; |
35 | MQuAKE: Assessing Knowledge Editing in Language Models Via Multi-Hop Questions Highlight: In this work, we present a benchmark MQuAKE (Multi-hop Question Answering for Knowledge Editing) comprising multi-hop questions that assess whether edited models correctly answer questions where the answer should change as an entailed consequence of edited facts. |
Zexuan Zhong; Zhengxuan Wu; Christopher Manning; Christopher Potts; Danqi Chen; |
36 | FactKB: Generalizable Factuality Evaluation Using Language Models Enhanced with Factual Knowledge Highlight: We propose FactKB, a simple new approach to factuality evaluation that is generalizable across domains, in particular with respect to entities and relations. |
Shangbin Feng; Vidhisha Balachandran; Yuyang Bai; Yulia Tsvetkov; |
37 | Batch Prompting: Efficient Inference with Large Language Model APIs Highlight: We propose batch prompting, a simple yet effective prompting approach that enables the LLM to run inference in batches, instead of one sample at a time. (See the sketch after this entry.) |
Zhoujun Cheng; Jungo Kasai; Tao Yu; |
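A sketch of entry 37's idea: pack b samples into one prompt, parse b answers out of a single completion, and amortize the few-shot context across the batch. `llm` is a hypothetical completion call, and the numbered Q/A format is one simple convention for keeping answers aligned with inputs.

```python
def batch_prompt(few_shot_block: str, samples: list[str], llm, b: int = 4):
    answers = []
    for i in range(0, len(samples), b):
        batch = samples[i:i + b]
        numbered = "\n".join(f"Q[{j + 1}]: {s}" for j, s in enumerate(batch))
        completion = llm(f"{few_shot_block}\n{numbered}\n"
                         "Answer each question as 'A[i]: ...' on its own line.")
        # Recover one answer per input from the single completion.
        for line in completion.splitlines():
            if line.startswith("A["):
                answers.append(line.split(":", 1)[1].strip())
    return answers
```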
38 | Dissecting Recall of Factual Associations in Auto-Regressive Language Models Highlight: While previous work looked into where factual associations are stored, little is known about how they are retrieved internally during inference. We investigate this question through the lens of information flow. |
Mor Geva; Jasmijn Bastings; Katja Filippova; Amir Globerson; |
39 | The Troubling Emergence of Hallucination in Large Language Models – An Extensive Definition, Quantification, and Prescriptive Remediations Highlight: In conclusion, we propose two solution strategies for mitigating hallucinations. |
Vipula Rawte; Swagata Chakraborty; Agnibh Pathak; Anubhav Sarkar; S.M Towhidul Islam Tonmoy; Aman Chadha; Amit Sheth; Amitava Das; |
40 | MAUD: An Expert-Annotated Legal NLP Dataset for Merger Agreement Understanding Highlight: Reading comprehension of legal text can be a particularly challenging task due to the length and complexity of legal clauses and a shortage of expert-annotated datasets. To address this challenge, we introduce the Merger Agreement Understanding Dataset (MAUD), an expert-annotated reading comprehension dataset based on the American Bar Association’s 2021 Public Target Deal Points Study, with over 39,000 examples and over 47,000 total annotations. |
Steven Wang; Antoine Scardigli; Leonard Tang; Wei Chen; Dmitry Levkin; Anya Chen; Spencer Ball; Thomas Woodside; Oliver Zhang; Dan Hendrycks; |
41 | AnyTOD: A Programmable Task-Oriented Dialog System Highlight: We propose AnyTOD, an end-to-end, zero-shot task-oriented dialog (TOD) system capable of zero-shot adaptation onto unseen tasks or domains. |
Jeffrey Zhao; Yuan Cao; Raghav Gupta; Harrison Lee; Abhinav Rastogi; Mingqiu Wang; Hagen Soltau; Izhak Shafran; Yonghui Wu; |
42 | PALS: Personalized Active Learning for Subjective Tasks in NLP Highlight: In this paper, we present novel Personalized Active Learning techniques for Subjective NLP tasks (PALS) to either reduce the cost of the annotation process or to boost the learning effect. |
Kamil Kanclerz; Konrad Karanowski; Julita Bielaniewicz; Marcin Gruza; Piotr Milkowski; Jan Kocon; Przemyslaw Kazienko; |
43 | Reading Order Matters: Information Extraction from Visually-rich Documents By Token Path Prediction Highlight: To address the reading order issue, we introduce Token Path Prediction (TPP), a simple prediction head to predict entity mentions as token sequences within documents. |
Chong Zhang; Ya Guo; Yi Tu; Huan Chen; Jinyang Tang; Huijia Zhu; Qi Zhang; Tao Gui; |
44 | Vera: A General-Purpose Plausibility Estimation Model for Commonsense Statements Highlight: Today’s language models can be remarkably intelligent yet still produce text that contains trivial commonsense errors. Therefore, we seek a retrospective verification approach that can reflect on the commonsense plausibility of the machine text, and introduce Vera, a general-purpose model that learns to estimate the commonsense plausibility of declarative statements. |
Jiacheng Liu; Wenya Wang; Dianzhuo Wang; Noah Smith; Yejin Choi; Hannaneh Hajishirzi; |
45 | Exchange-of-Thought: Enhancing Large Language Model Capabilities Through Cross-Model Communication Highlight: Despite this progress, their reasoning is often constrained by their intrinsic understanding, lacking external insights. To address this, we propose Exchange-of-Thought (EoT), a novel framework that enables cross-model communication during problem-solving. |
Zhangyue Yin; Qiushi Sun; Cheng Chang; Qipeng Guo; Junqi Dai; Xuanjing Huang; Xipeng Qiu; |
46 | Evaluating Object Hallucination in Large Vision-Language Models Highlight: To investigate it, this work presents the first systematic study on object hallucination of LVLMs. |
Yifan Li; Yifan Du; Kun Zhou; Jinpeng Wang; Xin Zhao; Ji-Rong Wen; |
47 | Rethinking The Evaluation for Conversational Recommendation in The Era of Large Language Models Highlight: In this paper, we embark on an investigation into the utilization of ChatGPT for CRSs, revealing the inadequacy of the existing evaluation protocol. |
Xiaolei Wang; Xinyu Tang; Xin Zhao; Jingyuan Wang; Ji-Rong Wen; |
48 | Meta-Learning Online Adaptation of Language Models Highlight: That is, the gradient signal from important tokens representing factual information is drowned out by the gradient from inherently noisy tokens, suggesting that a dynamic, context-aware learning rate may be beneficial. We therefore propose learning which tokens to upweight. |
Nathan Hu; Eric Mitchell; Christopher Manning; Chelsea Finn; |
49 | HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models Highlight: To understand what types of content and to what extent LLMs are apt to hallucinate, we introduce the Hallucination Evaluation for Large Language Models (HaluEval) benchmark, a large collection of generated and human-annotated hallucinated samples for evaluating the performance of LLMs in recognizing hallucination. To generate these samples, we propose a ChatGPT-based two-step framework, i.e., sampling-then-filtering. |
Junyi Li; Xiaoxue Cheng; Xin Zhao; Jian-Yun Nie; Ji-Rong Wen; |
50 | Enhancing Generative Retrieval with Reinforcement Learning from Relevance Feedback Highlight: Nevertheless, this approach faces two fundamental challenges: (i) a discrepancy between the token-level probabilistic optimization and the broader document-level relevance estimation; (ii) an overemphasis on top-1 results at the expense of overall ranking quality. To tackle these challenges, we propose a generative retrieval model with reinforcement learning from relevance feedback, which aims to align token-level docid generation with document-level relevance estimation. |
Yujia Zhou; Zhicheng Dou; Ji-Rong Wen; |
51 | Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback Highlight: However, the most widely-used LMs are fine-tuned with reinforcement learning from human feedback (RLHF-LMs), and some studies have suggested that RLHF-LMs produce conditional probabilities that are very poorly calibrated. In light of this perceived weakness, we conduct a broad evaluation of methods for extracting confidence scores from RLHF-LMs. |
Katherine Tian; Eric Mitchell; Allan Zhou; Archit Sharma; Rafael Rafailov; Huaxiu Yao; Chelsea Finn; Christopher Manning; |
52 | A Cheaper and Better Diffusion Language Model with Soft-Masked Noise Highlight: For example, the generally used Gaussian noise cannot handle discrete corruption well, and the objectives in continuous spaces fail to be stable for textual data in the diffusion process, especially when the dimension is high. To alleviate these issues, we introduce a novel diffusion model for language modeling, Masked-Diffuse LM, with lower training cost and better performance, inspired by linguistic features in languages. |
Jiaao Chen; Aston Zhang; Mu Li; Alex Smola; Diyi Yang; |
53 | Unlearn What You Want to Forget: Efficient Unlearning for LLMs Highlight: As a result, the ability to easily remove data related to individual users from such models while not deteriorating their predictive quality after the removal becomes increasingly important. To address these issues, in this work, we propose an efficient unlearning framework that could efficiently update LLMs without having to retrain the whole model after data removals, by introducing lightweight unlearning layers learned with a selective teacher-student objective into the transformers. |
Jiaao Chen; Diyi Yang; |
54 | CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code Highlight: In this paper, we propose CodeBERTScore: an evaluation metric for code generation, which builds on BERTScore (Zhang et al., 2020). (See the sketch after this entry.) |
Shuyan Zhou; Uri Alon; Sumit Agarwal; Graham Neubig; |
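The BERTScore machinery that entry 54 builds on reduces to greedy soft matching over token embeddings. The sketch below assumes the embeddings were already produced by a pretrained code model and L2-normalized row-wise; it omits CodeBERTScore's specific model choice and token weighting.

```python
import numpy as np

def bertscore_f1(cand_emb: np.ndarray, ref_emb: np.ndarray) -> float:
    # cand_emb: (n_cand_tokens, dim), ref_emb: (n_ref_tokens, dim),
    # both L2-normalized, so the dot product is cosine similarity.
    sim = cand_emb @ ref_emb.T
    precision = sim.max(axis=1).mean()  # best match per candidate token
    recall = sim.max(axis=0).mean()     # best match per reference token
    return 2 * precision * recall / (precision + recall)
```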
55 | NLI4CT: Multi-Evidence Natural Language Inference for Clinical Trial Reports Highlight: In this work, we present a novel resource to advance research on NLI for reasoning on CTRs. |
Mael Jullien; Marco Valentino; Hannah Frost; Paul O’Regan; Dónal Landers; Andre Freitas; |
56 | Instructed Language Models with Retrievers Are Powerful Entity Linkers Highlight: Several methods of equipping language models with EL ability are proposed in this work, including (i) a sequence-to-sequence training EL objective with instruction-tuning, and (ii) a novel generative EL framework based on a light-weight potential mention retriever that frees the model from heavy and non-parallelizable decoding, achieving a 4× speedup without compromise on linking metrics. |
Zilin Xiao; Ming Gong; Jie Wu; Xingyao Zhang; Linjun Shou; Daxin Jiang; |
57 | Privacy Implications of Retrieval-Based Language Models Highlight: In this work, we present the first study of privacy risks in retrieval-based LMs, particularly kNN-LMs. |
Yangsibo Huang; Samyak Gupta; Zexuan Zhong; Kai Li; Danqi Chen; |
58 | Enabling Large Language Models to Generate Text with Citations Highlight: In this work, our aim is to allow LLMs to generate text with citations, improving their factual correctness and verifiability. |
Tianyu Gao; Howard Yen; Jiatong Yu; Danqi Chen; |
59 | Doolittle: Benchmarks and Corpora for Academic Writing Formalization Highlight: We propose a more general task, Academic Writing Formalization (AWF), to improve the overall quality of formal academic writing at the paragraph level. |
Shizhe Diao; Yongyu Lei; Liangming Pan; Tianqing Fang; Wangchunshu Zhou; Sedrick Keh; Min-Yen Kan; Tong Zhang; |
60 | Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation Highlight: Despite the advancements of T2I models, a common issue encountered by users is the need for repetitive editing of input prompts in order to receive a satisfactory image, which is time-consuming and labor-intensive. Given the demonstrated text generation power of large-scale language models, such as GPT-k, we investigate the potential of utilizing such models to improve the prompt editing process for T2I generation. |
Wanrong Zhu; Xinyi Wang; Yujie Lu; Tsu-Jui Fu; Xin Wang; Miguel Eckstein; William Wang; |
61 | Knowledge Rumination for Pre-trained Language Models Highlight: However, despite the promising outcome, we empirically observe that PLMs may have already encoded rich knowledge in their pre-trained parameters but fail to fully utilize it when applied to knowledge-intensive tasks. In this paper, we propose a new paradigm dubbed Knowledge Rumination to help the pre-trained language model utilize that related latent knowledge without retrieving it from an external corpus. |
Yunzhi Yao; Peng Wang; Shengyu Mao; Chuanqi Tan; Fei Huang; Huajun Chen; Ningyu Zhang; |
62 | Editing Large Language Models: Problems, Methods, and Opportunities Highlight: Our objective is to provide valuable insights into the effectiveness and feasibility of each editing technique, thereby assisting the community in making informed decisions on the selection of the most appropriate method for a specific task or context. |
Yunzhi Yao; Peng Wang; Bozhong Tian; Siyuan Cheng; Zhoubo Li; Shumin Deng; Huajun Chen; Ningyu Zhang; |
63 | Conceptor-Aided Debiasing of Large Language Models Highlight: We propose two methods of applying conceptors: (1) bias subspace projection by post-processing with the conceptor NOT operation; and (2) a new architecture, conceptor-intervened BERT (CI-BERT), which explicitly incorporates the conceptor projection into all layers during training. |
Li Yifei; Lyle Ungar; João Sedoc; |
64 | Sparse Low-rank Adaptation of Pre-trained Language Models Highlight: Recognizing the need for more flexible adaptation, we extend the methodology of LoRA to an innovative approach we call sparse low-rank adaptation (SoRA) that enables dynamic adjustments to the intrinsic rank during the adaptation process. |
Ning Ding; Xingtai Lv; Qiaosen Wang; Yulin Chen; Bowen Zhou; Zhiyuan Liu; Maosong Sun; |
65 | Adapting Language Models to Compress Contexts Highlight: We propose to adapt pre-trained LMs into AutoCompressors. |
Alexis Chevalier; Alexander Wettig; Anirudh Ajith; Danqi Chen; |
66 | We Are Who We Cite: Bridges of Influence Between Natural Language Processing and Other Academic Fields Highlight: In this paper, we quantify the degree of influence between 23 fields of study and NLP (on each other). |
Jan Philip Wahle; Terry Ruas; Mohamed Abdalla; Bela Gipp; Saif Mohammad; |
67 | Universal Self-Adaptive Prompting Highlight: However, while highly coveted and being the most general, zero-shot performances in LLMs are still typically weaker due to the lack of guidance and the difficulty of applying existing automatic prompt design methods in general tasks when ground-truth labels are unavailable. In this study, we address this by presenting Universal Self-Adaptive Prompting (USP), an automatic prompt design approach specifically tailored for zero-shot learning (while compatible with few-shot). |
Xingchen Wan; Ruoxi Sun; Hootan Nakhost; Hanjun Dai; Julian Eisenschlos; Sercan Arik; Tomas Pfister; |
68 | Syntactic Substitutability As Unsupervised Dependency Syntax Highlight: Syntax is a latent hierarchical structure which underpins the robust and compositional nature of human language. In this work, we explore the hypothesis that syntactic dependencies can be represented in language model attention distributions and propose a new method to induce these structures theory-agnostically. |
Jasper Jian; Siva Reddy; |
69 | Composable Text Controls in Latent Space with ODEs Highlight: This paper proposes a new efficient approach for composable text operations in the compact latent space of text. |
Guangyi Liu; Zeyu Feng; Yuan Gao; Zichao Yang; Xiaodan Liang; Junwei Bao; Xiaodong He; Shuguang Cui; Zhen Li; Zhiting Hu; |
70 | CoAnnotating: Uncertainty-Guided Work Allocation Between Human and Large Language Models for Data Annotation Highlight: We propose CoAnnotating, a novel paradigm for Human-LLM co-annotation of unstructured texts at scale. (See the sketch after this entry.) |
Minzhi Li; Taiwei Shi; Caleb Ziems; Min-Yen Kan; Nancy Chen; Zhengyuan Liu; Diyi Yang; |
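One way to realize the uncertainty-guided allocation entry 70 describes: label each instance several times with a stochastic LLM call, compute the entropy of the votes, and route high-entropy instances to humans. `llm_label` and the entropy cutoff are hypothetical; the paper studies several uncertainty estimates and allocation strategies.

```python
import math
from collections import Counter

def route(instances, llm_label, n_votes=5, entropy_cutoff=0.8):
    for_llm, for_humans = [], []
    for x in instances:
        votes = Counter(llm_label(x) for _ in range(n_votes))
        probs = [c / n_votes for c in votes.values()]
        entropy = -sum(p * math.log2(p) for p in probs)
        # Disagreement among the LLM's own labels sends the item to humans.
        (for_humans if entropy > entropy_cutoff else for_llm).append(x)
    return for_llm, for_humans
```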
71 | Task-Agnostic Low-Rank Adapters for Unseen English Dialects Highlight: However, prior work on dialects struggles with generalizing to evolving and emerging dialects in a scalable manner. To fill this gap, our method, HyperLoRA, leverages expert linguistic knowledge to enable resource-efficient adaptation via hypernetworks. |
Zedian Xiao; William Held; Yanchen Liu; Diyi Yang; |
72 | Impressions: Visual Semiotics and Aesthetic Impact Understanding Highlight: We present Impressions, a novel dataset through which to investigate the semiotics of images, and how specific visual features and design choices can elicit specific emotions, thoughts and beliefs. |
Julia Kruk; Caleb Ziems; Diyi Yang; |
73 | DADA: Dialect Adaptation Via Dynamic Aggregation of Linguistic Rules Highlight: In this paper, we propose DADA (Dialect Adaptation via Dynamic Aggregation), a modular approach to imbue SAE-trained models with multi-dialectal robustness by composing adapters which handle specific linguistic features. |
Yanchen Liu; William Held; Diyi Yang; |
74 | Language and Mental Health: Measures of Emotion Dynamics from Text As Linguistic Biosocial Markers Highlight: Here, for the first time, we study the relationship between tweet emotion dynamics and mental health disorders. |
Daniela Teodorescu; Tiffany Cheng; Alona Fyshe; Saif Mohammad; |
75 | Contrastive Learning of Sentence Embeddings from Scratch Highlight: due to copyright restrictions, data distribution issues, and messy formats, among other factors. To address these issues, we present SynCSE, a contrastive learning framework that trains sentence embeddings with synthetic data. |
Junlei Zhang; Zhenzhong Lan; Junxian He; |
76 | Specialist or Generalist? Instruction Tuning for Specific NLP Tasks Highlight: In this paper, we investigate whether incorporating broad-coverage generalist instruction tuning can contribute to building a specialist model. |
Chufan Shi; Yixuan Su; Cheng Yang; Yujiu Yang; Deng Cai; |
77 | Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study Highlight: Thus, it is still an open question: shall we pretrain large autoregressive LMs with retrieval? To answer it, we perform a comprehensive study on a scalable pre-trained retrieval-augmented LM (i.e., RETRO) compared with standard GPT and retrieval-augmented GPT incorporated at fine-tuning or inference stages. |
Boxin Wang; Wei Ping; Peng Xu; Lawrence McAfee; Zihan Liu; Mohammad Shoeybi; Yi Dong; Oleksii Kuchaiev; Bo Li; Chaowei Xiao; Anima Anandkumar; Bryan Catanzaro; |
78 | Exploring The Impact of Model Scaling on Parameter-Efficient Tuning Highlight: Hence, we hypothesize that model scaling mitigates the impact of design differences on PET methods. To investigate this hypothesis, we introduce a more flexible PET method called Arbitrary PET (APET). |
Yusheng Su; Chi-Min Chan; Jiali Cheng; Yujia Qin; Yankai Lin; Shengding Hu; Zonghan Yang; Ning Ding; Xingzhi Sun; Guotong Xie; Zhiyuan Liu; Maosong Sun; |
79 | Exploiting Asymmetry for Synthetic Training Data Generation: SynthIE and The Case of Information Extraction Highlight: This work shows that useful data can be synthetically generated even for tasks that cannot be solved directly by LLMs: for problems with structured outputs, it is possible to prompt an LLM to perform the task in the reverse direction, by generating plausible input text for a target output structure. |
Martin Josifoski; Marija Sakota; Maxime Peyrard; Robert West; |
80 | Spoiler Detection As Semantic Text Matching Highlight: This is primarily because the definition of a spoiler varies depending on the viewer’s progress in the show, and conventional spoiler detection methods lack the granularity to capture this complexity. To tackle this challenge, we propose the task of spoiler matching, which involves assigning an episode number to a spoiler given a specific TV show. |
Ryan Tran; Canwen Xu; Julian McAuley; |
81 | InterFair: Debiasing with Natural Language Feedback for Fair Interpretable Predictions Highlight: We explore two interactive setups with a frozen predictive model and show that users able to provide feedback can achieve a better and fairer balance between task performance and bias mitigation. |
Bodhisattwa Majumder; Zexue He; Julian McAuley; |
82 | Editing Common Sense in Transformers Highlight: In this paper, we investigate whether commonsense judgments are causally associated with localized, editable parameters in Transformers, and we provide an affirmative answer. |
Anshita Gupta; Debanjan Mondal; Akshay Sheshadri; Wenlong Zhao; Xiang Li; Sarah Wiegreffe; Niket Tandon; |
83 | Aligning Large Language Models Through Synthetic Feedback Highlight: In this work, we propose a novel alignment learning framework with synthetic feedback not dependent on extensive human annotations and proprietary LLMs. |
Sungdong Kim; Sanghwan Bae; Jamin Shin; Soyoung Kang; Donghyun Kwak; Kang Yoo; Minjoon Seo; |
84 | Three Stream Based Multi-level Event Contrastive Learning for Text-Video Event Extraction Highlight: We observe that the same event triggers correspond to similar motion trajectories, which are hardly affected by the background noise. Motivated by this, we propose a Three Stream Multimodal Event Extraction framework (TSEE) that simultaneously utilizes the features of text sequence and video appearance, as well as the motion representations, to enhance the event extraction capacity. |
Jiaqi Li; Chuanyi Zhang; Miaozeng Du; Dehai Min; Yongrui Chen; Guilin Qi; |
85 | SOUL: Towards Sentiment and Opinion Understanding of Language Highlight: However, despite the success of pre-trained language models in this area, they often fall short of capturing the broader complexities of sentiment analysis. To address this issue, we propose a new task called Sentiment and Opinion Understanding of Language (SOUL). |
Yue Deng; Wenxuan Zhang; Sinno Pan; Lidong Bing; |
86 | kNN-LM Does Not Improve Open-ended Text Generation Highlight: In this paper, we study the generation quality of interpolation-based retrieval-augmented language models (LMs). |
Shufan Wang; Yixiao Song; Andrew Drozdov; Aparna Garimella; Varun Manjunatha; Mohit Iyyer; |
87 | The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models Via Chain-of-Thought Fine-Tuning Highlight: In this work, we aim to equip smaller LMs with the step-by-step reasoning capability by instruction tuning with CoT rationales. |
Seungone Kim; Se Joo; Doyoung Kim; Joel Jang; Seonghyeon Ye; Jamin Shin; Minjoon Seo; |
88 | PTP: Boosting Stability and Performance of Prompt Tuning with Perturbation-Based Regularizer Highlight: We propose a new algorithm, called Prompt Tuning with Perturbation-based regularizer (PTP), which can not only alleviate training instability dramatically but also boost the performance of prompt tuning. |
Lichang Chen; Jiuhai Chen; Heng Huang; Minhao Cheng; |
89 | Explore-Instruct: Enhancing Domain-Specific Instruction Coverage Through Active Exploration Highlight: However, existing data employed for such tuning often exhibit an inadequate coverage of individual domains, limiting the scope for nuanced comprehension and interactions within these areas. To address this deficiency, we propose Explore-Instruct, a novel approach to enhance the data coverage to be used in domain-specific instruction-tuning through active exploration via Large Language Models (LLMs). |
Fanqi Wan; Xinting Huang; Tao Yang; Xiaojun Quan; Wei Bi; Shuming Shi; |
90 | TheoremQA: A Theorem-driven Question Answering Dataset Highlight: In this paper, we introduce TheoremQA, the first theorem-driven question-answering dataset designed to evaluate AI models’ capabilities to apply theorems to solve challenging science problems. |
Wenhu Chen; Ming Yin; Max Ku; Pan Lu; Yixin Wan; Xueguang Ma; Jianyu Xu; Xinyi Wang; Tony Xia; |
91 | MemeCap: A Dataset for Captioning and Interpreting Memes Highlight: We present the task of meme captioning and release a new dataset, MemeCap. |
EunJeong Hwang; Vered Shwartz; |
92 | Building Real-World Meeting Summarization Systems Using Large Language Models: A Practical Perspective Highlight: This paper studies how to effectively build meeting summarization systems for real-world usage using large language models (LLMs). |
Md Tahmid Rahman Laskar; Xue-Yong Fu; Cheng Chen; Shashi Bhushan TN; |
93 | Character-LLM: A Trainable Agent for Role-Playing Highlight: In this work, we introduce Character-LLM, which teaches LLMs to act as specific people such as Beethoven, Queen Cleopatra, Julius Caesar, etc. |
Yunfan Shao; Linyang Li; Junqi Dai; Xipeng Qiu; |
94 | Sparse Universal Transformer Highlight: This is mainly because scaling UT parameters is more compute and memory intensive than scaling up a VT. This paper proposes the Sparse Universal Transformer (SUT), which leverages Sparse Mixture of Experts (SMoE) to reduce UT’s computation complexity while retaining its parameter efficiency and generalization ability. |
Shawn Tan; Yikang Shen; Zhenfang Chen; Aaron Courville; Chuang Gan; |
95 | Larger Probes Tell A Different Story: Extending Psycholinguistic Datasets Via In-Context Learning Highlight: In this work, we introduce new, larger datasets for negation (NEG-1500-SIMP) and role reversal (ROLE-1500) inspired by psycholinguistic studies. |
Namrata Shivagunde; Vladislav Lialin; Anna Rumshisky; |
96 | RobustGEC: Robust Grammatical Error Correction Against Subtle Context Perturbation Highlight: In this paper, we introduce RobustGEC, a benchmark designed to evaluate the context robustness of GEC systems. |
Yue Zhang; Leyang Cui; Enbo Zhao; Wei Bi; Shuming Shi; |
97 | Symbolic Planning and Code Generation for Grounded Dialogue Highlight: However, LLMs have had limited applicability in grounded task-oriented dialogue as they are difficult to steer toward task objectives and fail to handle novel grounding. We present a modular and interpretable grounded dialogue system that addresses these shortcomings by composing LLMs with a symbolic planner and grounded code execution. |
Justin Chiu; Wenting Zhao; Derek Chen; Saujas Vaduguru; Alexander Rush; Daniel Fried; |
98 | Outlier Suppression+: Accurate Quantization of Large Language Models By Equivalent and Effective Shifting and Scaling Highlight: We observe that these outliers are concentrated in specific channels and are asymmetric across channels. To address this issue, we propose the Outlier Suppression+ (OS+) framework, which contains the channel-wise shifting for asymmetry and channel-wise scaling for concentration. |
Xiuying Wei; Yunchen Zhang; Yuhang Li; Xiangguo Zhang; Ruihao Gong; Jinyang Guo; Xianglong Liu; |
99 | Controlling Pre-trained Language Models for Grade-Specific Text Simplification Highlight: In this work, we conduct an empirical study to understand how different control mechanisms impact the adequacy and simplicity of text simplification systems. |
Sweta Agrawal; Marine Carpuat; |
100 | Do All Languages Cost The Same? Tokenization in The Era of Commercial Language Models Highlight: What constitutes a token, however, is training data and model dependent with a large variance in the number of tokens required to convey the same information in different languages. In this work, we analyze the effect of this non-uniformity on the fairness of an API’s pricing policy across languages. (See the sketch after this entry.) |
Orevaoghene Ahia; Sachin Kumar; Hila Gonen; Jungo Kasai; David Mortensen; Noah Smith; Yulia Tsvetkov; |
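The non-uniformity entry 100 measures is easy to reproduce: tokenize the same sentence in several languages and compare counts, since commercial APIs bill per token. The sketch uses Hugging Face transformers with a multilingual tokenizer; the model choice and sentences are illustrative.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
texts = {
    "English": "The weather is nice today.",
    "French": "Il fait beau aujourd'hui.",
    "Hindi": "आज मौसम अच्छा है।",
}
for lang, text in texts.items():
    n_tokens = len(tokenizer.tokenize(text))
    print(f"{lang}: {n_tokens} tokens")  # same meaning, different token bill
```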
101 | Evaluating Evaluation Metrics: A Framework for Analyzing NLG Evaluation Metrics Using Measurement Theory Highlight: Recognizing the limitations of existing automatic metrics and the noise from how current human evaluation was conducted, we propose MetricEval, a framework informed by measurement theory, the foundation of educational test design, for conceptualizing and evaluating the reliability and validity of NLG evaluation metrics. |
Ziang Xiao; Susu Zhang; Vivian Lai; Q. Vera Liao; |
102 | Improving Language Models’ Meaning Understanding and Consistency By Learning Conceptual Roles from Dictionary Highlight: To this end, we propose a practical approach that alleviates the inconsistent behaviour issue by fundamentally improving PLMs’ meaning awareness. |
Myeongjun Jang; Thomas Lukasiewicz; |
103 | Consistency Analysis of ChatGPT Highlight: This paper investigates the trustworthiness of ChatGPT and GPT-4 regarding logically consistent behaviour, focusing specifically on semantic consistency and the properties of negation, symmetric, and transitive consistency. |
Myeongjun Jang; Thomas Lukasiewicz; |
104 | Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs Without Fine-tuning Highlight: We propose Inference-time Policy Adapters (IPA), which efficiently tailors a language model such as GPT-3 without fine-tuning it. |
Ximing Lu; Faeze Brahman; Peter West; Jaehun Jung; Khyathi Chandu; Abhilasha Ravichander; Prithviraj Ammanabrolu; Liwei Jiang; Sahana Ramnath; Nouha Dziri; Jillian Fisher; Bill Lin; Skyler Hallinan; Lianhui Qin; Xiang Ren; Sean Welleck; Yejin Choi; |
105 | KCTS: Knowledge-Constrained Tree Search Decoding with Token-Level Hallucination Detection Highlight: Unfortunately, this method incurs high training costs and may cause catastrophic forgetting for multi-tasking models. To overcome these limitations, we propose a knowledge-constrained decoding method called KCTS (Knowledge-Constrained Tree Search), which guides a frozen LM to generate text aligned with the reference knowledge at each decoding step using a knowledge classifier score and MCTS (Monte-Carlo Tree Search). |
Sehyun Choi; Tianqing Fang; Zhaowei Wang; Yangqiu Song; |
106 | Stop Uploading Test Data in Plain Text: Practical Strategies for Mitigating Data Contamination By Evaluation Benchmarks Highlight: Assuming that all relevant actors value clean test data and will cooperate to mitigate data contamination, what can be done? We propose three strategies that can make a difference: (1) Test data made public should be encrypted with a public key and licensed to disallow derivative distribution; (2) demand training exclusion controls from closed API holders, and protect your test data by refusing to evaluate without them; (3) avoid data which appears with its solution on the internet, and release the web-page context of internet-derived data along with the data. (See the sketch after this entry.) |
Alon Jacovi; Avi Caciularu; Omer Goldman; Yoav Goldberg; |
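Entry 106's first strategy (public-key-encrypted test data) can be realized with standard hybrid encryption, sketched here with the `cryptography` package: encrypt the payload with a fresh symmetric key, then wrap that key with RSA-OAEP. This is a generic illustration, not a scheme the paper mandates.

```python
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# Benchmark owner's keypair (generated once; private key kept offline).
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

test_data = b'{"question": "...", "answer": "..."}'

# Encrypt the data with a fresh symmetric key ...
sym_key = Fernet.generate_key()
ciphertext = Fernet(sym_key).encrypt(test_data)
# ... and wrap that key with the public key for distribution.
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
wrapped_key = public_key.encrypt(sym_key, oaep)

# Decryption, for holders of the private key only.
recovered = Fernet(private_key.decrypt(wrapped_key, oaep)).decrypt(ciphertext)
assert recovered == test_data
```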
107 | Focus Your Attention (with Adaptive IIR Filters) Highlight: We present a new layer in which dynamic (i.e., input-dependent) Infinite Impulse Response (IIR) filters of order two are used to process the input sequence prior to applying conventional attention. |
Shahar Lutati; Itamar Zimerman; Lior Wolf; |
108 | DetGPT: Detect What You Need Via Reasoning Highlight: In this paper, we introduce a new paradigm for object detection that we call reasoning-based object detection. |
Renjie Pi; Jiahui Gao; Shizhe Diao; Rui Pan; Hanze Dong; Jipeng Zhang; Lewei Yao; Jianhua Han; Hang Xu; Lingpeng Kong; Tong Zhang; |
109 | Q2d: Turning Questions Into Dialogs to Teach Models How to Search Highlight: In this work, we propose q2d: an automatic data generation pipeline that generates information-seeking dialogs from questions. |
Yonatan Bitton; Shlomi Cohen-Ganor; Ido Hakimi; Yoad Lewenberg; Roee Aharoni; Enav Weinreb; |
110 | ReTAG: Reasoning Aware Table to Analytic Text Generation Highlight: Through analysis of popular table-to-text benchmarks (ToTTo (Parikh et al., 2020) and InfoTabs (Gupta et al., 2020)), we observe that in order to generate the ideal summary, multiple types of reasoning are needed, coupled with access to knowledge beyond the scope of the table. To address this gap, we propose ReTAG, a table and reasoning aware model that uses vector-quantization to infuse different types of analytical reasoning into the output. |
Deepanway Ghosal; Preksha Nema; Aravindan Raghuveer; |
111 | MoT: Memory-of-Thought Enables ChatGPT to Self-Improve Highlight: In this paper, we propose a framework, MoT, to let the LLM self-improve through Memory of Thoughts, without annotated datasets and parameter updates. |
Xiaonan Li; Xipeng Qiu; |
112 | UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation Highlight: We propose UPRISE (Universal Prompt Retrieval for Improving zero-Shot Evaluation), which tunes a lightweight and versatile retriever that automatically retrieves prompts for a given zero-shot task input. |
Daixuan Cheng; Shaohan Huang; Junyu Bi; Yuefeng Zhan; Jianfeng Liu; Yujing Wang; Hao Sun; Furu Wei; Weiwei Deng; Qi Zhang; |
113 | Do Transformers Parse While Predicting The Masked Word? Highlight: Some doubts have been raised whether the models are doing parsing or only some computation weakly correlated with it. Concretely: (a) Is it possible to explicitly describe transformers with realistic embedding dimensions, number of heads, etc., that are capable of doing parsing, or even approximate parsing? (b) Why do pre-trained models capture parsing structure? This paper takes a step toward answering these questions in the context of generative modeling with PCFGs. We show that masked language models like BERT or RoBERTa of moderate sizes can approximately execute the Inside-Outside algorithm for the English PCFG (Marcus et al., 1993). |
Haoyu Zhao; Abhishek Panigrahi; Rong Ge; Sanjeev Arora; |
114 | SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Data scarcity has been a long-standing issue in the field of open-domain social dialogue. To quench this thirst, we present SODA: the first publicly available, million-scale high-quality social dialogue dataset. |
Hyunwoo Kim; Jack Hessel; Liwei Jiang; Peter West; Ximing Lu; Youngjae Yu; Pei Zhou; Ronan Bras; Malihe Alikhani; Gunhee Kim; Maarten Sap; Yejin Choi; |
115 | Explaining with Contrastive Phrasal Highlighting: A Case Study in Assisting Humans to Detect Translation Differences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a technique to generate contrastive phrasal highlights that explain the predictions of a semantic divergence model via phrase alignment guided erasure. |
Eleftheria Briakou; Navita Goyal; Marine Carpuat; |
116 | APrompt: Attention Prompt Tuning for Efficient Adaptation of Pre-trained Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel Attention Prompt tuning method, namely APrompt, for efficient adaptation of pre-trained language models. |
Qifan Wang; Yuning Mao; Jingang Wang; Hanchao Yu; Shaoliang Nie; Sinong Wang; Fuli Feng; Lifu Huang; Xiaojun Quan; Zenglin Xu; Dongfang Liu; |
117 | Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify three different classes of disagreement, which we term confabulation, deception, and heterogeneity. |
Kevin Liu; Stephen Casper; Dylan Hadfield-Menell; Jacob Andreas; |
118 | Can We Edit Multimodal Large Language Models? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we focus on editing multimodal Large Language Models (LLMs). |
Siyuan Cheng; Bozhong Tian; Qingbin Liu; Xi Chen; Yongheng Wang; Huajun Chen; Ningyu Zhang; |
119 | Active Instruction Tuning: Improving Cross-Task Generalization By Training on Prompt Sensitive Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We discover that training on ambiguous (prompt-uncertain) tasks improves generalization while training on difficult (prompt-certain and low-probability) tasks offers no benefit, underscoring the importance of task selection for instruction tuning. |
Po-Nien Kung; Fan Yin; Di Wu; Kai-Wei Chang; Nanyun Peng; |
120 | HalOmi: A Manually Annotated Benchmark for Multilingual Hallucination and Omission Detection in Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we release an annotated dataset for the hallucination and omission phenomena covering 18 translation directions with varying resource levels and scripts. |
David Dale; Elena Voita; Janice Lam; Prangthip Hansanti; Christophe Ropers; Elahe Kalbassi; Cynthia Gao; Loic Barrault; Marta Costa-jussà; |
121 | Oolong: Investigating What Makes Transfer Learning Hard with Controlled Studies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To disentangle the impact of different factors like syntactic similarity and vocabulary similarity, we propose a set of controlled transfer studies: we systematically transform the language of the GLUE benchmark, altering one axis of crosslingual variation at a time, and then measure the resulting drops in a pretrained model's downstream performance. |
Zhengxuan Wu; Alex Tamkin; Isabel Papadimitriou; |
122 | UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose several chart-specific pretraining tasks that include: (i) low-level tasks to extract the visual elements (e.g., bars, lines) and data from charts, and (ii) high-level tasks to acquire chart understanding and reasoning skills. |
Ahmed Masry; Parsa Kavehzadeh; Do Long; Enamul Hoque; Shafiq Joty; |
123 | Reformulating NLP Tasks to Capture Longitudinal Manifestation of Language Disorders in People with Dementia Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we automatically learn linguistic disorder patterns by making use of a moderately-sized pre-trained language model and forcing it to focus on reformulated natural language processing (NLP) tasks and associated linguistic patterns. |
Dimitris Gkoumas; Matthew Purver; Maria Liakata; |
124 | A Digital Language Coherence Marker for Monitoring Dementia Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we propose methods to capture language coherence as a cost-effective, human-interpretable digital marker for monitoring cognitive changes in people with dementia. |
Dimitris Gkoumas; Adam Tsakalidis; Maria Liakata; |
125 | Ties Matter: Meta-Evaluating Modern Metrics with Pairwise Accuracy and Tie Calibration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose instead to meta-evaluate metrics with a version of pairwise accuracy that gives metrics credit for correctly predicting ties, in combination with a tie calibration procedure that automatically introduces ties into metric scores, enabling fair comparison between metrics that do and do not predict ties. |
Daniel Deutsch; George Foster; Markus Freitag; |
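The core meta-evaluation idea can be sketched in a few lines of Python. This is a hypothetical illustration, assuming ties are detected with a simple epsilon threshold that stands in for the paper's tie calibration procedure:

```python
from itertools import combinations

def pairwise_accuracy(human, metric, eps=0.0):
    """Fraction of item pairs where the metric's ranking (win/loss/tie)
    matches the human ranking; |score difference| <= eps counts as a tie."""
    def sign(d, tol):
        return 0 if abs(d) <= tol else (1 if d > 0 else -1)
    pairs = list(combinations(range(len(human)), 2))
    correct = sum(sign(human[i] - human[j], 0.0) == sign(metric[i] - metric[j], eps)
                  for i, j in pairs)
    return correct / len(pairs)

human  = [3, 3, 1, 2]                   # items 0 and 1 are tied for humans
metric = [0.82, 0.80, 0.10, 0.55]
print(pairwise_accuracy(human, metric, eps=0.05))  # eps turns 0.82 vs 0.80 into a tie
```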
126 | ReCEval: Evaluating Reasoning Chains Via Correctness and Informativeness Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we propose ReCEval (Reasoning Chain Evaluation), a framework that evaluates reasoning chains via two key properties: (1) correctness, i.e., each step makes a valid inference based on information contained within the step, preceding steps, and input context, and (2) informativeness, i.e., each step provides new information that is helpful towards deriving the generated answer. |
Archiki Prasad; Swarnadeep Saha; Xiang Zhou; Mohit Bansal; |
127 | Skill-Based Few-Shot Selection for In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Skill-KNN, a skill-based few-shot selection method for in-context learning. |
Shengnan An; Bo Zhou; Zeqi Lin; Qiang Fu; Bei Chen; Nanning Zheng; Weizhu Chen; Jian-Guang Lou; |
128 | IfQA: A Dataset for Open-domain Question Answering Under Counterfactual Presuppositions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although counterfactual reasoning is a fundamental aspect of intelligence, the lack of large-scale counterfactual open-domain question-answering (QA) benchmarks makes it difficult to evaluate and improve models on this ability. To address this void, we introduce the first such dataset, named IfQA, where each question is based on a counterfactual presupposition via an "if" clause. |
Wenhao Yu; Meng Jiang; Peter Clark; Ashish Sabharwal; |
129 | Explanation Selection Using Unlabeled Data for Chain-of-Thought Prompting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper tackles the problem of how to optimize explanation-infused prompts in a blackbox fashion. |
Xi Ye; Greg Durrett; |
130 | GlobalBench: A Benchmark for Global Progress in Natural Language Processing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To track and further incentivize the global development of equitable language technology, we introduce GlobalBench. |
Yueqi Song; Simran Khanuja; Pengfei Liu; Fahim Faisal; Alissa Ostapenko; Genta Winata; Alham Aji; Samuel Cahyawijaya; Yulia Tsvetkov; Antonios Anastasopoulos; Graham Neubig; |
131 | UDAPDR: Unsupervised Domain Adaptation Via LLM Prompting and Distillation of Rerankers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, such datasets are often unavailable, and their utility for real-world applications can diminish quickly due to domain shifts. To address this challenge, we develop and motivate a method for using large language models (LLMs) to generate large numbers of synthetic queries cheaply. |
Jon Saad-Falcon; Omar Khattab; Keshav Santhanam; Radu Florian; Martin Franz; Salim Roukos; Avirup Sil; Md Sultan; Christopher Potts; |
132 | Byte Pair Encoding for Symbolic Music Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show that Byte Pair Encoding, a compression technique widely used for natural language, significantly decreases the sequence length while increasing the vocabulary size. |
Nathan Fradet; Nicolas Gutowski; Fabien Chhel; Jean-Pierre Briot; |
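As a rough illustration of how BPE shortens symbolic music sequences, here is a generic merge loop in Python. The token names are invented; the paper operates on real MIDI token vocabularies:

```python
from collections import Counter

def bpe_train(seq, num_merges):
    """Repeatedly merge the most frequent adjacent token pair into a new token."""
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merged, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                merged.append(a + "+" + b)      # new super-token
                i += 2
            else:
                merged.append(seq[i])
                i += 1
        seq, merges = merged, merges + [(a, b)]
    return seq, merges

tokens = ["Pitch_60", "Vel_80", "Dur_4", "Pitch_60", "Vel_80", "Dur_8"]
compressed, merges = bpe_train(tokens, num_merges=2)
print(compressed)   # shorter sequence, larger (merged) vocabulary
```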
133 | Incorporating Structured Representations Into Pretrained Vision & Language Models Using Scene Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we ask whether small SG datasets can provide sufficient information for enhancing structured understanding of pretrained VLMs. We show that it is indeed possible to improve VLMs when learning from SGs by integrating components that incorporate structured information into both visual and textual representations. |
Roei Herzig; Alon Mendelson; Leonid Karlinsky; Assaf Arbelle; Rogerio Feris; Trevor Darrell; Amir Globerson; |
134 | Can We Edit Factual Knowledge By In-Context Learning? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by in-context learning (ICL), a new paradigm based on demonstration contexts without parameter updating, we explore whether ICL can edit factual knowledge. |
Ce Zheng; Lei Li; Qingxiu Dong; Yuxuan Fan; Zhiyong Wu; Jingjing Xu; Baobao Chang; |
135 | Merging Experts Into One: Improving Computational Efficiency of Mixture of Experts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Can we retain the advantages of adding more experts without substantially increasing the computational costs? In this paper, we first demonstrate the superiority of selecting multiple experts and then propose a computation-efficient approach called Merging Experts into One (MEO), which reduces the computation cost to that of a single expert. |
Shwai He; Run-Ze Fan; Liang Ding; Li Shen; Tianyi Zhou; Dacheng Tao; |
136 | SummEdits: Measuring LLM Ability at Factual Reasoning Through The Lens of Summarization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, a closer analysis reveals issues with existing evaluation benchmarks, affecting evaluation precision. To address this, we propose a new protocol for inconsistency detection benchmark creation and implement it in a 10-domain benchmark called SummEdits. |
Philippe Laban; Wojciech Kryscinski; Divyansh Agarwal; Alexander Fabbri; Caiming Xiong; Shafiq Joty; Chien-Sheng Wu; |
137 | Mitigating Temporal Misalignment By Discarding Outdated Facts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To mitigate the effects of temporal misalignment, we propose fact duration prediction: the task of predicting how long a given fact will remain true. |
Michael Zhang; Eunsol Choi; |
138 | Have LLMs Advanced Enough? A Challenging Problem Solving Benchmark For Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In response, we present JEEBench, a considerably more challenging benchmark dataset for evaluating the problem solving abilities of LLMs. |
Daman Arora; Himanshu Singh; Mausam; |
139 | Noisy Exemplars Make Large Language Models More Robust: A Domain-Agnostic Behavioral Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, there is little existing work investigating the robustness of LLMs with few-shot prompting techniques. Therefore, we introduce a systematic approach to test the robustness of LLMs in multi-hop reasoning tasks via domain-agnostic perturbations. |
Hongyi Zheng; Abulhair Saparov; |
140 | Conversational Semantic Parsing Using Dynamic Context Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we consider the task of conversational semantic parsing over general-purpose knowledge graphs (KGs) with millions of entities and thousands of relation types. |
Parag Jain; Mirella Lapata; |
141 | Enhancing Textbooks with Visuals from The Web for Improved Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the effectiveness of vision-language models to automatically enhance textbooks with images from the web. |
Janvijay Singh; Vilém Zouhar; Mrinmaya Sachan; |
142 | Enhancing Biomedical Lay Summarisation with External Knowledge Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Using both automatic and human evaluations, we systematically investigate the effectiveness of three different approaches for incorporating knowledge graphs within lay summarisation models, with each method targeting a distinct area of the encoder-decoder model architecture. |
Tomas Goldsack; Zhihao Zhang; Chen Tang; Carolina Scarton; Chenghua Lin; |
143 | 3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a relation-aware one-stage framework, named 3D Relative Position-aware Network (3DRP-Net), which can effectively capture the relative spatial relationships between objects and enhance object attributes. |
Zehan Wang; Haifeng Huang; Yang Zhao; Linjun Li; Xize Cheng; Yichen Zhu; Aoxiong Yin; Zhou Zhao; |
144 | Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Dynosaur, a dynamic growth paradigm for the automatic curation of instruction-tuning data. |
Da Yin; Xiao Liu; Fan Yin; Ming Zhong; Hritik Bansal; Jiawei Han; Kai-Wei Chang; |
145 | MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, they inherently lack 2D graph perception, a critical ability of human professionals in comprehending molecules' topological structures. To bridge this gap, we propose MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter. |
Zhiyuan Liu; Sihang Li; Yanchen Luo; Hao Fei; Yixin Cao; Kenji Kawaguchi; Xiang Wang; Tat-Seng Chua; |
146 | GD-COMET: A Geo-Diverse Commonsense Inference Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present GD-COMET, a geo-diverse version of the COMET commonsense inference model. |
Mehar Bhatia; Vered Shwartz; |
147 | Crossing The Threshold: Idiomatic Machine Translation Through Retrieval Augmentation and Loss Weighting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To improve translation of natural idioms, we introduce two straightforward yet effective techniques: the strategic upweighting of training loss on potentially idiomatic sentences, and using retrieval-augmented models. |
Emmy Liu; Aditi Chaudhary; Graham Neubig; |
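The loss-upweighting half of the recipe reduces to scaling per-sentence losses. A minimal sketch, assuming a pre-computed idiom flag per sentence and an arbitrary weight value (neither taken from the paper):

```python
def weighted_batch_loss(per_sentence_losses, is_idiomatic, idiom_weight=2.0):
    """Weighted mean of per-sentence losses; flagged sentences count more."""
    total = norm = 0.0
    for loss, idiomatic in zip(per_sentence_losses, is_idiomatic):
        w = idiom_weight if idiomatic else 1.0
        total += w * loss
        norm += w
    return total / norm

losses = [1.2, 0.7, 2.3]           # e.g. per-sentence cross-entropy
flags  = [False, False, True]      # third sentence matched an idiom lexicon
print(weighted_batch_loss(losses, flags))
```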
148 | Homophone Disambiguation Reveals Patterns of Context Mixing in Speech Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Transformers have become a key architecture in speech processing, but our understanding of how they build up representations of acoustic and linguistic structure is limited. In this study, we address this gap by investigating how measures of "context-mixing" developed for text models can be adapted and applied to models of spoken language. |
Hosein Mohebbi; Grzegorz Chrupala; Willem Zuidema; Afra Alishahi; |
149 | Non-autoregressive Streaming Transformer for Simultaneous Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address those issues, we propose non-autoregressive streaming Transformer (NAST) which comprises a unidirectional encoder and a non-autoregressive decoder with intra-chunk parallelism. |
Zhengrui Ma; Shaolei Zhang; Shoutao Guo; Chenze Shao; Min Zhang; Yang Feng; |
150 | SeqXGPT: Sentence-Level AI-Generated Text Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: These features are composed like waves in speech processing and cannot be studied by LLMs. Therefore, we build SeqXGPT based on convolution and self-attention networks. |
Pengyu Wang; Linyang Li; Ke Ren; Botian Jiang; Dong Zhang; Xipeng Qiu; |
151 | Knowledge-Augmented Language Model Verification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To overcome these, we propose to verify the output and the knowledge of the knowledge-augmented LMs with a separate verifier, which is a small LM that is trained to detect those two types of errors through instruction-finetuning. |
Jinheon Baek; Soyeong Jeong; Minki Kang; Jong Park; Sung Hwang; |
152 | Explicit Planning Helps Language Models in Logical Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose LEAP, a novel system that uses language models to perform multi-step logical reasoning and incorporates explicit planning into the inference procedure. |
Hongyu Zhao; Kangrui Wang; Mo Yu; Hongyuan Mei; |
153 | API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, three pivotal questions remain unanswered: (1) How effective are current LLMs in utilizing tools? (2) How can we enhance LLMs' ability to utilize tools? (3) What obstacles need to be overcome to leverage tools? To address these questions, we introduce API-Bank, a groundbreaking benchmark specifically designed for tool-augmented LLMs. |
Minghao Li; Yingxiu Zhao; Bowen Yu; Feifan Song; Hangyu Li; Haiyang Yu; Zhoujun Li; Fei Huang; Yongbin Li; |
154 | Towards Interpretable Mental Health Analysis with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing relevant studies bear several limitations, including inadequate evaluations, a lack of prompting strategies, and little exploration of LLMs for explainability. To bridge these gaps, we comprehensively evaluate the mental health analysis and emotional reasoning ability of LLMs on 11 datasets across 5 tasks. |
Kailai Yang; Shaoxiong Ji; Tianlin Zhang; Qianqian Xie; Ziyan Kuang; Sophia Ananiadou; |
155 | Label Words Are Anchors: An Information Flow Perspective for Understanding In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the working mechanism of ICL through an information flow lens. |
Lean Wang; Lei Li; Damai Dai; Deli Chen; Hao Zhou; Fandong Meng; Jie Zhou; Xu Sun; |
156 | A Question Answering Framework for Decontextualizing User-facing Snippets from Scientific Documents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we use language models to rewrite snippets from scientific documents to be read on their own. |
Benjamin Newman; Luca Soldaini; Raymond Fok; Arman Cohan; Kyle Lo; |
157 | Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we carry out a data archaeology to infer books that are known to ChatGPT and GPT-4 using a name cloze membership inference query. |
Kent Chang; Mackenzie Cramer; Sandeep Soni; David Bamman; |
158 | Lost in Translation, Found in Spans: Identifying Claims in Multilingual Social Media Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite its importance to journalists and human fact-checkers, it remains a severely understudied problem, and the scarce research on this topic so far has only focused on English. Here we aim to bridge this gap by creating a novel dataset, X-CLAIM, consisting of 7K real-world claims collected from numerous social media platforms in five Indian languages and English. |
Shubham Mittal; Megha Sundriyal; Preslav Nakov; |
159 | Detecting Propaganda Techniques in Code-Switched Social Media Text Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Code-switching combines different languages within the same text, which poses a challenge for automatic systems. Considering this premise, we propose a novel task of detecting propaganda techniques in code-switched text. |
Muhammad Salman; Asif Hanif; Shady Shehata; Preslav Nakov; |
160 | LLM-powered Data Augmentation for Enhanced Cross-lingual Performance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper explores the potential of leveraging Large Language Models (LLMs) for data augmentation in multilingual commonsense reasoning datasets where the available training data is extremely limited. |
Chenxi Whitehouse; Monojit Choudhury; Alham Aji; |
161 | On The Challenges of Using Black-Box APIs for Toxicity Evaluation in Research Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our findings suggest that research that relied on inherited automatic toxicity scores to compare models and techniques may have resulted in inaccurate findings. |
Luiza Pozzobon; Beyza Ermis; Patrick Lewis; Sara Hooker; |
162 | Federated Learning of Large Language Models with Parameter-Efficient Prompt Tuning and Adaptive Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a Parameter-efficient prompt Tuning approach with Adaptive Optimization, i.e., FedPepTAO, to enable efficient and effective FL of LLMs. |
Tianshi Che; Ji Liu; Yang Zhou; Jiaxiang Ren; Jiwen Zhou; Victor Sheng; Huaiyu Dai; Dejing Dou; |
163 | Appraising The Potential Uses and Harms of LLMs for Medical Systematic Reviews Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We conducted 16 interviews with international systematic review experts to characterize the perceived utility and risks of LLMs in the specific context of medical evidence reviews. |
Hye Yun; Iain Marshall; Thomas Trikalinos; Byron Wallace; |
164 | Large Language Models Only Pass Primary School Exams in Indonesia: A Comprehensive Test on IndoMMLU Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce IndoMMLU, the first multi-task language understanding benchmark for Indonesian culture and languages, which consists of questions from primary school to university entrance exams in Indonesia. |
Fajri Koto; Nurul Aisyah; Haonan Li; Timothy Baldwin; |
165 | IEKG: A Commonsense Knowledge Graph for Idiomatic Expressions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unlike prior works that enable IE comprehension through fine-tuning PTLMs with sentences containing IEs, in this work, we construct IEKG, a commonsense knowledge graph for figurative interpretations of IEs. |
Ziheng Zeng; Kellen Cheng; Srihari Nanniyur; Jianing Zhou; Suma Bhat; |
166 | Generating Data for Symbolic Language with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose SymGen which utilizes LLMs for generating various annotation-expensive symbolic language data. |
Jiacheng Ye; Chengzu Li; Lingpeng Kong; Tao Yu; |
167 | Text Encoders Bottleneck Compositionality in Contrastive Vision-language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We first curate CompPrompts, a set of increasingly compositional image captions that VL models should be able to capture (e.g., single object, to object+property, to multiple interacting objects). Then, we train text-only recovery probes that aim to reconstruct captions from single-vector text representations produced by several VL models. |
Amita Kamath; Jack Hessel; Kai-Wei Chang; |
168 | What's "up" with Vision-language Models? Investigating Their Struggle with Spatial Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We evaluate 18 VL models, finding that all perform poorly, e.g., BLIP finetuned on VQAv2, which nears human parity on VQAv2, achieves 56% accuracy on our benchmarks vs. humans at 99%. |
Amita Kamath; Jack Hessel; Kai-Wei Chang; |
169 | An Integrated Search System for Korea Weather Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce WeatherSearch, an integrated search system deployed at the Korea Meteorological Administration (KMA). |
Jinkyung Jo; Dayeon Ki; Soyoung Yoon; Minjoon Seo; |
170 | Guideline Learning for In-Context Information Extraction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Guideline Learning (GL) framework for In-context IE which reflectively learns and follows guidelines. |
Chaoxu Pang; Yixuan Cao; Qiang Ding; Ping Luo; |
171 | RainProof: An Umbrella to Shield Text Generator from Out-Of-Distribution Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on leveraging soft-probabilities in a black-box framework, i.e., we can access the soft-predictions but not the internal states of the model. |
Maxime Darrin; Pablo Piantanida; Pierre Colombo; |
172 | Continually Improving Extractive QA Via Human Feedback Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study continually improving an extractive question answering (QA) system via human user feedback. |
Ge Gao; Hung-Ting Chen; Yoav Artzi; Eunsol Choi; |
173 | BERTie Bott's Every Flavor Labels: A Tasty Introduction to Semantic Role Labeling for Galician Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we leverage existing corpora, WordNet, and dependency parsing to build the first Galician dataset for training semantic role labeling systems in an effort to expand available NLP resources. |
Micaella Bruton; Meriem Beloucif; |
174 | Clembench: Using Game Play to Evaluate Chat-Optimized Language Models As Conversational Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As a proof of concept, this paper investigates five interaction settings, showing that current chat-optimised LLMs are, to an extent, capable of following game-play instructions. |
Kranti Chalamalasetti; Jana Götze; Sherzod Hakimov; Brielen Madureira; Philipp Sadler; David Schlangen; |
175 | Distance-Based Propagation for Efficient Knowledge Graph Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Though there are a few recent attempts to address this through learnable path pruning, they often sacrifice the performance to gain efficiency. In this work, we identify two intrinsic limitations of these methods that affect the efficiency and representation quality. |
Harry Shomer; Yao Ma; Juanhui Li; Bo Wu; Charu Aggarwal; Jiliang Tang; |
176 | Multilingual Pixel Representations for Translation and Effective Cross-lingual Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce and demonstrate how to effectively train multilingual machine translation models with pixel representations. |
Elizabeth Salesky; Neha Verma; Philipp Koehn; Matt Post; |
177 | Hybrid Inverted Index Is A Robust Accelerator for Dense Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present the Hybrid Inverted Index (HI2), where the embedding clusters and salient terms work collaboratively to accelerate dense retrieval. |
Peitian Zhang; Zheng Liu; Shitao Xiao; Zhicheng Dou; Jing Yao; |
178 | Harnessing LLMs for Temporal Data – A Study on Explainable Financial Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The study demonstrates LLMs' ability to generate well-reasoned decisions by leveraging cross-sequence information and extracting insights from text and price time series. |
Xinli Yu; Zheng Chen; Yanbin Lu; |
179 | MAggretriever: A Simple Yet Effective Approach to Zero-Shot Multilingual Dense Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce mAggretriever, which effectively leverages semantic and lexical features from pre-trained multilingual transformers (e.g., mBERT and XLM-R) for dense retrieval. |
Sheng-Chieh Lin; Amin Ahmad; Jimmy Lin; |
180 | Indicative Summarization of Long Discussions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a novel unsupervised approach that uses large language models (LLMs) to generate indicative summaries for long discussions, which essentially serve as tables of contents. |
Shahbaz Syed; Dominik Schwabe; Khalid Khatib; Martin Potthast; |
181 | A Training-Free Debiasing Framework with Counterfactual Reasoning for Conversational Emotion Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, previous studies in ERC generally focus on capturing context-sensitive and speaker-sensitive dependencies, ignoring unintended dataset biases, which hampers generalization and fairness in ERC. To address this issue, we propose a Training-Free Debiasing framework (TFD) that operates during prediction without additional training. |
Geng Tu; Ran Jing; Bin Liang; Min Yang; Kam-Fai Wong; Ruifeng Xu; |
182 | A Mechanistic Interpretation of Arithmetic Reasoning in Language Models Using Causal Mediation Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Mathematical reasoning in large language models (LMs) has garnered significant attention in recent work, but there is a limited understanding of how these models process and store information related to arithmetic tasks within their architecture. In order to improve our understanding of this aspect of language models, we present a mechanistic interpretation of Transformer-based LMs on arithmetic questions using a causal mediation analysis framework. |
Alessandro Stolfo; Yonatan Belinkov; Mrinmaya Sachan; |
183 | Let's Sample Step By Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, we introduce Adaptive-Consistency, a cost-efficient, model-agnostic technique that dynamically adjusts the number of samples per question using a lightweight stopping criterion. |
Pranjal Aggarwal; Aman Madaan; Yiming Yang; Mausam; |
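A minimal sketch of the sampling loop, assuming a simple majority-fraction rule as a stand-in for the paper's lightweight stopping criterion; `sample_answer` is a hypothetical hook that queries the LLM once:

```python
from collections import Counter
import random

def adaptive_consistency(sample_answer, max_samples=40, threshold=0.8, min_samples=3):
    """Draw answers one at a time; stop when the leading answer's share
    of samples reaches `threshold`, instead of always drawing `max_samples`."""
    counts = Counter()
    for n in range(1, max_samples + 1):
        counts[sample_answer()] += 1
        answer, top = counts.most_common(1)[0]
        if n >= min_samples and top / n >= threshold:
            return answer, n                    # stopped early
    return counts.most_common(1)[0][0], max_samples

ans, used = adaptive_consistency(lambda: random.choice(["42"] * 9 + ["7"]))
print(ans, "after", used, "samples")
```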
184 | Enhancing Chat Language Models By Scaling High-quality Instructional Conversations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper aims to push the upper bound of open-source models further. |
Ning Ding; Yulin Chen; Bokai Xu; Yujia Qin; Shengding Hu; Zhiyuan Liu; Maosong Sun; Bowen Zhou; |
185 | Knowledge Graph Compression Enhances Diverse Commonsense Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, due to the large coverage and, consequently, vast scale of ConceptNet, the extracted subgraphs may contain loosely related, redundant and irrelevant information, which can introduce noise into the model. We propose to address this by applying a differentiable graph compression algorithm that focuses on the relevant knowledge for the task. |
EunJeong Hwang; Veronika Thost; Vered Shwartz; Tengfei Ma; |
186 | Decoding The Silent Majority: Inducing Belief Augmented Social Graph with Large Language Model for Response Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing approaches have limited exploration of how to best process and utilize these important features. To address this gap, we propose a novel framework, named SocialSense, that leverages a large language model to induce a belief-centered graph on top of an existing social network, along with graph-based propagation to capture social dynamics. |
Chenkai Sun; Jinning Li; Yi Fung; Hou Chan; Tarek Abdelzaher; ChengXiang Zhai; Heng Ji; |
187 | Non-Programmers Can Label Programs Indirectly Via Active Examples: A Case Study with Text-to-SQL Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce APEL, a framework in which non-programmers select among candidate programs generated by a seed semantic parser (e.g., Codex). |
Ruiqi Zhong; Charlie Snell; Dan Klein; Jason Eisner; |
188 | Goal-Driven Explainable Clustering Via Language Descriptions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a new task formulation, "Goal-Driven Clustering with Explanations" (GoalEx), which represents both the goal and the explanations as free-form language descriptions. |
Zihan Wang; Jingbo Shang; Ruiqi Zhong; |
189 | Grammar-Constrained Decoding for Structured NLP Tasks Without Finetuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we demonstrate that formal grammars can describe the output space for a much wider range of tasks and argue that GCD can serve as a unified framework for structured NLP tasks in general. |
Saibo Geng; Martin Josifoski; Maxime Peyrard; Robert West; |
190 | Does The Correctness of Factual Knowledge Matter for Factual Knowledge-Enhanced Pre-trained Language Models? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a counterfactual-based analysis framework to explore the causal effects of factual knowledge injection on the performance of language models within pretrain-finetune paradigm. |
Boxi Cao; Qiaoyu Tang; Hongyu Lin; Xianpei Han; Le Sun; |
191 | When Language Models Fall in Love: Animacy Processing in Transformer Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Like previous studies, we find that LMs behave much like humans when presented with entities whose animacy is typical. However, we also show that even when presented with stories about atypically animate entities, such as a peanut in love, LMs adapt: they treat these entities as animate, though they do not adapt as well as humans. |
Michael Hanna; Yonatan Belinkov; Sandro Pezzelle; |
192 | Accelerating Toeplitz Neural Network with Constant-time Inference Complexity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we aim to combine the strengths of TNNs and SSMs by converting TNNs to SSMs during inference, thereby enabling TNNs to achieve the same constant inference complexities as SSMs. |
Zhen Qin; Yiran Zhong; |
193 | Mirages. On Anthropomorphism in Dialogue Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we discuss the linguistic factors that contribute to the anthropomorphism of dialogue systems and the harms that can arise thereof, including reinforcing gender stereotypes and conceptions of acceptable language. |
Gavin Abercrombie; Amanda Curry; Tanvi Dinkar; Verena Rieser; Zeerak Talat; |
194 | PK-ICR: Persona-Knowledge Interactive Multi-Context Retrieval for Grounded Dialogue Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop a novel grounding retrieval method that utilizes all contexts of dialogue simultaneously. |
Minsik Oh; Joosung Lee; Jiwei Li; Guoyin Wang; |
195 | Did You Mean…? Confidence-based Trade-offs in Semantic Parsing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the DidYouMean system which better balances usability and safety by rephrasing low-confidence inputs. |
Elias Stengel-Eskin; Benjamin Van Durme; |
196 | STAIR: Learning Sparse Text and Image Representation in Grounded Tokens Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show that it is possible to build a sparse semantic representation that is as powerful as, or even better than, dense representations. |
Chen Chen; Bowen Zhang; Liangliang Cao; Jiguang Shen; Tom Gunter; Albin Jose; Alexander Toshev; Yantao Zheng; Jonathon Shlens; Ruoming Pang; Yinfei Yang; |
197 | Where to Start? Analyzing The Potential Value of Intermediate Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Such a model, finetuned on some source dataset, may provide a better starting point for a new finetuning process on a desired target dataset. Here, we perform a systematic analysis of this intertraining scheme, over a wide range of English classification tasks. |
Leshem Choshen; Elad Venezian; Shachar Don-Yehiya; Noam Slonim; Yoav Katz; |
198 | INFORM: Information ENtropy Based Multi-step Reasoning FOR Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel approach by introducing information entropy (IE) as a criterion for CoT prompt selection. |
Chuyue Zhou; Wangjie You; Juntao Li; Jing Ye; Kehai Chen; Min Zhang; |
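The entropy criterion itself is easy to state: sample several answers under a candidate prompt and compute the Shannon entropy of the resulting answer distribution. The sketch below shows only that computation; how INFORM uses the score for selection is not reproduced here:

```python
import math
from collections import Counter

def answer_entropy(sampled_answers):
    """Shannon entropy (bits) of the empirical answer distribution."""
    counts, n = Counter(sampled_answers), len(sampled_answers)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(answer_entropy(["12", "12", "12", "15"]))  # low entropy: consistent answers
print(answer_entropy(["12", "15", "9", "31"]))   # high entropy: uncertain answers
```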
199 | DecipherPref: Analyzing Influential Factors in Human Preference Judgments Via GPT-4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we conduct an in-depth examination of a collection of pairwise human judgments released by OpenAI. |
Yebowen Hu; Kaiqiang Song; Sangwoo Cho; Xiaoyang Wang; Hassan Foroosh; Fei Liu; |
200 | Rethinking Negative Pairs in Code Search Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As an example, a bubble sorting algorithm is less "negative" than a file saving function for the quick sorting algorithm query. In this paper, we tackle the above problems by proposing a simple yet effective Soft-InfoNCE loss that inserts weight terms into InfoNCE. |
Haochen Li; Xin Zhou; Anh Luu; Chunyan Miao; |
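The weighted loss can be sketched directly from the standard InfoNCE form. The weights below are hand-picked for illustration, whereas the paper estimates them (a "less negative" bubble-sort example gets a smaller weight than an unrelated function):

```python
import numpy as np

def soft_infonce(sim_pos, sim_negs, weights, tau=0.05):
    """InfoNCE with per-negative weights w_j:
    loss = -log( e^{s+/tau} / (e^{s+/tau} + sum_j w_j * e^{s_j/tau}) )"""
    pos = np.exp(sim_pos / tau)
    negs = np.asarray(weights) * np.exp(np.asarray(sim_negs) / tau)
    return -np.log(pos / (pos + negs.sum()))

# Query "quick sort", positive = a quicksort implementation.
print(soft_infonce(sim_pos=0.8,
                   sim_negs=[0.6, 0.1],      # bubble sort vs. file-saving code
                   weights=[0.5, 1.0]))      # down-weight the related negative
```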
201 | ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose ViT-TTS, the first visual TTS model with scalable diffusion transformers. |
Huadai Liu; Rongjie Huang; Xuan Lin; Wenqiang Xu; Maozong Zheng; Hong Chen; Jinzheng He; Zhou Zhao; |
202 | Evaluating Cross-Domain Text-to-SQL Models and Benchmarks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, accurately matching a model-generated SQL query to a reference SQL query in a benchmark fails for various reasons, such as underspecified natural language queries, inherent assumptions in both model-generated and reference queries, and the non-deterministic nature of SQL output under certain conditions. In this paper, we conduct an extensive study of several prominent cross-domain text-to-SQL benchmarks and re-evaluate some of the top-performing models within these benchmarks, by both manually evaluating the SQL queries and rewriting them in equivalent expressions. |
Mohammadreza Pourreza; Davood Rafiei; |
203 | Robust Prompt Optimization for Large Language Models Against Distribution Shifts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this light, we propose a new problem of robust prompt optimization for LLMs against distribution shifts, which requires that a prompt optimized over a labeled source group also generalize to an unlabeled target group. To solve this problem, we propose the Generalized Prompt Optimization framework, which incorporates the unlabeled data from the target group into prompt optimization. |
Moxin Li; Wenjie Wang; Fuli Feng; Yixin Cao; Jizhi Zhang; Tat-Seng Chua; |
204 | GeoLM: Empowering Language Models for Geospatially Grounded Language Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces GeoLM, a geospatially grounded language model that enhances the understanding of geo-entities in natural language. |
Zekun Li; Wenxuan Zhou; Yao-Yi Chiang; Muhao Chen; |
205 | Can LLMs Facilitate Interpretation of Pre-trained Language Models? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose using a large language model, ChatGPT, as an annotator to enable fine-grained interpretation analysis of pre-trained language models. |
Basel Mousi; Nadir Durrani; Fahim Dalvi; |
206 | Enhancing Code-Switching for Cross-lingual SLU: A Unified View of Semantic and Grammatical Coherence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We ascribe this lack to two issues: (1) randomly replacing code-switched tokens with equal probability and (2) disregarding token-level dependency within each language. To tackle these issues, in this paper, we propose a novel method termed SoGo, for zero-shot cross-lingual SLU. |
Zhihong Zhu; Xuxin Cheng; Zhiqi Huang; Dongsheng Chen; Yuexian Zou; |
207 | Hi Guys or Hi Folks? Benchmarking Gender-Neutral Machine Translation with The GeNTE Corpus Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Based on GeNTE, we then overview existing reference-based evaluation approaches, highlight their limits, and propose a reference-free method more suitable to assess gender-neutral translation. |
Andrea Piergentili; Beatrice Savoldi; Dennis Fucci; Matteo Negri; Luisa Bentivogli; |
208 | Rethinking Model Selection and Decoding for Keyphrase Generation with Pre-trained Sequence-to-Sequence Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Regarding decoding, we demonstrate that while greedy search achieves strong F1 scores, it lags in recall compared with sampling-based methods. Based on these insights, we propose DeSel, a likelihood-based decode-select algorithm for seq2seq PLMs. |
Di Wu; Wasi Ahmad; Kai-Wei Chang; |
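A minimal sketch of the decode-then-select pattern, with hypothetical hooks (`sample_candidates`, `sequence_logprob`) standing in for the seq2seq model's sampling and scoring interfaces; the paper's actual selection rule may differ:

```python
def decode_select(sample_candidates, sequence_logprob, k=5):
    """Sample k candidate outputs, then keep the one the model scores highest."""
    candidates = sample_candidates(k)               # diverse sampled outputs
    return max(candidates, key=sequence_logprob)    # select by model likelihood

# Toy stand-ins for the two hooks:
cands = lambda k: ["nlp;keyphrase generation", "nlp;summarization", "nlp"]
score = lambda s: -0.1 * len(s)                     # pretend log p(s)
print(decode_select(cands, score))
```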
209 | HistAlign: Improving Context Dependency in Language Generation By Aligning with History Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, we find that even with training, the performance gain stemming from the cache component of current cache-LMs is suboptimal due to the misalignment between the current hidden states and those stored in the memory. In this work, we present HistAlign, a new training approach to ensure good cache alignment such that the model receives useful signals from the history. |
David Wan; Shiyue Zhang; Mohit Bansal; |
210 | Data Factors for Better Compositional Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, in contrast to this poor performance, state-of-the-art models trained on larger and more general datasets show better generalization ability. In this work, to reconcile this inconsistency, we conduct an empirical analysis by training Transformer models on a variety of training sets with different data factors, including dataset scale, pattern complexity, example difficulty, etc. |
Xiang Zhou; Yichen Jiang; Mohit Bansal; |
211 | Cross-Lingual Consistency of Factual Knowledge in Multilingual Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: With the ultimate goal of ensuring that users with different language backgrounds obtain consistent feedback from the same model, we study the cross-lingual consistency (CLC) of factual knowledge in various multilingual PLMs. |
Jirui Qi; Raquel Fernández; Arianna Bisazza; |
212 | Do LLMs Understand Social Knowledge? Evaluating The Sociability of Large Language Models with SocKET Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we introduce a new theory-driven benchmark, SocKET, that contains 58 NLP tasks testing social knowledge which we group into five categories: humor & sarcasm, offensiveness, sentiment & emotion, social factors, and trustworthiness. |
Minje Choi; Jiaxin Pei; Sagar Kumar; Chang Shu; David Jurgens; |
213 | Improved Pseudo Data for Machine Translation Quality Estimation with Constrained Beam Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, pseudo data solutions are less satisfying in unsupervised scenarios because the pseudo labels are inaccurate or the pseudo translations differ from the real ones. To address these problems, we propose to generate pseudo data using the MT model with constrained beam search (CBSQE). |
Xiang Geng; Yu Zhang; Zhejian Lai; Shuaijie She; Wei Zou; Shimin Tao; Hao Yang; Jiajun Chen; Shujian Huang; |
214 | IMTLab: An Open-Source Platform for Building, Evaluating, and Diagnosing Interactive Machine Translation Systems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present IMTLab, an open-source end-to-end interactive machine translation (IMT) system platform that enables researchers to quickly build IMT systems with state-of-the-art models, perform an end-to-end evaluation, and diagnose the weakness of systems. |
Xu Huang; Zhirui Zhang; Ruize Gao; Yichao Du; Lemao Liu; Guoping Huang; Shuming Shi; Jiajun Chen; Shujian Huang; |
215 | Beyond Factuality: A Comprehensive Evaluation of Large Language Models As Knowledge Generators Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, community concerns abound regarding the factuality and potential implications of using this uncensored knowledge. In light of this, we introduce CONNER, a COmpreheNsive kNowledge Evaluation fRamework, designed to systematically and automatically evaluate generated knowledge from six important perspectives: Factuality, Relevance, Coherence, Informativeness, Helpfulness, and Validity. |
Liang Chen; Yang Deng; Yatao Bian; Zeyu Qin; Bingzhe Wu; Tat-Seng Chua; Kam-Fai Wong; |
216 | Simplicity Level Estimate (SLE): A Learned Reference-Less Metric for Sentence Simplification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a new learned evaluation metric, SLE, which focuses on simplicity, outperforming almost all existing metrics in terms of correlation with human judgements. |
Liam Cripwell; Joël Legrand; Claire Gardent; |
217 | The Art of SOCRATIC QUESTIONING: Recursive Thinking with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by the human cognitive process, we propose SOCRATIC QUESTIONING, a divide-and-conquer style algorithm that mimics the recursive thinking process. |
Jingyuan Qi; Zhiyang Xu; Ying Shen; Minqian Liu; Di Jin; Qifan Wang; Lifu Huang; |
218 | Towards A Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, it is unclear whether LMs perform these tasks by cheating with answers memorized from the pretraining corpus, or via a multi-step reasoning mechanism. In this paper, we try to answer this question by exploring a mechanistic interpretation of LMs for multi-step reasoning tasks. |
Yifan Hou; Jiaoda Li; Yu Fei; Alessandro Stolfo; Wangchunshu Zhou; Guangtao Zeng; Antoine Bosselut; Mrinmaya Sachan; |
219 | A Diachronic Perspective on User Trust in AI Under Uncertainty Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, modern NLP systems are seldom calibrated and are often confidently incorrect about their predictions, which violates users' mental models and erodes their trust. In this work, we design a study where users bet on the correctness of an NLP system, and use it to study the evolution of user trust as a response to these trust-eroding events and how user trust is rebuilt as a function of time after these events. |
Shehzaad Dhuliawala; Vilém Zouhar; Mennatallah El-Assady; Mrinmaya Sachan; |
220 | Towards Robust Pruning: An Adaptive Knowledge-Retention Pruning Strategy for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As humans step into the era of large language models, these issues become increasingly prominent. This paper proposes that the robustness of language models is proportional to the extent of pre-trained knowledge they encompass. |
Jianwei Li; Qi Lei; Wei Cheng; Dongkuan Xu; |
221 | From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Addressing the challenge of adapting pre-trained vision-language models for generating insightful explanations for visual reasoning tasks with limited annotations, we present ReVisE: a Recursive Visual Explanation algorithm. |
Jiaxin Ge; Sanjay Subramanian; Trevor Darrell; Boyi Li; |
222 | Increasing Probability Mass on Answer Choices Does Not Always Improve Accuracy Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Are there direct ways of reducing it, and does doing so improve task performance? We propose a mathematical formalism for SFC which allows us to quantify and bound its impact for the first time. |
Sarah Wiegreffe; Matthew Finlayson; Oyvind Tafjord; Peter Clark; Ashish Sabharwal; |
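One quantity at the heart of SFC (surface form competition) is how much probability mass the model places on the official answer strings versus competing paraphrases. A toy sketch of that measurement (all numbers invented, not the paper's formalism):

```python
def mass_on_choices(token_probs, choices):
    """Split next-token probability mass into 'on the answer choices'
    vs. 'elsewhere' (including competing surface forms of the same answer)."""
    on = sum(token_probs.get(c, 0.0) for c in choices)
    return on, 1.0 - on

probs = {"yes": 0.30, "no": 0.20, "Yes": 0.25, "nope": 0.05, "the": 0.20}
on, off = mass_on_choices(probs, choices=["yes", "no"])
print(on, off)   # 0.5 on the official choices, 0.5 competing elsewhere
```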
223 | INSTRUCTSCORE: Towards Explainable Text Generation Evaluation with Automatic Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although recent learned metrics show high correlation with human judgement, these metrics do not provide explicit explanation of their verdict, nor associate the scores with defects in the generated text. To address this limitation, we present INSTRUCTSCORE, a fine-grained explainable evaluation metric for text generation. |
Wenda Xu; Danqing Wang; Liangming Pan; Zhenqiao Song; Markus Freitag; William Wang; Lei Li; |
224 | Evaluating Bias and Fairness in Gender-Neutral Pretrained Vision-and-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we define gender bias as our case study. |
Laura Cabello; Emanuele Bugliarello; Stephanie Brandl; Desmond Elliott; |
225 | "Mistakes Help Us Grow": Facilitating and Evaluating Growth Mindset Supportive Language in Classrooms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We explore whether large language models (LLMs) can provide automated, personalized coaching to support teachers' use of GMSL. |
Kunal Handa; Margarett Clapper; Jessica Boyle; Rose Wang; Diyi Yang; David Yeager; Dorottya Demszky; |
226 | Model-tuning Via Prompts Makes NLP Models Adversarially Robust Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we demonstrate surprising gains in adversarial robustness enjoyed by Model-tuning Via Prompts (MVP), an alternative method of adapting to downstream tasks. |
Mrigank Raman; Pratyush Maini; J Kolter; Zachary Lipton; Danish Pruthi; |
227 | Random Entity Quantization for Parameter-Efficient Compositional Knowledge Graph Representation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We refer to the process of obtaining corresponding codewords of each entity as entity quantization, for which previous works have designed complicated strategies. Surprisingly, this paper shows that simple random entity quantization can achieve similar results to current strategies. |
Jiaang Li; Quan Wang; Yi Liu; Licheng Zhang; Zhendong Mao; |
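Random entity quantization is simple enough to sketch in full: each entity gets a random fixed-size subset of a shared codebook. Codebook size and code length below are arbitrary, not the paper's settings:

```python
import random

def random_quantize(entities, codebook_size=1000, codes_per_entity=8, seed=0):
    """Assign every entity a random set of codeword indices from a shared codebook."""
    rng = random.Random(seed)
    return {e: sorted(rng.sample(range(codebook_size), codes_per_entity))
            for e in entities}

table = random_quantize(["Paris", "France", "Berlin"])
print(table["Paris"])   # this entity's codewords, e.g. [31, 144, 379, ...]
```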
228 | Understanding The Inner-workings of Language Models Through Representation Dissimilarity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we show that representation dissimilarity measures, which are functions that measure the extent to which two models' internal representations differ, can be a valuable tool for gaining insight into the mechanics of language models. |
Davis Brown; Charles Godfrey; Nicholas Konz; Jonathan Tu; Henry Kvinge; |
229 | Self-Evolution Learning for Mixup: Enhance Data Augmentation on Few-Shot Text Classification Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a self-evolution learning (SE) based mixup approach for data augmentation in text classification, which can generate more adaptive and model-friendly pseudo samples for the model training. |
Haoqi Zheng; Qihuang Zhong; Liang Ding; Zhiliang Tian; Xin Niu; Changjian Wang; Dongsheng Li; Dacheng Tao; |
230 | Zero-shot Sharpness-Aware Quantization for Pre-trained Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most of the cutting-edge zero-shot quantization methods primarily 1) apply to computer vision tasks, and 2) neglect the overfitting problem in the generative adversarial learning process, leading to sub-optimal performance. Motivated by this, we propose a novel zero-shot sharpness-aware quantization (ZSAQ) framework for the zero-shot quantization of various PLMs. |
Miaoxi Zhu; Qihuang Zhong; Li Shen; Liang Ding; Juhua Liu; Bo Du; Dacheng Tao; |
231 | NL2TL: Transforming Natural Languages to Temporal Logics Using Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an accurate and generalizable transformation framework of English instructions from NL to TL, exploring the use of Large Language Models (LLMs) at multiple stages. |
Yongchao Chen; Rujul Gandhi; Yang Zhang; Chuchu Fan; |
232 | A Simple Baseline for Knowledge-Based Visual Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our main contribution in this paper is to propose a much simpler and readily reproducible pipeline which, in a nutshell, is based on efficient in-context learning by prompting LLaMA (1 and 2) using question-informative captions as contextual information. |
Alexandros Xenos; Themos Stafylakis; Ioannis Patras; Georgios Tzimiropoulos; |
233 | SEAHORSE: A Multilingual, Multifaceted Dataset for Summarization Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce SEAHORSE, a dataset for multilingual, multifaceted summarization evaluation. |
Elizabeth Clark; Shruti Rijhwani; Sebastian Gehrmann; Joshua Maynez; Roee Aharoni; Vitaly Nikolaev; Thibault Sellam; Aditya Siddhant; Dipanjan Das; Ankur Parikh; |
234 | Deciphering Stereotypes in Pre-Trained Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper addresses the issue of demographic stereotypes present in Transformer-based pre-trained language models (PLMs) and aims to deepen our understanding of how these biases are encoded in these models. |
Weicheng Ma; Henry Scheible; Brian Wang; Goutham Veeramachaneni; Pratim Chowdhary; Alan Sun; Andrew Koulogeorge; Lili Wang; Diyi Yang; Soroush Vosoughi; |
235 | Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we introduce InfoSeek, a visual question answering dataset tailored for information-seeking questions that cannot be answered with only common sense knowledge. |
Yang Chen; Hexiang Hu; Yi Luan; Haitian Sun; Soravit Changpinyo; Alan Ritter; Ming-Wei Chang; |
236 | Length Does Matter: Summary Length Can Bias Summarization Metrics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The results indicate that most metrics tend to favor longer summaries, even after accounting for other factors. To address this issue, we introduce a Bayesian normalization technique that effectively diminishes this bias. |
Xiaobo Guo; Soroush Vosoughi; |
237 | Exploring Distributional Shifts in Large Language Models for Code Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We systematically study how three large language models with code capabilities – CodeT5, Codex, and ChatGPT – generalize to out-of-domain data. |
Shushan Arakelyan; Rocktim Das; Yi Mao; Xiang Ren; |
238 | Towards Building More Robust NER Datasets: An Empirical Study on NER Dataset Bias from A Dataset Difficulty View Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Previous research attributes the robustness problem to the existence of NER dataset bias, where simpler and regular entity patterns induce shortcut learning. In this work, we bring new insights into this problem by comprehensively investigating the NER dataset bias from a dataset difficulty view. |
Ruotian Ma; Xiaolei Wang; Xin Zhou; Qi Zhang; Xuanjing Huang; |
239 | Hallucination Detection for Generative Large Language Models By Bayesian Sequential Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a unique framework that leverages statistical decision theory and Bayesian sequential analysis to optimize the trade-off between costs and benefits during the hallucination detection process. |
Xiaohua Wang; Yuliang Yan; Longtao Huang; Xiaoqing Zheng; Xuanjing Huang; |
240 | Towards Reliable Misinformation Mitigation: Generalization, Uncertainty, and GPT-4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose focusing on generalization, uncertainty, and how to leverage recent large language models, in order to create more practical tools to evaluate information veracity in contexts where perfect classification is impossible. |
Kellin Pelrine; Anne Imouza; Camille Thibault; Meilina Reksoprodjo; Caleb Gupta; Joel Christoph; Jean-François Godbout; Reihaneh Rabbany; |
241 | Exploring Discourse Structure in Document-level Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a more sound paragraph-to-paragraph translation mode and explore whether discourse structure can improve DocMT. |
Xinyu Hu; Xiaojun Wan; |
242 | Evaluation Metrics in The Era of GPT-4: Reliably Evaluating Large Language Models on Sequence to Sequence Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We aim to improve the understanding of current models' performance by providing a preliminary and hybrid evaluation on a range of open and closed-source generative LLMs on three NLP benchmarks: text summarisation, text simplification and grammatical error correction (GEC), using both automatic and human evaluation. |
Andrea Sottana; Bin Liang; Kai Zou; Zheng Yuan; |
243 | How Does Generative Retrieval Scale to Millions of Passages? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We conduct the first empirical study of generative retrieval techniques across various corpus scales, ultimately scaling up to the entire MS MARCO passage ranking task with a corpus of 8.8M passages. |
Ronak Pradeep; Kai Hui; Jai Gupta; Adam Lelkes; Honglei Zhuang; Jimmy Lin; Donald Metzler; Vinh Tran; |
244 | EtiCor: Corpus for Analyzing LLMs for Etiquettes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose EtiCor, an Etiquettes Corpus, containing texts about social norms from five different regions across the globe. |
Ashutosh Dwivedi; Pradhyumna Lavania; Ashutosh Modi; |
245 | ViStruct: Visual Structural Knowledge Extraction Via Curriculum Guided Code-Vision Representation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present ViStruct, a training framework to learn VLMs for effective visual structural knowledge extraction. |
Yangyi Chen; Xingyao Wang; Manling Li; Derek Hoiem; Heng Ji; |
246 | Multilingual Holistic Bias: Extending Descriptors and Patterns to Unveil Demographic Biases in Languages at Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a multilingual extension of the HolisticBias dataset, the largest English template-based taxonomy of textual people references: Multilingual HolisticBias. |
Marta Costa-jussà; Pierre Andrews; Eric Smith; Prangthip Hansanti; Christophe Ropers; Elahe Kalbassi; Cynthia Gao; Daniel Licht; Carleigh Wood; |
247 | NORMSAGE: Multi-Lingual Multi-Cultural Norm Discovery from Conversations On-the-Fly Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most computational research on norms has focused on a single culture, and manually built datasets, from non-conversational settings. We address these limitations by proposing a new framework, NormSage, to automatically extract culture-specific norms from multi-lingual conversations. |
Yi Fung; Tuhin Chakrabarty; Hao Guo; Owen Rambow; Smaranda Muresan; Heng Ji; |
248 | JASMINE: Arabic GPT Models for Few-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Using our novel benchmark, we evaluate JASMINE extensively showing powerful performance intrinsically as well as in few-shot learning on a wide range of NLP tasks. We aim to responsibly release our models and evaluation benchmark with interested researchers, along with code for experimenting with them. |
El Moatez Billah Nagoudi; Muhammad Abdul-Mageed; AbdelRahim Elmadany; Alcides Inciarte; Md Tawkat Islam Khondaker; |
249 | Set Learning for Generative Information Extraction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, this formalization introduces a potential order bias, which can impair model learning. To address this issue, this paper proposes a set learning approach that considers multiple permutations of structured objects to approximately optimize set probability. |
Jiangnan Li; Yice Zhang; Bin Liang; Kam-Fai Wong; Ruifeng Xu; |
250 | What Comes Next? Evaluating Uncertainty in Neural Text Generators Against Human Production Variability Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For each test input, we measure the generator's calibration to human production variability. Following this instance-level approach, we analyse NLG models and decoding strategies, demonstrating that probing a generator with multiple samples and, when possible, multiple references, provides the level of detail necessary to gain understanding of a model's representation of uncertainty. |
Mario Giulianelli; Joris Baan; Wilker Aziz; Raquel Fernández; Barbara Plank; |
251 | Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? A Study on Several Typical Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we conduct empirical studies and provide experimental evidence of their performance on a wide variety of financial text analytical problems, using eight benchmark datasets from five categories of tasks. |
Xianzhi Li; Samuel Chan; Xiaodan Zhu; Yulong Pei; Zhiqiang Ma; Xiaomo Liu; Sameena Shah; |
252 | ART: Rule BAsed FutuRe-inference DeducTion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce rule bAsed futuRe-inference deducTion (ART), which aims at deducing the correct future event based on the visual phenomenon (a video) and the rule-based premises, along with an explanation of the reasoning process. |
Mengze Li; Tianqi Zhao; Bai Jionghao; Baoyi He; Jiaxu Miao; Wei Ji; Zheqi Lv; Zhou Zhao; Shengyu Zhang; Wenqiao Zhang; Fei Wu; |
253 | Learning to Describe for Predicting Zero-shot Drug-Drug Interactions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a new problem setup as zero-shot DDI prediction that deals with the case of new drugs. |
Fangqi Zhu; Yongqi Zhang; Lei Chen; Bing Qin; Ruifeng Xu; |
254 | Learning Preference Model for LLMs Via Automatic Preference Data Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose learning the preference model for LLMs via automatic preference data generation (AutoPM). |
Shijia Huang; Jianqiao Zhao; Yanyang Li; Liwei Wang; |
255 | Bridging The Gap Between Synthetic and Authentic Images for Multimodal Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, using authentic images for training and synthetic images for inference can introduce a distribution shift, resulting in performance degradation during inference. To tackle this challenge, in this paper, we feed both synthetic and authentic images to the MMT model. |
Wenyu Guo; Qingkai Fang; Dong Yu; Yang Feng; |
256 | Pushdown Layers: Encoding Recursive Structure in Transformer Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work introduces Pushdown Layers, a new self-attention layer that models recursive state via a stack tape that tracks estimated depths of every token in an incremental parse of the observed prefix. |
Shikhar Murty; Pratyusha Sharma; Jacob Andreas; Christopher Manning; |
257 | BiasX: "Thinking Slow" in Toxic Content Moderation with Explanations of Implied Social Biases Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This can lead to subtle toxicity being missed, and seemingly toxic but harmless content being over-detected. We introduce BiasX, a framework that enhances content moderation setups with free-text explanations of statements' implied social biases, and explore its effectiveness through a large-scale crowdsourced user study. |
Yiming Zhang; Sravani Nanduri; Liwei Jiang; Tongshuang Wu; Maarten Sap; |
258 | Don't Take This Out of Context!: On The Need for Contextual Models and Evaluations for Stylistic Rewriting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate integrating the preceding textual context into both the rewriting and evaluation stages of stylistic text rewriting, and introduce a new composite contextual evaluation metric CtxSimFit that combines similarity to the original sentence with contextual cohesiveness. |
Akhila Yerukola; Xuhui Zhou; Elizabeth Clark; Maarten Sap; |
259 | FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce FANToM, a new benchmark designed to stress-test ToM within information-asymmetric conversational contexts via question answering. |
Hyunwoo Kim; Melanie Sclar; Xuhui Zhou; Ronan Bras; Gunhee Kim; Yejin Choi; Maarten Sap; |
260 | LLM-FP4: 4-Bit Floating-Point Quantized Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose LLM-FP4 for quantizing both weights and activations in large language models (LLMs) down to 4-bit floating-point values, in a post-training manner. |
Shih-yang Liu; Zechun Liu; Xijie Huang; Pingcheng Dong; Kwang-Ting Cheng; |
261 | Instruct and Extract: Instruction Tuning for On-Demand Information Extraction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, when it comes to information extraction, a classic task in natural language processing, most task-specific systems cannot align well with long-tail ad hoc extraction use cases for non-expert users. To address this, we propose a novel paradigm, termed On-Demand Information Extraction, to fulfill the personalized demands of real-world users. |
Yizhu Jiao; Ming Zhong; Sha Li; Ruining Zhao; Siru Ouyang; Heng Ji; Jiawei Han; |
262 | AutoTrial: Prompting Language Models for Clinical Trial Design Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a method named AutoTrial to aid the design of clinical eligibility criteria using language models. |
Zifeng Wang; Cao Xiao; Jimeng Sun; |
263 | A Unified View of Evaluation Metrics for Structured Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a conceptual framework that unifies a variety of evaluation metrics for different structured prediction tasks (e.g., event and relation extraction, syntactic and semantic parsing). |
Yunmo Chen; William Gantt; Tongfei Chen; Aaron White; Benjamin Van Durme; |
264 | WordArt Designer: User-Driven Artistic Typography Synthesis Using Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces WordArt Designer, a user-driven framework for artistic typography synthesis, relying on a Large Language Model (LLM). |
Jun-Yan He; Zhi-Qi Cheng; Chenyang Li; Jingdong Sun; Wangmeng Xiang; Xianhui Lin; Xiaoyang Kang; Zengke Jin; Yusen Hu; Bin Luo; Yifeng Geng; Xuansong Xie; |
265 | QTSumm: Query-Focused Summarization Over Tabular Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by this, we define a new query-focused table summarization task, where text generation models have to perform human-like reasoning and analysis over the given table to generate a tailored summary. We introduce a new benchmark named QTSumm for this task, which contains 7,111 human-annotated query-summary pairs over 2,934 tables covering diverse topics. |
Yilun Zhao; Zhenting Qi; Linyong Nan; Boyu Mi; Yixin Liu; Weijin Zou; Simeng Han; Ruizhe Chen; Xiangru Tang; Yumo Xu; Dragomir Radev; Arman Cohan; |
266 | Investigating Table-to-Text Generation Capabilities of Large Language Models in Real-World Information Seeking Scenarios Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the table-to-text capabilities of different LLMs using four datasets within two real-world information seeking scenarios. |
Yilun Zhao; Haowei Zhang; Shengyun Si; Linyong Nan; Xiangru Tang; Arman Cohan; |
267 | CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present CRoW, a manually-curated, multi-task benchmark that evaluates the ability of models to apply commonsense reasoning in the context of six real-world NLP tasks. |
Mete Ismayilzada; Debjit Paul; Syrielle Montariol; Mor Geva; Antoine Bosselut; |
268 | CRAB: Assessing The Strength of Causal Relationships Between Real-world Events Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present CRAB, a new Causal Reasoning Assessment Benchmark designed to evaluate causal understanding of events in real-world narratives. |
Angelika Romanou; Syrielle Montariol; Debjit Paul; Leo Laugier; Karl Aberer; Antoine Bosselut; |
269 | SCITAB: A Challenging Benchmark for Compositional Reasoning and Claim Verification on Scientific Tables Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present SCITAB, a challenging evaluation dataset consisting of 1.2K expert-verified scientific claims. |
Xinyuan Lu; Liangming Pan; Qian Liu; Preslav Nakov; Min-Yen Kan; |
270 | CaseEncoder: A Knowledge-enhanced Pre-trained Model for Legal Case Encoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, these models may not fully capture the underlying legal features in legal case documents. To address this issue, we propose CaseEncoder, a legal document encoder that leverages fine-grained legal knowledge in both the data sampling and pre-training phases. |
Yixiao Ma; Yueyue Wu; Weihang Su; Qingyao Ai; Yiqun Liu; |
271 | LACMA: Language-Aligning Contrastive Learning with Meta-Actions for Embodied Instruction Following Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This lack of generalizability is due to the agent's insensitivity to subtle changes in natural language instructions. To mitigate this issue, we propose explicitly aligning the agent's hidden states with the instructions via contrastive learning. |
Cheng-Fu Yang; Yen-Chun Chen; Jianwei Yang; Xiyang Dai; Lu Yuan; Yu-Chiang Wang; Kai-Wei Chang; |
272 | Counter Turing Test (CT2): AI-Generated Text Detection Is Not As Easy As You May Think – Introducing AI Detectability Index (ADI) Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, to establish a quantifiable spectrum facilitating the evaluation and ranking of LLMs according to their detectability levels, we propose the AI Detectability Index (ADI). |
Megha Chakraborty; S.M Towhidul Islam Tonmoy; S M Mehedi Zaman; Shreya Gautam; Tanay Kumar; Krish Sharma; Niyar Barman; Chandan Gupta; Vinija Jain; Aman Chadha; Amit Sheth; Amitava Das; |
273 | FACTIFY3M: A Benchmark for Multimodal Fact Verification with Explainability Through 5W Question-Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite progress in automatic text-based fact verification (e.g., FEVER, LIAR), the research community lacks substantial effort in multimodal fact verification. To address this gap, we introduce FACTIFY 3M, a dataset of 3 million samples that pushes the boundaries of the domain of fact verification via a multimodal fake news dataset, in addition to offering explainability through the concept of 5W question-answering. |
Megha Chakraborty; Khushbu Pahwa; Anku Rani; Shreyas Chatterjee; Dwip Dalal; Harshit Dave; Ritvik G; Preethi Gurumurthy; Adarsh Mahor; Samahriti Mukherjee; Aditya Pakala; Ishan Paul; Janvita Reddy; Arghya Sarkar; Kinjal Sensharma; Aman Chadha; Amit Sheth; Amitava Das; |
274 | Simple and Effective Input Reformulations for Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we reformulate inputs during finetuning for challenging translation tasks, leveraging model strengths from pretraining in novel ways to improve downstream performance. |
Brian Yu; Hansen Lillemark; Kurt Keutzer; |
275 | The Sentiment Problem: A Critical Survey Towards Deconstructing Sentiment Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our study exposes a lack of explicit definitions and frameworks for characterizing sentiment, resulting in potential challenges and biases. To tackle this issue, we propose an ethics sheet encompassing critical inquiries to guide practitioners in ensuring equitable utilization of SA. |
Pranav Venkit; Mukund Srinath; Sanjana Gautam; Saranya Venkatraman; Vipul Gupta; Rebecca Passonneau; Shomir Wilson; |
276 | PAC-tuning: Fine-tuning Pre-trained Language Models with PAC-driven Perturbed Gradient Descent Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, adding these regularizations necessitates heavy tuning of the hyperparameters of optimization algorithms, such as the popular Adam optimizer. In this paper, we propose a two-stage fine-tuning method, PAC-tuning, to address this optimization challenge. |
Guangliang Liu; Zhiyu Xue; Xitong Zhang; Kristen Johnson; Rongrong Wang; |
277 | Unveiling The Implicit Toxicity in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While recent studies primarily focus on probing toxic outputs that can be easily detected with existing toxicity classifiers, we show that LLMs can generate diverse implicit toxic outputs that are exceptionally difficult to detect via simple zero-shot prompting. |
Jiaxin Wen; Pei Ke; Hao Sun; Zhexin Zhang; Chengfei Li; Jinfeng Bai; Minlie Huang; |
278 | Re3Dial: Retrieve, Reorganize and Rescale Conversations for Long-Turn Open-Domain Dialogue Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most dialogues in existing pre-training corpora contain fewer than three turns of dialogue. To alleviate this issue, we propose the Retrieve, Reorganize and Rescale framework (Re3Dial), which can automatically construct billion-scale long-turn dialogues by reorganizing existing short-turn ones. |
Jiaxin Wen; Hao Zhou; Jian Guan; Jie Zhou; Minlie Huang; |
279 | Multi-Source Probing for Open-Domain Conversational Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose a Multi-Source Probing (MSP) method to probe the dialogue comprehension abilities of open-domain dialogue models. |
Yuanxi Li; Hao Zhou; Jie Zhou; Minlie Huang; |
280 | Building Multi-domain Dialog State Trackers from Single-domain Dialogs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a divide-and-conquer (DAC) DST paradigm and a multi-domain dialog synthesis framework, which makes building multi-domain DST models from single-domain dialogs possible. |
Qi Zhu; Zheng Zhang; Xiaoyan Zhu; Minlie Huang; |
281 | SPT: Learning to Selectively Insert Prompts for Better Prompt Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel framework, Selective Prompt Tuning (SPT), that learns to select the proper prompt layers by inserting a prompt controlled by a learnable probabilistic gate at each intermediate layer. |
Wei Zhu; Ming Tan; |
282 | Enhancing Structured Evidence Extraction for Fact Verification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a simple but effective method to enhance the extraction of structured evidence by leveraging the row and column semantics of tables. |
Zirui Wu; Nan Hu; Yansong Feng; |
283 | UniMath: A Foundational and Multimodal Mathematical Reasoner Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While significant progress has been made in natural language processing (NLP), existing methods exhibit limitations in effectively interpreting and processing diverse mathematical modalities. Therefore, we introduce UniMath, a versatile and unified system designed for multimodal mathematical reasoning tasks. |
Zhenwen Liang; Tianyu Yang; Jipeng Zhang; Xiangliang Zhang; |
284 | SLOG: A Structural Generalization Benchmark for Semantic Parsing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce SLOG, a semantic parsing dataset that extends COGS (Kim and Linzen, 2020) with 17 structural generalization cases. |
Bingzhi Li; Lucia Donatelli; Alexander Koller; Tal Linzen; Yuekun Yao; Najoung Kim; |
285 | Chain-of-Questions Training with Latent Answers for Robust Multistep Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Chain-of-Questions, a framework that trains a model to robustly answer multistep questions by generating and answering sub-questions. |
Wang Zhu; Jesse Thomason; Robin Jia; |
286 | Memory-Based Invariance Learning for Out-of-Domain Text Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we augment the original feature space using key-value memory and employ a meta-learning-based approach to enhance the quality of the invariant representations. |
Chen Jia; Yue Zhang; |
287 | Hi-ArG: Exploring The Integration of Hierarchical Argumentation Graphs in Language Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the Hierarchical Argumentation Graph (Hi-ArG), a new structure to organize arguments. |
Jingcong Liang; Rong Ye; Meng Han; Qi Zhang; Ruofei Lai; Xinyu Zhang; Zhao Cao; Xuanjing Huang; Zhongyu Wei; |
288 | Argue with Me Tersely: Towards Sentence-Level Counter-Argument Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present the ArgTersely benchmark for sentence-level counter-argument generation, drawing from a manually annotated dataset from the ChangeMyView debate forum. |
Jiayu Lin; Rong Ye; Meng Han; Qi Zhang; Ruofei Lai; Xinyu Zhang; Zhao Cao; Xuanjing Huang; Zhongyu Wei; |
289 | When Do Decompositions Help for Machine Reading? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that decompositions can be helpful in zero or limited-data settings, giving several points of improvement in exact match. |
Kangda Wei; Dawn Lawrie; Benjamin Van Durme; Yunmo Chen; Orion Weller; |
290 | Towards Conceptualization of "Fair Explanation": Disparate Impacts of Anti-Asian Hate Speech Explanations on Content Moderators Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to characterize what constitutes an explanation that is itself "fair": an explanation that does not adversely impact specific populations. |
Tin Nguyen; Jiannan Xu; Aayushi Roy; Hal Daumé III; Marine Carpuat; |
291 | Expand, Highlight, Generate: RL-driven Document Generation for Passage Reranking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new perspective of data augmentation: generating synthetic documents from queries. |
Arian Askari; Mohammad Aliannejadi; Chuan Meng; Evangelos Kanoulas; Suzan Verberne; |
292 | E-THERAPIST: I Suggest You to Cultivate A Mindset of Positivity and Nurture Uplifting Thoughts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Focusing on this objective, we propose e-THERAPIST, a novel polite interpersonal psychotherapy dialogue system to address issues like depression, anxiety, schizophrenia, etc. |
Kshitij Mishra; Priyanshu Priya; Manisha Burja; Asif Ekbal; |
293 | PHD: Pixel-Based Language Modeling of Historical Documents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Due to the scarcity of real historical scans, we propose a novel method for generating synthetic scans to resemble real historical documents. |
Nadav Borenstein; Phillip Rust; Desmond Elliott; Isabelle Augenstein; |
294 | Towards LLM-driven Dialogue State Tracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we conduct an initial examination of ChatGPT's capabilities in DST. |
Yujie Feng; Zexin Lu; Bo Liu; Liming Zhan; Xiao-Ming Wu; |
295 | LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To accelerate model inference and reduce cost, this paper presents LLMLingua, a coarse-to-fine prompt compression method that involves a budget controller to maintain semantic integrity under high compression ratios, a token-level iterative compression algorithm to better model the interdependence between compressed contents, and an instruction tuning based method for distribution alignment between language models. |
Huiqiang Jiang; Qianhui Wu; Chin-Yew Lin; Yuqing Yang; Lili Qiu; |
296 | Precedent-Enhanced Legal Judgment Prediction with LLM and Domain-Model Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the precedent-enhanced LJP framework (PLJP), a system that leverages the strength of both LLM and domain models in the context of precedents. |
Yiquan Wu; Siying Zhou; Yifei Liu; Weiming Lu; Xiaozhong Liu; Yating Zhang; Changlong Sun; Fei Wu; Kun Kuang; |
297 | The Framework Tax: Disparities Between Inference Efficiency in NLP Research and Deployment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We denote this phenomenon as the framework tax, and observe that the disparity is growing as hardware speed increases over time. In this work, we examine this phenomenon through a series of case studies analyzing the effects of model design decisions, framework paradigms, and hardware platforms on total model latency. |
Jared Fernandez; Jacob Kahn; Clara Na; Yonatan Bisk; Emma Strubell; |
298 | Once Is Enough: A Light-Weight Cross-Attention for Fast Sentence Pair Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, this paper introduces a novel paradigm TopicAns for efficient sentence pair modeling. |
Yuanhang Yang; Shiyi Qi; Chuanyi Liu; Qifan Wang; Cuiyun Gao; Zenglin Xu; |
299 | Advancements in Arabic Grammatical Error Detection and Correction: An Empirical Investigation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present the first results on Arabic GEC using two newly developed Transformer-based pretrained sequence-to-sequence models. |
Bashar Alhafni; Go Inoue; Christian Khairallah; Nizar Habash; |
300 | On Bilingual Lexicon Induction with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by the global paradigm shift in NLP towards Large Language Models (LLMs), we examine the potential of the latest generation of LLMs for the development of bilingual lexicons. |
Yaoyiran Li; Anna Korhonen; Ivan Vulic; |
This table only includes 300 papers selected by our daily digest algorithm. To continue with the full list (~1,100 papers), please visit Paper Digest: EMNLP-2023 (Full List).