Paper Digest: ACL 2024 Papers & Highlights
To search or review papers within ACL-2024 related to a specific topic, please use the search by venue (ACL-2024), review by venue (ACL-2024) and question answering by venue (ACL-2024) services. To browse papers by author, here is a list of all authors (ACL-2024). You may also like to explore our “Best Paper” Digest (ACL), which lists the most influential ACL papers since 1981.
This list is created by the Paper Digest Team. Experience the cutting-edge capabilities of Paper Digest, an innovative AI-powered research platform that empowers you to write, review, get answers and more. Try us today and unlock the full potential of our services for free!
TABLE 1: Paper Digest: ACL 2024 Papers & Highlights
# | Paper & Highlight | Author(s) |
---|---|---|
1 | Measuring Political Bias in Large Language Models: What Is Said and How It Is Said Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to measure political bias in LLMs by analyzing both the content and style of their generated content regarding political issues. |
Yejin Bang; Delong Chen; Nayeon Lee; Pascale Fung; |
2 | MYTE: Morphology-Driven Byte Encoding for Better and Fairer Multilingual Language Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the disparities, we introduce a new paradigm that encodes the same information with segments of consistent size across diverse languages. |
Tomasz Limisiewicz; Terra Blevins; Hila Gonen; Orevaoghene Ahia; Luke Zettlemoyer; |
3 | DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In response, we propose the DeepSeekMoE architecture towards ultimate expert specialization. |
Damai Dai; Chengqi Deng; Chenggang Zhao; R.X. Xu; Huazuo Gao; Deli Chen; Jiashi Li; Wangding Zeng; Xingkai Yu; Y. Wu; Zhenda Xie; Y.K. Li; Panpan Huang; Fuli Luo; Chong Ruan; Zhifang Sui; Wenfeng Liang; |
4 | OLMo: Accelerating The Science of Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we have built OLMo, a competitive, truly Open Language Model, to enable the scientific study of language models. |
Dirk Groeneveld; Iz Beltagy; Evan Walsh; Akshita Bhagia; Rodney Kinney; Oyvind Tafjord; Ananya Jha; Hamish Ivison; Ian Magnusson; Yizhong Wang; Shane Arora; David Atkinson; Russell Authur; Khyathi Chandu; Arman Cohan; Jennifer Dumas; Yanai Elazar; Yuling Gu; Jack Hessel; Tushar Khot; William Merrill; Jacob Morrison; Niklas Muennighoff; Aakanksha Naik; Crystal Nam; Matthew Peters; Valentina Pyatkin; Abhilasha Ravichander; Dustin Schwenk; Saurabh Shah; William Smith; Emma Strubell; Nishant Subramani; Mitchell Wortsman; Pradeep Dasigi; Nathan Lambert; Kyle Richardson; Luke Zettlemoyer; Jesse Dodge; Kyle Lo; Luca Soldaini; Noah Smith; Hannaneh Hajishirzi; |
5 | The Belebele Benchmark: A Parallel Reading Comprehension Dataset in 122 Language Variants Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Belebele, a multiple-choice machine reading comprehension (MRC) dataset spanning 122 language variants. |
Lucas Bandarkar; Davis Liang; Benjamin Muller; Mikel Artetxe; Satya Narayan Shukla; Donald Husa; Naman Goyal; Abhinandan Krishnan; Luke Zettlemoyer; Madian Khabsa; |
6 | Structured Tree Alignment for Evaluation of (Speech) Constituency Parsing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present the structured average intersection-over-union ratio (STRUCT-IOU), an evaluation metric that compares a constituency parse tree over automatically recognized spoken word boundaries with the ground-truth parse tree over written words. |
Freda Shi; Kevin Gimpel; Karen Livescu; |
7 | Instruction-tuned Language Models Are Better Knowledge Learners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Therefore, we hypothesize that it is beneficial to expose LLMs to QA pairs before continued pre-training on documents so that the process of encoding knowledge from complex documents takes into account how this knowledge is accessed through questions. Based on this, we propose pre-instruction-tuning (PIT), a method that instruction-tunes on questions prior to training on documents. |
Zhengbao Jiang; Zhiqing Sun; Weijia Shi; Pedro Rodriguez; Chunting Zhou; Graham Neubig; Xi Lin; Wen-tau Yih; Srini Iyer; |
8 | VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Given that most computer interfaces cater to human perception, visual information often augments textual data in ways that text-only models struggle to harness effectively. To bridge this gap, we introduce VisualWebArena, a benchmark designed to assess the performance of multimodal web agents on *realistic visually grounded tasks*. |
Jing Yu Koh; Robert Lo; Lawrence Jang; Vikram Duvvur; Ming Lim; Po-Yu Huang; Graham Neubig; Shuyan Zhou; Russ Salakhutdinov; Daniel Fried; |
9 | Math-Shepherd: Verify and Reinforce LLMs Step-by-step Without Human Annotations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present an innovative process-oriented math process reward model called Math-shepherd, which assigns a reward score to each step of math problem solutions. |
Peiyi Wang; Lei Li; Zhihong Shao; Runxin Xu; Damai Dai; Yifei Li; Deli Chen; Yu Wu; Zhifang Sui; |
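The process-reward idea in entry 9 can be sketched compactly: score each intermediate step by how often sampled completions starting from that step reach the correct final answer, so no human step labels are needed. A minimal sketch of that Monte Carlo estimation, assuming the caller supplies a stochastic `sample_completion` function (names here are illustrative, not the authors' API):

```python
import random

def step_rewards(steps, gold_answer, sample_completion, n_samples=8):
    """Estimate a reward for each solution step by Monte Carlo completion.

    steps: list of strings, the step-by-step solution to score.
    gold_answer: the known correct final answer.
    sample_completion: fn(prefix_steps) -> final answer (stochastic).
    A step scores highly if completions from it often reach gold_answer.
    """
    rewards = []
    for i in range(1, len(steps) + 1):
        hits = sum(sample_completion(steps[:i]) == gold_answer
                   for _ in range(n_samples))
        rewards.append(hits / n_samples)
    return rewards

# Toy completer: recovers the right answer 90% of the time once the
# correct intermediate step "2*21" is in the prefix.
def fake_completion(prefix):
    ok = "2*21" in " ".join(prefix) and random.random() < 0.9
    return "42" if ok else "41"

print(step_rewards(["let x = 2*21", "so x = 42"], "42", fake_completion))
```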
10 | Large Language Models Are Not Fair Evaluators Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we uncover a positional bias in the evaluation paradigm of adopting large language models (LLMs), e.g., GPT-4, as a referee to score and compare the quality of responses generated by candidate models. |
Peiyi Wang; Lei Li; Liang Chen; Zefan Cai; Dawei Zhu; Binghuai Lin; Yunbo Cao; Lingpeng Kong; Qi Liu; Tianyu Liu; Zhifang Sui; |
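The positional bias in entry 10 has a simple, widely used mitigation that the paper's calibration framework builds on: judge each pair in both presentation orders and average the scores, so the position advantage cancels. A minimal sketch (the `judge` callable is a stand-in for an LLM scoring call, not the paper's exact API):

```python
def balanced_judge(judge, question, answer_a, answer_b):
    """Score a response pair in both presentation orders and average,
    so any advantage tied to position cancels out.

    judge: fn(question, first, second) -> (score_first, score_second),
    a stand-in for one LLM-as-referee call.
    """
    s1_a, s1_b = judge(question, answer_a, answer_b)   # A shown first
    s2_b, s2_a = judge(question, answer_b, answer_a)   # B shown first
    return (s1_a + s2_a) / 2, (s1_b + s2_b) / 2

# Toy judge that gives +1 to whichever answer it sees first: after
# balancing, 'good' still beats 'bad' by its true margin.
def biased_judge(question, first, second):
    base = {"good": 8, "bad": 4}
    return base[first] + 1, base[second]

print(balanced_judge(biased_judge, "q", "good", "bad"))  # (8.5, 4.5)
```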
11 | What Evidence Do Language Models Find Convincing? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To resolve these ambiguous queries, one must search through a large range of websites and consider "which, if any, of this evidence do I find convincing?" In this work, we study how LLMs answer this question. |
Alexander Wan; Eric Wallace; Dan Klein; |
12 | Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, our primary goal is to bridge the language gap by building a human-curated instruction-following dataset spanning 65 languages. |
Shivalika Singh; Freddie Vargus; Daniel D'souza; Börje Karlsson; Abinaya Mahendiran; Wei-Yin Ko; Herumb Shandilya; Jay Patel; Deividas Mataciunas; Laura O'Mahony; Mike Zhang; Ramith Hettiarachchi; Joseph Wilson; Marina Machado; Luisa Moura; Dominik Krzeminski; Hakimeh Fadaei; Irem Ergun; Ifeoma Okoh; Aisha Alaagib; Oshan Mudannayake; Zaid Alyafeai; Vu Chien; Sebastian Ruder; Surya Guthikonda; Emad Alghamdi; Sebastian Gehrmann; Niklas Muennighoff; Max Bartolo; Julia Kreutzer; Ahmet Üstün; Marzieh Fadaee; Sara Hooker; |
13 | Digital Socrates: Evaluating LLMs Through Explanation Critiques Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While LLMs can provide reasoned explanations along with their answers, the nature and quality of those explanations are still poorly understood. In response, our goal is to define a detailed way of characterizing the explanation capabilities of modern models and to create a nuanced, interpretable explanation evaluation tool that can generate such characterizations automatically, without relying on expensive API calls or human annotations. |
Yuling Gu; Oyvind Tafjord; Peter Clark; |
14 | Democratizing LLMs for Low-Resource Languages By Leveraging Their English Dominant Abilities with Linguistically-Diverse Prompts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To elicit LLMs' abilities in low-resource languages without any supervised data, we propose to assemble synthetic exemplars from a diverse set of high-resource languages to prompt the LLMs to translate from any language into English. |
Xuan-Phi Nguyen; Mahani Aljunied; Shafiq Joty; Lidong Bing; |
15 | Reasoning in Flux: Enhancing Large Language Models Reasoning Through Uncertainty-aware Adaptive Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we introduce Uncertainty-aware Adaptive Guidance (UAG), a novel approach for guiding LLM reasoning onto an accurate and reliable trajectory. |
Zhangyue Yin; Qiushi Sun; Qipeng Guo; Zhiyuan Zeng; Xiaonan Li; Junqi Dai; Qinyuan Cheng; Xuanjing Huang; Xipeng Qiu; |
16 | Dolma: An Open Corpus of Three Trillion Tokens for Language Model Pretraining Research Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present analyses and experimental results on intermediate states of Dolma to share what we have learned about important data curation practices. |
Luca Soldaini; Rodney Kinney; Akshita Bhagia; Dustin Schwenk; David Atkinson; Russell Authur; Ben Bogin; Khyathi Chandu; Jennifer Dumas; Yanai Elazar; Valentin Hofmann; Ananya Jha; Sachin Kumar; Li Lucy; Xinxi Lyu; Nathan Lambert; Ian Magnusson; Jacob Morrison; Niklas Muennighoff; Aakanksha Naik; Crystal Nam; Matthew Peters; Abhilasha Ravichander; Kyle Richardson; Zejiang Shen; Emma Strubell; Nishant Subramani; Oyvind Tafjord; Evan Walsh; Luke Zettlemoyer; Noah Smith; Hannaneh Hajishirzi; Iz Beltagy; Dirk Groeneveld; Jesse Dodge; Kyle Lo; |
17 | Defending Against Alignment-Breaking Attacks Via Robustly Aligned LLM Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a Robustly Aligned LLM (RA-LLM) to defend against potential alignment-breaking attacks. |
Bochuan Cao; Yuanpu Cao; Lu Lin; Jinghui Chen; |
18 | Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the underlying assumption that generated verification properties are of higher quality than the solutions may not always hold. In this paper, we treat them equally as different perspectives of LLMs' reasoning processes. |
Baizhou Huang; Shuai Lu; Xiaojun Wan; Nan Duan; |
19 | Machine Unlearning of Pre-trained Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We explore machine unlearning as a pivotal solution, with a focus on pre-trained models, a notably under-researched area. |
Jin Yao; Eli Chien; Minxin Du; Xinyao Niu; Tianhao Wang; Zezhou Cheng; Xiang Yue; |
20 | Navigating The Metrics Maze: Reconciling Score Magnitudes and Accuracies Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper investigates the dynamic range of a number of modern metrics in an effort to provide a collective understanding of the meaning of differences in scores both within and among metrics; in other words, we ask: what point difference x in metric y is required between two systems for humans to notice? |
Tom Kocmi; Vilém Zouhar; Christian Federmann; Matt Post; |
21 | Synergistic Interplay Between Search and Large Language Models for Information Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore the advantages and disadvantages of LLMs and RMs, highlighting their respective strengths in understanding user-issued queries and retrieving up-to-date information. |
Jiazhan Feng; Chongyang Tao; Xiubo Geng; Tao Shen; Can Xu; Guodong Long; Dongyan Zhao; Daxin Jiang; |
22 | Improving Text Embeddings with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a novel and simple method for obtaining high-quality text embeddings using only synthetic data and less than 1k training steps. |
Liang Wang; Nan Yang; Xiaolong Huang; Linjun Yang; Rangan Majumder; Furu Wei; |
23 | Cendol: Open Instruction-tuned Generative Large Language Models for Indonesian Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Large language models (LLMs) show remarkable human-like capability in various domains and languages, yet their quality in low-resource languages such as Indonesian remains far behind. To bridge this quality gap, we introduce Cendol, a collection of Indonesian LLMs encompassing both decoder-only and encoder-decoder architectures across a range of model sizes. |
Samuel Cahyawijaya; Holy Lovenia; Fajri Koto; Rifki Putri; Wawan Cenggoro; Jhonson Lee; Salsabil Akbar; Emmanuel Dave; Nuurshadieq Nuurshadieq; Muhammad Mahendra; Rr Putri; Bryan Wilie; Genta Winata; Alham Aji; Ayu Purwarianti; Pascale Fung; |
24 | Respond in My Language: Mitigating Language Inconsistency in Response Generation Based on Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the language inconsistent generation problem in monolingual instruction tuning. |
Liang Zhang; Qin Jin; Haoyang Huang; Dongdong Zhang; Furu Wei; |
25 | Selene: Pioneering Automated Proof in Software Verification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Selene in this paper, which is the first project-level automated proof benchmark constructed based on the real-world industrial-level operating system microkernel, seL4. |
Lichen Zhang; Shuai Lu; Nan Duan; |
26 | Agent Lumos: Unified and Modular Training for Open-Source Language Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Lumos, one of the first frameworks for training open-source LLM-based agents. |
Da Yin; Faeze Brahman; Abhilasha Ravichander; Khyathi Chandu; Kai-Wei Chang; Yejin Choi; Bill Yuchen Lin; |
27 | Long-Context Language Modeling with Parallel Context Encoding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Cross-Attention to Parallel Encodings (CAPE), a framework that can be applied to any existing decoder-only LLMs for context expansion. |
Howard Yen; Tianyu Gao; Danqi Chen; |
28 | Revisiting Demonstration Selection Strategies in In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we first revisit the factors contributing to this variance from the model aspect, and find that the demonstration choice is both data- and model-dependent. We further propose a conjecture that the performance of a demonstration positively correlates with its contribution to the model's understanding of the test samples, and accordingly propose a data- and model-dependent demonstration selection method, TopK + ConE. |
Keqin Peng; Liang Ding; Yancheng Yuan; Xuebo Liu; Min Zhang; Yuanxin Ouyang; Dacheng Tao; |
29 | How Abilities in Large Language Models Are Affected By Supervised Fine-tuning Data Composition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we specifically focus on the interplay of data composition between mathematical reasoning, code generation, and general human-aligning abilities during SFT. |
Guanting Dong; Hongyi Yuan; Keming Lu; Chengpeng Li; Mingfeng Xue; Dayiheng Liu; Wei Wang; Zheng Yuan; Chang Zhou; Jingren Zhou; |
30 | The Unreasonable Effectiveness of Easy Training Data for Hard Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present the surprising conclusion that current pretrained language models often generalize relatively well from easy to hard data, even performing as well as oracle models finetuned on hard data. |
Peter Hase; Mohit Bansal; Peter Clark; Sarah Wiegreffe; |
31 | What Does The Bot Say? Opportunities and Risks of Large Language Models in Social Media Bot Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we bring the arms race to the next level by investigating the opportunities and risks of state-of-the-art large language models (LLMs) in social bot detection. |
Shangbin Feng; Herun Wan; Ningnan Wang; Zhaoxuan Tan; Minnan Luo; Yulia Tsvetkov; |
32 | Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps Via Multi-LLM Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study approaches to identify LLM knowledge gaps and abstain from answering questions when knowledge gaps are present. |
Shangbin Feng; Weijia Shi; Yike Wang; Wenxuan Ding; Vidhisha Balachandran; Yulia Tsvetkov; |
33 | The Heuristic Core: Understanding Subnetwork Generalization in Pretrained Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show that, even within a single model, we can find multiple subnetworks that perform similarly in-domain, but generalize vastly differently. |
Adithya Bhaskar; Dan Friedman; Danqi Chen; |
34 | XFT: Unlocking The Power of Code Instruction Tuning By Simply Merging Upcycled Mixture-of-Experts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce XFT, a simple yet powerful training scheme, by simply merging upcycled Mixture-of-Experts (MoE) to unleash the performance limit of instruction-tuned code Large Language Models (LLMs). |
Yifeng Ding; Jiawei Liu; Yuxiang Wei; Lingming Zhang; |
35 | SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Most existing GUI agents interact with the environment through extracted structured data, which can be notably lengthy (e.g., HTML) and occasionally inaccessible (e.g., on desktops). To alleviate this issue, we propose a novel visual GUI agent, SeeClick, which only relies on screenshots for task automation. |
Kanzhi Cheng; Qiushi Sun; Yougang Chu; Fangzhi Xu; Li YanTao; Jianbing Zhang; Zhiyong Wu; |
36 | I Am A Strange Dataset: Metalinguistic Tests for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Can large language models (LLMs) handle such language? In this paper, we present "I am a Strange Dataset", a new dataset for addressing this question. |
Tristan Thrush; Jared Moore; Miguel Monares; Christopher Potts; Douwe Kiela; |
37 | A Chain-of-Thought Is As Strong As Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce REVEAL: Reasoning Verification Evaluation, a dataset to benchmark automatic verifiers of complex Chain-of-Thought reasoning in open-domain question-answering settings. |
Alon Jacovi; Yonatan Bitton; Bernd Bohnet; Jonathan Herzig; Or Honovich; Michael Tseng; Michael Collins; Roee Aharoni; Mor Geva; |
38 | Large Language Models Are Superpositions of All Characters: Attaining Arbitrary Role-play Via Self-Alignment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Thus, we introduce Ditto, the first self-alignment method for role-play, which encourages an instruction-following LLM to simulate role-play dialogues as a variant of reading comprehension, and creates a role-play training set comprising 4000 characters, surpassing the scale of currently available datasets by tenfold regarding the number of roles. |
Keming Lu; Bowen Yu; Chang Zhou; Jingren Zhou; |
39 | LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce LongBench, the first bilingual, multi-task benchmark for long context understanding, enabling a more rigorous evaluation of long context understanding. |
Yushi Bai; Xin Lv; Jiajie Zhang; Hongchang Lyu; Jiankai Tang; Zhidian Huang; Zhengxiao Du; Xiao Liu; Aohan Zeng; Lei Hou; Yuxiao Dong; Jie Tang; Juanzi Li; |
40 | Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces a novel collaborative approach, namely SlimPLM, that detects missing knowledge in LLMs with a slim proxy model, to enhance the LLM's knowledge acquisition process. |
Jiejun Tan; Zhicheng Dou; Yutao Zhu; Peidong Guo; Kun Fang; Ji-Rong Wen; |
41 | Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we present an exploration-based trajectory optimization approach, referred to as ETO. |
Yifan Song; Da Yin; Xiang Yue; Jie Huang; Sujian Li; Bill Yuchen Lin; |
42 | LLM in A Flash: Efficient Large Language Model Inference with Limited Memory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper tackles the challenge of efficiently running LLMs that exceed the available DRAM capacity by storing the model parameters in flash memory, but bringing them on demand to DRAM. |
Keivan Alizadeh; Seyed Iman Mirzadeh; Dmitry Belenko; S. Khatamifard; Minsik Cho; Carlo C Del Mundo; Mohammad Rastegari; Mehrdad Farajtabar; |
43 | VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces VIEScore, a Visual Instruction-guided Explainable metric for evaluating any conditional image generation tasks. |
Max Ku; Dongfu Jiang; Cong Wei; Xiang Yue; Wenhu Chen; |
44 | LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios Via Prompt Compression Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Research indicates that LLM performance hinges on the density and position of key information in the input prompt. Inspired by these findings, we propose LongLLMLingua for prompt compression towards improving LLMs' perception of the key information to simultaneously address the three challenges. |
Huiqiang Jiang; Qianhui Wu; Xufang Luo; Dongsheng Li; Chin-Yew Lin; Yuqing Yang; Lili Qiu; |
45 | Exploring The Potential of Large Language Models in Computational Argumentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work aims to embark on an assessment of LLMs, such as ChatGPT, Flan models, and LLaMA2 models, in both zero-shot and few-shot settings. |
Guizhen Chen; Liying Cheng; Anh Tuan Luu; Lidong Bing; |
46 | Revisiting Knowledge Distillation for Autoregressive Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response to this problem, we conduct a series of analyses and reveal that different tokens have different teaching modes, neglecting which will lead to performance degradation. Motivated by this, we propose a simple yet effective adaptive teaching approach (ATKD) to improve the KD. |
Qihuang Zhong; Liang Ding; Li Shen; Juhua Liu; Bo Du; Dacheng Tao; |
47 | Rephrasing The Web: A Recipe for Compute and Data-Efficient Language Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Web Rephrase Augmented Pre-training (WRAP) that uses an off-the-shelf instruction-tuned model prompted to paraphrase documents on the web in specific styles such as "like Wikipedia" or in "question-answer format" to jointly pre-train LLMs on real and synthetic rephrases. |
Pratyush Maini; Skyler Seto; Richard Bai; David Grangier; Yizhe Zhang; Navdeep Jaitly; |
48 | Make-A-Voice: Revisiting Voice Large Language Models As Scalable Multilingual and Multitask Learners Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Large language models (LLMs) have successfully served as a general-purpose interface across multiple tasks and languages, while the adaptation of voice LLMs is mostly designed for specific purposes (either single-task or monolingual), where the advantages of LLMs especially for low-resource language processing and zero-shot task generalization are less exploited in the audio community. To bridge the gap, we introduce Make-A-Voice as a multi-modal voice LLM and conduct a comprehensive study on its capability to deal with multiple tasks/languages. |
Rongjie Huang; Chunlei Zhang; Yongqi Wang; Dongchao Yang; Jinchuan Tian; Zhenhui Ye; Luping Liu; Zehan Wang; Ziyue Jiang; Xuankai Chang; Jiatong Shi; Chao Weng; Zhou Zhao; Dong Yu; |
49 | Order-Agnostic Data Augmentation for Few-Shot Named Entity Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose order-agnostic data augmentation (OaDA), an alternative solution that exploits the often overlooked order-agnostic property in the training data construction phase of sequence-to-sequence NER methods for data augmentation. |
Huiming Wang; Liying Cheng; Wenxuan Zhang; De Wen Soh; Lidong Bing; |
50 | Moûsai: Efficient Text-to-Music Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In our work, we bridge text and music via a text-to-music generation model that is highly efficient, expressive, and can handle long-term structure. |
Flavio Schneider; Ojasv Kamal; Zhijing Jin; Bernhard Schölkopf; |
51 | The Dawn After The Dark: An Empirical Study on Factuality Hallucination in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To tackle the hallucination, three key questions should be well studied: how to detect hallucinations (detection), why LLMs hallucinate (source), and what can be done to mitigate them (mitigation). To address these challenges, this work presents a systematic empirical study on LLM hallucinations, focused on the three aspects of hallucination detection, source and mitigation. |
Junyi Li; Jie Chen; Ruiyang Ren; Xiaoxue Cheng; Xin Zhao; Jian-Yun Nie; Ji-Rong Wen; |
52 | Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we delve into the composition of Transformer architectures in LLMs to pinpoint language-specific regions. |
Tianyi Tang; Wenyang Luo; Haoyang Huang; Dongdong Zhang; Xiaolei Wang; Xin Zhao; Furu Wei; Ji-Rong Wen; |
53 | Active Prompting with Chain-of-Thought for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a new method, Active-Prompt, to adapt LLMs to different tasks with task-specific example prompts (annotated with human-designed CoT reasoning). |
Shizhe Diao; Pengcheng Wang; Yong Lin; Rui Pan; Xiang Liu; Tong Zhang; |
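The key step behind entry 53 is choosing which questions are worth human CoT annotation by ranking them by the model's uncertainty. A minimal sketch of the disagreement-based variant (Active-Prompt also explores entropy- and variance-based metrics; `sample_answer` is an illustrative stand-in for a temperature-sampled LLM call):

```python
from collections import Counter
from itertools import cycle

def select_for_annotation(questions, sample_answer, k=8, budget=4):
    """Pick the questions whose sampled answers disagree the most;
    those are the ones worth spending human CoT annotation on.

    sample_answer: fn(question) -> answer string (stochastic sampler).
    Disagreement = 1 - frequency of the most common answer.
    """
    scored = []
    for q in questions:
        answers = [sample_answer(q) for _ in range(k)]
        modal_freq = Counter(answers).most_common(1)[0][1] / k
        scored.append((1.0 - modal_freq, q))
    scored.sort(reverse=True)            # most uncertain first
    return [q for _, q in scored[:budget]]

# Toy demo: the question with inconsistent answers is selected.
flaky = cycle(["408", "406"])
demo = lambda q: "4" if q == "2+2?" else next(flaky)
print(select_for_annotation(["17*24?", "2+2?"], demo, budget=1))  # ['17*24?']
```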
54 | Unlocking Data-free Low-bit Quantization with Matrix Decomposition for KV Cache Compression Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce DecoQuant, a novel data-free low-bit quantization technique based on tensor decomposition methods, to effectively compress KV cache. |
Peiyu Liu; Ze-Feng Gao; Xin Zhao; Yipeng Ma; Tao Wang; Ji-Rong Wen; |
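At a high level, the recipe in entry 54 is "decompose first, then quantize the pieces": the factors of a decomposition have a narrower value range than the original tensor, so they survive low-bit quantization far better. A minimal numpy sketch using truncated SVD as the decomposition (DecoQuant itself uses tensor/MPO decomposition, so this illustrates the general idea rather than the paper's exact method):

```python
import numpy as np

def quantize_int8(x):
    """Uniform symmetric int8 quantization; returns (codes, scale)."""
    scale = np.abs(x).max() / 127.0 + 1e-12
    return np.round(x / scale).astype(np.int8), scale

def compress_kv(kv, rank=16):
    """Decompose a KV-cache matrix, then quantize the small factors.

    kv: (seq_len, head_dim) float array.
    """
    u, s, vt = np.linalg.svd(kv, full_matrices=False)
    a = u[:, :rank] * s[:rank]                 # (seq_len, rank)
    b = vt[:rank]                              # (rank, head_dim)
    return quantize_int8(a), quantize_int8(b)

def decompress_kv(a_parts, b_parts):
    (qa, sa), (qb, sb) = a_parts, b_parts
    return (qa.astype(np.float32) * sa) @ (qb.astype(np.float32) * sb)

# Toy KV matrix with intrinsic low rank, as attention caches tend to have.
rng = np.random.default_rng(0)
kv = (rng.normal(size=(128, 32)) @ rng.normal(size=(32, 64))).astype(np.float32)
a_parts, b_parts = compress_kv(kv, rank=32)
rel_err = np.linalg.norm(kv - decompress_kv(a_parts, b_parts)) / np.linalg.norm(kv)
print(f"relative reconstruction error: {rel_err:.4f}")  # small
```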
55 | Relying on The Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate how LMs incorporate confidence in responses via natural language and how downstream users behave in response to LM-articulated uncertainties. |
Kaitlyn Zhou; Jena Hwang; Xiang Ren; Maarten Sap; |
56 | AIR-Bench: Benchmarking Large Audio-Language Models Via Generative Comprehension Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce AIR-Bench (Audio InstRuction Benchmark), the first benchmark designed to evaluate the ability of LALMs to understand various types of audio signals (including human speech, natural sounds, and music), and furthermore, to interact with humans in the textual format. |
Qian Yang; Jin Xu; Wenrui Liu; Yunfei Chu; Ziyue Jiang; Xiaohuan Zhou; Yichong Leng; Yuanjun Lv; Zhou Zhao; Chang Zhou; Jingren Zhou; |
57 | Cross-Lingual Knowledge Editing in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As a result, the effect of editing in a source language on a different target language remains unknown. In this paper, we aim to figure out this cross-lingual effect in knowledge editing. |
Jiaan Wang; Yunlong Liang; Zengkui Sun; Yuxuan Cao; Jiarong Xu; Fandong Meng; |
58 | Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we analyze whether LLMs also hold the capability to handle multiple instructions simultaneously, denoted as Multi-Task Inference. |
Guijin Son; SangWon Baek; Sangdae Nam; Ilgyun Jeong; Seungone Kim; |
59 | LaMP: When Large Language Models Meet Personalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We additionally propose two retrieval augmentation approaches that retrieve personal items from each user profile for personalizing language model outputs. To this aim, we study various retrieval models, including term matching, semantic matching, and time-aware methods. |
Alireza Salemi; Sheshera Mysore; Michael Bendersky; Hamed Zamani; |
60 | Bridging The Preference Gap Between Retrievers and LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we examine a novel bridge mechanism. |
Zixuan Ke; Weize Kong; Cheng Li; Mingyang Zhang; Qiaozhu Mei; Michael Bendersky; |
61 | Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In conclusion, we integrate insights from social psychology to contextualize the collaboration of LLM agents, inspiring further investigations into the collaboration mechanism for LLMs. |
Jintian Zhang; Xin Xu; Ningyu Zhang; Ruibo Liu; Bryan Hooi; Shumin Deng; |
62 | NPHardEval: Dynamic Benchmark on Reasoning Ability of Large Language Models Via Complexity Classes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, they are inadequate in offering a rigorous evaluation and prone to the risk of overfitting, as these publicly accessible and static benchmarks allow models to potentially tailor their responses to specific benchmark metrics, thereby inflating their performance. Addressing these limitations, we introduce a new benchmark NPHardEval. |
Lizhou Fan; Wenyue Hua; Lingyao Li; Haoyang Ling; Yongfeng Zhang; |
63 | Back to Basics: Revisiting REINFORCE-Style Optimization for Learning from Human Feedback in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We revisit how alignment from human preferences is formulated in the context of RL. |
Arash Ahmadian; Chris Cremer; Matthias Gallé; Marzieh Fadaee; Julia Kreutzer; Olivier Pietquin; Ahmet Üstün; Sara Hooker; |
64 | F-Eval: Assessing Fundamental Abilities with Refined Evaluation Methods Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To bridge the gap, we propose F-Eval, a bilingual evaluation benchmark to evaluate the fundamental abilities, including expression, commonsense and logic. |
Yu Sun; Keyuchen Keyuchen; Shujie Wang; Peiji Li; Qipeng Guo; Hang Yan; Xipeng Qiu; Xuanjing Huang; Dahua Lin; |
65 | Explicating The Implicit: Argument Detection Beyond Sentence Boundaries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we reformulate the problem of argument detection through textual entailment to capture semantic relations across sentence boundaries. |
Paul Roit; Aviv Slobodkin; Eran Hirsch; Arie Cattan; Ayal Klein; Valentina Pyatkin; Ido Dagan; |
66 | Multi-Level Feedback Generation with Large Language Models for Empowering Novice Peer Counselors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our work aims to leverage large language models to provide contextualized and multi-level feedback to empower peer counselors, especially novices, at scale. |
Alicja Chaszczewicz; Raj Shah; Ryan Louie; Bruce Arnow; Robert Kraut; Diyi Yang; |
67 | The Hidden Space of Transformer Language Adapters Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We analyze the operation of transformer language adapters, which are small modules trained on top of a frozen language model to adapt its predictions to new target languages. |
Jesujoba Alabi; Marius Mosbach; Matan Eyal; Dietrich Klakow; Mor Geva; |
68 | Narrowing The Knowledge Evaluation Gap: Open-Domain Question Answering with Multi-Granularity Answers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose GRANOLA QA, a novel evaluation setting where a predicted answer is evaluated in terms of accuracy and informativeness against a set of multi-granularity answers. |
Gal Yona; Roee Aharoni; Mor Geva; |
69 | Silent Signals, Loud Impact: LLMs for Word-Sense Disambiguation of Coded Dog Whistles Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present an approach for word-sense disambiguation of dog whistles from standard speech using Large Language Models (LLMs), and leverage this technique to create a dataset of 16,550 high-confidence coded examples of dog whistles used in formal and informal communication. |
Julia Kruk; Michela Marchini; Rijul Magu; Caleb Ziems; David Muchlinski; Diyi Yang; |
70 | Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Nevertheless, due to data constraints, the capabilities of most open-source LLMs are primarily focused on English. To address this issue, we introduce the concept of chat vector to equip pre-trained language models with instruction following and human value alignment via simple model arithmetic. |
Shih-Cheng Huang; Pin-Zu Li; Yu-chi Hsu; Kuang-Ming Chen; Yu Tung Lin; Shih-Kai Hsiao; Richard Tsai; Hung-yi Lee; |
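The "chat vector" in entry 70 is plain weight-space arithmetic: subtract a base model's weights from its chat-tuned counterpart, then add that delta to a model continually pre-trained on the target language. A minimal sketch over state dicts (the keys and the `alpha` scaling knob are illustrative; all three checkpoints must share one architecture):

```python
def add_chat_vector(base_sd, chat_sd, target_sd, alpha=1.0):
    """Chat-vector model arithmetic over state dicts.

    chat_vector = chat_sd - base_sd captures what instruction tuning and
    alignment added; adding alpha * chat_vector onto a target model (e.g.,
    one continually pre-trained on a new language) transfers it.
    """
    return {k: target_sd[k] + alpha * (chat_sd[k] - base_sd[k])
            for k in target_sd}

# Toy check with scalar "weights": the chat delta (+2.0) moves to the new base.
base, chat, target = {"w": 1.0}, {"w": 3.0}, {"w": 1.5}
print(add_chat_vector(base, chat, target))  # {'w': 3.5}
```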
71 | Prompt Optimization Via Adversarial In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a new method, Adversarial In-Context Learning (adv-ICL), to optimize prompts for in-context learning (ICL). |
Do Long; Yiran Zhao; Hannah Brown; Yuxi Xie; James Zhao; Nancy Chen; Kenji Kawaguchi; Michael Shieh; Junxian He; |
72 | Unintended Impacts of LLM Alignment on Global Representation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We explore how alignment impacts performance along three axes of global representation: English dialects, multilingualism, and opinions from and about countries worldwide. |
Michael Ryan; William Held; Diyi Yang; |
73 | Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Accordingly, we propose to integrate goal prioritization at both training and inference stages to counteract such attacks. |
Zhexin Zhang; Junxiao Yang; Pei Ke; Fei Mi; Hongning Wang; Minlie Huang; |
74 | SafetyBench: Evaluating The Safety of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present SafetyBench, a comprehensive benchmark for evaluating the safety of LLMs, which comprises 11,435 diverse multiple-choice questions spanning 7 distinct categories of safety concerns. |
Zhexin Zhang; Leqi Lei; Lindong Wu; Rui Sun; Yongkang Huang; Chong Long; Xiao Liu; Xuanyu Lei; Jie Tang; Minlie Huang; |
75 | Navigating The OverKill in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the factors for overkill by exploring how models handle and determine the safety of queries. |
Chenyu Shi; Xiao Wang; Qiming Ge; Songyang Gao; Xianjun Yang; Tao Gui; Qi Zhang; Xuanjing Huang; Xun Zhao; Dahua Lin; |
76 | How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety By Humanizing LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Observing this, we shift the perspective, by treating LLMs as human-like communicators to examine the interplay between everyday language interaction and AI safety. Specifically, we study how to persuade LLMs to jailbreak them. |
Yi Zeng; Hongpeng Lin; Jingwen Zhang; Diyi Yang; Ruoxi Jia; Weiyan Shi; |
77 | Advancement in Graph Understanding: A Multimodal Benchmark and Fine-Tuning of Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new paradigm for interactive and instructional graph data understanding and reasoning. |
Qihang Ai; Jiafan Li; Jincheng Dai; Jianwu Zhou; Lemao Liu; Haiyun Jiang; Shuming Shi; |
78 | Language Models Are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models Through Task Arithmetic Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose RESTA to perform LLM realignment towards safety, which gets compromised due to downstream task fine-tuning. |
Rishabh Bhardwaj; Duc Anh Do; Soujanya Poria; |
79 | HiRoPE: Length Extrapolation for Code Models Using Hierarchical Position Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by how human programmers navigate code, we introduce Hierarchical Rotary Position Embedding (HiRoPE), a novel approach that enhances the traditional rotary position embedding into a hierarchical format based on the hierarchical structure of source code. |
Kechi Zhang; Ge Li; Huangzhao Zhang; Zhi Jin; |
80 | CodeAgent: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, real-world software development often involves complex code repositories with complex dependencies and extensive documentation. To enable LLMs to handle real-world repo-level code generation, we present CodeAgent, a novel LLM-based agent framework that employs external tools for effective repo-level code generation. |
Kechi Zhang; Jia Li; Ge Li; Xianjie Shi; Zhi Jin; |
81 | CausalGym: Benchmarking Causal Interpretability Methods on Linguistic Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: At the same time, research in model interpretability has begun to illuminate the abstract causal mechanisms shaping LM behavior. To help bring these strands of research closer together, we introduce CausalGym. |
Aryaman Arora; Dan Jurafsky; Christopher Potts; |
82 | RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To what extent can different interpretability methods successfully disentangle these roles? To help address this question, we introduce RAVEL (Resolving Attribute-Value Entanglements in Language Models), a dataset that enables tightly controlled, quantitative comparisons between a variety of existing interpretability methods. |
Jing Huang; Zhengxuan Wu; Christopher Potts; Mor Geva; Atticus Geiger; |
83 | Predicting Text Preference Via Structured Comparative Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce SC2, a model that prompts LLMs to predict text preferences by generating structured intermediate comparisons. |
Jing Nathan Yan; Tianqi Liu; Justin Chiu; Jiaming Shen; Zhen Qin; Yue Yu; Charumathi Lakshmanan; Yair Kurzion; Alexander Rush; Jialu Liu; Michael Bendersky; |
84 | Video-ChatGPT: Towards Detailed Video Understanding Via Large Vision and Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a new dataset of 100,000 video-instruction pairs used to train Video-ChatGPT, acquired via a manual and semi-automated pipeline that is easily scalable and robust to label noise. |
Muhammad Maaz; Hanoona Rasheed; Salman Khan; Fahad Khan; |
85 | Search-Adaptor: Embedding Customization for Information Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel method, Search-Adaptor, for customizing LLMs for information retrieval in an efficient and robust way. |
Jinsung Yoon; Yanfei Chen; Sercan Arik; Tomas Pfister; |
86 | CritiqueLLM: Towards An Informative Critique Generation Model for Evaluation of Large Language Model Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a simple yet effective method called Eval-Instruct, which can first acquire pointwise grading critiques with pseudo references and then revise these critiques via multi-path prompting to obtain informative evaluation data in different tasks and settings, including pointwise grading and pairwise comparison with / without references. |
Pei Ke; Bosi Wen; Andrew Feng; Xiao Liu; Xuanyu Lei; Jiale Cheng; Shengyuan Wang; Aohan Zeng; Yuxiao Dong; Hongning Wang; Jie Tang; Minlie Huang; |
87 | Speaker Verification in Agent-generated Conversations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the ability to personalize generated utterances to speakers, whether by a human or an LLM, has not been well studied. To bridge this gap, our study introduces a novel evaluation challenge: speaker verification in agent-generated conversations, which aims to verify whether two sets of utterances originate from the same speaker. |
Yizhe Yang; Palakorn Achananuparp; Heyan Huang; Jing Jiang; Ee-Peng Lim; |
88 | Full Parameter Fine-tuning for Large Language Models with Limited Resources Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a new optimizer, LOw-Memory Optimization (LOMO), which fuses the gradient computation and the parameter update in one step to reduce memory usage. |
Kai Lv; Yuqing Yang; Tengxiao Liu; Qipeng Guo; Xipeng Qiu; |
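The memory saving in entry 88 comes from never materializing the full gradient set: apply the SGD update the moment each parameter's gradient is ready, then discard it. A minimal PyTorch sketch using post-accumulate-grad hooks (requires torch >= 2.1; the real LOMO also handles gradient normalization and mixed precision, which this omits):

```python
import torch

def attach_lomo(model, lr=1e-3):
    """Fuse gradient computation and the parameter update (the LOMO idea):
    as soon as a parameter's gradient has been accumulated during backward,
    apply an SGD step to it and free the gradient, so the full set of
    gradients never has to live in memory at once."""
    def step(p):
        with torch.no_grad():
            p.add_(p.grad, alpha=-lr)  # immediate in-place SGD update
        p.grad = None                  # release the gradient right away
    for p in model.parameters():
        if p.requires_grad:
            p.register_post_accumulate_grad_hook(step)

# Usage sketch: backward() now both computes gradients and updates weights.
model = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.ReLU(),
                            torch.nn.Linear(8, 2))
attach_lomo(model, lr=0.1)
model(torch.randn(16, 4)).pow(2).mean().backward()
```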
89 | Direct Large Language Model Alignment Through Self-Rewarding Contrastive Prompt Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a method to evaluate the response preference by using the output probabilities of response pairs under contrastive prompt pairs, which could achieve better performance on LLaMA2-7B and LLaMA2-13B compared to RLAIF. Based on this, we propose an automatic alignment method, Direct Large Model Alignment (DLMA). |
Aiwei Liu; Haoping Bai; Zhiyun Lu; Xiang Kong; Xiaoming Wang; Jiulong Shan; Meng Cao; Lijie Wen; |
90 | OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present OlympiadBench, an Olympiad-level bilingual multimodal scientific benchmark, featuring 8,476 problems from Olympiad-level mathematics and physics competitions, including the Chinese college entrance exam. |
Chaoqun He; Renjie Luo; Yuzhuo Bai; Shengding Hu; Zhen Thai; Junhao Shen; Jinyi Hu; Xu Han; Yujie Huang; Yuxiang Zhang; Jie Liu; Lei Qi; Zhiyuan Liu; Maosong Sun; |
91 | Do Llamas Work in English? On The Latent Language of Multilingual Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Tracking intermediate embeddings through their high-dimensional space reveals three distinct phases, whereby intermediate embeddings (1) start far away from output token embeddings; (2) already in middle layers allow for decoding a semantically correct next token, but giving higher probability to its version in English than in the input language; (3) move into an input-language-specific region of the embedding space. We cast these results into a conceptual model where the three phases operate in "input space", "concept space", and "output space", respectively. |
Chris Wendler; Veniamin Veselovsky; Giovanni Monea; Robert West; |
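The layer tracking in entry 91 relies on what interpretability work calls the "logit lens": multiply each layer's residual-stream state by the output unembedding matrix and see which token (and thus which language) the model would predict if decoding stopped there. A minimal numpy sketch of the mechanics, with random weights standing in for a real transformer:

```python
import numpy as np

def logit_lens(hidden_states, unembed, vocab):
    """Decode each layer's hidden state as if it were the final layer.

    hidden_states: (n_layers, d_model) residual stream at one position.
    unembed: (d_model, vocab_size) output embedding matrix.
    Returns the top-1 token per layer, showing how (and in which
    language) the interim prediction evolves across depth.
    """
    logits = hidden_states @ unembed               # (n_layers, vocab_size)
    return [vocab[i] for i in logits.argmax(axis=-1)]

# With a real transformer you would pass its per-layer hidden states
# and the lm_head weight here instead of random arrays.
rng = np.random.default_rng(0)
d, layers = 16, 6
vocab = ["the", "le", "der", "el", "il", "de"]
print(logit_lens(rng.normal(size=(layers, d)),
                 rng.normal(size=(d, len(vocab))), vocab))
```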
92 | Sketch-Guided Constrained Decoding for Boosting Blackbox Large Language Models Without Logit Access Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces sketch-guided constrained decoding (SketchGCD), a novel approach to constrained decoding for blackbox LLMs, which operates without access to the logits of the blackbox LLM. |
Saibo Geng; Berkay Döner; Chris Wendler; Martin Josifoski; Robert West; |
93 | Quantifying Contamination in Evaluating Code Generation Capabilities of Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we perform a comprehensive study of data contamination of popular code generation benchmarks, and precisely quantify their overlap with pretraining corpus through both surface-level and semantic-level matching. |
Martin Riddell; Ansong Ni; Arman Cohan; |
94 | VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in The Wild Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce VoiceCraft, a token infilling neural codec language model, that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on audiobooks, internet videos, and podcasts. |
Puyuan Peng; Po-Yao Huang; Shang-Wen Li; Abdelrahman Mohamed; David Harwath; |
95 | Experiential Co-Learning of Software-Developing Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce Experiential Co-Learning, a novel LLM-agent learning framework in which instructor and assistant agents gather shortcut-oriented experiences from their historical trajectories and use these past experiences for future task execution. |
Chen Qian; Yufan Dang; Jiahao Li; Wei Liu; Zihao Xie; YiFei Wang; Weize Chen; Cheng Yang; Xin Cong; Xiaoyin Che; Zhiyuan Liu; Maosong Sun; |
96 | ChatDev: Communicative Agents for Software Development Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce ChatDev, a chat-powered software development framework in which specialized agents driven by large language models (LLMs) are guided in what to communicate (via chat chain) and how to communicate (via communicative dehallucination). |
Chen Qian; Wei Liu; Hongzhang Liu; Nuo Chen; Yufan Dang; Jiahao Li; Cheng Yang; Weize Chen; Yusheng Su; Xin Cong; Juyuan Xu; Dahai Li; Zhiyuan Liu; Maosong Sun; |
97 | Quantifying Uncertainty in Answers from Any Language Model and Enhancing Their Trustworthiness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce BSDetector, a method for detecting bad and speculative answers from a pretrained Large Language Model by estimating a numeric confidence score for any output it generated. |
Jiuhai Chen; Jonas Mueller; |
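Entry 97 describes BSDetector's confidence score as a blend of two black-box signals: observed consistency (do re-sampled answers agree with the original?) and the model's own self-reflected certainty. A minimal sketch of that blend (`sample_answer`, `self_reflect`, and the `beta` weight are illustrative stand-ins, not the paper's exact interface):

```python
def confidence_score(answer, sample_answer, self_reflect, k=5, beta=0.7):
    """Confidence for an LLM answer from black-box signals only.

    sample_answer: fn() -> answer string (re-query at temperature > 0).
    self_reflect:  fn(answer) -> float in [0, 1] (model grades itself).
    """
    agreement = sum(sample_answer() == answer for _ in range(k)) / k
    return beta * agreement + (1 - beta) * self_reflect(answer)

# Toy usage: consistent resamples plus high self-rating -> high confidence.
print(confidence_score("Paris", lambda: "Paris", lambda a: 0.9))  # 0.97
```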
98 | PRP: Propagating Universal Perturbations to Attack Large Language Model Guard-Rails Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our key contribution is to show a novel attack strategy, PRP, that is successful against several open-source (e.g., Llama 2) and closed-source (e.g., GPT 3.5) implementations of guard models. |
Neal Mangaokar; Ashish Hooda; Jihye Choi; Shreyas Chandrashekaran; Kassem Fawaz; Somesh Jha; Atul Prakash; |
99 | ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While current research primarily emphasizes leveraging tools to augment LLMs, it frequently neglects emerging safety considerations tied to their application. To fill this gap, we present ToolSword, a comprehensive framework dedicated to meticulously investigating safety issues linked to LLMs in tool learning. |
Junjie Ye; Sixian Li; Guanyu Li; Caishuang Huang; Songyang Gao; Yilong Wu; Qi Zhang; Tao Gui; Xuanjing Huang; |
100 | AutoAct: Automatic Agent Learning from Scratch for QA Via Self-Planning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce AutoAct, an automatic agent learning framework for QA that does not rely on large-scale annotated data and synthetic planning trajectories from closed-source models (e.g., GPT-4). |
Shuofei Qiao; Ningyu Zhang; Runnan Fang; Yujie Luo; Wangchunshu Zhou; Yuchen Jiang; Chengfei Lv; Huajun Chen; |
101 | AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce AnyGPT, an any-to-any multimodal language model that utilizes discrete representations for the unified processing of various modalities, including speech, text, images, and music. |
Jun Zhan; Junqi Dai; Jiasheng Ye; Yunhua Zhou; Dong Zhang; Zhigeng Liu; Xin Zhang; Ruibin Yuan; Ge Zhang; Linyang Li; Hang Yan; Jie Fu; Tao Gui; Tianxiang Sun; Yu-Gang Jiang; Xipeng Qiu; |
102 | FastFiD: Improve Inference Efficiency of Open Domain Question Answering Via Sentence Selection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Nevertheless, this framework can be relatively time-consuming, particularly due to the extensive length of the gathered passages. To address this, we introduce FastFiD in this paper, a novel approach that executes sentence selection on the encoded passages. |
Yufei Huang; Xu Han; Maosong Sun; |
103 | Shifting Attention to Relevance: Towards The Predictive Uncertainty Quantification of Free-Form Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, current methods underestimate this inequality when assessing uncertainty, causing tokens with limited semantics to be equally or excessively weighted in UQ. To correct this, we propose Shifting Attention to more Relevant (SAR) components at both token- and sentence-levels for better UQ. |
Jinhao Duan; Hao Cheng; Shiqi Wang; Alex Zavalny; Chenan Wang; Renjing Xu; Bhavya Kailkhura; Kaidi Xu; |
104 | LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present LayerSkip, an end-to-end solution to speed-up inference of large language models (LLMs). |
Mostafa Elhoushi; Akshat Shrivastava; Diana Liskovich; Basil Hosmer; Bram Wasti; Liangzhen Lai; Anas Mahmoud; Bilge Acun; Saurabh Agarwal; Ahmed Roman; Ahmed Aly; Beidi Chen; Carole-Jean Wu; |
105 | COKE: A Cognitive Knowledge Graph for Machine Theory of Mind Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Though indispensable for social intelligence, ToM is still lacking for modern AI and NLP systems since they cannot access the human mental state and cognitive process beneath the training corpus. To empower AI systems with the ToM ability and narrow the gap between them and humans, in this paper, we propose COKE: the first cognitive knowledge graph for machine theory of mind. |
Jincenzi Wu; Zhuang Chen; Jiawen Deng; Sahand Sabour; Helen Meng; Minlie Huang; |
106 | ∞Bench: Extending Long Context Evaluation Beyond 100K Tokens Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose ∞Bench, the first LLM benchmark featuring an average data length surpassing 100K tokens. |
Xinrong Zhang; Yingfa Chen; Shengding Hu; Zihang Xu; Junhao Chen; Moo Hao; Xu Han; Zhen Thai; Shuo Wang; Zhiyuan Liu; Maosong Sun; |
107 | Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although adept at devising strategies and performing tasks, these agents struggle with seeking clarification and grasping precise user intentions. To bridge this gap, we introduce Intention-in-Interaction (IN3), a novel benchmark designed to inspect users' implicit intentions through explicit queries. |
Cheng Qian; Bingxiang He; Zhong Zhuang; Jia Deng; Yujia Qin; Xin Cong; Zhong Zhang; Jie Zhou; Yankai Lin; Zhiyuan Liu; Maosong Sun; |
108 | Same Task, More Tokens: The Impact of Input Length on The Reasoning Performance of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite recent advancements in LLMs, their performance consistency across different input lengths is not well understood. We investigate this aspect by introducing a novel QA reasoning framework, specifically designed to assess the impact of input length. |
Mosh Levy; Alon Jacoby; Yoav Goldberg; |
109 | UltraLink: An Open-Source Knowledge-Enhanced Multilingual Supervised Fine-tuning Dataset Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Nevertheless, the majority of studies primarily concentrate on English, with only limited exploration into the realm of multilingual abilities. In this work, we therefore construct an open-source multilingual supervised fine-tuning dataset. |
Haoyu Wang; Shuo Wang; Yukun Yan; Xujia Wang; Zhiyu Yang; Yuzhuang Xu; Zhenghao Liu; Liner Yang; Ning Ding; Xu Han; Zhiyuan Liu; Maosong Sun; |
110 | LoRA-Flow: Dynamic LoRA Fusion for Large Language Models in Generative Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose LoRA-Flow, which utilizes dynamic weights to adjust the impact of different LoRAs. |
Hanqing Wang; Bowen Ping; Shuo Wang; Xu Han; Yun Chen; Zhiyuan Liu; Maosong Sun; |
111 | GradSafe: Detecting Jailbreak Prompts for LLMs Via Safety-Critical Gradient Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we propose GradSafe, which effectively detects jailbreak prompts by scrutinizing the gradients of safety-critical parameters in LLMs. |
Yueqi Xie; Minghong Fang; Renjie Pi; Neil Gong; |
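The gradient-similarity idea above is concrete enough to sketch. Below is a minimal, simplified rendition assuming an HF-style causal LM; the fixed compliance response `"Sure"`, the whole-model gradient, and the function names are illustrative assumptions, since GradSafe itself scores only safety-critical parameter slices.

```python
import torch
import torch.nn.functional as F

def gradient_signature(model, tokenizer, prompt, compliance="Sure"):
    """Gradient of the loss on a fixed compliance response, used as a
    signature of the prompt. Simplified: the paper restricts this to
    safety-critical parameters and handles prompt/response masking."""
    ids = tokenizer(prompt + compliance, return_tensors="pt").input_ids
    model.zero_grad()
    model(ids, labels=ids).loss.backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()
                      if p.grad is not None])

def jailbreak_score(signature, unsafe_reference):
    # High cosine similarity to the gradient signature of known unsafe
    # prompts flags the input as a likely jailbreak.
    return F.cosine_similarity(signature, unsafe_reference, dim=0).item()
```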
112 | Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our work introduces Aya, a massively multilingual generative language model that follows instructions in 101 languages, of which over 50% are considered lower-resourced. |
Ahmet Üstün; Viraat Aryabumi; Zheng Yong; Wei-Yin Ko; Daniel D'souza; Gbemileke Onilude; Neel Bhandari; Shivalika Singh; Hui-Lee Ooi; Amr Kayid; Freddie Vargus; Phil Blunsom; Shayne Longpre; Niklas Muennighoff; Marzieh Fadaee; Julia Kreutzer; Sara Hooker; |
113 | LoRAMoE: Alleviating World Knowledge Forgetting in Large Language Models Via MoE-Style Plugin Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we find that large-scale increases in instruction data can damage the world knowledge previously stored in LLMs. To address this challenge, we propose LoRAMoE, a novel framework that introduces several low-rank adapters (LoRA) and integrates them by using a router network, like a plugin version of Mixture of Experts (MoE). |
Shihan Dou; Enyu Zhou; Yan Liu; Songyang Gao; Wei Shen; Limao Xiong; Yuhao Zhou; Xiao Wang; Zhiheng Xi; Xiaoran Fan; Shiliang Pu; Jiang Zhu; Rui Zheng; Tao Gui; Qi Zhang; Xuanjing Huang; |
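As a rough picture of the MoE-style plugin, here is a minimal sketch of several LoRA experts combined by a router on top of a frozen linear layer; the expert count, rank, and initialization are placeholder assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class LoRAMoELayer(nn.Module):
    """Frozen base linear layer plus several low-rank adapters (LoRAs),
    mixed by a learned router -- a sketch of the plugin idea only."""
    def __init__(self, base: nn.Linear, n_experts: int = 4, r: int = 8):
        super().__init__()
        self.base = base.requires_grad_(False)  # keep world knowledge frozen
        self.router = nn.Linear(base.in_features, n_experts)
        self.A = nn.ParameterList([nn.Parameter(torch.randn(r, base.in_features) * 0.01)
                                   for _ in range(n_experts)])
        self.B = nn.ParameterList([nn.Parameter(torch.zeros(base.out_features, r))
                                   for _ in range(n_experts)])

    def forward(self, x):
        gate = torch.softmax(self.router(x), dim=-1)   # [..., n_experts]
        delta = torch.stack([(x @ a.T) @ b.T           # each expert's low-rank update
                             for a, b in zip(self.A, self.B)], dim=-1)
        return self.base(x) + (delta * gate.unsqueeze(-2)).sum(-1)
```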
114 | StepCoder: Improving Code Generation with Reinforcement Learning from Compiler Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Also, since the unit tests may not cover the complicated code, optimizing LLMs by using these unexecuted code snippets is ineffective. To tackle these challenges, we introduce StepCoder, a novel RL framework for code generation, consisting of two main components: CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks, while FGO only optimizes the model by masking the unexecuted code segments to provide Fine-Grained Optimization. |
Shihan Dou; Yan Liu; Haoxiang Jia; Enyu Zhou; Limao Xiong; Junjie Shan; Caishuang Huang; Xiao Wang; Xiaoran Fan; Zhiheng Xi; Yuhao Zhou; Tao Ji; Rui Zheng; Qi Zhang; Tao Gui; Xuanjing Huang; |
115 | Draft & Verify: Lossless Large Language Model Acceleration Via Self-Speculative Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel inference scheme, self-speculative decoding, for accelerating Large Language Models (LLMs) without the need for an auxiliary model. |
Jun Zhang; Jue Wang; Huan Li; Lidan Shou; Ke Chen; Gang Chen; Sharad Mehrotra; |
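For intuition, here is a minimal draft-then-verify loop under stated assumptions: an HF-style model returning `.logits`, batch size 1, and greedy decoding. In the paper the cheap draft pass comes from the same model with some layers skipped; the separate `draft_model` argument here is a simplification, not the paper's method.

```python
import torch

def draft_then_verify(model, draft_model, input_ids, n_draft=4, max_new=64):
    ids = input_ids
    while ids.shape[1] - input_ids.shape[1] < max_new:
        # 1) cheaply draft n_draft greedy tokens
        draft = ids
        for _ in range(n_draft):
            nxt = draft_model(draft).logits[:, -1].argmax(-1, keepdim=True)
            draft = torch.cat([draft, nxt], dim=1)
        # 2) verify all drafted tokens with one full forward pass
        full = model(draft).logits[:, ids.shape[1] - 1:-1].argmax(-1)
        drafted = draft[:, ids.shape[1]:]
        n_ok = int((full == drafted).long().cumprod(-1).sum())  # accepted prefix length
        # 3) keep accepted tokens, then one corrected token from the verifier
        ids = torch.cat([ids, drafted[:, :n_ok], full[:, n_ok:n_ok + 1]], dim=1)
    return ids
```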
116 | Soul-Mix: Enhancing Multimodal Machine Translation with Manifold Mixup Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most previous works fail to take advantage of the complete textual inputs and the limited textual inputs at the same time, which limits the overall performance. To solve this issue, we propose a mixup method termed Soul-Mix to enhance MMT by using visual information more effectively. |
Xuxin Cheng; Ziyu Yao; Yifei Xin; Hao An; Hongxiang Li; Yaowei Li; Yuexian Zou; |
117 | Skin-in-the-Game: Decision Making Via Multi-Stakeholder Alignment in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces the Skin-in-the-Game (SKIG) framework, aimed at enhancing moral reasoning in LLMs by exploring decisions' consequences from multiple stakeholder perspectives. |
Bilgehan Sel; Priya Shanmugasundaram; Mohammad Kachuee; Kun Zhou; Ruoxi Jia; Ming Jin; |
118 | A Glitch in The Matrix? Locating and Detecting Language Model Grounding with Fakepedia Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a novel method to study grounding abilities using Fakepedia, a dataset of counterfactual texts constructed to clash with a model's internal parametric knowledge. |
Giovanni Monea; Maxime Peyrard; Martin Josifoski; Vishrav Chaudhary; Jason Eisner; Emre Kiciman; Hamid Palangi; Barun Patra; Robert West; |
119 | MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce MathGenie, a novel method for generating diverse and reliable math problems by leveraging the ground-truth solutions of the seed data. |
Zimu Lu; Aojun Zhou; Houxing Ren; Ke Wang; Weikang Shi; Junting Pan; Mingjie Zhan; Hongsheng Li; |
120 | HealMe: Harnessing Cognitive Reframing in Large Language Models for Psychotherapy Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we unveil the Helping and Empowering through Adaptive Language in Mental Enhancement (HealMe) model. |
Mengxi Xiao; Qianqian Xie; Ziyan Kuang; Zhicheng Liu; Kailai Yang; Min Peng; Weiguang Han; Jimin Huang; |
121 | M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The need to identify and differentiate such content from genuine human-generated text is critical in combating disinformation, preserving the integrity of education and scientific fields, and maintaining trust in communication. In this work, we address this problem by introducing a new benchmark based on a multilingual, multi-domain and multi-generator corpus of MGTs: M4GT-Bench. |
Yuxia Wang; Jonibek Mansurov; Petar Ivanov; Jinyan Su; Artem Shelmanov; Akim Tsvigun; Osama Mohammed Afzal; Tarek Mahmoud; Giovanni Puccetti; Thomas Arnold; Alham Aji; Nizar Habash; Iryna Gurevych; Preslav Nakov; |
122 | Black-Box Prompt Optimization: Aligning Large Language Models Without Model Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we take a different perspective, Black-Box Prompt Optimization (BPO), to perform alignment. |
Jiale Cheng; Xiao Liu; Kehan Zheng; Pei Ke; Hongning Wang; Yuxiao Dong; Jie Tang; Minlie Huang; |
123 | ANAH: Analytical Annotation of Hallucinations in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Thus, we present ANAH, a bilingual dataset that offers ANalytical Annotation of Hallucinations in LLMs within Generative Question Answering. |
Ziwei Ji; Yuzhe Gu; Wenwei Zhang; Chengqi Lyu; Dahua Lin; Kai Chen; |
124 | One-Shot Learning As Instruction Data Prospector for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Contemporary practices in instruction tuning often hinge on scaling up data without a clear strategy for ensuring data quality, inadvertently introducing noise that may compromise model performance. To address this challenge, we introduce Nuggets, a novel and efficient methodology that leverages one-shot learning to discern and select high-quality instruction data from extensive datasets. |
Yunshui Li; Binyuan Hui; Xiaobo Xia; Jiaxi Yang; Min Yang; Lei Zhang; Shuzheng Si; Ling-Hao Chen; Junhao Liu; Tongliang Liu; Fei Huang; Yongbin Li; |
125 | L-Eval: Instituting Standardized Evaluation for Long Context Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To facilitate research in this field, we propose L-Eval to institute a more standardized evaluation for Long-Context Language Models (LCLMs) addressing two key aspects: dataset construction and evaluation metrics. |
Chenxin An; Shansan Gong; Ming Zhong; Xingjian Zhao; Mukai Li; Jun Zhang; Lingpeng Kong; Xipeng Qiu; |
126 | SPOR: A Comprehensive and Practical Evaluation Method for Compositional Generalization in Data-to-Text Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose SPOR, a comprehensive and practical evaluation method for compositional generalization in data-to-text generation. |
Ziyao Xu; Houfeng Wang; |
127 | Detection-Correction Structure Via General Language Model for Grammatical Error Correction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces an integrated detection-correction structure, named DeCoGLM, based on the General Language Model (GLM). |
Wei Li; Houfeng Wang; |
128 | Complex Reasoning Over Logical Queries on Commonsense Knowledge Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, data scarcity makes it challenging for language models to learn to generate commonsense inferences for contexts and questions involving interactions between complex events. To address this demand, we present COM2 (COMplex COMmonsense), a new dataset created by sampling multi-hop logical queries (e.g., the joint effect or cause of both event A and B, or the effect of the effect of event C) from an existing commonsense knowledge graph (CSKG), and verbalizing them using handcrafted rules and large language models into multiple-choice and text generation questions. |
Tianqing Fang; Zeming Chen; Yangqiu Song; Antoine Bosselut; |
129 | Retrieval-Augmented Multilingual Knowledge Editing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the problem of multilingual knowledge editing, we propose Retrieval-Augmented Multilingual Knowledge Editor (ReMaKE) to update knowledge in LLMs. |
Weixuan Wang; Barry Haddow; Alexandra Birch; |
130 | Emulated Disalignment: Safety Alignment for Large Language Models May Backfire! Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, this paper introduces a training-free attack method capable of reversing safety alignment, converting the outcomes of stronger alignment into greater potential for harm by accessing only LLM output token distributions. Specifically, our method achieves this reversal by contrasting the output token distribution of a safety-aligned language model (e.g., Llama-2-chat) against its pre-trained version (e.g., Llama-2), so that the token predictions are shifted towards the opposite direction of safety alignment. |
Zhanhui Zhou; Jie Liu; Zhichen Dong; Jiaheng Liu; Chao Yang; Wanli Ouyang; Yu Qiao; |
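The contrast itself fits in one line. Here is a minimal sketch, assuming next-token logits from the aligned model and its pre-trained base; `alpha` is an assumed name for the reversal strength:

```python
import torch

def emulated_disalignment_logits(base_logits, aligned_logits, alpha=1.0):
    # Alignment moved the distribution from base toward aligned; walking
    # that same direction in reverse emulates a disaligned model.
    return base_logits + alpha * (base_logits - aligned_logits)

# probs = torch.softmax(emulated_disalignment_logits(lb, la), dim=-1)
```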
131 | LangBridge: Multilingual Reasoning Without Multilingual Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce LangBridge, a zero-shot approach to adapt language models for multilingual reasoning tasks without multilingual supervision. |
Dongkeun Yoon; Joel Jang; Sungdong Kim; Seungone Kim; Sheikh Shafayat; Minjoon Seo; |
132 | Who Wrote This Code? Watermarking for Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Extending a logit-modifying watermark method, we propose Selective WatErmarking via Entropy Thresholding (SWEET), which enhances detection ability and mitigates code quality degeneration by removing low-entropy segments when generating and detecting watermarks. |
Taehyun Lee; Seokhee Hong; Jaewoo Ahn; Ilgee Hong; Hwaran Lee; Sangdoo Yun; Jamin Shin; Gunhee Kim; |
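A minimal sketch of one entropy-thresholded watermarking step follows, assuming 1-D next-token logits and a precomputed 0/1 `green_mask` over the vocabulary (seeded from prior tokens, as in standard logit-modifying watermarks); the threshold and bias values are placeholders.

```python
import torch

def sweet_step(logits, green_mask, entropy_threshold=2.0, delta=2.0):
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs + 1e-9)).sum()
    if entropy.item() >= entropy_threshold:   # skip low-entropy (near-forced) tokens
        logits = logits + delta * green_mask  # bias generation toward green-list tokens
    return logits
```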
133 | Pareto Optimal Learning for Estimating Large Language Model Errors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a method based on Pareto optimization that generates a risk score to estimate the probability of error in an LLM response by integrating multiple sources of information. |
Theodore Zhao; Mu Wei; J. Preston; Hoifung Poon; |
134 | XCodeEval: An Execution-based Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce *xCodeEval*, the largest executable multilingual multitask benchmark to date, consisting of 25M document-level coding examples (16.5B tokens). |
Mohammad Abdullah Matin Khan; M Saiful Bari; Do Long; Weishi Wang; Md Rizwan Parvez; Shafiq Joty; |
135 | Self-chats from Large Language Models Make Small Emotional Support Chatbot Better Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we utilize LLMs as a "Counseling Teacher" to enhance smaller models' emotion support response abilities, significantly reducing the necessity of scaling up model size. |
Zhonghua Zheng; Lizi Liao; Yang Deng; Libo Qin; Liqiang Nie; |
136 | CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the absence of a comprehensive benchmark impedes progress in this field. To bridge this gap, we introduce CharacterEval, a Chinese benchmark for comprehensive RPCA assessment, complemented by a tailored high-quality dataset. |
Quan Tu; Shilong Fan; Zihang Tian; Tianhao Shen; Shuo Shang; Xin Gao; Rui Yan; |
137 | ECBD: Evidence-Centered Benchmark Design for NLP Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: There is currently no principled way of analyzing these decisions and how they impact the validity of the benchmark's measurements. To address this gap, we draw on evidence-centered design in educational assessments and propose Evidence-Centered Benchmark Design (ECBD), a framework which formalizes the benchmark design process into five modules. |
Yu Liu; Su Blodgett; Jackie Cheung; Vera Liao; Alexandra Olteanu; Ziang Xiao; |
138 | AboutMe: Using Self-Descriptions in Webpages to Document The Effects of English Pretraining Data Filters Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In our work, we ground web text, which is a popular pretraining data source, to its social and geographic contexts. |
Li Lucy; Suchin Gururangan; Luca Soldaini; Emma Strubell; David Bamman; Lauren Klein; Jesse Dodge; |
139 | Advancing Parameter Efficiency in Fine-tuning Via Representation Editing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite the promising performance of current PEFT methods, they present challenges in hyperparameter selection, such as determining the rank of LoRA or Adapter, or specifying the length of soft prompts. In addressing these challenges, we propose a novel approach to fine-tuning neural models, termed Representation EDiting (RED), which scales and biases the representation produced at each layer. |
Muling Wu; Wenhao Liu; Xiaohua Wang; Tianlong Li; Changze Lv; Zixuan Ling; Zhu JianHao; Cenyuan Zhang; Xiaoqing Zheng; Xuanjing Huang; |
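Because the edit is just a per-layer scale and bias on hidden states, a sketch is short; the module name and initialization below are assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

class RepresentationEdit(nn.Module):
    """Trainable scale and bias applied to a frozen layer's output
    representation; an identity transform at initialization."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(hidden_size))
        self.bias = nn.Parameter(torch.zeros(hidden_size))

    def forward(self, hidden):
        return hidden * self.scale + self.bias
```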
140 | EmoBench: Evaluating The Emotional Intelligence of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose EmoBench, a benchmark that draws upon established psychological theories and proposes a comprehensive definition for machine EI, including Emotional Understanding and Emotional Application. |
Sahand Sabour; Siyang Liu; Zheyuan Zhang; June Liu; Jinfeng Zhou; Alvionna Sunaryo; Tatia Lee; Rada Mihalcea; Minlie Huang; |
141 | Having Beer After Prayer? Measuring Cultural Bias in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show that multilingual and Arabic monolingual LMs exhibit bias towards entities associated with Western culture. |
Tarek Naous; Michael Ryan; Alan Ritter; Wei Xu; |
142 | REFINESUMM: Self-Refining MLLM for Generating A Multimodal Summarization Dataset Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, generating accurate and faithful multimodal summaries is challenging, primarily due to the lack of appropriate multimodal datasets for fine-tuning that meaningfully integrate textual and visual modalities. To address this gap, we present a new dataset designed specifically for image-text multimodal summarization, harnessing the capabilities of state-of-the-art MLLMs. |
Vaidehi Patil; Leonardo Ribeiro; Mengwen Liu; Mohit Bansal; Markus Dreyer; |
143 | Probing The Multi-turn Planning Capabilities of LLMs Via 20 Question Games Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we offer a surrogate problem which assesses an LLM's capability to deduce an entity unknown to itself, but revealed to a judge, by asking the judge a series of queries. |
Yizhe Zhang; Jiarui Lu; Navdeep Jaitly; |
144 | DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The rapid rise to prominence of these models, and the unique challenges it brings, has had immediate adverse impacts on open science and on the reproducibility of work that uses them. In this ACL 2024 theme track paper, we introduce DataDreamer, an open source Python library that allows researchers to write simple code to implement powerful LLM workflows. |
Ajay Patel; Colin Raffel; Chris Callison-Burch; |
145 | RLHFPoison: Reward Poisoning Attack for Reinforcement Learning with Human Feedback in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To assess the red-teaming of RLHF against human preference data poisoning, we propose RankPoison, a poisoning attack method on candidates' selection of preference rank flipping to reach certain malicious behaviors (e.g., generating longer sequences, which can increase the computational cost). |
Jiongxiao Wang; Junlin Wu; Muhao Chen; Yevgeniy Vorobeychik; Chaowei Xiao; |
146 | INTERS: Unlocking The Power of Large Language Models in Search with Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While prompt-based methods can provide task descriptions to LLMs, they often fall short in facilitating a comprehensive understanding and execution of IR tasks, thereby limiting LLMs' applicability. To address this gap, in this work, we explore the potential of instruction tuning to enhance LLMs' proficiency in IR tasks. |
Yutao Zhu; Peitian Zhang; Chenghao Zhang; Yifei Chen; Binyu Xie; Zheng Liu; Ji-Rong Wen; Zhicheng Dou; |
147 | Word Embeddings Are Steers for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we theoretically and empirically revisit output word embeddings and find that their linear transformations are equivalent to steering language model generation styles. |
Chi Han; Jialiang Xu; Manling Li; Yi Fung; Chenkai Sun; Nan Jiang; Tarek Abdelzaher; Heng Ji; |
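The claimed equivalence suggests a simple steering recipe: score hidden states against linearly shifted output embeddings. A minimal sketch follows, with `epsilon` and `W` as assumed names for the steering strength and the learned linear map:

```python
import torch

def steered_logits(hidden, emb, W, epsilon=1e-3):
    """hidden: [seq, d]; emb: output word embeddings [vocab, d]; W: [d, d].
    Each embedding e becomes e + epsilon * W e, linearly shifting style."""
    steered = emb + epsilon * emb @ W.T
    return hidden @ steered.T
```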
148 | WaveCoder: Widespread And Versatile Enhancement For Code Large Language Models By Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we concentrate on multiple code-related tasks and present WaveCoder, a series of Code LLMs trained with Widespread And Versatile Enhanced instruction data. |
Zhaojian Yu; Xin Zhang; Ning Shang; Yangyu Huang; Can Xu; Yishujie Zhao; Wenxiang Hu; Qiufeng Yin; |
149 | ViSAGe: A Global-Scale Analysis of Visual Stereotypes in Text-to-Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing approaches for evaluating stereotypes have a noticeable lack of coverage of global identity groups and their associated stereotypes. To address this gap, we introduce the ViSAGe (Visual Stereotypes Around the Globe) dataset to enable the evaluation of known nationality-based stereotypes in T2I models, across 135 nationalities. |
Akshita Jha; Vinodkumar Prabhakaran; Remi Denton; Sarah Laszlo; Shachi Dave; Rida Qadri; Chandan Reddy; Sunipa Dev; |
150 | LLM-Rubric: A Multidimensional, Calibrated Approach to Automated Evaluation of Natural Language Texts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a framework for the automated evaluation of natural language texts. |
Helia Hashemi; Jason Eisner; Corby Rosset; Benjamin Van Durme; Chris Kedzie; |
151 | Empowering Character-level Text Infilling By Eliminating Sub-Tokens Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce FIM-SE, which stands for Fill-In-the-Middle with both Starting and Ending character constraints. |
Houxing Ren; Mingjie Zhan; Zhongyuan Wu; Hongsheng Li; |
152 | ToMBench: Benchmarking Theory of Mind in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing ToM evaluations are hindered by challenges such as constrained scope, subjective judgment, and unintended contamination, yielding inadequate assessments. To address this gap, we introduce ToMBench with three key characteristics: a systematic evaluation framework encompassing 8 tasks and 31 abilities in social cognition, a multiple-choice question format to support automated and unbiased evaluation, and a build-from-scratch bilingual inventory to strictly avoid data leakage. |
Zhuang Chen; Jincenzi Wu; Jinfeng Zhou; Bosi Wen; Guanqun Bi; Gongyao Jiang; Yaru Cao; Mengting Hu; Yunghwei Lai; Zexuan Xiong; Minlie Huang; |
153 | Learning Task Decomposition to Assist Humans in Competitive Programming Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel objective for learning task decomposition, termed assistive value (AssistV), which measures the feasibility and speed for humans to repair the decomposed solution. |
Jiaxin Wen; Ruiqi Zhong; Pei Ke; Zhihong Shao; Hongning Wang; Minlie Huang; |
154 | Semiparametric Token-Sequence Co-Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a semiparametric token-sequence co-supervision training method. |
Hyunji Lee; Doyoung Kim; Jihoon Jun; Se June Joo; Joel Jang; Kyoung-Woon On; Minjoon Seo; |
155 | Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present quantized side tuning (QST), which enables memory-efficient and fast finetuning of LLMs by operating through a dual-stage process. |
Zhengxin Zhang; Dan Zhao; Xupeng Miao; Gabriele Oliaro; Zhihao Zhang; Qing Li; Yong Jiang; Zhihao Jia; |
156 | Advancing Abductive Reasoning in Knowledge Graphs Through Complex Logical Hypothesis Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although many applications require the use of knowledge for explanations, the utilization of abductive reasoning in conjunction with structured knowledge, such as a knowledge graph, remains largely unexplored. To fill this gap, this paper introduces the task of complex logical hypothesis generation, as an initial step towards abductive logical reasoning with KG. |
Jiaxin Bai; Yicheng Wang; Tianshi Zheng; Yue Guo; Xin Liu; Yangqiu Song; |
157 | Not All Experts Are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we propose, for the first time to our best knowledge, post-training approaches for task-agnostic and task-specific expert pruning and skipping of MoE LLMs, tailored to improve deployment efficiency while maintaining model performance across a wide range of tasks. |
Xudong Lu; Qi Liu; Yuhui Xu; Aojun Zhou; Siyuan Huang; Bo Zhang; Junchi Yan; Hongsheng Li; |
158 | SEGO: Sequential Subgoal Optimization for Mathematical Problem-Solving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the success of subgoal-based methods, we propose a novel framework called SEquential subGoal Optimization (SEGO) to enhance LLMs' ability to solve mathematical problems. |
Xueliang Zhao; Xinting Huang; Wei Bi; Lingpeng Kong; |
159 | OceanGPT: A Large Language Model for Ocean Science Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose DoInstruct, a novel framework to automatically obtain a large volume of ocean domain instruction data, which generates instructions based on multi-agent collaboration. |
Zhen Bi; Ningyu Zhang; Yida Xue; Yixin Ou; Daxiong Ji; Guozhou Zheng; Huajun Chen; |
160 | ProtT3: Protein-to-Text Generation for Text-based Protein Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address their limitations, we introduce ProtT3, a framework for Protein-to-Text Generation for Text-based Protein Understanding. |
Zhiyuan Liu; An Zhang; Hao Fei; Enzhi Zhang; Xiang Wang; Kenji Kawaguchi; Tat-Seng Chua; |
161 | DeCoT: Debiasing Chain-of-Thought for Knowledge-Intensive Tasks in Large Language Models Via Causal Intervention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel causal view to formally explain the internal knowledge bias of LLMs via a Structural Causal Model (SCM). |
Junda Wu; Tong Yu; Xiang Chen; Haoliang Wang; Ryan Rossi; Sungchul Kim; Anup Rao; Julian McAuley; |
162 | Toward In-Context Teaching: Adapting Examples to Students' Misconceptions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: But how effectively can these models adapt as teachers to students of different types? To study this question, we introduce a suite of models and evaluation methods we call AdapT. |
Alexis Ross; Jacob Andreas; |
163 | How to Engage Your Readers? Generating Guiding Questions to Promote Active Reading Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce GuidingQ, a dataset of 10K in-text questions from textbooks and scientific articles. |
Peng Cui; Vilém Zouhar; Xiaoyu Zhang; Mrinmaya Sachan; |
164 | MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Taking autoregressive models as an example, although these approaches achieve high-fidelity voice cloning, they fall short in terms of inference speed, model size, and robustness. Therefore, we propose MobileSpeech, the first fast, lightweight, and robust zero-shot text-to-speech system designed for mobile devices. |
Shengpeng Ji; Ziyue Jiang; Hanting Wang; Jialong Zuo; Zhou Zhao; |
165 | Is The Pope Catholic? Yes, The Pope Is Catholic. Generative Evaluation of Non-Literal Intent Resolution in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While most existing work has focused on discriminative evaluations, we present a new approach to generatively evaluate large language models' (LLMs') intention understanding by examining their responses to non-literal utterances. Ideally, an LLM should respond in line with the true intention of a non-literal utterance, not its literal interpretation. |
Akhila Yerukola; Saujas Vaduguru; Daniel Fried; Maarten Sap; |
166 | IEPile: Unearthing Large Scale Schema-Conditioned Information Extraction Corpus Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce IEPile, a comprehensive bilingual (English and Chinese) IE instruction corpus, which contains approximately 0.32B tokens. |
Honghao Gui; Lin Yuan; Hongbin Ye; Ningyu Zhang; Mengshu Sun; Lei Liang; Huajun Chen; |
167 | Mission: Impossible Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we develop a set of synthetic impossible languages of differing complexity, each designed by systematically altering English data with unnatural word orders and grammar rules. |
Julie Kallini; Isabel Papadimitriou; Richard Futrell; Kyle Mahowald; Christopher Potts; |
168 | What Do Language Models Hear? Probing for Auditory Representations in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: After training, the probe is tested on its ability to generalize to objects that were not seen during training. Across different language models and audio models, we find that the probe generalization is above chance in many cases, indicating that despite being trained only on raw text, language models encode grounded knowledge of sounds for some objects. |
Jerry Ngo; Yoon Kim; |
169 | SAPT: A Shared Attention Framework for Parameter-Efficient Continual Learning of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel Shared Attention Framework (SAPT), to align the PET learning and selection via the Shared Attentive Learning & Selection module. |
Weixiang Zhao; Shilong Wang; Yulin Hu; Yanyan Zhao; Bing Qin; Xuanyu Zhang; Qing Yang; Dongliang Xu; Wanxiang Che; |
170 | Rule or Story, Which Is A Better Commonsense Expression for Talking with Large Language Models? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper investigates the inherent commonsense ability of large language models (LLMs) expressed through storytelling. |
Ning Bian; Xianpei Han; Hongyu Lin; Yaojie Lu; Ben He; Le Sun; |
171 | TruthX: Alleviating Hallucinations By Editing Large Language Models in Truthful Space Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose TruthX, an inference-time intervention method to activate the truthfulness of an LLM by identifying and editing the features within its internal representations that govern truthfulness. |
Shaolei Zhang; Tian Yu; Yang Feng; |
172 | StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose StreamSpeech, a direct Simul-S2ST model that jointly learns translation and simultaneous policy in a unified framework of multi-task learning. |
Shaolei Zhang; Qingkai Fang; Shoutao Guo; Zhengrui Ma; Min Zhang; Yang Feng; |
173 | FanOutQA: A Multi-Hop, Multi-Document Question Answering Benchmark for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To evaluate complex reasoning in LLMs more fully, we present FanOutQA, a high-quality dataset of fan-out question-answer pairs and human-annotated decompositions with English Wikipedia as the knowledge base. |
Andrew Zhu; Alyssa Hwang; Liam Dugan; Chris Callison-Burch; |
174 | Grounding Language Model with Chunking-Free In-Context Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a novel Chunking-Free In-Context (CFIC) retrieval approach, specifically tailored for Retrieval-Augmented Generation (RAG) systems. |
Hongjin Qian; Zheng Liu; Kelong Mao; Yujia Zhou; Zhicheng Dou; |
175 | Large Language Models Can Learn Temporal Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose TG-LLM, a novel framework towards language-based TR. |
Siheng Xiong; Ali Payani; Ramana Kompella; Faramarz Fekri; |
176 | Can LLMs Learn from Previous Mistakes? Investigating LLMs' Errors to Boost for Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We ultimately make a thorough analysis of the reasons behind LLMs' errors, which provides directions that future research needs to overcome. |
Yongqi Tong; Dawei Li; Sizhe Wang; Yujia Wang; Fei Teng; Jingbo Shang; |
177 | MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Previous benchmarks have primarily focused on single-turn dialogues or provided coarse-grained and incomplete assessments of multi-turn dialogues, overlooking the complexity and fine-grained nuances of real-life dialogues. To address this issue, we introduce MT-Bench-101, specifically designed to evaluate the fine-grained abilities of LLMs in multi-turn dialogues. |
Ge Bai; Jie Liu; Xingyuan Bu; Yancheng He; Jiaheng Liu; Zhanhui Zhou; Zhuoran Lin; Wenbo Su; Tiezheng Ge; Bo Zheng; Wanli Ouyang; |
178 | DiffuCOMET: Contextual Commonsense Knowledge Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we develop a series of knowledge models, DiffuCOMET, that leverage diffusion to learn to reconstruct the implicit semantic connections between narrative contexts and relevant commonsense knowledge. |
Silin Gao; Mete Ismayilzada; Mengjie Zhao; Hiromi Wakaki; Yuki Mitsufuji; Antoine Bosselut; |
179 | RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we present RAID: the largest and most challenging benchmark dataset for machine-generated text detection. |
Liam Dugan; Alyssa Hwang; Filip Trhlík; Andrew Zhu; Josh Magnus Ludan; Hainiu Xu; Daphne Ippolito; Chris Callison-Burch; |
180 | Parallel Structures in Pre-training Data Yield In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we study what patterns of the pre-training data contribute to ICL. |
Yanda Chen; Chen Zhao; Zhou Yu; Kathleen McKeown; He He; |
181 | MAGE: Machine-generated Text Detection in The Wild Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we build a comprehensive testbed by gathering texts from diverse human writings and deepfake texts generated by different LLMs. |
Yafu Li; Qintong Li; Leyang Cui; Wei Bi; Zhilin Wang; Longyue Wang; Linyi Yang; Shuming Shi; Yue Zhang; |
182 | Iterative Forward Tuning Boosts In-Context Learning in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we introduce a novel two-stage framework to boost ICL in LLMs. |
Jiaxi Yang; Binyuan Hui; Min Yang; Bailin Wang; Bowen Li; Binhua Li; Fei Huang; Yongbin Li; |
183 | Lightweight Reranking for Language Model Generations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel approach for reranking LLM generations. |
Siddhartha Jain; Xiaofei Ma; Anoop Deoras; Bing Xiang; |
184 | LLaMA Pro: Progressive LLaMA with Block Expansion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a new post-pretraining method for LLMs with an expansion of Transformer blocks. |
Chengyue Wu; Yukang Gan; Yixiao Ge; Zeyu Lu; Jiahao Wang; Ye Feng; Ying Shan; Ping Luo; |
185 | On The Multi-turn Instruction Following for Conversational Web Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a new task of Conversational Web Navigation, which necessitates sophisticated interactions that span multiple turns with both the users and the environment, supported by a specially developed dataset named Multi-Turn Mind2Web (MT-Mind2Web). |
Yang Deng; Xuan Zhang; Wenxuan Zhang; Yifei Yuan; See-Kiong Ng; Tat-Seng Chua; |
186 | CANDLE: Iterative Conceptualization and Instantiation Distillation from Large Language Models for Commonsense Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing works tend to undervalue the step of instantiation and heavily rely on pre-built concept taxonomies and human annotations to collect both types of knowledge, resulting in a lack of instantiated knowledge to complete reasoning, high cost, and limited scalability. To tackle these challenges, we introduce CANDLE (ConceptuAlization and INstantiation Distillation from Large Language ModEls), a distillation framework that iteratively performs contextualized conceptualization and instantiation over commonsense knowledge bases by instructing large language models to generate both types of knowledge with critic filtering. |
Weiqi Wang; Tianqing Fang; Chunyang Li; Haochen Shi; Wenxuan Ding; Baixuan Xu; Zhaowei Wang; Jiaxin Bai; Xin Liu; Cheng Jiayang; Chunkit Chan; Yangqiu Song; |
187 | Does DetectGPT Fully Utilize Perturbation? Bridging Selective Perturbation to Fine-tuned Contrastive Learning Detector Would Be Better Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Hence, we propose a novel fine-tuned detector, PECOLA, bridging metric-based and fine-tuned methods by contrastive learning on selective perturbation. |
Shengchao Liu; Xiaoming Liu; Yichen Wang; Zehua Cheng; Chengzhengxu Li; Zhaohan Zhang; Yu Lan; Chao Shen; |
188 | Text-like Encoding of Collaborative Information in Large Language Models for Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, they fail to represent the information in a text-like format, which may not align optimally with LLMs. To bridge this gap, we introduce BinLLM, a novel LLMRec method that seamlessly integrates collaborative information through text-like encoding. |
Yang Zhang; Keqin Bao; Ming Yan; Wenjie Wang; Fuli Feng; Xiangnan He; |
189 | Rethinking The Bounds of LLM Reasoning: Are Multi-Agent Discussions The Key? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent progress suggests that multi-agent discussion improves the reasoning abilities of LLMs. In this work, we reevaluate this claim through systematic experiments, where we propose a novel group discussion framework to enrich the set of discussion mechanisms. |
Qineng Wang; Zihao Wang; Ying Su; Hanghang Tong; Yangqiu Song; |
190 | Soft Self-Consistency Improves Language Models Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Soft Self-Consistency (SOFT-SC), which replaces SC's discontinuous scoring with a continuous score computed from model likelihoods, allowing for selection even when actions are sparsely distributed. |
Han Wang; Archiki Prasad; Elias Stengel-Eskin; Mohit Bansal; |
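The scoring change is easy to sketch: replace vote counting with a continuous likelihood score per sample. Here is a minimal sketch assuming samples arrive as (action_text, token_logprobs) pairs, an assumed interface:

```python
def soft_self_consistency(samples):
    """Select the sampled action with the highest mean token log-prob,
    so selection works even when no two samples are identical."""
    mean_lp = lambda lp: sum(lp) / len(lp)
    return max(samples, key=lambda s: mean_lp(s[1]))[0]

# soft_self_consistency([("click('buy')", [-0.2, -0.1]),
#                        ("click('cart')", [-1.3, -0.9])])  # -> "click('buy')"
```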
191 | PrivLM-Bench: A Multi-level Privacy Evaluation Benchmark for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present PrivLM-Bench, a multi-perspective privacy evaluation benchmark to empirically and intuitively quantify the privacy leakage of LMs. |
Haoran Li; Dadi Guo; Donghao Li; Wei Fan; Qi Hu; Xin Liu; Chunkit Chan; Duanyi Yao; Yuan Yao; Yangqiu Song; |
192 | Analyzing Temporal Complex Events with Large Language Models? A Benchmark Towards Temporal, Long Context Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a novel approach using Large Language Models (LLMs) to systematically extract and analyze the event chain within TCE, characterized by their key points and timestamps. |
Zhihan Zhang; Yixin Cao; Chenchen Ye; Yunshan Ma; Lizi Liao; Tat-Seng Chua; |
193 | CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present CLAMBER to provide guidance and promote further research on proactive and trustworthy LLMs. |
Tong Zhang; Peixin Qin; Yang Deng; Chen Huang; Wenqiang Lei; Junhong Liu; Dingnan Jin; Hongru Liang; Tat-Seng Chua; |
194 | Learning Global Controller in Latent Space for Parameter-Efficient Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce an innovative parameter-efficient method for exploring optimal solutions within latent space. |
Zeqi Tan; Yongliang Shen; Xiaoxia Cheng; Chang Zong; Wenqi Zhang; Jian Shao; Weiming Lu; Yueting Zhuang; |
195 | Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Conversely, we develop a method that avoids external resources, relying instead on introducing perturbations to the input. |
Changyu Chen; Xiting Wang; Ting-En Lin; Ang Lv; Yuchuan Wu; Xin Gao; Ji-Rong Wen; Rui Yan; Yongbin Li; |
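A minimal sketch of the perturbation follows, assuming tokenized reasoning steps; the mask symbol and ratio are illustrative, not the paper's settings:

```python
import random

def mask_reasoning(tokens, mask="<mask>", ratio=0.2, seed=0):
    """Randomly mask a fraction of reasoning tokens in a training target."""
    rng = random.Random(seed)
    return [mask if rng.random() < ratio else t for t in tokens]
```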
196 | ReConcile: Round-Table Conference Improves Reasoning Via Consensus Among Diverse LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by the society of minds (Minsky, 1988), we propose ReConcile, a multi-model multi-agent framework designed as a round table conference among diverse LLM agents. |
Justin Chen; Swarnadeep Saha; Mohit Bansal; |
197 | OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the Open Whisper-style Speech Model (OWSM) project, we propose OWSM-CTC, a novel encoder-only speech foundation model based on Connectionist Temporal Classification (CTC). |
Yifan Peng; Yui Sudo; Muhammad Shakeel; Shinji Watanabe; |
198 | Steering Llama 2 Via Contrastive Activation Addition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Contrastive Activation Addition (CAA), a method for steering language models by modifying their activations during forward passes. |
Nina Rimsky; Nick Gabrieli; Julian Schulz; Meg Tong; Evan Hubinger; Alexander Turner; |
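A minimal sketch of the two steps, assuming residual-stream activations already collected at one layer for paired positive/negative prompts; `coef` is an assumed name for the steering coefficient:

```python
import torch

def caa_vector(pos_acts, neg_acts):
    """Mean activation difference over contrastive pairs ([n_pairs, d])."""
    return (pos_acts - neg_acts).mean(dim=0)

def steer(hidden, vector, coef=1.0):
    # Added to the residual stream during the forward pass.
    return hidden + coef * vector
```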
199 | Your Transformer Is Secretly Linear Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper reveals a novel linear characteristic exclusive to transformer decoders, including models like GPT, LLaMA, OPT, BLOOM and others. |
Anton Razzhigaev; Matvey Mikhalchuk; Elizaveta Goncharova; Nikolai Gerasimenko; Ivan Oseledets; Denis Dimitrov; Andrey Kuznetsov; |
200 | Surgical Feature-Space Decomposition of LLMs: Why, When and How? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we empirically study the efficacy of weight and feature space decomposition in transformer-based LLMs. |
Arnav Chavan; Nahush Lele; Deepak Gupta; |
201 | KnowledgeFMath: A Knowledge-Intensive Math Reasoning Dataset in Finance Domains Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce KnowledgeFMath, a novel benchmark designed to evaluate LLMs' capabilities in solving knowledge-intensive math reasoning problems. |
Yilun Zhao; Hongjun Liu; Yitao Long; Rui Zhang; Chen Zhao; Arman Cohan; |
202 | DocMath-Eval: Evaluating Math Reasoning Capabilities of LLMs in Understanding Financial Documents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces DocMath-Eval, a comprehensive benchmark specifically designed to evaluate the numerical reasoning capabilities of LLMs in the context of understanding and analyzing financial documents containing both text and tables. |
Yilun Zhao; Yitao Long; Hongjun Liu; Ryo Kamoi; Linyong Nan; Lyuhao Chen; Yixin Liu; Xiangru Tang; Rui Zhang; Arman Cohan; |
203 | Detoxifying Large Language Models Via Knowledge Editing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper investigates using knowledge editing techniques to detoxify Large Language Models (LLMs). |
Mengru Wang; Ningyu Zhang; Ziwen Xu; Zekun Xi; Shumin Deng; Yunzhi Yao; Qishen Zhang; Linyi Yang; Jindong Wang; Huajun Chen; |
204 | Unveiling Linguistic Regions in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: From the perspective of region partitioning, this paper conducts several investigations on the linguistic competence of LLMs. |
Zhihao Zhang; Jun Zhao; Qi Zhang; Tao Gui; Xuanjing Huang; |
205 | Harder Task Needs More Experts: Dynamic Routing in MoE Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel dynamic expert selection framework for Mixture of Experts (MoE) models, aiming to enhance computational efficiency and model performance by adjusting the number of activated experts based on input difficulty. |
Quzhe Huang; Zhenwei An; Nan Zhuang; Mingxu Tao; Chen Zhang; Yang Jin; Kun Xu; Kun Xu; Liwei Chen; Songfang Huang; Yansong Feng; |
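One way to realize difficulty-dependent expert counts is top-p routing over the router distribution, as in this minimal sketch (1-D router logits; the threshold name `p` is an assumption, not necessarily the paper's notation):

```python
import torch

def dynamic_topp_experts(router_logits, p=0.5):
    """Activate the smallest expert set whose routing mass exceeds p:
    confident (easy) inputs use few experts, uncertain (hard) ones more."""
    probs = torch.softmax(router_logits, dim=-1)
    sorted_p, idx = probs.sort(descending=True)
    k = int((sorted_p.cumsum(-1) < p).sum()) + 1
    return idx[:k], sorted_p[:k] / sorted_p[:k].sum()
```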
206 | Enhancing Contrastive Learning with Noise-Guided Attack: Towards Continual Relation Extraction in The Wild Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recognizing the prevalence of noisy labels in real-world datasets, we introduce a more practical learning scenario, termed noisy-CRE. In response to this challenge, we propose a noise-resistant contrastive framework called Noise-guided Attack in Contrastive Learning (NaCL), aimed at learning incremental corrupted relations. |
Ting Wu; Jingyi Liu; Rui Zheng; Tao Gui; Qi Zhang; Xuanjing Huang; |
207 | EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce EXAMS-V, a new challenging multi-discipline multimodal multilingual exam benchmark for evaluating vision language models. |
Rocktim Das; Simeon Hristov; Haonan Li; Dimitar Dimitrov; Ivan Koychev; Preslav Nakov; |
208 | PsychoGAT: A Novel Psychological Measurement Paradigm Through Interactive Fiction Games with LLM Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose PsychoGAT (Psychological Game AgenTs) to achieve a generic gamification of psychological assessment. |
Qisen Yang; Zekun Wang; Honghui Chen; Shenzhi Wang; Yifan Pu; Xin Gao; Wenhao Huang; Shiji Song; Gao Huang; |
209 | Label-Efficient Model Selection for Text Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce DiffUse, an efficient method to make an informed decision between candidate text generation models based on preference annotations. |
Shir Ashury Tahan; Ariel Gera; Benjamin Sznajder; Leshem Choshen; Liat Ein-Dor; Eyal Shnarch; |
210 | A Multi-Task Embedder For Retrieval Augmented LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose LLM-Embedder for the unified support of diverse retrieval augmentation scenarios. |
Peitian Zhang; Zheng Liu; Shitao Xiao; Zhicheng Dou; Jian-Yun Nie; |
211 | Aligning Large Language Models with Human Preferences Through Representation Engineering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Nevertheless, RLHF is susceptible to instability during fine-tuning and presents challenges in implementation. Drawing inspiration from the emerging field of representation engineering (RepE), this study aims to identify relevant representations for high-level human preferences embedded in patterns of activity within an LLM and achieve precise control of model behavior by transforming its representations. |
Wenhao Liu; Xiaohua Wang; Muling Wu; Tianlong Li; Changze Lv; Zixuan Ling; Zhu JianHao; Cenyuan Zhang; Xiaoqing Zheng; Xuanjing Huang; |
212 | Chain-of-Exemplar: Enhancing Distractor Generation for Multimodal Educational Question Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we study the problem of multimodal educational question generation, which aims at generating subject-specific educational questions with plausible yet incorrect distractors based on multimodal educational content. |
Haohao Luo; Yang Deng; Ying Shen; See-Kiong Ng; Tat-Seng Chua; |
213 | PRewrite: Prompt Rewriting with Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose PRewrite, an automated method to rewrite an under-optimized prompt to a more effective prompt. |
Weize Kong; Spurthi Hombaiah; Mingyang Zhang; Qiaozhu Mei; Michael Bendersky; |
214 | CLOMO: Counterfactual Logical Modification with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we delve into the realm of counterfactual reasoning capabilities of large language models (LLMs). |
Yinya Huang; Ruixin Hong; Hongming Zhang; Wei Shao; Zhicheng Yang; Dong Yu; Changshui Zhang; Xiaodan Liang; Linqi Song; |
215 | Document-Level Machine Translation with Large-Scale Public Parallel Corpora Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We release a large-scale open parallel corpus with document context extracted from ParaCrawl in five language pairs, along with code to compile document-level datasets for any language pair supported by ParaCrawl. We train context-aware models on these datasets and find improvements in terms of overall translation quality and targeted document-level phenomena. |
Proyag Pal; Alexandra Birch; Kenneth Heafield; |
216 | From Sights to Insights: Towards Summarization of Multimodal Clinical Documents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, patients often include images of their medical conditions in clinical documents. To effectively summarize these multimodal documents, we introduce EDI-Summ, an innovative Image-Guided Encoder-Decoder Model. |
Akash Ghosh; Mohit Tomar; Abhisek Tiwari; Sriparna Saha; Jatin Salve; Setu Sinha; |
217 | FOFO: A Benchmark to Evaluate LLMs' Format-Following Capability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents FoFo, a pioneering benchmark for evaluating large language models' (LLMs') ability to follow complex, domain-specific formats, a crucial yet under-examined capability for their application as AI agents. |
Congying Xia; Chen Xing; Jiangshu Du; Xinyi Yang; Yihao Feng; Ran Xu; Wenpeng Yin; Caiming Xiong; |
218 | PlatoLM: Teaching LLMs in Multi-Round Dialogue Via A User Simulator Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, due to challenges in gathering dialogues involving human participation, current endeavors like Baize and UltraChat rely on ChatGPT conducting roleplay to simulate humans based on instructions, resulting in overdependence on seeds, diminished human-likeness, limited topic diversity, and an absence of genuine multi-round conversational dynamics. To address the above issues, we propose a paradigm to simulate human behavior better and explore the benefits of incorporating more human-like questions in multi-turn conversations. |
Chuyi Kong; Yaxin Fan; Xiang Wan; Feng Jiang; Benyou Wang; |
219 | WebVoyager: Building An End-to-End Web Agent with Large Multimodal Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing web agents typically only handle one input modality and are evaluated only in simplified web simulators or static web snapshots, greatly limiting their applicability in real-world scenarios. To bridge this gap, we introduce WebVoyager, an innovative Large Multimodal Model (LMM) powered web agent that can complete user instructions end-to-end by interacting with real-world websites. |
Hongliang He; Wenlin Yao; Kaixin Ma; Wenhao Yu; Yong Dai; Hongming Zhang; Zhenzhong Lan; Dong Yu; |
220 | Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we focus on enabling LLMs to listen to the speaking styles and respond properly. |
Guan-Ting Lin; Cheng-Han Chiang; Hung-yi Lee; |
221 | Self-Modifying State Modeling for Simultaneous Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Besides, building decision paths requires unidirectional encoders to simulate streaming source inputs, which impairs the translation quality of SiMT models. To solve these issues, we propose Self-Modifying State Modeling (SM2), a novel training paradigm for SiMT task. |
Donglei Yu; Xiaomian Kang; Yuchen Liu; Yu Zhou; Chengqing Zong; |
222 | Learning to Decode Collaboratively with Multiple Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a method to teach multiple large language models (LLM) to collaborate by interleaving their generations at the token level. |
Zejiang Shen; Hunter Lang; Bailin Wang; Yoon Kim; David Sontag; |
223 | Ask Again, Then Fail: Large Language Models' Vacillations in Judgment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This wavering presents a significant challenge for generating reliable responses and building user trust. To comprehensively assess this issue, we introduce a Follow-up Questioning Mechanism along with two metrics to quantify this inconsistency, confirming its widespread presence in current large language models. |
Qiming Xie; Zengzhi Wang; Yi Feng; Rui Xia; |
224 | Dodo: Dynamic Contextual Compression for Decoder-only LMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Dodo, a solution for context compression. |
Guanghui Qin; Corby Rosset; Ethan Chau; Nikhil Rao; Benjamin Van Durme; |
225 | When Is Tree Search Useful for LLM Planning? It Depends on The Discriminator Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we examine how large language models (LLMs) solve multi-step problems under a language agent framework with three components: a generator, a discriminator, and a planning method. |
Ziru Chen; Michael White; Ray Mooney; Ali Payani; Yu Su; Huan Sun; |
226 | LLM Knows Body Language, Too: Translating Speech Voices Into Human Gestures Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the remarkable achievements of recent research, the generation process frequently includes unintended, meaningless, or non-realistic gestures. To address this challenge, we propose a gesture translation paradigm, GesTran, which leverages large language models (LLMs) to deepen the understanding of the connection between speech and gesture and sequentially generates human gestures by interpreting gestures as a unique form of body language. |
Chenghao Xu; Guangtao Lyu; Jiexi Yan; Muli Yang; Cheng Deng; |
227 | Generative Cross-Modal Retrieval: Memorizing Images in Multimodal Language Models for Retrieval and Beyond Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Achieving this target presents notable challenges, including inbuilt visual memory and visual recall schemes within MLLMs. To address these challenges, we introduce a generative cross-modal retrieval framework, which assigns unique identifier strings to represent images and involves two training steps: learning to memorize and learning to retrieve. |
Yongqi Li; Wenjie Wang; Leigang Qu; Liqiang Nie; Wenjie Li; Tat-Seng Chua; |
228 | When Good and Reproducible Results Are A Giant with Feet of Clay: The Importance of Software Quality in NLP Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As countermeasures, we release pangoliNN, a library dedicated to testing neural models, and propose a Code-quality Checklist, with the goal of promoting coding best practices and improving software quality within the NLP community. |
Sara Papi; Marco Gaido; Andrea Pilzer; Matteo Negri; |
229 | A Joint Coreference-Aware Approach to Document-Level Target Sentiment Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we first manually annotate the coreferential opinion targets and propose a multi-task learning framework to jointly model the DTSA task and the coreference resolution task. |
Hongjie Cai; Heqing Ma; Jianfei Yu; Rui Xia; |
230 | Reflect-RL: Two-Player Online RL Fine-Tuning for LMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Reflect-RL, a two-player system to fine-tune an LM using SFT and online RL, where a frozen reflection model (one player) assists the policy model (the other player). |
Runlong Zhou; Simon Du; Beibin Li; |
231 | T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a two-stage sign language production (SLP) paradigm that first encodes sign language sequences into discrete codes and then autoregressively generates sign language from text based on the learned codebook. |
Aoxiong Yin; Haoyuan Li; Kai Shen; Siliang Tang; Yueting Zhuang; |
232 | Do Large Language Models Latently Perform Multi-Hop Reasoning? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study whether Large Language Models (LLMs) latently perform multi-hop reasoning with complex prompts such as "The mother of the singer of 'Superstition' is". |
Sohee Yang; Elena Gribovskaya; Nora Kassner; Mor Geva; Sebastian Riedel; |
233 | MAVEN-ARG: Completing The Puzzle of All-in-One Event Understanding Dataset with Event Argument Annotation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce MAVEN-Arg, which augments MAVEN datasets with event argument annotations, making the first all-in-one dataset supporting event detection, event argument extraction (EAE), and event relation extraction. |
Xiaozhi Wang; Hao Peng; Yong Guan; Kaisheng Zeng; Jianhui Chen; Lei Hou; Xu Han; Yankai Lin; Zhiyuan Liu; Ruobing Xie; Jie Zhou; Juanzi Li; |
234 | Learning to Plan and Generate Text with Citations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose two attribution models that utilize different variants of blueprints: an abstractive model, where questions are generated from scratch, and an extractive model, where questions are copied from the input. |
Constanza Fierro; Reinald Kim Amplayo; Fantine Huot; Nicola De Cao; Joshua Maynez; Shashi Narayan; Mirella Lapata; |
235 | Exploring Memorization in Fine-tuned Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we conduct the first comprehensive analysis to explore language models' (LMs) memorization during fine-tuning across tasks. |
Shenglai Zeng; Yaxin Li; Jie Ren; Yiding Liu; Han Xu; Pengfei He; Yue Xing; Shuaiqiang Wang; Jiliang Tang; Dawei Yin; |
236 | Why Are Sensitive Functions Hard for Transformers? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We prove that, under the transformer architecture, the loss landscape is constrained by the input-space sensitivity: Transformers whose output is sensitive to many parts of the input string inhabit isolated points in parameter space, leading to a low-sensitivity bias in generalization. |
Michael Hahn; Mark Rofin; |
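For orientation, the central quantity in this result is the (average) sensitivity of a function to single-position input changes. A standard textbook definition is reproduced below in our own notation, which may differ from the paper's exact formalization.

```latex
% Average sensitivity of f over uniform inputs x in {0,1}^n, where x^{(i)}
% flips bit i: the expected number of single-bit flips that change f(x).
% PARITY attains the maximum n; a "low-sensitivity bias" favors small as(f).
\[
  \mathrm{as}(f) \;=\; \mathbb{E}_{x \sim \{0,1\}^n}
  \Big[\, \big|\{\, i \in [n] : f(x^{(i)}) \neq f(x) \,\}\big| \,\Big]
\]
```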
237 | Learning Disentangled Semantic Spaces of Explanations Via Invertible Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on a more general form of sentence disentanglement, targeting the localised modification and control of more general sentence semantic features. |
Yingji Zhang; Danilo Carvalho; Andre Freitas; |
238 | Inducing Systematicity in Transformers By Attending to Structurally Quantized Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We observe that when the training set is sufficiently complex, the model encodes structurally equivalent sentences using a systematic attention pattern. Inspired by this observation, we propose SQ-Transformer (Structurally Quantized) that explicitly encourages systematicity in the embeddings and attention layers even with low-complexity data. |
Yichen Jiang; Xiang Zhou; Mohit Bansal; |
239 | WaterBench: Towards Holistic Evaluation of Watermarks for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce WaterBench, the first comprehensive benchmark for LLM watermarks, in which we design three crucial factors: (1) For benchmarking procedure, to ensure an apples-to-apples comparison, we first adjust each watermarking method's hyper-parameter to reach the same watermarking strength, then jointly evaluate their generation and detection performance. |
Shangqing Tu; Yuliang Sun; Yushi Bai; Jifan Yu; Lei Hou; Juanzi Li; |
240 | Jailbreak Open-Sourced Large Language Models Via Enforced Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A natural question is "could alignment really prevent those open-sourced large language models from being misused to generate undesired content?". In this work, we provide a negative answer to this question. |
Hangfan Zhang; Zhimeng Guo; Huaisheng Zhu; Bochuan Cao; Lu Lin; Jinyuan Jia; Jinghui Chen; Dinghao Wu; |
241 | Latxa: An Open Language Model and Evaluation Suite for Basque Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Latxa, a family of large language models for Basque ranging from 7 to 70 billion parameters. |
Julen Etxaniz; Oscar Sainz; Naiara Miguel; Itziar Aldabe; German Rigau; Eneko Agirre; Aitor Ormazabal; Mikel Artetxe; Aitor Soroa; |
242 | ArtPrompt: ASCII Art-based Jailbreak Attacks Against Aligned LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel ASCII art-based jailbreak attack and introduce a comprehensive benchmark Vision-in-Text Challenge (ViTC) to evaluate the capabilities of LLMs in recognizing prompts that cannot be solely interpreted by semantics. |
Fengqing Jiang; Zhangchen Xu; Luyao Niu; Zhen Xiang; Bhaskar Ramasubramanian; Bo Li; Radha Poovendran; |
243 | Confidence Is Not Timeless: Modeling Temporal Validity for Rule-based Temporal Knowledge Graph Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, inaccurate and heuristic confidence estimation limits the performance of rule-based methods. To alleviate such issues, we propose a framework named TempValid to explicitly model the temporal validity of rules for TKGF. |
Rikui Huang; Wei Wei; Xiaoye Qu; Shengzhe Zhang; Dangyang Chen; Yu Cheng; |
244 | KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce KIEval, a Knowledge-grounded Interactive Evaluation framework, which incorporates an LLM-powered "interactor" role for the first time to accomplish a dynamic contamination-resilient evaluation. |
Zhuohao Yu; Chang Gao; Wenjin Yao; Yidong Wang; Wei Ye; Jindong Wang; Xing Xie; Yue Zhang; Shikun Zhang; |
245 | AFaCTA: Assisting The Annotation of Factual Claim Detection with Reliable LLM Annotators Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, factual claim detection, the first step in a fact-checking pipeline, suffers from two key issues that limit its scalability and generalizability: (1) inconsistency in definitions of the task and what a claim is, and (2) the high cost of manual annotation. To address (1), we review the definitions in related work and propose a unifying definition of factual claims that focuses on verifiability. To address (2), we introduce AFaCTA (Automatic Factual Claim deTection Annotator), a novel framework that assists in the annotation of factual claims with the help of large language models (LLMs). |
Jingwei Ni; Minjing Shi; Dominik Stammbach; Mrinmaya Sachan; Elliott Ash; Markus Leippold; |
246 | InstructProtein: Aligning Human and Protein Language Via Knowledge Instruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Herein, we introduce a knowledge graph-based instruction generation framework to construct a high-quality instruction dataset, addressing the annotation imbalance and the absence of instructional signals in the existing protein-text corpus. |
Zeyuan Wang; Qiang Zhang; Keyan Ding; Ming Qin; Xiang Zhuang; Xiaotong Li; Huajun Chen; |
247 | Never Lost in The Middle: Mastering Long-Context Question Answering with Position-Agnostic Decompositional Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The "lost in the middle" problem challenges most LLMs: it refers to the dramatic decline in accuracy when the correct information is located in the middle of a long context. To overcome this crucial issue, this paper proposes to enhance the information searching and reflection ability of LLMs in long contexts via specially designed tasks called Position-Agnostic Multi-step QA (PAM QA). |
Junqing He; Kunhao Pan; Xiaoqun Dong; Zhuoyang Song; LiuYiBo LiuYiBo; Qianguosun Qianguosun; Yuxin Liang; Hao Wang; Enming Zhang; Jiaxing Zhang; |
248 | LLMs in The Imaginarium: Tool Learning Through Simulated Trial and Error Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a biologically inspired method for tool-augmented LLMs, simulated trial and error (STE), that orchestrates three key mechanisms for successful tool use behaviors in the biological system: trial and error, imagination, and memory. |
Boshi Wang; Hao Fang; Jason Eisner; Benjamin Van Durme; Yu Su; |
249 | Using Synchronic Definitions and Semantic Relations to Classify Semantic Change Types Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: There is abundant evidence that the ways words change their meaning can be classified into different types of change, highlighting the relationship between the old and new meanings (among them generalisation, specialisation and co-hyponymy transfer). In this paper, we present a way of detecting these types of change by constructing a model that leverages information from both synchronic lexical relations and definitions of word meanings. |
Pierluigi Cassotti; Stefano De Pascale; Nina Tahmasebi; |
250 | BitDistiller: Unleashing The Potential of Sub-4-Bit LLMs Via Self-Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces BitDistiller, a framework that synergizes Quantization-Aware Training (QAT) with Knowledge Distillation (KD) to boost the performance of LLMs at ultra-low precisions (sub-4-bit). |
DaYou Du; Yijia Zhang; Shijie Cao; Jiaqi Guo; Ting Cao; Xiaowen Chu; Ningyi Xu; |
251 | Greed Is All You Need: An Evaluation of Tokenizer Inference Methods Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We provide a controlled analysis of seven tokenizer inference methods across four different algorithms and three vocabulary sizes, performed on a novel intrinsic evaluation suite we curated for English, combining measures rooted in morphology, cognition, and information theory. |
Omri Uzan; Craig W. Schmidt; Chris Tanner; Yuval Pinter; |
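Since this entry centers on tokenizer inference methods, a minimal sketch of one common greedy variant, longest-prefix matching over a fixed vocabulary, may help fix ideas. The toy vocabulary is hypothetical; the paper compares several greedy and non-greedy inference methods across real BPE, WordPiece, and UnigramLM vocabularies.

```python
# Minimal sketch of greedy longest-prefix-match tokenization over a fixed
# vocabulary -- one common "greedy" inference method. The vocabulary here
# is a toy assumption; real tokenizers supply their own vocab/merge files.
def greedy_tokenize(word: str, vocab: set) -> list:
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest candidate first
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:                               # no match: emit one character
            tokens.append(word[i])
            i += 1
    return tokens

vocab = {"un", "happi", "happy", "ness"}
print(greedy_tokenize("unhappiness", vocab))  # ['un', 'happi', 'ness']
```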
252 | Unified Hallucination Detection for Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a novel meta-evaluation benchmark, MHaluBench, meticulously crafted to facilitate the evaluation of advancements in hallucination detection methods. |
Xiang Chen; Chenxi Wang; Yida Xue; Ningyu Zhang; Xiaoyan Yang; Qiang Li; Yue Shen; Lei Liang; Jinjie Gu; Huajun Chen; |
253 | Emergent Word Order Universals from Cognitively-Motivated Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study word-order universals through a computational simulation with language models (LMs). |
Tatsuki Kuribayashi; Ryo Ueda; Ryo Yoshida; Yohei Oseki; Ted Briscoe; Timothy Baldwin; |
254 | Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning Over Image Sequences Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, current MLLM benchmarks are predominantly designed to evaluate reasoning based on static information about a single image, and the ability of modern MLLMs to extrapolate from image sequences, which is essential for understanding our ever-changing world, has been less investigated. To address this challenge, this paper introduces Mementos, a new benchmark designed to assess MLLMs' sequential image reasoning abilities. |
Xiyao Wang; Yuhang Zhou; Xiaoyu Liu; Hongjin Lu; Yuancheng Xu; Feihong He; Jaehong Yoon; Taixi Lu; Fuxiao Liu; Gedas Bertasius; Mohit Bansal; Huaxiu Yao; Furong Huang; |
255 | Visualization Recommendation with Prompt-based Reprogramming of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce a novel Hierarchical Table Prompt-based reprogramming framework, named HTP. |
Xinhang Li; Jingbo Zhou; Wei Chen; Derong Xu; Tong Xu; Enhong Chen; |
256 | Modality-Aware Integration with Large Language Models for Knowledge-Based Visual Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle these, we present a novel modality-aware integration with LLMs for KVQA (MAIL). |
Junnan Dong; Qinggang Zhang; Huachi Zhou; Daochen Zha; Pai Zheng; Xiao Huang; |
257 | IMBUE: Improving Interpersonal Effectiveness Through Simulation and Just-in-time Feedback with Human-Language Model Interaction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we conduct a human-centered study that uses language models to simulate bespoke communication training and provide just-in-time feedback to support the practice and learning of interpersonal effectiveness skills. |
Inna Lin; Ashish Sharma; Christopher Rytting; Adam Miner; Jina Suh; Tim Althoff; |
258 | Persuading Across Diverse Domains: A Dataset and Persuasion Large Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we leverage GPT-4 to create the first multi-domain persuasive dialogue dataset DailyPersuasion. |
Chuhao Jin; Kening Ren; Lingzhen Kong; Xiting Wang; Ruihua Song; Huan Chen; |
259 | CopyNE: Better Contextual ASR By Copying Named Entities Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we treat entities as indivisible wholes and introduce the idea of copying into ASR. |
Shilin Zhou; Zhenghua Li; Yu Hong; Min Zhang; Zhefeng Wang; Baoxing Huai; |
260 | SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The paper introduces SceMQA, a novel benchmark for scientific multimodal question answering at the college entrance level. |
Zhenwen Liang; Kehan Guo; Gang Liu; Taicheng Guo; Yujun Zhou; Tianyu Yang; Jiajun Jiao; Renjie Pi; Jipeng Zhang; Xiangliang Zhang; |
261 | ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose ChiMed-GPT, a new benchmark LLM designed explicitly for the Chinese medical domain, which undergoes a comprehensive training regime with pre-training, SFT, and RLHF. |
Yuanhe Tian; Ruyi Gan; Yan Song; Jiaxing Zhang; Yongdong Zhang; |
262 | Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We discovered that this contrary effect is due to the LLM's bias in evaluating its own output. In this paper, we formally define LLM self-bias (the tendency to favor its own generation) using two statistics. |
Wenda Xu; Guanglei Zhu; Xuandong Zhao; Liangming Pan; Lei Li; William Wang; |
263 | Explanation-aware Soft Ensemble Empowers Large Language Model In-context Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To fully unleash the power of explanations, we propose EASE, an Explanation-Aware Soft Ensemble framework to empower in-context learning with LLMs. |
Yue Yu; Jiaming Shen; Tianqi Liu; Zhen Qin; Jing Nathan Yan; Jialu Liu; Chao Zhang; Michael Bendersky; |
264 | Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although dominant in natural language processing, transformer-based models still struggle with long-sequence processing, due to the computational costs of their self-attention operations, which increase quadratically as the length of the input sequence grows. To address this challenge, we propose a **Sim**ple framework to enhance the long-content processing of off-the-shelf pre-trained transformers via three steps: **C**hunk, **A**lign, and **S**elect (SimCAS). |
Jiawen Xie; Pengyu Cheng; Xiao Liang; Yong Dai; Nan Du; |
265 | Self-Augmented In-Context Learning for Unsupervised Word Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recent work has shown that, while large language models (LLMs) demonstrate strong word translation or bilingual lexicon induction (BLI) capabilities in few-shot setups, they still cannot match the performance of "traditional" mapping-based approaches in the unsupervised scenario where no seed translation pairs are available, especially for lower-resource languages. To address this challenge with LLMs, we propose self-augmented in-context learning (SAIL) for unsupervised BLI: starting from a zero-shot prompt, SAIL iteratively induces a set of high-confidence word translation pairs for in-context learning (ICL) from an LLM, which it then reapplies to the same LLM in the ICL fashion. |
Yaoyiran Li; Anna Korhonen; Ivan Vulic; |
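The iterative loop described in this highlight is easy to sketch. The version below assumes a hypothetical `llm_translate` callable that returns a translation and a confidence for a source word given the current in-context exemplars; it is a schematic of the self-augmentation cycle, not the authors' prompting setup.

```python
# Minimal sketch of the SAIL loop: start zero-shot, harvest high-confidence
# word-translation pairs, and reuse them as in-context exemplars next round.
# `llm_translate(word, exemplars) -> (translation, confidence)` is a
# hypothetical stand-in for a real prompted LLM call.
def sail(source_words, llm_translate, rounds=3, threshold=0.5):
    exemplars = []  # (source, target) pairs forming the ICL prompt
    for _ in range(rounds):
        scored = [(w, *llm_translate(w, exemplars)) for w in source_words]
        exemplars = [(w, t) for (w, t, conf) in scored if conf >= threshold]
    return {w: t for (w, t, _) in scored}

# Toy stand-in: confidence grows as more exemplars become available.
def toy_llm(word, exemplars):
    return word.upper(), min(1.0, 0.5 + 0.2 * len(exemplars))

print(sail(["haus", "katze"], toy_llm))  # {'haus': 'HAUS', 'katze': 'KATZE'}
```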
266 | LANDeRMT: Detecting and Routing Language-Aware Neurons for Selectively Finetuning LLMs to Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The major challenges are catastrophic forgetting and parameter interference for finetuning LLMs when provided with parallel training data. To address these challenges, we propose LANDeRMT, a Language-Aware Neuron Detecting and Routing framework that selectively finetunes LLMs to Machine Translation with diverse translation training data. |
Shaolin Zhu; Leiyu Pan; Bo Li; Deyi Xiong; |
267 | Code-Switching Can Be Better Aligners: Advancing Cross-Lingual SLU Through Representation-Level and Prediction-Level Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework dubbed REPE (short for Representation-Level and Prediction-Level Alignment), which leverages both code-switched and original sentences to achieve multi-level alignment. |
Zhihong Zhu; Xuxin Cheng; Zhanpeng Chen; Xianwei Zhuang; Zhiqi Huang; Yuexian Zou; |
268 | Think Twice: Perspective-Taking Improves Large Language Models' Theory-of-Mind Capabilities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent improvements to LLMs' reasoning capabilities from simple yet effective prompting techniques such as Chain-of-Thought (CoT) have seen limited applicability to ToM. In this paper, we turn to the prominent cognitive science theory "Simulation Theory" to bridge this gap. |
Alex Wilf; Sihyun Lee; Paul Pu Liang; Louis-Philippe Morency; |
269 | PCAD: Towards ASR-Robust Spoken Language Understanding Via Prototype Calibration and Asymmetric Decoupling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework termed PCAD, which can calibrate bias and errors and achieve adaptive-balanced decoupling training. |
Xianwei Zhuang; Xuxin Cheng; Liming Liang; Yuxin Xie; Zhichang Wang; Zhiqi Huang; Yuexian Zou; |
270 | An Expert Is Worth One Token: Synergizing Multiple Expert LLMs As Generalist Via Expert Token Routing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Expert-Token-Routing, a unified generalist framework that facilitates seamless integration of multiple expert LLMs. |
Ziwei Chai; Guoyin Wang; Jing Su; Tianjie Zhang; Xuanwen Huang; Xuwu Wang; Jingjing Xu; Jianbo Yuan; Hongxia Yang; Fei Wu; Yang Yang; |
271 | Towards Privacy-Aware Sign Language Translation at Scale Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Furthermore, scaling SLT using large-scale web-scraped datasets bears privacy risks due to the presence of biometric information, which the responsible development of SLT technologies should account for. In this work, we propose a two-stage framework for privacy-aware SLT at scale that addresses both of these issues. |
Phillip Rust; Bowen Shi; Skyler Wang; Necati Cihan Camgoz; Jean Maillard; |
272 | Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Injecting a collection of symbolic data directly into the training of LLMs can be problematic, as it disregards the synergies among different symbolic families and overlooks the need for a balanced mixture of natural and symbolic data. In this work, we tackle these challenges from both a data and framework perspective and introduce Symbol-LLM series models. |
Fangzhi Xu; Zhiyong Wu; Qiushi Sun; Siyu Ren; Fei Yuan; Shuai Yuan; Qika Lin; Yu Qiao; Jun Liu; |
273 | PathReasoner: Modeling Reasoning Path with Equivalent Extension for Logical Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we model the logical reasoning task by transforming each logical sample into reasoning paths and propose an architecture PathReasoner. |
Fangzhi Xu; Qika Lin; Tianzhe Zhao; JiaweiHan JiaweiHan; Jun Liu; |
274 | Uncertainty Aware Learning for Language Model Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, we propose uncertainty-aware learning (UAL) to improve the model alignment of different task scenarios, by introducing the sample uncertainty (elicited from more capable LLMs). |
Yikun Wang; Rui Zheng; Liang Ding; Qi Zhang; Dahua Lin; Dacheng Tao; |
275 | An Investigation of Neuron Activation As A Unified Lens to Explain Chain-of-Thought Eliciting Arithmetic Reasoning of LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Yet, the reason why these components are important to LLM reasoning is not explored. To fill this gap, in this work, we investigate �neuron activation� as a lens to provide a unified explanation to observations made by prior work. |
Daking Rai; Ziyu Yao; |
276 | Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To reduce the filtering cost, we study Superfiltering: Can we use a smaller and weaker model to select data for finetuning a larger and stronger model? |
Ming Li; Yong Zhang; Shwai He; Zhitao Li; Hongyu Zhao; Jianzong Wang; Ning Cheng; Tianyi Zhou; |
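As a rough illustration of weak-to-strong filtering, the sketch below scores each (instruction, response) pair with a small model and keeps only the top fraction for fine-tuning the large model. The IFD-style ratio and the `weak_lm_logppl` scorer are assumptions for illustration; the paper's exact scoring and model choices differ.

```python
# Minimal sketch of weak-to-strong data filtering: a small, cheap LM scores
# each (instruction, response) pair, and only the highest-scoring fraction
# is kept to fine-tune the large model. The IFD-style ratio (conditioned vs.
# unconditioned response perplexity) and `weak_lm_logppl` are stand-ins.
def ifd_score(weak_lm_logppl, instruction: str, response: str) -> float:
    # Higher ratio: the instruction helps less, so the pair is harder/more informative.
    return weak_lm_logppl(response, condition=instruction) / weak_lm_logppl(response, condition=None)

def superfilter(data, weak_lm_logppl, keep_frac=0.1):
    scored = sorted(data, key=lambda ex: ifd_score(weak_lm_logppl, *ex), reverse=True)
    return scored[: max(1, int(len(scored) * keep_frac))]

# Toy scorer: conditioning lowers the "surprise" by the word overlap.
def toy_logppl(text, condition=None):
    base = float(len(text.split()))
    if condition:
        base -= len(set(text.split()) & set(condition.split()))
    return max(base, 1.0)

data = [("add 2 and 3", "2 plus 3 is 5"), ("say hi", "hi")]
print(superfilter(data, toy_logppl, keep_frac=0.5))  # [('say hi', 'hi')]
```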
277 | A Modular Approach for Multimodal Summarization of TV Shows Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we address the task of summarizing television shows, which touches key areas in AI research: complex reasoning, multiple modalities, and long narratives. |
Louis Mahon; Mirella Lapata; |
278 | Retrieval Augmented Fact Verification By Synthesizing Contrastive Arguments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose retrieval augmented fact verification through the synthesis of contrasting arguments (RAFTS). |
Zhenrui Yue; Huimin Zeng; Lanyu Shang; Yifan Liu; Yang Zhang; Dong Wang; |
279 | MultiLegalPile: A 689GB Multilingual Legal Corpus Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, so far, few datasets are available for specialized critical domains such as law and the available ones are often small and only in English. To fill this gap, we curate and release MultiLegalPile, a 689GB corpus in 24 languages from 17 jurisdictions. |
Joel Niklaus; Veton Matoshi; Matthias Stürmer; Ilias Chalkidis; Daniel Ho; |
280 | RomanSetu: Efficiently Unlocking Multilingual Capabilities of Large Language Models Via Romanization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study addresses the challenge of extending Large Language Models (LLMs) to non-English languages, specifically those using non-Roman scripts. We propose an approach that utilizes the romanized form of text as an interface for LLMs, hypothesizing that its frequent informal use and shared tokens with English enhance cross-lingual alignment. |
Jaavid J; Raj Dabre; Aswanth M; Jay Gala; Thanmay Jayakumar; Ratish Puduppully; Anoop Kunchukuttan; |
281 | TimeBench: A Comprehensive Evaluation of Temporal Reasoning Abilities in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Previous studies typically focus on specific aspects of time, lacking a comprehensive temporal reasoning benchmark. To address this, we propose TimeBench, a comprehensive hierarchical temporal reasoning benchmark that covers a broad spectrum of temporal reasoning phenomena. |
Zheng Chu; Jingchang Chen; Qianglong Chen; Weijiang Yu; Haotian Wang; Ming Liu; Bing Qin; |
282 | BeamAggR: Beam Aggregation Reasoning Over Multi-source Knowledge for Multi-hop Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, significant challenges still persist, including inaccurate and insufficient retrieval for complex questions, as well as difficulty in integrating multi-source knowledge. To address this, we propose Beam Aggregation Reasoning (BeamAggR), a reasoning framework for knowledge-intensive multi-hop QA. |
Zheng Chu; Jingchang Chen; Qianglong Chen; Haotian Wang; Kun Zhu; Xiyuan Du; Weijiang Yu; Ming Liu; Bing Qin; |
283 | Towards Faithful and Robust LLM Specialists for Evidence-Based Question-Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we systematically investigate how to robustly fine-tune LLMs for better source quality and answer attributability. |
Tobias Schimanski; Jingwei Ni; Mathias Kraus; Elliott Ash; Markus Leippold; |
284 | Improving Event Definition Following For Zero-Shot Event Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim to improve zero-shot event detection by training models to better follow event definitions. |
Zefan Cai; Po-Nien Kung; Ashima Suvarna; Mingyu Ma; Hritik Bansal; Baobao Chang; P. Jeffrey Brantingham; Wei Wang; Nanyun Peng; |
285 | CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To bridge these gaps between existing benchmarks and expectations from practical applications, we introduce **CodeScope**, an execution-based, multilingual, multitask, multidimensional evaluation benchmark for comprehensively measuring LLM capabilities on coding tasks. |
Weixiang Yan; Haitian Liu; Yunkun Wang; Yunzhe Li; Qian Chen; Wen Wang; Tingyu Lin; Weishan Zhao; Li Zhu; Hari Sundaram; Shuiguang Deng; |
286 | LogogramNLP: Comparing Visual and Textual Representations of Ancient Logographic Writing Systems for NLP Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce LogogramNLP, the first benchmark enabling NLP analysis of ancient logographic languages, featuring both transcribed and visual datasets for four writing systems, along with annotations for tasks like classification, translation, and parsing. |
Danlu Chen; Freda Shi; Aditi Agarwal; Jacobo Myerston; Taylor Berg-Kirkpatrick; |
287 | Can We Achieve High-quality Direct Speech-to-Speech Translation Without Parallel Speech Data? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the training of these models still relies on parallel speech data, which is extremely challenging to collect. In contrast, S2TT and TTS have accumulated a large amount of data and pretrained models, which have not been fully utilized in the development of S2ST models. Inspired by this, in this paper, we first introduce a composite S2ST model named ComSpeech, which can seamlessly integrate any pretrained S2TT and TTS models into a direct S2ST model. |
Qingkai Fang; Shaolei Zhang; Zhengrui Ma; Min Zhang; Yang Feng; |
288 | Dr.Academy: A Benchmark for Evaluating Questioning Capability in Education for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In teaching, questioning is a key skill that guides students to analyze, evaluate, and synthesize core concepts and principles. Therefore, our research introduces a benchmark to evaluate the questioning capability of LLMs as teachers in education, by evaluating their generated educational questions using Anderson and Krathwohl's taxonomy across general, monodisciplinary, and interdisciplinary domains. |
Yuyan Chen; Songzhou Yan; Panjun Liu; Yanghua Xiao; |
289 | Enhancing Explainable Rating Prediction Through Annotated Macro Concepts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle the problem, we propose the Concept Enhanced Explainable Recommendation framework (CEER), which utilizes macro concepts as the intermediary to bridge the gap between the user/item embeddings and the recommendation reasons. |
Huachi Zhou; Shuang Zhou; Hao Chen; Ninghao Liu; Fan Yang; Xiao Huang; |
290 | TimeArena: Shaping Efficient Multitasking Language Agents in A Time-Aware Simulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce TimeArena, a novel textual simulated environment that incorporates complex temporal dynamics and constraints that better reflect real-life planning scenarios. |
Yikai Zhang; Siyu Yuan; Caiyu Hu; Kyle Richardson; Yanghua Xiao; Jiangjie Chen; |
291 | IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To facilitate research on multilingual LLM evaluation, we release IndicGenBench, the largest benchmark for evaluating LLMs on user-facing generation tasks across a diverse set of 29 Indic languages covering 13 scripts and 4 language families. |
Harman Singh; Nitish Gupta; Shikhar Bharadwaj; Dinesh Tewari; Partha Talukdar; |
292 | Don't Rank, Combine! Combining Machine Translation Hypotheses Using Quality Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work introduces QE-fusion, a method that synthesizes translations using a quality estimation metric (QE), which correlates better with human judgments. |
Giorgos Vernikos; Andrei Popescu-Belis; |
293 | Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To explore the knowledge boundary for a given model, we propose a projected gradient descent method with semantic constraints, a new algorithm designed to identify the optimal prompt for each piece of knowledge. |
Xunjian Yin; Xu Zhang; Jie Ruan; Xiaojun Wan; |
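For readers unfamiliar with the optimization pattern named here, the sketch below shows generic projected gradient descent: a gradient step followed by projection back onto a feasible set. The L2-ball projection stands in for the paper's semantic constraints, which are not reproduced here.

```python
# Minimal sketch of projected gradient descent: descend the loss, then
# project back onto the feasible set. The radius-1 L2 ball around the
# starting point is a placeholder for the paper's semantic constraints.
import numpy as np

def pgd(x0, grad_fn, project, lr=0.1, steps=50):
    x = x0.copy()
    for _ in range(steps):
        x = x - lr * grad_fn(x)   # unconstrained gradient step
        x = project(x)            # re-enter the feasible region
    return x

x0 = np.array([2.0, 2.0])
grad = lambda x: 2 * x            # gradient of loss ||x||^2
project = lambda x: x0 + (x - x0) / max(1.0, np.linalg.norm(x - x0))
print(pgd(x0, grad, project))     # stays within distance 1 of x0
```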
294 | Rethinking The Multimodal Correlation of Multimodal Sequential Learning Via Generalizable Attentional Results Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, in the existing literature, the alignment degree between different calculated attentional results of the same query is under-explored. Based on this concern, we propose a new constrained scheme called Multimodal Contextual Contrast (MCC), which can align the multiple attentional results from both local and global perspectives, making information capture more efficient. |
Tao Jin; Wang Lin; Ye Wang; Linjun Li; Xize Cheng; Zhou Zhao; |
295 | MMToM-QA: Multimodal Theory of Mind Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: People can flexibly reason about another person's mind based on conceptual representations (e.g., goals, beliefs, plans) extracted from any available data. To address this, we introduce a multimodal Theory of Mind question answering (MMToM-QA) benchmark. |
Chuanyang Jin; Yutong Wu; Jing Cao; Jiannan Xiang; Yen-Ling Kuo; Zhiting Hu; Tomer Ullman; Antonio Torralba; Joshua Tenenbaum; Tianmin Shu; |
296 | InCharacter: Evaluating Personality Fidelity in Role-Playing Agents Through Psychological Interviews Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper, instead, introduces a novel perspective to evaluate the personality fidelity of RPAs with psychological scales. |
Xintao Wang; Yunze Xiao; Jen-tse Huang; Siyu Yuan; Rui Xu; Haoran Guo; Quan Tu; Yaying Fei; Ziang Leng; Wei Wang; Jiangjie Chen; Cheng Li; Yanghua Xiao; |
297 | Robust Singing Voice Transcription Serves Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents ROSVOT, the first robust AST model that serves SVS, incorporating a multi-scale framework that effectively captures coarse-grained note information and ensures fine-grained frame-level segmentation, coupled with an attention-based pitch decoder for reliable pitch prediction. |
Ruiqi Li; Yu Zhang; Yongqi Wang; Zhiqing Hong; Rongjie Huang; Zhou Zhao; |
298 | PreFLMR: Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present an extensive training and evaluation framework, M2KR, for KB-VQA. |
Weizhe Lin; Jingbiao Mei; Jinghong Chen; Bill Byrne; |
299 | Measuring Meaning Composition in The Human Brain with Composition Scores from Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Drawing on the key-value memory interpretation of transformer feed-forward network blocks, we introduce the Composition Score, a novel model-based metric designed to quantify the degree of meaning composition during sentence comprehension. |
Changjiang Gao; Jixing Li; Jiajun Chen; Shujian Huang; |
300 | Transferable and Efficient Non-Factual Content Detection Via Probe Training with Offline Consistency Checking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes PiNose, which trains a probing model on offline self-consistency checking results, thereby circumventing the need for human-annotated data and achieving transferability across diverse data distributions. |
Xiaokang Zhang; Zijun Yao; Jing Zhang; Kaifeng Yun; Jifan Yu; Juanzi Li; Jie Tang; |
301 | MindMap: Knowledge Graph Prompting Sparks Graph of Thoughts in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, they often suffer from limitations such as difficulty in incorporating new knowledge, generating hallucinations, and explaining their reasoning process. To address these challenges, we propose a novel prompting pipeline, named MindMap, that leverages knowledge graphs (KGs) to enhance LLMs' inference and transparency. |
Yilin Wen; Zifeng Wang; Jimeng Sun; |
302 | Speculative Contrastive Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by speculative decoding and contrastive decoding, we introduce Speculative Contrastive Decoding (SCD), a straightforward yet powerful decoding approach that leverages predictions from smaller language models (LMs) to achieve both decoding acceleration and quality improvement. |
Hongyi Yuan; Keming Lu; Fei Huang; Zheng Yuan; Chang Zhou; |
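The contrastive half of SCD can be sketched in a few lines: candidate tokens are scored by the gap between the expert's and the amateur's log-probabilities, subject to a plausibility cutoff. The acceptance and rollback bookkeeping of speculative decoding is omitted, and the alpha cutoff follows the standard contrastive-decoding recipe rather than anything specific to this paper.

```python
# Minimal sketch of contrastive token scoring: the small "amateur" LM drafts
# tokens cheaply, and candidates are (re)scored by the expert-minus-amateur
# log-prob gap. Full speculative accept/rollback machinery is not shown.
import math

def contrastive_score(expert_logp: dict, amateur_logp: dict, alpha: float = 0.1) -> dict:
    # Plausibility constraint: keep tokens within a factor alpha of the expert's best.
    cutoff = math.log(alpha) + max(expert_logp.values())
    return {
        tok: expert_logp[tok] - amateur_logp[tok]
        for tok in expert_logp
        if expert_logp[tok] >= cutoff
    }

expert = {"Paris": math.log(0.6), "France": math.log(0.3), "the": math.log(0.1)}
amateur = {"Paris": math.log(0.3), "France": math.log(0.2), "the": math.log(0.5)}
scores = contrastive_score(expert, amateur)
print(max(scores, key=scores.get))  # 'Paris'
```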
303 | Are LLMs Classical or Nonmonotonic Reasoners? Lessons from Generics Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we study the nonmonotonic reasoning capabilities of seven state-of-the-art LLMs in one abstract and one commonsense reasoning task featuring generics, such as "Birds fly", and exceptions, such as "Penguins don't fly". |
Alina Leidinger; Robert Van Rooij; Ekaterina Shutova; |
304 | Quantifying The Persona Effect in LLM Simulations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study investigates how integrating persona variables (demographic, social, and behavioral factors) impacts LLMs' ability to simulate diverse perspectives. |
Tiancheng Hu; Nigel Collier; |
305 | Navigate Through Enigmatic Labyrinth A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we systematically investigate relevant research, summarizing advanced methods through a meticulous taxonomy that offers novel perspectives. |
Zheng Chu; Jingchang Chen; Qianglong Chen; Weijiang Yu; Tao He; Haotian Wang; Weihua Peng; Ming Liu; Bing Qin; Ting Liu; |
306 | Expedited Training of Visual Conditioned Language Generation Via Redundancy Reduction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce EVLGen, a streamlined framework designed for the pre-training of visually conditioned language generation models with high computational demands, utilizing frozen pre-trained large language models (LLMs). |
Yiren Jian; Tingkai Liu; Yunzhe Tao; Chunhui Zhang; Soroush Vosoughi; Hongxia Yang; |
307 | Benchmarking and Improving Compositional Generalization of Multi-aspect Controllable Text Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We observe that existing MCTG works generally confront a noticeable performance drop in compositional testing. To mitigate this issue, we introduce Meta-MCTG, a training framework incorporating meta-learning, where we enable models to learn how to generalize by simulating compositional generalization scenarios in the training phase. |
Tianqi Zhong; Zhaoyi Li; Quan Wang; Linqi Song; Ying Wei; Defu Lian; Zhendong Mao; |
308 | Linear-time Minimum Bayes Risk Decoding with Reference Aggregation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose to approximate pairwise metric scores with scores calculated against aggregated reference representations. |
Jannis Vamvas; Rico Sennrich; |
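The complexity saving here is easy to see in code: instead of roughly n^2 pairwise metric calls among n candidates, the references' representations are aggregated once and each candidate is scored a single time. Unigram F1 below is a stand-in utility; the paper aggregates representations for metrics such as chrF and embedding-based scores.

```python
# Minimal sketch of MBR with reference aggregation: average the pseudo-
# references' n-gram counts once, then score each candidate against that
# single aggregate -- O(n) metric calls instead of O(n^2).
from collections import Counter

def aggregate(refs):
    agg = Counter()
    for r in refs:
        agg.update(r.split())
    for tok in agg:            # average counts over references
        agg[tok] /= len(refs)
    return agg

def f1_against(cand, agg):
    c = Counter(cand.split())
    overlap = sum(min(c[t], agg[t]) for t in c)
    p = overlap / max(sum(c.values()), 1e-9)
    r = overlap / max(sum(agg.values()), 1e-9)
    return 2 * p * r / max(p + r, 1e-9)

candidates = ["the cat sat", "a cat sat down", "the dog ran"]
agg = aggregate(candidates)    # samples double as pseudo-references in MBR
print(max(candidates, key=lambda c: f1_against(c, agg)))  # 'the cat sat'
```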
309 | Self-Alignment for Factuality: Mitigating Hallucinations in LLMs Via Self-Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore Self-Alignment for Factuality, where we leverage the self-evaluation capability of an LLM to provide training signals that steer the model towards factuality. |
Xiaoying Zhang; Baolin Peng; Ye Tian; Jingyan Zhou; Lifeng Jin; Linfeng Song; Haitao Mi; Helen Meng; |
310 | RAM-EHR: Retrieval Augmentation Meets Clinical Predictions on Electronic Health Records Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present RAM-EHR, a Retrieval AugMentation pipeline to improve clinical predictions on Electronic Health Records (EHRs). |
Ran Xu; Wenqi Shi; Yue Yu; Yuchen Zhuang; Bowen Jin; May Dongmei Wang; Joyce Ho; Carl Yang; |
311 | PixT3: Pixel-based Table-To-Text Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present PixT3, a multimodal table-to-text model that overcomes the challenges of linearization and input size limitations encountered by existing models. |
Iñigo Alonso; Eneko Agirre; Mirella Lapata; |
312 | DocLLM: A Layout-Aware Generative Language Model for Multimodal Document Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present DocLLM, a lightweight extension to traditional large language models (LLMs) for reasoning over visual documents, taking into account both textual semantics and spatial layout. |
Dongsheng Wang; Natraj Raman; Mathieu Sibue; Zhiqiang Ma; Petr Babkin; Simerjot Kaur; Yulong Pei; Armineh Nourbakhsh; Xiaomo Liu; |
313 | SeeGULL Multilingual: A Dataset of Geo-Culturally Situated Stereotypes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, gathering these resources, at scale, in varied languages and regions poses a significant challenge, as it requires broad socio-cultural knowledge and can be prohibitively expensive. To overcome this critical gap, we employ a recently introduced approach that couples LLM generations for scale with culturally situated validations for reliability. The result is SeeGULL Multilingual, a global-scale multilingual dataset of social stereotypes, containing over 25K stereotypes spanning 23 pairs of languages and the regions they are common in, with human annotations; we demonstrate its utility in identifying gaps in model evaluations. |
Mukul Bhutani; Kevin Robinson; Vinodkumar Prabhakaran; Shachi Dave; Sunipa Dev; |
314 | WatME: Towards Lossless Watermarking Through Lexical Redundancy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These results suggest a more profound effect of watermarking on LLMs than previously understood. To address these challenges, we introduce Watermarking with Mutual Exclusion (WatME), a novel approach leveraging linguistic prior knowledge of inherent lexical redundancy in LLM vocabularies to seamlessly integrate watermarks. |
Liang Chen; Yatao Bian; Yang Deng; Deng Cai; Shuaiyi Li; Peilin Zhao; Kam-Fai Wong; |
315 | Co-training for Low Resource Scientific Natural Language Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The automatic annotation method based on distant supervision for the training set of SciNLI, the first and most popular dataset for this task, results in label noise which inevitably degrades the performance of classifiers. In this paper, we propose a novel co-training method that assigns weights to the distantly supervised labels based on the training dynamics of the classifiers, reflecting the manner in which they are used in the subsequent training epochs. |
Mobashir Sadat; Cornelia Caragea; |
316 | Representation Learning with Conditional Information Flow Maximization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes an information-theoretic representation learning framework, named conditional information flow maximization, to extract noise-invariant sufficient representations for the input data and target task. |
Dou Hu; Lingwei Wei; Wei Zhou; Songlin Hu; |
317 | MuggleMath: Assessing The Impact of Query and Response Augmentation on Math Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate such data augmentation in math reasoning and aim to answer: (1) what strategies of data augmentation are more effective; (2) what is the scaling relationship between the amount of augmented data and model performance; and (3) can data augmentation incentivize generalization to out-of-domain mathematical reasoning tasks? To this end, we create two new datasets, AugGSM8K and AugMATH, by complicating and diversifying the queries and sampling multiple reasoning paths from GSM8K and MATH. |
Chengpeng Li; Zheng Yuan; Hongyi Yuan; Guanting Dong; Keming Lu; Jiancan Wu; Chuanqi Tan; Xiang Wang; Chang Zhou; |
318 | Synthesizing Text-to-SQL Data from Weak and Strong LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a synthetic data approach that combines data produced by larger, more powerful models (strong models) with error information data generated by smaller, not well-aligned models (weak models). |
Jiaxi Yang; Binyuan Hui; Min Yang; Jian Yang; Junyang Lin; Chang Zhou; |
319 | ANALOGYKB: Unlocking Analogical Reasoning of Language Models with A Million-scale Knowledge Base Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, current language models (LMs) still struggle to achieve human-like performance in analogical reasoning tasks due to a lack of resources for model training. In this work, we address this gap by proposing ANALOGYKB, a million-scale analogy knowledge base (KB) derived from existing knowledge graphs (KGs). |
Siyu Yuan; Jiangjie Chen; Changzhi Sun; Jiaqing Liang; Yanghua Xiao; Deqing Yang; |
320 | SportsMetrics: Blending Text and Numerical Data to Understand Information Fusion in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce four novel tasks centered around sports data analytics to evaluate the numerical reasoning and information fusion capabilities of LLMs. |
Yebowen Hu; Kaiqiang Song; Sangwoo Cho; Xiaoyang Wang; Hassan Foroosh; Dong Yu; Fei Liu; |
321 | LooGLE: Can Long-Context Language Models Understand Long Contexts? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present LooGLE, a Long Context Generic Language Evaluation benchmark. |
Jiaqi Li; Mengmeng Wang; Zilong Zheng; Muhan Zhang; |
322 | PairCFR: Enhancing Model Training on Paired Counterfactually Augmented Data Through Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We theoretically prove that contrastive loss can encourage models to leverage a broader range of features beyond those modified ones. |
Xiaoqi Qiu; Yongjie Wang; Xu Guo; Zhiwei Zeng; Yu Yue; Yuhong Feng; Chunyan Miao; |
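One way to picture the role of the contrastive term is a supervised contrastive loss over the paired data: same-label embeddings are pulled together while each original is pushed from its label-flipping counterfactual. The NumPy sketch below is a generic supervised contrastive loss under that assumption, not the paper's exact objective.

```python
# Minimal sketch of a supervised contrastive term over paired counter-
# factually augmented data: same-label embeddings attract, and each
# original repels its label-flipping counterfactual. Real models would
# supply the sentence embeddings; these vectors are toy inputs.
import numpy as np

def supcon_loss(embs: np.ndarray, labels: np.ndarray, tau: float = 0.1) -> float:
    z = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)   # exclude self-similarity
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    loss, n = 0.0, len(labels)
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if pos:
            loss += -log_prob[i, pos].mean()
    return loss / n

embs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([1, 1, 0, 0])      # originals and their counterfactuals
print(supcon_loss(embs, labels))
```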
323 | Feature-Adaptive and Data-Scalable In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a feature-adaptive and data-scalable in-context learning framework (FADS-ICL), which can leverage task-adaptive features to promote inference on the downstream task, with the supervision of beyond-context samples. |
Jiahao Li; Quan Wang; Licheng Zhang; Guoqing Jin; Zhendong Mao; |
324 | Interpretability of Language Models Via Task Spaces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present an alternative approach, concentrating on the _quality_ of LM processing, with a focus on their language abilities. |
Lucas Weber; Jaap Jumelet; Elia Bruni; Dieuwke Hupkes; |
325 | Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, to avoid relying on outputs from large models, we demonstrate that the reasoning abilities of small-scale language models can be enhanced through self-training, which involves training models with their own outputs. |
Tianduo Wang; Shichen Li; Wei Lu; |
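Since the highlight names Direct Preference Optimization, the standard DPO objective is sketched below on toy scalar log-probabilities, with the model's correct self-generated rationale as the preferred output y_w and an incorrect one as the dispreferred y_l; how the preference pairs are constructed is the paper's contribution and is not shown.

```python
# Minimal sketch of the DPO objective: prefer a correct self-generated
# chain-of-thought (y_w) over an incorrect one (y_l). Log-probs would be
# summed over tokens by real models; toy scalars stand in here.
import math

def dpo_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    margin = beta * ((pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# The loss shrinks as the policy gains probability mass on y_w relative
# to y_l, compared to the frozen reference model.
print(dpo_loss(pi_logp_w=-10.0, pi_logp_l=-12.0, ref_logp_w=-11.0, ref_logp_l=-11.0))
```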
326 | A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Any Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: These pipeline methods suffer from error propagation and accumulate delays in each cascade component, resulting in reduced synchronization between the speaker and listener. To overcome these challenges, we propose a novel non-autoregressive generation framework for simultaneous speech translation (NAST-S2x), which integrates speech-to-text and speech-to-speech tasks into a unified end-to-end framework. |
Zhengrui Ma; Qingkai Fang; Shaolei Zhang; Shoutao Guo; Yang Feng; Min Zhang; |
327 | UHGEval: Benchmarking The Hallucination of Chinese Large Language Models Via Unconstrained Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Concurrently, we have established a comprehensive benchmark evaluation framework to aid subsequent researchers in undertaking scalable and reproducible experiments. |
Xun Liang; Shichao Song; Simin Niu; Zhiyu Li; Feiyu Xiong; Bo Tang; Yezhaohui Wang; Dawei He; Cheng Peng; Zhonghao Wang; Haiying Deng; |
328 | Learning or Self-aligning? Rethinking Instruction Fine-tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we design a knowledge intervention framework to decouple the potential underlying factors of IFT, thereby enabling individual analysis of different factors. |
Mengjie Ren; Boxi Cao; Hongyu Lin; Cao Liu; Xianpei Han; Ke Zeng; Wan Guanglu; Xunliang Cai; Le Sun; |
329 | M3AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel multimodal, multigenre, and multipurpose audio-visual academic lecture dataset (M3AV), which has almost 367 hours of videos from five sources covering computer science, mathematics, and medical and biology topics. |
Zhe Chen; Heyang Liu; Wenyi Yu; Guangzhi Sun; Hongcheng Liu; Ji Wu; Chao Zhang; Yu Wang; Yanfeng Wang; |
330 | AlignBench: Benchmarking Chinese Alignment of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, effective evaluation of alignment for emerging Chinese LLMs is still significantly lacking, calling for real-scenario-grounded, open-ended, challenging and automatic evaluations tailored for alignment. To fill in this gap, we introduce AlignBench, a comprehensive multi-dimensional benchmark for evaluating LLMs' alignment in Chinese. |
Xiao Liu; Xuanyu Lei; Shengyuan Wang; Yue Huang; Andrew Feng; Bosi Wen; Jiale Cheng; Pei Ke; Yifan Xu; Weng Lam Tam; Xiaohan Zhang; Lichao Sun; Xiaotao Gu; Hongning Wang; Jing Zhang; Minlie Huang; Yuxiao Dong; Jie Tang; |
331 | EmbSpatial-Bench: Benchmarking Spatial Understanding for Embodied Tasks with Large Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the critical skill of spatial understanding in embodied environments has not been thoroughly evaluated, leaving the gap between current LVLMs and qualified embodied intelligence unknown. Therefore, we construct EmbSpatial-Bench, a benchmark for evaluating embodied spatial understanding of LVLMs. |
Mengfei Du; Binhao Wu; Zejun Li; Xuanjing Huang; Zhongyu Wei; |
332 | Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods. To tackle this issue, we propose a self-training framework with a pseudo-label scorer, wherein a scorer assesses the match between reviews and their pseudo-labels, aiming to filter out mismatches and thereby enhance the effectiveness of self-training. |
Yice Zhang; Jie Zeng; Weiming Hu; Ziyi Wang; Shiwei Chen; Ruifeng Xu; |
333 | Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the process by which the encoder produces the text representation is unknown. We propose the Diffusion Lens, a method for analyzing the text encoder of T2I models by generating images from its intermediate representations. |
Michael Toker; Hadas Orgad; Mor Ventura; Dana Arad; Yonatan Belinkov; |
334 | Threads of Subtlety: Detecting Machine-Generated Texts Through Discourse Motifs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Introducing a novel methodology, we leverage hierarchical parse trees and recursive hypergraphs to unveil distinctive discourse patterns in texts produced by both LLMs and humans. |
Zae Myung Kim; Kwang Lee; Preston Zhu; Vipul Raheja; Dongyeop Kang; |
335 | Are LLM-based Evaluators Confusing NLG Quality Criteria? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by behavioral testing, we elaborately design 18 types of aspect-targeted perturbation attacks for fine-grained analysis of the evaluation behaviors of different LLMs. |
Xinyu Hu; Mingqi Gao; Sen Hu; Yang Zhang; Yicheng Chen; Teng Xu; Xiaojun Wan; |
336 | Stumbling Blocks: Stress Testing The Robustness of Machine-Generated Text Detectors Under Attacks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The goal of our study is to stress test the detectors' robustness to malicious attacks under realistic scenarios. |
Yichen Wang; Shangbin Feng; Abe Hou; Xiao Pu; Chao Shen; Xiaoming Liu; Yulia Tsvetkov; Tianxing He; |
337 | Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce CHARM, the first benchmark for comprehensively and in-depth evaluating the commonsense reasoning ability of large language models (LLMs) in Chinese, which covers both globally known and Chinese-specific commonsense. |
Jiaxing Sun; Weiquan Huang; Jiang Wu; Chenya Gu; Wei Li; Songyang Zhang; Hang Yan; Conghui He; |
338 | Parrot: Enhancing Multi-Turn Instruction Following for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Parrot, a solution aiming to enhance multi-turn instruction following for LLMs. |
Yuchong Sun; Che Liu; Kun Zhou; Jinwen Huang; Ruihua Song; Xin Zhao; Fuzheng Zhang; Di Zhang; Kun Gai; |
339 | MentalManip: A Dataset For Fine-grained Analysis of Mental Manipulation in Conversations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The detection of manipulative language is essential for protecting potential victims, yet the field of Natural Language Processing (NLP) currently faces a scarcity of resources and research on this topic. Our study addresses this gap by introducing a new dataset, named MentalManip, which consists of 4,000 annotated fictional dialogues. |
Yuxin Wang; Ivory Yang; Saeed Hassanpour; Soroush Vosoughi; |
340 | ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose ProtLLM, a versatile cross-modal large language model (LLM) for both protein-centric and protein-language tasks. |
Le Zhuo; Zewen Chi; Minghao Xu; Heyan Huang; Jianan Zhao; Heqi Zheng; Conghui He; Xian-Ling Mao; Wentao Zhang; |
341 | A Community-Centric Perspective for Characterizing and Detecting Anti-Asian Violence-Provoking Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While previous works have characterized and built tools for detecting other forms of harmful speech, like fear speech and hate speech, our work takes a community-centric approach to studying anti-Asian violence-provoking speech. |
Gaurav Verma; Rynaa Grover; Jiawei Zhou; Binny Mathew; Jordan Kraemer; Munmun Choudhury; Srijan Kumar; |
342 | Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, via experiments on 4 datasets and under 2 fine-tuning settings, we find that as the MLLM is fine-tuned, it indeed gains domain-specific visual capabilities, but the updates do not lead to the projection extracting relevant domain-specific visual attributes. |
Gaurav Verma; Minje Choi; Kartik Sharma; Jamelle Watson-Daniels; Sejoon Oh; Srijan Kumar; |
343 | Maverick: Efficient and Accurate Coreference Resolution Defying Recent Trends Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The Coreference Resolution task is no exception; all recent state-of-the-art solutions adopt large generative autoregressive models that outperform encoder-based discriminative systems. In this work, we challenge this recent trend by introducing Maverick, a carefully designed, yet simple, pipeline, which enables running a state-of-the-art Coreference Resolution system within the constraints of an academic budget, outperforming models with up to 13 billion parameters with as few as 500 million parameters. |
Giuliano Martinelli; Edoardo Barba; Roberto Navigli; |
344 | Guardians of The Machine Translation Meta-Evaluation: Sentinel Metrics Fall In! Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work highlights two issues with the meta-evaluation framework currently employed in WMT, and assesses their impact on the metrics rankings. To do this, we introduce the concept of sentinel metrics, which are designed explicitly to scrutinize the meta-evaluation process's accuracy, robustness, and fairness. |
Stefano Perrella; Lorenzo Proietti; Alessandro Scirè; Edoardo Barba; Roberto Navigli;
345 | NounAtlas: Filling The Gap in Nominal Semantic Role Labeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In many contexts, however, nominal predicates are often as informative as verbal ones, thus needing proper treatment. In this paper we aim to fill this gap and make nominal SRL a first-class citizen. |
Roberto Navigli; Marco Pinto; Pasquale Silvestri; Dennis Rotondi; Simone Ciciliano; Alessandro Scirè;
346 | Comparing Inferential Strategies of Humans and Large Language Models in Deductive Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we draw upon principles from cognitive psychology to examine inferential strategies employed by LLMs, through a detailed evaluation of their responses to propositional logic problems. |
Philipp Mondorf; Barbara Plank; |
347 | Navigating The Shadows: Unveiling Effective Disturbances for Modern AI Content Detectors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Currently, there is a lack of systematic evaluations regarding detection performance in real-world applications, and a comprehensive examination of perturbation techniques and detector robustness is also absent. To bridge this gap, our work simulates real-world scenarios in both informal and professional writing, exploring the out-of-the-box performance of current detectors. |
Ying Zhou; Ben He; Le Sun; |
348 | M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the lack of tasks featuring naturally long sequences, we propose an automatic approach to convert short-sequence tasks into long-sequence scenarios. |
Wai-Chung Kwan; Xingshan Zeng; Yufei Wang; Yusen Sun; Liangyou Li; Yuxin Jiang; Lifeng Shang; Qun Liu; Kam-Fai Wong; |
349 | Explore Spurious Correlations at The Concept Level in Language Models for Text Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces two main contributions. First, we employ ChatGPT to assign concept labels to texts, assessing concept bias in models during fine-tuning or ICL on test data. We find that LMs, when encountering spurious correlations between a concept and a label in training or prompts, resort to shortcuts for predictions. Second, we introduce a data rebalancing technique that incorporates ChatGPT-generated counterfactual data, thereby balancing label distribution and mitigating spurious correlations. |
Yuhang Zhou; Paiheng Xu; Xiaoyu Liu; Bang An; Wei Ai; Furong Huang; |
350 | Angry Men, Sad Women: Large Language Models Reflect Gendered Stereotypes in Emotion Attribution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, emotion and gender are closely linked in societal discourse. For example, women are often thought of as more empathetic, while men's anger is more socially accepted. To fill this gap, we present the first comprehensive study of gendered emotion attribution in five state-of-the-art LLMs (open- and closed-source). |
Flor Plaza-del-Arco; Amanda Curry; Alba Cercas Curry; Gavin Abercrombie; Dirk Hovy; |
351 | Classist Tools: Social Class Correlates with Performance in NLP Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show empirically that NLP systems' performance is affected by speakers' SES, potentially disadvantaging less-privileged socioeconomic groups. |
Amanda Curry; Giuseppe Attanasio; Zeerak Talat; Dirk Hovy; |
352 | Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Such real-world concerns, however, stand in stark contrast to the artificiality of current evaluations: real users do not typically ask LLMs survey questions. Motivated by this discrepancy, we challenge the prevailing *constrained* evaluation paradigm for values and opinions in LLMs and explore more realistic *unconstrained* evaluations. |
Paul Röttger; Valentin Hofmann; Valentina Pyatkin; Musashi Hinck; Hannah Kirk; Hinrich Schuetze; Dirk Hovy;
353 | Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel multimodal Transformer framework using prompt learning to address the issue of missing modalities. |
Zirun Guo; Tao Jin; Zhou Zhao; |
354 | Uni-Dubbing: Zero-Shot Speech Synthesis from Visual Articulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We employ a methodology that incorporates modality alignment during the pre-training phase on multimodal datasets, uniquely facilitating zero-shot generalization through the process of freezing the video modality feature extraction component and the encoder module within the pretrained weights, thereby enabling effective cross-modal and cross-lingual transfer. |
Songju Lei; Xize Cheng; Mengjiao Lyu; Jianqiao Hu; Jintao Tan; Runlin Liu; Lingyu Xiong; Tao Jin; Xiandong Li; Zhou Zhao; |
355 | XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, audio-visual (AV) data is only available in limited amounts and for fewer languages than audio-only resources. To address this gap, we present XLAVS-R, a cross-lingual audio-visual speech representation model for noise-robust speech recognition and translation in over 100 languages. |
HyoJung Han; Mohamed Anwar; Juan Pino; Wei-Ning Hsu; Marine Carpuat; Bowen Shi; Changhan Wang; |
356 | Decoder-only Streaming Transformer for Simultaneous Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, directly applying the Decoder-only architecture to SiMT poses challenges in terms of training and inference. To alleviate the above problems, we propose the first Decoder-only SiMT model, named Decoder-only Streaming Transformer (DST). |
Shoutao Guo; Shaolei Zhang; Yang Feng; |
357 | Large Language Models Are No Longer Shallow Parsers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on analyzing and improving the capability of current state-of-the-art LLMs on a classic fundamental task, namely constituency parsing, which is the representative syntactic task in both linguistics and natural language processing. |
Yuanhe Tian; Fei Xia; Yan Song; |
358 | Dialogue Summarization with Mixture of Experts Based on Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an LLM-based approach with role-oriented routing and fusion generation to utilize mixture of experts (MoE) for dialogue summarization. |
Yuanhe Tian; Fei Xia; Yan Song; |
359 | CoGenesis: A Framework Collaborating Large and Small Language Models for Secure Context-Aware Instruction Following Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose CoGenesis, a collaborative generation framework integrating large (hosted on cloud infrastructure) and small models (deployed on local devices) to address privacy concerns logically. |
Kaiyan Zhang; Jianyu Wang; Ermo Hua; Biqing Qi; Ning Ding; Bowen Zhou; |
360 | Can LLMs Reason with Rules? Logic Scaffolding for Stress-Testing and Improving LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, their mastery of underlying inferential rules still falls short of human capabilities. To investigate this, we propose a logic scaffolding inferential rule generation framework, to construct an inferential rule base, ULogic, comprising both primitive and compositional rules across five domains. |
Siyuan Wang; Zhongyu Wei; Yejin Choi; Xiang Ren; |
361 | Beyond Traditional Benchmarks: Analyzing Behaviors of Open LLMs on Data-to-Text Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We analyze the behaviors of open large language models (LLMs) on the task of data-to-text (D2T) generation, i.e., generating coherent and relevant text from structured data. |
Zdenek Kasner; Ondrej Dusek; |
362 | PRP-Graph: Pairwise Ranking Prompting to LLMs with Graph Aggregation for Effective Text Re-ranking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, in the existing methods, PRP only outputs the same label for the comparison results of different confidence intervals without considering the uncertainty of pairwise comparison, which implies an underutilization of the generation probability information of LLMs. To bridge this gap, we propose PRP-Graph, a novel pairwise re-ranking approach, based on a refined scoring PRP unit that exploits the output probabilities of target labels to capture the degree of certainty of the comparison results. |
Jian Luo; Xuanang Chen; Ben He; Le Sun; |
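As a rough illustration of the probability-weighted comparisons this approach builds on, the sketch below accumulates the LLM's pairwise preference probabilities as edge weights of a comparison graph and ranks documents by total incoming weight. The `pairwise_prob` interface and the simple sum aggregation are assumptions for exposition, not the authors' exact graph construction.

```python
from itertools import combinations

def prp_graph_rerank(query, docs, pairwise_prob):
    """Aggregate probability-weighted pairwise comparisons into
    per-document scores, then sort.

    pairwise_prob(query, a, b) -> float in [0, 1]: the LLM's
    probability that passage a should rank above passage b
    (a hypothetical interface standing in for the prompted LLM).
    """
    scores = {i: 0.0 for i in range(len(docs))}
    for i, j in combinations(range(len(docs)), 2):
        p = pairwise_prob(query, docs[i], docs[j])
        scores[i] += p        # edge weight: confidence that docs[i] beats docs[j]
        scores[j] += 1.0 - p  # edge weight: confidence that docs[j] beats docs[i]
    order = sorted(scores, key=scores.get, reverse=True)
    return [docs[k] for k in order]
```

Unlike label-only pairwise ranking, each comparison here contributes a graded vote, so the aggregation can distinguish a confident preference from a near-tie.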
363 | PITA: Prompting Task Interaction for Argumentation Mining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel method PITA for PromptIng Task interAction to model the inter-relationships among the three subtasks within a generative framework. |
Yang Sun; Muyi Wang; Jianzhu Bao; Bin Liang; Xiaoyan Zhao; Caihua Yang; Min Yang; Ruifeng Xu; |
364 | Through The MUD: A Multi-Defendant Charge Prediction Benchmark with Linked Crime Elements Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, real-world criminal cases usually involve multiple defendants whose criminal facts are intertwined. In an early attempt to fill this gap, we introduce a new benchmark that encompasses legal cases involving multiple defendants, where each defendant is labeled with a charge and four types of crime elements, i.e., Object Element, Objective Element, Subject Element, and Subjective Element. |
Xiao Wei; Qi Xu; Hang Yu; Qian Liu; Erik Cambria; |
365 | The Echoes of Multilinguality: Tracing Cultural Value Shifts During Language Model Fine-tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We are the first to study how languages can exert influence on the cultural values encoded for different test languages, by studying how such values are revised during fine-tuning. |
Rochelle Choenni; Anne Lauscher; Ekaterina Shutova; |
366 | An Entropy-based Text Watermarking Detection Method Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although current text watermarking algorithms perform well in most high-entropy scenarios, their performance in low-entropy scenarios still needs to be improved. In this work, we opine that the influence of token entropy should be fully considered in the watermark detection process, i.e., the weight of each token during watermark detection should be customized according to its entropy, rather than setting the weights of all tokens to the same value as in previous methods. |
Yijian Lu; Aiwei Liu; Dianzhi Yu; Jingjing Li; Irwin King; |
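A minimal sketch of the underlying idea, assuming a green-list style detector: weight each position's detection statistic by the model's next-token entropy there, so low-entropy positions, where a watermark cannot steer sampling, count for less. The interfaces below are illustrative assumptions, not the paper's detector.

```python
import math

def token_entropy(dist):
    """Shannon entropy of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def entropy_weighted_statistic(hits, dists):
    """hits[i]: 1.0 if token i carries the watermark signal (e.g., it
    falls in the green list), else 0.0 (illustrative); dists[i]: the
    model's next-token distribution at position i. Positions with low
    entropy contribute little weight to the overall statistic."""
    weights = [token_entropy(d) for d in dists]
    total = sum(weights)
    return sum(w * h for w, h in zip(weights, hits)) / total if total else 0.0
```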
367 | Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Generative Pre-trained Speech Transformer (GPST), a hierarchical transformer designed for efficient speech language modeling. |
Yongxin Zhu; Dan Su; Liqiang He; Linli Xu; Dong Yu; |
368 | PLUG: Leveraging Pivot Language in Cross-Lingual Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite the success in high-resource languages, its application in lower-resource ones faces challenges due to the imbalanced foundational abilities of LLMs across different languages, stemming from the uneven language distribution in their pre-training data. To tackle this issue, we propose pivot language guided generation (PLUG), an approach that utilizes a high-resource language, primarily English, as the pivot to enhance instruction tuning in lower-resource languages. |
Zhihan Zhang; Dong-Ho Lee; Yuwei Fang; Wenhao Yu; Mengzhao Jia; Meng Jiang; Francesco Barbieri; |
369 | Learn or Recall? Revisiting Incremental Learning with Pre-trained Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Based on the observation, we propose a frustratingly easy method called SEQ* for IL with PLMs. |
Junhao Zheng; Shengjie Qiu; Qianli Ma; |
370 | Generate-then-Ground in Retrieval-Augmented Generation for Multi-hop Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the performance of this retrieve-then-read paradigm is constrained by the retriever and the inevitable noise in the retrieved documents. To mitigate these challenges, we introduce a novel generate-then-ground (GenGround) framework, synergizing the parametric knowledge of LLMs and external documents to solve a multi-hop question. |
Zhengliang Shi; Shuo Zhang; Weiwei Sun; Shen Gao; Pengjie Ren; Zhumin Chen; Zhaochun Ren; |
371 | GumbelSoft: Diversified Language Model Watermarking Via The GumbelMax-trick Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the GM watermark encounters a major challenge with generation diversity, always yielding identical outputs for the same prompt, negatively impacting generation diversity and user experience. To overcome this limitation, we introduce a new type of GM watermark, the Logits-Addition watermark, as well as three variants that aim to enhance diversity, particularly the GumbelSoft watermark (i.e., the softmax variant of the Logits-Addition watermark). |
Jiayi Fu; Xuandong Zhao; Ruihan Yang; Yuansen Zhang; Jiangjie Chen; Yanghua Xiao; |
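For background, the GumbelMax-trick samples a token as the argmax of logits plus Gumbel noise; seeding that noise from the recent context is what makes the choice reproducible and hence detectable, and also why the same prompt always yields the same output. The sketch below shows this generic mechanism with a hypothetical hash-based seed; it is not the paper's Logits-Addition or GumbelSoft construction.

```python
import hashlib
import math
import random

def seeded_gumbel(key, vocab_size):
    """Pseudorandom Gumbel(0, 1) noise derived from a context key, so
    that a detector sharing the key can regenerate the same noise."""
    seed = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    return [-math.log(-math.log(rng.random() or 1e-12)) for _ in range(vocab_size)]

def gumbelmax_watermark_sample(logits, context, window=4):
    """argmax(logits + g) is an exact sample from softmax(logits) when
    g is fresh Gumbel noise; keying g on the last `window` tokens makes
    the choice deterministic given the context, which detection exploits."""
    key = "|".join(map(str, context[-window:]))
    g = seeded_gumbel(key, len(logits))
    return max(range(len(logits)), key=lambda i: logits[i] + g[i])
```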
372 | DRAGIN: Dynamic Retrieval Augmented Generation Based on The Real-time Information Needs of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, the strategies for deciding what to retrieve typically limit themselves to the LLM's most recent sentence or the last few tokens, while the LLM's information needs may span across the entire context. To overcome these limitations, we introduce a new framework, DRAGIN, i.e., Dynamic Retrieval Augmented Generation based on the Information Needs of LLMs. |
Weihang Su; Yichen Tang; Qingyao Ai; Zhijing Wu; Yiqun Liu; |
373 | Metaphor Understanding Challenge Dataset for LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We release the Metaphor Understanding Challenge Dataset (MUNCH), designed to evaluate the metaphor understanding capabilities of LLMs. |
Xiaoyu Tong; Rochelle Choenni; Martha Lewis; Ekaterina Shutova; |
374 | Context-aware Difference Distilling for Multi-change Captioning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel context-aware difference distilling (CARD) network to capture all genuine changes for yielding sentences. |
Yunbin Tu; Liang Li; Li Su; Zheng-Jun Zha; Chenggang Yan; Qingming Huang; |
375 | GSM-Plus: A Comprehensive Benchmark for Evaluating The Robustness of LLMs As Mathematical Problem Solvers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the adversarial grade school math (GSM-Plus) dataset, an extension of GSM8K augmented with various mathematical perturbations. |
Qintong Li; Leyang Cui; Xueliang Zhao; Lingpeng Kong; Wei Bi; |
376 | Context Versus Prior Knowledge in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We hypothesize that models perform this integration in a predictable way across different questions and contexts: models will rely more on prior knowledge for questions about entities (e.g., persons, places, etc.) that they are more familiar with due to higher exposure in the training corpus, and be more easily persuaded by some contexts than others. To formalize this problem, we propose two mutual information-based metrics to measure a model's dependency on a context and on its prior about an entity: first, the persuasion score of a given context represents how much a model depends on the context in its decision, and second, the susceptibility score of a given entity represents how much the model can be swayed away from its original answer distribution about an entity. |
Kevin Du; Vésteinn Snæbjarnarson; Niklas Stoehr; Jennifer White; Aaron Schein; Ryan Cotterell;
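Read loosely, both scores admit an information-theoretic rendering. The following is one plausible formalization consistent with the description above, with assumed notation ($q$ a query, $c$ a context, $y$ the model's answer, $e$ an entity); it is a sketch, not the paper's exact definitions:

```latex
% Assumed notation: q = query, c = context, y = answer, e = entity.
\mathrm{persuasion}(c) = D_{\mathrm{KL}}\bigl(p(y \mid q, c) \,\|\, p(y \mid q)\bigr),
\qquad
\mathrm{susceptibility}(e) = \mathbb{E}_{c}\bigl[\mathrm{persuasion}(c)\bigr]
```

Averaging such KL terms over contexts about an entity is a mutual information between answers and contexts, matching the "mutual information-based" framing of the highlight.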
377 | Analyzing Semantic Change Through Lexical Replacements Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we model semantic change by studying the effect of unexpected contexts introduced by lexical replacements. |
Francesco Periti; Pierluigi Cassotti; Haim Dubossarsky; Nina Tahmasebi; |
378 | Leveraging Large Language Models for Learning Complex Legal Concepts Through Storytelling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a novel application of large language models (LLMs) in legal education to help non-experts learn intricate legal concepts through storytelling, an effective pedagogical tool in conveying complex and abstract concepts. |
Hang Jiang; Xiajie Zhang; Robert Mahari; Daniel Kessler; Eric Ma; Tal August; Irene Li; Alex Pentland; Yoon Kim; Deb Roy; Jad Kabbara; |
379 | Generalizing Conversational Dense Retrieval Via LLM-Cognition Data Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a framework for generalizing Conversational dense retrieval via LLM-cognition data Augmentation (ConvAug). |
Haonan Chen; Zhicheng Dou; Kelong Mao; Jiongnan Liu; Ziliang Zhao; |
380 | ATLAS: Improving Lay Summarisation with Attribute-based Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In practice, audiences with different levels of expertise will have specific needs, impacting what content should appear in a lay summary and how it should be presented. Aiming to address this, we propose ATLAS, a novel abstractive summarisation approach that can control various properties that contribute to the overall "layness" of the generated summary using targeted control attributes. |
Zhihao Zhang; Tomas Goldsack; Carolina Scarton; Chenghua Lin; |
381 | FineSurE: Fine-grained Summarization Evaluation Using LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To remedy those limitations, we propose FineSurE, a fine-grained evaluator specifically tailored for the summarization task using large language models (LLMs). |
Hwanjun Song; Hang Su; Igor Shalyminov; Jason Cai; Saab Mansour; |
382 | Prompted Aspect Key Point Analysis for Quantitative Review Summarization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Prompted Aspect Key Point Analysis (PAKPA) for quantitative review summarization. |
An Tang; Xiuzhen Zhang; Minh Dinh; Erik Cambria; |
383 | What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on dialects and regional languages related to German � a group of varieties that is heterogeneous in terms of prestige and standardization. |
Verena Blaschke; Christoph Purschke; Hinrich Schuetze; Barbara Plank; |
384 | SafeDecoding: Defending Against Jailbreak Attacks Via Safety-Aware Decoding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we leverage (1) and (2) to develop SafeDecoding, a safety-aware decoding strategy for LLMs, to defend against jailbreak attacks. |
Zhangchen Xu; Fengqing Jiang; Luyao Niu; Jinyuan Jia; Bill Yuchen Lin; Radha Poovendran; |
385 | ARIES: A Corpus of Scientific Paper Edits Made in Response to Peer Reviews Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the task of automatically revising scientific papers based on peer feedback and release ARIES, a dataset of review comments and their corresponding paper edits. |
Mike D'Arcy; Alexis Ross; Erin Bransom; Bailey Kuehl; Jonathan Bragg; Tom Hope; Doug Downey;
386 | Small But Funny: A Feedback-Driven Approach to Humor Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We hypothesize that this gap may stem from the fact that creative tasks might be hard to learn by imitation alone and explore whether an approach, involving supplementary guidance from the teacher, could yield higher performance. To address this, we study the effect of assigning a dual role to the LLM – as a "teacher" generating data, as well as a "critic" evaluating the student's performance. |
Sahithya Ravi; Patrick Huber; Akshat Shrivastava; Vered Shwartz; Arash Einolghozati; |
387 | Calibrating Large Language Models Using Their Generations Only Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, finding effective ways to calibrate LLMs, especially when the only interface to the models is their generated text, remains a challenge. We propose APRICOT (Auxiliary prediction of confidence targets): A method to set confidence targets and train an additional model that predicts an LLM's confidence based on its textual input and output alone. |
Dennis Ulmer; Martin Gubri; Hwaran Lee; Sangdoo Yun; Seong Oh; |
388 | Enhancing In-Context Learning Via Implicit Demonstration Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite its potential, ICL's effectiveness heavily relies on the quality, quantity, and permutation of demonstrations, commonly leading to suboptimal and unstable performance. In this paper, we tackle this challenge for the first time from the perspective of demonstration augmentation. |
Xiaoling Zhou; Wei Ye; Yidong Wang; Chaoya Jiang; Zhemg Lee; Rui Xie; Shikun Zhang; |
389 | Disentangled Learning with Synthetic Parallel Data for Text Style Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Due to the absence of parallel datasets for supervision, most existing studies have been conducted in an unsupervised manner, where the generated sentences often suffer from high semantic divergence and thus low semantic preservation. In this paper, we propose a novel disentanglement-based framework for TST named DisenTrans, where disentanglement means that we separate the attribute and content components in the natural language corpus and consider this task from these two perspectives. |
Jingxuan Han; Quan Wang; Zikang Guo; Benfeng Xu; Licheng Zhang; Zhendong Mao; |
390 | TaPERA: Enhancing Faithfulness and Interpretability in Long-Form Table QA By Content Planning and Execution-based Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While large language model based systems have made significant progress, they often hallucinate, especially when the task involves complex reasoning over tables. To tackle this issue, we propose a new LLM-based framework, TaPERA, for LFTQA tasks. |
Yilun Zhao; Lyuhao Chen; Arman Cohan; Chen Zhao; |
391 | Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we mainly explore improving LMMs with visual-language knowledge alignment, especially aimed at challenging knowledge-based visual question answering (VQA). |
Yunxin Li; Xinyu Chen; Baotian Hu; Haoyuan Shi; Min Zhang; |
392 | Efficient OCR for Building A Diverse Digital History Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This study models OCR as a character-level image retrieval problem, using a contrastively trained vision encoder. |
Jacob Carlson; Tom Bryan; Melissa Dell; |
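The retrieval formulation is straightforward to sketch: embed each character crop with the contrastively trained encoder and transcribe it as its nearest neighbor in an index of reference glyph embeddings. The encoder and index interfaces below are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def transcribe(crops, encoder, glyph_embs, glyph_labels):
    """OCR as character-level image retrieval.

    crops: character-crop images segmented from the page;
    encoder(img) -> unit-norm embedding (assumed interface to the
    contrastively trained vision encoder); glyph_embs: (N, d) array of
    reference glyph embeddings; glyph_labels: the N characters they
    render. Each crop is transcribed as its nearest reference glyph."""
    chars = []
    for img in crops:
        v = encoder(img)
        sims = glyph_embs @ v  # cosine similarity for unit-norm vectors
        chars.append(glyph_labels[int(np.argmax(sims))])
    return "".join(chars)
```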
393 | To Generate or to Retrieve? On The Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents MedGENIE, the first generate-then-read framework for multiple-choice question answering in medicine. |
Giacomo Frisoni; Alessio Cocchieri; Alex Presepi; Gianluca Moro; Zaiqiao Meng; |
394 | The Fine-Tuning Paradox: Boosting Translation Quality Without Sacrificing LLM Abilities Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our findings emphasize the need for fine-tuning strategies that preserve the benefits of LLMs for machine translation. |
David Stap; Eva Hasler; Bill Byrne; Christof Monz; Ke Tran; |
395 | MultiPICo: Multilingual Perspectivist Irony Corpus Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This approach aims to leverage data annotated by different individuals to model diverse perspectives that affect their opinions on subjective phenomena such as irony. In this context, we propose MultiPICo, a multilingual perspectivist corpus of ironic short conversations in different languages and linguistic varieties extracted from Twitter and Reddit. |
Silvia Casola; Simona Frenda; Soda Lo; Erhan Sezerer; Antonio Uva; Valerio Basile; Cristina Bosco; Alessandro Pedrani; Chiara Rubagotti; Viviana Patti; Davide Bernardi; |
396 | MAP's Not Dead Yet: Uncovering True Language Model Modes By Conditioning Away Degeneracy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we argue that mixing even a tiny amount of low-entropy noise with a population text distribution can cause the data distribution's mode to become degenerate. We therefore propose to apply MAP decoding to the model's true conditional distribution where the conditioning variable explicitly avoids specific degenerate behavior. |
Davis Yoshida; Kartik Goyal; Kevin Gimpel; |
397 | What Languages Are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In particular, we investigate the learnability of regular LMs (RLMs) by RNN and Transformer LMs. |
Nadav Borenstein; Anej Svete; Robin Chan; Josef Valvoda; Franz Nowak; Isabelle Augenstein; Eleanor Chodroff; Ryan Cotterell; |
398 | Investigating Cultural Alignment of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our study emphasizes the necessity for a more balanced multilingual pretraining dataset to better represent the diversity of human experience and the plurality of different cultures with many implications on the topic of cross-lingual transfer. |
Badr AlKhamissi; Muhammad ElNokrashy; Mai Alkhamissi; Mona Diab; |
399 | CSCD-NS: A Chinese Spelling Check Dataset for Native Speakers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present CSCD-NS, the first Chinese spelling check (CSC) dataset designed for native speakers, containing 40,000 samples from a Chinese social platform. |
Yong Hu; Fandong Meng; Jie Zhou; |
400 | TaSL: Continual Dialog State Tracking Via Task Skill Localization and Consolidation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present TaSL, a novel framework for task skill localization and consolidation that enables effective knowledge transfer without relying on memory replay. |
Yujie Feng; Xu Chu; Yongxin Xu; Guangyuan Shi; Bo Liu; Xiao-Ming Wu; |
401 | Fine-Grained Image-Text Alignment in Medical Imaging Enables Explainable Cyclic Image-Report Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, these methods provide explainability by using heatmaps to show the general image areas potentially associated with texts rather than specific regions, making their explanations not explicit and specific enough. To address these issues, we propose a novel Adaptive patch-word Matching (AdaMatch) model to correlate chest X-ray (CXR) image regions with words in medical reports and apply it to CXR-report generation to provide explainability for the generation process. |
Wenting Chen; Linlin Shen; Jingyang Lin; Jiebo Luo; Xiang Li; Yixuan Yuan; |
402 | Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel perspective that considers the role of LLMs in RAG as "Information Refiner", which means that regardless of the correctness, completeness, or usefulness of retrieved texts, LLMs can consistently integrate knowledge within the retrieved texts and model parameters to generate texts that are more concise, accurate, and complete than the retrieved texts. |
Shicheng Xu; Liang Pang; Mo Yu; Fandong Meng; Huawei Shen; Xueqi Cheng; Jie Zhou; |
403 | LRQuant: Learnable and Robust Post-Training Quantization for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing methods face two issues: 1) Most smoothing parameters are hand-crafted, which leads to suboptimal results; 2) There are significant performance degradations when tested on unseen datasets. To address these challenges, this paper introduces a robust learnable smooth-based PTQ framework, called LRQuant. |
Jiaqi Zhao; Miao Zhang; Chao Zeng; Ming Wang; Xuebo Liu; Liqiang Nie; |
404 | RepCodec: A Speech Representation Codec for Speech Tokenization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, this discretization gives rise to a loss of information, consequently impairing overall performance. To improve the performance of these discrete speech tokens, we present RepCodec, a novel speech representation codec for semantic speech tokenization. |
Zhichao Huang; Chutong Meng; Tom Ko; |
405 | PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To date, comprehensive research on the safety issues associated with multi-agent systems remains limited. In this paper, we explore these concerns through the innovative lens of agent psychology, revealing that the dark psychological states of agents constitute a significant threat to safety. |
Zaibin Zhang; Yongting Zhang; Lijun Li; Jing Shao; Hongzhi Gao; Yu Qiao; Lijun Wang; Huchuan Lu; Feng Zhao; |
406 | HyperMoE: Towards Better Mixture of Experts Via Transferring Among Experts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite the success, most existing methods face a challenge for balance between sparsity and the availability of expert knowledge: enhancing performance through increased use of expert knowledge often results in diminishing sparsity during expert selection. To mitigate this contradiction, we propose HyperMoE, a novel MoE framework built upon Hypernetworks. |
Hao Zhao; Zihan Qiu; Huijia Wu; Zili Wang; Zhaofeng He; Jie Fu; |
407 | Making Long-Context Language Models Better Multi-Hop Reasoners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Reasoning with Attributions, a novel approach that prompts LMs to supply attributions for each assertion during their reasoning. |
Yanyang Li; Shuo Liang; Michael Lyu; Liwei Wang; |
408 | Peacock: A Family of Arabic Multimodal Large Language Models and Benchmarks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This poses significant challenges in developing comparable models for other languages, even those with large speaker populations, such as Arabic. To alleviate this challenge, we introduce a comprehensive family of Arabic MLLMs, dubbed *Peacock*, with strong vision and language capabilities. |
Fakhraddin Alwajih; El Moatez Billah Nagoudi; Gagan Bhatia; Abdelrahman Mohamed; Muhammad Abdul-Mageed; |
409 | The Earth Is Flat Because…: Investigating LLMs' Belief Towards Misinformation Via Persuasive Conversation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, belief can change during a multi-turn conversation, especially a persuasive one. Therefore, in this study, we delve into LLMs' susceptibility to persuasive conversations, particularly on factual questions that they can answer correctly. |
Rongwu Xu; Brian Lin; Shujian Yang; Tianqi Zhang; Weiyan Shi; Tianwei Zhang; Zhixuan Fang; Wei Xu; Han Qiu; |
410 | PokeMQA: Programmable Knowledge Editing for Multi-hop Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We thus propose a framework, Programmable knowledge editing for Multi-hop Question Answering (PokeMQA), to decouple the jobs. |
Hengrui Gu; Kaixiong Zhou; Xiaotian Han; Ninghao Liu; Ruobing Wang; Xin Wang; |
411 | More Frequent Verbs Are Associated with More Diverse Valency Frames: Efficient Principles at The Lexicon-grammar Interface Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider two measures of valency diversity for verbs: valency frame count (VFC), the number of distinct frames associated with a verb, and valency frame entropy (VFE), the average information content of frame selection associated with a verb. |
Siyu Tao; Lucia Donatelli; Michael Hahn; |
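Both measures follow directly from this description: VFC counts a verb's attested frames, and VFE is the entropy of its frame distribution. In symbols (notation assumed, with $p(f \mid v)$ the relative frequency of frame $f$ among occurrences of verb $v$):

```latex
\mathrm{VFC}(v) = \bigl|\{\, f : \mathrm{count}(v, f) > 0 \,\}\bigr|,
\qquad
\mathrm{VFE}(v) = -\sum_{f} p(f \mid v)\, \log p(f \mid v)
```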
412 | Investigating Multi-Hop Factual Shortcuts in Knowledge Editing of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper systematically investigates the possibilities for LLMs to utilize shortcuts based on direct connections between the initial and terminal entities of multi-hop knowledge. |
Tianjie Ju; Yijin Chen; Xinwei Yuan; Zhuosheng Zhang; Wei Du; Yubin Zheng; Gongshen Liu; |
413 | Prototypical Reward Network for Data-Efficient Model Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a framework utilizing Prototypical Networks to enhance reward models under limited human feedback, enabling more stable and reliable structural learning from fewer samples. |
Jinghan Zhang; Xiting Wang; Yiqiao Jin; Changyu Chen; Xinhao Zhang; Kunpeng Liu; |
414 | ABEX: Data Augmentation for Low-Resource NLU Via Expanding Abstract Descriptions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present ABEX, a novel and effective generative data augmentation methodology for low-resource Natural Language Understanding (NLU) tasks. |
Sreyan Ghosh; Utkarsh Tyagi; Sonal Kumar; Chandra Kiran Evuru; Ramaneswaran S; S Sakshi; Dinesh Manocha; |
415 | JumpCoder: Go Beyond Autoregressive Coder Via Online Modification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce JumpCoder, a novel model-agnostic framework that enables human-like online modification and non-sequential generation to augment code LLMs. |
Mouxiang Chen; Hao Tian; Zhongxin Liu; Xiaoxue Ren; Jianling Sun; |
416 | Continual Learning with Semi-supervised Contrastive Distillation for Incremental Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, there are two drawbacks: 1) it requires having the training data for all domains available at the same time, which may be unrealistic due to storage or privacy concerns; 2) it requires re-training the model on the data of all domains from scratch when adding a new domain and this is time-consuming and computationally expensive. To address these issues, we present a semi-supervised contrastive distillation framework for incremental neural machine translation. |
Yunlong Liang; Fandong Meng; Jiaan Wang; Jinan Xu; Yufeng Chen; Jie Zhou; |
417 | Deciphering Oracle Bone Language with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces a novel approach by adopting image generation techniques, specifically through the development of Oracle Bone Script Decipher (OBSD). |
Haisu Guan; Huanxin Yang; Xinyu Wang; Shengwei Han; Yongge Liu; Lianwen Jin; Xiang Bai; Yuliang Liu; |
418 | UniCoder: Scaling Code Large Language Model Via Universal Code Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce the universal code (UniCode) as the intermediate representation. |
Tao Sun; Linzheng Chai; Jian Yang; Yuwei Yin; Hongcheng Guo; Jiaheng Liu; Bing Wang; Liqun Yang; Zhoujun Li; |
419 | Generative Pretrained Structured Transformers: Unsupervised Syntactic Language Models at Scale Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Generative Pretrained Structured Transformers (GPST), an unsupervised SLM at scale capable of being pre-trained from scratch on raw texts with high parallelism. |
Xiang Hu; Pengyu Ji; Qingyang Zhu; Wei Wu; Kewei Tu; |
420 | Aligning Large Language Models for Controllable Recommendations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing literature primarily concentrates on integrating domain-specific knowledge into LLMs to enhance accuracy using a fixed task template, often overlooking the diversity of recommendation tasks and the ability of LLMs to follow recommendation-specific instructions. To address this gap, we first introduce a collection of supervised learning tasks, augmented with labels derived from a conventional recommender model, aimed at explicitly improving LLMs' proficiency in adhering to recommendation-specific instructions. Next, we propose a reinforcement learning-based alignment procedure to enhance LLMs' generalization ability. |
Wensheng Lu; Jianxun Lian; Wei Zhang; Guanghua Li; Mingyang Zhou; Hao Liao; Xing Xie; |
421 | DocLens: Multi-aspect Fine-grained Medical Text Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To reflect the specific requirements of medical text, in this paper, we propose a set of metrics to evaluate the completeness, conciseness, and attribution of the generated text at a fine-grained level. |
Yiqing Xie; Sheng Zhang; Hao Cheng; Pengfei Liu; Zelalem Gero; Cliff Wong; Tristan Naumann; Hoifung Poon; Carolyn Rose; |
422 | Time Is Encoded in The Weights of Finetuned Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present time vectors, a simple tool to customize language models to new time periods. |
Kai Nylund; Suchin Gururangan; Noah Smith; |
423 | LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To enable systematic evaluation, we introduce LogicBench, a natural language question-answering dataset focusing on the use of a single inference rule. |
Mihir Parmar; Nisarg Patel; Neeraj Varshney; Mutsumi Nakamura; Man Luo; Santosh Mashetty; Arindam Mitra; Chitta Baral; |
424 | That's Optional: A Contemporary Exploration of "that" Omission in English Subordinate Clauses Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Building upon previous research, we extend our investigation to a larger corpus of written English, utilize contemporary large language models (LLMs) and extend the information-uniformity principles by the notion of entropy, to estimate the UID manifestations in the use case of syntactic reduction choices. |
Ella Rabinovich; |
425 | Reasoning in Conversation: Solving Subjective Tasks Through Dialogue Simulation for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on the characteristics of the tasks and the strong dialogue-generation capabilities of LLMs, we propose RiC (Reasoning in Conversation), a method that focuses on solving subjective tasks through dialogue simulation. |
Xiaolong Wang; Yile Wang; Yuanchi Zhang; Fuwen Luo; Peng Li; Maosong Sun; Yang Liu; |
426 | Interpreting Conversational Dense Retrieval By Rewriting-Enhanced Inversion of Session Embedding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents CONVINV, a simple yet effective approach to shed light on interpretable conversational dense retrieval models. |
Yiruo Cheng; Kelong Mao; Zhicheng Dou; |
427 | Multi-modal Preference Alignment Remedies Degradation of Visual Instruction Tuning on Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In conclusion, we propose a distillation-based multi-modal alignment model with fine-grained annotations on a small dataset that restores and boosts MLLM's language capability after visual instruction tuning. |
Shengzhi Li; Rongyu Lin; Shichao Pei; |
428 | EZ-STANCE: A Large Dataset for English Zero-Shot Stance Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present EZ-STANCE, a large English ZSSD dataset with 47,316 annotated text-target pairs. |
Chenye Zhao; Cornelia Caragea; |
429 | ActionIE: Action Extraction from Scientific Literature with Programming Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Such an action extraction task is particularly challenging given the intricate details and context-dependent nature of the instructions, especially in fields like chemistry where reproducibility is paramount. In this paper, we introduce ActionIE, a method that leverages Large Language Models (LLMs) to bridge this divide by converting actions written in natural language into executable Python code. |
Xianrui Zhong; Yufeng Du; Siru Ouyang; Ming Zhong; Tingfeng Luo; Qirong Ho; Hao Peng; Heng Ji; Jiawei Han; |
430 | Do Large Language Models Discriminate in Hiring Decisions on The Basis of Race, Ethnicity, and Gender? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that the hiring decisions of LLMs in many settings are more likely to favor White applicants over Hispanic applicants. |
Haozhe An; Christabel Acquaye; Colin Wang; Zongxia Li; Rachel Rudinger; |
431 | InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work proposes InfoLossQA, a framework to characterize and recover simplification-induced information loss in the form of question-and-answer (QA) pairs. |
Jan Trienes; Sebastian Joseph; Jörg Schlötterer; Christin Seifert; Kyle Lo; Wei Xu; Byron Wallace; Junyi Jessy Li;
432 | IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate the prospect of leveraging readily available compiler intermediate representations (IR), shared across programming languages, to improve the multilingual capabilities of Code-LMs and facilitate cross-lingual transfer. |
Indraneil Paul; Goran Glavaš; Iryna Gurevych;
433 | Tree-Averaging Algorithms for Ensemble-Based Unsupervised Discontinuous Constituency Parsing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to build an ensemble of different runs of the existing discontinuous parser by averaging the predicted trees, to stabilize and boost performance. |
Behzad Shayegh; Yuqiao Wen; Lili Mou; |
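One simple way to realize tree averaging, offered as an illustration rather than the authors' algorithm, is to return the candidate parse that agrees most, on average, with all the others (a medoid under the chosen similarity):

```python
def average_tree(candidates, similarity):
    """Pick the candidate parse with the highest mean agreement with
    all other candidates.

    similarity(t1, t2) -> float, e.g., span-overlap F1 between two
    constituency trees (assumed interface)."""
    def mean_sim(t):
        others = [u for u in candidates if u is not t]
        return sum(similarity(t, u) for u in others) / max(len(others), 1)
    return max(candidates, key=mean_sim)
```

The intuition matches the highlight: idiosyncratic errors of individual runs are outvoted, stabilizing the ensemble's prediction.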
434 | Semi-Supervised Spoken Language Glossification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present a framework named Semi-Supervised Spoken Language Glossification (S3LG) for SLG. |
Huijie Yao; Wengang Zhou; Hao Zhou; Houqiang Li; |
435 | Improving Large Language Models in Event Relation Logical Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we conduct an in-depth investigation to systematically explore the capability of LLMs in understanding and applying event relation logic. |
Meiqi Chen; Yubo Ma; Kaitao Song; Yixin Cao; Yan Zhang; Dongsheng Li; |
436 | Answer Is All You Need: Instruction-following Text Embedding Via Answering The Question Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work aims to build a text embedder that can capture characteristics of texts specified by user instructions clarifying the similarity criterion. |
Letian Peng; Yuwei Zhang; Zilong Wang; Jayanth Srinivasa; Gaowen Liu; Zihan Wang; Jingbo Shang; |
437 | Transparent and Scrutable Recommendations Using Natural Language User Profiles Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: With a systematic analysis into the effect of updating user profiles and system prompts, we show the advantage of our approach in easier adjustment of user preferences and a greater autonomy over users' received recommendations. |
Jerome Ramos; Hossein A. Rahmani; Xi Wang; Xiao Fu; Aldo Lipani; |
438 | OPEx: A Component-Wise Analysis of LLM-Centric Agents in Embodied Instruction Following Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite these efforts, there exists a lack of a unified understanding regarding the impact of various components, ranging from visual perception to action execution, on task performance. To address this gap, we introduce OPEx, a comprehensive framework that delineates the core components essential for solving embodied learning tasks: Observer, Planner, and Executor. |
Haochen Shi; Zhiyuan Sun; Xingdi Yuan; Marc-Alexandre Côté; Bang Liu;
439 | On The Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To bridge this gap, we formalize CoT reasoning in a probabilistic setting. We present several results on the representational capacity of recurrent and transformer LMs with CoT reasoning, showing that they can represent the same family of distributions over strings as probabilistic Turing machines. |
Franz Nowak; Anej Svete; Alexandra Butoi; Ryan Cotterell; |
440 | To Distill or Not to Distill? On The Robustness of Robust Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: 9% WER) outperforms all other models. To gain more insight into the poor performance of these models on dialectal data, we conduct an error analysis and report the main types of errors the different models tend to make. |
Abdul Waheed; Karima Kadaoui; Muhammad Abdul-Mageed; |
441 | A Deep Dive Into The Trade-Offs of Parameter-Efficient Preference Alignment Techniques Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, there has not yet been an extensive study of their effect on downstream performance. To address this gap, we conduct an in-depth investigation of the impact of popular choices for three crucial axes: (i) the alignment dataset (HH-RLHF and BeaverTails), (ii) the alignment technique (SFT and DPO), and (iii) the model (LLaMA-1, Vicuna-v1. |
Megh Thakkar; Quentin Fournier; Matthew Riemer; Pin-Yu Chen; Amal Zouaq; Payel Das; Sarath Chandar; |
442 | QueryAgent: A Reliable and Efficient Reasoning Framework with Environmental Feedback Based Self-Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we find existing methods fall short in terms of reliability and efficiency when hallucinations are encountered. In this paper, we address these challenges with a framework called QueryAgent, which solves a question step-by-step and performs stepwise self-correction. |
Xiang Huang; Sitao Cheng; Shanshan Huang; Jiayu Shen; Yong Xu; Chaoyun Zhang; Yuzhong Qu; |
443 | AoE: Angle-optimized Embeddings for Semantic Textual Similarity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the cosine function has saturation zones that yield vanishing gradients and hinder the learning of subtle semantic differences in text embeddings. To address this issue, we propose a novel Angle-optimized Embedding model, AoE. |
Xianming Li; Jing Li; |
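The saturation problem is visible directly from the derivative of the cosine; the note below is a standard calculus observation, not the paper's derivation:

```latex
\frac{\partial}{\partial \theta} \cos\theta = -\sin\theta \;\longrightarrow\; 0
\quad \text{as } \theta \to 0 \text{ or } \theta \to \pi
```

Pairs whose similarity is already near $\pm 1$ therefore receive almost no gradient signal, which is the motivation for optimizing in angle space instead.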
444 | Towards Real-world Scenario: Imbalanced New Intent Discovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To bridge the gap, our work introduces the imbalanced new intent discovery (i-NID) task, which seeks to identify familiar and novel intent categories within long-tailed distributions. |
Shun Zhang; Yan Chaoran; Jian Yang; Jiaheng Liu; Ying Mo; Jiaqi Bai; Tongliang Li; Zhoujun Li; |
445 | Using Natural Language Explanations to Improve Robustness of In-context Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate whether augmenting ICL with natural language explanations (NLEs) improves the robustness of LLMs on adversarial datasets covering natural language inference and paraphrasing identification. |
Xuanli He; Yuxiang Wu; Oana-Maria Camburu; Pasquale Minervini; Pontus Stenetorp; |
446 | Revealing The Parametric Knowledge of Language Models: A Unified Framework for Attribution Methods Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our study introduces a novel evaluation framework to quantify and compare the knowledge revealed by IA and NA. |
Haeun Yu; Pepa Atanasova; Isabelle Augenstein; |
447 | Stealthy Attack on Large Language Model Based Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we reveal that the introduction of LLMs into recommendation models presents new security vulnerabilities due to their emphasis on the textual content of items. |
Jinghao Zhang; Yuting Liu; Qiang Liu; Shu Wu; Guibing Guo; Liang Wang; |
448 | IL-TUR: Benchmark for Indian Legal Text Understanding and Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present baseline models (including LLM-based) for each task, outlining the gap between models and the ground truth. |
Abhinav Joshi; Shounak Paul; Akshat Sharma; Pawan Goyal; Saptarshi Ghosh; Ashutosh Modi; |
449 | Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Subsequently, we propose a novel RAG approach known as Retrieval-augmented Adaptive Adversarial Training (RAAT). |
Feiteng Fang; Yuelin Bai; Shiwen Ni; Min Yang; Xiaojun Chen; Ruifeng Xu; |
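RAAT's training loop is adaptive in that each update targets whichever retrieval-noise condition currently hurts the model most. A simplified sketch under assumed inputs (`model` is any Hugging Face-style model exposing a `.loss`; the published method also adds an auxiliary noise-classification objective):

```python
import torch

def raat_step(model, variants, optimizer):
    # `variants` holds the same QA batch under different retrieval
    # contexts, e.g. golden evidence, relevant noise, irrelevant noise,
    # counterfactual noise. Train on the currently hardest (max-loss) one.
    with torch.no_grad():
        losses = [float(model(**v).loss) for v in variants]
    worst = max(range(len(variants)), key=losses.__getitem__)
    loss = model(**variants[worst]).loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return worst, float(loss)
```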
450 | WebCiteS: Attributed Query-Focused Summarization on Chinese Web Search Results with Citations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we formulate the task of attributed query-focused summarization (AQFS) and present WebCiteS, a Chinese dataset featuring 7k human-annotated summaries with citations. |
Haolin Deng; Chang Wang; Li Xin; Dezhang Yuan; Junlang Zhan; Tian Zhou; Jin Ma; Jun Gao; Ruifeng Xu; |
451 | Spiral of Silence: How Is Large Language Model Killing Information Retrieval? A Case Study on Open Domain Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we construct and iteratively run a simulation pipeline to deeply investigate the short-term and long-term effects of LLM text on RAG systems. |
Xiaoyang Chen; Ben He; Hongyu Lin; Xianpei Han; Tianshu Wang; Boxi Cao; Le Sun; Yingfei Sun; |
452 | TTM-RE: Memory-Augmented Document-Level Relation Extraction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To unlock the full potential of large-scale noisy training data for document-level relation extraction, we propose TTM-RE, a novel approach that integrates a trainable memory module, known as the Token Turing Machine, with a noisy-robust loss function that accounts for the positive-unlabeled setting. |
Chufan Gao; Xuan Wang; Jimeng Sun; |
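For the positive-unlabeled side of TTM-RE, the standard non-negative PU risk estimator (Kiryo et al., 2017) makes the idea concrete: unlabeled relation candidates are treated as a prior-weighted mix of positives and negatives, with the unlabeled term clamped at zero. Treat this as a stand-in sketch, not the paper's exact loss:

```python
import torch
import torch.nn.functional as F

def nn_pu_risk(pos_scores, unl_scores, prior):
    # Non-negative PU risk with a logistic surrogate loss. The clamp
    # stops the unlabeled term from going negative and overfitting.
    r_pos = F.softplus(-pos_scores).mean()        # positives scored positive
    r_pos_as_neg = F.softplus(pos_scores).mean()  # positives scored negative
    r_unl_as_neg = F.softplus(unl_scores).mean()  # unlabeled scored negative
    return prior * r_pos + torch.clamp(r_unl_as_neg - prior * r_pos_as_neg, min=0.0)

print(nn_pu_risk(torch.randn(32), torch.randn(256), prior=0.05))
```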
453 | Blinded By Generated Contexts: How Language Models Merge Generated and Retrieved Contexts When Knowledge Conflicts? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While auxiliary information has become a key to enhancing Large Language Models (LLMs), relatively little is known about how LLMs merge these contexts, specifically contexts generated by LLMs and those retrieved from external sources. To investigate this, we formulate a systematic framework to identify whether LLMs' responses are attributed to either generated or retrieved contexts. |
Hexiang Tan; Fei Sun; Wanli Yang; Yuanzhuo Wang; Qi Cao; Xueqi Cheng; |
454 | Understanding Retrieval Robustness for Retrieval-augmented Image Captioning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we analyze the robustness of SmallCap, a retrieval-augmented captioning model. |
Wenyan Li; Jiaang Li; Rita Ramos; Raphael Tang; Desmond Elliott; |
455 | Estimating The Level of Dialectness Predicts Inter-annotator Agreement in Multi-dialect Arabic Datasets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We test this by analyzing the relation between ALDi scores and the annotators' agreement, on 15 public datasets having raw individual sample annotations for various sentence-classification tasks. We find strong evidence supporting our hypothesis for 11 of them. |
Amr Keleg; Walid Magdy; Sharon Goldwater; |
457 | SirLLM: Streaming Infinite Retentive LLM Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recent efforts have employed streaming inputs to alleviate the pressure of excessively long text inputs, but this approach can significantly impair the model's long-term memory capabilities. Motivated by this challenge, we introduce Streaming Infinite Retentive LLM (SirLLM), which allows LLMs to maintain longer memory during infinite-length dialogues without the need for fine-tuning. |
Yao Yao; Zuchao Li; Hai Zhao; |
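SirLLM's retention criterion is token entropy: low-entropy filler can be dropped from the key-value cache while high-entropy tokens are kept as long-term memory. A toy sketch of the selection step only (the released method also keeps a recent window and decays stored entropies):

```python
import torch

def token_entropy(logits: torch.Tensor) -> torch.Tensor:
    # Entropy of the next-token distribution at every position.
    p = torch.softmax(logits, dim=-1)
    return -(p * torch.log(p + 1e-9)).sum(-1)

def select_memory(logits: torch.Tensor, budget: int) -> torch.Tensor:
    # Retain the `budget` highest-entropy positions in the long-term
    # KV cache; near-deterministic tokens are evicted first.
    ent = token_entropy(logits)                       # [seq_len]
    k = min(budget, ent.numel())
    return torch.topk(ent, k).indices.sort().values   # positions to keep

logits = torch.randn(128, 32000)                      # fake per-token logits
print(select_memory(logits, budget=16))
```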
458 | MAPO: Advancing Multilingual Reasoning Through Multilingual-Alignment-as-Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To enhance reasoning abilities in non-dominant languages, we propose a Multilingual-Alignment-as-Preference Optimization framework (MAPO) to align the reasoning processes in other languages with the dominant language. |
Shuaijie She; Wei Zou; Shujian Huang; Wenhao Zhu; Xiang Liu; Xiang Geng; Jiajun Chen; |
459 | Open Grounded Planning: Challenges and Benchmark Construction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new planning task: open grounded planning. |
Shiguang Guo; Ziliang Deng; Hongyu Lin; Yaojie Lu; Xianpei Han; Le Sun; |
460 | WARDEN: Multi-Directional Backdoor Watermarks for Embedding-as-a-Service Copyright Protection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Through the analysis of the recent watermarking strategy for EaaS, EmbMarker, we design a novel CSE (Clustering, Selection, Elimination) attack that removes the backdoor watermark while maintaining the high utility of embeddings, indicating that the previous watermarking approach can be breached. In response to this new threat, we propose a new protocol to make the removal of watermarks more challenging by incorporating multiple possible watermark directions. |
Anudeex Shetty; Yue Teng; Ke He; Qiongkai Xu; |
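WARDEN's key move is replacing the single secret direction of EmbMarker with several, so a clustering-style CSE attack must find and strip all of them before the watermark disappears. A toy sketch with assumed trigger counting and mixing weights, simplified from the EmbMarker-style protocol:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_dirs = 768, 4
# Multiple secret watermark directions instead of a single one.
watermarks = rng.standard_normal((n_dirs, dim))
watermarks /= np.linalg.norm(watermarks, axis=1, keepdims=True)

def watermark(emb: np.ndarray, trigger_count: int, max_triggers: int = 4) -> np.ndarray:
    # The more trigger tokens a text contains, the more strongly its
    # embedding is pulled toward one of the watermark directions.
    w = min(trigger_count, max_triggers) / max_triggers
    d = watermarks[rng.integers(n_dirs)]
    out = (1 - w) * emb + w * d
    return out / np.linalg.norm(out)

emb = rng.standard_normal(dim)
emb /= np.linalg.norm(emb)
# Verification: a suspect embedding should align with *some* direction.
print(np.max(watermarks @ watermark(emb, trigger_count=2)))
```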
461 | Simpson's Paradox and The Accuracy-Fluency Tradeoff in Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A good translation should be faithful to the source and should respect the norms of the target language. We address a theoretical puzzle about the relationship between these objectives. |
Zheng Wei Lim; Ekaterina Vylomova; Trevor Cohn; Charles Kemp; |
462 | Dissecting Human and LLM Preferences Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we dissect the preferences of humans and 32 different LLMs to understand their quantitative composition, using annotations from real-world user-model conversations for a fine-grained, scenario-wise analysis. |
Junlong Li; Fan Zhou; Shichao Sun; Yikai Zhang; Hai Zhao; Pengfei Liu; |
463 | SciMON: Scientific Inspiration Machines Optimized for Novelty Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present SciMON, a modeling framework that uses retrieval of "inspirations" from past scientific papers, and explicitly optimizes for novelty by iteratively comparing to prior papers and updating idea suggestions until sufficient novelty is achieved. |
Qingyun Wang; Doug Downey; Heng Ji; Tom Hope; |
464 | GrowOVER: How Can LLMs Adapt to Growing Real-World Knowledge? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our research indicates that retrieval-augmented language models (RaLMs) struggle with knowledge that has not been trained on or recently updated. Consequently, we introduce a novel retrieval-interactive language model framework, where the language model evaluates and reflects on its answers for further re-retrieval. |
Dayoon Ko; Jinyoung Kim; Hahyeon Choi; Gunhee Kim; |
465 | A Unified Temporal Knowledge Graph Reasoning Model Towards Interpolation and Extrapolation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, this paper proposes an original Temporal PAth-based Reasoning (TPAR) model for both the interpolation and extrapolation reasoning settings. |
Kai Chen; Ye Wang; Yitong Li; Aiping Li; Han Yu; Xin Song; |
466 | Time Sensitive Knowledge Editing Through Efficient Finetuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore Parameter-Efficient Fine-Tuning (PEFT) techniques as an alternative for KE. |
Xiou Ge; Ali Mousavi; Edouard Grave; Armand Joulin; Kun Qian; Benjamin Han; Mostafa Arefiyan; Yunyao Li; |
467 | Exploring Alignment in Shared Cross-lingual Spaces Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce two metrics CALIGN and COLAP aimed at quantifying these aspects, enabling a deeper exploration of multilingual embeddings. |
Basel Mousi; Nadir Durrani; Fahim Dalvi; Majd Hawasly; Ahmed Abdelali; |
468 | Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we posit that the distribution gap between task datasets and the LLMs serves as the primary underlying cause. |
Zhaorui Yang; Tianyu Pang; Haozhe Feng; Han Wang; Wei Chen; Minfeng Zhu; Qian Liu; |
469 | DeVAn: Dense Video Annotation for Video-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a novel human-annotated dataset for evaluating the ability of visual-language models to generate both short and long descriptions for real-world video clips, termed DeVAn (Dense Video Annotation). |
Tingkai Liu; Yunzhe Tao; Haogeng Liu; Qihang Fang; Ding Zhou; Huaibo Huang; Ran He; Hongxia Yang; |
470 | From Moments to Milestones: Incremental Timeline Summarization Leveraging Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Prior research typically focuses on either event or topic timeline summarization, neglecting the potential synergy of these two forms. In this study, we bridge this gap by introducing a novel approach that leverages large language models (LLMs) for generating both event and topic timelines. |
Qisheng Hu; Geonsik Moon; Hwee Tou Ng; |
471 | Fora: A Corpus and Framework for The Study of Facilitated Dialogue Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Fora, a unique collection of annotated facilitated dialogues. |
Hope Schroeder; Deb Roy; Jad Kabbara; |
472 | DocFinQA: A Long-Context Financial Reasoning Dataset Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Financial professionals often interact with documents spanning hundreds of pages, but most financial research datasets only deal with short excerpts from these documents. To address this, we introduce a long-document financial QA task. |
Varshini Reddy; Rik Koncel-Kedziorski; Viet Lai; Michael Krumdick; Charles Lovering; Chris Tanner; |
473 | REANO: Optimising Retrieval-Augmented Reader Models Through Knowledge Graph Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, to capture the dependencies between passages while tackling the issue of incompleteness in existing KGs, we propose to enhance the retrieval-augmented reader model with a knowledge graph generation module (REANO). |
Jinyuan Fang; Zaiqiao Meng; Craig MacDonald; |
474 | Exploiting Intrinsic Multilateral Logical Rules for Weakly Supervised Natural Language Video Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel plug-and-play method, Intrinsic Multilateral Logical Rules, namely IMLR, to exploit intrinsic temporal relations and logical rules for WS-NLVL. |
Zhe Xu; Kun Wei; Xu Yang; Cheng Deng; |
475 | DetermLR: Augmenting LLM-based Logical Reasoning from Indeterminacy to Determinacy Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose DetermLR, a novel perspective that rethinks the reasoning process as an evolution from indeterminacy to determinacy. |
Hongda Sun; Weikai Xu; Wei Liu; Jian Luan; Bin Wang; Shuo Shang; Ji-Rong Wen; Rui Yan; |
476 | Tuning Large Multimodal Models for Videos Using Reinforcement Learning from AI Feedback Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This discrepancy often results in alignments that poorly ground the video content. To address this, we present a novel alignment strategy that employs a multimodal AI system equipped with Reinforcement Learning from AI Feedback (RLAIF), providing self-preference feedback to refine itself and facilitating the alignment of video and text modalities. |
Daechul Ahn; Yura Choi; Youngjae Yu; Dongyeop Kang; Jonghyun Choi; |
477 | Learn from Failure: Fine-tuning LLMs with Trial-and-Error Data for Intuitionistic Propositional Logic Proving Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we demonstrate the benefit of training models that additionally learn from failed search paths. |
Chenyang An; Zhibo Chen; Qihao Ye; Emily First; Letian Peng; Jiayun Zhang; Zihan Wang; Sorin Lerner; Jingbo Shang; |
478 | Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While Reinforcement Learning from Human Feedback (RLHF) shows promise in aligning LLMs, its reliance on scalar rewards often limits its ability to capture diverse user preferences in real-world applications. To address this limitation, we introduce the Directional Preference Alignment (DPA) framework. |
Haoxiang Wang; Yong Lin; Wei Xiong; Rui Yang; Shizhe Diao; Shuang Qiu; Han Zhao; Tong Zhang; |
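DPA represents a user preference as a unit vector over reward objectives, so the trade-off can be steered arithmetically at inference time rather than collapsed into one scalar. A toy sketch using the paper's helpfulness-verbosity example (the reward numbers are made up):

```python
import numpy as np

# Multi-objective reward for one response: [helpfulness, verbosity].
r = np.array([0.8, 0.3])

def directional_reward(r: np.ndarray, angle_deg: float) -> float:
    # A preference is a unit vector; its angle controls how much the
    # user is willing to trade helpfulness for verbosity.
    a = np.deg2rad(angle_deg)
    v = np.array([np.cos(a), np.sin(a)])
    return float(v @ r)

print(directional_reward(r, 0))   # cares only about helpfulness
print(directional_reward(r, 45))  # balances the two objectives
```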
479 | Enhancing EEG-to-Text Decoding Through Transferable Representations from Pre-trained Contrastive EEG-Text Masked Autoencoder Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, EEG-based language decoding is still in its nascent stages, facing several technical issues such as: 1) Absence of a hybrid strategy that can effectively integrate cross-modality (between EEG and text) self-learning with intra-modality self-reconstruction of EEG features or textual sequences; 2) Under-utilization of large language models (LLMs) to enhance EEG-based language decoding. To address the above issues, we propose the Contrastive EEG-Text Masked Autoencoder (CET-MAE), a novel model that orchestrates compound self-supervised learning across and within EEG and text through a dedicated multi-stream encoder. |
Jiaqi Wang; Zhenxi Song; Zhengyu Ma; Xipeng Qiu; Min Zhang; Zhiguo Zhang; |
480 | SwapMoE: Serving Off-the-shelf MoE-based Large Language Models with Tunable Memory Budget Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce SwapMoE, a framework for efficient serving of MoE-based large language models with tunable memory budgets. |
Rui Kong; Yuanchun Li; Qingtian Feng; Weijun Wang; Xiaozhou Ye; Ye Ouyang; Linghe Kong; Yunxin Liu; |
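In the simplest view, SwapMoE's serving idea reduces to keeping a budgeted resident set of "important" experts in fast memory and swapping the rest out. The sketch below is a toy LRU stand-in for that policy; the actual system updates the resident set asynchronously from profiled expert importance so swapping never blocks inference:

```python
import random

class ExpertCache:
    # Resident-set manager for one MoE layer: at most `budget` experts
    # stay "on GPU"; the least-recently-used one is swapped out on a miss.
    def __init__(self, budget: int):
        self.budget = budget
        self.resident = []            # expert ids currently in fast memory

    def fetch(self, expert_id: int) -> bool:
        hit = expert_id in self.resident
        if hit:
            self.resident.remove(expert_id)   # refresh recency
        elif len(self.resident) >= self.budget:
            self.resident.pop(0)              # swap LRU expert out
        self.resident.append(expert_id)
        return hit

cache = ExpertCache(budget=8)
# Skewed routing: a few "hot" experts dominate, as in real MoE traces.
ids = [random.choice([1, 2, 3, 1, 2, 3, random.randrange(64)]) for _ in range(1000)]
hits = sum(cache.fetch(i) for i in ids)
print(f"hit rate: {hits / 1000:.2f}")
```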
481 | A Ship of Theseus: Curious Cases of Paraphrasing in LLM-Generated Texts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, since Large Language Models (LLMs) have demonstrated remarkable proficiency in both the generation of original content and the modification of human-authored texts, a pivotal question emerges concerning the determination of authorship in instances where LLMs or similar paraphrasing tools are employed to rephrase the text, i.e., whether authorship should be attributed to the original human author or the AI-powered tool. Therefore, we embark on a philosophical voyage through the seas of language and authorship to unravel this intricate puzzle. |
Nafis Irtiza Tripto; Saranya Venkatraman; Dominik Macko; Robert Moro; Ivan Srba; Adaku Uchendu; Thai Le; Dongwon Lee; |
482 | On The Role of Long-tail Knowledge in Retrieval Augmented Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we suggest that long-tail knowledge is crucial for RAG as LLMs have already remembered common world knowledge during large-scale pre-training. |
Dongyang Li; Junbing Yan; Taolin Zhang; Chengyu Wang; Xiaofeng He; Longtao Huang; Hui Xue; Jun Huang;
483 | ARL2: Aligning Retrievers with Black-box Large Language Models Via Self-guided Adaptive Relevance Labeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing retrievers are often misaligned with LLMs due to separate training processes and the inherent black-box nature of LLMs. To address this challenge, we propose ARL2, a retriever learning technique that harnesses LLMs as labelers. |
LingXi Zhang; Yue Yu; Kuan Wang; Chao Zhang; |
484 | Virtual Compiler Is All You Need For Assembly Code Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores training a Large Language Model (LLM) to emulate a general compiler. |
Zeyu Gao; Hao Wang; Yuanda Wang; Chao Zhang; |
485 | DIALECTBENCH: An NLP Benchmark for Dialects, Varieties, and Closely-Related Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most NLP benchmarks are limited to standard language varieties. To fill this gap, we propose DIALECTBENCH, the first-ever large-scale benchmark for NLP on varieties, which aggregates an extensive set of task-varied varieties datasets (10 text-level tasks covering 281 varieties). |
Fahim Faisal; Orevaoghene Ahia; Aarohi Srivastava; Kabir Ahuja; David Chiang; Yulia Tsvetkov; Antonios Anastasopoulos; |
486 | Attribute First, Then Generate: Locally-attributable Grounded Text Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a locally-attributable text generation approach, prioritizing concise attributions. |
Aviv Slobodkin; Eran Hirsch; Arie Cattan; Tal Schuster; Ido Dagan; |
487 | Model Composition for Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new paradigm through the model composition of existing MLLMs to create a new model that retains the modal understanding capabilities of each original model. |
Chi Chen; Yiyang Du; Zheng Fang; Ziyue Wang; Fuwen Luo; Peng Li; Ming Yan; Ji Zhang; Fei Huang; Maosong Sun; Yang Liu; |
488 | On The Impact of Calibration Data in Post-training Quantization and Pruning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present the first extensive empirical study on the effect of calibration data upon LLM performance. |
Miles Williams; Nikolaos Aletras; |
489 | Reducing Privacy Risks in Online Self-Disclosures with Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we take the initiative to protect the user-side privacy associated with online self-disclosure through detection and abstraction. |
Yao Dou; Isadora Krsek; Tarek Naous; Anubha Kabra; Sauvik Das; Alan Ritter; Wei Xu; |
490 | MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In response, we introduce Modular Story Premise Synthesis (MoPS) which breaks down story premises into modules like background and persona for automated design and generation. |
Yan Ma; Yu Qiao; Pengfei Liu; |
491 | Meta-Task Prompting Elicits Embeddings from Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a new unsupervised text embedding method, Meta-Task Prompting with Explicit One-Word Limitation (MetaEOL), for generating high-quality sentence embeddings from Large Language Models (LLMs) without the need for model fine-tuning. |
Yibin Lei; Di Wu; Tianyi Zhou; Tao Shen; Yu Cao; Chongyang Tao; Andrew Yates; |
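MetaEOL's trick is to pose several meta-tasks that each force the model to compress the sentence into a single next word, then read out (and average) the hidden state at that position. A minimal sketch; the model choice and the two templates below are illustrative stand-ins, not the paper's exact prompt set:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")   # hypothetical model choice
lm = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)

def embed(sentence: str, template: str) -> torch.Tensor:
    # The one-word constraint pushes the LM to condense the sentence's
    # meaning into the next-token position, whose hidden state we read out.
    ids = tok(template.format(s=sentence), return_tensors="pt")
    with torch.no_grad():
        out = lm(**ids)
    return out.hidden_states[-1][0, -1]       # last layer, last token

templates = [
    'This sentence: "{s}" means in one word: "',
    'This sentence: "{s}" expresses, in one word, the topic: "',
]
emb = torch.stack([embed("A man is playing guitar.", t) for t in templates]).mean(0)
print(emb.shape)
```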
492 | EasyGen: Easing Multimodal Generation with BiDiffuser and LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present EasyGen, an efficient model designed to enhance multimodal understanding and generation by harnessing the capabilities of diffusion models and large language models (LLMs). |
Xiangyu Zhao; Bo Liu; Qijiong Liu; Guangyuan Shi; Xiao-Ming Wu; |
493 | Language Models Can Exploit Cross-Task In-context Learning for Data-Scarce Novel Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper investigates whether LLMs can generalize from labeled examples of predefined tasks to novel tasks. |
Anwoy Chatterjee; Eshaan Tanwar; Subhabrata Dutta; Tanmoy Chakraborty; |
494 | Fortify The Shortest Stave in Attention: Enhancing Context Awareness of Large Language Models for Effective Tool Use Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we demonstrate that an inherent waveform pattern in the attention allocation of large language models (LLMs) significantly affects their performance in tasks demanding a high degree of context awareness, such as utilizing LLMs for tool-use. |
Yuhan Chen; Ang Lv; Ting-En Lin; Changyu Chen; Yuchuan Wu; Fei Huang; Yongbin Li; Rui Yan; |
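The diagnosis here is that the aggregate attention a context position receives forms a waveform, and tool descriptions that land in its troughs tend to be overlooked. A toy sketch of locating such troughs from one layer's attention maps; the paper's remedy, roughly, runs parallel streams with different RoPE bases so their troughs do not coincide:

```python
import torch

def attention_troughs(attn: torch.Tensor) -> torch.Tensor:
    # attn: [heads, query_len, key_len] from one layer. Sum the attention
    # each context position receives; local minima of this waveform mark
    # positions (e.g. a tool's description) the model tends to overlook.
    received = attn.sum(dim=(0, 1))                    # [key_len]
    mid = received[1:-1]
    is_trough = (mid < received[:-2]) & (mid < received[2:])
    return is_trough.nonzero().squeeze(-1) + 1

attn = torch.rand(12, 64, 64)                          # fake attention maps
print(attention_troughs(attn)[:10])                    # blind-spot candidates
```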
495 | MELA: Multilingual Evaluation of Linguistic Acceptability Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present the largest benchmark to date on linguistic acceptability: Multilingual Evaluation of Linguistic Acceptability (MELA), with 46K samples covering 10 languages from a diverse set of language families. |
Ziyin Zhang; Yikang Liu; Weifang Huang; Junyu Mao; Rui Wang; Hai Hu; |
496 | Intrinsic Task-based Evaluation for Referring Expression Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we argue that this limitation could stem from the use of a purely ratings-based human evaluation (which is a common practice in Natural Language Generation). To investigate these issues, we propose an intrinsic task-based evaluation for REG models, in which, in addition to rating the quality of REs, participants were asked to accomplish two meta-level tasks. |
Guanyi Chen; Fahime Same; Kees Van Deemter; |
497 | Multimodal Instruction Tuning with Conditional Mixture of LoRA Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, applying LoRA in multimodal instruction tuning presents the challenge of task interference, which leads to performance degradation, especially when dealing with a broad array of multimodal tasks. To address this, this paper introduces a novel approach that integrates multimodal instruction tuning with Conditional Mixture-of-LoRA (MixLoRA). |
Ying Shen; Zhiyang Xu; Qifan Wang; Yu Cheng; Wenpeng Yin; Lifu Huang; |
498 | Meta-Tuning LLMs to Leverage Lexical Knowledge for Generalizable Language Style Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show that current large language models struggle to capture some language styles without fine-tuning. |
Ruohao Guo; Wei Xu; Alan Ritter; |
499 | T-Eval: Evaluating The Tool Utilization Capability of Large Language Models Step By Step Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In contrast to previous works that evaluate models holistically, we comprehensively decompose the tool utilization into multiple sub-processes, including instruction following, planning, reasoning, retrieval, understanding, and review. Based on that, we further introduce T-Eval to evaluate the tool-utilization capability step by step. |
Zehui Chen; Weihua Du; Wenwei Zhang; Kuikun Liu; Jiangning Liu; Miao Zheng; Jingming Zhuo; Songyang Zhang; Dahua Lin; Kai Chen; Feng Zhao; |
500 | Disambiguate Words Like Composing Them: A Morphology-Informed Approach to Enhance Chinese Word Sense Disambiguation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we are motivated to enhance Chinese WSD with full morphological knowledge, including both word-formations and morphemes. |
Yue Wang; Qiliang Liang; Yaqi Yin; Hansi Wang; Yang Liu; |
This table only includes 500 papers selected by our daily digest algorithm. To continue with the full list (>900 papers), please visit Paper Digest: ACL-2024 (Full List).