Paper Digest: EMNLP 2024 Papers & Highlights
Note: EMNLP 2024 accepted more than 1,300 papers; this page includes only 500 of them, selected by our daily paper digest algorithm. Interested users can choose to read all 1,300 EMNLP-2024 papers on a separate page.
To search or review papers within EMNLP-2024 related to a specific topic, please use the search by venue (EMNLP-2024), review by venue (EMNLP-2024) and question answering by venue (EMNLP-2024) services. To browse papers by author, here is a list of all authors (EMNLP-2024). You may also like to explore our “Best Paper” Digest (EMNLP), which lists the most influential EMNLP papers since 1996.
This list is created by the Paper Digest Team. Experience the cutting-edge capabilities of Paper Digest, an innovative AI-powered research platform that empowers you to read, write, get answers and review.
Try us today and unlock the full potential of our services for free!
TABLE 1: Paper Digest: EMNLP 2024 Papers & Highlights
**1. Sample Design Engineering: An Empirical Study on Designing Better Fine-Tuning Samples for Information Extraction with LLMs**
Highlight: This paper introduces **Sample Design Engineering** (SDE), a methodical approach to enhancing LLMs’ post-tuning performance on IE tasks by refining input, output, and reasoning designs.
Authors: Biyang Guo; He Wang; Wenyilin Xiao; Hong Chen; ZhuXin Lee; Songqiao Han; Hailiang Huang
**2. Breaking The Curse of Multilinguality with Cross-lingual Expert Language Models**
Highlight: Despite their popularity in non-English NLP, multilingual language models often underperform monolingual ones due to inter-language competition for model parameters. We propose Cross-lingual Expert Language Models (X-ELM), which mitigate this competition by independently training language models on subsets of the multilingual corpus.
Authors: Terra Blevins; Tomasz Limisiewicz; Suchin Gururangan; Margaret Li; Hila Gonen; Noah A. Smith; Luke Zettlemoyer
**3. VGBench: A Comprehensive Benchmark of Vector Graphics Understanding and Generation for Large Language Models**
Highlight: We propose VGBench, a comprehensive benchmark for LLMs on handling vector graphics through diverse aspects, including (a) both visual understanding and generation, (b) evaluation of various vector graphics formats, (c) diverse question types, (d) a wide range of prompting techniques, (e) multiple LLMs and (f) comparison with VLMs on rasterized representations.
Authors: Bocheng Zou; Mu Cai; Jianrui Zhang; Yong Jae Lee
**4. A Survey on In-context Learning**
Highlight: In this paper, we aim to survey and summarize the progress and challenges of ICL.
Authors: Qingxiu Dong; Lei Li; Damai Dai; Ce Zheng; Jingyuan Ma; Rui Li; Heming Xia; Jingjing Xu; Zhiyong Wu; Baobao Chang; Xu Sun; Lei Li; Zhifang Sui
**5. Grounding Language in Multi-Perspective Referential Communication**
Highlight: We introduce a task and dataset for referring expression generation and comprehension in multi-agent embodied environments.
Authors: Zineng Tang; Lingjun Mao; Alane Suhr
**6. AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?**
Highlight: Language agents, built on top of language models (LMs), are systems that can interact with complex environments, such as the open web. In this work, we examine whether such agents can perform realistic and time-consuming tasks on the web, e.g., monitoring real-estate markets or locating relevant nearby businesses.
Authors: Ori Yoran; Samuel Joseph Amouyal; Chaitanya Malaviya; Ben Bogin; Ofir Press; Jonathan Berant
**7. AMR-Evol: Adaptive Modular Response Evolution Elicits Better Knowledge Distillation for Large Language Models in Code Generation**
Highlight: To this end, our study introduces the Adaptive Modular Response Evolution (AMR-Evol) framework, which employs a two-stage process to refine response distillation.
Authors: Ziyang Luo; Xin Li; Hongzhan Lin; Jing Ma; Lidong Bing
**8. Instruction Pre-Training: Language Models Are Supervised Multitask Learners**
Highlight: In this paper, we explore supervised multitask pre-training by proposing Instruction Pre-training, a framework that scalably augments massive raw corpora with instruction-response pairs to pre-train LMs.
Authors: Daixuan Cheng; Yuxian Gu; Shaohan Huang; Junyu Bi; Minlie Huang; Furu Wei
**9. SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories**
Highlight: We introduce various evaluation measures to assess both task success and progress, utilizing gold solutions when available or approximations otherwise.
Authors: Ben Bogin; Kejuan Yang; Shashank Gupta; Kyle Richardson; Erin Bransom; Peter Clark; Ashish Sabharwal; Tushar Khot
**10. Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities**
Highlight: We introduce a simple method, whiteboard-of-thought prompting, to unlock the visual reasoning capabilities of multimodal large language models across modalities.
Authors: Sachit Menon; Richard Zemel; Carl Vondrick
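The idea is essentially "let the model draw its intermediate reasoning as an image, then look at what it drew." Below is a minimal sketch of such a loop, assuming a hypothetical `query_mllm` wrapper around a multimodal API; the prompts and tooling are illustrative, not the paper's actual templates.

```python
import os
import subprocess
import tempfile

def query_mllm(prompt: str, image_path: str | None = None) -> str:
    """Hypothetical multimodal LLM call; plug in a real API client here."""
    raise NotImplementedError("replace with your MLLM client")

def whiteboard_of_thought(question: str) -> str:
    # 1) Ask the model to "think" by drawing: emit matplotlib code that
    #    renders its intermediate reasoning to whiteboard.png.
    code = query_mllm(
        "Write Python (matplotlib) code that draws a visualization helping "
        f"you solve this problem, saving it to 'whiteboard.png':\n{question}"
    )
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "draw.py")
        with open(script, "w") as f:
            f.write(code)
        subprocess.run(["python", script], cwd=tmp, check=True)  # render it
        # 2) Feed the rendered whiteboard back to the model and answer visually.
        return query_mllm(f"Using the attached whiteboard, answer: {question}",
                          image_path=os.path.join(tmp, "whiteboard.png"))
```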
**11. Using Language Models to Disambiguate Lexical Choices in Translation**
Highlight: We work with native speakers of nine languages to create DTAiLS, a dataset of 1,377 sentence pairs that exhibit cross-lingual concept variation when translating from English.
Authors: Josh Barua; Sanjay Subramanian; Kayo Yin; Alane Suhr
**12. An Analysis of Multilingual FActScore**
Highlight: We introduce a new dataset for FActScore on texts generated by strong multilingual LLMs.
Authors: Vu Trong Kim; Michael Krumdick; Varshini Reddy; Franck Dernoncourt; Viet Dac Lai
**13. Calibrating Language Models with Adaptive Temperature Scaling**
Highlight: In this work, we introduce Adaptive Temperature Scaling (ATS), a post-hoc calibration method that predicts a temperature scaling parameter for each token prediction.
Authors: Johnathan Xie; Annie S Chen; Yoonho Lee; Eric Mitchell; Chelsea Finn
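To make the per-token idea concrete, here is a minimal PyTorch sketch of temperature scaling where the temperature is predicted from each token's hidden state. The single linear head, softplus parameterization, and shapes are assumptions for illustration; the paper's ATS head and training objective differ in detail.

```python
import torch
import torch.nn as nn

class TokenTemperatureHead(nn.Module):
    """Predicts one positive temperature per token from its hidden state."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, 1)

    def forward(self, hidden: torch.Tensor, logits: torch.Tensor) -> torch.Tensor:
        # softplus keeps the temperature positive; the epsilon avoids division by zero
        temp = nn.functional.softplus(self.proj(hidden)) + 1e-4  # (batch, seq, 1)
        return logits / temp  # calibrated logits; the argmax token is unchanged

# Usage: rescale logits before computing token probabilities.
hidden = torch.randn(2, 5, 768)    # e.g. last-layer hidden states
logits = torch.randn(2, 5, 32000)  # vocabulary logits
head = TokenTemperatureHead(768)
probs = head(hidden, logits).softmax(dim=-1)
```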
**14. Unveiling Multi-level and Multi-modal Semantic Representations in The Human Brain Using Large Language Models**
Highlight: In this study, we recorded brain activity using functional magnetic resonance imaging (fMRI) while participants viewed 8.
Authors: Yuko Nakagi; Takuya Matsuyama; Naoko Koide-Majima; Hiroto Q. Yamaguchi; Rieko Kubo; Shinji Nishimoto; Yu Takagi
**15. No Culture Left Behind: ArtELingo-28, A Benchmark of WikiArt with Captions in 28 Languages**
Highlight: Hence, we present ArtELingo-28, a vision-language benchmark that spans 28 languages and encompasses approximately 200,000 annotations (140 annotations per image).
Authors: Youssef Mohamed; Runjia Li; Ibrahim Said Ahmad; Kilichbek Haydarov; Philip Torr; Kenneth Church; Mohamed Elhoseiny
**16. Humans or LLMs As The Judge? A Study on Judgement Bias**
Highlight: In this paper, we propose a novel framework that is free from referencing groundtruth annotations for investigating **Misinformation Oversight Bias**, **Gender Bias**, **Authority Bias** and **Beauty Bias** on LLM and human judges.
Authors: Guiming Hardy Chen; Shunian Chen; Ziche Liu; Feng Jiang; Benyou Wang
**17. Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?**
Highlight: It is often conjectured that this can teach the model the behavior of hallucinating factually incorrect responses, as the model is trained to generate facts that are not grounded in its pre-existing knowledge. In this work, we study the impact of such exposure to new knowledge on the capability of the fine-tuned model to utilize its pre-existing knowledge.
Authors: Zorik Gekhman; Gal Yona; Roee Aharoni; Matan Eyal; Amir Feder; Roi Reichart; Jonathan Herzig
**18. One Thousand and One Pairs: A novel Challenge for Long-context Language Models**
Highlight: Synthetic long-context LLM benchmarks (e.g., needle-in-the-haystack) test only surface-level retrieval capabilities; but how well can long-context LLMs retrieve, synthesize, and reason over information across book-length inputs? We address this question by creating NoCha, a dataset of 1,001 minimally different pairs of true and false claims about 67 recently-published English fictional books, written by human readers of those books.
Authors: Marzena Karpinska; Katherine Thai; Kyle Lo; Tanya Goyal; Mohit Iyyer
**19. Evaluating Psychological Safety of Large Language Models**
Highlight: In this work, we designed unbiased prompts to systematically evaluate the psychological safety of large language models (LLMs).
Authors: Xingxuan Li; Yutong Li; Lin Qiu; Shafiq Joty; Lidong Bing
**20. Modeling Layout Reading Order As Ordering Relations for Visually-rich Document Understanding**
Highlight: However, we argue that this formulation does not adequately convey the complete reading order information in the layout, which may potentially lead to performance decline in downstream tasks. To address this issue, we propose to model the layout reading order as ordering relations over the set of layout elements, which have sufficient expressive capability for the complete reading order information.
Authors: Chong Zhang; Yi Tu; Yixi Zhao; Chenshu Yuan; Huan Chen; Yue Zhang; Mingxu Chai; Ya Guo; Huijia Zhu; Qi Zhang; Tao Gui
**21. Chain-of-Dictionary Prompting Elicits Translation in Large Language Models**
Highlight: To this end, we present a novel framework, CoD, Chain-of-Dictionary Prompting, which augments LLMs with prior knowledge from chains of multilingual dictionaries for a subset of input words to elicit translation abilities in LLMs.
Authors: Hongyuan Lu; Haoran Yang; Haoyang Huang; Dongdong Zhang; Wai Lam; Furu Wei
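As a rough illustration of the prompting pattern, the sketch below chains a few dictionary translations for covered source words into the prompt. The dictionary contents, pivot languages, and prompt wording are invented for the example, not the paper's exact templates.

```python
def chain_of_dictionary_prompt(source: str, target_lang: str,
                               dictionary: dict[str, dict[str, str]]) -> str:
    # Chain each covered source word through several pivot languages.
    chains = []
    for word, senses in dictionary.items():
        if word in source.lower():
            chains.append(" means ".join(
                [f'"{word}"'] + [f'"{t}" in {lang}' for lang, t in senses.items()]
            ))
    hints = "\n".join(chains)
    return (f"{hints}\n\nUsing the dictionary hints above, translate into "
            f"{target_lang}: {source}")

# Toy multilingual dictionary covering two input words.
dictionary = {
    "peace": {"French": "paix", "German": "Frieden", "Swahili": "amani"},
    "river": {"French": "rivière", "German": "Fluss", "Swahili": "mto"},
}
print(chain_of_dictionary_prompt("The river brings peace.", "Swahili", dictionary))
```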
**22. QUIK: Towards End-to-end 4-Bit Inference on Generative Large Language Models**
Highlight: In this paper, we address the general quantization problem, where both weights and activations should be quantized, which leads to computational improvements in general.
Authors: Saleh Ashkboos; Ilia Markov; Elias Frantar; Tingxuan Zhong; Xincheng Wang; Jie Ren; Torsten Hoefler; Dan Alistarh
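For readers unfamiliar with the setting, the sketch below shows plain symmetric round-to-nearest 4-bit quantization of a tensor, the naive baseline that methods like QUIK improve on; QUIK itself additionally handles outlier channels and uses GPTQ-style weight updates, none of which is reproduced here.

```python
import torch

def quantize_4bit(x: torch.Tensor, dim: int = -1):
    # Per-channel scale so the max magnitude maps to the int4 positive limit (7).
    scale = x.abs().amax(dim=dim, keepdim=True).clamp(min=1e-8) / 7.0
    q = torch.clamp(torch.round(x / scale), -8, 7)  # int4 range is [-8, 7]
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q * scale

w = torch.randn(4096, 4096)
q, s = quantize_4bit(w)
print("mean abs error:", (w - dequantize(q, s)).abs().mean().item())
```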
**23. MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents**
Highlight: In this work, we show how to build small fact-checking models that have GPT-4-level performance but for 400x lower cost.
Authors: Liyan Tang; Philippe Laban; Greg Durrett
**24. PostMark: A Robust Blackbox Watermark for Large Language Models**
Highlight: In this paper, we develop PostMark, a modular post-hoc watermarking procedure in which an input-dependent set of words (determined via a semantic embedding) is inserted into the text after the decoding process has completed.
Authors: Yapei Chang; Kalpesh Krishna; Amir Houmansadr; John Frederick Wieting; Mohit Iyyer
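A toy sketch of the pipeline shape: derive an input-dependent word list from a semantic embedding of the text, insert those words, and at detection time check how many expected words are present. Here `embed` is a deterministic toy stand-in for a real embedder, the vocabulary is tiny, and insertion is naive concatenation; the paper uses a real embedding model and an LLM to insert words fluently.

```python
import hashlib
import numpy as np

VOCAB = ["harbor", "lantern", "orchard", "granite", "meadow", "compass"]

def embed(text: str) -> np.ndarray:
    # Toy deterministic "embedding": hash -> seeded pseudo-random unit vector.
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")
    v = np.random.default_rng(seed).standard_normal(64)
    return v / np.linalg.norm(v)

def watermark_words(text: str, k: int = 3) -> list[str]:
    # Input-dependent word set: the k vocabulary words closest to the text embedding.
    t = embed(text)
    return sorted(VOCAB, key=lambda w: -float(embed(w) @ t))[:k]

def watermark(text: str, k: int = 3) -> str:
    # Naive insertion; the paper rewrites the text with an LLM instead.
    return text + " " + " ".join(watermark_words(text, k))

def detect(text: str, k: int = 3, threshold: float = 0.7) -> bool:
    # Caveat of this toy: inserting words shifts the embedding, so a real
    # detector must rely on the shift being small for long natural text.
    expected = watermark_words(text, k)
    return sum(w in text for w in expected) / k >= threshold
```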
**25. Encouraging Divergent Thinking in Large Language Models Through Multi-Agent Debate**
Highlight: To address the DoT problem, we propose a Multi-Agent Debate (MAD) framework, in which multiple agents express their arguments in the state of tit for tat and a judge manages the debate process to obtain a final solution.
Authors: Tian Liang; Zhiwei He; Wenxiang Jiao; Xing Wang; Yan Wang; Rui Wang; Yujiu Yang; Shuming Shi; Zhaopeng Tu
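The debate loop itself is simple to picture. Below is a minimal sketch: two agents argue in alternation and a judge decides when to stop. `chat` is a hypothetical single-turn LLM call, and the role prompts and stopping rule are simplified relative to the paper.

```python
def chat(system: str, prompt: str) -> str:
    """Hypothetical LLM call; replace with a real API client."""
    raise NotImplementedError

def multi_agent_debate(question: str, rounds: int = 3) -> str:
    history = f"Question: {question}"
    for r in range(rounds):
        # Tit-for-tat exchange: the negative side attacks the affirmative side.
        aff = chat("You argue FOR your current answer.", history)
        neg = chat("You attack the previous argument and argue an alternative.",
                   history + f"\nAffirmative: {aff}")
        history += f"\nRound {r + 1}:\nAffirmative: {aff}\nNegative: {neg}"
        # The judge either lets the debate continue or extracts a final solution.
        verdict = chat("You are the judge. Reply 'CONTINUE' or give the final answer.",
                       history)
        if verdict.strip() != "CONTINUE":
            return verdict
    return chat("You are the judge. Give the final answer.", history)
```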
**26. Teaching LLMs to Abstain Across Languages Via Multilingual Feedback**
Highlight: To this end, we propose strategies to enhance LLM abstention by learning from multilingual feedback, where LLMs self-reflect on proposed answers in one language by generating multiple feedback items in related languages: we show that this helps identify knowledge gaps across diverse languages, cultures, and communities.
Authors: Shangbin Feng; Weijia Shi; Yike Wang; Wenxuan Ding; Orevaoghene Ahia; Shuyue Stella Li; Vidhisha Balachandran; Sunayana Sitaram; Yulia Tsvetkov
**27. Modular Pluralism: Pluralistic Alignment Via Multi-LLM Collaboration**
Highlight: We propose Modular Pluralism, a modular framework based on multi-LLM collaboration for pluralistic alignment: it plugs into a base LLM a pool of smaller but specialized community LMs, where models collaborate in distinct modes to flexibly support three modes of pluralism: Overton, steerable, and distributional.
Authors: Shangbin Feng; Taylor Sorensen; Yuhan Liu; Jillian Fisher; Chan Young Park; Yejin Choi; Yulia Tsvetkov
**28. Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models**
Highlight: Additionally, they do not possess the ability to evaluate based on custom evaluation criteria, focusing instead on general attributes like helpfulness and harmlessness. To address these issues, we introduce Prometheus 2, a more powerful evaluator LM than its predecessor that closely mirrors human and GPT-4 judgements.
Authors: Seungone Kim; Juyoung Suk; Shayne Longpre; Bill Yuchen Lin; Jamin Shin; Sean Welleck; Graham Neubig; Moontae Lee; Kyungjae Lee; Minjoon Seo
**29. WPO: Enhancing RLHF with Weighted Preference Optimization**
Highlight: However, off-policy preference optimization often suffers from a distributional gap between the policy used for data collection and the target policy, leading to suboptimal optimization. In this paper, we propose a novel strategy to mitigate this problem by simulating on-policy learning with off-policy preference data.
Authors: Wenxuan Zhou; Ravi Agrawal; Shujian Zhang; Sathish Reddy Indurthi; Sanqiang Zhao; Kaiqiang Song; Silei Xu; Chenguang Zhu
**30. VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation**
Highlight: In this paper, we release VideoFeedback, the first large-scale dataset containing human-provided multi-aspect score over 37.
Authors: Xuan He; Dongfu Jiang; Ge Zhang; Max Ku; Achint Soni; Sherman Siu; Haonan Chen; Abhranil Chandra; Ziyan Jiang; Aaran Arulraj; Kai Wang; Quy Duc Do; Yuansheng Ni; Bohan Lyu; Yaswanth Narsupalli; Rongqi Fan; Zhiheng Lyu; Bill Yuchen Lin; Wenhu Chen
**31. Belief Revision: The Adaptability of Large Language Models Reasoning**
Highlight: We introduce Belief-R, a new dataset designed to test LMs’ belief revision ability when presented with new evidence.
Authors: Bryan Wilie; Samuel Cahyawijaya; Etsuko Ishii; Junxian He; Pascale Fung
**32. Altogether: Image Captioning Via Re-aligning Alt-text**
Highlight: In this paper, we study Altogether, a principled approach based on the key idea of editing and re-aligning existing alt-texts associated with the images.
Authors: Hu Xu; Po-Yao Huang; Xiaoqing Tan; Ching-Feng Yeh; Jacob Kahn; Christine Jou; Gargi Ghosh; Omer Levy; Luke Zettlemoyer; Wen-tau Yih; Shang-Wen Li; Saining Xie; Christoph Feichtenhofer
**33. From Insights to Actions: The Impact of Interpretability and Analysis Research on NLP**
Highlight: In this paper, we seek to quantify the impact of IA research on the broader field of NLP.
Authors: Marius Mosbach; Vagrant Gautam; Tomás Vergara Browne; Dietrich Klakow; Mor Geva
**34. Arcee’s MergeKit: A Toolkit for Merging Large Language Models**
Highlight: MergeKit is an open-source library designed to support this process with an efficient and extensible framework suitable for any hardware.
Authors: Charles Goddard; Shamane Siriwardhana; Malikeh Ehghaghi; Luke Meyers; Vladimir Karpukhin; Brian Benedict; Mark McQuade; Jacob Solawetz
**35. Large Language Model As An Assignment Evaluator: Insights, Feedback, and Challenges in A 1000+ Student Course**
Highlight: Based on student responses, we found that LLM-based assignment evaluators are generally acceptable to students when they have free access to these tools. However, students also noted that the LLM sometimes fails to adhere to the evaluation instructions, resulting in unreasonable assessments. Additionally, we observed that students can easily manipulate the LLM to output specific strings, allowing them to achieve high scores without meeting the assignment rubric.
Authors: Cheng-Han Chiang; Wei-Chih Chen; Chun-Yi Kuan; Chienchou Yang; Hung-yi Lee
**36. Decoding Matters: Addressing Amplification Bias and Homogeneity Issue in Recommendations for Large Language Models**
Highlight: However, we find these methods encounter significant challenges: 1) amplification bias, where standard length normalization inflates scores for items containing tokens with generation probabilities close to 1 (termed ghost tokens), and 2) the homogeneity issue, generating multiple similar or repetitive items for a user. To tackle these challenges, we introduce a new decoding approach named Debiasing-Diversifying Decoding (D3).
Authors: Keqin Bao; Jizhi Zhang; Yang Zhang; Xinyue Huo; Chong Chen; Fuli Feng
**37. Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model**
Highlight: This work proposes Non-disruptive parameter insertion (Otter), inserting extra parameters into the transformer architecture to predict calibration signals along with the original LLM output.
Authors: Chenhan Yuan; Fei Huang; Ru Peng; Keming Lu; Bowen Yu; Chang Zhou; Jingren Zhou
**38. UOUO: Uncontextualized Uncommon Objects for Measuring Knowledge Horizons of Vision Language Models**
Highlight: However, their ability to handle rare objects, which fall into the long tail of data distributions, is less understood. To rigorously evaluate this aspect, we introduce the Uncontextualized Uncommon Objects (UOUO) benchmark.
Authors: Xinyu Pi; Mingyuan Wu; Jize Jiang; Haozhen Zheng; Beitong Tian; ChengXiang Zhai; Klara Nahrstedt; Zhiting Hu
**39. Explicit Memory Learning with Expectation Maximization**
Authors: Zhangyue Yin; Qiushi Sun; Qipeng Guo; Zhiyuan Zeng; Qinyuan Cheng; Xipeng Qiu; Xuanjing Huang
**40. Toward Compositional Behavior in Neural Models: A Survey of Current Views**
Highlight: The research literature, however, includes conflicting perspectives on how compositional behavior (CB) should be defined, evaluated, and achieved. We propose a conceptual framework to address these questions and survey researchers active in this area.
Authors: Kate McCurdy; Paul Soulos; Paul Smolensky; Roland Fernandez; Jianfeng Gao
**41. Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment**
Highlight: We introduce controllable preference optimization (CPO), which explicitly specifies preference scores for different objectives, thereby guiding the model to generate responses that meet the requirements.
Authors: Yiju Guo; Ganqu Cui; Lifan Yuan; Ning Ding; Zexu Sun; Bowen Sun; Huimin Chen; Ruobing Xie; Jie Zhou; Yankai Lin; Zhiyuan Liu; Maosong Sun
**42. Small Agent Can Also Rock! Empowering Small Language Models As Hallucination Detector**
Highlight: In this paper, we propose an autonomous LLM-based agent framework, called HaluAgent, which enables relatively smaller LLMs (e.g., Baichuan2-Chat 7B) to actively select suitable tools for detecting multiple hallucination types such as text, code, and mathematical expression.
Authors: Xiaoxue Cheng; Junyi Li; Xin Zhao; Hongzhi Zhang; Fuzheng Zhang; Di Zhang; Kun Gai; Ji-Rong Wen
**43. PrExMe! Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation**
Highlight: In this work, we introduce **PrExMe**, a large-scale **Pr**ompt **Ex**ploration for **Me**trics, where we evaluate more than 720 prompt templates for open-source LLM-based metrics on machine translation (MT) and summarization datasets, totalling over 6.
Authors: Christoph Leiter; Steffen Eger
**44. Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation**
Highlight: As large language models (LLMs) evolve, evaluating their output reliably becomes increasingly difficult due to the high cost of human evaluation. To address this, we introduce FLAMe, a family of Foundational Large Autorater Models.
Authors: Tu Vu; Kalpesh Krishna; Salaheddin Alzubi; Chris Tar; Manaal Faruqui; Yun-Hsuan Sung
**45. Hidden Persuaders: LLMs’ Political Leaning and Their Influence on Voters**
Highlight: Do LLMs have political leanings and are LLMs able to shift our political views? This paper explores these questions in the context of the 2024 U.S. presidential election.
Authors: Yujin Potter; Shiyang Lai; Junsol Kim; James Evans; Dawn Song
**46. REAR: A Relevance-Aware Retrieval-Augmented Framework for Open-Domain Question Answering**
Highlight: Despite the extensive efforts on RAG research, in existing methods, LLMs cannot precisely assess the relevance of retrieved documents, thus likely leading to misleading or even incorrect utilization of external knowledge (i.e., retrieved documents). To address this issue, in this paper, we propose REAR, a RElevance-Aware Retrieval-augmented approach for open-domain question answering (QA).
Authors: Yuhao Wang; Ruiyang Ren; Junyi Li; Xin Zhao; Jing Liu; Ji-Rong Wen
**47. Not Everything Is All You Need: Toward Low-Redundant Optimization for Large Language Model Alignment**
Highlight: It indicates the existence of redundant neurons in LLMs for alignment training. To reduce its influence, we propose a low-redundant alignment method named **ALLO**, focusing on optimizing the most related neurons with the most useful supervised signals.
Authors: Zhipeng Chen; Kun Zhou; Xin Zhao; Jingyuan Wang; Ji-Rong Wen
**48. Can Large Language Models Always Solve Easy Problems If They Can Solve Harder Ones?**
Highlight: Large language models (LLMs) have demonstrated impressive capabilities, but still suffer from inconsistency issues (e.g., LLMs can react differently to disturbances like rephrasing or inconsequential order change). In addition to these inconsistencies, we also observe that LLMs, while capable of solving hard problems, can paradoxically fail at easier ones. To evaluate this hard-to-easy inconsistency, we develop the ConsisEval benchmark, where each entry comprises a pair of questions with a strict order of difficulty.
Authors: Zhe Yang; Yichang Zhang; Tianyu Liu; Jian Yang; Junyang Lin; Chang Zhou; Zhifang Sui
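One natural way to score such pairs is the conditional accuracy on the easy question given that the model solved the hard one; the sketch below uses that formulation as an assumption about, not a quote of, ConsisEval's exact metric.

```python
def consistency(results: list[tuple[bool, bool]]) -> float:
    """results[i] = (solved_easy, solved_hard) for the i-th question pair."""
    easy_given_hard = [easy for easy, hard in results if hard]
    if not easy_given_hard:
        return float("nan")
    # P(easy correct | hard correct): 1.0 means no hard-to-easy inconsistency.
    return sum(easy_given_hard) / len(easy_given_hard)

print(consistency([(True, True), (False, True), (True, False), (True, True)]))
# -> 0.666...: the model failed the easier question in 1 of the 3 cases
#    where it solved the harder one.
```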
**49. D3CODE: Disentangling Disagreements in Data Across Cultures on Offensiveness Detection and Evaluation**
Highlight: In this paper we introduce the D3CODE dataset: a large-scale cross-cultural dataset of parallel annotations for offensive language in over 4.
Authors: Aida Mostafazadeh Davani; Mark Diaz; Dylan K Baker; Vinodkumar Prabhakaran
**50. Automatic Instruction Evolving for Large Language Models**
Highlight: This paper proposes Auto Evol-Instruct, an end-to-end framework that evolves instruction datasets using large language models without any human effort.
Authors: Weihao Zeng; Can Xu; Yingxiu Zhao; Jian-Guang Lou; Weizhu Chen
**51. Knowledge Verification to Nip Hallucination in The Bud**
Highlight: In this paper, we demonstrate the feasibility of mitigating hallucinations by verifying and minimizing the inconsistency between external knowledge present in the alignment data and the intrinsic knowledge embedded within foundation LLMs.
Authors: Fanqi Wan; Xinting Huang; Leyang Cui; Xiaojun Quan; Wei Bi; Shuming Shi
**52. Turn Waste Into Worth: Rectifying Top-k Router of MoE**
Highlight: To address the dropped tokens and padding, we propose the Rectify-Router, comprising the Intra-GPU Rectification and the Fill-in Rectification.
Authors: Zhiyuan Zeng; Qipeng Guo; Zhaoye Fei; Zhangyue Yin; Yunhua Zhou; Linyang Li; Tianxiang Sun; Hang Yan; Dahua Lin; Xipeng Qiu
**53. Memorize Step By Step: Efficient Long-Context Prefilling with Incremental Memory and Decremental Chunk**
Highlight: However, our empirical analysis shows that the fixed-size memory results in wasted computational and GPU memory resources. Therefore, we introduce Incremental Memory (IM), a method that starts with a small memory size and gradually increases it, optimizing computational efficiency.
Authors: Zhiyuan Zeng; Qipeng Guo; Xiaoran Liu; Zhangyue Yin; Wentao Shu; Mianqiu Huang; Bo Wang; Yunhua Zhou; Linlin Li; Qun Liu; Xipeng Qiu
**54. Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models**
Highlight: Although such ‘glitch tokens’, tokens that are present in the tokenizer vocabulary but nearly or entirely absent during model training, have been observed across various models, a reliable method to identify and address them has been missing. We present a comprehensive analysis of Large Language Model tokenizers, specifically targeting this issue of detecting under-trained tokens.
Authors: Sander Land; Max Bartolo
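To give a taste of the embedding-based indicators such analyses rely on, the sketch below flags tokens whose (un)embedding rows look like rows that were provably never trained. The reference set, similarity measure, and thresholding here are simplifications assumed for illustration, not the paper's actual indicators.

```python
import torch

def glitch_candidates(unembed: torch.Tensor, known_unused: list[int],
                      top_k: int = 20) -> list[int]:
    # Mean direction of rows for tokens we know were never seen in training
    # (e.g. reserved/unused vocabulary slots).
    ref = unembed[known_unused].mean(dim=0)
    sims = torch.nn.functional.cosine_similarity(unembed, ref.unsqueeze(0), dim=-1)
    # Tokens whose rows are most similar to "never trained" rows are candidates.
    return sims.topk(top_k).indices.tolist()

vocab = torch.randn(1000, 64)
vocab[:50] = 0.01 * torch.randn(50, 64) + 0.5  # simulate near-untouched rows
print(glitch_candidates(vocab, known_unused=list(range(10))))
```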
**55. Roleplay-doh: Enabling Domain-Experts to Create LLM-simulated Patients Via Eliciting and Adhering to Principles**
Highlight: Privacy concerns restrict data access, and collecting expert feedback, although vital, is laborious. To address this, we develop Roleplay-doh, a novel human-LLM collaboration pipeline that elicits qualitative feedback from a domain-expert, which is transformed into a set of principles, or natural language rules, that govern an LLM-prompted roleplay.
Authors: Ryan Louie; Ananjan Nandi; William Fang; Cheng Chang; Emma Brunskill; Diyi Yang
**56. Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps**
Highlight: When asked to summarize articles or answer questions given a passage, large language models (LLMs) can hallucinate details and respond with unsubstantiated answers that are inaccurate with respect to the input context. This paper describes a simple approach for detecting such **contextual hallucinations**.
Authors: Yung-Sung Chuang; Linlu Qiu; Cheng-Yu Hsieh; Ranjay Krishna; Yoon Kim; James R. Glass
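The core feature is easy to state: for each attention head, what fraction of its attention mass goes back to the provided context versus the tokens generated so far. A minimal sketch follows; the shapes and averaging are assumptions, and the paper feeds per-head ratios over spans into a linear classifier.

```python
import numpy as np

def lookback_ratio(attn: np.ndarray, n_context: int) -> np.ndarray:
    """attn: (heads, tgt_len, src_len) attention weights for generated tokens.
    Returns, per head, the average share of attention on the input context."""
    ctx = attn[:, :, :n_context].sum(axis=-1)        # mass on the context
    new = attn[:, :, n_context:].sum(axis=-1)        # mass on generated tokens
    return (ctx / (ctx + new + 1e-9)).mean(axis=-1)  # (heads,), each in [0, 1]

attn = np.random.dirichlet(np.ones(30), size=(8, 5))  # 8 heads, 5 new tokens
print(lookback_ratio(attn, n_context=20))  # low values suggest hallucination risk
```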
**57. ImageInWords: Unlocking Hyper-Detailed Image Descriptions**
Highlight: Trained on short web-scraped image-text data, vision-language models often generate incomplete descriptions with visual inconsistencies. We address this via a novel data-centric approach with ImageInWords (IIW), a carefully designed human-in-the-loop framework for curating hyper-detailed image descriptions.
Authors: Roopal Garg; Andrea Burns; Burcu Karagol Ayan; Yonatan Bitton; Ceslee Montgomery; Yasumasa Onoe; Andrew Bunner; Ranjay Krishna; Jason Michael Baldridge; Radu Soricut
**58. Evaluating N-Gram Novelty of Language Models Using Rusty-DAWG**
Highlight: In this work, we investigate the extent to which modern LMs generate n-grams from their training data, evaluating both (i) the probability LMs assign to complete training n-grams and (ii) n-novelty, the proportion of n-grams generated by an LM that did not appear in the training data (for arbitrarily large n).
Authors: William Merrill; Noah A. Smith; Yanai Elazar
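n-novelty itself is straightforward to compute at toy scale, as in the sketch below; the whole point of Rusty-DAWG is that the hash set over all training n-grams used here is infeasible for pretraining corpora and arbitrarily large n.

```python
def ngrams(tokens: list[str], n: int):
    return zip(*(tokens[i:] for i in range(n)))

def n_novelty(generated: list[str], corpus: list[str], n: int) -> float:
    """Share of generated n-grams that never occur in the training corpus."""
    seen = set(ngrams(corpus, n))
    gen = list(ngrams(generated, n))
    return sum(g not in seen for g in gen) / max(len(gen), 1)

corpus = "the cat sat on the mat".split()
generated = "the cat sat on a new mat".split()
print(n_novelty(generated, corpus, n=3))  # 0.6: 3 of the 5 trigrams are novel
```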
**59. Token Erasure As A Footprint of Implicit Vocabulary Items in LLMs**
Highlight: In this work, we find that last token representations of named entities and multi-token words exhibit a pronounced erasure effect, where information about previous and current tokens is rapidly forgotten in early layers.
Authors: Sheridan Feucht; David Atkinson; Byron C Wallace; David Bau
**60. The Greatest Good Benchmark: Measuring LLMs’ Alignment with Utilitarian Moral Dilemmas**
Highlight: We introduce the Greatest Good Benchmark to evaluate the moral judgments of LLMs using utilitarian dilemmas.
Authors: Giovanni Franco Gabriel Marraffini; Andrés Cotton; Noe Fabian Hsueh; Axel Fridman; Juan Wisznia; Luciano Del Corro
**61. Estimating Knowledge in Large Language Models Without Generating A Single Token**
Authors: Daniela Gottesman; Mor Geva
**62. RaTEScore: A Metric for Radiology Report Generation**
Highlight: This paper introduces a novel, entity-aware metric, termed Radiological Report (Text) Evaluation (RaTEScore), to assess the quality of medical reports generated by AI models.
Authors: Weike Zhao; Chaoyi Wu; Xiaoman Zhang; Ya Zhang; Yanfeng Wang; Weidi Xie
**63. Uncertainty in Language Models: Assessment Through Rank-Calibration**
Highlight: However, these measures can differ greatly, and it is unclear how to compare them, partly because they take values over different ranges (e.g., [0, ∞) or [0, 1]). In this work, we address this issue by developing a novel and practical framework, termed *Rank-Calibration*, to assess uncertainty and confidence measures for LMs.
Authors: Xinmeng Huang; Shuo Li; Mengxin Yu; Matteo Sesia; Hamed Hassani; Insup Lee; Osbert Bastani; Edgar Dobriban
**64. Evaluating D-MERIT of Partial-annotation on Information Retrieval**
Authors: Royi Rassin; Yaron Fairstein; Oren Kalinsky; Guy Kushilevitz; Nachshon Cohen; Alexander Libov; Yoav Goldberg
**65. On The Universal Truthfulness Hyperplane Inside LLMs**
Highlight: In this work, we investigate whether a universal truthfulness hyperplane that distinguishes the model’s factually correct and incorrect outputs exists within the model.
Authors: Junteng Liu; Shiqi Chen; Yu Cheng; Junxian He
**66. A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations**
Highlight: Based on our critical review, we present our perspectives and recommendations to ensure LLM evaluations are reproducible, reliable, and robust.
Authors: Md Tahmid Rahman Laskar; Sawsan Alqahtani; M Saiful Bari; Mizanur Rahman; Mohammad Abdullah Matin Khan; Haidar Khan; Israt Jahan; Amran Bhuiyan; Chee Wei Tan; Md Rizwan Parvez; Enamul Hoque; Shafiq Joty; Jimmy Huang
**67. Rethinking The Role of Proxy Rewards in Language Model Alignment**
Highlight: In this paper, we study the role of proxy rewards in LLM alignment via ‘reverse reward engineering’, composing interpretable features as a white-box reward function.
Authors: Sungdong Kim; Minjoon Seo
**68. Can Large Language Models Faithfully Express Their Intrinsic Uncertainty in Words?**
Highlight: For example, if the LLM is equally likely to output two contradicting answers to the same question, then its generated response should reflect this uncertainty by hedging its answer (e.g., I’m not sure, but I think...). We formalize faithful response uncertainty based on the gap between the model’s intrinsic confidence in the assertions it makes and the decisiveness by which they are conveyed.
Authors: Gal Yona; Roee Aharoni; Mor Geva
**69. ArxivDIGESTables: Synthesizing Scientific Literature Into Tables Using Language Models**
Highlight: Can we automatically generate these tables using language models (LMs)? In this work, we introduce a framework that leverages LMs to perform this task by decomposing it into separate schema and value generation steps.
Authors: Benjamin Newman; Yoonjoo Lee; Aakanksha Naik; Pao Siangliulue; Raymond Fok; Juho Kim; Daniel S Weld; Joseph Chee Chang; Kyle Lo
**70. Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors**
Highlight: We show empirically that finding the mistake in a student solution is challenging for current models. We propose and evaluate several verifiers for detecting these errors.
Authors: Nico Daheim; Jakub Macina; Manu Kapur; Iryna Gurevych; Mrinmaya Sachan
**71. OMG-QA: Building Open-Domain Multi-Modal Generative Question Answering Systems**
Highlight: We introduce OMG-QA, a new resource for question answering that is designed to evaluate the effectiveness of question answering systems that perform retrieval augmented generation (RAG) in scenarios that demand reasoning on multi-modal, multi-document contexts.
Authors: Linyong Nan; Weining Fang; Aylin Rasteh; Pouya Lahabi; Weijin Zou; Yilun Zhao; Arman Cohan
**72. FIRST: Teach A Reliable Large Language Model Through Efficient Trustworthy Distillation**
Highlight: In this paper, we delve deeply into why and how mis-calibration exists in fine-tuned models, and how distillation can alleviate the issue.
Authors: KaShun Shum; Minrui Xu; Jianshu Zhang; Zixin Chen; Shizhe Diao; Hanze Dong; Jipeng Zhang; Muhammad Omer Raza
**73. Information Flow Routes: Automatically Interpreting Language Models at Scale**
Highlight: These routes can be represented as graphs where nodes correspond to token representations and edges to computations. We automatically build these graphs in a top-down manner, for each prediction leaving only the most important nodes and edges.
Authors: Javier Ferrando; Elena Voita
**74. Decoding Susceptibility: Modeling Misbelief to Misinformation Through A Computational Approach**
Highlight: Existing susceptibility studies heavily rely on self-reported beliefs, which can be subject to bias, expensive to collect, and challenging to scale for downstream applications. To address these limitations, in this work, we propose a computational approach to efficiently model users’ latent susceptibility levels.
Authors: Yanchen Liu; Mingyu Derek Ma; Wenna Qin; Azure Zhou; Jiaao Chen; Weiyan Shi; Wei Wang; Diyi Yang
**75. Voices Unheard: NLP Resources and Models for Yorùbá Regional Dialects**
Highlight: Recent efforts to develop NLP technologies for African languages have focused on their standard dialects, resulting in disparities for dialects and varieties for which there are little to no resources or tools. We take steps towards bridging this gap by introducing a new high-quality parallel text and speech corpus, YORULECT, across three domains and four regional Yorùbá dialects.
Authors: Orevaoghene Ahia; Anuoluwapo Aremu; Diana Abagyan; Hila Gonen; David Ifeoluwa Adelani; Daud Abolade; Noah A. Smith; Yulia Tsvetkov
**76. Learning Personalized Alignment for Evaluating Open-ended Text Generation**
Highlight: Traditional evaluation metrics rely heavily on lexical similarity with human-written references, often showing poor correlation with human judgments and failing to account for alignment with the diversity of human preferences. To address these challenges, we introduce PerSE, an interpretable evaluation framework designed to assess alignment with specific human preferences.
Authors: Danqing Wang; Kevin Yang; Hanlin Zhu; Xiaomeng Yang; Andrew Cohen; Lei Li; Yuandong Tian
**77. PRISM: A New Lens for Improved Color Understanding**
Highlight: While image-text pre-trained models, such as CLIP, have demonstrated impressive capabilities in learning robust text and image representations, a critical area for substantial improvement remains: precise color understanding. In this paper, we address this limitation by introducing PRISM, a simple yet highly effective method that extends CLIP’s capability to grasp the nuances of precise colors.
Authors: Arjun Reddy Akula; Garima Pruthi; Inderjit S Dhillon; Pradyumna Narayana; Sugato Basu; Varun Jampani
**78. MatchTime: Towards Automatic Soccer Game Commentary Generation**
Highlight: Soccer is a globally popular sport with a vast audience; in this paper, we consider constructing an automatic soccer game commentary model to improve the audiences’ viewing experience.
Authors: Jiayuan Rao; Haoning Wu; Chang Liu; Yanfeng Wang; Weidi Xie
**79. Query-OPT: Optimizing Inference of Large Language Models Via Multi-Query Instructions in Meeting Summarization**
Highlight: However, repeated calls to the LLM inference endpoints would significantly increase the costs of using them in production, making LLMs impractical for many real-world use cases. To address this problem, in this paper, we investigate whether combining the queries for the same input context in a single prompt to minimize repeated calls can be successfully used in meeting summarization.
Authors: Md Tahmid Rahman Laskar; Elena Khasanova; Xue-Yong Fu; Cheng Chen; Shashi Bhushan Tn
**80. User Inference Attacks on Large Language Models**
Highlight: In this paper, we ask if it is possible to infer if any of a _user’s_ data was used to train an LLM.
Authors: Nikhil Kandpal; Krishna Pillutla; Alina Oprea; Peter Kairouz; Christopher A. Choquette-Choo; Zheng Xu
**81. Enhancing Legal Case Retrieval Via Scaling High-quality Synthetic Query-Candidate Pairs**
Highlight: However, existing works face two main challenges for real-world applications: they mainly focus on case-to-case retrieval using lengthy queries, which does not match real-world scenarios; and the limited data scale, with current datasets containing only hundreds of queries, is insufficient to satisfy the training requirements of existing data-hungry neural models. To address these issues, we introduce an automated method to construct synthetic query-candidate pairs and build the largest LCR dataset to date, LEAD, which is hundreds of times larger than existing datasets.
Authors: Cheng Gao; Chaojun Xiao; Zhenghao Liu; Huimin Chen; Zhiyuan Liu; Maosong Sun
**82. Matryoshka-Adaptor: Unsupervised and Supervised Tuning for Smaller Embedding Dimensions**
Highlight: While high-dimensional embeddings generally demonstrate superior performance as they contain more salient information, their practical application is frequently hindered by elevated computational latency and the associated higher cost. To address these challenges, we propose Matryoshka-Adaptor, a novel tuning framework designed for the customization of LLM embeddings.
Authors: Jinsung Yoon; Rajarishi Sinha; Sercan O Arik; Tomas Pfister
**83. An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models**
Highlight: In this paper, we construct a novel VQA dataset, Spatial-MM, to comprehensively study LMMs’ spatial understanding and reasoning capabilities.
Authors: Fatemeh Shiri; Xiao-Yu Guo; Mona Golestan Far; Xin Yu; Reza Haf; Yuan-Fang Li
**84. Heterogeneous LoRA for Federated Fine-tuning of On-Device Foundation Models**
Highlight: Federated fine-tuning of ODFMs has unique challenges not present in standard fine-tuning: i) ODFMs generalize poorly to downstream tasks due to their limited sizes, making proper fine-tuning imperative to their performance, and ii) devices have limited and heterogeneous system capabilities and data that can deter the performance of fine-tuning. Tackling these challenges, we propose HetLoRA, a feasible and effective federated fine-tuning method for ODFMs that leverages the system and data heterogeneity at the edge.
Authors: Yae Jee Cho; Luyang Liu; Zheng Xu; Aldi Fahrezi; Gauri Joshi
**85. Investigating LLMs As Voting Assistants Via Contextual Augmentation: A Case Study on The European Parliament Elections 2024**
Highlight: In light of the recent 2024 European Parliament elections, we are investigating if LLMs can be used as Voting Advice Applications (VAAs).
Authors: Ilias Chalkidis
**86. Systematic Biases in LLM Simulations of Debates**
Highlight: In this study, we highlight the limitations of LLMs in simulating human interactions, particularly focusing on LLMs’ ability to simulate political debates on topics that are important aspects of people’s day-to-day lives and decision-making processes.
Authors: Amir Taubenfeld; Yaniv Dover; Roi Reichart; Ariel Goldstein
**87. Language Models Learn Rare Phenomena from Less Rare Phenomena: The Case of The Missing AANNs**
Highlight: Language models learn rare syntactic phenomena, but the extent to which this is attributable to generalization vs. memorization is a major open question. To that end, we iteratively trained transformer language models on systematically manipulated corpora which were human-scale in size, and then evaluated their learning of a rare grammatical phenomenon: the English Article+Adjective+Numeral+Noun (AANN) construction (“a beautiful five days”).
Authors: Kanishka Misra; Kyle Mahowald
**88. Experimental Contexts Can Facilitate Robust Semantic Property Inference in Language Models, But Inconsistently**
Highlight: How well does this translate to previously studied meaning-sensitive tasks? We present a case-study on the extent to which experimental contexts can improve LMs’ robustness in performing property inheritance: predicting semantic properties of novel concepts, a task that they have been previously shown to fail on.
Authors: Kanishka Misra; Allyson Ettinger; Kyle Mahowald
**89. Summary of A Haystack: A Challenge to Long-Context LLMs and RAG Systems**
Highlight: However, evaluating the output quality of such systems on long-context tasks remains challenging, as tasks like Needle-in-a-Haystack lack complexity. In this work, we argue that summarization can play a central role in such evaluation.
Authors: Philippe Laban; Alexander Fabbri; Caiming Xiong; Chien-Sheng Wu
**90. Surprise! Uniform Information Density Isn’t The Whole Story: Predicting Surprisal Contours in Long-form Discourse**
Highlight: Speakers may also seek to maintain interest, adhere to writing conventions, and build compelling arguments. In this paper, we propose one such functional pressure; namely that speakers modulate information rate based on location within a hierarchically-structured model of discourse.
Authors: Eleftheria Tsipidi; Franz Nowak; Ryan Cotterell; Ethan Wilcox; Mario Giulianelli; Alex Warstadt
**91. Ouroboros: Generating Longer Drafts Phrase By Phrase for Faster Speculative Decoding**
Highlight: Therefore, generating longer drafts at less cost can lead to better decoding speedup. To achieve this, we introduce Ouroboros, which can generate draft phrases to parallelize the drafting process and meanwhile lengthen drafts in a training-free manner.
Authors: Weilin Zhao; Yuxiang Huang; Xu Han; Wang Xu; Chaojun Xiao; Xinrong Zhang; Yewei Fang; Kaihuo Zhang; Zhiyuan Liu; Maosong Sun
**92. Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on Large Language Models**
Highlight: We introduce Mathador-LM, a new benchmark for evaluating mathematical reasoning in large language models (LLMs), combining ruleset interpretation, planning, and problem-solving.
Authors: Eldar Kurtic; Amir Moeini; Dan Alistarh
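For readers unfamiliar with the underlying game: a Mathador-style instance gives a set of base numbers and a target, and a solution combines the base numbers (each at most once) with the four basic operations. The checker below encodes that reduced ruleset as an assumption; the benchmark's real rules and scoring are richer.

```python
import ast
from collections import Counter

ALLOWED = (ast.Expression, ast.BinOp, ast.Constant,
           ast.Add, ast.Sub, ast.Mult, ast.Div)

def check_solution(expr: str, numbers: list[int], target: int) -> bool:
    tree = ast.parse(expr, mode="eval")
    used = []
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED):
            return False                 # only +, -, *, / over constants
        if isinstance(node, ast.Constant):
            used.append(node.value)
    if Counter(used) - Counter(numbers): # a number used too often, or unknown
        return False
    return abs(eval(compile(tree, "<expr>", "eval")) - target) < 1e-9

print(check_solution("(7 + 3) * 5 - 2", [2, 3, 5, 7, 11], 48))  # True
```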
**93. Empowering Multi-step Reasoning Across Languages Via Program-Aided Language Models**
Highlight: In this work, we propose Cross-lingual Program-Aided Language Models (CrossPAL), a method for aligning reasoning programs across languages.
Authors: Leonardo Ranaldi; Giulia Pucci; Barry Haddow; Alexandra Birch
**94. MLLM-Protector: Ensuring MLLM’s Safety Without Hurting Performance**
Highlight: This vulnerability is exacerbated by the fact that most state-of-the-art MLLMs are fine-tuned on limited image-text pairs that are much fewer than the extensive text-based pretraining corpus, which makes the MLLMs more prone to catastrophic forgetting of their original abilities during safety fine-tuning. To tackle these challenges, we introduce MLLM-Protector, a plug-and-play strategy that solves two subtasks: 1) identifying harmful responses via a lightweight harm detector, and 2) transforming harmful responses into harmless ones via a detoxifier.
Authors: Renjie Pi; Tianyang Han; Jianshu Zhang; Yueqi Xie; Rui Pan; Qing Lian; Hanze Dong; Jipeng Zhang; Tong Zhang
**95. RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs**
Highlight: In this work, we perform an exhaustive study to achieve a new state of the art in aligning multilingual LLMs.
Authors: John Dang; Arash Ahmadian; Kelly Marchisio; Julia Kreutzer; Ahmet Üstün; Sara Hooker
**96. LLM-Based Agent Society Investigation: Collaboration and Confrontation in Avalon Gameplay**
Highlight: Results affirm the framework’s effectiveness in creating adaptive agents and suggest LLM-based agents’ potential in navigating dynamic social interactions. By examining collaboration and confrontation behaviors, we offer insights into this field’s research and applications.
Authors: Yihuai Lan; Zhiqiang Hu; Lei Wang; Yang Wang; Deheng Ye; Peilin Zhao; Ee-Peng Lim; Hui Xiong; Hao Wang
**97. SURf: Teaching Large Vision-Language Models to Selectively Utilize Retrieved Information**
Highlight: Moreover, most LVLMs struggle to selectively utilize retrieved information and are sensitive to irrelevant or misleading references. To address these challenges, we propose a self-refinement framework designed to teach LVLMs to Selectively Utilize Retrieved Information (SURf).
Authors: Jiashuo Sun; Jihai Zhang; Yucheng Zhou; Zhaochen Su; Xiaoye Qu; Yu Cheng
**98. STAR: SocioTechnical Approach to Red Teaming Language Models**
Highlight: This research introduces STAR, a sociotechnical framework that improves on current best practices for red teaming safety of large language models.
Authors: Laura Weidinger; John F J Mellor; Bernat Guillén Pegueroles; Nahema Marchal; Ravin Kumar; Kristian Lum; Canfer Akbulut; Mark Diaz; A. Stevie Bergman; Mikel D. Rodriguez; Verena Rieser; William Isaac
**99. The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm**
Highlight: Here, we explore the viability of different alignment approaches when balancing dual objectives: addressing and optimizing for a non-homogeneous set of languages and cultural preferences while minimizing both global and local harms.
Authors: Aakanksha; Arash Ahmadian; Beyza Ermis; Seraphina Goldfarb-Tarrant; Julia Kreutzer; Marzieh Fadaee; Sara Hooker
**100. Calibrating The Confidence of Large Language Models By Eliciting Fidelity**
Highlight: In this paper, we decompose the language model confidence into the Uncertainty about the question and the Fidelity to the answer generated by language models.
Authors: Mozhi Zhang; Mianqiu Huang; Rundong Shi; Linsen Guo; Chong Peng; Peng Yan; Yaqian Zhou; Xipeng Qiu
**101. Learning from Natural Language Explanations for Generalizable Entity Matching**
Highlight: As an efficient alternative, we re-cast entity matching as a conditional generation task as opposed to binary classification. This enables us to “distill” LLM reasoning into smaller entity matching models via natural language explanations.
Authors: Somin Wadhwa; Adit Krishnan; Runhui Wang; Byron C Wallace; Luyang Kong
**102. ChatRetriever: Adapting Large Language Models for Generalized and Robust Conversational Dense Retrieval**
Highlight: This paper presents ChatRetriever, which inherits the strong generalization capability of large language models to robustly represent complex conversational sessions for dense retrieval. To achieve this, we propose a simple and effective dual-learning approach that adapts LLM for retrieval via contrastive learning while enhancing the complex session understanding through masked instruction tuning on high-quality conversational instruction tuning data.
Authors: Kelong Mao; Chenlong Deng; Haonan Chen; Fengran Mo; Zheng Liu; Tetsuya Sakai; Zhicheng Dou
**103. Strategic Demonstration Selection for Improved Fairness in LLM In-Context Learning**
Highlight: Further experiments demonstrate that the proportion of minority to majority samples in demonstrations affects the trade-off between fairness and prediction accuracy. Based on these insights, we introduce a mitigation technique that employs clustering and evolutionary strategies to curate a diverse and representative sample set from the training data.
Authors: Jingyu Hu; Weiru Liu; Mengnan Du
**104. RoTBench: A Multi-Level Benchmark for Evaluating The Robustness of Large Language Models in Tool Learning**
Highlight: More surprisingly, the noise correction capability inherent in the GPT family paradoxically impedes its adaptability in the face of mild noise. In light of these findings, we propose RoTTuning, a strategy that enriches the diversity of training environments to bolster the robustness of LLMs in tool learning.
Authors: Junjie Ye; Yilong Wu; Songyang Gao; Caishuang Huang; Sixian Li; Guanyu Li; Xiaoran Fan; Qi Zhang; Tao Gui; Xuanjing Huang
**105. DecorateLM: Data Engineering Through Corpus Rating, Tagging, and Editing with Language Models**
Highlight: In this paper, we introduce DecorateLM, a data engineering method designed to refine the pretraining corpus through data rating, tagging and editing.
Authors: Ranchi Zhao; Zhen Leng Thai; Yifan Zhang; Shengding Hu; Jie Zhou; Yunqi Ba; Jie Cai; Zhiyuan Liu; Maosong Sun
**106. Making Large Language Models Better Reasoners with Orchestrated Streaming Experiences**
Highlight: In this paper, we present **RoSE** (**R**easoning with **O**rchestrated **S**treaming **E**xperiences), a general framework for solving reasoning tasks that can self-improve as it answers various reasoning questions.
Authors: Xiangyang Liu; Junliang He; Xipeng Qiu
**107. InferAligner: Inference-Time Alignment for Harmlessness Through Cross-Model Guidance**
Highlight: However, these methods often involve complex and resource-intensive training processes, posing a significant challenge to their implementation. Therefore, we propose InferAligner, a simple yet effective method for harmlessness alignment during the inference phase.
Authors: Pengyu Wang; Dong Zhang; Linyang Li; Chenkun Tan; Xinghao Wang; Mozhi Zhang; Ke Ren; Botian Jiang; Xipeng Qiu
108 | SOUL: Unlocking The Power of Second-Order Optimization for LLM Unlearning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we shed light on the significance of optimizer selection in LLM unlearning for the first time, establishing a clear connection between second-order optimization and influence unlearning (a classical approach using influence functions to update the model for data influence removal). |
Jinghan Jia; Yihua Zhang; Yimeng Zhang; Jiancheng Liu; Bharat Runwal; James Diffenderfer; Bhavya Kailkhura; Sijia Liu; |
109 | Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present MIRAGE – Model Internals-based RAG Explanations – a plug-and-play approach using model internals for faithful answer attribution in RAG applications. |
Jirui Qi; Gabriele Sarti; Raquel Fernández; Arianna Bisazza; |
110 | Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we evaluate a simple approach for zero-shot cross-lingual alignment, where a reward model is trained on preference data in one source language and directly applied to other target languages. |
Zhaofeng Wu; Ananth Balashankar; Yoon Kim; Jacob Eisenstein; Ahmad Beirami; |
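A minimal sketch of how such zero-shot transfer could be applied at inference time, assuming a reward model trained on source-language (e.g., English) preference data; the checkpoint name and the best-of-n use are illustrative, not the paper's setup:

```python
# Minimal sketch of zero-shot cross-lingual reward transfer: a reward model
# trained on source-language preference data scores candidate responses written
# in a different target language, and the highest-scoring candidate is kept.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

rm_name = "your-org/english-preference-reward-model"  # hypothetical checkpoint
tok = AutoTokenizer.from_pretrained(rm_name)
rm = AutoModelForSequenceClassification.from_pretrained(rm_name, num_labels=1).eval()

def best_of_n(prompt: str, candidates: list[str]) -> str:
    scores = []
    for cand in candidates:
        inputs = tok(prompt, cand, return_tensors="pt", truncation=True)
        with torch.no_grad():
            scores.append(rm(**inputs).logits.squeeze().item())
    return candidates[max(range(len(candidates)), key=scores.__getitem__)]
```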
111 | Event Causality Identification with Synthetic Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we adopt the Rubin Causal Model to identify event causality: given two temporally ordered events, we see the first event as the treatment and the second one as the observed outcome. |
Haoyu Wang; Fengze Liu; Jiayao Zhang; Dan Roth; Kyle Richardson; |
112 | An Image Speaks A Thousand Words, But Can Everyone Listen? On Image Transcreation for Cultural Relevance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While several applications stand to benefit from this, machine translation systems remain confined to dealing with language in speech and text. In this work, we introduce a new task of translating images to make them culturally relevant. |
Simran Khanuja; Sathyanarayanan Ramamoorthy; Yueqi Song; Graham Neubig; |
113 | Discovering Knowledge-Critical Subnetworks in Pretrained Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate whether pretrained language models contain various *knowledge-critical* subnetworks: particular sparse computational subgraphs that can, if removed, precisely suppress specific knowledge the model has memorized. |
Deniz Bayazit; Negar Foroutan; Zeming Chen; Gail Weiss; Antoine Bosselut; |
114 | Moral Foundations of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper uses MFT as a lens to analyze whether popular LLMs have acquired a bias towards a particular set of moral values. |
Marwa Abdulhai; Gregory Serapio-García; Clement Crepy; Daria Valter; John Canny; Natasha Jaques; |
115 | LLM See, LLM Do: Leveraging Active Inheritance to Target Non-Differentiable Objectives Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To start, our work exhaustively characterizes the impact of passive inheritance of model properties by systematically studying how the source of synthetic data shapes models’ internal biases, calibration and preferences, and their generations’ textual attributes, providing one of the most comprehensive studies to date. |
Luísa Shimabucoro; Sebastian Ruder; Julia Kreutzer; Marzieh Fadaee; Sara Hooker; |
116 | Hopping Too Late: Exploring The Limitations of Large Language Models on Multi-Hop Queries Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Because the second hop commences in later layers, there could be cases where these layers no longer encode the necessary knowledge for correctly predicting the answer. Motivated by this, we propose a novel back-patching analysis method whereby a hidden representation from a later layer is patched back to an earlier layer. |
Eden Biran; Daniela Gottesman; Sohee Yang; Mor Geva; Amir Globerson; |
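The back-patching idea can be sketched with forward hooks. The snippet below uses GPT-2 as a stand-in model, and the layer choices are illustrative rather than the paper's:

```python
# Minimal sketch of back-patching: record the hidden state of a *later* layer
# at the last token, then re-run the prompt with that vector patched into an
# *earlier* layer at the same position.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
ids = tok("The capital of the country of the Eiffel Tower is", return_tensors="pt").input_ids

late, early = 9, 3  # patch the layer-9 representation back into layer 3
with torch.no_grad():
    hs = model(ids, output_hidden_states=True).hidden_states
stored = hs[late + 1][:, -1, :]  # hidden_states[0] is the embedding output

def patch_hook(module, inputs, output):
    hidden = output[0].clone()
    hidden[:, -1, :] = stored  # overwrite the last position with the later-layer state
    return (hidden,) + output[1:]

handle = model.transformer.h[early].register_forward_hook(patch_hook)
with torch.no_grad():
    logits = model(ids).logits[:, -1]
handle.remove()
print(tok.decode(logits.argmax(-1)))
```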
117 | From Local Concepts to Universals: Evaluating The Multicultural Understanding of Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Still, they have limited coverage of cultures and do not adequately assess cultural diversity across universal and culture-specific local concepts. To address these limitations, we introduce the GlobalRG benchmark, comprising two challenging tasks: retrieval across universals and cultural visual grounding. |
Mehar Bhatia; Sahithya Ravi; Aditya Chinchure; EunJeong Hwang; Vered Shwartz; |
118 | Data, Data Everywhere: A Guide for Pretraining Dataset Construction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, model developers fail to disclose their construction methodology, which has led to a lack of open information on how to develop effective pretraining sets. To address this issue, we perform the first systematic study across the entire pipeline of pretraining set construction. |
Jupinder Parmar; Shrimai Prabhumoye; Joseph Jennings; Bo Liu; Aastha Jhunjhunwala; Zhilin Wang; Mostofa Patwary; Mohammad Shoeybi; Bryan Catanzaro; |
119 | PDFTriage: Question Answering Over Long, Structured Documents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: When a system has to query the document for context, this incongruity is brought to the fore, and seemingly trivial questions can trip up the QA system. To bridge this fundamental gap in handling structured documents, we propose an approach called PDFTriage that enables models to retrieve the context based on either structure or content. |
Jon Saad-Falcon; Joe Barrow; Alexa Siu; Ani Nenkova; Seunghyun Yoon; Ryan A. Rossi; Franck Dernoncourt; |
120 | AgentReview: Exploring Peer Review Dynamics with LLM Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce AgentReview, the first large language model (LLM) based peer review simulation framework, which effectively disentangles the impacts of multiple latent factors and addresses the privacy issue. |
Yiqiao Jin; Qinlin Zhao; Yiyang Wang; Hao Chen; Kaijie Zhu; Yijia Xiao; Jindong Wang; |
121 | ArMeme: Propagandistic Content in Arabic Memes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we focused on developing an Arabic memes dataset with manual annotations of propagandistic content. |
Firoj Alam; Abul Hasnat; Fatema Ahmad; Md. Arid Hasan; Maram Hasanain; |
122 | Investigating Mysteries of CoT-Augmented Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: More recent efforts have shown that such rationales can also be used for model distillation: Including CoT sequences (elicited from a large teacher model) in addition to target labels when fine-tuning a small student model yields (often substantial) improvements. In this work we ask: Why and how does this additional training signal help in model distillation? |
Somin Wadhwa; Silvio Amir; Byron C Wallace; |
123 | Factuality of Large Language Models: A Survey Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this survey, we critically analyze existing work with the aim of identifying the major challenges and their associated causes, pointing out potential solutions for improving the factuality of LLMs, and analyzing the obstacles to automated factuality evaluation for open-ended text generation. |
Yuxia Wang; Minghan Wang; Muhammad Arslan Manzoor; Fei Liu; Georgi Nenkov Georgiev; Rocktim Jyoti Das; Preslav Nakov; |
124 | FLIRT: Feedback Loop In-context Red Teaming Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As generative models become available for public use in various applications, testing and analyzing vulnerabilities of these models has become a priority. In this work, we propose an automatic red teaming framework that evaluates a given black-box model and exposes its vulnerabilities against unsafe and inappropriate content generation. |
Ninareh Mehrabi; Palash Goyal; Christophe Dupuy; Qian Hu; Shalini Ghosh; Richard Zemel; Kai-Wei Chang; Aram Galstyan; Rahul Gupta; |
125 | ORPO: Monolithic Preference Optimization Without Reference Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we revisit SFT in the context of preference alignment, emphasizing that a minor penalty for the disfavored style is sufficient for preference alignment. |
Jiwoo Hong; Noah Lee; James Thorne; |
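ORPO augments the standard SFT loss with an odds-ratio term on the disfavored response. A minimal sketch, assuming average per-token log-probabilities for the chosen and rejected responses are already computed; the λ value is illustrative:

```python
# Minimal sketch of the odds-ratio penalty ORPO adds on top of the plain SFT
# loss, given length-normalized log-probabilities of both responses.
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps, rejected_logps, lam=0.1):
    """chosen_logps / rejected_logps: average per-token log P(y|x), shape (batch,)."""
    # log-odds of each response: log(p / (1 - p)), computed in log space
    log_odds = (chosen_logps - torch.log1p(-torch.exp(chosen_logps))) - (
        rejected_logps - torch.log1p(-torch.exp(rejected_logps)))
    ratio_term = -F.logsigmoid(log_odds)  # minor penalty on the disfavored style
    sft_term = -chosen_logps              # ordinary NLL on the chosen response
    return (sft_term + lam * ratio_term).mean()

print(orpo_loss(torch.tensor([-0.8]), torch.tensor([-1.6])))
```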
126 | KidLM: Advancing Language Models for Children – Early Insights and Future Directions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore foundational steps toward the development of child-specific language models, emphasizing the necessity of high-quality pre-training data. |
Mir Tafseer Nayeem; Davood Rafiei; |
127 | StyleRemix: Interpretable Authorship Obfuscation Via Distillation and Perturbation of Style Elements Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Current methods using large language models (LLMs) lack interpretability and controllability, often ignoring author-specific stylistic features, resulting in less robust performance overall. To address this, we develop StyleRemix, an adaptive and interpretable obfuscation method that perturbs specific, fine-grained style elements of the original input text. |
Jillian Fisher; Skyler Hallinan; Ximing Lu; Mitchell L Gordon; Zaid Harchaoui; Yejin Choi; |
128 | AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Any-Modality Augmented Language Model (AnyMAL), a unified model that reasons over diverse input modality signals (i.e., text, image, video, audio, IMU motion sensor), and generates textual responses. |
Seungwhan Moon; Andrea Madotto; Zhaojiang Lin; Tushar Nagarajan; Matt Smith; Shashank Jain; Chun-Fu Yeh; Prakash Murugesan; Peyman Heidari; Yue Liu; Kavya Srinet; Babak Damavandi; Anuj Kumar; |
129 | RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: First, limited retrieved contexts might not cover all necessary information, while excessive retrieval can introduce irrelevant and inaccurate references, interfering with the model’s generation. Second, in cases where the model originally responds correctly, applying RAG can lead to an over-reliance on retrieved contexts, resulting in incorrect answers. To address these issues, we propose RULE, which consists of two components. |
Peng Xia; Kangyu Zhu; Haoran Li; Hongtu Zhu; Yun Li; Gang Li; Linjun Zhang; Huaxiu Yao; |
130 | PARIKSHA: A Large-Scale Investigation of Human-LLM Evaluator Agreement on Multilingual and Multi-Cultural Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study human and LLM-based evaluation in a multilingual, multi-cultural setting. |
Ishaan Watts; Varun Gumma; Aditya Yadavalli; Vivek Seshadri; Manohar Swaminathan; Sunayana Sitaram; |
131 | Self-Refine Instruction-Tuning for Aligning Reasoning in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the Self-refine Instruction-tuning method that elicits Smaller Language Models to self-improve their abilities. |
Leonardo Ranaldi; Andre Freitas; |
132 | 1+1>2: Can Large Language Models Serve As Cross-Lingual Knowledge Aggregators? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a method to enhance the multilingual performance of LLMs by aggregating knowledge from diverse languages. |
Yue Huang; Chenrui Fan; Yuan Li; Siyuan Wu; Tianyi Zhou; Xiangliang Zhang; Lichao Sun; |
133 | LLM Task Interference: An Initial Study on The Impact of Task-Switch in Conversational History Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although this sensitivity to the conversational history can often lead to improved performance on subsequent tasks, we find that performance can in fact also be negatively impacted if there is a _task-switch_. To the best of our knowledge, our work makes the first attempt to formalize the study of such vulnerabilities and interference of tasks in conversational LLMs caused by task-switches in the conversational history. |
Akash Gupta; Ivaxi Sheth; Vyas Raina; Mark Gales; Mario Fritz; |
134 | Unifying Multimodal Retrieval Via Document Screenshot Embedding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose Document Screenshot Embedding (DSE), a novel retrieval paradigm that regards document screenshots as a unified input format, which does not require any content extraction preprocess and preserves all the information in a document (e.g., text, image and layout). |
Xueguang Ma; Sheng-Chieh Lin; Minghan Li; Wenhu Chen; Jimmy Lin; |
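A minimal sketch of the retrieval paradigm, using off-the-shelf CLIP as a stand-in bi-encoder; DSE trains its own model, so this only illustrates scoring screenshots and queries in a shared embedding space:

```python
# Minimal sketch of screenshot-as-document retrieval in the spirit of DSE.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()

def embed_screenshots(paths):
    images = [Image.open(p).convert("RGB") for p in paths]
    with torch.no_grad():
        feats = clip.get_image_features(**proc(images=images, return_tensors="pt"))
    return torch.nn.functional.normalize(feats, dim=-1)

def search(query, doc_feats, paths, top_k=3):
    with torch.no_grad():
        q = clip.get_text_features(**proc(text=[query], return_tensors="pt", padding=True))
    q = torch.nn.functional.normalize(q, dim=-1)
    scores = (q @ doc_feats.T).squeeze(0)  # cosine similarity against every screenshot
    return [paths[i] for i in scores.topk(min(top_k, len(paths))).indices.tolist()]
```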
135 | TempoFormer: A Transformer for Temporally-aware Representations in Change Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we introduce TempoFormer, the first task-agnostic transformer-based and temporally-aware model for dynamic representation learning. |
Talia Tseriotou; Adam Tsakalidis; Maria Liakata; |
136 | Is This The Real Life? Is This Just Fantasy? The Misleading Success of Simulating Social Interactions With LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most recent work has used a more omniscient perspective on these simulations (e.g., a single LLM generating all interlocutors), which is fundamentally at odds with the non-omniscient, information-asymmetric interactions that involve humans and AI agents in the real world. To examine these differences, we develop an evaluation framework to simulate social interactions with LLMs in various settings (omniscient, non-omniscient). |
Xuhui Zhou; Zhe Su; Tiwalayo Eisape; Hyunwoo Kim; Maarten Sap; |
137 | Transformers Are Multi-State RNNs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View |
Matanel Oren; Michael Hassid; Nir Yarden; Yossi Adi; Roy Schwartz; |
138 | Understanding Democratization in NLP and ML Research Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we seek to clarify how democratization is understood in NLP and ML publications, through large-scale mixed-methods analyses of papers using the keyword democra* published in NLP and adjacent venues. |
Arjun Subramonian; Vagrant Gautam; Dietrich Klakow; Zeerak Talat; |
139 | On Efficient Language and Vision Assistants for Visually-Situated Natural Language Understanding: What Matters in Reading and Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This study aims to redefine the design of vision-language models by identifying key components and creating efficient models with constrained inference costs. |
Geewook Kim; Minjoon Seo; |
140 | Enhancing Training Data Attribution for Large Language Models with Fitting Error Consideration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel TDA method called Debias and Denoise Attribution (DDA), which enhances influence functions by addressing fitting errors. |
Kangxi Wu; Liang Pang; Huawei Shen; Xueqi Cheng; |
141 | An Audit on The Perspectives and Challenges of Hallucinations in NLP Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We audit how hallucination in large language models (LLMs) is characterized in peer-reviewed literature, using a critical examination of 103 publications across NLP research. … |
Pranav Narayanan Venkit; Tatiana Chakravorti; Vipul Gupta; Heidi Biggs; Mukund Srinath; Koustava Goswami; Sarah Rajtmajer; Shomir Wilson; |
142 | Exploring The Practicality of Generative Retrieval on Dynamic Corpora Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on Generative Retrievals (GR), which apply autoregressive language models to IR problems, and explore their adaptability and robustness in dynamic scenarios. |
Chaeeun Kim; Soyoung Yoon; Hyunji Lee; Joel Jang; Sohee Yang; Minjoon Seo; |
143 | Hierarchical Deconstruction of LLM Reasoning: A Graph-Based Framework for Analyzing Knowledge Utilization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we propose a method that deconstructs complex real-world questions into a graph, representing each question as a node with predecessors of background knowledge needed to solve the question. |
Miyoung Ko; Sue Hyun Park; Joonsuk Park; Minjoon Seo; |
144 | Updating CLIP to Prefer Descriptions Over Captions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although CLIPScore is a powerful generic metric that captures the similarity between a text and an image, it fails to distinguish between a caption that is meant to complement the information in an image and a description that is meant to replace an image entirely, e.g., for accessibility. We address this shortcoming by updating the CLIP model with the Concadia dataset to assign higher scores to descriptions than captions using parameter efficient fine-tuning and a loss objective derived from work on causal interpretability. |
Amir Zur; Elisa Kreiss; Karel D’Oosterlinck; Christopher Potts; Atticus Geiger; |
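A minimal sketch of the ranking goal; the paper derives its loss from work on causal interpretability, so the generic hinge loss below is only a stand-in, and the margin value, PEFT mechanics, and data loading are omitted:

```python
# Minimal sketch: push the CLIP score of an image with its *description* above
# its score with the *caption* by a margin.
import torch

def description_over_caption_loss(image_feats, desc_feats, cap_feats, margin=0.1):
    """All inputs are L2-normalized feature batches of shape (batch, dim)."""
    desc_score = (image_feats * desc_feats).sum(-1)  # cosine similarity
    cap_score = (image_feats * cap_feats).sum(-1)
    # Hinge loss: zero once the description outranks the caption by `margin`.
    return torch.clamp(margin - (desc_score - cap_score), min=0).mean()

f = torch.nn.functional.normalize
img, desc, cap = (f(torch.randn(4, 512), dim=-1) for _ in range(3))
print(description_over_caption_loss(img, desc, cap))
```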
145 | CharacterGLM: Customizing Social Characters with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the generalizability and adaptability across various conversational scenarios inherent in customizing social characters still lack public industrial solutions. To address these challenges, by dissecting well-rounded social characters composed of both inherent social profiles and external social behaviors, we manually collect a large-scale Chinese corpus featuring characters with diverse categories and behaviors, and develop CharacterGLM models alongside well-designed refinement methods. |
Jinfeng Zhou; Zhuang Chen; Dazhen Wan; Bosi Wen; Yi Song; Jifan Yu; Yongkang Huang; Pei Ke; Guanqun Bi; Libiao Peng; JiaMing Yang; Xiyao Xiao; Sahand Sabour; Xiaohan Zhang; Wenjing Hou; Yijia Zhang; Yuxiao Dong; Hongning Wang; Jie Tang; Minlie Huang; |
146 | Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models By Steering Parameters and Activations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Safety Arithmetic, a training-free framework enhancing LLM safety across different scenarios: Base models, Supervised fine-tuned models (SFT), and Edited models. |
Rima Hazra; Sayan Layek; Somnath Banerjee; Soujanya Poria; |
147 | Semantic Training Signals Promote Hierarchical Syntactic Generalization in Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Would neural networks generalize more like humans when trained on both form and meaning? We investigate this by examining if Transformers, neural networks without a hierarchical bias, better achieve hierarchical generalization when trained on both form and meaning compared to when trained on form alone. |
Aditya Yedetore; Najoung Kim; |
148 | Retrieved Sequence Augmentation for Protein Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show that a simple alternative, Retrieved Sequence Augmentation (RSA), can enhance protein representation learning without the need for alignment and cumbersome preprocessing. |
Chang Ma; Haiteng Zhao; Lin Zheng; Jiayi Xin; Qintong Li; Lijun Wu; Zhihong Deng; Yang Young Lu; Qi Liu; Sheng Wang; Lingpeng Kong; |
149 | MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we propose MoDULA (Mixture of Domain-Specific and Universal LoRA), a novel Parameter Efficient Fine-Tuning (PEFT) Mixture-of-Expert (MoE) paradigm for improved fine-tuning and parameter efficiency in multi-task learning. |
Yufei Ma; Zihan Liang; Huangyu Dai; Ben Chen; Dehong Gao; Zhuoran Ran; Wang Zihan; Linbo Jin; Wen Jiang; Guannan Zhang; Xiaoyan Cai; Libin Yang; |
150 | Locating Information Gaps and Narrative Inconsistencies Across Languages: A Case Study of LGBT People Portrayals on Wikipedia Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the InfoGap method, an efficient and reliable approach to locating information gaps and inconsistencies in articles at the fact level, across languages. |
Farhan Samir; Chan Young Park; Anjalie Field; Vered Shwartz; Yulia Tsvetkov; |
151 | ScalingFilter: Assessing Data Quality Through Inverse Utilization of Scaling Laws Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose ScalingFilter, a novel approach that evaluates text quality based on the perplexity difference between two language models trained on the same data, thereby eliminating the influence of the reference dataset in the filtering process. |
Ruihang Li; Yixuan Wei; Miaosen Zhang; Nenghai Yu; Han Hu; Houwen Peng; |
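A minimal sketch of such a perplexity-difference score, using two GPT-2 sizes as stand-ins for the paper's meta-models; the exact scoring function in ScalingFilter may differ:

```python
# Minimal sketch of a ScalingFilter-style quality signal: score a text by the
# perplexity gap between a smaller and a larger model from the same family.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def log_ppl(model, tok, text):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean token NLL = log perplexity
    return loss.item()

tok = AutoTokenizer.from_pretrained("gpt2")
small = AutoModelForCausalLM.from_pretrained("gpt2").eval()
large = AutoModelForCausalLM.from_pretrained("gpt2-medium").eval()

def quality_score(text):
    # Intuition: higher-quality text benefits more from model scale, so a
    # larger perplexity drop from small to large model signals higher quality.
    return log_ppl(small, tok, text) - log_ppl(large, tok, text)

print(quality_score("The committee approved the budget after a brief debate."))
```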
152 | A Simple LLM Framework for Long-Range Video Question-Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present LLoVi, a simple yet effective **L**anguage-based **Lo**ng-range **Vi**deo question-answering (LVQA) framework. |
Ce Zhang; Taixi Lu; Md Mohaiminul Islam; Ziyang Wang; Shoubin Yu; Mohit Bansal; Gedas Bertasius; |
153 | Detection and Measurement of Syntactic Templates in Generated Text Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we propose a method for evaluating diversity over syntactic features to characterize general repetition in models, beyond frequent n-grams. |
Chantal Shaib; Yanai Elazar; Junyi Jessy Li; Byron C Wallace; |
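One way to operationalize this kind of measurement is to count repeated part-of-speech n-grams across generations; the sketch below uses NLTK as a stand-in tagger and is not the paper's exact template definition:

```python
# Minimal sketch of template-style diversity measurement: map each text to its
# part-of-speech sequence and count POS n-grams repeated across generations.
from collections import Counter
import nltk

for pkg in ("punkt", "punkt_tab", "averaged_perceptron_tagger", "averaged_perceptron_tagger_eng"):
    nltk.download(pkg, quiet=True)  # cover both old and new NLTK resource names

def pos_ngrams(text, n=4):
    tags = [t for _, t in nltk.pos_tag(nltk.word_tokenize(text))]
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]

generations = [
    "The quick brown fox jumps over the lazy dog.",
    "The tall green tree sways over the quiet pond.",
]
counts = Counter(g for text in generations for g in pos_ngrams(text))
repeated = {g: c for g, c in counts.items() if c > 1}
print(f"{len(repeated)} POS 4-grams repeated across generations")
```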
154 | Backward Lens: Projecting Language Model Gradients Into The Vocabulary Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we extend this methodology to LMs’ backward pass and gradients. |
Shahar Katz; Yonatan Belinkov; Mor Geva; Lior Wolf; |
155 | AmazonQAC: A Large-Scale, Naturalistic Query Autocomplete Dataset Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This contribution aims to stimulate further research on QAC systems to better serve user needs in diverse environments. |
Dante Everaert; Rohit Patki; Tianqi Zheng; Christopher Potts; |
156 | ReadMe++: Benchmarking Multilingual Language Models for Multi-Domain Readability Assessment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a comprehensive evaluation of large language models for multilingual readability assessment. |
Tarek Naous; Michael J Ryan; Anton Lavrouk; Mohit Chandra; Wei Xu; |
157 | Text-Tuple-Table: Towards Information Integration in Text-to-Table Generation Via Global Tuple Extraction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce LiveSum, a new benchmark dataset created for generating summary tables of competitions based on real-time commentary texts. |
Zheye Deng; Chunkit Chan; Weiqi Wang; Yuxi Sun; Wei Fan; Tianshi Zheng; Yauwai Yim; Yangqiu Song; |
158 | SpecHub: Provable Acceleration to Multi-Draft Speculative Decoding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present SpecHub, a novel, efficient sampling-verification method for MDSD that improves acceptance rates with only linear computational overhead. |
Ryan Sun; Tianyi Zhou; Xun Chen; Lichao Sun; |
159 | Dancing in Chains: Reconciling Instruction Following and Faithfulness in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a simple yet effective method that relies on Reject-sampling by Self-instruct with Continued Fine-tuning (ReSet), which significantly outperforms vanilla MTL. |
Zhengxuan Wu; Yuhao Zhang; Peng Qi; Yumo Xu; Rujun Han; Yian Zhang; Jifan Chen; Bonan Min; Zhiheng Huang; |
160 | When Is Multilinguality A Curse? Language Modeling for 250 High- and Low-Resource Languages Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we pre-train over 10,000 monolingual and multilingual language models for over 250 languages, including multiple language families that are under-studied in NLP. We assess how language modeling performance in each language varies as a function of (1) monolingual dataset size, (2) added multilingual dataset size, (3) linguistic similarity of the added languages, and (4) model size (up to 45M parameters). |
Tyler A. Chang; Catherine Arnett; Zhuowen Tu; Ben Bergen; |
161 | Retrieval Augmented Spelling Correction for E-Commerce Applications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Through quantitative evaluation and qualitative error analyses, we find improvements in spelling correction utilizing the RAG framework beyond a stand-alone LLM. |
Xuan Guo; Rohit Patki; Dante Everaert; Christopher Potts; |
162 | Language-to-Code Translation with A Single Labeled Example Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we describe ICIP (In-Context Inverse Programming), a method for bootstrapping a language-to-code system using mostly (or entirely) unlabeled programs written using a potentially unfamiliar (but human-readable) library or API. |
Kaj Bostrom; Harsh Jhamtani; Hao Fang; Sam Thomson; Richard Shin; Patrick Xia; Benjamin Van Durme; Jason Eisner; Jacob Andreas; |
163 | Demystifying Verbatim Memorization in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Much prior work has studied such verbatim memorization using observational data. To complement such work, we develop a framework to study verbatim memorization in a controlled setting by continuing pre-training from Pythia checkpoints with injected sequences. |
Jing Huang; Diyi Yang; Christopher Potts; |
164 | Thoughts to Target: Enhance Planning for Target-driven Conversation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current methods, such as chain-of-thought reasoning and tree-search policy learning techniques, either neglect plan rationality or require extensive human simulation procedures. To address this, we propose a novel two-stage framework, named EnPL, to improve the LLMs’ capability in planning conversations towards designated targets, including (1) distilling natural language plans from a target-driven conversation corpus and (2) generating new plans with demonstration-guided in-context learning. |
Zhonghua Zheng; Lizi Liao; Yang Deng; Ee-Peng Lim; Minlie Huang; Liqiang Nie; |
165 | Linguistic Bias in ChatGPT: Language Models Reinforce Dialect Discrimination Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a large-scale study of linguistic bias exhibited by ChatGPT covering ten dialects of English (Standard American English, Standard British English, and eight widely spoken non-standard varieties from around the world). |
Eve Fleisig; Genevieve Smith; Madeline Bossi; Ishita Rustagi; Xavier Yin; Dan Klein; |
166 | GPT Vs RETRO: Exploring The Intersection of Retrieval and Parameter-Efficient Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we apply PEFT methods (P-tuning, Adapters, and LoRA) to a modified Retrieval-Enhanced Transformer (RETRO) and a baseline GPT model across several sizes, ranging from 823 million to 48 billion parameters. |
Aleksander Ficek; Jiaqi Zeng; Oleksii Kuchaiev; |
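For readers unfamiliar with the PEFT methods compared here, this is roughly what attaching LoRA adapters looks like with the `peft` library; the base checkpoint and hyperparameters are illustrative, not the paper's:

```python
# Minimal sketch of parameter-efficient fine-tuning with LoRA adapters.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(
    r=8,                        # low-rank update dimension
    lora_alpha=16,
    target_modules=["c_attn"],  # GPT-2's fused QKV projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the adapter weights require grad
```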
167 | Extract, Define, Canonicalize: An LLM-based Framework for Knowledge Graph Construction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we are interested in automated methods for knowledge graph creation (KGC) from input text. |
Bowen Zhang; Harold Soh; |
168 | Linear Layer Extrapolation for Fine-Grained Emotion Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We observe a similar pattern for fine-grained emotion classification in text, demonstrating that self-contrast can enhance encoder-based text classifiers. |
Mayukh Sharma; Sean O’Brien; Julian McAuley; |
169 | ConvKGYarn: Spinning Configurable and Scalable Conversational Knowledge Graph QA Datasets with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present ConvKGYarn, a scalable method for generating up-to-date and configurable conversational KGQA datasets. |
Ronak Pradeep; Daniel Lee; Ali Mousavi; Jeffrey Pound; Yisi Sang; Jimmy Lin; Ihab Ilyas; Saloni Potdar; Mostafa Arefiyan; Yunyao Li; |
170 | Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing benchmarks employ irrelevant noise texts to artificially extend the length of test cases, diverging from the real-world scenarios of long-context applications. To bridge this gap, we propose a novel long-context benchmark, Loong, aligning with realistic scenarios through extended multi-document question answering (QA). |
Minzheng Wang; Longze Chen; Fu Cheng; Shengyi Liao; Xinghua Zhang; Bingli Wu; Haiyang Yu; Nan Xu; Lei Zhang; Run Luo; Yunshui Li; Min Yang; Fei Huang; Yongbin Li; |
171 | SynthesizRR: Generating Diverse Datasets with Retrieval Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, this leads to issues of repetition, bias towards popular entities, and stylistic differences from human text. In this work, we propose Synthesize by Retrieval and Refinement (SynthesizRR), which uses retrieval augmentation to introduce variety into the dataset synthesis process: as retrieved passages vary, the LLM is seeded with different content to generate its examples. |
Abhishek Divekar; Greg Durrett; |
172 | Taylor Unswift: Secured Weight Release for Large Language Models Via Taylor Expansion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Ensuring the security of released large language models (LLMs) poses a significant dilemma, as existing mechanisms either compromise ownership rights or raise data privacy concerns. To address this dilemma, we introduce TaylorMLP to protect the ownership of released LLMs and prevent their abuse. |
Guanchu Wang; Yu-Neng Chuang; Ruixiang Tang; Shaochen Zhong; Jiayi Yuan; Hongye Jin; Zirui Liu; Vipin Chaudhary; Shuai Xu; James Caverlee; Xia Hu; |
173 | Lifelong Event Detection Via Optimal Transport Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel approach, Lifelong Event Detection via Optimal Transport (**LEDOT**), that leverages optimal transport principles to align the optimization of our classification module with the intrinsic nature of each class, as defined by their pre-trained language modeling. |
Viet Dao; Van-Cuong Pham; Quyen Tran; Thanh-Thien Le; Linh Van Ngo; Thien Huu Nguyen; |
174 | Householder Pseudo-Rotation: A Novel Approach to Activation Editing in LLMs with Direction-Magnitude Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that doing so would break the magnitude consistency of the activation vectors in LLMs. To overcome this shortcoming, we propose a novel editing method that views activations in terms of their directions and magnitudes. |
Van-Cuong Pham; Thien Huu Nguyen; |
175 | Preserving Generalization of Language Models in Few-shot Continual Relation Extraction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a novel method that leverages often-discarded language model heads. |
Quyen Tran; Nguyen Xuan Thanh; Nguyen Hoang Anh; Nam Le Hai; Trung Le; Linh Van Ngo; Thien Huu Nguyen; |
176 | Don’t Just Say I Don’t Know! Self-aligning Large Language Models for Responding to Unknown Questions with Explanations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To avoid providing hallucinated answers to these unknown questions, existing studies typically investigate approaches to refusing to answer these questions. In this work, we propose a novel and scalable self-alignment method that utilizes the LLM itself to enhance its ability to respond to different types of unknown questions, not just refusing to answer but also proactively explaining why they are unanswerable. |
Yang Deng; Yong Zhao; Moxin Li; See-Kiong Ng; Tat-Seng Chua; |
177 | Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we compare two common approaches: unsupervised fine-tuning and retrieval-augmented generation (RAG). |
Oded Ovadia; Menachem Brief; Moshik Mishaeli; Oren Elisha; |
178 | TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Currently, existing methods perform all of these steps in a single pass without being able to adapt if insufficient or incorrect information is collected. To overcome this, we introduce a modular multi-LMM agent framework based on several agents with different roles, instructed by a Planner agent that updates its instructions using shared feedback from the other agents. |
Chuyi Shang; Amos You; Sanjay Subramanian; Trevor Darrell; Roei Herzig; |
179 | Language Models As Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Even so, code for one instance cannot be reused for others, although they might require the same logic to solve. We present Think-and-Execute, a novel framework that improves LLMs’ algorithmic reasoning: (1) In Think, we discover task-level logic shared across all instances, and express such logic with pseudocode; (2) In Execute, we tailor the task-level pseudocode to each instance and simulate its execution. |
Hyungjoo Chae; Yeonghyeon Kim; Seungone Kim; Kai Tzu-iunn Ong; Beong-woo Kwak; Moohyeon Kim; Sunghwan Kim; Taeyoon Kwon; Jiwan Chung; Youngjae Yu; Jinyoung Yeo; |
180 | Coffee-Gym: An Environment for Evaluating and Improving Natural Language Feedback on Erroneous Code Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents Coffee-Gym, a comprehensive RL environment for training models that provide feedback on code editing. |
Hyungjoo Chae; Taeyoon Kwon; Seungjun Moon; Yongho Song; Dongjin Kang; Kai Tzu-iunn Ong; Beong-woo Kwak; Seonghyeon Bae; Seung-won Hwang; Jinyoung Yeo; |
181 | ECCO: Can We Improve Model-Generated Code Efficiency Without Sacrificing Functional Correctness? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present ECCO, a reproducible benchmark for evaluating program efficiency via two paradigms: natural language (NL) based code generation and history-based code editing. |
Siddhant Waghjale; Vishruth Veerendranath; Zhiruo Wang; Daniel Fried; |
182 | On The Reliability of Psychological Scales on Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our study aims to determine the reliability of applying personality assessments to LLMs, explicitly investigating whether LLMs demonstrate consistent personality traits. |
Jen-tse Huang; Wenxiang Jiao; Man Ho Lam; Eric John Li; Wenxuan Wang; Michael Lyu; |
183 | Personas As A Way to Model Truthfulness in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents an explanation for why LMs appear to know the truth despite not being trained with truth labels. |
Nitish Joshi; Javier Rando; Abulhair Saparov; Najoung Kim; He He; |
184 | LLMs Are Prone to Fallacies in Causal Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, this work investigates: Can LLMs infer causal relations from other relational data in text? |
Nitish Joshi; Abulhair Saparov; Yixin Wang; He He; |
185 | The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose **DemOgraphic FActualIty Representation (DoFaiR)**, a benchmark to systematically quantify the trade-off between using diversity interventions and preserving demographic factuality in T2I models. |
Yixin Wan; Di Wu; Haoran Wang; Kai-Wei Chang; |
186 | TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, current datasets primarily cater to user-led systems and are limited to predefined specific scenarios and slots, thereby necessitating improvements in the proactiveness, diversity, and capabilities of TOD. In this study, we present a detailed multi-domain task-oriented data construction process for conversations, and a Chinese dialogue dataset generated based on this process, **TransferTOD**, which authentically simulates human-computer dialogues in 30 popular life service scenarios. |
Ming Zhang; Caishuang Huang; Yilong Wu; Shichun Liu; Huiyuan Zheng; Yurui Dong; Yujiong Shen; Shihan Dou; Jun Zhao; Junjie Ye; Qi Zhang; Tao Gui; Xuanjing Huang; |
187 | Scalable Efficient Training of Large Language Models with Low-dimensional Projected Attention Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we find that low-rank pre-training, normally considered an efficient method that compromises performance, can be scalably effective when reduced parameters are precisely targeted. |
Xingtai Lv; Ning Ding; Kaiyan Zhang; Ermo Hua; Ganqu Cui; Bowen Zhou; |
188 | Strength Lies in Differences! Improving Strategy Planning for Non-collaborative Dialogues Via Diversified User Simulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This poses two main challenges for existing dialogue agents: 1) The inability to integrate user-specific characteristics into the strategic planning, and 2) The difficulty of training strategic planners that can be generalized to diverse users. To address these challenges, we propose TRIP to enhance the capability in tailored strategic planning, incorporating a user-aware strategic planning module and a population-based training paradigm. |
Tong Zhang; Chen Huang; Yang Deng; Hongru Liang; Jia Liu; Zujie Wen; Wenqiang Lei; Tat-Seng Chua; |
189 | GoldCoin: Grounding Large Language Models in Privacy Laws Via Contextual Integrity Theory Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the scarcity of open-source relevant case studies restricts the efficiency of LLMs in aligning with specific legal statutes. To address this challenge, we introduce a novel framework, GoldCoin, designed to efficiently ground LLMs in privacy laws for judicially assessing privacy violations. |
Wei Fan; Haoran Li; Zheye Deng; Weiqi Wang; Yangqiu Song; |
190 | LLM-Evolve: Evaluation for LLM’s Evolving Capability on Benchmarks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most existing LLM benchmarks evaluate LLMs on i.i.d. tasks, overlooking their ability to learn iteratively from past experiences. Our paper bridges this evaluation gap by proposing a novel framework, LLM-Evolve, which extends established benchmarks to sequential problem-solving settings. |
Jiaxuan You; Mingjie Liu; Shrimai Prabhumoye; Mostofa Patwary; Mohammad Shoeybi; Bryan Catanzaro; |
191 | Searching for Best Practices in Retrieval-Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we investigate existing RAG approaches and their potential combinations to identify optimal RAG practices. |
Xiaohua Wang; Zhenghua Wang; Xuan Gao; Feiran Zhang; Yixin Wu; Zhibo Xu; Tianyuan Shi; Zhengyuan Wang; Shizheng Li; Qi Qian; Ruicheng Yin; Changze Lv; Xiaoqing Zheng; Xuanjing Huang; |
192 | PRompt Optimization in Multi-Step Tasks (PROMST): Integrating Human Feedback and Heuristic-based Sampling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While humans struggle to optimize prompts, they are good at providing feedback about LLM outputs; we therefore introduce a new LLM-driven discrete prompt optimization framework PROMST that incorporates human-designed feedback rules to automatically offer direct suggestions for improvement. |
Yongchao Chen; Jacob Arkin; Yilun Hao; Yang Zhang; Nicholas Roy; Chuchu Fan; |
193 | Step-by-Step Reasoning to Solve Grid Puzzles: Where Do LLMs Falter? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we first develop GridPuzzle, an evaluation dataset comprising 274 grid-based puzzles with different complexities. Second, we propose a new error taxonomy derived from manual analysis of reasoning chains from LLMs including GPT-4, Claude-3, Gemini, Mistral, and Llama-2. Then, we develop an LLM-based framework for large-scale subjective evaluation (i.e., identifying errors) and an objective metric, PuzzleEval, to evaluate the correctness of reasoning chains. |
Nemika Tyagi; Mihir Parmar; Mohith Kulkarni; Aswin Rrv; Nisarg Patel; Mutsumi Nakamura; Arindam Mitra; Chitta Baral; |
194 | Small LLMs Are Weak Tool Learners: A Multi-LLM Agent Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While traditional works focus on training a single LLM with all these capabilities, performance limitations become apparent, particularly with smaller models. To overcome these challenges, we propose a novel approach that decomposes the aforementioned capabilities into a planner, caller, and summarizer. |
Weizhou Shen; Chenliang Li; Hongzhan Chen; Ming Yan; Xiaojun Quan; Hehong Chen; Ji Zhang; Fei Huang; |
195 | How Far Can We Extract Diverse Perspectives from Large Language Models? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we explore LLMs’ capacity of generating diverse perspectives and rationales on subjective topics such as social norms and argumentative texts. |
Shirley Anugrah Hayati; Minhwa Lee; Dheeraj Rajagopal; Dongyeop Kang; |
196 | Scaling Laws for Linear Complexity Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we present the scaling laws for linear complexity language models to establish a foundation for their scalability. |
Xuyang Shen; Dong Li; Ruitao Leng; Zhen Qin; Weigao Sun; Yiran Zhong; |
197 | TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Industry Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a comprehensive framework aimed at enhancing the Task Planning and Tool Usage (TPTU) abilities of LLM-based agents in industry. |
Yilun Kong; Jingqing Ruan; YiHong Chen; Bin Zhang; Tianpeng Bao; Shi Shiwei; du Guo Qing; Xiaoru Hu; Hangyu Mao; Ziyue Li; Xingyu Zeng; Rui Zhao; Xueqian Wang; |
198 | VIMI: Grounding Video Generation Through Multi-modal Instruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this, we construct a large-scale multimodal prompt dataset by employing retrieval methods to pair in-context examples with the given text prompts and then utilize a two-stage training strategy to enable diverse video generation tasks within a model. In the first stage, we propose a multimodal conditional video generation framework for pretraining on these augmented datasets, establishing a foundational model for grounded video generation. |
Yuwei Fang; Willi Menapace; Aliaksandr Siarohin; Tsai-Shien Chen; Kuan-Chieh Wang; Ivan Skorokhodov; Graham Neubig; Sergey Tulyakov; |
199 | HEART-felt Narratives: Tracing Empathy and Narrative Style in Personal Stories with LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Yet the relationship between empathy and narrative style is not fully understood. In this work, we empirically examine and quantify this relationship between style and empathy using LLMs and large-scale crowdsourcing studies. |
Jocelyn J Shen; Joel Mire; Hae Won Park; Cynthia Breazeal; Maarten Sap; |
200 | The Empirical Variability of Narrative Perceptions of Social Media Texts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most NLP work on narrative detection has focused on prescriptive definitions of stories crafted by researchers, leaving open the questions: how do crowd workers perceive texts to be a story, and why? We investigate this by building StoryPerceptions, a dataset of 2,496 perceptions of storytelling in 502 social media texts from 255 crowd workers, including categorical labels along with free-text storytelling rationales, authorial intent, and more. |
Joel Mire; Maria Antoniak; Elliott Ash; Andrew Piper; Maarten Sap; |
201 | Improving Discriminative Capability of Reward Models in RLHF Using Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current methods rely on ranking losses to teach the reward model to assess preferences, but they are susceptible to noise and ambiguous data, often failing to deeply understand human intentions. To address this issue, we introduce contrastive learning into the reward modeling process. |
Lu Chen; Rui Zheng; Binghai Wang; Senjie Jin; Caishuang Huang; Junjie Ye; Zhihao Zhang; Yuhao Zhou; Zhiheng Xi; Tao Gui; Qi Zhang; Xuanjing Huang; |
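A minimal sketch of one way to combine the standard ranking loss with a contrastive term, treating chosen and rejected responses as two classes; the paper's exact formulation may differ, and the temperature and weight are illustrative:

```python
# Minimal sketch: Bradley-Terry ranking loss plus a supervised-contrastive
# term over pooled reward-model representations of chosen/rejected responses.
import torch
import torch.nn.functional as F

def supervised_contrastive(features, labels, tau=0.1):
    f = F.normalize(features, dim=-1)
    sim = (f @ f.T) / tau
    self_mask = torch.eye(len(f), dtype=torch.bool)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))  # drop trivial self-similarity
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0)
    return -(pos_log_prob.sum(1) / pos_mask.sum(1).clamp(min=1)).mean()

def reward_model_loss(chosen_r, rejected_r, chosen_h, rejected_h, weight=0.5):
    ranking = -F.logsigmoid(chosen_r - rejected_r).mean()  # standard preference loss
    feats = torch.cat([chosen_h, rejected_h])
    labels = torch.cat([torch.ones(len(chosen_h)), torch.zeros(len(rejected_h))])
    return ranking + weight * supervised_contrastive(feats, labels)

b, d = 8, 64
print(reward_model_loss(torch.randn(b), torch.randn(b), torch.randn(b, d), torch.randn(b, d)))
```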
202 | LONGAGENT: Achieving Question Answering for 128k-Token-Long Documents Through Multi-Agent Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce _LongAgent_, a multi-agent collaboration method that enables efficient and effective QA over 128k-token-long documents. |
Jun Zhao; Can Zu; Xu Hao; Yi Lu; Wei He; Yiwen Ding; Tao Gui; Qi Zhang; Xuanjing Huang; |
203 | Exploring The Compositional Deficiency of Large Language Models in Mathematical Reasoning Through Trap Problems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate the compositionality of large language models (LLMs) in mathematical reasoning. |
Jun Zhao; Jingqi Tong; Yurong Mou; Ming Zhang; Qi Zhang; Xuanjing Huang; |
204 | Understanding and Mitigating Language Confusion in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We investigate a surprising limitation of LLMs: their inability to consistently generate text in a user’s desired language. |
Kelly Marchisio; Wei-Yin Ko; Alexandre Berard; Théo Dehaze; Sebastian Ruder; |
205 | Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Following this framework, we introduce Dynamic SUTA (DSUTA), an entropy-minimization-based continual TTA method for ASR. |
Guan-Ting Lin; Wei Ping Huang; Hung-yi Lee; |
206 | SciAgent: Tool-augmented Language Models for Scientific Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Scientific reasoning poses an excessive challenge for even the most advanced Large Language Models (LLMs). To make this task more practical and solvable for LLMs, we introduce a new task setting named tool-augmented scientific reasoning. |
Yubo Ma; Zhibin Gou; Junheng Hao; Ruochen Xu; Shuohang Wang; Liangming Pan; Yujiu Yang; Yixin Cao; Aixin Sun; |
207 | ECON: On The Detection and Resolution of Evidence Conflicts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This study introduces a method for generating diverse, validated evidence conflicts to simulate real-world misinformation scenarios. |
Cheng Jiayang; Chunkit Chan; Qianqian Zhuang; Lin Qiu; Tianhang Zhang; Tengxiao Liu; Yangqiu Song; Yue Zhang; Pengfei Liu; Zheng Zhang; |
208 | MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work introduces a novel competition-based benchmark framework specifically designed to assess LLMs within multi-agent settings, providing quantitative metrics to evaluate their judgment, reasoning, deception, self-awareness, cooperation, coordination, and rationality. |
Lin Xu; Zhiyuan Hu; Daquan Zhou; Hongyu Ren; Zhen Dong; Kurt Keutzer; See-Kiong Ng; Jiashi Feng; |
209 | Prompt Leakage Effect and Mitigation Strategies for Multi-turn LLM Applications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we systematically investigate LLM vulnerabilities against prompt leakage for 10 closed- and open-source LLMs, across four domains. |
Divyansh Agarwal; Alexander Fabbri; Ben Risher; Philippe Laban; Shafiq Joty; Chien-Sheng Wu; |
210 | ActPlan-1K: Benchmarking The Procedural Planning Ability of Visual Language Models in Household Activities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To evaluate planning ability in both multi-modal and counterfactual aspects, we propose ActPlan-1K. |
Ying Su; Zhan Ling; Haochen Shi; Cheng Jiayang; Yauwai Yim; Yangqiu Song; |
211 | AKEW: Assessing Knowledge Editing in The Wild Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, its current evaluations deviate significantly from practice: their knowledge updates solely consist of structured facts derived from meticulously crafted datasets, instead of practical sources such as unstructured texts like news articles, and they often overlook practical real-world knowledge updates. To address these issues, in this paper we propose AKEW (Assessing Knowledge Editing in the Wild), a new practical benchmark for knowledge editing. |
Xiaobao Wu; Liangming Pan; William Yang Wang; Anh Tuan Luu; |
212 | Unveiling and Consulting Core Experts in Retrieval-Augmented MoE-based LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While existing research seeks to enhance RAG performance by retrieving higher-quality documents or designing RAG-specific LLMs, the internal mechanisms within LLMs that contribute to RAG’s effectiveness remain underexplored. In this paper, we aim to investigate these internal mechanisms within the popular Mixture-of-Expert (MoE)-based LLMs and demonstrate how to improve RAG by examining expert activations in these LLMs. |
Xin Zhou; Ping Nie; Yiwen Guo; Haojie Wei; Zhanqiu Zhang; Pasquale Minervini; Ruotian Ma; Tao Gui; Qi Zhang; Xuanjing Huang; |
213 | Towards Injecting Medical Visual Knowledge Into Multimodal LLMs at Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Using PubMedVision, we train a 34B medical MLLM **HuatuoGPT-Vision**, which shows superior performance in medical multimodal scenarios among open-source MLLMs. |
Junying Chen; Chi Gui; Ruyi Ouyang; Anningzhe Gao; Shunian Chen; Guiming Hardy Chen; Xidong Wang; Zhenyang Cai; Ke Ji; Xiang Wan; Benyou Wang; |
214 | DocHieNet: A Large and Diverse Dataset for Document Hierarchy Parsing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a large and diverse document hierarchy parsing (DHP) dataset to compensate for the data scarcity and inconsistency problem. |
Hangdi Xing; Changxu Cheng; Feiyu Gao; Zirui Shao; Zhi Yu; Jiajun Bu; Qi Zheng; Cong Yao; |
215 | How Does The Disclosure of AI Assistance Affect The Perceptions of Writing? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent advances in generative AI technologies like large language models have boosted the incorporation of AI assistance in writing workflows, leading to the rise of a new paradigm of human-AI co-creation in writing. To understand how people perceive writings that are produced under this paradigm, in this paper, we conduct an experimental study to understand whether and how the disclosure of the level and type of AI assistance in the writing process would affect people’s perceptions of the writing on various aspects, including their evaluation of the quality of the writing and their ranking of different writings. |
Zhuoyan Li; Chen Liang; Jing Peng; Ming Yin; |
216 | Revisiting Who’s Harry Potter: Towards Targeted Unlearning from A Causal Intervention Perspective Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper investigates Who’s Harry Potter (WHP), a pioneering yet insufficiently understood method for LLM unlearning. |
Yujian Liu; Yang Zhang; Tommi Jaakkola; Shiyu Chang; |
217 | Position Engineering: Boosting Large Language Models Through Positional Information Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel technique termed position engineering, which offers a more efficient way to guide large language models. |
Zhiyuan He; Huiqiang Jiang; Zilong Wang; Yuqing Yang; Luna K. Qiu; Lili Qiu; |
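A minimal sketch of manipulating positional information without changing any tokens, by supplying custom position_ids with an artificial gap between prompt segments; the gap size, model, and placement are illustrative, not the paper's tuned settings:

```python
# Minimal sketch of position engineering: insert a positional gap between two
# prompt segments by editing position_ids rather than the text itself.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

instructions = tok("Summarize the text.", return_tensors="pt").input_ids
document = tok(" The meeting covered budgets and hiring.", return_tensors="pt").input_ids
ids = torch.cat([instructions, document], dim=1)

gap = 32  # pretend `gap` extra positions sit between the two segments
pos = torch.cat([
    torch.arange(instructions.size(1)),
    torch.arange(document.size(1)) + instructions.size(1) + gap,
]).unsqueeze(0)

with torch.no_grad():
    logits = model(ids, position_ids=pos).logits
print(logits.shape)
```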
218 | Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, RAG’s significantly lower cost remains a distinct advantage. Based on this observation, we propose Self-Route, a simple yet effective method that routes queries to RAG or LC based on model self-reflection. |
Zhuowan Li; Cheng Li; Mingyang Zhang; Qiaozhu Mei; Michael Bendersky; |
219 | Towards Robust Speech Representation Learning for Thousands of Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose XEUS, a Cross-lingual Encoder for Universal Speech, trained on over 1 million hours of data across 4057 languages, extending the language coverage of SSL models 4-fold. |
William Chen; Wangyou Zhang; Yifan Peng; Xinjian Li; Jinchuan Tian; Jiatong Shi; Xuankai Chang; Soumi Maiti; Karen Livescu; Shinji Watanabe; |
220 | FinDVer: Explainable Claim Verification Over Long and Hybrid-content Financial Documents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce FinDVer, a comprehensive benchmark specifically designed to evaluate the explainable claim verification capabilities of LLMs in the context of understanding and analyzing long, hybrid-content financial documents. |
Yilun Zhao; Yitao Long; Tintin Jiang; Chengye Wang; Weiyuan Chen; Hongjun Liu; Xiangru Tang; Yiming Zhang; Chen Zhao; Arman Cohan; |
221 | LawBench: Benchmarking Legal Knowledge of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present LawBench, the first evaluation benchmark composed of 20 tasks designed to assess the ability of Large Language Models (LLMs) to perform Chinese legal-related tasks. |
Zhiwei Fei; Xiaoyu Shen; Dawei Zhu; Fengzhe Zhou; Zhuo Han; Alan Huang; Songyang Zhang; Kai Chen; Zhixin Yin; Zongwen Shen; Jidong Ge; Vincent Ng; |
222 | Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Chain-of-Note (CoN), a novel approach to improve the robustness of RALMs when facing noisy, irrelevant documents and handling unknown scenarios. |
Wenhao Yu; Hongming Zhang; Xiaoman Pan; Peixin Cao; Kaixin Ma; Jian Li; Hongwei Wang; Dong Yu; |
223 | Unlocking Anticipatory Text Generation: A Constrained Approach for Large Language Models Decoding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose formalizing text generation as a future-constrained generation problem to minimize undesirable behaviors and enforce faithfulness to instructions. |
Lifu Tu; Semih Yavuz; Jin Qu; Jiacheng Xu; Rui Meng; Caiming Xiong; Yingbo Zhou; |
224 | Finer: Investigating and Enhancing Fine-Grained Visual Concept Recognition in Large Vision Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In-depth analyses show that instruction-tuned LVLMs suffer from a modality gap, showing discrepancies when given textual and visual inputs that correspond to the same concept. In an effort to further the community’s endeavor in this direction, we propose a multiple-granularity, attribute-centric benchmark and training mixture, Finer, which aims to establish a ground to evaluate LVLMs’ fine-grained visual comprehension ability and provide significantly improved explainability. |
Jeonghwan Kim; Heng Ji; |
225 | MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Model Exclusive Task Arithmetic for merging GPT-scale models (MetaGPT) which formalizes the objective of model merging into a multi-task learning framework, aiming to minimize the average loss difference between the merged model and each individual task model. |
Yuyan Zhou; Liang Song; Bingning Wang; Weipeng Chen; |
226 | Tag-grounded Visual Instruction Tuning with Retrieval Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we first identify the limitations of multimodal connectors stemming from insufficient training data. Driven by this, we propose to enhance the mapping with retrieval-augmented tag tokens, which contain rich object-aware information such as object names and attributes. |
Daiqing Qi; Handong Zhao; Zijun Wei; Sheng Li; |
227 | On The Proper Treatment of Tokenization in Psycholinguistics Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The paper argues that token-level language models should be (approximately) marginalized into character-level language models before they are used in psycholinguistic studies to compute the surprisal of a region of interest; then, the marginalized character-level language model can be used to compute the surprisal of an arbitrary character substring, which we term a focal area, that the experimenter may wish to use as a predictor. Our proposal of marginalizing a token-level model into a character-level one solves this misalignment issue independently of the tokenization scheme. |
Mario Giulianelli; Luca Malagutti; Juan Luis Gastaldi; Brian DuSell; Tim Vieira; Ryan Cotterell; |
228 | PsyGUARD: An Automated System for Suicide Detection and Risk Assessment in Psychological Counseling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose PsyGUARD, an automated system for detecting suicide ideation and assessing risk in psychological counseling. |
Huachuan Qiu; Lizhi Ma; Zhenzhong Lan; |
229 | Let Me Teach You: Pedagogical Foundations of Feedback for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this opinion piece, we compile ideas from pedagogy to introduce FELT, a feedback framework for LLMs that outlines various characteristics of the feedback space, and a feedback content taxonomy based on these variables, providing a general mapping of the feedback space. |
Beatriz Borges; Niket Tandon; Tanja Käser; Antoine Bosselut; |
230 | LitSearch: A Retrieval Benchmark for Scientific Literature Search Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce LitSearch, a retrieval benchmark comprising 597 realistic literature search queries about recent ML and NLP papers. |
Anirudh Ajith; Mengzhou Xia; Alexis Chevalier; Tanya Goyal; Danqi Chen; Tianyu Gao; |
231 | SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Evaluating models for SEA languages is challenging due to the scarcity of high-quality datasets, compounded by the dominance of English training data, raising concerns about potential cultural misrepresentation. To address these challenges, through a collaborative movement, we introduce SEACrowd, a comprehensive resource center that fills the resource gap by providing standardized corpora in nearly 1,000 SEA languages across three modalities. |
Holy Lovenia; Rahmad Mahendra; Salsabil Maulana Akbar; Lester James Validad Miranda; Jennifer Santoso; Elyanah Aco; Akhdan Fadhilah; Jonibek Mansurov; Joseph Marvin Imperial; Onno P. Kampman; Joel Ruben Antony Moniz; Muhammad Ravi Shulthan Habibi; Frederikus Hudi; Jann Railey Montalan; Ryan Ignatius Hadiwijaya; Joanito Agili Lopo; William Nixon; Börje F. Karlsson; James Jaya; Ryandito Diandaru; Yuze Gao; Patrick Amadeus Irawan; Bin Wang; Jan Christian Blaise Cruz; Chenxi Whitehouse; Ivan Halim Parmonangan; Maria Khelli; Wenyu Zhang; Lucky Susanto; Reynard Adha Ryanda; Sonny Lazuardi Hermawan; Dan John Velasco; Muhammad Dehan Al Kautsar; Willy Fitra Hendria; Yasmin Moslem; Noah Flynn; Muhammad Farid Adilazuarda; Haochen Li; Johanes Lee; R. Damanhuri; Shuo Sun; Muhammad Reza Qorib; Amirbek Djanibekov; Wei Qi Leong; Quyet V. Do; Niklas Muennighoff; Tanrada Pansuwan; Ilham Firdausi Putra; Yan Xu; Tai Ngee Chia; Ayu Purwarianti; Sebastian Ruder; William Chandra Tjhi; Peerat Limkonchotiwat; Alham Fikri Aji; Sedrick Keh; Genta Indra Winata; Ruochen Zhang; Fajri Koto; Zheng Xin Yong; Samuel Cahyawijaya; |
232 | Towards Aligning Language Models with Textual Feedback Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present ALT (ALignment with Textual feedback), an approach that aligns language models with user preferences expressed in text. |
Saüc Abadal Lloret; Shehzaad Dhuliawala; Keerthiram Murugesan; Mrinmaya Sachan; |
233 | Training-free Deep Concept Injection Enables Language Models for Video Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we make the first attempt to demonstrate that the PLM is able to perform zero-shot crossmodal tasks without any crossmodal pretraining, when the observed visual concepts are injected as both additional input text tokens and augmentation in the intermediate features within each feed-forward network for the PLM. |
Xudong Lin; Manling Li; Richard Zemel; Heng Ji; Shih-Fu Chang; |
234 | Personalized Pieces: Efficient Personalized Large Language Models Through Collaborative Efforts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce Personalized Pieces (Per-Pcs), a framework that allows users to safely share and assemble personalized PEFT efficiently with collaborative efforts. |
Zhaoxuan Tan; Zheyuan Liu; Meng Jiang; |
235 | Democratizing Large Language Models Via Personalized Parameter-Efficient Fine-tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these methods face limitations due to a lack of model ownership, resulting in constrained customization and privacy issues, and they often fail to capture complex, dynamic user behavior patterns. To address these shortcomings, we introduce One PEFT Per User (OPPU), employing personalized parameter-efficient fine-tuning (PEFT) modules to store user-specific behavior patterns and preferences. |
Zhaoxuan Tan; Qingkai Zeng; Yijun Tian; Zheyuan Liu; Bing Yin; Meng Jiang; |
236 | TopViewRS: Vision-Language Models As Top-View Spatial Reasoners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we study their capability to understand and reason over spatial relations from the top view. |
Chengzu Li; Caiqi Zhang; Han Zhou; Nigel Collier; Anna Korhonen; Ivan Vulic; |
237 | Casablanca: Data and Models for Multidialectal Arabic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This challenge is largely due to the absence of datasets that can empower diverse speech systems. In this paper, we seek to mitigate this obstacle for a number of Arabic dialects by presenting Casablanca, a large-scale community-driven effort to collect and transcribe a multi-dialectal Arabic dataset. |
Bashar Talafha; Karima Kadaoui; Samar Mohamed Magdy; Mariem Habiboullah; Chafei Mohamed Chafei; Ahmed Oumar El-Shangiti; Hiba Zayed; Mohamedou Cheikh Tourad; Rahaf Alhamouri; Rwaa Assi; Aisha Alraeesi; Hour Mohamed; Fakhraddin Alwajih; Abdelrahman Mohamed; Abdellah El Mekki; El Moatez Billah Nagoudi; Benelhadj Djelloul Mama Saadia; Hamzah A. Alsayadi; Walid Al-Dhabyani; Sara Shatnawi; Yasir Ech-chammakhy; Amal Makouar; Yousra Berrachedi; Mustafa Jarrar; Shady Shehata; Ismail Berrada; Muhammad Abdul-Mageed; |
238 | Large Language Models Are Poor Clinical Decision-Makers: A Comprehensive Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To better understand LLMs in the clinic, we construct a benchmark ClinicBench. |
Fenglin Liu; Zheng Li; Hongjian Zhou; Qingyu Yin; Jingfeng Yang; Xianfeng Tang; Chen Luo; Ming Zeng; Haoming Jiang; Yifan Gao; Priyanka Nigam; Sreyashi Nag; Bing Yin; Yining Hua; Xuan Zhou; Omid Rohanian; Anshul Thakur; Lei Clifton; David A. Clifton; |
239 | Twists, Humps, and Pebbles: Multilingual Speech Recognition Models Exhibit Gender Performance Gaps Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our study systematically evaluates the performance of two widely used multilingual ASR models on three datasets, encompassing 19 languages from eight language families and two speaking conditions. |
Giuseppe Attanasio; Beatrice Savoldi; Dennis Fucci; Dirk Hovy; |
240 | Scaling Synthetic Logical Reasoning Datasets with Context-Sensitive Declarative Grammars Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a simpler and more general declarative framework with flexible context-sensitive rules binding multiple languages (specifically, simplified English and the TPTP theorem-proving language). |
Damien Sileo; |
241 | Target-Aware Language Modeling Via Granular Data Sampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we revisit importance sampling with n-gram features consisting of multi-granular tokens, which strikes a good balance between sentence compression and representation capabilities. (A schematic sketch of n-gram importance weighting follows below.) |
Ernie Chang; Pin-Jie Lin; Yang Li; Changsheng Zhao; Daeil Kim; Rastislav Rabatin; Zechun Liu; Yangyang Shi; Vikas Chandra; |
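A minimal illustrative sketch of the general n-gram importance-sampling idea referenced above, not the paper's exact recipe: each candidate document is scored by how much its n-gram distribution resembles the target domain relative to a general corpus, and documents are then sampled in proportion to that score. The bigram features, whitespace tokenization, and add-epsilon smoothing below are all placeholder assumptions.

```python
from collections import Counter
import math

def ngram_counts(texts, n=2):
    """Count word n-grams over a corpus (whitespace tokenization for brevity)."""
    counts = Counter()
    for t in texts:
        toks = t.split()
        counts.update(zip(*[toks[i:] for i in range(n)]))
    return counts

def log_importance_weight(doc, target_counts, general_counts, n=2, eps=1.0):
    """Sum over the document's n-grams of log p_target(g) - log p_general(g),
    with add-eps smoothing; higher weights mean more target-like documents."""
    t_total = sum(target_counts.values())
    g_total = sum(general_counts.values())
    toks = doc.split()
    w = 0.0
    for g in zip(*[toks[i:] for i in range(n)]):
        w += math.log((target_counts[g] + eps) / (t_total + eps))
        w -= math.log((general_counts[g] + eps) / (g_total + eps))
    return w  # sample documents with probability proportional to exp(w), clipped in practice
```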
242 | Scaling Parameter-Constrained Language Models with Quality Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we extend the conventional understanding of scaling law by offering a microscopic view of data quality within the original formulation – effective training tokens – which we posit to be a critical determinant of performance for parameter-constrained language models. |
Ernie Chang; Matteo Paltenghi; Yang Li; Pin-Jie Lin; Changsheng Zhao; Patrick Huber; Zechun Liu; Rastislav Rabatin; Yangyang Shi; Vikas Chandra; |
243 | Initialization of Large Language Models Via Reparameterization to Mitigate Loss Spikes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, to meet these requirements in the Transformer model, the norm of the model parameters must be non-uniform, and thus, parameters whose norm is smaller are more sensitive to the parameter update. To address this issue, we propose a novel technique, weight scaling as reparameterization (WeSaR). |
Kosuke Nishida; Kyosuke Nishida; Kuniko Saito; |
244 | AppBench: Planning of Multiple APIs from Various APPs for Complex User Instruction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce AppBench, the first benchmark to evaluate LLMs’ ability to plan and execute multiple APIs from various sources in order to complete the user’s task. |
Hongru Wang; Rui Wang; Boyang Xue; Heming Xia; Jingtao Cao; Zeming Liu; Jeff Z. Pan; Kam-Fai Wong; |
245 | Mitigating Open-Vocabulary Caption Hallucinations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a framework for addressing hallucinations in image captioning in the open-vocabulary setting. |
Assaf Ben-Kish; Moran Yanuka; Morris Alper; Raja Giryes; Hadar Averbuch-Elor; |
246 | Interpretability-based Tailored Knowledge Editing in Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Leveraging findings on the critical role of feed-forward MLPs in decoder-only models, we propose a tailored knowledge editing method, TailoredKE, that considers the unique information flow of each sample. |
Yihuai Hong; Aldo Lipani; |
247 | Bridging Cultures in The Kitchen: A Framework and Benchmark for Cross-Cultural Recipe Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose CARROT, a plug-and-play culture-aware recipe information retrieval framework that incorporates culture-aware query rewriting and re-ranking methods, and we evaluate it both on our benchmark and against intuitive human judgments. |
Tianyi Hu; Maria Maistro; Daniel Hershcovich; |
248 | Can LLM Generate Culturally Relevant Commonsense QA Data? Case Study in Indonesian and Sundanese Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we investigate the effectiveness of using LLMs in generating culturally relevant commonsense QA datasets for Indonesian and Sundanese languages. |
Rifki Afina Putri; Faiz Ghifari Haznitrama; Dea Adhista; Alice Oh; |
249 | MIPD: Exploring Manipulation and Intention In A Novel Corpus of Polish Disinformation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study presents a novel corpus of 15,356 Polish web articles, including articles identified as containing disinformation. |
Arkadiusz Modzelewski; Giovanni Da San Martino; Pavel Savov; Magdalena Anna Wilczynska; Adam Wierzbicki; |
250 | Order of Magnitude Speedups for LLM Membership Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we adapt a recent line of work that uses quantile regression to mount membership inference attacks; we extend this work by proposing a low-cost MIA that leverages an ensemble of small quantile regression models to determine whether a document belongs to the model’s training set. (A minimal sketch of the ensemble idea follows below.) |
Rongting Zhang; Martin Andres Bertran; Aaron Roth; |
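A minimal sketch of the ensemble idea, under illustrative assumptions (the document features, the gradient-boosted regressor, and the 5% quantile are all placeholders, not the paper's settings): several small quantile regressors are trained on documents known to be outside the training set, each predicting a low quantile of language-model loss from document features; a document whose observed loss falls below the averaged predicted threshold is flagged as a likely member.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fit_quantile_ensemble(features, losses, n_models=5, q=0.05, seed=0):
    """Train an ensemble of small quantile regressors on known NON-member
    documents; each predicts the q-th loss quantile given NumPy feature rows."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.choice(len(features), size=len(features), replace=True)  # bootstrap resample
        m = GradientBoostingRegressor(loss="quantile", alpha=q,
                                      n_estimators=50, max_depth=2)
        m.fit(features[idx], losses[idx])
        models.append(m)
    return models

def looks_like_member(models, feat, observed_loss):
    """Membership signal: the observed loss undercuts the ensemble's averaged quantile."""
    threshold = np.mean([m.predict(feat[None, :])[0] for m in models])
    return observed_loss < threshold
```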
251 | ADELIE: Aligning Large Language Models on Information Extraction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View |
Yunjia Qi; Hao Peng; Xiaozhi Wang; Bin Xu; Lei Hou; Juanzi Li; |
252 | MIND: Multimodal Shopping Intention Distillation from Large Vision-language Models for E-commerce Purchase Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Such an approach tends to generate product-centric intentions, overlooks valuable visual information from product images, and incurs high costs for scalability. To address these issues, we introduce MIND, a multimodal framework that allows Large Vision-Language Models (LVLMs) to infer purchase intentions from multimodal product metadata and prioritize human-centric ones. |
Baixuan Xu; Weiqi Wang; Haochen Shi; Wenxuan Ding; Huihao Jing; Tianqing Fang; Jiaxin Bai; Xin Liu; Changlong Yu; Zheng Li; Chen Luo; Qingyu Yin; Bing Yin; Long Chen; Yangqiu Song; |
253 | Beyond Embeddings: The Promise of Visual Table in Visual Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose Visual Table, a novel form of visual representation tailored for visual reasoning. |
Yiwu Zhong; Zi-Yuan Hu; Michael Lyu; Liwei Wang; |
254 | Bayesian Calibration of Win Rate Estimation with LLM Evaluators Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, applying LLM evaluators naively to compare different systems can lead to unreliable results due to the inaccuracy and intrinsic bias of LLM evaluators. In order to mitigate this problem, we propose two calibration methods, Bayesian Win-Rate Sampling (BWRS) and Bayesian Dawid-Skene, both of which leverage Bayesian inference to more accurately infer the true win rate of generative language models. (A simplified posterior sketch follows below.) |
Yicheng Gao; Gonghan Xu; Zhe Wang; Arman Cohan; |
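A simplified sketch of the Bayesian ingredient common to both calibration methods: a posterior over the true win rate given noisy judge verdicts. The grid posterior below assumes a single judge with a known, symmetric accuracy (a hypothetical fixed value) and a flat prior; it illustrates the calibration effect, not BWRS or Bayesian Dawid-Skene themselves.

```python
import numpy as np

def win_rate_posterior(k, n, judge_acc=0.8, grid_size=1001):
    """Posterior over the true win rate p, given k judge-reported wins out of n
    comparisons. The judge reports a win with probability a*p + (1-a)*(1-p)."""
    p = np.linspace(0.0, 1.0, grid_size)
    q = judge_acc * p + (1.0 - judge_acc) * (1.0 - p)
    log_like = k * np.log(q + 1e-12) + (n - k) * np.log(1.0 - q + 1e-12)
    post = np.exp(log_like - log_like.max())  # flat Beta(1,1) prior
    return p, post / post.sum()

p, post = win_rate_posterior(k=120, n=200)
print("posterior mean win rate:", float((p * post).sum()))  # above the raw 0.60
```

Whenever the judge is imperfect, the posterior mean deviates from the raw judge win rate k/n; that correction is precisely what calibration buys.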
255 | Delving Into Qualitative Implications of Synthetic Data for Hate Speech Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present an in-depth qualitative analysis of the potential and specific pitfalls of synthetic data for hate speech detection in English, with 3,500 manually annotated examples. |
Camilla Casula; Sebastiano Vecellio Salto; Alan Ramponi; Sara Tonelli; |
256 | LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This study focuses on the topic of LLMs as NLP Researchers, particularly examining the effectiveness of LLMs in assisting paper (meta-)reviewing and its recognizability. To address this, we constructed the ReviewCritique dataset, which includes two types of information: (i) NLP papers (initial submissions rather than camera-ready) with both human-written and LLM-generated reviews, and (ii) each review comes with “deficiency” labels and corresponding explanations for individual segments, annotated by experts. |
Jiangshu Du; Yibo Wang; Wenting Zhao; Zhongfen Deng; Shuaiqi Liu; Renze Lou; Henry Peng Zou; Pranav Narayanan Venkit; Nan Zhang; Mukund Srinath; Haoran Ranran Zhang; Vipul Gupta; Yinghui Li; Tao Li; Fei Wang; Qin Liu; Tianlin Liu; Pengzhi Gao; Congying Xia; Chen Xing; Cheng Jiayang; Zhaowei Wang; Ying Su; Raj Sanjay Shah; Ruohao Guo; Jing Gu; Haoran Li; Kangda Wei; Zihao Wang; Lu Cheng; Surangika Ranathunga; Meng Fang; Jie Fu; Fei Liu; Ruihong Huang; Eduardo Blanco; Yixin Cao; Rui Zhang; Philip S. Yu; Wenpeng Yin; |
257 | Model Editing Harms General Abilities of Large Language Models: Regularization to The Rescue Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we raise concerns that model editing’s improvements on factuality may come at the cost of a significant degradation of the model’s general abilities. |
Jia-Chen Gu; Hao-Xiang Xu; Jun-Yu Ma; Pan Lu; Zhen-Hua Ling; Kai-Wei Chang; Nanyun Peng; |
258 | ABSEval: An Agent-based Framework for Script Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a novel script evaluation dataset, MCScript, consisting of more than 1,500 script evaluation tasks and steps, and develop an agent-based script evaluation framework, ABSEval, to collaboratively evaluate scripts generated by LLMs. |
Sirui Liang; Baoli Zhang; Jun Zhao; Kang Liu; |
259 | KB-Plugin: A Plug-and-play Framework for Large Language Models to Induce Programs Over Low-resourced Knowledge Bases Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose KB-Plugin, a plug-and-play framework that enables LLMs to induce programs over any low-resourced KB. |
Jiajie Zhang; Shulin Cao; Linmei Hu; Ling Feng; Lei Hou; Juanzi Li; |
260 | LUQ: Long-text Uncertainty Quantification for LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To further improve the factuality of LLM responses, we propose Luq-Ensemble, a method that ensembles responses from multiple models and selects the response with the lowest uncertainty. |
Caiqi Zhang; Fangyu Liu; Marco Basaldella; Nigel Collier; |
261 | Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View |
Keunwoo Peter Yu; Zheyuan Zhang; Fengyuan Hu; Shane Storks; Joyce Chai; |
262 | Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Aligning Large Language Models (LLMs) traditionally relies on complex and costly training processes like supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). To address the challenge of achieving alignment without these extensive tuning costs and expensive annotations, we present a novel, tuning-free approach for self-alignment called Dynamic Rewarding with Prompt Optimization (DRPO). |
Somanshu Singla; Zhen Wang; Tianyang Liu; Abdullah Ashfaq; Zhiting Hu; Eric P. Xing; |
263 | Flex Tape Can’t Fix That: Bias and Misinformation in Edited Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate how weight editing methods unexpectedly amplify model biases after edits. |
Karina H Halevy; Anna Sotnikova; Badr AlKhamissi; Syrielle Montariol; Antoine Bosselut; |
264 | Consistent Autoformalization for Constructing Mathematical Libraries Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes the coordinated use of three mechanisms, most-similar retrieval augmented generation (MS-RAG), denoising steps, and auto-correction with syntax error feedback (Auto-SEF) to improve autoformalization quality. |
Lan Zhang; Xin Quan; Andre Freitas; |
265 | CHIQ: Contextual History Enhancement for Improving Query Rewriting in Conversational Search Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study how open-source large language models (LLMs) can be effectively deployed for improving query rewriting in conversational search, especially for ambiguous queries. |
Fengran Mo; Abbas Ghaddar; Kelong Mao; Mehdi Rezagholizadeh; Boxing Chen; Qun Liu; Jian-Yun Nie; |
266 | Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Expanding this capacity during the instruction tuning phase poses significant challenges. To address this issue, we introduce parameter-efficient sparsity crafting (PESC), which crafts dense models into sparse models using the mixture-of-experts (MoE) architecture. |
Haoyuan Wu; Haisheng Zheng; Zhuolun He; Bei Yu; |
267 | EVEDIT: Event-based Knowledge Editing for Deterministic Knowledge Propagation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We both theoretically and empirically observe that this simplified setting often leads to uncertainty when determining the deduction anchors, causing low confidence in their answers. To mitigate this issue, we propose a novel task of event-based knowledge editing that pairs facts with event descriptions. |
Jiateng Liu; Pengfei Yu; Yuji Zhang; Sha Li; Zixuan Zhang; Ruhi Sarikaya; Kevin Small; Heng Ji; |
268 | Task Arithmetic Can Mitigate Synthetic-to-Real Gap in Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing methods suffer in performance when fine-tuning an automatic speech recognition (ASR) model on synthetic data, owing to the distributional shift commonly referred to as the synthetic-to-real gap. In this paper, we find that task arithmetic is effective at mitigating this gap. (A schematic weight-space sketch follows below.) |
Hsuan Su; Hua Farn; Fan-Yun Sun; Shang-Tse Chen; Hung-yi Lee; |
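Task arithmetic operates directly in weight space: the difference between fine-tuned and pretrained parameters forms a "task vector" that can be rescaled and re-added. A minimal illustration follows; the scaling factor and the dev-set tuning note are assumptions for the sketch, not the paper's reported settings.

```python
import torch

def apply_task_vector(pretrained, finetuned, alpha):
    """theta = theta_pre + alpha * (theta_ft - theta_pre), per parameter tensor."""
    merged = {}
    for name, w_pre in pretrained.items():
        task_vector = finetuned[name] - w_pre        # direction learned during fine-tuning
        merged[name] = w_pre + alpha * task_vector   # alpha < 1 attenuates synthetic-data artifacts
    return merged

# Usage sketch: plain state dicts from any torch ASR model; tune alpha on a
# small real-speech dev set (hypothetical workflow).
# merged_sd = apply_task_vector(pre_model.state_dict(), ft_model.state_dict(), alpha=0.7)
# model.load_state_dict(merged_sd)
```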
269 | ClimRetrieve: A Benchmarking Dataset for Information Retrieval from Corporate Climate Disclosures Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, a significant gap remains in evaluating domain-specific information retrieval – the basis for answer generation. To address this challenge, this work simulates the typical tasks of a sustainability analyst by examining 30 sustainability reports with 16 detailed climate-related questions. |
Tobias Schimanski; Jingwei Ni; Roberto Spacey Martín; Nicola Ranger; Markus Leippold; |
270 | ARM: An Alignment-and-Replacement Module for Chinese Spelling Check Based on LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, by leveraging the capabilities of LLMs while mitigating their limitations, we propose ARM, a novel plug-and-play Alignment-and-Replacement Module that enhances the performance of existing CSC models without the need for retraining or fine-tuning. |
Changchun Liu; Kai Zhang; Junzhe Jiang; Zirui Liu; Hanqing Tao; Min Gao; Enhong Chen; |
271 | Towards Measuring and Modeling Culture in LLMs: A Survey Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a survey of more than 90 recent papers that aim to study cultural representation and inclusion in large language models (LLMs). |
Muhammad Farid Adilazuarda; Sagnik Mukherjee; Pradhyumna Lavania; Siddhant Shivdutt Singh; Alham Fikri Aji; Jacki O’Neill; Ashutosh Modi; Monojit Choudhury; |
272 | From LLMs to MLLMs: Exploring The Landscape of Multimodal Jailbreaking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We summarize the limitations and potential research directions of multimodal jailbreaking, aiming to inspire future research and further enhance the robustness and security of MLLMs. |
Siyuan Wang; Zhuohan Long; Zhihao Fan; Zhongyu Wei; |
273 | Evaluating The Instruction-Following Robustness of Large Language Models to Prompt Injection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we establish a benchmark to evaluate the robustness of instruction-following LLMs against prompt injection attacks, assessing their ability to discern which instructions to follow and which to disregard. |
Zekun Li; Baolin Peng; Pengcheng He; Xifeng Yan; |
274 | TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Furthermore, current SVS models often fail to generate singing voices rich in stylistic nuances for unseen singers. To address these challenges, we introduce TCSinger, the first zero-shot SVS model for style transfer across cross-lingual speech and singing styles, along with multi-level style control. |
Yu Zhang; Ziyue Jiang; Ruiqi Li; Changhao Pan; Jinzheng He; Rongjie Huang; Chuxin Wang; Zhou Zhao; |
275 | Evaluating Diversity in Automatic Poetry Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While most previous research has focused on forms of the Turing test when evaluating automatic poetry generation (can humans distinguish between automatic and human-generated poetry?), we evaluate the diversity of automatically generated poetry (with a focus on quatrains) by comparing distributions of generated poetry to distributions of human poetry along structural, lexical, semantic, and stylistic dimensions, assessing different model types (word- vs. character-level, general-purpose LLMs vs. poetry-specific models), including the very recent LLaMA3-8B, and types of fine-tuning (conditioned vs. unconditioned). We find that current automatic poetry systems are considerably underdiverse along multiple dimensions: they often do not rhyme sufficiently, are semantically too uniform, and do not even match the length distribution of human poetry. |
Yanran Chen; Hannes Gröner; Sina Zarrieß; Steffen Eger; |
276 | Working Memory Identifies Reasoning Limits in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study explores the inherent limitations of large language models (LLMs) from a scaling perspective, focusing on the upper bounds of their cognitive capabilities. |
Chunhui Zhang; Yiren Jian; Zhongyu Ouyang; Soroush Vosoughi; |
277 | CasiMedicos-Arg: A Medical Question Answering Dataset Annotated with Explanatory Argumentative Structures Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Developing new tools to aid residents in training their explanation skills is therefore a central objective of AI in education. In this paper, we follow this direction and present, to the best of our knowledge, the first multilingual dataset for Medical Question Answering where correct and incorrect diagnoses for a clinical case are enriched with a natural language explanation written by doctors. |
Ekaterina Sviridova; Anar Yeginbergen; Ainara Estarrona; Elena Cabrio; Serena Villata; Rodrigo Agerri; |
278 | BPO: Staying Close to The Behavior LLM Creates Better Online LLM Alignment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose online Preference Optimization in proximity to the Behavior LLM (BPO), emphasizing the importance of constructing a proper trust region for LLM alignment. |
Wenda Xu; Jiachen Li; William Yang Wang; Lei Li; |
279 | LongEmbed: Extending Embedding Models for Long Context Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper explores context window extension of existing embedding models, pushing their input length to a maximum of 32,768 tokens. |
Dawei Zhu; Liang Wang; Nan Yang; Yifan Song; Wenhao Wu; Furu Wei; Sujian Li; |
280 | Rethinking Pruning Large Language Models: Benefits and Pitfalls of Reconstruction Error Minimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work suggests fundamentally rethinking the current practice of pruning large language models (LLMs). |
Sungbin Shin; Wonpyo Park; Jaeho Lee; Namhoon Lee; |
281 | A Thorough Examination of Decoding Methods in The Era of LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper provides a comprehensive and multifaceted analysis of various decoding methods within the context of LLMs, evaluating their performance, robustness to hyperparameter changes, and decoding speeds across a wide range of tasks, models, and deployment environments. |
Chufan Shi; Haoran Yang; Deng Cai; Zhisong Zhang; Yifan Wang; Yujiu Yang; Wai Lam; |
282 | Entity Insertion in Multilingual Linked Corpora: The Case of Wikipedia Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The latter problem has not been addressed effectively, particularly in the absence of text spans in the source that could serve as anchors to insert a link to the target entity. To bridge this gap, we introduce and operationalize the task of entity insertion in information networks. |
Tomás Feith; Akhil Arora; Martin Gerlach; Debjit Paul; Robert West; |
283 | Evaluating Character Understanding of Large Language Models Via Character Profiling from Fictional Works Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose evaluating LLMs’ character understanding capability via the character profiling task, i. e. , summarizing character profiles from corresponding materials, a widely adopted yet understudied practice for RPA development. |
Xinfeng Yuan; Siyu Yuan; Yuhan Cui; Tianhe Lin; Xintao Wang; Rui Xu; Jiangjie Chen; Deqing Yang; |
284 | An Unsupervised Approach to Achieve Supervised-Level Explainability in Healthcare Records Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: State-of-the-art explainability methods increase model transparency but rely on human-annotated evidence spans, which are costly. In this study, we propose an approach to produce plausible and faithful explanations without needing such annotations. |
Joakim Edin; Maria Maistro; Lars Maaløe; Lasse Borgholt; Jakob Drachmann Havtorn; Tuukka Ruotsalo; |
285 | Topic-Oriented Open Relation Extraction with A Priori Seed Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To improve topic-oriented ORE, we propose a zero-shot approach called PriORE: Open Relation Extraction with a Priori seed generation. |
Linyi Ding; Jinfeng Xiao; Sizhe Zhou; Chaoqi Yang; Jiawei Han; |
286 | RAC: Retrieval-augmented Conversation Dataset for Open-domain Question Answering in Conversational Settings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a novel retrieval-augmented conversation (RAC) dataset and develop a baseline system comprising query rewriting, retrieval, reranking, and response generation stages. |
Bonggeun Choi; JeongJae Park; Yoonsung Kim; Jaehyun Park; Youngjoong Ko; |
287 | Do We Need Language-Specific Fact-Checking Models? The Case of Chinese Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper investigates the potential benefits of language-specific fact-checking models, focusing on the case of Chinese using the CHEF dataset. |
Caiqi Zhang; Zhijiang Guo; Andreas Vlachos; |
288 | Verification and Refinement of Natural Language Explanations Through LLM-Symbolic Theorem Proving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we present a neuro-symbolic framework, named Explanation-Refiner, that integrates TPs with LLMs to generate and formalise explanatory sentences and suggest potential inference strategies for NLI. |
Xin Quan; Marco Valentino; Louise A. Dennis; Andre Freitas; |
289 | Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We hypothesize that successful attacks share a common property: they are effective in moving the representation of the harmful prompt towards the direction of the harmless prompts. We incorporate hidden representations into the objective of existing jailbreak attacks to move the attacks along the acceptance direction, and we conduct experiments to validate the above hypothesis using the proposed objective. |
Yuping Lin; Pengfei He; Han Xu; Yue Xing; Makoto Yamada; Hui Liu; Jiliang Tang; |
290 | Evaluating LLMs for Targeted Concept Simplification for Domain-Specific Texts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In a preliminary human study, we first identify that lack of context and unfamiliarity with difficult concepts is a major reason for adult readers’ difficulty with domain-specific text. We then introduce targeted concept simplification, a simplification task for rewriting text to help readers comprehend text containing unfamiliar concepts. We also introduce WikiDomains, a new dataset of 22k definitions from 13 academic domains paired with a difficult concept within each definition. |
Sumit Asthana; Hannah Rashkin; Elizabeth Clark; Fantine Huot; Mirella Lapata; |
291 | Retrieve-Plan-Generation: An Iterative Planning and Answering Framework for Knowledge-Intensive LLM Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Due to the inherent uncertainty in LLM generation, inputting the entire document may introduce off-topic information, causing the model to deviate from the central topic and affecting the relevance of the generated content. To address these issues, we propose the Retrieve-Plan-Generation (RPG) framework. |
Yuanjie Lyu; Zihan Niu; Zheyong Xie; Chao Zhang; Tong Xu; Yang Wang; Enhong Chen; |
292 | Unsupervised End-to-End Task-Oriented Dialogue with LLMs: The Power of The Noisy Channel Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an innovative approach using expectation-maximization (EM) that infers turn-level annotations as latent variables using a noisy channel model to build an end-to-end dialogue agent. |
Brendan King; Jeffrey Flanigan; |
293 | TEMA: Token Embeddings Mapping for Enriching Low-Resource Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The research we present aims to remedy the low quality of language models for low-resource languages. |
Rodolfo Zevallos; Núria Bel; Mireia Farrús; |
294 | Puzzle Solving Using Reasoning of Large Language Models: A Survey Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Through a critical review of relevant datasets and benchmarks, we assess LLMs’ performance, identifying significant challenges in complex puzzle scenarios. |
Panagiotis Giadikiaroglou; Maria Lymperaiou; Giorgos Filandrianos; Giorgos Stamou; |
295 | KorSmishing Explainer: A Korean-centric LLM-based Framework for Smishing Detection and Explanation Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To mitigate the annual financial losses caused by SMS phishing (smishing) in South Korea, we propose an explainable smishing detection framework that adapts a Korean-centric large language model (LLM). |
Yunseung Lee; Daehee Han; |
296 | Let Me Speak Freely? A Study On The Impact Of Format Restrictions On Large Language Model Performance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Structured generation, the process of producing content in standardized formats like JSON and XML, is widely utilized in real-world applications to extract key output information from large language models (LLMs). This study investigates whether such constraints on generation space impact LLMs’ abilities, including reasoning and domain knowledge comprehension. |
Zhi Rui Tam; Cheng-Kuang Wu; Yi-Lin Tsai; Chieh-Yen Lin; Hung-yi Lee; Yun-Nung Chen; |
297 | Seg2Act: Global Context-aware Action Generation for Document Logical Structuring Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Traditional approaches often fall short in handling the complexity and the variability of lengthy documents. To address these issues, we introduce Seg2Act, an end-to-end, generation-based method for document logical structuring, revisiting logical structure extraction as an action generation task. |
Zichao Li; Shaojie He; Meng Liao; Xuanang Chen; Yaojie Lu; Hongyu Lin; Yanxiong Lu; Xianpei Han; Le Sun; |
298 | On The Robustness of Editing Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work seeks to understand the strengths and limitations of editing methods, facilitating practical applications of communicative AI. |
Xinbei Ma; Tianjie Ju; Jiyang Qiu; Zhuosheng Zhang; Hai Zhao; Lifeng Liu; Yulong Wang; |
299 | Rethinking Pragmatics in Large Language Models: Towards Open-Ended Evaluation and Preference Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study addresses the challenges of assessing and enhancing social-pragmatic inference in large language models (LLMs). |
Shengguang Wu; Shusheng Yang; Zhenglun Chen; Qi Su; |
300 | BMRetriever: Tuning Large Language Models As Better Biomedical Text Retrievers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present BMRetriever, a series of dense retrievers for enhancing biomedical retrieval via unsupervised pre-training on large biomedical corpora, followed by instruction fine-tuning on a combination of labeled datasets and synthetic pairs. |
Ran Xu; Wenqi Shi; Yue Yu; Yuchen Zhuang; Yanqiao Zhu; May Dongmei Wang; Joyce C. Ho; Chao Zhang; Carl Yang; |
301 | Can Automatic Metrics Assess High-Quality Translations? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, correlation methods tend to capture only the ability of metrics to differentiate between good and bad source-translation pairs, overlooking their reliability in distinguishing alternative translations for the same source. In this paper, we confirm that this is indeed the case by showing that current metrics are insensitive to nuanced differences in translation quality. |
Sweta Agrawal; António Farinhas; Ricardo Rei; Andre Martins; |
302 | Modeling User Preferences with Automatic Metrics: Creating A High-Quality Preference Dataset for Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an approach that leverages the best of both worlds. |
Sweta Agrawal; José G. C. De Souza; Ricardo Rei; António Farinhas; Gonçalo Faria; Patrick Fernandes; Nuno M Guerreiro; Andre Martins; |
303 | Revisiting The Robustness of Watermarking to Paraphrasing Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that with limited access to model generations, we can undo the effects of watermarking and drastically improve the effectiveness of paraphrasing attacks. |
Saksham Rastogi; Danish Pruthi; |
304 | OneNet: A Fine-Tuning Free Framework for Few-Shot Entity Linking Via Large Language Model Prompting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Traditional EL methods heavily rely on large datasets to enhance their performance, a dependency that becomes problematic in the context of few-shot entity linking, where only a limited number of examples are available for training. To address this challenge, we present OneNet, an innovative framework that utilizes the few-shot learning capabilities of Large Language Models (LLMs) without the need for fine-tuning. |
Xukai Liu; Ye Liu; Kai Zhang; Kehang Wang; Qi Liu; Enhong Chen; |
305 | RAt: Injecting Implicit Bias for Text-To-Image Prompt Refinement Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study an adversarial prompt-attacking problem for T2I-Refine, where the goal is to implicitly inject a specific concept bias into the input prompts during the refinement process so that the generated images, while still of higher quality, are explicitly biased towards the target group. |
Ziyi Kou; Shichao Pei; Meng Jiang; Xiangliang Zhang; |
306 | Video-LLaVA: Learning United Visual Representation By Alignment Before Projection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we unify visual representation into the language feature space to advance the foundational LLM towards a unified LVLM. |
Bin Lin; Yang Ye; Bin Zhu; Jiaxi Cui; Munan Ning; Peng Jin; Li Yuan; |
307 | A Bayesian Approach to Harnessing The Power of LLMs in Authorship Attribution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study explores the potential of pre-trained LLMs in one-shot authorship attribution, specifically utilizing Bayesian approaches and probability outputs of LLMs. (A schematic Bayes-rule sketch follows below.) |
Zhengmian Hu; Tong Zheng; Heng Huang; |
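A schematic sketch of the Bayesian reading of one-shot attribution: Bayes' rule over candidate authors, with the LLM's log-probability of the disputed text (conditioned on each candidate's writing sample) serving as the likelihood. The scoring helper below is hypothetical; any API that returns token log-probs would serve.

```python
import math

def authorship_posterior(candidates, disputed_text, loglik_fn, prior=None):
    """P(author | text) is proportional to P(text | author) * P(author), where
    P(text | author) is an LLM log-likelihood of the disputed text given the
    author's one-shot writing sample."""
    prior = prior or {a: 1.0 / len(candidates) for a in candidates}
    logpost = {a: loglik_fn(sample, disputed_text) + math.log(prior[a])
               for a, sample in candidates.items()}
    z = max(logpost.values())                      # subtract max for numerical stability
    unnorm = {a: math.exp(lp - z) for a, lp in logpost.items()}
    total = sum(unnorm.values())
    return {a: v / total for a, v in unnorm.items()}

# loglik_fn(sample, text): e.g., the sum of token log-probs of `text` when the
# model is prompted with the candidate's writing sample (hypothetical helper).
```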
308 | Relevance Is A Guiding Light: Relevance-aware Adaptive Learning for End-to-end Task-oriented Dialogue System Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To alleviate DAP, we propose the Relevance-aware Adaptive Learning (ReAL) method, a two-stage training framework that eliminates hard negatives step-by-step and aligns retrieval with generation. |
Zhanpeng Chen; Zhihong Zhu; Wanshi Xu; Xianwei Zhuang; Yuexian Zou; |
309 | Dual-oriented Disentangled Network with Counterfactual Intervention for Multimodal Intent Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods have made great progress in modal alignment and fusion, however, two vital limitations are neglected: (I) close entanglement of multimodal semantics with modal structures; (II) insufficient learning of the causal effects of semantic and modality-specific information on the final predictions under the end-to-end training fashion. To alleviate the above limitations, we introduce the Dual-oriented Disentangled Network with Counterfactual Intervention (DuoDN). |
Zhanpeng Chen; Zhihong Zhu; Xianwei Zhuang; Zhiqi Huang; Yuexian Zou; |
310 | Towards Online Continuous Sign Language Recognition and Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we take the first step towards online CSLR. |
Ronglai Zuo; Fangyun Wei; Brian Mak; |
311 | Embedding and Gradient Say Wrong: A White-Box Method for Hallucination Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we intend to model the distributional distance between the regular conditional output and the unconditional output, which is generated without a given input text. |
Xiaomeng Hu; Yiming Zhang; Ru Peng; Haozhe Zhang; Chenwei Wu; Gang Chen; Junbo Zhao; |
312 | Inductive-Deductive Strategy Reuse for Multi-Turn Instructional Dialogues Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we take inspiration from the cognitive abilities inherent in human learning and propose the explicit modeling of complex dialogue flows through instructional strategy reuse. |
Jiao Ou; Jiayu Wu; Che Liu; Fuzheng Zhang; Di Zhang; Kun Gai; |
313 | Fine-Grained Detection of Solidarity for Women and Migrants in 155 Years of German Parliamentary Debates Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we investigate the frequency of (anti-)solidarity towards women and migrants in German parliamentary debates between 1867 and 2022. |
Aida Kostikova; Dominik Beese; Benjamin Paassen; Ole Pütz; Gregor Wiedemann; Steffen Eger; |
314 | MASIVE: Open-Ended Affective State Identification in English and Spanish Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we broaden our scope to a practically unbounded set of affective states, which includes any terms that humans use to describe their experiences of feeling. |
Nicholas Deas; Elsbeth Turcan; Ivan Ernesto Perez Mejia; Kathleen McKeown; |
315 | Synchronous Faithfulness Monitoring for Trustworthy Retrieval-Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes SynCheck, a lightweight monitor that leverages fine-grained decoding dynamics including sequence likelihood, uncertainty quantification, context influence, and semantic alignment to synchronously detect unfaithful sentences. |
Di Wu; Jia-Chen Gu; Fan Yin; Nanyun Peng; Kai-Wei Chang; |
316 | Boosting Scientific Concepts Understanding: Can Analogy from Teacher Models Empower Student Models? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by the human education process, in this paper, we propose to investigate how analogies created by teacher language models (LMs) can assist student LMs in understanding scientific concepts, thereby aligning more closely with practical scenarios. |
Siyu Yuan; Cheng Jiayang; Lin Qiu; Deqing Yang; |
317 | M2PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a novel Multimodal Prompt Tuning (M2PT) approach for efficient instruction tuning of MLLMs. |
Taowen Wang; Yiyang Liu; James Chenhao Liang; Junhan Zhao; Yiming Cui; Yuning Mao; Shaoliang Nie; Jiahao Liu; Fuli Feng; Zenglin Xu; Cheng Han; Lifu Huang; Qifan Wang; Dongfang Liu; |
318 | GDTB: Genre Diverse Data for English Shallow Discourse Parsing Across Modalities, Text Types, and Domains Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present and evaluate a new open-access, multi-genre benchmark for PDTB-style shallow discourse parsing, based on the existing UD English GUM corpus, for which discourse relation annotations in other frameworks already exist. |
Yang Janet Liu; Tatsuya Aoyama; Wesley Scivetti; Yilun Zhu; Shabnam Behzad; Lauren Elizabeth Levine; Jessica Lin; Devika Tiwari; Amir Zeldes; |
319 | MDPO: Conditional Preference Optimization for Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Through a comparative experiment, we identify the unconditional preference problem in multimodal preference optimization, where the model overlooks the image condition. To address this problem, we propose mDPO, a multimodal DPO objective that prevents the over-prioritization of language-only preferences by also optimizing image preference. |
Fei Wang; Wenxuan Zhou; James Y. Huang; Nan Xu; Sheng Zhang; Hoifung Poon; Muhao Chen; |
320 | Mixture-of-Skills: Learning to Optimize Data Usage for Fine-Tuning Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a general, model-agnostic, reinforcement learning framework, Mixture-of-Skills (MoS), that learns to optimize data usage automatically during the fine-tuning process. |
Minghao Wu; Thuy-Trang Vu; Lizhen Qu; Reza Haf; |
321 | Game on Tree: Visual Hallucination Mitigation Via Coarse-to-Fine View Tree and Game Theory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a novel plug-and-play train-free decoding algorithm named Game and Tree based Hallucination Mitigation (GTHM), designed for mitigating VH. |
Xianwei Zhuang; Zhihong Zhu; Zhanpeng Chen; Yuxin Xie; Liming Liang; Yuexian Zou; |
322 | CoverICL: Selective Annotation for In-Context Learning Via Active Graph Coverage Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate selective annotation for ICL, where there is a limited budget for annotating examples, similar to low-budget active learning (AL). |
Costas Mavromatis; Balasubramaniam Srinivasan; Zhengyuan Shen; Jiani Zhang; Huzefa Rangwala; Christos Faloutsos; George Karypis; |
323 | SPEED++: A Multilingual Event Extraction Framework for Epidemic Prediction and Preparedness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce the first multilingual Event Extraction (EE) framework SPEED++ for extracting epidemic event information for any disease and language. |
Tanmay Parekh; Jeffrey Kwan; Jiarui Yu; Sparsh Johri; Hyosang Ahn; Sreya Muppalla; Kai-Wei Chang; Wei Wang; Nanyun Peng; |
324 | Fast Forwarding Low-Rank Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Enabled by these low-rank settings, we propose an even more efficient optimization strategy: Fast Forward, a simple and effective approach to accelerate large segments of SGD training. |
Adir Rahamim; Naomi Saphra; Sara Kangaslahti; Yonatan Belinkov; |
325 | Annotator-Centric Active Learning for Subjective NLP Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Annotator-Centric Active Learning (ACAL), which incorporates an annotator selection strategy following data sampling. |
Michiel Van Der Meer; Neele Falk; Pradeep K. Murukannaiah; Enrico Liscio; |
326 | Is It Really Long Context If All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, many disparate use-cases are grouped together under the umbrella term of long-context, defined simply by the total length of the model’s input, including – for example – Needle-in-a-Haystack tasks, book summarization, and information aggregation. Given their varied difficulty, in this position paper we argue that conflating different tasks by their context length is unproductive. |
Omer Goldman; Alon Jacovi; Aviv Slobodkin; Aviya Maimon; Ido Dagan; Reut Tsarfaty; |
327 | Contrastive Entity Coreference and Disambiguation for Historical Texts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We contrastively train bi-encoder models for coreferencing and disambiguating individuals in historical texts, achieving accurate, scalable performance that identifies out-of-knowledge base individuals. |
Abhishek Arora; Emily Silcock; Melissa Dell; Leander Heldring; |
328 | Enhancing Systematic Decompositional Natural Language Inference Using Informal Logic Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This absence causes noisy datasets and limited performance gains by modern neuro-symbolic entailment engines. To address these problems, we formulate a consistent and theoretically grounded approach to annotating decompositional entailment and evaluate its impact on LLM-based textual inference. |
Nathaniel Weir; Kate Sanders; Orion Weller; Shreya Sharma; Dongwei Jiang; Zhengping Jiang; Bhavana Dalvi Mishra; Oyvind Tafjord; Peter Jansen; Peter Clark; Benjamin Van Durme; |
329 | Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These environments are often highly expansive, making it impossible for the LLM to process them within its short-term memory. Motivated by recent research on extending the capabilities of LLMs with tools, we seek to investigate the intriguing potential of tools to augment LLMs in handling such complexity by introducing a novel class of tools, termed *middleware*, to aid in the proactive exploration within these massive environments. |
Yu Gu; Yiheng Shu; Hao Yu; Xiao Liu; Yuxiao Dong; Jie Tang; Jayanth Srinivasa; Hugo Latapie; Yu Su; |
330 | Reconsidering Sentence-Level Sign Language Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Historically, sign language machine translation has been posed as a sentence-level task: datasets consisting of continuous narratives are chopped up and presented to the model as isolated clips. In this work, we explore the limitations of this task framing. |
Garrett Tanzer; Maximus Shengelia; Ken Harrenstien; David Uthus; |
331 | Watch Every Step! LLM Agent Learning Via Iterative Step-level Process Refinement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce the **I**terative step-level **P**rocess **R**efinement **(IPR)** framework, which provides detailed step-by-step guidance to enhance agent training. |
Weimin Xiong; Yifan Song; Xiutian Zhao; Wenhao Wu; Xun Wang; Ke Wang; Cheng Li; Wei Peng; Sujian Li; |
332 | Academics Can Contribute to Domain-Specialized Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, leaderboards collect many individual tasks and general-purpose models often underperform in specialized domains; domain-specific or adapted models yield superior results. This focus on large general-purpose models excludes many academics and draws attention away from areas where they can make important contributions. We advocate for a renewed focus on developing and evaluating domain- and task-specific models, and highlight the unique role of academics in this endeavor. |
Mark Dredze; Genta Indra Winata; Prabhanjan Kambadur; Shijie Wu; Ozan Irsoy; Steven Lu; Vadim Dabravolski; David S Rosenberg; Sebastian Gehrmann; |
333 | Are Large Language Models Capable of Generating Human-Level Narratives? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a novel computational framework to analyze narratives through three discourse-level aspects: i) story arcs, ii) turning points, and iii) affective dimensions, including arousal and valence. |
Yufei Tian; Tenghao Huang; Miri Liu; Derek Jiang; Alexander Spangher; Muhao Chen; Jonathan May; Nanyun Peng; |
334 | IntCoOp: Interpretability-Aware Vision-Language Prompt Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we first identify that incorporating compositional attributes (e.g., a green tree frog) in the design of manual prompts can significantly enhance image-text alignment scores. Building upon this observation, we propose a novel and interpretable prompt-tuning method named IntCoOp, which learns to jointly align attribute-level inductive biases and class embeddings during prompt-tuning.
Soumya Suvra Ghosal; Samyadeep Basu; Soheil Feizi; Dinesh Manocha; |
335 | SEGMENT+: Long Text Processing with Short-Context Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, we introduce Segment+, a general framework that enables LMs to handle extended inputs within limited context windows efficiently. |
Wei Shi; Shuang Li; Kerun Yu; Jinglei Chen; Zujie Liang; Xinhui Wu; Yuxi Qian; Feng Wei; Bo Zheng; Jiaqing Liang; Jiangjie Chen; Yanghua Xiao; |
336 | Can Large Language Models Enhance Predictions of Disease Progression? Investigating Through Disease Network Link Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we introduce a disease comorbidity prediction model using an LLM, named ComLLM, which leverages domain knowledge to enhance prediction performance.
Haohui Lu; Usman Naseem; |
337 | CoCoST: Automatic Complex Code Generation with Online Searching and Correctness Testing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, generating complex code within real-world scenarios remains challenging due to intricate structures, subtle bugs, understanding of advanced data types, and lack of supplementary contents. To address these challenges, we introduce the CoCoST framework, which enhances complex code generation by online searching for more information with planned queries and correctness testing for code refinement. |
Xinyi He; Jiaru Zou; Yun Lin; Mengyu Zhou; Shi Han; Zejian Yuan; Dongmei Zhang; |
338 | Learning to Retrieve Iteratively for In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce iterative retrieval, a novel framework that empowers retrievers to make iterative decisions through policy optimization. |
Yunmo Chen; Tongfei Chen; Harsh Jhamtani; Patrick Xia; Richard Shin; Jason Eisner; Benjamin Van Durme; |
339 | Distract Large Language Models for Automatic Jailbreak Attack Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel black-box jailbreak framework for automated red teaming of LLMs. |
Zeguan Xiao; Yan Yang; Guanhua Chen; Yun Chen; |
340 | MAIR: A Massive Benchmark for Evaluating Instructed Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose MAIR (Massive Instructed Retrieval Benchmark), a heterogeneous IR benchmark that includes 126 distinct IR tasks across 6 domains, collected from existing datasets. |
Weiwei Sun; Zhengliang Shi; Wu Jiu Long; Lingyong Yan; Xinyu Ma; Yiding Liu; Min Cao; Dawei Yin; Zhaochun Ren; |
341 | EmphAssess: A Prosodic Benchmark on Assessing Emphasis Transfer in Speech-to-Speech Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As part of the evaluation pipeline, we introduce EmphaClass, a new model that classifies emphasis at the frame or word level.
Maureen de Seyssel; Antony D’Avirro; Adina Williams; Emmanuel Dupoux; |
342 | Paraphrase Types Elicit Prompt Engineering Capabilities Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In particular, changes in morphology and lexicon, i. e. , the vocabulary used, showed promise in improving prompts. These findings contribute to developing more robust language models capable of handling variability in linguistic expression. |
Jan Philip Wahle; Terry Ruas; Yang Xu; Bela Gipp; |
343 | ALVIN: Active Learning Via INterpolation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This oversight causes models to rely on shortcuts for predictions, i.e., spurious correlations between input attributes and labels occurring in well-represented groups. To address this issue, we propose Active Learning Via INterpolation (ALVIN), which conducts intra-class interpolations between examples from under-represented and well-represented groups to create anchors, i.e., artificial points situated between the example groups in the representation space.
Michalis Korakakis; Andreas Vlachos; Adrian Weller; |
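As a rough illustration of the interpolation idea in the highlight above, the sketch below builds anchors by mixing representations of under- and well-represented examples of the same class, then selects unlabeled points nearest to those anchors for annotation. The mixing coefficient and the cosine-distance selection rule are our assumptions, not the paper's exact procedure.

```python
# Hedged sketch of ALVIN-style anchor construction and acquisition.
import torch
import torch.nn.functional as F

def make_anchors(reps_under, reps_well, n_anchors=32):
    # Intra-class mixup: pair one random example from each group per anchor.
    i = torch.randint(len(reps_under), (n_anchors,))
    j = torch.randint(len(reps_well), (n_anchors,))
    alpha = torch.rand(n_anchors, 1)                 # assumed mixing ratio
    return alpha * reps_under[i] + (1 - alpha) * reps_well[j]

def select_for_annotation(pool_reps, anchors, k=8):
    # Pick the k unlabeled points closest to any anchor (cosine distance).
    pool = F.normalize(pool_reps, dim=-1)
    anc = F.normalize(anchors, dim=-1)
    nearest = (1 - pool @ anc.T).min(dim=1).values   # distance to closest anchor
    return torch.topk(-nearest, k).indices           # smallest distances win
```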
344 | FuxiTranyu: A Multilingual Large Language Model Trained with Balanced Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, many LLMs exhibit significant performance discrepancies between high- and low-resource languages. To mitigate this challenge, we present FuxiTranyu, an open-source multilingual LLM, which is designed to satisfy the need of the research community for balanced and high-performing multilingual capabilities. |
Haoran Sun; Renren Jin; Shaoyang Xu; Leiyu Pan; Supryadi; Menglong Cui; Jiangcun Du; Yikun Lei; Lei Yang; Ling Shi; Juesi Xiao; Shaolin Zhu; Deyi Xiong; |
345 | Re-Evaluating Evaluation for Multilingual Summarization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While previous works have argued that these approaches correlate strongly with human ratings in English, it remains unclear whether the conclusion holds for other languages. To answer this question, we construct a small-scale pilot dataset containing article-summary pairs and human ratings in English, Chinese and Indonesian. |
Jessica Zosa Forde; Ruochen Zhang; Lintang Sutawika; Alham Fikri Aji; Samuel Cahyawijaya; Genta Indra Winata; Minghao Wu; Carsten Eickhoff; Stella Biderman; Ellie Pavlick; |
346 | Leading Whitespaces of Language Models’ Subword Vocabulary Pose A Confound for Calculating Word Probabilities Related Papers Related Patents Related Grants Related Venues Related Experts View |
Byung-Doh Oh; William Schuler; |
347 | Sprout: Green Generative AI with Carbon-Efficient LLM Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The rapid advancement of generative AI has heightened environmental concerns, particularly regarding carbon emissions. Our framework, Sprout, addresses these challenges by reducing the carbon footprint of inference in large language models (LLMs). |
Baolin Li; Yankai Jiang; Vijay Gadepally; Devesh Tiwari; |
348 | Beyond Label Attention: Transparency in Language Models for Automated Medical Coding Via Dictionary Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To facilitate accurate interpretability in medical language models, this paper leverages dictionary learning that can efficiently extract sparsely activated representations from dense language model embeddings in superposition. |
John Wu; David Wu; Jimeng Sun; |
349 | FIRST: Faster Improved Listwise Reranking with Single Token Decoding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Further, they are trained with the typical language modeling objective, which treats all ranking errors uniformly, potentially at the cost of misranking highly relevant passages. Addressing these limitations, we introduce FIRST, a novel listwise LLM reranking approach leveraging the output logits of the first generated identifier to directly obtain a ranked ordering of the candidates.
Revanth Gangi Reddy; JaeHyeok Doo; Yifei Xu; Md Arafat Sultan; Deevya Swain; Avirup Sil; Heng Ji; |
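A hedged sketch of the single-token trick described in the highlight: label candidates [A], [B], ..., then rank by the logits the model assigns to those labels at the first generated position, avoiding full permutation decoding. The prompt format and label scheme here are illustrative assumptions, not the paper's exact setup.

```python
# Minimal FIRST-style reranking sketch over a HuggingFace causal LM.
import torch

def first_rerank(model, tok, query, passages):
    # Label each candidate with a single-character identifier.
    labels = [chr(ord("A") + i) for i in range(len(passages))]
    prompt = (f"Query: {query}\n"
              + "\n".join(f"[{l}] {p}" for l, p in zip(labels, passages))
              + "\nThe most relevant passage is [")
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids=ids).logits[0, -1]   # first-token distribution
    label_ids = [tok.convert_tokens_to_ids(l) for l in labels]  # assumes the
    scores = logits[label_ids]                        # labels are single tokens
    order = torch.argsort(scores, descending=True)
    return [passages[int(i)] for i in order]
```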
350 | Why Does New Knowledge Create Messy Ripple Effects in LLMs? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we answer the question of why most KE methods still create messy ripple effects. |
Jiaxin Qin; Zixuan Zhang; Chi Han; Pengfei Yu; Manling Li; Heng Ji; |
351 | FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we aim to explore Multitask Speech Language Model (SpeechLM) efficient inference via token reduction. |
Yichen Lu; Jiaqi Song; Chao-Han Huck Yang; Shinji Watanabe; |
352 | With Ears to See and Eyes to Hear: Sound Symbolism Experiments with Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, an understudied question is to what extent models that only have access to vision and text modalities are able to implicitly understand sound-based phenomena via abstract reasoning from orthography and imagery alone. To investigate this, we analyse the ability of VLMs and LLMs to demonstrate sound symbolism (i.e., to recognise a non-arbitrary link between sounds and concepts) as well as their ability to “hear” via the interplay of the language and vision modules of open and closed-source multimodal models.
Tyler Loakman; Yucheng Li; Chenghua Lin; |
353 | XCOMET-lite: Bridging The Gap Between Efficiency and Quality in Learned MT Evaluation Metrics Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We employ distillation, quantization, and pruning techniques to create efficient xCOMET alternatives and introduce a novel data collection pipeline for efficient black-box distillation. |
Daniil Larionov; Mikhail Seleznyov; Vasiliy Viskov; Alexander Panchenko; Steffen Eger; |
354 | Pelican: Correcting Hallucination in Vision-LLMs Via Claim Decomposition and Program of Thought Verification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Pelican – a novel framework designed to detect and mitigate hallucinations through claim verification. |
Pritish Sahu; Karan Sikka; Ajay Divakaran; |
355 | Encoding Spreadsheets for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, we introduce SheetEncoder, pioneering an efficient encoding method designed to unleash and optimize LLMs’ powerful understanding and reasoning capability on spreadsheets. |
Haoyu Dong; Jianbo Zhao; Yuzhang Tian; Junyu Xiong; Mengyu Zhou; Yun Lin; José Cambronero; Yeye He; Shi Han; Dongmei Zhang;
356 | Dense X Retrieval: What Retrieval Granularity Should We Use? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Distinct from the typical approach of using passages or sentences, we introduce a novel retrieval unit, proposition, for dense retrieval. |
Tong Chen; Hongwei Wang; Sihao Chen; Wenhao Yu; Kaixin Ma; Xinran Zhao; Hongming Zhang; Dong Yu; |
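To make the granularity choice above concrete, here is a minimal sketch of proposition-level indexing under our reading: decompose each passage into atomic propositions, embed those instead of whole passages, and map retrieved propositions back to their source passages. The `propositionize` decomposer and the encoder choice are placeholders, not the paper's released components.

```python
# Hedged sketch of proposition-level dense retrieval.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")     # placeholder encoder

def build_index(passages, propositionize):
    props, owners = [], []
    for pid, passage in enumerate(passages):
        for prop in propositionize(passage):          # e.g. an LLM decomposer
            props.append(prop)
            owners.append(pid)
    embs = encoder.encode(props, normalize_embeddings=True)
    return embs, props, owners

def retrieve(query, embs, props, owners, k=5):
    q = encoder.encode([query], normalize_embeddings=True)[0]
    top = np.argsort(-(embs @ q))[:k]                 # cosine via normalized dot
    return [(props[i], owners[i]) for i in top]       # proposition + passage id
```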
357 | Themis: A Reference-free NLG Evaluation Language Model with Flexibility and Interpretability Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, despite the improved performance of existing methods, they still possess some deficiencies, such as dependency on references and limited evaluation flexibility. Therefore, in this paper, we meticulously construct a large-scale NLG evaluation corpus **NLG-Eval** with annotations from both human and GPT-4 to alleviate the lack of relevant data in this field. |
Xinyu Hu; Li Lin; Mingqi Gao; Xunjian Yin; Xiaojun Wan; |
358 | Extracting Prompts By Inverting LLM Outputs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We consider the problem of language model inversion: given outputs of a language model, we seek to extract the prompt that generated these outputs. |
Collin Zhang; John Xavier Morris; Vitaly Shmatikov; |
359 | An Analysis and Mitigation of The Reversal Curse Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View |
Ang Lv; Kaiyi Zhang; Shufang Xie; Quan Tu; Yuhan Chen; Ji-Rong Wen; Rui Yan; |
360 | When Reasoning Meets Information Aggregation: A Case Study with Sports Narratives Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We conduct comprehensive experiments with real NBA basketball data and present SportsGen, a new method to synthesize game narratives. |
Yebowen Hu; Kaiqiang Song; Sangwoo Cho; Xiaoyang Wang; Wenlin Yao; Hassan Foroosh; Dong Yu; Fei Liu; |
361 | Towards A Similarity-adjusted Surprisal Theory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our work leverages Ricotta and Szeidl’s (2006) diversity index to extend surprisal into a metric that we term similarity-adjusted surprisal, exposing a mathematical relationship between surprisal and information value.
Clara Meister; Mario Giulianelli; Tiago Pimentel; |
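For orientation, standard surprisal and one plausible similarity-adjusted form are shown below. The baseline definition is standard; the exact similarity kernel π and normalisation used by the paper (via Ricotta and Szeidl's index) may differ from this hedged reading.

```latex
% Baseline: surprisal of word w_t in context (standard definition).
s(w_t) = -\log p(w_t \mid \mathbf{w}_{<t})

% Hedged reading of a similarity-adjusted variant: probability mass on
% alternatives w' counts in proportion to a similarity kernel \pi.
s_{\mathrm{sim}}(w_t) = -\log \sum_{w' \in \mathcal{V}} \pi(w_t, w')\, p(w' \mid \mathbf{w}_{<t})
```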
362 | How to Compute The Probability of A Word Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper derives the correct methods for computing word probabilities, highlighting issues when relying on language models that use beginning-of-word (bow)-marking tokenisers, e.g., the GPT family.
Tiago Pimentel; Clara Meister; |
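The confound is easy to reproduce. The sketch below computes a word's probability the standard way, by chaining subword probabilities after a leading space; as entries 346 and 362 argue, with bow-marking tokenisers this quantity silently entangles the word with the decision of where the next word boundary falls, so it needs a correction term (omitted here).

```python
# The naive (confounded) word-probability computation under GPT-2's
# bow-marking tokeniser, for illustration only.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def naive_word_logprob(context: str, word: str) -> float:
    # Chain rule over the word's subwords; the leading space acts as
    # GPT-2's beginning-of-word marker.
    ids = tok(context, return_tensors="pt").input_ids
    total = 0.0
    for wid in tok(" " + word).input_ids:
        with torch.no_grad():
            logits = model(ids).logits[0, -1]
        total += torch.log_softmax(logits, dim=-1)[wid].item()
        ids = torch.cat([ids, torch.tensor([[wid]])], dim=1)
    # NOTE: under bow tokenisers this also encodes where the *next* word
    # boundary falls; the paper derives the correction, omitted here.
    return total
```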
363 | Split and Merge: Aligning Position Biases in LLM-based Evaluators Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, LLM-based evaluators exhibit position bias, or inconsistency, when used to evaluate candidate answers in pairwise comparisons, favoring either the first or second answer regardless of content. To address this limitation, we propose PORTIA, an alignment-based system designed to mimic human comparison strategies to calibrate position bias in a lightweight yet effective manner. |
Zongjie Li; Chaozheng Wang; Pingchuan Ma; Daoyuan Wu; Shuai Wang; Cuiyun Gao; Yang Liu; |
364 | PromptReps: Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose PromptReps, which combines the advantages of both categories: no need for training and the ability to retrieve from the whole corpus. |
Shengyao Zhuang; Xueguang Ma; Bevan Koopman; Jimmy Lin; Guido Zuccon; |
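A hedged sketch of how such prompting-based representations can be read off a causal LM: ask the model to compress the passage into one word, then take the last hidden state as the dense vector and the (rectified) next-token logits as a vocab-sized sparse vector. The model choice, prompt wording, and sparse weighting below are our assumptions, not the authors' released implementation.

```python
# PromptReps-style dense + sparse representations from one forward pass.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-2-7b-chat-hf"   # placeholder; any chat LLM
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

def prompt_reps(text: str):
    # Ask the model to compress the passage into one word, then read off
    # both representations at the position where that word would appear.
    prompt = f'Passage: "{text}"\nUse one word to represent the passage: "'
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    dense = out.hidden_states[-1][0, -1]      # last hidden state -> dense vector
    sparse = torch.relu(out.logits[0, -1])    # vocab-sized weights -> sparse rep
    return dense, sparse
```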
365 | VIEWS: Entity-Aware News Video Captioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Within this task, we face challenges inherent to recognizing named entities and navigating diverse, dynamic contexts, all while relying solely on visual cues. To address these challenges, we propose a model-agnostic approach that enriches visual information extracted from videos with context sourced from external knowledge, enabling the generation of entity-aware captions. |
Hammad Ayyubi; Tianqi Liu; Arsha Nagrani; Xudong Lin; Mingda Zhang; Anurag Arnab; Feng Han; Yukun Zhu; Xuande Feng; Kevin Zhang; Jialu Liu; Shih-Fu Chang; |
366 | EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, building upon EAGLE, we propose EAGLE-2, which introduces a new technique of context-aware dynamic draft tree into drafting modeling. |
Yuhui Li; Fangyun Wei; Chao Zhang; Hongyang Zhang; |
367 | From Descriptive Richness to Bias: Unveiling The Dark Side of Generative Image Caption Enrichment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We compare standard-format captions and recent GCE processes from the perspectives of gender bias and hallucination, showing that enriched captions suffer from increased gender bias and hallucination. |
Yusuke Hirota; Ryo Hachiuma; Chao-Han Huck Yang; Yuta Nakashima; |
368 | Reconstruct Your Previous Conversations! Comprehensively Investigating Privacy Leakage Risks in Conversations with GPT Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a straightforward yet potent Conversation Reconstruction Attack. |
Junjie Chu; Zeyang Sha; Michael Backes; Yang Zhang; |
369 | Subjective Topic Meets LLMs: Unleashing Comprehensive, Reflective and Creative Thinking Through The Negation of Negation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We observe that a sole emphasis on logical thinking falls short in effectively tackling subjective challenges. Therefore, we introduce a framework grounded in the principle of the Negation of Negation (NeoN) to unleash the comprehensive, reflective, and creative thinking potential of LLMs.
Fangrui Lv; Kaixiong Gong; Jian Liang; Xinyu Pang; Changshui Zhang; |
370 | Contextualized Sequence Likelihood: Enhanced Confidence Scores for Natural Language Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose enhancing the predicted sequence probability by assigning different weights to various tokens using attention values elicited from the base LLM. |
Zhen Lin; Shubhendu Trivedi; Jimeng Sun; |
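A minimal sketch of the weighting idea in the highlight: instead of averaging token log-probabilities uniformly, weight them by attention mass so that load-bearing tokens dominate the confidence score. Which attention head and layer to read, and the normalisation, are our assumptions.

```python
# Attention-weighted sequence confidence, in the spirit of the highlight.
import torch

def contextualized_seq_likelihood(token_logprobs, attn_weights):
    """token_logprobs: (T,) log p(y_t | y_<t, x) for each generated token;
    attn_weights: (T,) attention mass on each token, e.g. read from one
    head of the base LLM (which head to use is the paper's choice)."""
    w = attn_weights / attn_weights.sum()     # normalise to a distribution
    return torch.sum(w * token_logprobs)      # attention-weighted log-likelihood
```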
371 | Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: They often struggle with simple daily tasks, such as reading time from a clock, understanding a flowchart, or planning a route using a road map. In light of this, we design a multi-modal self-instruct, utilizing large language models and their code capabilities to synthesize massive abstract images and visual reasoning instructions across daily scenarios. |
Wenqi Zhang; Zhenglin Cheng; Yuanyu He; Mengna Wang; Yongliang Shen; Zeqi Tan; Guiyang Hou; Mingqian He; Yanna Ma; Weiming Lu; Yueting Zhuang; |
372 | A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we aim to provide a more holistic view of the research landscape by unveiling cross-field and cross-modal connections between scientific LLMs regarding their architectures and pre-training techniques. |
Yu Zhang; Xiusi Chen; Bowen Jin; Sheng Wang; Shuiwang Ji; Wei Wang; Jiawei Han; |
373 | Revealing The Parallel Multilingual Learning Within Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we start by revealing that LLMs learn from parallel multilingual input (PMI). |
Yongyu Mu; Peinan Feng; Zhiquan Cao; Yuzhang Wu; Bei Li; Chenglong Wang; Tong Xiao; Kai Song; Tongran Liu; Chunliang Zhang; JingBo Zhu; |
374 | LLMEdgeRefine: Enhancing Text Clustering with LLM-Based Boundary Point Refinement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, traditional clustering methods often struggle with domain-specific fine-tuning and the presence of outliers. To address these challenges, we introduce LLMEdgeRefine, an iterative clustering method enhanced by large language models (LLMs), focusing on edge points refinement. |
Zijin Feng; Luyang Lin; Lingzhi Wang; Hong Cheng; Kam-Fai Wong; |
375 | Large Language Models Can Be Contextual Privacy Protection Learners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Directly fine-tuning LLMs on this data without privacy protection poses a risk of leaking sensitive PII during inference time. To address this challenge, we introduce Contextual Privacy Protection Language Models (CPPLM), a novel paradigm for fine-tuning LLMs that effectively injects domain-specific knowledge while safeguarding inference-time data privacy.
Yijia Xiao; Yiqiao Jin; Yushi Bai; Yue Wu; Xianjun Yang; Xiao Luo; Wenchao Yu; Xujiang Zhao; Yanchi Liu; Quanquan Gu; Haifeng Chen; Wei Wang; Wei Cheng; |
376 | MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing benchmarks mainly focus on single-turn evaluations, overlooking the models’ capabilities in multi-turn interactions. To address this gap, we introduce MT-Eval, a comprehensive benchmark to evaluate the multi-turn conversational abilities of LLMs.
Wai-Chung Kwan; Xingshan Zeng; Yuxin Jiang; Yufei Wang; Liangyou Li; Lifeng Shang; Xin Jiang; Qun Liu; Kam-Fai Wong; |
377 | When Parts Are Greater Than Sums: Individual LLM Components Can Outperform Full Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Based on our findings, we propose component reweighting, which learns to linearly re-scale the component activations from a few labeled examples. |
Ting-Yun Chang; Jesse Thomason; Robin Jia; |
378 | Symbolic Working Memory Enhances Language Models for Complex Rule Application Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: It requires anchoring the applicable rule and supporting facts at each step, amidst multiple input rules, facts, and inferred facts. To address this, we propose augmenting LLMs with external working memory and introduce a neurosymbolic framework for rule application. |
Siyuan Wang; Zhongyu Wei; Yejin Choi; Xiang Ren; |
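Under our reading of the highlight, the framework keeps rules and facts in an external symbolic store and lets the LLM only choose what to apply next, while the application itself stays exact. The sketch below is a hypothetical schema: `llm_choose` and `rule.conclude` are placeholder interfaces, not the paper's API.

```python
# Hedged sketch of rule application with an external symbolic working memory.
def apply_rules(llm_choose, rules, facts, max_steps=20):
    # The LLM only *selects* the next rule and the facts to bind it to;
    # the rule application itself is exact and symbolic.
    memory = {"rules": list(rules), "facts": set(facts)}
    for _ in range(max_steps):
        choice = llm_choose(memory)           # -> (rule, bound_facts) or None
        if choice is None:                    # model signals no rule applies
            break
        rule, bound = choice
        new_fact = rule.conclude(bound)       # hypothetical symbolic apply
        if new_fact in memory["facts"]:       # no progress; stop looping
            break
        memory["facts"].add(new_fact)         # write back to working memory
    return memory["facts"]
```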
379 | Teaching Embodied Reinforcement Learning Agents: Informativeness and Diversity of Language Use Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: More specifically, we examine how different levels of language informativeness and diversity impact agent learning and inference. |
Jiajun Xi; Yinong He; Jianing Yang; Yinpei Dai; Joyce Chai; |
380 | MARE: Multi-Aspect Rationale Extractor on Unsupervised Rationale Extraction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Multi-Aspect Rationale Extractor (MARE) to explain and predict multiple aspects simultaneously. |
Han Jiang; Junwen Duan; Zhe Qu; Jianxin Wang; |
381 | Fewer Is More: Boosting Math Reasoning with Reinforced Context Pruning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose CoT-Influx, a novel approach that pushes the boundary of few-shot Chain-of-Thoughts (CoT) learning to improve LLM mathematical reasoning. |
Xijie Huang; Li Lyna Zhang; Kwang-Ting Cheng; Fan Yang; Mao Yang; |
382 | Beyond The Turn-Based Game: Enabling Real-Time Conversations with Duplex Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Traditional turn-based chat systems driven by LLMs prevent users from verbally interacting with the system while it is generating responses. To overcome these limitations, we adapt existing LLMs to duplex models so that they can listen to users while generating output and dynamically adjust themselves to provide instant feedback.
Xinrong Zhang; Yingfa Chen; Shengding Hu; Xu Han; Zihang Xu; Yuanwei Xu; Weilin Zhao; Maosong Sun; Zhiyuan Liu; |
383 | Pragmatic Norms Are All You Need – Why The Symbol Grounding Problem Does Not Apply to LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Starting out with countering the arguments of Bender and Koller (2020), we trace the origins of the SGP to the computational theory of mind (CTM), and we show that it only arises with natural language when questionable theories of meaning are presupposed. |
Reto Gubelmann; |
384 | Applying Intrinsic Debiasing on Downstream Tasks: Challenges and Considerations for Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these works are often disconnected from the impact of such debiasing on downstream applications, which is the main motivation for debiasing in the first place. In this work, we systematically test how methods for intrinsic debiasing affect neural machine translation models, by measuring the extrinsic bias of such systems under different design choices. |
Bar Iluz; Yanai Elazar; Asaf Yehudai; Gabriel Stanovsky; |
385 | ProConSuL: Project Context for Code Summarization with LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Project Context for Code Summarization with LLMs (ProConSuL), a new framework to provide a large language model (LLM) with precise information about the code structure from program analysis methods such as a compiler or IDE language services and use task decomposition derived from the code structure. |
Vadim Lomshakov; Andrey Podivilov; Sergey Savin; Oleg Baryshnikov; Alena Lisevych; Sergey Nikolenko; |
386 | DA3: A Distribution-Aware Adversarial Attack Against Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, they are easy to detect using straightforward detection methods, diminishing the efficacy of such attacks. To address this issue, we propose a Distribution-Aware Adversarial Attack (DA3) method. |
Yibo Wang; Xiangjue Dong; James Caverlee; Philip S. Yu; |
387 | Enhancing Data Quality Through Simple De-duplication: Navigating Responsible Computational Social Science Research Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we conduct an in-depth examination of 20 datasets extensively used in NLP for CSS to comprehensively examine data quality. |
Yida Mu; Mali Jin; Xingyi Song; Nikolaos Aletras; |
388 | Red Teaming Language Models for Processing Contradictory Dialogues Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Each dialogue is accompanied by an explanatory label that highlights the location and details of the contradiction. With this dataset, we present a Red Teaming framework for contradictory dialogue processing. |
Xiaofei Wen; Bangzheng Li; Tenghao Huang; Muhao Chen; |
389 | LLM4Decompile: Decompiling Binary Code with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We optimize the LLM training process and introduce the LLM4Decompile-End models to decompile binary directly. |
Hanzhuo Tan; Qi Luo; Jing Li; Yuqun Zhang; |
390 | SEEKR: Selective Attention-Guided Knowledge Retention for Continual Learning of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we first explore and emphasize the importance of attention weights in knowledge retention, and then propose a SElective attEntion-guided Knowledge Retention method (SEEKR) for data-efficient replay-based continual learning of large language models (LLMs). |
Jinghan He; Haiyun Guo; Kuan Zhu; Zihan Zhao; Ming Tang; Jinqiao Wang; |
391 | PTD-SQL: Partitioning and Targeted Drilling with LLMs in Text-to-SQL Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we propose that employing query group partitioning allows LLMs to focus on learning the thought processes specific to a single problem type, consequently enhancing their reasoning abilities across diverse difficulty levels and problem categories. |
Ruilin Luo; Liyuan Wang; Binghuai Lin; Zicheng Lin; Yujiu Yang; |
392 | Walking in Others’ Shoes: How Perspective-Taking Guides Large Language Models in Reducing Toxicity and Bias Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by social psychology principles, we propose a novel strategy named perspective-taking prompting (PeT) that inspires LLMs to integrate diverse human perspectives and self-regulate their responses. |
Rongwu Xu; Zian Zhou; Tianwei Zhang; Zehan Qi; Su Yao; Ke Xu; Wei Xu; Han Qiu; |
393 | Course-Correction: Safety Alignment Using Synthetic Preferences Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To improve, we propose fine-tuning LLMs with preference learning, emphasizing the preference for timely course-correction. |
Rongwu Xu; Yishuo Cai; Zhenhong Zhou; Renjie Gu; Haiqin Weng; Liu Yan; Tianwei Zhang; Wei Xu; Han Qiu; |
394 | SciDQA: A Deep Reading Comprehension Dataset Over Scientific Papers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce SciDQA, a new dataset for reading comprehension that challenges language models to deeply understand scientific articles, consisting of 2,937 QA pairs. |
Shruti Singh; Nandan Sarkar; Arman Cohan; |
395 | Liar, Liar, Logical Mire: A Benchmark for Suppositional Reasoning in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View |
Philipp Mondorf; Barbara Plank; |
396 | An LLM Feature-based Framework for Dialogue Constructiveness Assessment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we propose an LLM feature-based framework for dialogue constructiveness assessment that combines the strengths of feature-based and neural approaches, while mitigating their downsides. |
Lexin Zhou; Youmna Farag; Andreas Vlachos; |
397 | What Are The Generator Preferences for End-to-end Task-Oriented Dialog System? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a framework called Regulating Preferences of Generator (RPG) based on retrieval results, which includes a generator preference extractor, an entity retriever, and a generator with the gate-controlled preference regulator. |
Wanshi Xu; Xianwei Zhuang; Zhanpeng Chen; Zhihong Zhu; Xuxin Cheng; Yuexian Zou; |
398 | MSI-Agent: Incorporating Multi-Scale Insight Into Embodied Agents for Superior Planning and Decision-Making Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the emergence of irrelevant insight and the lack of general insight can greatly undermine the effectiveness of insight. To solve this problem, in this paper, we introduce **M**ulti-**S**cale **I**nsight Agent (MSI-Agent), an embodied agent designed to improve LLMs’ planning and decision-making ability by summarizing and utilizing insight effectively across different scales. |
Dayuan Fu; Biqing Qi; Yihuai Gao; Che Jiang; Guanting Dong; Bowen Zhou; |
399 | Eliminating Biased Length Reliance of Direct Preference Optimization Via Down-Sampled KL Divergence Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While previous studies mainly attributed verbosity to biased labels within the data, we propose that the issue also stems from an inherent algorithmic length reliance in DPO. |
Junru Lu; Jiazheng Li; Siyu An; Meng Zhao; Yulan He; Di Yin; Xing Sun; |
400 | Open-world Multi-label Text Classification with Extremely Weak Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study open-world multi-label text classification under extremely weak supervision (XWS), where the user only provides a brief description for classification objectives without any labels or ground-truth label space. |
Xintong Li; Jinya Jiang; Ria Dharmani; Jayanth Srinivasa; Gaowen Liu; Jingbo Shang; |
401 | MMTE: Corpus and Metrics for Evaluating Machine Translation Quality of Metaphorical Language Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the figurative quality of MT and propose a set of human evaluation metrics focused on the translation of figurative language. |
Shun Wang; Ge Zhang; Han Wu; Tyler Loakman; Wenhao Huang; Chenghua Lin; |
402 | In Search of The Long-Tail: Systematic Generation of Long-Tail Inferential Knowledge Via Logical Rule Guided Search Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we take the first step towards evaluating LLMs in the long-tail distribution of inferential knowledge. |
Huihan Li; Yuting Ning; Zeyi Liao; Siyuan Wang; Xiang Lorraine Li; Ximing Lu; Wenting Zhao; Faeze Brahman; Yejin Choi; Xiang Ren; |
403 | Attribute Diversity Determines The Systematicity Gap in VQA Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the systematicity gap in visual question answering: the performance difference between reasoning on previously seen and unseen combinations of object attributes. |
Ian Berlot-Attwell; Kumar Krishna Agrawal; Annabelle Michael Carrell; Yash Sharma; Naomi Saphra; |
404 | Grasping The Essentials: Tailoring Large Language Models for Zero-Shot Relation Extraction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce REPaL, comprising three stages: (1) We leverage large language models (LLMs) to generate initial seed instances from relation definitions and an unlabeled corpus. |
Sizhe Zhou; Yu Meng; Bowen Jin; Jiawei Han; |
405 | SpeechQE: Estimating The Quality of Direct Speech Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we formulate the task of quality estimation for speech translation (SpeechQE), construct a benchmark, and evaluate a family of systems based on cascaded and end-to-end architectures. |
HyoJung Han; Kevin Duh; Marine Carpuat; |
406 | A Survey of Ontology Expansion for Conversational Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It categorizes the existing literature into three main areas: (1) New Intent Discovery, (2) New Slot-Value Discovery, and (3) Joint OnExp. By examining the methodologies, benchmarks, and challenges associated with these areas, we highlight several emerging frontiers in OnExp to improve agent performance in real-world scenarios and discuss their corresponding challenges. |
Jinggui Liang; Yuxia Wu; Yuan Fang; Hao Fei; Lizi Liao; |
407 | TRoTR: A Framework for Evaluating The Re-contextualization of Text Reuse Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework called TRoTR that relies on the notion of topic relatedness for evaluating the diachronic change of context in which text is reused. |
Francesco Periti; Pierluigi Cassotti; Stefano Montanelli; Nina Tahmasebi; Dominik Schlechtweg; |
408 | Not All Contexts Are Equal: Teaching LLMs Credibility-aware Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Credibility-aware Generation (CAG), a universally applicable framework designed to mitigate the impact of flawed information in RAG. |
Ruotong Pan; Boxi Cao; Hongyu Lin; Xianpei Han; Jia Zheng; Sirui Wang; Xunliang Cai; Le Sun; |
409 | Local Contrastive Editing of Gender Stereotypes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce local contrastive editing that enables the localization and editing of a subset of weights in a target model in relation to a reference model. |
Marlene Lutz; Rochelle Choenni; Markus Strohmaier; Anne Lauscher; |
410 | DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, we present a new framework (called DocKD) that enriches the data generation process by integrating external document knowledge. |
Sungnyun Kim; Haofu Liao; Srikar Appalaraju; Peng Tang; Zhuowen Tu; Ravi Kumar Satzoda; R. Manmatha; Vijay Mahadevan; Stefano Soatto; |
411 | EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present EH-MAM (Easy-to-Hard adaptive Masked Acoustic Modeling), a novel self-supervised learning approach for speech representation learning. |
Ashish Seth; Ramaneswaran Selvakumar; S Sakshi; Sonal Kumar; Sreyan Ghosh; Dinesh Manocha; |
412 | Kiss Up, Kick Down: Exploring Behavioral Changes in Multi-modal Large Language Models with Assigned Visual Personas Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This study is the first to explore whether multi-modal large language models (LLMs) can align their behaviors with visual personas, addressing a significant gap in the literature that predominantly focuses on text-based personas. We developed a novel dataset of 5K fictional avatar images for assignment as visual personas to LLMs, and analyzed their negotiation behaviors based on the visual traits depicted in these images, with a particular focus on aggressiveness. |
Seungjong Sun; Eungu Lee; Seo Yeon Baek; Seunghyun Hwang; Wonbyung Lee; Dongyan Nan; Bernard J Jansen; Jang Hyun Kim; |
413 | The Instinctive Bias: Spurious Images Lead to Illusion in MLLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we identify a typical class of inputs that baffles MLLMs, which consist of images that are highly relevant but inconsistent with answers, causing MLLMs to suffer from visual illusion. |
Tianyang Han; Qing Lian; Rui Pan; Renjie Pi; Jipeng Zhang; Shizhe Diao; Yong Lin; Tong Zhang; |
414 | Universal Vulnerabilities in Large Language Models: Backdoor Attacks for In-context Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite being widely applied, in-context learning is vulnerable to malicious attacks. In this work, we raise security concerns regarding this paradigm. |
Shuai Zhao; Meihuizi Jia; Anh Tuan Luu; Fengjun Pan; Jinming Wen; |
415 | WorryWords: Norms of Anxiety Association for Over 44k English Words Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce WorryWords, the first large-scale repository of manually derived word-anxiety associations for over 44,450 English words.
Saif M. Mohammad; |
416 | Towards Interpretable Sequence Continuation: Analyzing Shared Circuits in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We extend this research by analyzing and comparing circuits for similar sequence continuation tasks, which include increasing sequences of Arabic numerals, number words, and months. |
Michael Lan; Philip Torr; Fazl Barez; |
417 | Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a methodology to identify next-token neurons, find prompts that highly activate them, and determine the upstream attention heads responsible. |
Clement Neo; Shay B Cohen; Fazl Barez; |
418 | Text Grafting: Near-Distribution Weak Supervision for Minority Classes in Text Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recent works have started to generate the relevant texts by prompting LLMs using the class names or definitions; however, there is a high risk that LLMs cannot generate in-distribution (i.e., similar to the corpus where the text classifier will be applied) data, leading to ungeneralizable classifiers. In this paper, we combine the advantages of these two approaches and propose to bridge the gap via a novel framework, text grafting, which aims to obtain clean and near-distribution weak supervision for minority classes.
Letian Peng; Yi Gu; Chengyu Dong; Zihan Wang; Jingbo Shang; |
419 | Incubating Text Classifiers Following User Instruction with Nothing But LLM Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we aim to generate text classification data given arbitrary class definitions (i.e., user instruction), so one can train a text classifier without any human annotation or raw corpus.
Letian Peng; Zilong Wang; Jingbo Shang; |
420 | QUDSELECT: Selective Decoding for Questions Under Discussion Parsing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce QUDSELECT, a joint-training framework that selectively decodes the QUD dependency structures considering the QUD criteria.
Ashima Suvarna; Xiao Liu; Tanmay Parekh; Kai-Wei Chang; Nanyun Peng; |
421 | Control Large Language Models Via Divide and Conquer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We conclude that black-box LLMs face significant challenges in consistently satisfying lexical constraints with prompt-based controlling. To address this bottleneck, we introduce the Divide and Conquer Generation strategy, effective for both white-box and black-box LLMs, to enhance LLMs’ performance in lexically constrained generation (LCG) tasks, demonstrating over 90% improvement in success rate on the most challenging LCG task.
Bingxuan Li; Yiwei Wang; Tao Meng; Kai-Wei Chang; Nanyun Peng; |
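As a hedged sketch of the strategy's shape, the loop below divides the lexical constraints into small subsets, satisfies each with a separate generation call, merges the pieces, and retries any dropped keyword. `call_llm` is a placeholder for any chat-completion function; the prompts and chunk size are illustrative assumptions.

```python
# Divide-and-Conquer Generation sketch for lexically constrained generation.
def divide_and_conquer_generate(call_llm, keywords, chunk_size=3):
    # Divide: split the full keyword set into small, easier subsets.
    chunks = [keywords[i:i + chunk_size] for i in range(0, len(keywords), chunk_size)]
    # Conquer: satisfy each subset independently.
    partials = [call_llm(f"Write one sentence containing all of: {', '.join(c)}")
                for c in chunks]
    # Merge: fuse the partial generations into one coherent text.
    merged = call_llm("Merge these sentences into one coherent paragraph, "
                      "keeping every original keyword:\n" + "\n".join(partials))
    # Verify: retry any keyword that was dropped during merging.
    missing = [k for k in keywords if k.lower() not in merged.lower()]
    if missing:
        merged = call_llm(f"Revise to also include: {', '.join(missing)}\n\n{merged}")
    return merged
```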
422 | Re-ReST: Reflection-Reinforced Self-Training for Language Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the use of self-training in language agents, which can generate supervision from the agent itself, offering a promising alternative without relying on human or stronger model demonstrations. |
Zi-Yi Dou; Cheng-Fu Yang; Xueqing Wu; Kai-Wei Chang; Nanyun Peng; |
423 | Measuring Psychological Depth in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the Psychological Depth Scale (PDS), a novel framework rooted in literary theory that measures an LLM’s ability to produce authentic and narratively complex stories that provoke emotion, empathy, and engagement. |
Fabrice Y Harel-Canada; Hanyu Zhou; Sreya Muppalla; Zeynep Senahan Yildiz; Miryung Kim; Amit Sahai; Nanyun Peng; |
424 | Tools Fail: Detecting Silent Errors in Faulty Tools Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, we introduce a framework for tools more broadly which guides us to explore a model’s ability to detect silent tool errors, and reflect on how to plan. |
Jimin Sun; So Yeon Min; Yingshan Chang; Yonatan Bisk; |
425 | Divide-Conquer-Reasoning for Consistency Evaluation and Automatic Improvement of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose DCR (Divide-Conquer-Reasoning), an automated framework for evaluating and improving the consistency of LLM-generated texts using a divide-conquer-reasoning approach.
Wendi Cui; Zhuohang Li; Damien Lopez; Kamalika Das; Bradley A. Malin; Sricharan Kumar; Jiaxin Zhang; |
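One plausible instantiation of the divide-conquer-reasoning recipe, sketched under our assumptions: divide the generated text into sentences, have an LLM judge each sentence against the reference, and aggregate the verdicts into a single consistency score. `call_llm` is a placeholder; the sentence splitter and yes/no protocol are illustrative.

```python
# Hedged divide-conquer-reasoning consistency check.
def dcr_consistency(call_llm, generated, reference):
    # Divide: naive sentence split (a real system would use a proper splitter).
    sentences = [s.strip() for s in generated.split(".") if s.strip()]
    verdicts = []
    for s in sentences:
        # Conquer: verify each sentence independently against the reference.
        ans = call_llm(f"Reference:\n{reference}\n\nIs this sentence "
                       f"consistent with the reference? Answer yes/no.\n{s}")
        verdicts.append(ans.strip().lower().startswith("yes"))
    # Reason/aggregate: fraction of sentences judged consistent.
    return sum(verdicts) / len(verdicts) if verdicts else 1.0
```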
426 | When LLMs Meets Acoustic Landmarks: An Efficient Approach to Integrate Speech Into Large Language Models for Depression Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the application of LLMs in the identification and analysis of depressive states remains relatively unexplored, presenting an intriguing avenue for future research. In this paper, we present an innovative approach to employ an LLM in the realm of depression detection, integrating acoustic speech information into the LLM framework for this specific application. |
Xiangyu Zhang; Hexin Liu; Kaishuai Xu; Qiquan Zhang; Daijiao Liu; Beena Ahmed; Julien Epps; |
427 | Scope-enhanced Compositional Semantic Parsing for DRT Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the AMS parser, a compositional, neurosymbolic semantic parser for DRT. |
Xiulin Yang; Jonas Groschwitz; Alexander Koller; Johan Bos; |
428 | Beyond Correlation: Interpretable Evaluation of Machine Translation Metrics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, MT metrics’ capabilities have historically been evaluated using correlation with human judgment, which, despite its efficacy, falls short of providing intuitive insights into metric performance, especially in terms of new metric use cases. To address these issues, we introduce an interpretable evaluation framework for MT metrics. |
Stefano Perrella; Lorenzo Proietti; Pere-Lluís Huguet Cabot; Edoardo Barba; Roberto Navigli;
429 | ZEBRA: Zero-Shot Example-Based Retrieval Augmentation for Commonsense Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these methods require additional training, hand-crafted templates or human-written explanations. To address these issues, we introduce ZEBRA, a zero-shot question answering framework that combines retrieval, case-based reasoning and introspection and dispenses with the need for additional training of the LLM. |
Francesco Maria Molfese; Simone Conia; Riccardo Orlando; Roberto Navigli; |
430 | Triad: A Framework Leveraging A Multi-Role LLM-based Agent to Solve Knowledge Base Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present Triad, a unified framework that utilizes an LLM-based agent with multiple roles for KBQA tasks. |
Chang Zong; Yuchen Yan; Weiming Lu; Jian Shao; Yongfeng Huang; Heng Chang; Yueting Zhuang; |
431 | Dissecting Fine-Tuning Unlearning in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we delve into the limitations of fine-tuning-based unlearning through activation patching and parameter restoration experiments. |
Yihuai Hong; Yuelin Zou; Lijie Hu; Ziqian Zeng; Di Wang; Haiqin Yang; |
432 | Unlocking Markets: A Multilingual Benchmark to Cross-Market Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a large-scale dataset comprising over 7 million questions from 17 marketplaces across 11 languages. |
Yifei Yuan; Yang Deng; Anders Søgaard; Mohammad Aliannejadi;
433 | Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a Tree-of-Lens (ToL) agent, utilizing a novel ToL grounding mechanism, to address the ScreenPR task. |
Yue Fan; Lei Ding; Ching-Chen Kuo; Shan Jiang; Yang Zhao; Xinze Guan; Jie Yang; Yi Zhang; Xin Eric Wang; |
434 | Data Contamination Can Cross Language Barriers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we first present a cross-lingual form of contamination that inflates LLMs’ performance while evading current detection methods, deliberately injected by overfitting LLMs on the translated versions of benchmark test sets. Then, we propose generalization-based approaches to unmask such deeply concealed contamination.
Feng Yao; Yufan Zhuang; Zihao Sun; Sunan Xu; Animesh Kumar; Jingbo Shang; |
435 | GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose GAMA, a novel General-purpose Large Audio-Language Model (LALM) with Advanced Audio Understanding and Complex Reasoning Abilities. |
Sreyan Ghosh; Sonal Kumar; Ashish Seth; Chandra Kiran Reddy Evuru; Utkarsh Tyagi; S Sakshi; Oriol Nieto; Ramani Duraiswami; Dinesh Manocha; |
436 | Mitigating Training Imbalance in LLM Fine-Tuning Via Selective Parameter Merging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we demonstrate that the order of training data can lead to significant training imbalances, potentially resulting in performance degradation. |
Yiming Ju; Ziyi Ni; Xingrun Xing; Zhixiong Zeng; Hanyu Zhao; Siqi Fan; Zheng Zhang; |
437 | Where Am I From? Identifying Origin of LLM-generated Content Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, their widespread adoption raises concerns regarding copyright infringement, privacy violations, and security risks associated with AI-generated content. To address these concerns, we propose a novel digital forensics framework for LLMs, enabling the tracing of AI-generated content back to its source. |
Liying Li; Yihan Bai; Minhao Cheng; |
438 | RAG4ITOps: A Supervised Fine-Tunable and Comprehensive RAG Framework for IT Operations and Maintenance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a general and comprehensive framework based on Retrieval Augmented Generation (RAG) and facilitate the whole business process of establishing QA systems for IT operations and maintenance. |
Tianyang Zhang; Zhuoxuan Jiang; Shengguang Bai; Tianrui Zhang; Lin Lin; Yang Liu; Jiawei Ren; |
439 | To Preserve or To Compress: An In-Depth Study of Connector Selection in Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper systematically investigates the impact of connectors on MLLM performance. |
Junyan Lin; Haoran Chen; Dawei Zhu; Xiaoyu Shen; |
440 | ATAP: Automatic Template-Augmented Commonsense Knowledge Graph Completion Via Pre-Trained Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Text-based methods alleviate this issue, but require extensive training and fine-tuning of language models, which reduces efficiency. To alleviate these problems, we propose ATAP, the first CKGC framework that utilizes automatically generated continuous prompt templates combined with pre-trained language models (PLMs). |
Fu Zhang; Yifan Ding; Jingwei Cheng; |
441 | Social Bias Probing: Fairness Benchmarking for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel framework for probing language models for social biases by assessing disparate treatment, which involves treating individuals differently according to their affiliation with a sensitive demographic group. |
Marta Marchiori Manerba; Karolina Stanczak; Riccardo Guidotti; Isabelle Augenstein; |
442 | ABLE: Personalized Disability Support with Politeness and Empathy Integration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce ABLE (Adaptive, Bespoke, Listen and Empathetic), a Conversational Support System for Physical Disabilities. |
Kshitij Mishra; Manisha Burja; Asif Ekbal; |
443 | Boosting Logical Fallacy Reasoning in LLMs Via Logical Structure Tree Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We observe that logical fallacies often use connective words to indicate an intended logical relation between two arguments, while the argument semantics does not actually support the logical relation. Inspired by this observation, we propose to build a logical structure tree to explicitly represent and track the hierarchical logic flow among relation connectives and their arguments in a statement. |
Yuanyuan Lei; Ruihong Huang; |
444 | A New Pipeline for Knowledge Graph Reasoning Enhanced By Large Language Models Without Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, they require fine-tuning on open-source LLMs and are not applicable to closed-source LLMs. Therefore, in this paper, to leverage the knowledge in LLMs without fine-tuning to assist and enhance conventional KGR models, we propose a new three-stage pipeline, including knowledge alignment, KG reasoning and entity reranking. |
Zhongwu Chen; Long Bai; Zixuan Li; Zhen Huang; Xiaolong Jin; Yong Dou; |
445 | CoTKR: Chain-of-Thought Enhanced Knowledge Rewriting for Complex Knowledge Graph Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address them, we propose a novel rewriting method CoTKR, Chain-of-Thought Enhanced Knowledge Rewriting, for generating reasoning traces and corresponding knowledge in an interleaved manner, thereby mitigating the limitations of single-step knowledge rewriting.
Yike Wu; Yi Huang; Nan Hu; Yuncheng Hua; Guilin Qi; Jiaoyan Chen; Jeff Z. Pan; |
446 | KnowledgeSG: Privacy-Preserving Synthetic Text Generation with Knowledge Distillation from Server Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: They either rely on a local model for generation, resulting in a performance decline, or take advantage of APIs, directly exposing the data to API servers. To address this issue, we propose KnowledgeSG, a novel client-server framework which enhances synthetic data quality and improves model performance while ensuring privacy. |
WenHao Wang; Xiaoyu Liang; Rui Ye; Jingyi Chai; Siheng Chen; Yanfeng Wang; |
447 | NuNER: Entity Recognition Encoder Pre-training Via LLM-Annotated Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show how to use LLMs to create NuNER, a compact language representation model specialized in the Named Entity Recognition (NER) task. |
Sergei Bogdanov; Alexandre Constantin; Timothée Bernard; Benoit Crabbé; Etienne P Bernard; |
448 | Waterfall: Scalable Framework for Robust Text Watermarking and Provenance for LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Waterfall, the first training-free framework for robust and scalable text watermarking applicable across multiple text types (e.g., articles, code) and languages supportable by LLMs, for general text and LLM data provenance. |
Gregory Kang Ruey Lau; Xinyuan Niu; Hieu Dao; Jiangwei Chen; Chuan-Sheng Foo; Bryan Kian Hsiang Low; |
449 | T-FREE: Subword Tokenizer-Free Generative LLMs Via Sparse Representations for Memory-Efficient Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, their performance is biased towards a reference corpus, leading to reduced effectiveness for underrepresented languages. To remedy these issues, we propose T-Free, which directly embeds words through sparse activation patterns over character triplets and does not require a reference corpus. |
Björn Deiseroth; Manuel Brack; Patrick Schramowski; Kristian Kersting; Samuel Weinbach; |
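As a rough illustration of T-Free's character-triplet idea, the sketch below hashes each padded trigram of a word into a fixed embedding table and sums the activated rows, so no trained subword vocabulary is needed. The table size, hash function, and padding scheme are assumptions for illustration, not the paper's exact design.

```python
import hashlib

import torch

def trigram_ids(word: str, table_size: int = 8192):
    """Hash each character triplet of the padded word into a fixed table;
    the word is represented by this sparse set of activated rows."""
    padded = f"_{word.lower()}_"
    ids = set()
    for i in range(len(padded) - 2):
        h = int(hashlib.md5(padded[i:i + 3].encode()).hexdigest(), 16)
        ids.add(h % table_size)
    return sorted(ids)

# Sum the activated rows to get a word embedding -- no subword vocabulary.
emb = torch.nn.EmbeddingBag(8192, 256, mode="sum")
vec = emb(torch.tensor([trigram_ids("tokenizer")]))  # shape (1, 256)
```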
450 | Large Language Models for Data Annotation and Synthesis: A Survey Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, this survey includes an in-depth taxonomy of data types that LLMs can annotate, a comprehensive review of learning strategies for models utilizing LLM-generated annotations, and a detailed discussion of the primary challenges and limitations associated with using LLMs for data annotation and synthesis. Serving as a key guide, this survey aims to assist researchers and practitioners in exploring the potential of the latest LLMs for data annotation, thereby fostering future advancements in this critical field. |
Zhen Tan; Dawei Li; Song Wang; Alimohammad Beigi; Bohan Jiang; Amrita Bhattacharjee; Mansooreh Karami; Jundong Li; Lu Cheng; Huan Liu; |
451 | Glue Pizza and Eat Rocks – Exploiting Vulnerabilities in Retrieval-Augmented Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Retrieval-Augmented Generative (RAG) models enhance Large Language Models (LLMs) by integrating external knowledge bases, improving their performance in applications like fact-checking and information searching. In this paper, we demonstrate a security threat where adversaries can exploit the openness of these knowledge bases by injecting deceptive content into the retrieval database, intentionally changing the model’s behavior. |
Zhen Tan; Chengshuai Zhao; Raha Moraffah; Yifan Li; Song Wang; Jundong Li; Tianlong Chen; Huan Liu; |
452 | Assessing and Verifying Task Utility in LLM-Powered Applications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce AgentEval, a novel framework designed to simplify the utility verification process by automatically proposing a set of criteria tailored to the unique purpose of any given application. |
Negar Arabzadeh; Siqing Huo; Nikhil Mehta; Qingyun Wu; Chi Wang; Ahmed Hassan Awadallah; Charles L. A. Clarke; Julia Kiseleva; |
453 | FuseGen: PLM Fusion for Data-generation Based Zero-shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Previous solutions have primarily focused on single PLM settings, where synthetic datasets are typically restricted to specific sub-spaces and often deviate from real-world distributions, leading to severe distribution bias. To mitigate such bias, we propose FuseGen, a novel data-generation based zero-shot learning framework that introduces a new criterion for subset selection from synthetic datasets by utilizing multiple PLMs and trained STMs. |
Tianyuan Zou; Yang Liu; Peng Li; Jianqing Zhang; Jingjing Liu; Ya-Qin Zhang; |
454 | Multi-expert Prompting Improves Reliability, Safety and Usefulness of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Multi-expert Prompting, a novel enhancement of ExpertPrompting (Xu et al., 2023), designed to improve large language model (LLM) generation. |
Do Xuan Long; Duong Ngoc Yen; Anh Tuan Luu; Kenji Kawaguchi; Min-Yen Kan; Nancy F. Chen; |
455 | Mixture-of-Subspaces in Low-Rank Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a subspace-inspired Low-Rank Adaptation (LoRA) method, which is computationally efficient, easy to implement, and readily applicable to large language, multimodal, and diffusion models. |
Taiqiang Wu; Jiahao Wang; Zhe Zhao; Ngai Wong; |
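A minimal sketch of the subspace-mixing idea: standard LoRA adds B A x to a frozen projection, while a small learnable r x r mixer M recombines the rank-r subspaces as B M A x. The initialization and scaling below are illustrative choices, not necessarily the paper's recipe.

```python
import torch
import torch.nn as nn

class MoSLoRALinear(nn.Module):
    """Frozen base projection plus a mixed-subspace LoRA update:
    y = W0 x + (alpha / r) * B M A x, with M a learnable r x r mixer."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the LoRA factors train
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.M = nn.Parameter(torch.eye(r))                        # subspace mixer
        self.B = nn.Parameter(torch.zeros(base.out_features, r))   # zero init
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.M.T @ self.B.T)

layer = MoSLoRALinear(nn.Linear(512, 512))
out = layer(torch.randn(4, 512))  # identical to the base layer at init
```

Initializing M to the identity recovers plain LoRA exactly, which is what makes the method a drop-in replacement.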
456 | MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing speech FMs (SFMs) fall short of full compliance with the open-source principles, even if claimed otherwise, as no existing SFM has model weights, code, and training data publicly available under open-source terms. In this work, we take the first step toward filling this gap by focusing on the 24 official languages of the European Union (EU). |
Marco Gaido; Sara Papi; Luisa Bentivogli; Alessio Brutti; Mauro Cettolo; Roberto Gretter; Marco Matassoni; Mohamed Nabih; Matteo Negri; |
457 | DVD: Dynamic Contrastive Decoding for Knowledge Amplification in Multi-Document Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Retrieval-augmented generation (RAG) offers a potential remedy, yet the uneven retrieval quality and irrelevant contents may distract LLMs. In this work, we address these issues at the generation phase by treating RAG as a multi-document QA task. |
Jing Jin; Houfeng Wang; Hao Zhang; Xiaoguang Li; Zhijiang Guo; |
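The contrastive angle in the DVD entry can be illustrated with a generic contrastive-decoding step (not necessarily DVD's exact rule): amplify what the retrieved documents contribute by subtracting the document-free logits, restricted to a plausibility mask. The alpha and tau hyperparameters are assumed values.

```python
import torch

def contrastive_step(logits_with_docs, logits_without_docs, alpha=1.0, tau=0.1):
    """Score tokens by how much the retrieved documents boost them,
    restricted to tokens the document-conditioned model finds plausible."""
    p_with = torch.softmax(logits_with_docs, dim=-1)
    plausible = p_with >= tau * p_with.max(dim=-1, keepdim=True).values
    scores = logits_with_docs - alpha * logits_without_docs
    scores = scores.masked_fill(~plausible, float("-inf"))
    return scores.argmax(dim=-1)  # greedy choice of the amplified token
```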
458 | Neuron-Level Knowledge Attribution in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a static method for pinpointing significant neurons. |
Zeping Yu; Sophia Ananiadou; |
459 | How Do Large Language Models Learn In-Context? Query and Key Matrices of In-Context Heads Are Two Towers for Metric Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The query and key matrices can be considered as two towers that learn the similarity metric between the last position’s features and each demonstration at label positions. Using this hypothesis, we explain the majority label bias and recency bias in ICL and propose two methods to reduce these biases by 22% and 17%, respectively. |
Zeping Yu; Sophia Ananiadou; |
460 | Interpreting Arithmetic Mechanism in Large Language Models Through Comparative Neuron Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To delve into the reason, we introduce the Comparative Neuron Analysis (CNA) method, which identifies an internal logic chain consisting of four distinct stages from input to prediction: feature enhancing with shallow FFN neurons, feature transferring by shallow attention layers, feature predicting by arithmetic heads, and prediction enhancing among deep FFN neurons. |
Zeping Yu; Sophia Ananiadou; |
461 | MedCoT: Medical Chain of Thought Via Hierarchical Expert Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Besides, current Med-VQA algorithms, typically reliant on singular models, lack the robustness needed for real-world medical diagnostics which usually require collaborative expert evaluation. To address these shortcomings, this paper presents MedCoT, a novel hierarchical expert verification reasoning chain method designed to enhance interpretability and accuracy in biomedical imaging inquiries. |
Jiaxiang Liu; Yuan Wang; Jiawei Du; Joey Tianyi Zhou; Zuozhu Liu; |
462 | Holistic Evaluation for Interleaved Text-and-Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce InterleavedBench, the first benchmark carefully curated for the evaluation of interleaved text-and-image generation. |
Minqian Liu; Zhiyang Xu; Zihao Lin; Trevor Ashby; Joy Rimchala; Jiaxin Zhang; Lifu Huang; |
463 | RESTful-Llama: Connecting User Queries to RESTful APIs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose RESTful-Llama, a novel framework designed to enable Llama 3. |
Han Xu; Ruining Zhao; Jindong Wang; Haipeng Chen; |
464 | How Does The Textual Information Affect The Retrieval of Multimodal In-Context Learning? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our study provides an in-depth evaluation of the impact of textual information on the unsupervised selection of in-context examples in multimodal contexts, uncovering a notable sensitivity of retriever performance to the employed modalities. Based on the above finding, we introduce a novel supervised MLLM prompt retriever MSIER that leverages a trained retriever based on MLLM’s confidence to select examples, which enhances multimodal in-context learning efficiency. |
Yang Luo; Zangwei Zheng; Zirui Zhu; Yang You; |
465 | CompAct: Compressing Retrieved Documents Actively for Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Context compression tackles this issue by filtering out irrelevant information, but current methods still struggle in realistic scenarios where crucial information cannot be captured with a single-step approach. To overcome this limitation, we introduce CompAct, a novel framework that employs an active strategy to condense extensive documents without losing key information. |
Chanwoong Yoon; Taewhoo Lee; Hyeon Hwang; Minbyul Jeong; Jaewoo Kang; |
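The active, multi-step compression strategy can be sketched as a loop that folds one retrieved document at a time into a running summary and stops once the question looks answerable. The prompt wording, the COMPLETE/INCOMPLETE termination signal, and the `llm(prompt)` callable are assumptions for illustration, not CompAct's exact interface.

```python
def compact_compress(question, documents, llm, max_steps=5):
    """Fold one document at a time into a running summary; stop early
    once the model signals the question is answerable."""
    summary = ""
    for doc in documents[:max_steps]:
        out = llm(
            f"Question: {question}\n"
            f"Compressed context so far: {summary}\n"
            f"New document: {doc}\n"
            "Rewrite the compressed context, keeping only information needed "
            "to answer the question. End with COMPLETE if the question is now "
            "answerable, otherwise end with INCOMPLETE."
        )
        done = out.rstrip().endswith("COMPLETE") and not out.rstrip().endswith("INCOMPLETE")
        summary = out.replace("INCOMPLETE", "").replace("COMPLETE", "").strip()
        if done:
            break
    return summary
```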
466 | Dynamic Multi-Reward Weighting for Multi-Style Controllable Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate various multi-style reward formulations, including calibrated outputs from discriminators and dynamic weighting by discriminator gradient magnitudes. |
Karin De Langis; Ryan Koo; Dongyeop Kang; |
467 | DynaThink: Fast or Slow? A Dynamic Decision-Making Framework for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a dynamic decision-making framework that categorizes tasks into two distinct pathways: ‘Fast,’ designated for tasks where the LLM quickly identifies a high-confidence solution, and ‘Slow,’ allocated for tasks that the LLM perceives as complex, for which it has low confidence in immediate solutions, and which require more reasoning paths to verify. |
Jiabao Pan; Yan Zhang; Chen Zhang; Zuozhu Liu; Hongwei Wang; Haizhou Li; |
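One way to picture the Fast/Slow routing is a self-consistency gate: sample k answers, accept the majority answer when agreement is high, and escalate otherwise. The agreement threshold and the use of majority voting as the confidence signal are assumptions, not necessarily DynaThink's criteria.

```python
from collections import Counter

def dynathink_route(sampled_answers, agree_threshold=0.8):
    """Return ('fast', answer) when sampled answers agree strongly,
    else ('slow', None) so the caller can allocate more reasoning paths."""
    best, count = Counter(sampled_answers).most_common(1)[0]
    if count / len(sampled_answers) >= agree_threshold:
        return "fast", best
    return "slow", None

print(dynathink_route(["42", "42", "42", "42", "17"]))  # ('fast', '42')
print(dynathink_route(["42", "17", "9", "42", "17"]))   # ('slow', None)
```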
468 | Knowledge Conflicts for LLMs: A Survey Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: These conflicts can significantly impact the trustworthiness and performance of LLMs, especially in real-world applications where noise and misinformation are common. By categorizing these conflicts, exploring the causes, examining the behaviors of LLMs under such conflicts, and reviewing available solutions, this survey aims to shed light on strategies for improving the robustness of LLMs, thereby serving as a valuable resource for advancing research in this evolving area. |
Rongwu Xu; Zehan Qi; Zhijiang Guo; Cunxiang Wang; Hongru Wang; Yue Zhang; Wei Xu; |
469 | VLEU: A Method for Automatic Evaluation for Generalizability of Text-to-Image Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, they fall short in evaluating a model’s ability to generalize across a broad spectrum of textual inputs. To address this gap, we propose the VLEU (Visual Language Evaluation Understudy) metric. |
Jingtao Cao; Zhang Zheng; Hongru Wang; Kam-Fai Wong; |
470 | Efficient LLM Comparative Assessment: A Product of Experts Framework for Pairwise Comparisons Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a Product of Experts (PoE) framework for efficient LLM Comparative Assessment. |
Adian Liusie; Vatsal Raina; Yassir Fathullah; Mark Gales; |
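A toy instantiation of the PoE idea, assuming Gaussian experts with unit variance: each observed pairwise comparison (i, j) with soft win-probability p constrains the score difference s_i - s_j to be near logit(p), and the product of experts is then maximized by a least-squares solution. The gauge-fixing row and unit variances are simplifying assumptions, not the paper's exact model.

```python
import numpy as np

def poe_scores(comparisons, n_items):
    """comparisons: list of (i, j, p) with p = soft probability that item i
    beats item j. Each comparison is a Gaussian expert on s_i - s_j centered
    at logit(p); the PoE maximum is the least-squares solution below."""
    A, b = [], []
    for i, j, p in comparisons:
        row = np.zeros(n_items)
        row[i], row[j] = 1.0, -1.0
        A.append(row)
        b.append(np.log(p / (1.0 - p)))  # logit of the soft win-probability
    A.append(np.ones(n_items))  # gauge fixing: scores sum to zero
    b.append(0.0)
    scores, *_ = np.linalg.lstsq(np.asarray(A), np.asarray(b), rcond=None)
    return scores

# Rank three candidates from a sparse subset of pairwise judgments.
print(poe_scores([(0, 1, 0.8), (1, 2, 0.7)], n_items=3))
```

The efficiency claim follows from this picture: a sparse subset of comparisons already pins down the scores, so far fewer than all O(n^2) pairs need to be queried.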
471 | Aligning Large Language Models with Diverse Political Viewpoints Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Large language models such as ChatGPT exhibit striking political biases. If users query them about political information, they often take a normative stance. To overcome this, we align LLMs with diverse political viewpoints from 100,000 comments written by candidates running for national parliament in Switzerland. |
Dominik Stammbach; Philine Widmer; Eunjung Cho; Caglar Gulcehre; Elliott Ash; |
472 | Generative Subgraph Retrieval for Knowledge Graph-Grounded Dialog Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose Dialog generation with Generative Subgraph Retrieval (DialogGSR), which retrieves relevant knowledge subgraphs by directly generating their token sequences on top of language models. |
Jinyoung Park; Minseok Joo; Joo-Kyung Kim; Hyunwoo J. Kim; |
473 | Fuse to Forget: Bias Reduction and Selective Memorization Through Model Fusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we study the inverse problem: investigating whether model fusion can be used to reduce unwanted knowledge. |
Kerem Zaman; Leshem Choshen; Shashank Srivastava; |
474 | CELLO: Causal Evaluation of Large Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Previous work typically focuses on commonsense causality between events and/or actions, which is insufficient for applications like embodied agents and lacks the explicitly defined causal graphs required for formal causal reasoning. To overcome these limitations, we introduce a fine-grained and unified definition of causality involving interactions between humans and/or objects. |
Meiqi Chen; Bo Peng; Yan Zhang; Chaochao Lu; |
475 | Revisiting Automated Evaluation for Long-form Table Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce LFTQA-Eval, a meta-evaluation dataset comprising 2,988 human-annotated examples, to rigorously assess the efficacy of current automated metrics in assessing LLM-based LFTQA systems, with a focus on faithfulness and comprehensiveness. |
Yuqi Wang; Lyuhao Chen; Songcheng Cai; Zhijian Xu; Yilun Zhao; |
476 | The Computational Anatomy of Humility: Modeling Intellectual Humility in Online Public Discourse Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we focus on one particular virtue: intellectual humility (IH), or acknowledging the potential limitations in one’s own beliefs. |
Xiaobo Guo; Neil Potnis; Melody Yu; Nabeel Gillani; Soroush Vosoughi; |
477 | AlphaLoRA: Assigning LoRA Experts Based on Layer Training Quality Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we leverage Heavy-Tailed Self-Regularization (HT-SR) Theory to design a fine-grained allocation strategy. |
Peijun Qing; Chongyang Gao; Yefan Zhou; Xingjian Diao; Yaoqing Yang; Soroush Vosoughi; |
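To give the HT-SR allocation a concrete shape, the sketch below estimates a Hill-style power-law exponent from the eigenvalue spectrum of each layer's W^T W and assigns proportionally more LoRA experts to layers with larger exponents (read as less well-trained). The estimator, tail fraction, and proportional rule are simplifying assumptions, not AlphaLoRA's exact procedure.

```python
import numpy as np

def hill_alpha(W: np.ndarray, tail_frac: float = 0.5) -> float:
    """Hill-style power-law exponent of the eigenvalue spectrum of W^T W;
    smaller alpha = heavier tail = (per HT-SR) better-trained layer."""
    eigs = np.sort(np.linalg.eigvalsh(W.T @ W))[::-1]
    eigs = eigs[eigs > 1e-10]                       # drop near-zero modes
    k = max(2, int(tail_frac * len(eigs)))          # size of the fitted tail
    tail = eigs[:k]
    return 1.0 + k / (np.sum(np.log(tail / tail[-1])) + 1e-12)

def allocate_experts(alphas, total_experts):
    """Proportionally more experts for layers with larger alpha
    (rounding means the sum is only approximately total_experts)."""
    w = np.asarray(alphas) / np.sum(alphas)
    return np.maximum(1, np.round(w * total_experts)).astype(int)

alphas = [hill_alpha(np.random.randn(256, 256)) for _ in range(4)]
print(allocate_experts(alphas, total_experts=16))
```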
478 | How Susceptible Are Large Language Models to Ideological Manipulation? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate how effectively LLMs can learn and generalize ideological biases from their instruction-tuning data. |
Kai Chen; Zihao He; Jun Yan; Taiwei Shi; Kristina Lerman; |
479 | Concept-skill Transferability-based Data Selection for Large Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce COINCIDE, an effective and scalable data selection technique that uses a small model as a reference model to select visual instruction tuning data for efficient finetuning of a target LVLM, focusing on diversity and transferability. |
Jaewoo Lee; Boyang Li; Sung Ju Hwang; |
480 | ChatGPT Doesn’t Trust Chargers Fans: Guardrail Sensitivity in Context Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: By generating user biographies that offer ideological and demographic information, we find a number of biases in guardrail sensitivity on GPT-3. |
Victoria R Li; Yida Chen; Naomi Saphra; |
481 | AnaloBench: Benchmarking The Identification of Abstract and Long-context Analogies Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Can language models (LMs) do the same? To answer this question, we propose AnaloBench, a benchmark to determine analogical reasoning ability in LMs. |
Xiao Ye; Andrew Wang; Jacob Choi; Yining Lu; Shreya Sharma; Lingfeng Shen; Vijay Murari Tiyyala; Nicholas Andrews; Daniel Khashabi; |
482 | TV-TREES: Multimodal Entailment Trees for Neuro-Symbolic Video Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It is challenging for models to understand complex, multimodal content such as television clips, and this is in part because video-language models often rely on single-modality reasoning and lack interpretability. To combat these issues we propose TV-TREES, the first multimodal entailment tree generator. |
Kate Sanders; Nathaniel Weir; Benjamin Van Durme; |
483 | Automatically Generated Definitions and Their Utility for Modeling Word Meaning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we delve into the generation of dictionary-like sense definitions and explore their utility for modeling word meaning. |
Francesco Periti; David Alfter; Nina Tahmasebi; |
484 | GlobeSumm: A Challenging Benchmark Towards Unifying Multi-lingual, Cross-lingual and Multi-document News Summarization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, current studies often neglect such real-world scenarios as they tend to focus solely on either single-language or single-document tasks. To bridge this gap, we aim to unify Multi-lingual, Cross-lingual and Multi-document Summarization into a novel task, i.e., MCMS, which encapsulates the real-world requirements all-in-one. |
Yangfan Ye; Xiachong Feng; Xiaocheng Feng; Weitao Ma; Libo Qin; Dongliang Xu; Qing Yang; Hongtao Liu; Bing Qin; |
485 | ApiQ: Finetuning of 2-Bit Quantized Large Language Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a novel quantization framework named ApiQ, designed to restore the lost information from quantization by concurrently initializing the LoRA components and quantizing the weights of LLMs. |
Baohao Liao; Christian Herold; Shahram Khadivi; Christof Monz; |
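A rough picture of error-absorbing initialization in the spirit of ApiQ: quantize W, then set the LoRA pair from a truncated SVD of the residual W - Q(W) so that Q(W) + BA starts close to W. ApiQ optimizes the initialization jointly; the uniform symmetric quantizer and SVD init below are simplifying stand-ins.

```python
import torch

def apiq_style_init(W: torch.Tensor, bits: int = 2, rank: int = 16):
    """Quantize W uniformly, then initialize LoRA (B, A) from a truncated
    SVD of the residual so that Q(W) + B @ A starts close to W."""
    qmax = 2 ** (bits - 1) - 1
    scale = W.abs().max() / qmax
    Wq = torch.clamp((W / scale).round(), -qmax - 1, qmax) * scale
    U, S, Vh = torch.linalg.svd(W - Wq, full_matrices=False)
    B = U[:, :rank] * S[:rank]  # (out, r)
    A = Vh[:rank]               # (r, in)
    return Wq, B, A

W = torch.randn(128, 128)
Wq, B, A = apiq_style_init(W)
print((W - Wq).norm(), (W - (Wq + B @ A)).norm())  # LoRA absorbs most error
```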
486 | Video-Text Prompting for Weakly Supervised Spatio-Temporal Video Grounding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Video-Text Prompting (VTP) to construct candidate features. |
Heng Zhao; Zhao Yinjie; Bihan Wen; Yew-Soon Ong; Joey Tianyi Zhou; |
487 | TroL: Traversal of Layers for Large Language and Vision Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: These large models demand costly, high-end resources for both training and inference. To address this issue, we present a new efficient LLVM family with 1.8B, 3.8B, and 7B LLM model sizes, Traversal of Layers (TroL), which enables the reuse of layers in a token-wise manner. |
Byung-Kwan Lee; Sangyun Chung; Chae Won Kim; Beomchan Park; Yong Man Ro; |
488 | OpenSep: Leveraging Large Language Models with Textual Inversion for Open World Audio Separation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose OpenSep, a novel framework that leverages large language models (LLMs) for automated audio separation, eliminating the need for manual intervention and overcoming source limitations. |
Tanvir Mahmud; Diana Marculescu; |
489 | Is It Good Data for Multilingual Instruction Tuning or Just Bad Multilingual Evaluation for Large Language Models? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Due to the often coupled practices of using translated data in both stages, such imperfections could have been overlooked. This work investigates these issues using controlled native or translated data during the instruction tuning and evaluation stages. |
Pinzhen Chen; Simon Yu; Zhicheng Guo; Barry Haddow; |
490 | VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the VideoCLIP-XL (eXtra Length) model, which aims to unleash the long-description understanding capability of video CLIP models. |
Jiapeng Wang; Chengyu Wang; Kunzhe Huang; Jun Huang; Lianwen Jin; |
491 | Getting More from Less: Large Language Models Are Good Spontaneous Multilingual Learners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we first discover and comprehensively investigate the spontaneous multilingual alignment of LLMs. We find that LLMs instruction-tuned on question translation data (i.e., without annotated answers) are able to encourage the alignment between English and a wide range of languages, even including those unseen during instruction-tuning. Additionally, we utilize different settings and mechanistic interpretability methods to analyze the LLM's performance in the multilingual scenario comprehensively. |
Shimao Zhang; Changjiang Gao; Wenhao Zhu; Jiajun Chen; Xin Huang; Xue Han; Junlan Feng; Chao Deng; Shujian Huang; |
492 | Decompose and Compare Consistency: Measuring VLMs’ Answer Reliability Via Task-Decomposition Consistency Comparison Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To alleviate these, we propose Decompose and Compare Consistency (DeCC) for reliability measurement. |
Qian Yang; Weixiang Yan; Aishwarya Agrawal; |
493 | Empowering Large Language Model for Continual Video Question Answering with Collaborative Prompting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore the novel challenge of VideoQA within a continual learning framework, and empirically identify a critical issue: fine-tuning a large language model (LLM) for a sequence of tasks often results in catastrophic forgetting. |
Chen Cai; Zheng Wang; Jianjun Gao; Wenyang Liu; Ye Lu; Runzhong Zhang; Kim-Hui Yap; |
494 | Bridging Modalities: Enhancing Cross-Modality Hate Speech Detection with Few-Shot In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study conducts extensive experiments using few-shot in-context learning with large language models to explore the transferability of hate speech detection between modalities. |
Ming Shan Hee; Aditi Kumaresan; Roy Ka-Wei Lee; |
495 | Automatic Sentence Segmentation of Clinical Record Narratives in Real-world Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a straightforward and effective sequence labeling classifier to predict sentence spans using a dynamic sliding window based on the prediction of each input sequence. |
Dongfang Xu; Davy Weissenbacher; Karen O’Connor; Siddharth Rawal; Graciela Gonzalez Hernandez; |
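The dynamic sliding window can be sketched as follows: tag a window of tokens, commit labels up to the last predicted sentence boundary, and restart the window at that boundary so no sentence is split across windows. The `tag_window` classifier and the B/O label scheme are assumed interfaces standing in for the paper's trained sequence labeler.

```python
def segment_sentences(tokens, tag_window, window=128):
    """tokens: list of str; tag_window(chunk) -> one 'B'/'O' label per token,
    with 'B' marking a sentence start. The window restarts at the last
    predicted boundary (a dynamic stride), so no sentence is split."""
    labels = ["O"] * len(tokens)
    start = 0
    while start < len(tokens):
        chunk = tokens[start:start + window]
        preds = tag_window(chunk)
        last_b = max((i for i, t in enumerate(preds) if t == "B"), default=0)
        end = start + (last_b if 0 < last_b < len(chunk) else len(chunk))
        labels[start:end] = preds[:end - start]
        start = end
    sentences, current = [], []
    for tok, lab in zip(tokens, labels):
        if lab == "B" and current:
            sentences.append(current)
            current = []
        current.append(tok)
    if current:
        sentences.append(current)
    return sentences
```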
496 | FEDKIM: Adaptive Federated Knowledge Injection Into Medical Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In the medical domain, however, the development of comprehensive foundation models is constrained by limited access to diverse modalities and stringent privacy regulations. To address these constraints, this study introduces a novel knowledge injection approach, FedKIM, designed to scale the medical foundation model within a federated learning framework. |
Xiaochen Wang; Jiaqi Wang; Houping Xiao; Jinghui Chen; Fenglong Ma; |
497 | Jailbreaking LLMs with Arabic Transliteration and Arabizi Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This study identifies the potential vulnerabilities of Large Language Models (LLMs) to ‘jailbreak’ attacks, specifically focusing on the Arabic language and its various forms. |
Mansour Al Ghanim; Saleh Almohaimeed; Mengxin Zheng; Yan Solihin; Qian Lou; |
498 | DC-Instruct: An Effective Framework for Generative Multi-intent Spoken Language Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, critical gaps exist in these frameworks: the lack of explicit modeling of dual-task dependencies and the oversight of task-specific semantic differences among utterances. To address these shortcomings, we propose DC-Instruct, a novel generative framework based on Dual-task Inter-dependent Instructions (DII) and Supervised Contrastive Instructions (SCI). |
Bowen Xing; Lizi Liao; Minlie Huang; Ivor Tsang; |
499 | The Generation Gap: Exploring Age Bias in The Value Systems of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our findings highlight the age bias in LLMs and provide insights for future work. |
Siyang Liu; Trisha Maturi; Bowen Yi; Siqi Shen; Rada Mihalcea; |
500 | Zero-shot Cross-domain Dialogue State Tracking Via Context-aware Auto-prompting and Instruction-following Contrastive Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Previous studies have implemented slot-based input improvements, such as schema-driven descriptions and question-answering formats, but still suffer from negative transfer for seen slots and inefficient transfer for unseen slots due to the significant source-target domain gap. To address these issues, we introduce a novel framework called Context-aware Auto-prompting and Instruction-following Contrastive Decoding (CAPID). |
Xiaoyu Dong; Yujie Feng; Zexin Lu; Guangyuan Shi; Xiao-Ming Wu; |
This table only includes 500 papers selected by our daily digest algorithm. To continue with the full list (>1,300 papers), please visit Paper Digest: EMNLP-2024 (Full List).