Paper Digest: ICLR 2024 Papers & Highlights
Note: ICLR-2024 accepted more than 2,200 papers; this page includes only 500 of them, selected by our daily paper digest algorithm. Interested users can read all 2,200 ICLR-2024 papers on a separate page, which takes some time to load.
To search or review papers within ICLR-2024 related to a specific topic, please use the search by venue (ICLR-2024), review by venue (ICLR-2024) and question answering by venue (ICLR-2024) services. To browse papers by author, here is a list of all ~9,000 authors (ICLR-2024). You may also like to explore our “Best Paper” Digest (ICLR), which lists the most influential ICLR papers since 2018.
This list is created by the Paper Digest Team. Experience the cutting-edge capabilities of Paper Digest, an innovative AI-powered research platform that empowers you to write, review, get answers and more. Try us today and unlock the full potential of our services for free!
TABLE 1: Paper Digest: ICLR 2024 Papers & Highlights
# | Paper | Author(s)
---|---|---
1 | GAIA: A Benchmark for General AI Assistants. Highlight: We introduce GAIA, a benchmark for General AI Assistants that, if solved, would represent a milestone in AI research. |
Grégoire Mialon; Clémentine Fourrier; Thomas Wolf; Yann LeCun; Thomas Scialom; |
2 | SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis. Highlight: We present Stable Diffusion XL (SDXL), a latent diffusion model for text-to-image synthesis. |
Dustin Podell; Zion English; Kyle Lacey; Andreas Blattmann; Tim Dockhorn; Jonas Müller; Joe Penna; Robin Rombach; |
3 | SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression. Highlight: Quantizing models to 3-4 bits per parameter can lead to moderate to high accuracy losses, especially for smaller models (1-10B parameters), which are suitable for edge deployment. To address this accuracy issue, we introduce the Sparse-Quantized Representation (SpQR), a new compressed format and quantization technique that enables for the first time *near-lossless* compression of LLMs across model scales while reaching similar compression levels to previous methods. |
Tim Dettmers; Ruslan A. Svirschevski; Vage Egiazarian; Denis Kuznedelev; Elias Frantar; Saleh Ashkboos; Alexander Borzunov; Torsten Hoefler; Dan Alistarh; |
4 | FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning. Highlight: We observe that the inefficiency is due to suboptimal work partitioning between different thread blocks and warps on the GPU, causing either low occupancy or unnecessary shared memory reads/writes. We propose FlashAttention-2, with better work partitioning to address these issues. |
Tri Dao; |
5 | MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models. Highlight: We believe that the enhanced multi-modal generation capabilities of GPT-4 stem from the utilization of sophisticated large language models (LLMs). To examine this phenomenon, we present MiniGPT-4, which aligns a frozen visual encoder with a frozen advanced LLM, Vicuna, using one projection layer. |
Deyao Zhu; Jun Chen; Xiaoqian Shen; Xiang Li; Mohamed Elhoseiny; |
6 | LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset. Highlight: In this paper, we introduce LMSYS-Chat-1M, a large-scale dataset containing one million real-world conversations with 25 state-of-the-art LLMs. |
Lianmin Zheng; Wei-Lin Chiang; Ying Sheng; Tianle Li; Siyuan Zhuang; Zhanghao Wu; Yonghao Zhuang; Zhuohan Li; Zi Lin; Eric Xing; Joseph E. Gonzalez; Ion Stoica; Hao Zhang; |
7 | Vision Transformers Need Registers. Highlight: In this paper, we identify and characterize artifacts in feature maps of both supervised and self-supervised ViT networks. |
Timothée Darcet; Maxime Oquab; Julien Mairal; Piotr Bojanowski; |
8 | SILO Language Models: Isolating Legal Risk In A Nonparametric Datastore. Highlight: However, as we show, model performance significantly degrades if trained only on low-risk text (e.g., out-of-copyright books or government documents), due to its limited size and domain coverage. We present SILO, a new language model that manages this risk-performance tradeoff during inference. |
Sewon Min; Suchin Gururangan; Eric Wallace; Weijia Shi; Hannaneh Hajishirzi; Noah A. Smith; Luke Zettlemoyer; |
9 | Let’s Verify Step By Step. Highlight: We conduct our own investigation, finding that process supervision significantly outperforms outcome supervision for training models to solve problems from the challenging MATH dataset. |
Hunter Lightman; Vineet Kosaraju; Yuri Burda; Harrison Edwards; Bowen Baker; Teddy Lee; Jan Leike; John Schulman; Ilya Sutskever; Karl Cobbe; |
10 | How to Fine-Tune Vision Models with SGD. Highlight: Our insights result in state-of-the-art accuracies on five popular distribution shift benchmarks: WILDS-FMoW, WILDS-Camelyon, BREEDS-Living-17, Waterbirds, and DomainNet. |
Ananya Kumar; Ruoqi Shen; Sebastien Bubeck; Suriya Gunasekar; |
11 | Detecting Pretraining Data from Large Language Models. Highlight: In this paper, we study the pretraining data detection problem: given a piece of text and black-box access to an LLM without knowing the pretraining data, can we determine if the model was trained on the provided text? |
Weijia Shi; Anirudh Ajith; Mengzhou Xia; Yangsibo Huang; Daogao Liu; Terra Blevins; Danqi Chen; Luke Zettlemoyer; |
12 | TokenFlow: Consistent Diffusion Features for Consistent Video Editing. Highlight: In this work, we present a framework that harnesses the power of a text-to-image diffusion model for the task of text-driven video editing. |
Michal Geyer; Omer Bar-Tal; Shai Bagon; Tali Dekel; |
13 | Large Language Models As Optimizers. Highlight: In this work, we propose Optimization by PROmpting (OPRO), a simple and effective approach to leverage large language models (LLMs) as optimizers, where the optimization task is described in natural language. |
Chengrun Yang; Xuezhi Wang; Yifeng Lu; Hanxiao Liu; Quoc V Le; Denny Zhou; Xinyun Chen; |
14 | Fine-Tuning Language Models for Factuality. Highlight: In this work, we fine-tune language models to be more factual, without human labeling and targeting more open-ended generation settings than past work. |
Katherine Tian; Eric Mitchell; Huaxiu Yao; Christopher D Manning; Chelsea Finn; |
15 | Representation Deficiency in Masked Language Modeling. Highlight: One notable concern about MLM is that the special `[MASK]` symbol causes a discrepancy between pretraining data and downstream data as it is present only in pretraining but not in fine-tuning. In this work, we offer a new perspective on the consequence of such a discrepancy: We demonstrate empirically and theoretically that MLM pretraining allocates some model dimensions exclusively for representing `[MASK]` tokens, resulting in a representation deficiency for real tokens and limiting the pretrained model’s expressiveness when it is adapted to downstream data without `[MASK]` tokens. |
Yu Meng; Jitin Krishnan; Sinong Wang; Qifan Wang; Yuning Mao; Han Fang; Marjan Ghazvininejad; Jiawei Han; Luke Zettlemoyer; |
16 | Efficient Streaming Language Models with Attention Sinks. Highlight: In this paper, we first demonstrate that the emergence of attention sink is due to the strong attention scores towards initial tokens as a “sink” even if they are not semantically important. Based on the above analysis, we introduce StreamingLLM, an efficient framework that enables LLMs trained with a finite length attention window to generalize to infinite sequence length without any fine-tuning. |
Guangxuan Xiao; Yuandong Tian; Beidi Chen; Song Han; Mike Lewis; |
17 | Teaching Large Language Models to Self-Debug. Highlight: However, for complex programming tasks, generating the correct solution in one go becomes challenging; thus, some prior works have designed program repair approaches to improve code generation performance. In this work, we propose self-debugging, which teaches a large language model to debug its predicted program. |
Xinyun Chen; Maxwell Lin; Nathanael Schärli; Denny Zhou; |
18 | Self-RAG: Learning to Retrieve, Generate, and Critique Through Self-Reflection. Highlight: We introduce a new framework called **Self-Reflective Retrieval-Augmented Generation (Self-RAG)** that enhances an LM’s quality and factuality through retrieval and self-reflection. |
Akari Asai; Zeqiu Wu; Yizhong Wang; Avirup Sil; Hannaneh Hajishirzi; |
19 | Quantifying Language Models’ Sensitivity to Spurious Features in Prompt Design Or: How I Learned to Start Worrying About Prompt Formatting. Highlight: In this work, we focus on LLM sensitivity to a quintessential class of meaning-preserving design choices: prompt formatting. |
Melanie Sclar; Yejin Choi; Yulia Tsvetkov; Alane Suhr; |
20 | A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis. Highlight: We introduce WebAgent, an LLM-driven agent that learns from self-experience to complete tasks on real websites following natural language instructions. |
Izzeddin Gur; Hiroki Furuta; Austin V Huang; Mustafa Safdari; Yutaka Matsuo; Douglas Eck; Aleksandra Faust; |
21 | RA-DIT: Retrieval-Augmented Dual Instruction Tuning. Highlight: We introduce Retrieval-Augmented Dual Instruction Tuning (RA-DIT), a lightweight fine-tuning methodology that provides a third option by retrofitting any LLM with retrieval capabilities. |
Xi Victoria Lin; Xilun Chen; Mingda Chen; Weijia Shi; Maria Lomeli; Richard James; Pedro Rodriguez; Jacob Kahn; Gergely Szilvasy; Mike Lewis; Luke Zettlemoyer; Wen-tau Yih; |
22 | On The Reliability of Watermarks for Large Language Models. Highlight: We study the robustness of watermarked text after it is re-written by humans, paraphrased by a non-watermarked LLM, or mixed into a longer hand-written document. |
John Kirchenbauer; Jonas Geiping; Yuxin Wen; Manli Shu; Khalid Saifullah; Kezhi Kong; Kasun Fernando; Aniruddha Saha; Micah Goldblum; Tom Goldstein; |
23 | Proving Test Set Contamination in Black-Box Language Models. Highlight: We propose a procedure for detecting test set contamination of language models with exact false positive guarantees and without access to pretraining data or model weights. |
Yonatan Oren; Nicole Meister; Niladri S. Chatterji; Faisal Ladhak; Tatsunori Hashimoto; |
24 | DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation. Highlight: In this paper, we propose DreamGaussian, a novel 3D content generation framework that achieves both efficiency and quality simultaneously. |
Jiaxiang Tang; Jiawei Ren; Hang Zhou; Ziwei Liu; Gang Zeng; |
25 | Grounding Multimodal Large Language Models to The World. Highlight: We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enabling new capabilities of perceiving object descriptions (e.g., bounding boxes) and grounding text to the visual world. To train the model, we construct a large-scale dataset of grounded image-text pairs (GrIT) together with multimodal corpora. |
Zhiliang Peng; Wenhui Wang; Li Dong; Yaru Hao; Shaohan Huang; Shuming Ma; Qixiang Ye; Furu Wei; |
26 | Large Language Models As Tool Makers. Highlight: Recent research has highlighted the potential of large language models (LLMs) to improve their problem-solving capabilities with the aid of suitable external tools. In our work, we further advance this concept by introducing a closed-loop framework, referred to as LLMs As Tool Makers (LATM), where LLMs create their own reusable tools for problem-solving. |
Tianle Cai; Xuezhi Wang; Tengyu Ma; Xinyun Chen; Denny Zhou; |
27 | OctoPack: Instruction Tuning Code Large Language Models. Highlight: We apply instruction tuning using code, leveraging the natural structure of Git commits, which pair code changes with human instructions. |
Niklas Muennighoff; Qian Liu; Armel Randy Zebaze; Qinkai Zheng; Binyuan Hui; Terry Yue Zhuo; Swayam Singh; Xiangru Tang; Leandro Von Werra; Shayne Longpre; |
28 | CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing. Highlight: Unlike these models, humans typically utilize external tools to cross-check and refine their initial content, like using a search engine for fact-checking, or a code interpreter for debugging. Inspired by this observation, we introduce a framework called CRITIC that allows LLMs, which are essentially “black boxes”, to validate and progressively amend their own outputs in a manner similar to human interaction with tools. |
Zhibin Gou; Zhihong Shao; Yeyun Gong; yelong shen; Yujiu Yang; Nan Duan; Weizhu Chen; |
29 | ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving. Highlight: In this paper, we propose ToRA, a series of Tool-integrated Reasoning Agents designed to solve challenging mathematical problems by seamlessly integrating natural language reasoning with the utilization of external tools (e.g., computation libraries and symbolic solvers), thereby amalgamating the analytical prowess of language and the computational efficiency of tools. |
Zhibin Gou; Zhihong Shao; Yeyun Gong; yelong shen; Yujiu Yang; Minlie Huang; Nan Duan; Weizhu Chen; |
30 | In-Context Pretraining: Language Modeling Beyond Document Boundaries. Highlight: Language models are currently trained to predict tokens given document prefixes, enabling them to perform zero-shot long-form generation and prompting-style tasks which can be reduced to document completion. We instead present IN-CONTEXT PRETRAINING, a new approach where language models are trained on a sequence of related documents, thereby explicitly encouraging them to read and reason across document boundaries. |
Weijia Shi; Sewon Min; Maria Lomeli; Chunting Zhou; Margaret Li; Xi Victoria Lin; Noah A. Smith; Luke Zettlemoyer; Wen-tau Yih; Mike Lewis; |
31 | Sheared LLaMA: Accelerating Language Model Pre-training Via Structured Pruning. Highlight: In this work, we study structured pruning as an effective means to develop smaller LLMs from pre-trained, larger models. |
Mengzhou Xia; Tianyu Gao; Zhiyuan Zeng; Danqi Chen; |
32 | Large Language Models As Analogical Reasoners. Highlight: In this work, we introduce a new prompting approach, analogical prompting, designed to automatically guide the reasoning process of large language models. |
Michihiro Yasunaga; Xinyun Chen; Yujia Li; Panupong Pasupat; Jure Leskovec; Percy Liang; Ed H. Chi; Denny Zhou; |
33 | The Hidden Language of Diffusion Models. Highlight: In this work, we present Conceptor, a novel method to interpret the internal representation of a textual concept by a diffusion model. |
Hila Chefer; Oran Lang; Mor Geva; Volodymyr Polosukhin; Assaf Shocher; michal Irani; Inbar Mosseri; Lior Wolf; |
34 | Llemma: An Open Language Model for Mathematics. Highlight: We present Llemma, a large language model for mathematics. |
Zhangir Azerbayev; Hailey Schoelkopf; Keiran Paster; Marco Dos Santos; Stephen Marcus McAleer; Albert Q. Jiang; Jia Deng; Stella Biderman; Sean Welleck; |
35 | LLM-Assisted Code Cleaning For Training Accurate Code Generators. Highlight: In this work, we investigate data quality for code and find that making the code more structured and readable leads to improved code generation performance of the system. |
Naman Jain; Tianjun Zhang; Wei-Lin Chiang; Joseph E. Gonzalez; Koushik Sen; Ion Stoica; |
36 | DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models. Highlight: In this paper, we propose a novel image editing method, DragonDiffusion, enabling drag-style manipulation on diffusion models. |
Chong Mou; Xintao Wang; Jiechong Song; Ying Shan; Jian Zhang; |
37 | MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework. Highlight: Here we introduce MetaGPT, an innovative meta-programming framework incorporating efficient human workflows into LLM-based multi-agent collaborations. |
Sirui Hong; Mingchen Zhuge; Jonathan Chen; Xiawu Zheng; Yuheng Cheng; Jinlin Wang; Ceyao Zhang; Zili Wang; Steven Ka Shing Yau; Zijuan Lin; Liyang Zhou; Chenyu Ran; Lingfeng Xiao; Chenglin Wu; Jürgen Schmidhuber; |
38 | Efficient Video Diffusion Models Via Content-Frame Motion-Latent Decomposition. Highlight: This is because current video diffusion models often attempt to process high-dimensional videos directly. To tackle this issue, we propose the content-motion latent diffusion model (CMD), a novel efficient extension of pretrained image diffusion models for video generation. |
Sihyun Yu; Weili Nie; De-An Huang; Boyi Li; Jinwoo Shin; Anima Anandkumar; |
39 | Eureka: Human-Level Reward Design Via Coding Large Language Models. Highlight: However, harnessing them to learn complex low-level manipulation tasks, such as dexterous pen spinning, remains an open problem. We bridge this fundamental gap and present Eureka, a human-level reward design algorithm powered by LLMs. |
Yecheng Jason Ma; William Liang; Guanzhi Wang; De-An Huang; Osbert Bastani; Dinesh Jayaraman; Yuke Zhu; Linxi Fan; Anima Anandkumar; |
40 | Take A Step Back: Evoking Reasoning Via Abstraction in Large Language Models. Highlight: We present STEP-BACK PROMPTING, a simple prompting technique that enables LLMs to do abstractions to derive high-level concepts and first principles from instances containing specific details. |
Huaixiu Steven Zheng; Swaroop Mishra; Xinyun Chen; Heng-Tze Cheng; Ed H. Chi; Quoc V Le; Denny Zhou; |
41 | Leveraging Unpaired Data for Vision-Language Generative Models Via Cycle Consistency. Highlight: We introduce **ITIT** (**I**n**T**egrating **I**mage **T**ext): an innovative training paradigm grounded in the concept of cycle consistency which allows vision-language training on *unpaired* image and text data. |
Tianhong Li; Sangnie Bhardwaj; Yonglong Tian; Han Zhang; Jarred Barber; Dina Katabi; Guillaume Lajoie; Huiwen Chang; Dilip Krishnan; |
42 | WildFusion: Learning 3D-Aware Latent Diffusion Models in View Space. Highlight: In this work, we instead model instances in view space, alleviating the need for posed images and learned camera distributions. |
Katja Schwarz; Seung Wook Kim; Jun Gao; Sanja Fidler; Andreas Geiger; Karsten Kreis; |
43 | The False Promise of Imitating Proprietary Language Models. Highlight: In this work, we critically analyze this approach of imitating language models. |
Arnav Gudibande; Eric Wallace; Charlie Victor Snell; Xinyang Geng; Hao Liu; Pieter Abbeel; Sergey Levine; Dawn Song; |
44 | Scaling Laws for Sparsely-Connected Foundation Models. Highlight: We explore the impact of parameter sparsity on the scaling behavior of Transformers trained on massive datasets (i.e., foundation models), in both vision and language domains. |
Elias Frantar; Carlos Riquelme Ruiz; Neil Houlsby; Dan Alistarh; Utku Evci; |
45 | Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors. Highlight: We present “Magic123”, a two-stage coarse-to-fine approach for high-quality, textured 3D mesh generation from a single image in the wild using *both 2D and 3D priors*. |
Guocheng Qian; Jinjie Mai; Abdullah Hamdi; Jian Ren; Aliaksandr Siarohin; Bing Li; Hsin-Ying Lee; Ivan Skorokhodov; Peter Wonka; Sergey Tulyakov; Bernard Ghanem; |
46 | Towards Understanding Sycophancy in Language Models. Highlight: We investigate the prevalence of sycophancy in RLHF-trained models and whether human preference judgments are responsible. |
Mrinank Sharma; Meg Tong; Tomasz Korbak; David Duvenaud; Amanda Askell; Samuel R. Bowman; Esin DURMUS; Zac Hatfield-Dodds; Scott R Johnston; Shauna M Kravec; Timothy Maxwell; Sam McCandlish; Kamal Ndousse; Oliver Rausch; Nicholas Schiefer; Da Yan; Miranda Zhang; Ethan Perez; |
47 | NEFTune: Noisy Embeddings Improve Instruction Finetuning. Highlight: We show that language model finetuning can be improved, sometimes dramatically, with a simple augmentation. |
Neel Jain; Ping-yeh Chiang; Yuxin Wen; John Kirchenbauer; Hong-Min Chu; Gowthami Somepalli; Brian R. Bartoldson; Bhavya Kailkhura; Avi Schwarzschild; Aniruddha Saha; Micah Goldblum; Jonas Geiping; Tom Goldstein; |
48 | Identifying The Risks of LM Agents with An LM-Emulated Sandbox. Highlight: As tools and agents become more complex, the high cost of testing these agents will make it increasingly difficult to find high-stakes, long-tail risks. To address these challenges, we introduce ToolEmu: a framework that uses an LM to emulate tool execution and enables scalable testing of LM agents against a diverse range of tools and scenarios. |
Yangjun Ruan; Honghua Dong; Andrew Wang; Silviu Pitis; Yongchao Zhou; Jimmy Ba; Yann Dubois; Chris J. Maddison; Tatsunori Hashimoto; |
49 | CLIP The Bias: How Useful Is Balancing Data in Multimodal Learning? Highlight: First, we reaffirm prior conclusions that CLIP can inadvertently absorb stereotypes. To counter this, we present a novel algorithm, called Multi-Modal Moment Matching (M4), designed to reduce both representation and association biases in multimodal data. |
Ibrahim Alabdulmohsin; Xiao Wang; Andreas Peter Steiner; Priya Goyal; Alexander D’Amour; Xiaohua Zhai; |
50 | Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models. Highlight: Instruction tuning is a technique for training LLMs to follow instructions. We advocate combining these two approaches, as we find that MoE models benefit more from instruction tuning than dense models. |
Sheng Shen; Le Hou; Yanqi Zhou; Nan Du; Shayne Longpre; Jason Wei; Hyung Won Chung; Barret Zoph; William Fedus; Xinyun Chen; Tu Vu; Yuexin Wu; Wuyang Chen; Albert Webson; Yunxuan Li; Vincent Y Zhao; Hongkun Yu; Kurt Keutzer; Trevor Darrell; Denny Zhou; |
51 | Personalize Segment Anything Model with One Shot. Highlight: In this paper, we introduce a training-free Personalization approach for SAM, termed PerSAM. To demonstrate our efficacy, we construct a new dataset, PerSeg, for the evaluation of personalized object segmentation, and also test our methods on various one-shot image and video segmentation benchmarks. |
Renrui Zhang; Zhengkai Jiang; Ziyu Guo; Shilin Yan; Junting Pan; Hao Dong; Yu Qiao; Peng Gao; Hongsheng Li; |
52 | Large Language Models Cannot Self-Correct Reasoning Yet. Highlight: A contemporary methodology, self-correction, has been proposed as a remedy to these issues. Building upon this premise, this paper critically examines the role and efficacy of self-correction within LLMs, shedding light on its true potential and limitations. |
Jie Huang; Xinyun Chen; Swaroop Mishra; Huaixiu Steven Zheng; Adams Wei Yu; Xinying Song; Denny Zhou; |
53 | Nougat: Neural Optical Understanding for Academic Documents. Highlight: We propose Nougat (Neural Optical Understanding for Academic Documents), a Visual Transformer model that performs an Optical Character Recognition (OCR) task for processing scientific documents into a markup language, and demonstrate the effectiveness of our model on a new dataset of scientific documents. |
Lukas Blecher; Guillem Cucurull; Thomas Scialom; Robert Stojnic; |
54 | WizardCoder: Empowering Code Large Language Models with Evol-Instruct. Highlight: In this paper, we present Code Evol-Instruct, a novel approach that adapts the Evol-Instruct method to the realm of code, enhancing Code LLMs to create novel models, WizardCoder. |
Ziyang Luo; Can Xu; Pu Zhao; Qingfeng Sun; Xiubo Geng; Wenxiang Hu; Chongyang Tao; Jing Ma; Qingwei Lin; Daxin Jiang; |
55 | Benchmarking and Improving Generator-Validator Consistency of Language Models. Highlight: In this paper, we propose a framework for measuring the consistency between generation and validation (which we call generator-validator consistency, or GV-consistency), finding that even GPT-4 (0613), a state-of-the-art LM, is GV-consistent only 76% of the time. |
Xiang Lisa Li; Vaishnavi Shrivastava; Siyan Li; Tatsunori Hashimoto; Percy Liang; |
56 | LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models. Highlight: We present LongLoRA, an efficient fine-tuning approach that extends the context sizes of pre-trained large language models (LLMs), with limited computation cost. |
Yukang Chen; Shengju Qian; Haotian Tang; Xin Lai; Zhijian Liu; Song Han; Jiaya Jia; |
57 | The Reversal Curse: LLMs Trained on “A Is B” Fail to Learn “B Is A”. Highlight: It is worth noting, however, that if “_A_ is _B_” appears _in-context_, models can deduce the reverse relationship. We provide evidence for the Reversal Curse by finetuning GPT-3 and Llama-1 on fictitious statements such as “Uriah Hawthorne is the composer of _Abyssal Melodies_” and showing that they fail to correctly answer “Who composed _Abyssal Melodies_?” |
Lukas Berglund; Meg Tong; Maximilian Kaufmann; Mikita Balesni; Asa Cooper Stickland; Tomasz Korbak; Owain Evans; |
58 | Ferret: Refer and Ground Anything Anywhere at Any Granularity. Highlight: We introduce Ferret, a new Multimodal Large Language Model (MLLM) capable of understanding spatial referring of any shape or granularity within an image and accurately grounding open-vocabulary descriptions. |
Haoxuan You; Haotian Zhang; Zhe Gan; Xianzhi Du; Bowen Zhang; Zirui Wang; Liangliang Cao; Shih-Fu Chang; Yinfei Yang; |
59 | DreamLLM: Synergistic Multimodal Comprehension and Creation. Abstract: This paper presents DreamLLM, a learning framework that first achieves versatile Multimodal Large Language Models (MLLMs) empowered with frequently overlooked synergy between … |
Runpei Dong; Chunrui Han; Yuang Peng; Zekun Qi; Zheng Ge; Jinrong Yang; Liang Zhao; Jianjian Sun; Hongyu Zhou; Haoran Wei; Xiangwen Kong; Xiangyu Zhang; Kaisheng Ma; Li Yi; |
60 | LRM: Large Reconstruction Model for Single Image to 3D. Highlight: We propose the first Large Reconstruction Model (LRM) that predicts the 3D model of an object from a single input image within just 5 seconds. |
Yicong Hong; Kai Zhang; Jiuxiang Gu; Sai Bi; Yang Zhou; Difan Liu; Feng Liu; Kalyan Sunkavalli; Trung Bui; Hao Tan; |
61 | LLaMA-Adapter: Efficient Fine-tuning of Large Language Models with Zero-initialized Attention. Highlight: To this end, we present LLaMA-Adapter, a lightweight adaption method for efficient instruction tuning of LLaMA. |
Renrui Zhang; Jiaming Han; Chris Liu; Aojun Zhou; Pan Lu; Yu Qiao; Hongsheng Li; Peng Gao; |
62 | Catastrophic Jailbreak of Open-source LLMs Via Exploiting Generation. Highlight: In this work, we propose the generation exploitation attack, an extremely simple approach that disrupts model alignment by only manipulating variations of decoding methods. |
Yangsibo Huang; Samyak Gupta; Mengzhou Xia; Kai Li; Danqi Chen; |
63 | Language Models Represent Space and Time. Highlight: We find evidence for the latter by analyzing the learned representations of three spatial datasets (world, US, NYC places) and three temporal datasets (historical figures, artworks, news headlines) in the Llama-2 family of models. |
Wes Gurnee; Max Tegmark; |
64 | MAmmoTH: Building Math Generalist Models Through Hybrid Instruction Tuning. Highlight: We introduce MAmmoTH, a series of open-source large language models (LLMs) specifically tailored for general math problem-solving. |
Xiang Yue; Xingwei Qu; Ge Zhang; Yao Fu; Wenhao Huang; Huan Sun; Yu Su; Wenhu Chen; |
65 | Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs. Highlight: To fill this gap, we present the first extensive study of the unintended side-effects of persona assignment on the ability of LLMs to perform _basic reasoning tasks_. |
Shashank Gupta; Vaishnavi Shrivastava; Ameet Deshpande; Ashwin Kalyan; Peter Clark; Ashish Sabharwal; Tushar Khot; |
66 | BTR: Binary Token Representations for Efficient Retrieval Augmented Language Models. Highlight: We introduce binary token representations (BTR), which use 1-bit vectors to precompute every token in passages, significantly reducing computation during inference. |
Qingqing Cao; Sewon Min; Yizhong Wang; Hannaneh Hajishirzi; |
67 | A Recipe for Improved Certifiable Robustness. Highlight: In this work, we provide a more comprehensive evaluation to better uncover the potential of Lipschitz-based certification methods. |
Kai Hu; Klas Leino; Zifan Wang; Matt Fredrikson; |
68 | Video Language Planning. Highlight: To this end, we present video language planning (VLP), an algorithm that consists of a tree search procedure, where we train (i) vision-language models to serve as both policies and value functions, and (ii) text-to-video models as dynamics models. |
Yilun Du; Sherry Yang; Pete Florence; Fei Xia; Ayzaan Wahid; brian ichter; Pierre Sermanet; Tianhe Yu; Pieter Abbeel; Joshua B. Tenenbaum; Leslie Pack Kaelbling; Andy Zeng; Jonathan Tompson; |
69 | YaRN: Efficient Context Window Extension of Large Language Models. Highlight: However, these models fail to generalize past the sequence length they were trained on. We present YaRN (Yet another RoPE extensioN method), a compute-efficient method to extend the context window of such models, requiring 10x fewer tokens and 2.5x fewer training steps than previous methods. |
Bowen Peng; Jeffrey Quesnelle; Honglu Fan; Enrico Shippole; |
70 | Successor Heads: Recurring, Interpretable Attention Heads In The Wild. Highlight: In this work we describe successor heads: attention heads that increment tokens with a natural ordering, such as numbers, months, and days. |
Rhys Gould; Euan Ong; George Ogden; Arthur Conmy; |
71 | ChatEval: Towards Better LLM-based Evaluators Through Multi-Agent Debate. Highlight: In this paper, we construct a multi-agent referee team called **ChatEval** to autonomously discuss and evaluate the quality of different texts. |
Chi-Min Chan; Weize Chen; Yusheng Su; Jianxuan Yu; Wei Xue; Shanghang Zhang; Jie Fu; Zhiyuan Liu; |
72 | On The Learnability of Watermarks for Language Models. Highlight: In this paper, we ask whether language models can directly learn to generate watermarked text, which would have significant implications for the real-world deployment of watermarks. |
Chenchen Gu; Xiang Lisa Li; Percy Liang; Tatsunori Hashimoto; |
73 | Making Retrieval-Augmented Language Models Robust to Irrelevant Context. Highlight: In this work, we present a thorough analysis on five open-domain question answering benchmarks, characterizing cases when retrieval reduces accuracy. |
Ori Yoran; Tomer Wolfson; Ori Ram; Jonathan Berant; |
74 | Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! Highlight: We outline and critically analyze potential mitigations and advocate for further research efforts toward reinforcing safety protocols for the customized fine-tuning of aligned LLMs. |
Xiangyu Qi; Yi Zeng; Tinghao Xie; Pin-Yu Chen; Ruoxi Jia; Prateek Mittal; Peter Henderson; |
75 | Unbalancedness in Neural Monge Maps Improves Unpaired Domain Translation. Highlight: We propose a theoretically grounded method to incorporate unbalancedness into any Monge map estimator. |
Luca Eyring; Dominik Klein; Théo Uscidda; Giovanni Palla; Niki Kilbertus; Zeynep Akata; Fabian J Theis; |
76 | GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs Via Cipher. Highlight: In this study, we discover that chat in cipher can bypass the safety alignment techniques of LLMs, which are mainly conducted in natural languages. |
Youliang Yuan; Wenxiang Jiao; Wenxuan Wang; Jen-tse Huang; Pinjia He; Shuming Shi; Zhaopeng Tu; |
77 | Function Vectors in Large Language Models. Highlight: We report the presence of a simple neural mechanism that represents an input-output function as a vector within autoregressive transformer language models (LMs). |
Eric Todd; Millicent Li; Arnab Sen Sharma; Aaron Mueller; Byron C Wallace; David Bau; |
78 | Improved Techniques for Training Consistency Models. Highlight: However, distillation limits the quality of consistency models to that of the pre-trained diffusion model, and LPIPS causes undesirable bias in evaluation. To tackle these challenges, we present improved techniques for consistency training, where consistency models learn directly from data without distillation. |
Yang Song; Prafulla Dhariwal; |
79 | Safe RLHF: Safe Reinforcement Learning from Human Feedback. Highlight: However, the inherent tension between the objectives of helpfulness and harmlessness presents a significant challenge during LLM training. To address this issue, we propose Safe Reinforcement Learning from Human Feedback (Safe RLHF), a novel algorithm for human value alignment. |
Josef Dai; Xuehai Pan; Ruiyang Sun; Jiaming Ji; Xinbo Xu; Mickel Liu; Yizhou Wang; Yaodong Yang; |
80 | Directly Fine-Tuning Diffusion Models on Differentiable Rewards. Highlight: We present Direct Reward Fine-Tuning (DRaFT), a simple and effective method for fine-tuning diffusion models to maximize differentiable reward functions, such as scores from human preference models. |
Kevin Clark; Paul Vicol; Kevin Swersky; David J. Fleet; |
81 | Pushing Mixture of Experts to The Limit: Extremely Parameter Efficient MoE for Instruction Tuning. Highlight: In this paper, we push MoE to the limit. |
Ted Zadouri; Ahmet Üstün; Arash Ahmadian; Beyza Ermis; Acyr Locatelli; Sara Hooker; |
82 | CoT3DRef: Chain-of-Thoughts Data-Efficient 3D Visual Grounding. Highlight: In addition, it does not illustrate how and why the network reaches the final decision. In this paper, we address the question: “Can we design an interpretable 3D visual grounding framework that has the potential to mimic the human perception system?” |
Eslam Mohamed BAKR; Mohamed Ayman Mohamed; Mahmoud Ahmed; Habib Slim; Mohamed Elhoseiny; |
83 | Kosmos-G: Generating Images in Context with Multimodal Large Language Models. Highlight: This paper presents Kosmos-G, a model that leverages the advanced multimodal perception capabilities of Multimodal Large Language Models (MLLMs) to tackle the aforementioned challenge. |
Xichen Pan; Li Dong; Shaohan Huang; Zhiliang Peng; Wenhu Chen; Furu Wei; |
84 | Reward Model Ensembles Help Mitigate Overoptimization. Highlight: Using a similar setup, we conduct a systematic study to evaluate the efficacy of using ensemble-based conservative optimization objectives, specifically worst-case optimization (WCO) and uncertainty-weighted optimization (UWO), for mitigating reward model overoptimization when using two optimization methods: (a) best-of-n sampling (BoN) and (b) proximal policy optimization (PPO). |
Thomas Coste; Usman Anwar; Robert Kirk; David Krueger; |
85 | Improving Domain Generalization with Domain Relations. Highlight: In this paper, we focus on domain shifts, which occur when the model is applied to new domains that are different from the ones it was trained on, and propose a new approach called DG. |
Huaxiu Yao; Xinyu Yang; Xinyi Pan; Shengchao Liu; Pang Wei Koh; Chelsea Finn; |
86 | Adapting Large Language Models Via Reading Comprehension. Highlight: Taking inspiration from human learning via reading comprehension (practice after reading improves the ability to answer questions based on the learned knowledge), we propose a simple method for transforming raw corpora into reading comprehension texts. |
Daixuan Cheng; Shaohan Huang; Furu Wei; |
87 | Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification. Highlight: In this paper, we explore the effect of code on enhancing LLMs’ reasoning capability by introducing different constraints on the Code Usage Frequency of GPT-4 Code Interpreter. |
Aojun Zhou; Ke Wang; Zimu Lu; Weikang Shi; Sichun Luo; Zipeng Qin; Shaoqing Lu; Anya Jia; Linqi Song; Mingjie Zhan; Hongsheng Li; |
88 | Multimodal Web Navigation with Instruction-Finetuned Foundation Models. Highlight: In this work, we study data-driven offline training for web agents with vision-language foundation models. |
Hiroki Furuta; Kuang-Huei Lee; Ofir Nachum; Yutaka Matsuo; Aleksandra Faust; Shixiang Shane Gu; Izzeddin Gur; |
89 | Fine-Tuned Language Models Generate Stable Inorganic Materials As Text. Highlight: We propose fine-tuning large language models for generation of stable materials. |
Nate Gruver; Anuroop Sriram; Andrea Madotto; Andrew Gordon Wilson; C. Lawrence Zitnick; Zachary Ward Ulissi; |
90 | Linearity of Relation Decoding in Transformer Language Models. Highlight: Much of the knowledge encoded in transformer language models (LMs) may be expressed in terms of relations: relations between words and their synonyms, entities and their attributes, etc. We show that, for a subset of relations, this computation is well-approximated by a single linear transformation on the subject representation. |
Evan Hernandez; Arnab Sen Sharma; Tal Haklay; Kevin Meng; Martin Wattenberg; Jacob Andreas; Yonatan Belinkov; David Bau; |
91 | What Algorithms Can Transformers Learn? A Study in Length Generalization. Highlight: In this work, we focus on length generalization, and we propose a unifying framework to understand when and how Transformers can be expected to length generalize on a given task. |
Hattie Zhou; Arwen Bradley; Etai Littwin; Noam Razin; Omid Saremi; Joshua M. Susskind; Samy Bengio; Preetum Nakkiran; |
92 | MVDream: Multi-view Diffusion for 3D Generation. Highlight: We introduce MVDream, a diffusion model that is able to generate consistent multi-view images from a given text prompt. |
Yichun Shi; Peng Wang; Jianglong Ye; Long Mai; Kejie Li; Xiao Yang; |
93 | ReLoRA: High-Rank Training Through Low-Rank Updates. Highlight: In this paper, we explore parameter-efficient training techniques as an approach to training large neural networks. |
Vladislav Lialin; Sherin Muckatira; Namrata Shivagunde; Anna Rumshisky; |
94 | AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models. Highlight: In this paper, we introduce AutoDAN, a novel jailbreak attack against aligned LLMs. |
Xiaogeng Liu; Nan Xu; Muhao Chen; Chaowei Xiao; |
95 | CoBIT: A Contrastive Bi-directional Image-Text Generation Model. Highlight: Intuitively, the first two objectives can be considered as complementary projections between two modalities; contrastive learning preserves global alignment, while generation facilitates fine-grained understanding. Inspired by this, we present a Contrastive Bi-directional Image-Text generation model (CoBIT) to unify, for the first time, the three pre-training objectives in one framework. |
Haoxuan You; Mandy Guo; Zhecan Wang; Kai-Wei Chang; Jason Michael Baldridge; Jiahui Yu; |
96 | Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF. Highlight: A key implication of this result is that annotators have an incentive to misreport their preferences in order to influence the learned model, leading to vulnerabilities in the deployment of RLHF. As a step towards mitigating these problems, we introduce a class of methods called *distributional preference learning* (DPL). |
Anand Siththaranjan; Cassidy Laidlaw; Dylan Hadfield-Menell; |
97 | Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs. Highlight: In this study, we introduce adaptive KV cache compression, a plug-and-play method that reduces the memory footprint of generative inference for Large Language Models (LLMs). |
Suyu Ge; Yunan Zhang; Liyuan Liu; Minjia Zhang; Jiawei Han; Jianfeng Gao; |
98 | Nearly $d$-Linear Convergence Bounds for Diffusion Models Via Stochastic Localization. Highlight: We introduce a refined treatment of the error from discretizing the reverse SDE inspired by stochastic localization. |
Joe Benton; Valentin De Bortoli; Arnaud Doucet; George Deligiannidis; |
99 | Self-Alignment with Instruction Backtranslation. Highlight: We present a scalable method to build a high-quality instruction-following language model by automatically labelling human-written text with corresponding instructions. |
Xian Li; Ping Yu; Chunting Zhou; Timo Schick; Omer Levy; Luke Zettlemoyer; Jason E Weston; Mike Lewis; |
100 | ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process. Highlight: Our key insights are twofold: (1) pixels as inputs are crucial for recognition tasks; (2) VQ tokens as reconstruction targets are beneficial for generation tasks. |
Changyao Tian; Chenxin Tao; Jifeng Dai; Hao Li; Ziheng Li; Lewei Lu; Xiaogang Wang; Hongsheng Li; Gao Huang; Xizhou Zhu; |
101 | Adversarial Training Should Be Cast As A Non-Zero-Sum Game. Highlight: Despite the promise of this approach, algorithms based on this paradigm have not engendered sufficient levels of robustness and suffer from pathological behavior like robust overfitting. To understand this shortcoming, we first show that the surrogate-based relaxation commonly used in adversarial training algorithms voids all guarantees on the robustness of trained classifiers. The identification of this pitfall informs a novel non-zero-sum bilevel formulation of adversarial training, wherein each player optimizes a different objective function. |
Alexander Robey; Fabian Latorre; George J. Pappas; Hamed Hassani; Volkan Cevher; |
102 | SliceGPT: Compress Large Language Models By Deleting Rows and Columns. Highlight: In this paper we present SliceGPT, a new post-training sparsification scheme which replaces each weight matrix with a smaller (dense) matrix, reducing the embedding dimension of the network. |
Saleh Ashkboos; Maximilian L. Croci; Marcelo Gennari do Nascimento; Torsten Hoefler; James Hensman; |
103 | Continual Learning on A Diet: Learning from Sparsely Labeled Streams Under Constrained Computation. Highlight: We propose and study a realistic Continual Learning (CL) setting where learning algorithms are granted a restricted computational budget per time step while training. |
Wenxuan Zhang; Youssef Mohamed; Bernard Ghanem; Philip Torr; Adel Bibi; Mohamed Elhoseiny; |
104 | Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs. Highlight: Existing methods, however, are constrained to process plain text and do not support such a mechanism. This motivates us to introduce PASTA (Post-hoc Attention STeering Approach), a method that allows LLMs to read text with user-specified emphasis marks. |
Qingru Zhang; Chandan Singh; Liyuan Liu; Xiaodong Liu; Bin Yu; Jianfeng Gao; Tuo Zhao; |
105 | ZeroFlow: Scalable Scene Flow Via Distillation. Highlight: To address both limitations, we propose _Scene Flow via Distillation_, a simple, scalable distillation framework that uses a label-free optimization method to produce pseudo-labels to supervise a feedforward model. |
Kyle Vedder; Neehar Peri; Nathaniel Eliot Chodosh; Ishan Khatri; ERIC EATON; Dinesh Jayaraman; Yang Liu; Deva Ramanan; James Hays; |
106 | ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs. Highlight: This is in contrast to the excellent tool-use capabilities of state-of-the-art (SOTA) closed-source LLMs, e.g., ChatGPT. To bridge this gap, we introduce ToolLLM, a general tool-use framework encompassing data construction, model training, and evaluation. |
Yujia Qin; Shihao Liang; Yining Ye; Kunlun Zhu; Lan Yan; Yaxi Lu; Yankai Lin; Xin Cong; Xiangru Tang; Bill Qian; Sihan Zhao; Lauren Hong; Runchu Tian; Ruobing Xie; Jie Zhou; Mark Gerstein; dahai li; Zhiyuan Liu; Maosong Sun; |
107 | Guaranteed Approximation Bounds for Mixed-Precision Neural Operators. Highlight: We prove that the precision error is asymptotically comparable to the approximation error. Based on this, we design a simple method to optimize the memory-intensive half-precision tensor contractions by greedily finding the optimal contraction order. |
Renbo Tu; Colin White; Jean Kossaifi; Boris Bonev; Gennady Pekhimenko; Kamyar Azizzadenesheli; Anima Anandkumar; |
108 | WizardLM: Empowering Large Pre-Trained Language Models to Follow Complex Instructions. Highlight: In this paper, we show an avenue for creating large amounts of instruction data with varying levels of complexity using LLMs instead of humans. |
Can Xu; Qingfeng Sun; Kai Zheng; Xiubo Geng; Pu Zhao; Jiazhan Feng; Chongyang Tao; Qingwei Lin; Daxin Jiang; |
109 | MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts. Highlight: Large Language Models (LLMs) and Large Multimodal Models (LMMs) exhibit impressive problem-solving skills in many tasks and domains, but their ability in mathematical reasoning in visual contexts has not been systematically studied. To bridge this gap, we present MathVista, a benchmark designed to combine challenges from diverse mathematical and visual tasks. |
Pan Lu; Hritik Bansal; Tony Xia; Jiacheng Liu; Chunyuan Li; Hannaneh Hajishirzi; Hao Cheng; Kai-Wei Chang; Michel Galley; Jianfeng Gao; |
110 | Universal Guidance for Diffusion Models. Highlight: In this work, we propose a universal guidance algorithm that enables diffusion models to be controlled by arbitrary guidance modalities without the need to retrain any use-specific components. |
Arpit Bansal; Hong-Min Chu; Avi Schwarzschild; Soumyadip Sengupta; Micah Goldblum; Jonas Geiping; Tom Goldstein; |
111 | Project and Probe: Sample-Efficient Adaptation By Interpolating Orthogonal Features. Highlight: In some situations, target data labels may be expensive to obtain, so we may only have access to a limited number of target data points. To make the most of a very small target dataset, we propose a lightweight, sample-efficient approach that learns a diverse set of features and adapts to a target distribution by interpolating these features. |
Annie S Chen; Yoonho Lee; Amrith Setlur; Sergey Levine; Chelsea Finn; |
112 | Self-Guided Masked Autoencoders for Domain-Agnostic Self-Supervised Learning. Highlight: We present Self-guided Masked Autoencoders (SMA), a fully domain-agnostic masked modeling method. |
Johnathan Wenjia Xie; Yoonho Lee; Annie S Chen; Chelsea Finn; |
113 | Vocos: Closing The Gap Between Time-domain and Fourier-based Neural Vocoders for High-quality Audio Synthesis. Highlight: Nevertheless, direct reconstruction of complex-valued spectrograms has been historically problematic, primarily due to phase recovery issues. This study seeks to close this gap by presenting Vocos, a new model that directly generates Fourier spectral coefficients. |
Hubert Siuzdak; |
114 | Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking. Highlight: We study how fine-tuning affects the internal mechanisms implemented in language models. |
Nikhil Prakash; Tamar Rott Shaham; Tal Haklay; Yonatan Belinkov; David Bau; |
115 | Window Attention Is Bugged: How Not to Interpolate Position Embeddings. Highlight: To fix it, we introduce a simple absolute window position embedding strategy, which solves the bug outright in Hiera and allows us to increase both speed and performance of the model in ViTDet. |
Daniel Bolya; Chaitanya Ryali; Judy Hoffman; Christoph Feichtenhofer; |
116 | FLASK: Fine-grained Language Model Evaluation Based on Alignment Skill Sets Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce FLASK (Fine-grained Language Model Evaluation based on Alignment Skill Sets), a fine-grained evaluation protocol for both human-based and model-based evaluation which decomposes coarse-level scoring to a skill set-level scoring for each instruction. |
Seonghyeon Ye; Doyoung Kim; Sungdong Kim; Hyeonbin Hwang; Seungone Kim; Yongrae Jo; James Thorne; Juho Kim; Minjoon Seo; |
117 | The Unlocking Spell on Base LLMs: Rethinking Alignment Via In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Based on these findings, we rethink the alignment of LLMs by posing the research question: how effectively can we align base LLMs without SFT or RLHF? To address this, we introduce a simple, tuning-free alignment method, URIAL (Untuned LLMs with Restyled In-context Alignment). |
Bill Yuchen Lin; Abhilasha Ravichander; Ximing Lu; Nouha Dziri; Melanie Sclar; Khyathi Chandu; Chandra Bhagavatula; Yejin Choi; |
118 | Training Diffusion Models with Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, most use cases of diffusion models are not concerned with likelihoods, but instead with downstream objectives such as human-perceived image quality or drug effectiveness. In this paper, we investigate reinforcement learning methods for directly optimizing diffusion models for such objectives. |
Kevin Black; Michael Janner; Yilun Du; Ilya Kostrikov; Sergey Levine; |
119 | Zero-Shot Robotic Manipulation with Pre-Trained Image-Editing Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose SuSIE, a method that leverages an image-editing diffusion model to act as a high-level planner by proposing intermediate subgoals that a low-level controller can accomplish. |
Kevin Black; Mitsuhiko Nakamoto; Pranav Atreya; Homer Rich Walke; Chelsea Finn; Aviral Kumar; Sergey Levine; |
120 | Universal Jailbreak Backdoors from Poisoned Human Feedback Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we consider a new threat where an attacker poisons the RLHF data to embed a jailbreak trigger into the model as a backdoor. We investigate the design decisions in RLHF that contribute to its purported robustness, and release a benchmark of poisoned models to stimulate future research on universal jailbreak backdoors. |
Javier Rando; Florian Tramèr; |
121 | Chain-of-Knowledge: Grounding Large Language Models Via Dynamic Knowledge Adapting Over Heterogeneous Sources Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present chain-of-knowledge (CoK), a novel framework that augments large language models (LLMs) by dynamically incorporating grounding information from heterogeneous sources. |
Xingxuan Li; Ruochen Zhao; Yew Ken Chia; Bosheng Ding; Shafiq Joty; Soujanya Poria; Lidong Bing; |
122 | SALMON: Self-Alignment with Instructable Reward Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a novel approach, namely SALMON, to align base language models with minimal human supervision, using only a small set of human-defined principles, yet achieving superior performance. |
Zhiqing Sun; Yikang Shen; Hongxin Zhang; Qinhong Zhou; Zhenfang Chen; David Daniel Cox; Yiming Yang; Chuang Gan; |
123 | Evaluating Large Language Models at Evaluating Instruction Following Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a challenging meta-evaluation benchmark, LLMBar, designed to test the ability of an LLM evaluator in discerning instruction-following outputs. With LLMBar, we hope to offer more insight into LLM evaluators and foster future research in developing better instruction-following models. |
Zhiyuan Zeng; Jiatong Yu; Tianyu Gao; Yu Meng; Tanya Goyal; Danqi Chen; |
124 | Is Self-Repair A Silver Bullet for Code Generation? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we analyze Code Llama, GPT-3.5 and GPT-4’s ability to perform self-repair on problems taken from HumanEval and APPS. |
Theo X. Olausson; Jeevana Priya Inala; Chenglong Wang; Jianfeng Gao; Armando Solar-Lezama; |
125 | OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce OpenWebMath, an open dataset inspired by these works containing 14.7B tokens of mathematical webpages from Common Crawl. |
Keiran Paster; Marco Dos Santos; Zhangir Azerbayev; Jimmy Ba; |
126 | Self-Consuming Generative Models Go MAD Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We conduct a thorough analytical and empirical analysis using state-of-the-art generative image models of three families of autophagous loops that differ in how fixed or fresh real training data is available through the generations of training and whether the samples from previous-generation models have been biased to trade off data quality versus diversity. |
Sina Alemohammad; Josue Casco-Rodriguez; Lorenzo Luzi; Ahmed Imtiaz Humayun; Hossein Babaei; Daniel LeJeune; Ali Siahkoohi; Richard Baraniuk; |
127 | Sparse Autoencoders Find Highly Interpretable Features in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we attempt to identify those directions, using sparse autoencoders to reconstruct the internal activations of a language model. |
Robert Huben; Hoagy Cunningham; Logan Riggs Smith; Aidan Ewart; Lee Sharkey; |
128 | In-context Autoencoder for Context Compression in A Large Language Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose the In-context Autoencoder (ICAE), leveraging the power of a large language model (LLM) to compress a long context into short compact memory slots that can be directly conditioned on by the LLM for various purposes. |
Tao Ge; Hu Jing; Lei Wang; Xun Wang; Si-Qing Chen; Furu Wei; |
129 | LayoutNUWA: Revealing The Hidden Layout Expertise of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose LayoutNUWA, the first model that treats layout generation as a code generation task to enhance semantic information and harness the hidden layout expertise of large language models~(LLMs). |
Zecheng Tang; Chenfei Wu; Juntao Li; Nan Duan; |
130 | A Benchmark for Learning to Translate A New Language from One Grammar Book Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We turn to a field that is explicitly motivated and bottlenecked by a scarcity of web data: low-resource languages. In this paper, we introduce MTOB (Machine Translation from One Book), a benchmark for learning to translate between English and Kalamang—a language with less than 200 speakers and therefore virtually no presence on the web—using several hundred pages of field linguistics reference materials. |
Garrett Tanzer; Mirac Suzgun; Eline Visser; Dan Jurafsky; Luke Melas-Kyriazi; |
131 | Confidence-aware Reward Optimization for Fine-tuning Text-to-Image Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We empirically demonstrate that overoptimization occurs notably when a poorly aligned reward model is used as the fine-tuning objective. To address this, we propose TextNorm, a simple method that enhances alignment based on a measure of reward model confidence estimated across a set of semantically contrastive text prompts. |
Kyuyoung Kim; Jongheon Jeong; Minyong An; Mohammad Ghavamzadeh; Krishnamurthy Dj Dvijotham; Jinwoo Shin; Kimin Lee; |
132 | Effective Data Augmentation With Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Current augmentations cannot alter the high-level semantic attributes, such as animal species present in a scene, to enhance the diversity of data. We address the lack of diversity in data augmentation with image-to-image transformations parameterized by pre-trained text-to-image diffusion models. |
Brandon Trabucco; Kyle Doherty; Max A Gurinas; Ruslan Salakhutdinov; |
133 | WebArena: A Realistic Web Environment for Building Autonomous Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we build an environment for language-guided agents that is highly realistic and reproducible. Building upon our environment, we release a set of benchmark tasks focusing on evaluating the functional correctness of task completions. |
Shuyan Zhou; Frank F. Xu; Hao Zhu; Xuhui Zhou; Robert Lo; Abishek Sridhar; Xianyi Cheng; Tianyue Ou; Yonatan Bisk; Daniel Fried; Uri Alon; Graham Neubig; |
134 | Large Language Models Are Not Robust Multiple Choice Selectors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To mitigate selection bias, we propose a label-free, inference-time debiasing method, called PriDe, which separates the model’s prior bias for option IDs from the overall prediction distribution. |
Chujie Zheng; Hao Zhou; Fandong Meng; Jie Zhou; Minlie Huang; |
135 | GoLLIE: Annotation Guidelines Improve Zero-Shot Information-Extraction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we propose GoLLIE (Guideline-following Large Language Model for IE), a model able to improve zero-shot results on unseen IE tasks by virtue of being fine-tuned to comply with annotation guidelines. |
Oscar Sainz; Iker García-Ferrero; Rodrigo Agerri; Oier Lopez de Lacalle; German Rigau; Eneko Agirre; |
136 | BooookScore: A Systematic Exploration of Book-length Summarization in The Era of LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present the first study of the coherence of LLM-based book-length summarizers implemented via two prompting workflows: (1) hierarchically merging chunk-level summaries, and (2) incrementally updating a running summary. |
Yapei Chang; Kyle Lo; Tanya Goyal; Mohit Iyyer; |
137 | Mixture of LoRA Experts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: On the other hand, Reference tuning-based fusion exhibits limitations concerning the requisite flexibility for the effective combination of multiple LoRAs. In response to these challenges, this paper introduces the Mixture of LoRA Experts (MoLE) approach, which harnesses hierarchical control and unfettered branch selection. |
Xun Wu; Shaohan Huang; Furu Wei; |
138 | Understanding The Effects of RLHF on LLM Generalisation and Diversity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While there has been significant work developing these methods, our understanding of the benefits and downsides of each stage in RLHF is still limited. To fill this gap, we present an extensive analysis of how each stage of the process (i.e. supervised fine-tuning (SFT), reward modelling, and RLHF) affects two key properties: out-of-distribution (OOD) generalisation and output diversity. |
Robert Kirk; Ishita Mediratta; Christoforos Nalmpantis; Jelena Luketina; Eric Hambro; Edward Grefenstette; Roberta Raileanu; |
139 | Generalized Schrödinger Bridge Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Modern distribution matching algorithms for training diffusion or flow models directly prescribe the time evolution of the marginal distributions between two boundary distributions. In this work, we consider a generalized distribution matching setup, where these marginals are only implicitly described as a solution to some task-specific objective function. |
Guan-Horng Liu; Yaron Lipman; Maximilian Nickel; Brian Karrer; Evangelos Theodorou; Ricky T. Q. Chen; |
140 | Jailbreak in Pieces: Compositional Adversarial Attacks on Multi-Modal Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce new jailbreak attacks on vision language models (VLMs), which use aligned LLMs and are resilient to text-only jailbreak attacks. |
Erfan Shayegani; Yue Dong; Nael Abu-Ghazaleh; |
141 | AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models Without Specific Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present AnimateDiff, a practical framework for animating personalized T2I models without requiring model-specific tuning. |
Yuwei Guo; Ceyuan Yang; Anyi Rao; Zhengyang Liang; Yaohui Wang; Yu Qiao; Maneesh Agrawala; Dahua Lin; Bo Dai; |
142 | Towards Codable Watermarking for Injecting Multi-Bits Information to LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we conduct the first systematic study on the topic of **Codable Text Watermarking for LLMs** (CTWL) that allows text watermarks to carry multi-bit customizable information. |
Lean Wang; Wenkai Yang; Deli Chen; Hao Zhou; Yankai Lin; Fandong Meng; Jie Zhou; Xu Sun; |
143 | Making LLaMA SEE and Draw with SEED Tokenizer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce $\textbf{SEED}$, an elaborate image tokenizer that empowers LLMs with the ability to $\textbf{SEE}$ and $\textbf{D}$raw at the same time. |
Yuying Ge; Sijie Zhao; Ziyun Zeng; Yixiao Ge; Chen Li; Xintao Wang; Ying Shan; |
144 | Data Filtering Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we study the problem of learning a *data filtering network* (DFN) for this second step of filtering a large uncurated dataset. Based on our insights, we construct new data filtering networks that induce state-of-the-art image-text datasets. |
Alex Fang; Albin Madappally Jose; Amit Jain; Ludwig Schmidt; Alexander T Toshev; Vaishaal Shankar; |
145 | Towards Best Practices of Activation Patching in Language Models: Metrics and Methods Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we systematically examine the impact of methodological details in activation patching, including evaluation metrics and corruption methods. |
Fred Zhang; Neel Nanda; |
146 | COLLIE: Systematic Construction of Constrained Text Generation Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present COLLIE, a grammar-based framework that allows the specification of rich, compositional constraints with diverse generation levels (word, sentence, paragraph, passage) and modeling challenges (e.g. language understanding, logical reasoning, counting, semantic planning). |
Shunyu Yao; Howard Chen; Austin W. Hanjie; Runzhe Yang; Karthik R Narasimhan; |
147 | Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Besides, while the existing works mainly focus on synthesizing the head part, it is also vital to generate natural torso and background segments to obtain a realistic talking portrait video. To address these limitations, we present Real3D-Portrait, a framework that (1) improves the one-shot 3D reconstruction power with a large image-to-plane model that distills 3D prior knowledge from a 3D face generative model; (2) facilitates accurate motion-conditioned animation with an efficient motion adapter; (3) synthesizes realistic video with natural torso movement and switchable background using a head-torso-background super-resolution model; and (4) supports one-shot audio-driven talking face generation with a generalizable audio-to-motion model. |
Zhenhui Ye; Tianyun Zhong; Yi Ren; Jiaqi Yang; Weichuang Li; Jiawei Huang; Ziyue Jiang; Jinzheng He; Rongjie Huang; Jinglin Liu; Chen Zhang; Xiang Yin; Zejun MA; Zhou Zhao; |
148 | Knowledge Card: Filling LLMs’ Knowledge Gaps with Plug-in Specialized Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose Knowledge Card, a modular framework to plug in new factual and relevant knowledge into general-purpose LLMs. |
Shangbin Feng; Weijia Shi; Yuyang Bai; Vidhisha Balachandran; Tianxing He; Yulia Tsvetkov; |
149 | Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Previous confidence elicitation methods, which primarily rely on *white-box access* to internal model information or model fine-tuning, have become less suitable for LLMs, especially closed-source commercial APIs. This leads to a growing need to explore the untapped area of *black-box* approaches for LLM uncertainty estimation. To better break down the problem, we define a systematic framework with three components: *prompting* strategies for eliciting verbalized confidence, *sampling* methods for generating multiple responses, and *aggregation* techniques for computing consistency. |
Miao Xiong; Zhiyuan Hu; Xinyang Lu; YIFEI LI; Jie Fu; Junxian He; Bryan Hooi; |
150 | A Variational Perspective on Solving Inverse Problems with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This is however challenging in diffusion models since the nonlinear and iterative nature of the diffusion process renders the posterior intractable. To cope with this challenge, we propose a variational approach that by design seeks to approximate the true posterior distribution. |
Morteza Mardani; Jiaming Song; Jan Kautz; Arash Vahdat; |
151 | Can LLM-Generated Misinformation Be Detected? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: A fundamental research question is: will LLM-generated misinformation cause more harm than human-written misinformation? We propose to tackle this question from the perspective of detection difficulty. |
Canyu Chen; Kai Shu; |
152 | PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing prompt optimization techniques, though automated through iterative sampling, often fall short in injecting domain knowledge and exploring the vast prompt space for complex expert-level prompts efficiently. To address this pressing need and achieve expert-level prompting, we introduce PromptAgent, which autonomously discovers prompts equivalent in quality to those handcrafted by experts. |
Xinyuan Wang; Chenxi Li; Zhen Wang; Fan Bai; Haotian Luo; Jiayou Zhang; Nebojsa Jojic; Eric Xing; Zhiting Hu; |
153 | Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel method called reasoning on graphs (RoG) that synergizes LLMs with KGs to enable faithful and interpretable reasoning. |
LINHAO LUO; Yuan-Fang Li; Reza Haf; Shirui Pan; |
154 | Compositional Preference Models for Aligning LMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the dominant paradigm for training Preference Models (PMs) for that purpose suffers from fundamental limitations, such as lack of transparency and scalability, along with susceptibility to overfitting the preference dataset. We propose Compositional Preference Models (CPMs), a novel PM framework that decomposes one global preference assessment into several interpretable features, obtains scalar scores for these features from a prompted LM, and aggregates these scores using a logistic regression classifier. |
Dongyoung Go; Tomasz Korbak; Germán Kruszewski; Jos Rozen; Marc Dymetman; |
155 | MiniLLM: Knowledge Distillation of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a KD approach that distills LLMs into smaller language models. |
Yuxian Gu; Li Dong; Furu Wei; Minlie Huang; |
156 | Beyond Imitation: Leveraging Fine-grained Quality Signals for Alignment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: A major limitation of SFT is that it essentially does imitation learning, which can’t fully capture what the expected behaviors are. To address this issue, we propose an improved alignment approach named $\textbf{FIGA}$. |
Geyang Guo; Ranchi Zhao; Tianyi Tang; Xin Zhao; Ji-Rong Wen; |
157 | LLM-grounded Video Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current models still struggle with intricate spatiotemporal prompts and often generate restricted or incorrect motion. To address these limitations, we introduce LLM-grounded Video Diffusion (LVD). |
Long Lian; Baifeng Shi; Adam Yala; Trevor Darrell; Boyi Li; |
158 | Vision-Language Models Are Zero-Shot Reward Models for Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a natural and general approach to using VLMs as reward models, which we call VLM-RMs. |
Juan Rocamonde; Victoriano Montesinos; Elvis Nava; Ethan Perez; David Lindner; |
159 | Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we conduct a systematic study of the inductive reasoning capabilities of LMs through $\textit{iterative hypothesis refinement}$, a technique that more closely mirrors the human inductive process than standard input-output prompting. |
Linlu Qiu; Liwei Jiang; Ximing Lu; Melanie Sclar; Valentina Pyatkin; Chandra Bhagavatula; Bailin Wang; Yoon Kim; Yejin Choi; Nouha Dziri; Xiang Ren; |
160 | DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present DreamCraft3D, a hierarchical 3D content generation method that produces high-fidelity and coherent 3D objects. |
Jingxiang Sun; Bo Zhang; Ruizhi Shao; Lizhen Wang; Wen Liu; Zhenda Xie; Yebin Liu; |
161 | Noise-free Score Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we reexamine the SDS process and introduce a straightforward interpretation that demystifies the necessity for large Classifier-Free Guidance (CFG) scales, rooted in the distillation of an undesired noise term. |
Oren Katzir; Or Patashnik; Daniel Cohen-Or; Dani Lischinski; |
162 | #InsTag: Instruction Tagging for Analyzing Supervised Fine-tuning of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose InsTag, an open-set instruction tagging method, to identify semantics and intentions of human instructions by tags that provide access to definitions and quantified analyses of instruction diversity and complexity. |
Keming Lu; Hongyi Yuan; Zheng Yuan; Runji Lin; Junyang Lin; Chuanqi Tan; Chang Zhou; Jingren Zhou; |
163 | Predicting Emergent Abilities with Infinite Resolution Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Task performances typically show minor gains on small models until they improve dramatically once models exceed a size threshold, exemplifying the “emergent abilities”. In this study, we discover that small models, although they exhibit minor performance, demonstrate critical and consistent task performance improvements that are not captured by conventional evaluation strategies due to insufficient measurement resolution. |
Shengding Hu; Xin Liu; Xu Han; Xinrong Zhang; Chaoqun He; Weilin Zhao; Yankai Lin; Ning Ding; Zebin Ou; Guoyang Zeng; Zhiyuan Liu; Maosong Sun; |
164 | Prometheus: Inducing Fine-Grained Evaluation Capability in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose PROMETHEUS, a fully open-source LLM that is on par with GPT-4’s evaluation capabilities when the appropriate reference materials (reference answer, score rubric) are provided. For this purpose, we construct a new dataset – FEEDBACK COLLECTION – that consists of 1K fine-grained score rubrics, 20K instructions, and 100K natural language feedback instances generated by GPT-4. |
Seungone Kim; Jamin Shin; Yejin Cho; Joel Jang; Shayne Longpre; Hwaran Lee; Sangdoo Yun; Seongjin Shin; Sungdong Kim; James Thorne; Minjoon Seo; |
165 | The Alignment Problem from A Deep Learning Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In coming years or decades, artificial general intelligence (AGI) may surpass human capabilities at many critical tasks. In this position paper, we examine the technical difficulty of fine-tuning hypothetical AGI systems based on pretrained deep models to pursue goals that are aligned with human interests. |
Richard Ngo; Lawrence Chan; Sören Mindermann; |
166 | Conformal Risk Control Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We extend conformal prediction to control the expected value of any monotone loss function. |
Anastasios Nikolas Angelopoulos; Stephen Bates; Adam Fisch; Lihua Lei; Tal Schuster; |
167 | Locality-Aware Graph Rewiring in GNNs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we identify three desiderata for graph-rewiring: (i) reduce over-squashing, (ii) respect the locality of the graph, and (iii) preserve the sparsity of the graph. |
Federico Barbero; Ameya Velingker; Amin Saberi; Michael M. Bronstein; Francesco Di Giovanni; |
168 | Multilingual Jailbreak Challenges in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we reveal the presence of multilingual jailbreak challenges within LLMs and consider two potential risky scenarios: unintentional and intentional. |
Yue Deng; Wenxuan Zhang; Sinno Jialin Pan; Lidong Bing; |
169 | ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This reduction is particularly valuable during the memory-bound inference step, where efficiency is paramount. Exploring sparsity patterns in ReLU-based LLMs, we unveil the reutilization of activated neurons for generating new tokens and, leveraging these insights, we propose practical strategies to substantially reduce LLM inference computation by up to three times, using ReLU activations with minimal performance trade-offs. |
Seyed Iman Mirzadeh; Keivan Alizadeh-Vahid; Sachin Mehta; Carlo C del Mundo; Oncel Tuzel; Golnoosh Samei; Mohammad Rastegari; Mehrdad Farajtabar; |
170 | ImagenHub: Standardizing The Evaluation of Conditional Image Generation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes ImagenHub, which is a one-stop library to standardize the inference and evaluation of all the conditional image generation models. |
Max Ku; Tianle Li; Kai Zhang; Yujie Lu; Xingyu Fu; Wenwen Zhuang; Wenhu Chen; |
171 | DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Leveraging large language models (LLMs) with emergent abilities, we propose the DiLu framework, which combines a Reasoning and a Reflection module to enable the system to perform decision-making based on common-sense knowledge and evolve continuously. |
Licheng Wen; Daocheng Fu; Xin Li; Xinyu Cai; Tao MA; Pinlong Cai; Min Dou; Botian Shi; Liang He; Yu Qiao; |
172 | Motif: Intrinsic Motivation from Artificial Intelligence Feedback Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Exploring rich environments and evaluating one’s actions without prior knowledge is immensely challenging. In this paper, we propose Motif, a general method to interface such prior knowledge from a Large Language Model (LLM) with an agent. |
Martin Klissarov; Pierluca D’Oro; Shagun Sodhani; Roberta Raileanu; Pierre-Luc Bacon; Pascal Vincent; Amy Zhang; Mikael Henaff; |
173 | MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, while LLMs can utilize extensive background knowledge and task information with in-context learning, most VLMs still struggle with understanding complex multi-modal prompts with multiple images, making VLMs less effective in downstream vision-language tasks. In this paper, we address the limitation above by 1) introducing vision-language Model with **M**ulti-**M**odal **I**n-**C**ontext **L**earning (MMICL), a new approach to allow the VLM to deal with multi-modal inputs efficiently; 2) proposing a novel context scheme to augment the in-context learning ability of the VLM; 3) constructing the Multi-modal In-Context Learning (MIC) dataset, designed to enhance the VLM’s ability to understand complex multi-modal prompts. |
Haozhe Zhao; Zefan Cai; Shuzheng Si; Xiaojian Ma; Kaikai An; Liang Chen; Zixuan Liu; Sheng Wang; Wenjuan Han; Baobao Chang; |
174 | Matryoshka Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Matryoshka Diffusion (MDM), an end-to-end framework for high-resolution image and video synthesis. |
Jiatao Gu; Shuangfei Zhai; Yizhe Zhang; Joshua M. Susskind; Navdeep Jaitly; |
175 | Remote Sensing Vision-Language Foundation Models Without Annotations Via Ground Remote Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a method to train vision-language models for remote-sensing images without using any textual annotations. |
Utkarsh Mall; Cheng Perng Phoo; Meilin Kelsey Liu; Carl Vondrick; Bharath Hariharan; Kavita Bala; |
176 | Stochastic Controlled Averaging for Federated Learning with Communication Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we revisit the seminal stochastic controlled averaging method by proposing an equivalent but more efficient/simplified formulation with halved uplink communication costs, building upon which we propose two compressed FL algorithms, SCALLION and SCAFCOM, to support unbiased and biased compression, respectively. |
Xinmeng Huang; Ping Li; Xiaoyun Li; |
177 | FeatUp: A Model-Agnostic Framework for Features at Any Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce FeatUp, a task- and model-agnostic framework to restore lost spatial information in deep features. |
Stephanie Fu; Mark Hamilton; Laura E. Brandt; Axel Feldmann; Zhoutong Zhang; William T. Freeman; |
178 | CLEX: Continuous Length Extrapolation for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Length extrapolation methods, although theoretically capable of extending the context window beyond the training sequence length, often underperform in practical long-context applications. To address these challenges, we propose Continuous Length EXtrapolation (CLEX) for LLMs. |
Guanzheng Chen; Xin Li; Zaiqiao Meng; Shangsong Liang; Lidong Bing; |
179 | Small-scale Proxies for Large-scale Transformer Training Instabilities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we seek ways to reproduce and study training instability at smaller scales. |
Mitchell Wortsman; Peter J Liu; Lechao Xiao; Katie E Everett; Alexander A Alemi; Ben Adlam; John D Co-Reyes; Izzeddin Gur; Abhishek Kumar; Roman Novak; Jeffrey Pennington; Jascha Sohl-Dickstein; Kelvin Xu; Jaehoon Lee; Justin Gilmer; Simon Kornblith; |
180 | Training Socially Aligned Language Models on Simulated Social Interactions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a new training paradigm that enables LMs to learn from simulated social interactions. |
Ruibo Liu; Ruixin Yang; Chenyan Jia; Ge Zhang; Diyi Yang; Soroush Vosoughi; |
181 | Graph Metanetworks for Processing Diverse Neural Architectures Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, those works developed architectures tailored to specific networks such as MLPs and CNNs without normalization layers, and generalizing such architectures to other types of networks can be challenging. In this work, we overcome these challenges by building new metanetworks — neural networks that take weights from other neural networks as input. |
Derek Lim; Haggai Maron; Marc T. Law; Jonathan Lorraine; James Lucas; |
182 | AnyText: Multilingual Visual Text Generation and Editing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although current technology for synthesizing images is highly advanced and capable of generating images with high fidelity, it is still possible to give the show away when focusing on the text area in the generated image, as synthesized text often contains blurred, unreadable, or incorrect characters, making visual text generation one of the most challenging issues in this field. To address this issue, we introduce AnyText, a diffusion-based multilingual visual text generation and editing model, that focuses on rendering accurate and coherent text in the image. |
Yuxiang Tuo; Wangmeng Xiang; Jun-Yan He; Yifeng Geng; Xuansong Xie; |
183 | Future Language Modeling from Temporal Document History Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, there has been relatively little formalization of this general problem in the machine learning or natural language processing communities. To address this gap, we introduce the task of future language modeling: probabilistic modeling of texts in the future based on a temporal history of texts. |
Changmao Li; Jeffrey Flanigan; |
184 | Conformal Language Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel approach to conformal prediction for language models (LMs) in which we produce prediction sets with performance guarantees. |
Victor Quach; Adam Fisch; Tal Schuster; Adam Yala; Jae Ho Sohn; Tommi S. Jaakkola; Regina Barzilay; |
185 | Is This The Subspace You Are Looking For? An Interpretability Illusion for Subspace Activation Patching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we demonstrate that naïve approaches to subspace interventions can give rise to interpretability illusions. |
Aleksandar Makelov; Georg Lange; Atticus Geiger; Neel Nanda; |
186 | Generative Pre-training for Speech with Flow Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While generative models have been applied to different applications in speech, there exists no general-purpose generative model that models speech directly. In this work, we take a step toward this direction by showing a single pre-trained generative model can be adapted to different downstream tasks with strong performance. |
Alexander H. Liu; Matthew Le; Apoorv Vyas; Bowen Shi; Andros Tjandra; Wei-Ning Hsu; |
187 | Contrastive Preference Learning: Learning from Human Feedback Without Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Using the principle of maximum entropy, we derive Contrastive Preference Learning (CPL), an algorithm for learning optimal policies from preferences without learning reward functions, circumventing the need for RL. |
Joey Hejna; Rafael Rafailov; Harshit Sikchi; Chelsea Finn; Scott Niekum; W. Bradley Knox; Dorsa Sadigh; |
188 | GROOT: Learning to Follow Instructions By Watching Gameplay Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of building a controller that can follow open-ended instructions in open-world environments. |
Shaofei Cai; Bowei Zhang; Zihao Wang; Xiaojian Ma; Anji Liu; Yitao Liang; |
189 | Demystifying CLIP Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we intend to reveal CLIP’s data curation approach and in our pursuit of making it open to the community introduce Metadata-Curated Language-Image Pre-training (MetaCLIP). |
Hu Xu; Saining Xie; Xiaoqing Tan; Po-Yao Huang; Russell Howes; Vasu Sharma; Shang-Wen Li; Gargi Ghosh; Luke Zettlemoyer; Christoph Feichtenhofer; |
190 | SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose SelfCheck, a general-purpose zero-shot verification schema for recognizing such errors. |
Ning Miao; Yee Whye Teh; Tom Rainforth; |
191 | Scalable Language Model with Generalized Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing methods typically encounter strict limitations and challenges in real-world scenarios, such as reliance on experience replay, optimization constraints, and inference task-ID. In this study, we introduce the Scalable Language Model (SLM) to overcome these limitations within a more challenging and generalized setting, representing a significant advancement toward practical applications for continual learning. |
Bohao PENG; Zhuotao Tian; Shu Liu; Ming-Chang Yang; Jiaya Jia; |
192 | Peering Through Preferences: Unraveling Feedback Acquisition for Aligning Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we analyze the effect of this design choice for the alignment and evaluation of LLMs. |
Hritik Bansal; John Dang; Aditya Grover; |
193 | Guiding Instruction-based Image Editing Via Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Multimodal large language models (MLLMs) show promising capabilities in cross-modal understanding and visual-aware response generation via LMs. |
Tsu-Jui Fu; Wenze Hu; Xianzhi Du; William Yang Wang; Yinfei Yang; Zhe Gan; |
194 | UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore targeted distillation with mission-focused instruction tuning to train student models that can excel in a broad application class such as open information extraction. For evaluation, we assemble the largest NER benchmark to date, comprising 43 datasets across 9 diverse domains such as biomedicine, programming, social media, law, and finance. We release the distillation recipe, data, and UniversalNER models to facilitate future research on targeted distillation. |
Wenxuan Zhou; Sheng Zhang; Yu Gu; Muhao Chen; Hoifung Poon; |
195 | Provable Robust Watermarking for AI-Generated Text Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a rigorous theoretical framework to quantify the effectiveness and robustness of LLM watermarks. |
Xuandong Zhao; Prabhanjan Vijendra Ananth; Lei Li; Yu-Xiang Wang; |
196 | Sudden Drops in The Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a case study of syntax acquisition in masked language models (MLMs) that demonstrates how analyzing the evolution of interpretable artifacts throughout training deepens our understanding of emergent behavior. |
Angelica Chen; Ravid Shwartz-Ziv; Kyunghyun Cho; Matthew L Leavitt; Naomi Saphra; |
197 | Leveraging Optimization for Adaptive Attacks on Image Watermarks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The core idea of our adaptive attacks is to replicate secret watermarking keys locally by creating surrogate keys that are differentiable and can be used to optimize the attack’s parameters. We demonstrate for Stable Diffusion models that such an attacker can break all five surveyed watermarking methods at no visible degradation in image quality. |
Nils Lukas; Abdulrahman Diaa; Lucas Fenaux; Florian Kerschbaum; |
198 | Graph Neural Networks for Learning Equivariant Representations of Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose to represent neural networks as computational graphs of parameters, which allows us to harness powerful graph neural networks and transformers that preserve permutation symmetry. |
Miltiadis Kofinas; Boris Knyazev; Yan Zhang; Yunlu Chen; Gertjan J. Burghouts; Efstratios Gavves; Cees G. M. Snoek; David W. Zhang; |
199 | GeoLLM: Extracting Geospatial Knowledge from Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here we explore the question of whether the vast amounts of knowledge found in Internet language corpora, now compressed within large language models (LLMs), can be leveraged for geospatial prediction tasks. |
Rohin Manvi; Samar Khanna; Gengchen Mai; Marshall Burke; David B. Lobell; Stefano Ermon; |
200 | Knowledge Fusion of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, due to the varying architectures of these LLMs, directly blending their weights is impractical. In this paper, we introduce the notion of knowledge fusion for LLMs, aimed at combining the capabilities of existing LLMs and transferring them into a single LLM. |
Fanqi Wan; Xinting Huang; Deng Cai; Xiaojun Quan; Wei Bi; Shuming Shi; |
201 | SE(3)-Stochastic Flow Matching for Protein Backbone Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The computational design of novel protein structures has the potential to impact numerous scientific disciplines greatly. Toward this goal, we introduce FoldFlow, a series of novel generative models of increasing modeling power based on the flow-matching paradigm over $3\mathrm{D}$ rigid motions—i.e. the group $\mathrm{SE(3)}$—enabling accurate modeling of protein backbones. |
Joey Bose; Tara Akhound-Sadegh; Guillaume Huguet; Kilian FATRAS; Jarrid Rector-Brooks; Cheng-Hao Liu; Andrei Cristian Nica; Maksym Korablyov; Michael M. Bronstein; Alexander Tong; |
202 | Learning Interactive Real-World Simulators Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We explore the possibility of learning a universal simulator (UniSim) of real-world interaction through generative modeling. |
Sherry Yang; Yilun Du; Seyed Kamyar Seyed Ghasemipour; Jonathan Tompson; Leslie Pack Kaelbling; Dale Schuurmans; Pieter Abbeel; |
203 | Probabilistic Adaptation of Black-Box Text-to-Video Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by how a large language model can be prompted to perform new tasks without access to the model weights, we investigate how to adapt a black-box pretrained text-to-video model to a variety of downstream domains without weight access to the pretrained model. In answering this question, we propose an adaptation method that leverages the score function of a large pretrained video diffusion model as a probabilistic prior to guide the generation of a task-specific small video model. |
Sherry Yang; Yilun Du; Bo Dai; Dale Schuurmans; Joshua B. Tenenbaum; Pieter Abbeel; |
204 | CausalLM Is Not Optimal for In-context Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we take a theoretical approach and analyze the convergence behavior of prefixLM and causalLM under a certain parameter construction. |
Nan Ding; Tomer Levinboim; Jialin Wu; Sebastian Goodman; Radu Soricut; |
205 | Mitigating Hallucination in Large Multi-Modal Models Via Robust Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To efficiently measure the hallucination generated by LMMs, we propose GPT4-Assisted Visual Instruction Evaluation (GAVIE), a stable approach to evaluate visual instruction tuning like human experts. |
Fuxiao Liu; Kevin Lin; Linjie Li; Jianfeng Wang; Yaser Yacoob; Lijuan Wang; |
206 | T-MARS: Improving Visual Representations By Circumventing Text Feature Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new state-of-the-art data filtering approach motivated by our observation that nearly $40\%$ of LAION’s images contain text that overlaps significantly with the caption. |
Pratyush Maini; Sachin Goyal; Zachary Chase Lipton; J Zico Kolter; Aditi Raghunathan; |
207 | OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although recent post-training quantization (PTQ) methods are effective in reducing memory footprint and improving the computational efficiency of LLM, they hand-craft quantization parameters, leading to low performance, especially in extremely low-bit quantization. To tackle this issue, we introduce an Omnidirectionally calibrated Quantization ($\textbf{OmniQuant}$) technique for LLMs, which achieves good performance in diverse quantization settings while maintaining the computational efficiency of PTQ by efficiently optimizing various quantization parameters. |
Wenqi Shao; Mengzhao Chen; Zhaoyang Zhang; Peng Xu; Lirui Zhao; Zhiqian Li; Kaipeng Zhang; Peng Gao; Yu Qiao; Ping Luo; |
208 | Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the lack of 3D awareness in the 2D diffusion model often destabilizes previous methods from generating a plausible 3D scene. To address this issue, we propose 3DFuse, a novel framework that incorporates 3D awareness into the pretrained 2D diffusion model, enhancing the robustness and 3D consistency of score distillation-based methods. |
Junyoung Seo; Wooseok Jang; Min-Seop Kwak; Hyeonsu Kim; Jaehoon Ko; Junho Kim; Jin-Hwa Kim; Jiyoung Lee; Seungryong Kim; |
209 | When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider two types of finetuning – full-model tuning (FMT) and parameter efficient tuning (PET, including prompt tuning and LoRA), and explore their scaling behaviors in the data-limited regime where the LLM model size substantially outweighs the finetuning data size. |
Biao Zhang; Zhongtao Liu; Colin Cherry; Orhan Firat; |
210 | Strategic Preys Make Acute Predators: Enhancing Camouflaged Object Detectors By Generating Camouflaged Objects Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: On the prey side, we propose an adversarial training framework, Camouflageator, which introduces an auxiliary generator to generate more camouflaged objects that are harder for a COD method to detect. |
Chunming He; Kai Li; Yachao Zhang; Yulun Zhang; Chenyu You; Zhenhua Guo; Xiu Li; Martin Danelljan; Fisher Yu; |
211 | EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To capitalize on the advantages while avoiding their respective drawbacks, we introduce a data-free, quantization-aware and parameter-efficient fine-tuning framework for low-bit diffusion models, dubbed EfficientDM, to achieve QAT-level performance with PTQ-like efficiency. |
Yefei He; Jing Liu; Weijia Wu; Hong Zhou; Bohan Zhuang; |
212 | CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we embark on an in-depth analysis of the region-language alignment in CLIP models, which is essential for downstream open-vocabulary dense prediction tasks. |
Size Wu; Wenwei Zhang; Lumin Xu; Sheng Jin; Xiangtai Li; Wentao Liu; Chen Change Loy; |
213 | CRAFT: Customizing LLMs By Creating and Retrieving from Specialized Toolsets Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present CRAFT, a general tool creation and retrieval framework for LLMs. |
Lifan Yuan; Yangyi Chen; Xingyao Wang; Yi Fung; Hao Peng; Heng Ji; |
214 | What’s In My Big Data? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose What’s In My Big Data? |
Yanai Elazar; Akshita Bhagia; Ian Helgi Magnusson; Abhilasha Ravichander; Dustin Schwenk; Alane Suhr; Evan Pete Walsh; Dirk Groeneveld; Luca Soldaini; Sameer Singh; Hannaneh Hajishirzi; Noah A. Smith; Jesse Dodge; |
215 | OpenNeRF: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Indeed, point clouds and 3D meshes typically have a lower resolution than images, and the reconstructed 3D scene geometry might not project well to the underlying 2D image sequences used to compute pixel-aligned CLIP features. To address these challenges, we propose OpenNeRF, which naturally operates on posed images and directly encodes the VLM features within the NeRF. |
Francis Engelmann; Fabian Manhardt; Michael Niemeyer; Keisuke Tateno; Federico Tombari; |
216 | RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the novel approach of recursively embedding, clustering, and summarizing chunks of text, constructing a tree with differing levels of summarization from the bottom up. |
Parth Sarthi; Salman Abdullah; Aditi Tuli; Shubh Khanna; Anna Goldie; Christopher D Manning; |
217 | The Truth Is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This simple intervention, which we call LAyer-SElective Rank reduction (LASER), can be done on a model after training has completed, and requires minimal additional parameters and data. We show extensive experiments demonstrating the generality of this finding across language models and datasets, and provide in-depth analyses offering insights into both when LASER is effective and the mechanism by which it operates. |
Pratyusha Sharma; Jordan T. Ash; Dipendra Misra; |
218 | On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current KD methods for auto-regressive sequence models suffer from distribution mismatch between output sequences seen during training and those generated by the student during inference. To address this issue, we introduce Generalized Knowledge Distillation (GKD). |
Rishabh Agarwal; Nino Vieillard; Yongchao Zhou; Piotr Stanczyk; Sabela Ramos Garea; Matthieu Geist; Olivier Bachem; |
219 | Human Feedback Is Not Gold Standard Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We critically analyse the use of human feedback for both training and evaluation, to verify whether it fully captures a range of crucial error criteria. |
Tom Hosking; Phil Blunsom; Max Bartolo; |
220 | Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose SAT Probe, a method probing attention patterns, that can predict factual errors and fine-grained constraint satisfaction, and allow early error identification. |
Mert Yuksekgonul; Varun Chandrasekaran; Erik Jones; Suriya Gunasekar; Ranjita Naik; Hamid Palangi; Ece Kamar; Besmira Nushi; |
221 | DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Conventional training-based methods have limitations in flexibility, particularly when adapting to new domains, and they often lack explanatory power. To address this gap, we propose a novel training-free detection strategy called Divergent N-Gram Analysis (DNA-GPT). |
Xianjun Yang; Wei Cheng; Yue Wu; Linda Ruth Petzold; William Yang Wang; Haifeng Chen; |
222 | Can LLMs Keep A Secret? Testing Privacy Implications of Language Models Via Contextual Integrity Theory Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we shed light on the often-overlooked interactive settings where an LLM receives information from multiple sources and generates an output to be shared with other entities, creating the potential of exposing sensitive input data in inappropriate contexts. |
Niloofar Mireshghallah; Hyunwoo Kim; Xuhui Zhou; Yulia Tsvetkov; Maarten Sap; Reza Shokri; Yejin Choi; |
223 | Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we present **Rep**hrase, **A**ugment and **Re**ason (RepARe), a gradient-free framework that extracts salient details about the image using the underlying LVLM as a captioner and reasoner, in order to propose modifications to the original question. |
Archiki Prasad; Elias Stengel-Eskin; Mohit Bansal; |
224 | Large Language Models As Automated Aligners for Benchmarking Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we address the limitations via Auto-Bench, which delves into exploring LLMs as proficient aligners, measuring the alignment between VLMs and human intelligence and value through automatic data curation and assessment. |
Yuanfeng Ji; Chongjian GE; Weikai Kong; Enze Xie; Zhengying Liu; Zhenguo Li; Ping Luo; |
225 | SalUn: Empowering Machine Unlearning Via Gradient-based Weight Saliency in Both Image Classification and Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing MU methods focusing on data and/or weight perspectives often suffer limitations in unlearning accuracy, stability, and cross-domain applicability. To address these challenges, we introduce the concept of ‘weight saliency’ for MU, drawing parallels with input saliency in model explanation. |
Chongyu Fan; Jiancheng Liu; Yihua Zhang; Eric Wong; Dennis Wei; Sijia Liu; |
226 | AlpaGasus: Training A Better Alpaca with Fewer Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a simple and effective data selection strategy that automatically identifies and removes low-quality data using a strong LLM (e.g., ChatGPT). |
Lichang Chen; Shiyang Li; Jun Yan; Hai Wang; Kalpa Gunaratna; Vikas Yadav; Zheng Tang; Vijay Srinivasan; Tianyi Zhou; Heng Huang; Hongxia Jin; |
227 | Learning to Act from Actionless Videos Through Dense Correspondences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present an approach to construct a video-based robot policy capable of reliably executing diverse tasks across different robots and environments from few video demonstrations without using any action annotations. |
Po-Chen Ko; Jiayuan Mao; Yilun Du; Shao-Hua Sun; Joshua B. Tenenbaum; |
228 | ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing methods either struggle with unidirectional exploration in expansive action spaces, trapped into a locally optimal solution, or suffer from exhaustively traversing all potential actions, causing inefficient navigation. To address these issues, we propose ToolChain*, an efficient tree search-based planning algorithm for LLM-based agents. |
Yuchen Zhuang; Xiang Chen; Tong Yu; Saayan Mitra; Victor Bursztyn; Ryan A. Rossi; Somdeb Sarkhel; Chao Zhang; |
229 | Unlocking The Power of Representations in Long-term Novelty-based Exploration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Robust Exploration via Clustering-based Online Density Estimation (RECODE), a non-parametric method for novelty-based exploration that estimates visitation counts for clusters of states based on their similarity in a chosen embedding space. |
Alaa Saade; Steven Kapturowski; Daniele Calandriello; Charles Blundell; Pablo Sprechmann; Leopoldo Sarra; Oliver Groth; Michal Valko; Bilal Piot; |
230 | PixArt-$\alpha$: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces PixArt-$\alpha$, a Transformer-based T2I diffusion model whose image generation quality is competitive with state-of-the-art image generators (e.g., Imagen, SDXL, and even Midjourney), reaching near-commercial application standards. |
Junsong Chen; Jincheng YU; Chongjian GE; Lewei Yao; Enze Xie; Zhongdao Wang; James Kwok; Ping Luo; Huchuan Lu; Zhenguo Li; |
231 | ZipIt! Merging Models from Different Tasks Without Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we tackle the extremely difficult problem of combining distinct models with different initializations, each solving a separate task, into one multi-task model without any additional training. |
George Stoica; Daniel Bolya; Jakob Brandt Bjorner; Pratik Ramesh; Taylor Hearn; Judy Hoffman; |
232 | Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Nevertheless, there is still an inadequacy in assessing the abilities of MLLMs on **low-level visual perception and understanding**. To address this gap, we present **Q-Bench**, a holistic benchmark crafted to systematically evaluate potential abilities of MLLMs on three realms: low-level visual perception, low-level visual description, and overall visual quality assessment. |
Haoning Wu; Zicheng Zhang; Erli Zhang; Chaofeng Chen; Liang Liao; Annan Wang; Chunyi Li; Wenxiu Sun; Qiong Yan; Guangtao Zhai; Weisi Lin; |
233 | MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce MINT, a benchmark that evaluates LLMs’ ability to solve tasks with multi-turn interactions by (1) using tools and (2) leveraging natural language feedback. |
Xingyao Wang; Zihan Wang; Jiateng Liu; Yangyi Chen; Lifan Yuan; Hao Peng; Heng Ji; |
234 | Scaling Laws of RoPE-based Extrapolation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: After that, we propose \textbf{\textit{Scaling Laws of RoPE-based Extrapolation}}, a unified framework from the periodic perspective, to describe the relationship between the extrapolation performance and base value as well as tuning context length. |
Xiaoran Liu; Hang Yan; Chenxin An; Xipeng Qiu; Dahua Lin; |
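As background for the "base value" the highlight refers to, here is a minimal sketch of standard rotary position embedding with an adjustable base; this is the textbook RoPE formulation rather than the paper's scaling law, and the shapes and base value in the example are illustrative.

```python
import numpy as np

def rope_rotate(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embedding to x of shape (seq_len, dim); dim must be even.
    The base controls the rotation frequencies theta_i = base ** (-2i / dim)."""
    seq_len, dim = x.shape
    theta = base ** (-np.arange(0, dim, 2) / dim)   # (dim/2,)
    angles = positions[:, None] * theta[None, :]    # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Larger bases rotate more slowly; the base is one knob studied for long-context extrapolation.
q = np.random.randn(8, 64)
q_rot = rope_rotate(q, np.arange(8), base=500000.0)
```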
235 | One Step of Gradient Descent Is Provably The Optimal In-Context Learner with One Layer of Linear Self-Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We theoretically study transformers with a single layer of linear self-attention, trained on synthetic noisy linear regression data. |
Arvind V. Mahankali; Tatsunori Hashimoto; Tengyu Ma; |
236 | Transformers Can Optimally Learn Regression Mixture Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate the hypothesis that transformers can learn an optimal predictor for mixtures of regressions. |
Reese Pathak; Rajat Sen; Weihao Kong; Abhimanyu Das; |
237 | Conversational Drug Editing Using Retrieval and Domain Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Drug editing, a critical task in the drug discovery pipeline, remains largely unexplored. To bridge this gap, we propose ChatDrug, a framework to facilitate the systematic investigation of drug editing using LLMs. |
Shengchao Liu; Jiongxiao Wang; Yijin Yang; Chengpeng Wang; Ling Liu; Hongyu Guo; Chaowei Xiao; |
238 | What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we delve deeply into automatic data selection strategies for alignment. |
Wei Liu; Weihao Zeng; Keqing He; Yong Jiang; Junxian He; |
239 | RECOMP: Improving Retrieval-Augmented LMs with Context Compression and Selective Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present two compressors — an extractive compressor which selects useful sentences from retrieved documents and an abstractive compressor which generates summary by synthesizing information from multiple documents. |
Fangyuan Xu; Weijia Shi; Eunsol Choi; |
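As a rough illustration of the extractive side, the toy sketch below scores retrieved sentences by word overlap with the query and keeps the top few as the compressed context; the compressors in the paper are trained neural models, so the lexical scoring, budget, and example strings here are stand-ins.

```python
def extractive_compress(question: str, documents: list[str], budget: int = 2) -> str:
    """Toy extractive compressor: rank retrieved sentences by word overlap with the
    question and keep the top `budget` sentences as the compressed context."""
    q_words = set(question.lower().split())
    sentences = [s.strip() for doc in documents for s in doc.split(".") if s.strip()]
    scored = sorted(sentences, key=lambda s: -len(q_words & set(s.lower().split())))
    return ". ".join(scored[:budget]) + "."

# Illustrative documents standing in for retrieved passages.
docs = ["The Eiffel Tower was completed in 1889. It is located in Paris.",
        "Gustave Eiffel's company designed the tower. Paris hosts millions of visitors."]
print(extractive_compress("When was the Eiffel Tower completed?", docs))
```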
240 | GENOME: Generative Neuro-Symbolic Visual Reasoning By Growing and Reusing Modules Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: On the contrary, human beings, starting from infancy, gradually acquire knowledge that can be reused and grown into more profound skills for fast generalization to new tasks. Inspired by this, we propose generative neuro-symbolic visual reasoning by growing and reusing modules. |
Zhenfang Chen; Rui Sun; Wenjun Liu; Yining Hong; Chuang Gan; |
241 | In-Context Learning Through The Bayesian Prism Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we empirically examine how far this Bayesian perspective can help us understand ICL. To this end, we generalize the previous meta-ICL setup to a hierarchical meta-ICL setup which involves unions of multiple task families. |
Madhur Panwar; Kabir Ahuja; Navin Goyal; |
242 | Deep Neural Network Initialization with Sparsity Inducing Activations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we use the large width Gaussian process limit to analyze the behaviour, at random initialization, of nonlinear activations that induce sparsity in the hidden outputs. |
Ilan Price; Nicholas Daultry Ball; Adam Christopher Jones; Samuel Chun Hei Lam; Jared Tanner; |
243 | MOFI: Learning Image Representations from Noisy Entity Annotated Images Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present MOFI, Manifold OF Images, a new vision foundation model designed to learn image representations from noisy entity annotated images. |
Wentao Wu; Aleksei Timofeev; Chen Chen; Bowen Zhang; Kun Duan; Shuangning Liu; Yantao Zheng; Jonathon Shlens; Xianzhi Du; Yinfei Yang; |
244 | Text2Reward: Reward Shaping with Language Models for Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Designing reward functions is a longstanding challenge in reinforcement learning (RL); it requires specialized knowledge or domain data, leading to high costs for development. To address this, we introduce Text2Reward, a data-free framework that automates the generation and shaping of dense reward functions based on large language models (LLMs). |
Tianbao Xie; Siheng Zhao; Chen Henry Wu; Yitao Liu; Qian Luo; Victor Zhong; Yanchao Yang; Tao Yu; |
245 | Social Reward: Evaluating and Enhancing Generative AI Through Million-User Feedback from An Online Creative Community Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We embark on an extensive journey of dataset curation and refinement, drawing from Picsart: an online visual creation and editing platform, yielding a first million-user-scale dataset of implicit human preferences for user-generated visual art named Picsart Image-Social. Our analysis exposes the shortcomings of current metrics in modeling community creative preference of text-to-image models’ outputs, compelling us to introduce a novel predictive model explicitly tailored to address these limitations. |
Arman Isajanyan; Artur Shatveryan; David Kocharian; Zhangyang Wang; Humphrey Shi; |
246 | UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes UniAdapter, which unifies unimodal and multimodal adapters for parameter-efficient cross-modal adaptation on pre-trained vision-language models. |
Haoyu Lu; Yuqi Huo; Guoxing Yang; Zhiwu Lu; Wei Zhan; Masayoshi Tomizuka; Mingyu Ding; |
247 | VDT: General-purpose Video Diffusion Transformers Via Mask Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work introduces Video Diffusion Transformer (VDT), which pioneers the use of transformers in diffusion-based video generation. |
Haoyu Lu; Guoxing Yang; Nanyi Fei; Yuqi Huo; Zhiwu Lu; Ping Luo; Mingyu Ding; |
248 | The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of The Open World Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present the All-Seeing (AS) project: a large-scale dataset and model for recognizing and understanding everything in the open world. Using a scalable data engine that incorporates human feedback and efficient models in the loop, we create a new dataset (AS-1B) with over 1.2 billion regions annotated with semantic tags, question-answering pairs, and detailed captions. |
Weiyun Wang; Min Shi; Qingyun Li; Wenhai Wang; Zhenhang Huang; Linjie Xing; Zhe Chen; Hao Li; Xizhou Zhu; Zhiguo Cao; Yushi Chen; Tong Lu; Jifeng Dai; Yu Qiao; |
249 | Influencer Backdoor Attack on Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we explore backdoor attacks on segmentation models to misclassify all pixels of a victim class by injecting a specific trigger on non-victim pixels during inferences, which is dubbed Influencer Backdoor Attack (IBA). |
Haoheng Lan; Jindong Gu; Philip Torr; Hengshuang Zhao; |
250 | Instant3D: Fast Text-to-3D with Sparse-view Generation and Large Reconstruction Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Instant3D, a novel method that generates high-quality and diverse 3D assets from text prompts in a feed-forward manner. |
Jiahao Li; Hao Tan; Kai Zhang; Zexiang Xu; Fujun Luan; Yinghao Xu; Yicong Hong; Kalyan Sunkavalli; Greg Shakhnarovich; Sai Bi; |
251 | Effective Pruning of Web-scale Datasets Based on Complexity of Concept Clusters Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In order to improve training and data efficiency, we here push the limits of pruning large-scale multimodal datasets for training CLIP-style models. |
Amro Kamal Mohamed Abbas; Evgenia Rusak; Kushal Tirumala; Wieland Brendel; Kamalika Chaudhuri; Ari S. Morcos; |
252 | Retrieval Is Accurate Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel method that selects context-aware phrases from a collection of supporting documents. |
Bowen Cao; Deng Cai; Leyang Cui; Xuxin Cheng; Wei Bi; Yuexian Zou; Shuming Shi; |
253 | Brain Decoding: Toward Real-time Reconstruction of Visual Perception Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we propose an alternative approach based on magnetoencephalography (MEG), a neuroimaging device capable of measuring brain activity with high temporal resolution ($\approx$5,000 Hz). |
Yohann Benchetrit; Hubert Banville; Jean-Remi King; |
254 | In-Context Learning Learns Label Relationships But Is Not Conventional Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we provide novel insights into how ICL leverages label information, revealing both capabilities and limitations. |
Jannik Kossen; Yarin Gal; Tom Rainforth; |
255 | RLCD: Reinforcement Learning from Contrastive Distillation for LM Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Reinforcement Learning from Contrastive Distillation (RLCD), a method for aligning language models to follow principles expressed in natural language (e.g., to be more harmless) without using human feedback. |
Kevin Yang; Dan Klein; Asli Celikyilmaz; Nanyun Peng; Yuandong Tian; |
256 | Emu: Generative Pretraining in Multimodality Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Emu, a multimodal foundation model that seamlessly generates images and text in multimodal context. |
Quan Sun; Qiying Yu; Yufeng Cui; Fan Zhang; Xiaosong Zhang; Yueze Wang; Hongcheng Gao; Jingjing Liu; Tiejun Huang; Xinlong Wang; |
257 | H-GAP: Humanoid Control with A Generalist Planner Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the extensive collection of human motion-captured data and the derived datasets of humanoid trajectories, such as MoCapAct, paves the way to tackle these challenges. In this context, we present Humanoid Generalist Autoencoding Planner (H-GAP), a state-action trajectory generative model trained on humanoid trajectories derived from human motion-captured data, capable of adeptly handling downstream control tasks with Model Predictive Control (MPC). |
Zhengyao Jiang; Yingchen Xu; Nolan Wagener; Yicheng Luo; Michael Janner; Edward Grefenstette; Tim Rocktäschel; Yuandong Tian; |
258 | Sin3DM: Learning A Diffusion Model from A Single 3D Textured Shape Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present Sin3DM, a diffusion model that learns the internal patch distribution from a single 3D textured shape and generates high-quality variations with fine geometry and texture details. |
Rundi Wu; Ruoshi Liu; Carl Vondrick; Changxi Zheng; |
259 | EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, generating fine-grained segmentation masks with diffusion models often requires additional training on annotated datasets, leaving it unclear to what extent pre-trained diffusion models alone understand the semantic relations of their generated images. To address this question, we leverage the semantic knowledge extracted from Stable Diffusion (SD) and aim to develop an image segmentor capable of generating fine-grained segmentation maps without any additional training. |
Koichi Namekata; Amirmojtaba Sabour; Sanja Fidler; Seung Wook Kim; |
260 | Label-free Node Classification on Graphs with Large Language Models (LLMs) Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Yet, they face challenges in efficiently processing structural data and suffer from high inference costs. In light of these observations, this work introduces a label-free node classification on graphs with LLMs pipeline, LLM-GNN. |
Zhikai Chen; Haitao Mao; Hongzhi Wen; Haoyu Han; Wei Jin; Haiyang Zhang; Hui Liu; Jiliang Tang; |
261 | Learning Delays in Spiking Neural Networks Using Dilated Convolutions with Learnable Spacings Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Yet, efficient algorithms to learn these delays have been lacking. Here, we propose a new discrete-time algorithm that addresses this issue in deep feedforward SNNs using backpropagation, in an offline manner. |
Ilyass Hammouamri; Ismail Khalfaoui-Hassani; Timothée Masquelier; |
262 | Compressing LLMs: The Truth Is Rarely Pure and Never Simple Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce **K**nowledge-**I**ntensive **C**ompressed LLM Benchmar**K** **(LLM-KICK)**, a collection of carefully-curated tasks to re-define the evaluation protocol for compressed LLMs, which have significant alignment with their dense counterparts and for which perplexity fails to capture subtle changes in their true capabilities. |
AJAY KUMAR JAISWAL; Zhe Gan; Xianzhi Du; Bowen Zhang; Zhangyang Wang; Yinfei Yang; |
263 | Learning to Reject with A Fixed Predictor: Application to Decontextualization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of classification with a reject option for a fixed predictor, crucial to natural language processing. We introduce a new problem formulation for this scenario, and an algorithm minimizing a new surrogate loss function. |
Christopher Mohri; Daniel Andor; Eunsol Choi; Michael Collins; Anqi Mao; Yutao Zhong; |
264 | ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods often suffer from suboptimal model compression due to their lack of a global perspective. To address this limitation in recent efficient pruning methods for large models, we propose Efficient Coarse-to-Fine Layer-Wise Pruning (ECoFLaP), a two-stage coarse-to-fine weight pruning approach for LVLMs. |
Yi-Lin Sung; Jaehong Yoon; Mohit Bansal; |
265 | Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Hence, in this work, we focus on leveraging LLMs to capture textual information as features, which can be used to boost GNN performance on downstream tasks. |
Xiaoxin He; Xavier Bresson; Thomas Laurent; Adam Perold; Yann LeCun; Bryan Hooi; |
266 | Large-Vocabulary 3D Diffusion Model with Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a diffusion-based feed-forward framework for synthesizing massive categories of real-world 3D objects \textit{with a single generative model}. |
Ziang Cao; Fangzhou Hong; Tong Wu; Liang Pan; Ziwei Liu; |
267 | LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As a remedy, we propose an effective PTQ method called LiDAR-PTQ, which is particularly curated for 3D lidar detection (both SPConv-based and SPConv-free). |
Sifan Zhou; Liang Li; Xinyu Zhang; Bo Zhang; Shipeng Bai; Miao Sun; Ziyu Zhao; Xiaobo Lu; Xiangxiang Chu; |
268 | Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new LLM-KG integrating paradigm “$\hbox{LLM}\otimes\hbox{KG}$” which treats the LLM as an agent to interactively explore related entities and relations on KGs and perform reasoning based on the retrieved knowledge. |
Jiashuo Sun; Chengjin Xu; Lumingyuan Tang; Saizhuo Wang; Chen Lin; Yeyun Gong; Lionel Ni; Heung-Yeung Shum; Jian Guo; |
269 | The Expressive Power of Transformers with Chain of Thought Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by this, we ask: *Does such intermediate generation fundamentally extend the computational power of a decoder-only transformer?* We show that the answer is *yes*, but the amount of increase depends crucially on the amount of intermediate generation. |
William Merrill; Ashish Sabharwal; |
270 | An Unforgeable Publicly Verifiable Watermark for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, current watermark detection algorithms require the secret key used in the watermark generation process, making them susceptible to security breaches and counterfeiting during public detection. To address this limitation, we propose an unforgeable publicly verifiable watermark algorithm named UPV that uses two different neural networks for watermark generation and detection, instead of using the same key at both stages. |
Aiwei Liu; Leyi Pan; Xuming Hu; Shuang Li; Lijie Wen; Irwin King; Philip S. Yu; |
271 | Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: They can also output toxic or harmful text. To mitigate these safety and informational issues, we propose an attack-and-defense framework for studying the task of deleting sensitive information directly from model weights. |
Vaidehi Patil; Peter Hase; Mohit Bansal; |
272 | Circuit Component Reuse Across Tasks in Transformer Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present evidence that insights (both low-level findings about specific heads and higher-level findings about general algorithms) can indeed generalize across tasks. |
Jack Merullo; Carsten Eickhoff; Ellie Pavlick; |
273 | Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present a comprehensive investigation into self-contradiction for various instruction-tuned LMs, covering evaluation, detection, and mitigation. |
Niels Mündler; Jingxuan He; Slobodan Jenko; Martin Vechev; |
274 | DistillSpec: Improving Speculative Decoding Via Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, identifying a compact draft model that is well-aligned with the target model is challenging. To tackle this issue, we propose {\em DistillSpec} that uses knowledge distillation to better align the draft model with the target model, before applying SD. |
Yongchao Zhou; Kaifeng Lyu; Ankit Singh Rawat; Aditya Krishna Menon; Afshin Rostamizadeh; Sanjiv Kumar; Jean-François Kagy; Rishabh Agarwal; |
275 | Teaching Arithmetic to Small Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This study investigates how even small transformers, trained from random initialization, can efficiently learn arithmetic operations such as addition, multiplication, and elementary functions like square root, using the next-token prediction objective. |
Nayoung Lee; Kartik Sreenivasan; Jason D. Lee; Kangwook Lee; Dimitris Papailiopoulos; |
276 | DIFFTACTILE: A Physics-based Differentiable Tactile Simulator for Contact-rich Robotic Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce DIFFTACTILE, a physics-based differentiable tactile simulation system designed to enhance robotic manipulation with dense and physically accurate tactile feedback. |
Zilin Si; Gu Zhang; Qingwei Ben; Branden Romero; Zhou Xian; Chao Liu; Chuang Gan; |
277 | PlaSma: Procedural Knowledge Models for Language-based Planning and Re-Planning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we advocate planning using smaller language models. |
Faeze Brahman; Chandra Bhagavatula; Valentina Pyatkin; Jena D. Hwang; Xiang Lorraine Li; Hirona Jacqueline Arai; Soumya Sanyal; Keisuke Sakaguchi; Xiang Ren; Yejin Choi; |
278 | DiffusionSat: A Generative Foundation Model for Satellite Imagery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present DiffusionSat, to date the largest generative foundation model trained on a collection of publicly available large, high-resolution remote sensing datasets. |
Samar Khanna; Patrick Liu; Linqi Zhou; Chenlin Meng; Robin Rombach; Marshall Burke; David B. Lobell; Stefano Ermon; |
279 | Building Cooperative Embodied Agents Modularly with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we address challenging multi-agent cooperation problems with decentralized control, raw sensory observations, costly communication, and multi-objective tasks instantiated in various embodied environments. |
Hongxin Zhang; Weihua Du; Jiaming Shan; Qinhong Zhou; Yilun Du; Joshua B. Tenenbaum; Tianmin Shu; Chuang Gan; |
280 | To The Cutoff… and Beyond? A Longitudinal Perspective on LLM Data Contamination Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we conduct the first thorough longitudinal analysis of data contamination in LLMs by using the natural experiment of training cutoffs in GPT models to look at benchmarks released over time. |
Manley Roberts; Himanshu Thakur; Christine Herlihy; Colin White; Samuel Dooley; |
281 | SALMONN: Towards Generic Hearing Abilities for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose SALMONN, a speech audio language music open neural network, built by integrating a pre-trained text-based large language model (LLM) with speech and audio encoders into a single multimodal model. |
Changli Tang; Wenyi Yu; Guangzhi Sun; Xianzhao Chen; Tian Tan; Wei Li; Lu Lu; Zejun MA; Chao Zhang; |
282 | Thin-Shell Object Manipulations With Differentiable Physics Simulations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim to teach robots to manipulate various thin-shell materials. |
Yian Wang; Juntian Zheng; Zhehuan Chen; Zhou Xian; Gu Zhang; Chao Liu; Chuang Gan; |
283 | Frozen Transformers in Language Models Are Effective Visual Encoder Layers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper reveals that large language models (LLMs), despite being trained solely on text data, are surprisingly strong encoders for purely visual tasks in the absence of language. |
Ziqi Pang; Ziyang Xie; Yunze Man; Yu-Xiong Wang; |
284 | Fantastic Gains and Where to Find Them: On The Existence and Prospect of General Knowledge Transfer Between Any Pretrained Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Training deep networks requires various design decisions regarding for instance their architecture, data augmentation, or optimization. In this work, we find these training variations to result in networks learning unique feature sets from the data. |
Karsten Roth; Lukas Thede; A. Sophia Koepke; Oriol Vinyals; Olivier J Henaff; Zeynep Akata; |
285 | The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a unified probabilistic formulation for diffusion-based image editing, where a latent variable is edited in a task-specific manner and generally deviates from the corresponding marginal distribution induced by the original stochastic or ordinary differential equation (SDE or ODE). We build a challenging benchmark (termed \emph{DragBench}) with open-set natural, art, and AI-generated images for evaluation. |
Shen Nie; Hanzhong Allan Guo; Cheng Lu; Yuhao Zhou; Chenyu Zheng; Chongxuan Li; |
286 | The Generative AI Paradox: “What It Can Create, It May Not Understand” Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This presents us with an apparent paradox: how do we reconcile seemingly superhuman capabilities with the persistence of errors that few humans would make? In this work, we posit that this tension reflects a divergence in the configuration of intelligence in today’s generative models relative to intelligence in humans. |
Peter West; Ximing Lu; Nouha Dziri; Faeze Brahman; Linjie Li; Jena D. Hwang; Liwei Jiang; Jillian Fisher; Abhilasha Ravichander; Khyathi Chandu; Benjamin Newman; Pang Wei Koh; Allyson Ettinger; Yejin Choi; |
287 | DoLa: Decoding By Contrasting Layers Improves Factuality in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a simple decoding strategy for reducing hallucinations with pretrained LLMs that does not require conditioning on retrieved external knowledge nor additional fine-tuning. |
Yung-Sung Chuang; Yujia Xie; Hongyin Luo; Yoon Kim; James R. Glass; Pengcheng He; |
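A minimal sketch of the layer-contrast idea, assuming you already have next-token logits from a final and an earlier transformer layer: score tokens by the difference of their log-probabilities, restricted to tokens the final layer already deems plausible. The paper's dynamic selection of the earlier layer is omitted, and the toy logits below are made up.

```python
import numpy as np

def log_softmax(logits):
    logits = logits - logits.max()
    return logits - np.log(np.exp(logits).sum())

def contrast_layers(final_logits, early_logits, plausibility=0.1):
    """Score tokens by log p_final - log p_early, restricted to tokens whose final-layer
    probability is within a factor `plausibility` of the most likely token."""
    lp_final = log_softmax(final_logits)
    lp_early = log_softmax(early_logits)
    mask = lp_final >= lp_final.max() + np.log(plausibility)
    scores = np.where(mask, lp_final - lp_early, -np.inf)
    return int(np.argmax(scores))

# Toy vocabulary of 5 tokens: the early layer prefers token 2, the final layer token 0;
# contrasting the two sharpens the preference for knowledge that emerges in later layers.
final_logits = np.array([2.0, 0.5, 1.8, -1.0, 0.0])
early_logits = np.array([0.5, 0.4, 2.5, -0.5, 0.1])
print(contrast_layers(final_logits, early_logits))
```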
288 | JoMA: Demystifying Multilayer Transformers Via Joint Dynamics of MLP and Attention Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Joint MLP/Attention (JoMA) dynamics, a novel mathematical framework to understand the training procedure of multilayer Transformer architectures. |
Yuandong Tian; Yiping Wang; Zhenyu Zhang; Beidi Chen; Simon Shaolei Du; |
289 | Würstchen: An Efficient Architecture for Large-Scale Text-to-Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A key contribution of our work is to develop a latent diffusion technique in which we learn a detailed but extremely compact semantic image representation used to guide the diffusion process. |
Pablo Pernias; Dominic Rampas; Mats Leon Richter; Christopher Pal; Marc Aubreville; |
290 | Understanding Catastrophic Forgetting in Language Models Via Implicit Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We hypothesize that language models implicitly infer the task of the prompt and that fine-tuning skews this inference towards tasks in the fine-tuning distribution. To test this, we propose Conjugate Prompting, which artificially makes the task look farther from the fine-tuning distribution while requiring the same capability, and we find that this recovers some of the pretraining capabilities in our synthetic setup. |
Suhas Kotha; Jacob Mitchell Springer; Aditi Raghunathan; |
291 | A Progressive Training Framework for Spiking Neural Networks with Learnable Multi-hierarchical Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the widely adopted Leaky Integrate-and-Fire (LIF) model, as the mainstream neuron model in current SNN research, has been revealed to exhibit significant deficiencies in deep-layer gradient calculation and capturing global information on the time dimension. In this paper, we propose the Learnable Multi-hierarchical (LM-H) model to address these issues by dynamically regulating its membrane-related factors. |
Zecheng Hao; Xinyu Shi; Zihan Huang; Tong Bu; Zhaofei Yu; Tiejun Huang; |
292 | Threaten Spiking Neural Networks Through Combining Rate and Temporal Information Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we draw inspiration from two mainstream learning algorithms of SNNs and observe that SNN models reserve both rate and temporal information. |
Zecheng Hao; Tong Bu; Xinyu Shi; Zihan Huang; Zhaofei Yu; Tiejun Huang; |
293 | Uni3D: Exploring Unified 3D Representation at Scale Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present Uni3D, a 3D foundation model to explore the unified 3D representation at scale. |
Junsheng Zhou; Jinsheng Wang; Baorui Ma; Yu-Shen Liu; Tiejun Huang; Xinlong Wang; |
294 | TD-MPC2: Scalable, Robust World Models for Continuous Control Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present TD-MPC2: a series of improvements upon the TD-MPC algorithm. |
Nicklas Hansen; Hao Su; Xiaolong Wang; |
295 | Large Multilingual Models Pivot Zero-Shot Multimodal Learning Across Languages Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose MPM, an effective training paradigm for training large multimodal models in low-resource languages. |
Jinyi Hu; Yuan Yao; Chongyi Wang; SHAN WANG; Yinxu Pan; Qianyu Chen; Tianyu Yu; Hanghao Wu; Yue Zhao; Haoye Zhang; Xu Han; Yankai Lin; Jiao Xue; dahai li; Zhiyuan Liu; Maosong Sun; |
296 | Closing The Curious Case of Neural Text Degeneration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We provide a theoretical explanation for the effectiveness of the truncation sampling by proving that truncation methods that discard tokens below some probability threshold (the most common type of truncation) can guarantee that all sampled tokens have nonzero true probability. |
Matthew Finlayson; John Hewitt; Alexander Koller; Swabha Swayamdipta; Ashish Sabharwal; |
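A minimal sketch of the thresholding family of truncation samplers the highlight describes: zero out tokens whose model probability falls below a threshold and renormalize before sampling, so tokens in the discarded tail can never be drawn. The threshold value and toy distribution are illustrative, not taken from the paper.

```python
import numpy as np

def truncated_sample(probs: np.ndarray, threshold: float = 0.01, rng=None) -> int:
    """Sample a token after discarding all tokens below `threshold` and renormalizing."""
    rng = rng or np.random.default_rng(0)
    kept = np.where(probs >= threshold, probs, 0.0)
    kept = kept / kept.sum()
    return int(rng.choice(len(probs), p=kept))

# Toy next-token distribution: the long tail below the threshold is never sampled.
probs = np.array([0.55, 0.30, 0.10, 0.04, 0.01])
print(truncated_sample(probs, threshold=0.05))
```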
297 | Transformers As Decision Makers: Provable In-Context Reinforcement Learning Via Supervised Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper provides a theoretical framework that analyzes supervised pretraining for ICRL. |
Licong Lin; Yu Bai; Song Mei; |
298 | When Do Prompting and Prefix-Tuning Work? A Theory of Capabilities and Limitations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show that despite the continuous embedding space being more expressive than the discrete token space, soft-prompting and prefix-tuning are potentially less expressive than full fine-tuning, even with the same number of learnable parameters. |
Aleksandar Petrov; Philip Torr; Adel Bibi; |
299 | LoftQ: LoRA-Fine-Tuning-aware Quantization for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we focus on the scenario where quantization and LoRA fine- tuning are applied together on a pre-trained model. |
Yixiao Li; Yifan Yu; Chen Liang; Nikos Karampatziakis; Pengcheng He; Weizhu Chen; Tuo Zhao; |
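A rough sketch of the alternating initialization idea, assuming a toy uniform quantizer in place of the 4-bit formats used in practice: repeatedly quantize the backbone and refit a low-rank correction to the residual, so that the quantized weights plus the adapter stay close to the original matrix. Function names, rank, and iteration count below are illustrative.

```python
import numpy as np

def uniform_quantize(w, bits=4):
    """Toy symmetric uniform quantizer standing in for the 4-bit formats used in practice."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / levels
    return np.round(w / scale) * scale

def loftq_style_init(w, rank=8, bits=4, iters=5):
    """Alternate quantizing the backbone and fitting a low-rank correction to the residual."""
    a = np.zeros((w.shape[0], rank))
    b = np.zeros((rank, w.shape[1]))
    for _ in range(iters):
        q = uniform_quantize(w - a @ b, bits)         # quantize what the adapter can't explain
        u, s, vt = np.linalg.svd(w - q, full_matrices=False)
        a, b = u[:, :rank] * s[:rank], vt[:rank, :]   # rank-r fit of the quantization residual
    return q, a, b

w = np.random.randn(256, 256)
q, a, b = loftq_style_init(w)
print(np.linalg.norm(w - (q + a @ b)) / np.linalg.norm(w))  # small relative error
```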
300 | Chain-of-Table: Evolving Tables in The Reasoning Chain for Table Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the Chain-of-Table framework, where tabular data is explicitly used in the reasoning chain as a proxy for intermediate thoughts. |
Zilong Wang; Hao Zhang; Chun-Liang Li; Julian Martin Eisenschlos; Vincent Perot; Zifeng Wang; Lesly Miculicich; Yasuhisa Fujii; Jingbo Shang; Chen-Yu Lee; Tomas Pfister; |
301 | Boundary Denoising for Video Activity Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose an encoder-decoder model named DenoiseLoc. |
Mengmeng Xu; Mattia Soldan; Jialin Gao; Shuming Liu; Juan-Manuel Perez-Rua; Bernard Ghanem; |
302 | Understanding Transferable Representation Learning and Zero-shot Transfer in CLIP Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we formally study transferrable representation learning underlying CLIP and demonstrate how features from different modalities get aligned. |
Zixiang Chen; Yihe Deng; Yuanzhi Li; Quanquan Gu; |
303 | Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Large Language Models (LLMs) excel in various tasks, but they rely on carefully crafted prompts that often demand substantial human effort. To automate this process, in this paper, we propose a novel framework for discrete prompt optimization, called EvoPrompt, which borrows the idea of evolutionary algorithms (EAs) as they exhibit good performance and fast convergence. |
Qingyan Guo; Rui Wang; Junliang Guo; Bei Li; Kaitao Song; Xu Tan; Guoqing Liu; Jiang Bian; Yujiu Yang; |
304 | Single Motion Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce SinMDM, a Single Motion Diffusion Model. |
Sigal Raab; Inbal Leibovitch; Guy Tevet; Moab Arar; Amit Haim Bermano; Daniel Cohen-Or; |
305 | Talk Like A Graph: Encoding Graphs for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we perform the first comprehensive study of encoding graph-structured data as text for consumption by LLMs. |
Bahare Fatemi; Jonathan Halcrow; Bryan Perozzi; |
306 | A Semantic Invariant Robust Watermark for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a semantic invariant watermarking method for LLMs that provides both attack robustness and security robustness. |
Aiwei Liu; Leyi Pan; Xuming Hu; Shiao Meng; Lijie Wen; |
307 | Towards Lossless Dataset Distillation Via Difficulty-Aligned Trajectory Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Since only so much information can be contained in such a small number of samples, it seems that to achieve truly lossless dataset distillation, we must develop a distillation method that remains effective as the size of the synthetic dataset grows. In this work, we present such an algorithm and elucidate why existing methods fail to generate larger, high-quality synthetic sets. |
Ziyao Guo; Kai Wang; George Cazenavette; HUI LI; Kaipeng Zhang; Yang You; |
308 | FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Furthermore, these models only support single-text conditions, whereas real-life scenarios often require multi-text conditions as the video content changes over time. To tackle these challenges, this study explores the potential of extending the text-driven capability to generate longer videos conditioned on multiple texts. |
Haonan Qiu; Menghan Xia; Yong Zhang; Yingqing He; Xintao Wang; Ying Shan; Ziwei Liu; |
309 | TapMo: Shape-aware Motion Generation of Skeleton-free Characters Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present TapMo, a Text-driven Animation PIpeline for synthesizing Motion in a broad spectrum of skeleton-free 3D characters. |
Jiaxu Zhang; Shaoli Huang; Zhigang Tu; Xin Chen; Xiaohang Zhan; Gang YU; Ying Shan; |
310 | METRA: Scalable Unsupervised RL with Metric-Aware Abstraction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To make unsupervised RL scalable to complex, high-dimensional environments, we propose a novel unsupervised RL objective, which we call Metric-Aware Abstraction (METRA). |
Seohong Park; Oleh Rybkin; Sergey Levine; |
311 | EasyTPP: Towards Open Benchmarking Temporal Point Processes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present EasyTPP, the first central repository of research assets (e.g., data, models, evaluation programs, documentations) in the area of event sequence modeling. |
Siqiao Xue; Xiaoming Shi; Zhixuan Chu; Yan Wang; Hongyan Hao; Fan Zhou; Caigao JIANG; Chen Pan; James Y. Zhang; Qingsong Wen; JUN ZHOU; Hongyuan Mei; |
312 | MagicDrive: Street View Generation with Diverse 3D Geometry Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce MagicDrive, a novel street view generation framework, offering diverse 3D geometry controls including camera poses, road maps, and 3D bounding boxes, together with textual descriptions, achieved through tailored encoding strategies. |
Ruiyuan Gao; Kai Chen; Enze Xie; Lanqing HONG; Zhenguo Li; Dit-Yan Yeung; Qiang Xu; |
313 | Does CLIP’s Generalization Performance Mainly Stem from High Train-test Similarity? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Foundation models like CLIP are trained on hundreds of millions of samples and effortlessly generalize to new tasks and inputs. |
Prasanna Mayilvahanan; Thaddäus Wiedemer; Evgenia Rusak; Matthias Bethge; Wieland Brendel; |
314 | Mitigating The Curse of Dimensionality for Certified Robustness Via Dual Randomized Smoothing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper explores the feasibility of providing ${\ell_2}$ certified robustness for high-dimensional input through the utilization of dual smoothing in the lower-dimensional space. |
Song Xia; Yi Yu; Xudong Jiang; Henghui Ding; |
315 | Functional Interpolation for Relative Positions Improves Long Context Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel functional relative position encoding with progressive interpolation, FIRE, to improve Transformer generalization to longer contexts. |
Shanda Li; Chong You; Guru Guruganesh; Joshua Ainslie; Santiago Ontanon; Manzil Zaheer; Sumit Sanghai; Yiming Yang; Sanjiv Kumar; Srinadh Bhojanapalli; |
316 | Multimodal Molecular Pretraining Via Modality Blending Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To derive fine-grained alignment and promote structural molecule understanding, we introduce an atomic-relation level blend-then-predict self-supervised learning approach, MoleBLEND, which first blends atom relations represented by different modalities into one unified relation matrix for joint encoding, then recovers modality-specific information for 2D and 3D structures individually. |
Qiying Yu; Yudi Zhang; Yuyan Ni; Shikun Feng; Yanyan Lan; Hao Zhou; Jingjing Liu; |
317 | Effective and Efficient Federated Tree Learning on Hybrid Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In practice, a common scenario is the hybrid data setting, where data from different parties may differ both in the features and samples. To address this, we propose HybridTree, a novel federated learning approach that enables federated tree learning on hybrid data. |
Qinbin Li; Chulin Xie; Xiaojun Xu; Xiaoyuan Liu; Ce Zhang; Bo Li; Bingsheng He; Dawn Song; |
318 | Quantifying The Plausibility of Context Reliance in Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the questions of $\textit{when}$ and $\textit{which parts}$ of the context affect model generations are typically tackled separately, and current plausibility evaluations are practically limited to a handful of artificial benchmarks. To address this, we introduce $\textbf{P}$lausibility $\textbf{E}$valuation of $\textbf{Co}$ntext $\textbf{Re}$liance (PECoRe), an end-to-end interpretability framework designed to quantify context usage in language models’ generations. |
Gabriele Sarti; Grzegorz Chrupała; Malvina Nissim; Arianna Bisazza; |
319 | The Generalization Gap in Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we compare the generalization abilities of widely used online and offline learning methods such as online reinforcement learning (RL), offline RL, sequence modeling, and behavioral cloning. We also introduce the first benchmark for evaluating generalization in offline learning, collecting datasets of varying sizes and skill-levels from Procgen (2D video games) and WebShop (e-commerce websites). |
Ishita Mediratta; Qingfei You; Minqi Jiang; Roberta Raileanu; |
320 | Tensor Programs VI: Feature Learning in Infinite Depth Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Depth-$\mu$P, a principled approach for depth scaling, allowing for the training of arbitrarily deep architectures while maximizing feature learning and diversity among nearby layers. |
Greg Yang; Dingli Yu; Chen Zhu; Soufiane Hayou; |
321 | Vision-by-Language for Training-Free Compositional Image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose to tackle CIR in a training-free manner via our Compositional Image Retrieval through Vision-by-Language (CIReVL), a simple, yet human-understandable and scalable pipeline that effectively recombines large-scale VLMs with large language models (LLMs). |
Shyamgopal Karthik; Karsten Roth; Massimiliano Mancini; Zeynep Akata; |
322 | Retrieval-Enhanced Contrastive Vision-Text Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, a key ingredient to their success has been the use of large-scale curated pre-training data aiming at expanding the set of concepts that they can memorize during the pre-training stage. In this work, we explore an alternative to encoding fine-grained knowledge directly into the model’s parameters: we instead train the model to retrieve this knowledge from an external memory. |
Ahmet Iscen; Mathilde Caron; Alireza Fathi; Cordelia Schmid; |
323 | CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose CoVLM, which can guide the LLM to explicitly compose visual entities and relationships among the text and dynamically communicate with the vision encoder and detection network to achieve vision-language communicative decoding. |
Junyan Li; Delin Chen; Yining Hong; Zhenfang Chen; Peihao Chen; Yikang Shen; Chuang Gan; |
324 | Finite Scalar Quantization: VQ-VAE Made Simple Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose to replace vector quantization (VQ) in the latent representation of VQ-VAEs with a simple scheme termed finite scalar quantization (FSQ), where we project the VAE representation down to a few dimensions (typically less than 10). |
Fabian Mentzer; David Minnen; Eirikur Agustsson; Michael Tschannen; |
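A minimal sketch of finite scalar quantization on a low-dimensional latent: bound each dimension and round it to a small, fixed number of levels, so the implicit codebook is the product of the per-dimension grids. The straight-through gradient used during training is omitted here, and the level choices are only an example.

```python
import numpy as np

def fsq(z: np.ndarray, levels=(7, 5, 5, 5)) -> np.ndarray:
    """Finite scalar quantization: each latent dimension is squashed to a bounded range
    and rounded to one of `levels[i]` values (here 7*5*5*5 = 875 implicit codes)."""
    levels = np.asarray(levels, dtype=float)
    half = (levels - 1) / 2
    bounded = np.tanh(z) * half      # each dimension now lies in (-half_i, half_i)
    return np.round(bounded) / half  # round to the grid, rescale back to roughly [-1, 1]

# Toy usage on a batch of 3 latents with 4 dimensions (the paper typically uses < 10 dims).
z = np.random.randn(3, 4)
print(fsq(z))
```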
325 | Analyzing and Mitigating Object Hallucination in Large Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This can negatively impact many vision-language tasks, such as visual summarization and reasoning. To address this issue, we propose a simple yet powerful algorithm, LVLM Hallucination Revisor (LURE), to post-hoc rectify object hallucination in LVLMs by reconstructing less hallucinatory descriptions. |
Yiyang Zhou; Chenhang Cui; Jaehong Yoon; Linjun Zhang; Zhun Deng; Chelsea Finn; Mohit Bansal; Huaxiu Yao; |
326 | PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel unified bilevel optimization-based framework, \textsf{PARL}, formulated to address the recently highlighted critical issue of policy alignment in reinforcement learning using utility or preference-based feedback. |
Souradip Chakraborty; Amrit Bedi; Alec Koppel; Huazheng Wang; Dinesh Manocha; Mengdi Wang; Furong Huang; |
327 | Robust Agents Learn Causal World Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it is not known if agents must learn causal models in order to generalise to new domains, or if other inductive biases are sufficient. We answer this question, showing that any agent capable of satisfying a regret bound for a large set of distributional shifts must have learned an approximate causal model of the data generating process, which converges to the true causal model for optimal agents. |
Jonathan Richens; Tom Everitt; |
328 | HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, in real-world scenarios, agents might also face dynamically changing environments characterized by unexpected events and need to rapidly take action accordingly. To remedy this gap, we propose a new simulated embodied benchmark, called HAZARD, specifically designed to assess the decision-making abilities of embodied agents in dynamic situations. |
Qinhong Zhou; Sunli Chen; Yisong Wang; Haozhe Xu; Weihua Du; Hongxin Zhang; Yilun Du; Joshua B. Tenenbaum; Chuang Gan; |
329 | Group Preference Optimization: Few-Shot Alignment of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Group Preference Optimization (GPO), an alignment framework that steers language models to preferences of individual groups in a few-shot manner. |
Siyan Zhao; John Dang; Aditya Grover; |
330 | MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction Following Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work introduces MUFFIN, a new scheme of instruction-following dataset curation. |
Renze Lou; Kai Zhang; Jian Xie; Yuxuan Sun; Janice Ahn; Hanzi Xu; Yu Su; Wenpeng Yin; |
331 | ZeRO++: Extremely Efficient Collective Communication for Large Model Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, when training on low-bandwidth clusters, and/or when small batch size per GPU is used, ZeRO’s effective throughput is limited due to communication overheads. To alleviate this limitation, this paper introduces ZeRO++, composed of three communication volume reduction techniques (low-precision all-gather, data remapping, and low-precision gradient averaging) that reduce communication volume by up to 4x, enabling up to 2.16x better throughput at 384-GPU scale. |
Guanhua Wang; Heyang Qin; Sam Ade Jacobs; Xiaoxia Wu; Connor Holmes; Zhewei Yao; Samyam Rajbhandari; Olatunji Ruwase; Feng Yan; Lei Yang; Yuxiong He; |
332 | Graph Parsing Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, inspired by bottom-up grammar induction, we propose an efficient graph parsing algorithm to infer the pooling structure, which then drives graph pooling. |
Yunchong Song; Siyuan Huang; Xinbing Wang; Chenghu Zhou; Zhouhan Lin; |
333 | Improved Probabilistic Image-Text Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To overcome the issues, this paper presents an improved Probabilistic Cross-Modal Embeddings (named PCME++) by introducing a new probabilistic distance with a closed-form solution. |
Sanghyuk Chun; |
334 | SKILL-MIX: A Flexible and Expandable Family of Evaluations for AI Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work introduces SKILL-MIX, a new evaluation to measure ability to combine skills. |
Dingli Yu; Simran Kaur; Arushi Gupta; Jonah Brown-Cohen; Anirudh Goyal; Sanjeev Arora; |
335 | DMV3D: Denoising Multi-view Diffusion Using 3D Large Reconstruction Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose DMV3D, a novel 3D generation approach that uses a transformer-based 3D large reconstruction model to denoise multi-view diffusion. |
Yinghao Xu; Hao Tan; Fujun Luan; Sai Bi; Peng Wang; Jiahao Li; Zifan Shi; Kalyan Sunkavalli; Gordon Wetzstein; Zexiang Xu; Kai Zhang; |
336 | Ins-DetCLIP: Aligning Detection Model to Follow Human-Language Instruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As an initial approach to the IOD task, we propose a model called Ins-DetCLIP. To develop an IOD system, we create a dataset called IOD-Bench, which consists of instruction-guided detections, along with specialized evaluation metrics. |
Renjie Pi; Lewei Yao; Jianhua Han; Xiaodan Liang; Wei Zhang; Hang Xu; |
337 | Overcoming The Pitfalls of Vision-Language Model Finetuning for OOD Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we first demonstrate that vision-language models, after long enough finetuning but without proper regularization, tend to overfit the known classes in the given dataset, with degraded performance on unknown classes. Then we propose a novel approach OGEN to address this pitfall, with the main focus on improving the OOD GENeralization of finetuned models. |
Yuhang Zang; Hanlin Goh; Joshua M. Susskind; Chen Huang; |
338 | Active Test-Time Adaptation: Theoretical Analyses and An Algorithm Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a simple yet effective ATTA algorithm, known as SimATTA, using real-time sample selection techniques. |
Shurui Gui; Xiner Li; Shuiwang Ji; |
339 | FFB: A Fair Fairness Benchmark for In-Processing Group Fairness Methods Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces the Fair Fairness Benchmark (FFB), a benchmarking framework for in-processing group fairness methods. To address these issues, we introduce an open-source standardized benchmark for evaluating in-processing group fairness methods and provide a comprehensive analysis of state-of-the-art methods to ensure different notions of group fairness. |
Xiaotian Han; Jianfeng Chi; Yu Chen; Qifan Wang; Han Zhao; Na Zou; Xia Hu; |
340 | FasterViT: Fast Vision Transformers with Hierarchical Attention Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We design a new family of hybrid CNN-ViT neural networks, named FasterViT, with a focus on high image throughput for computer vision (CV) applications. |
Ali Hatamizadeh; Greg Heinrich; Hongxu Yin; Andrew Tao; Jose M. Alvarez; Jan Kautz; Pavlo Molchanov; |
341 | SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our results indicate that neither semantic nor acoustic tokens are ideal for this purpose. Therefore, we propose SpeechTokenizer, a unified speech tokenizer for speech large language models. |
Xin Zhang; Dong Zhang; Shimin Li; Yaqian Zhou; Xipeng Qiu; |
342 | Towards Foundation Models for Knowledge Graph Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The key challenge of designing foundation models on KGs is to learn such transferable representations that enable inference on any graph with arbitrary entity and relation vocabularies. In this work, we make a step towards such foundation models and present ULTRA, an approach for learning universal and transferable graph representations. |
Mikhail Galkin; Xinyu Yuan; Hesham Mostafa; Jian Tang; Zhaocheng Zhu; |
343 | DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce DyVal, a general and flexible protocol for dynamic evaluation of LLMs. |
Kaijie Zhu; Jiaao Chen; Jindong Wang; Neil Zhenqiang Gong; Diyi Yang; Xing Xie; |
344 | KoLA: Carefully Benchmarking World Knowledge of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Given the importance of world knowledge to LLMs, we construct a Knowledge-oriented LLM Assessment benchmark (KoLA), in which we carefully design three crucial factors: (1) For ability modeling, we mimic human cognition to form a four-level taxonomy of knowledge-related abilities, covering 19 tasks. |
Jifan Yu; Xiaozhi Wang; Shangqing Tu; Shulin Cao; Daniel Zhang-Li; Xin Lv; Hao Peng; Zijun Yao; Xiaohan Zhang; Hanming Li; Chunyang Li; Zheyuan Zhang; Yushi Bai; Yantao Liu; Amy Xin; Kaifeng Yun; Linlu GONG; Nianyi Lin; Jianhui Chen; Zhili Wu; Yunjia Qi; Weikai Li; Yong Guan; Kaisheng Zeng; Ji Qi; Hailong Jin; Jinxin Liu; Yu Gu; Yuan Yao; Ning Ding; Lei Hou; Zhiyuan Liu; Xu Bin; Jie Tang; Juanzi Li; |
345 | AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Hence, inspired by human group dynamics, we propose a multi-agent framework AgentVerse that can effectively orchestrate a collaborative group of expert agents as a greater-than-the-sum-of-its-parts system. |
Weize Chen; Yusheng Su; Jingwei Zuo; Cheng Yang; Chenfei Yuan; Chi-Min Chan; Heyang Yu; Yaxi Lu; Yi-Hsin Hung; Chen Qian; Yujia Qin; Xin Cong; Ruobing Xie; Zhiyuan Liu; Maosong Sun; Jie Zhou; |
346 | Teach LLMs to Phish: Stealing Private Information from Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a new \emph{practical} data extraction attack that we call “neural phishing”. |
Ashwinee Panda; Christopher A. Choquette-Choo; Zhengming Zhang; Yaoqing Yang; Prateek Mittal; |
347 | RT-Trajectory: Robotic Task Generalization Via Hindsight Trajectory Sketches Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our key insight is that this kind of generalization becomes feasible if we represent the task through rough trajectory sketches. We propose a policy conditioning method using such rough trajectory sketches, which we call RT-Trajectory, that is practical, easy to specify, and allows the policy to effectively perform new tasks that would otherwise be challenging to perform. |
Jiayuan Gu; Sean Kirmani; Paul Wohlhart; Yao Lu; Montserrat Gonzalez Arenas; Kanishka Rao; Wenhao Yu; Chuyuan Fu; Keerthana Gopalakrishnan; Zhuo Xu; Priya Sundaresan; Peng Xu; Hao Su; Karol Hausman; Chelsea Finn; Quan Vuong; Ted Xiao; |
348 | DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose DataInf, an efficient influence approximation method that is practical for large-scale generative AI models. |
Yongchan Kwon; Eric Wu; Kevin Wu; James Zou; |
349 | Unified Human-Scene Interaction Via Prompted Chain-of-Contacts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a unified HSI framework, UniHSI, which supports unified control of diverse interactions through language commands. To facilitate training and evaluation, we collect a new dataset named ScenePlan that encompasses thousands of task plans generated by LLMs based on diverse scenarios. |
Zeqi Xiao; Tai Wang; Jingbo Wang; Jinkun Cao; Wenwei Zhang; Bo Dai; Dahua Lin; Jiangmiao Pang; |
350 | DittoGym: Learning to Control Soft Shape-Shifting Robots Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by nature and recent novel robot designs, we propose to go a step further and explore the novel reconfigurable robots, defined as robots that can change their morphology within their lifetime. |
Suning Huang; Boyuan Chen; Huazhe Xu; Vincent Sitzmann; |
351 | LILO: Learning Interpretable Libraries By Compressing and Documenting Code Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce LILO, a neurosymbolic framework that iteratively synthesizes, compresses, and documents code to build libraries tailored to particular problem domains. |
Gabriel Grand; Lionel Wong; Matthew Bowers; Theo X. Olausson; Muxin Liu; Joshua B. Tenenbaum; Jacob Andreas; |
352 | ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate the capability of generating images from pre-trained diffusion models at much higher resolutions than the training image sizes. |
Yingqing He; Shaoshu Yang; Haoxin Chen; Xiaodong Cun; Menghan Xia; Yong Zhang; Xintao Wang; Ran He; Qifeng Chen; Ying Shan; |
353 | Towards The Fundamental Limits of Knowledge Transfer Over Finite Domains Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We characterize the statistical efficiency of knowledge transfer through $n$ samples from a teacher to a probabilistic student classifier with input space $\mathcal{S}$ over labels $\mathcal{A}$. |
Qingyue Zhao; Banghua Zhu; |
354 | Turning Large Language Models Into Cognitive Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Yet, at the same time, these models often show unhuman-like characteristics. In the present paper, we address this gap and ask whether large language models can be turned into cognitive models. |
Marcel Binz; Eric Schulz; |
355 | One-shot Empirical Privacy Estimation for Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: These shortcomings make deploying such techniques at scale difficult in practice, especially in federated settings where model training can take days or weeks. In this work, we present a novel “one-shot” approach that can systematically address these challenges, allowing efficient auditing or estimation of the privacy loss of a model during the same, single training run used to fit model parameters, and without requiring any a priori knowledge about the model architecture, task, or DP algorithm. |
Galen Andrew; Peter Kairouz; Sewoong Oh; Alina Oprea; Hugh Brendan McMahan; Vinith Menon Suriyakumar; |
356 | Towards Principled Representation Learning from Videos for Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study three commonly used approaches: autoencoding, temporal contrastive learning, and forward modeling. |
Dipendra Misra; Akanksha Saran; Tengyang Xie; Alex Lamb; John Langford; |
357 | Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Consistency Models (CM) (Song et al., 2023) accelerate score-based diffusion model sampling at the cost of sample quality but lack a natural way to trade-off quality for speed. To address this limitation, we propose Consistency Trajectory Model (CTM), a generalization encompassing CM and score-based models as special cases. |
Dongjun Kim; Chieh-Hsin Lai; Wei-Hsiang Liao; Naoki Murata; Yuhta Takida; Toshimitsu Uesaka; Yutong He; Yuki Mitsufuji; Stefano Ermon; |
358 | Online GNN Evaluation Under Test-time Graph Distribution Shifts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study a new research problem, online GNN evaluation, which aims to provide valuable insights into the well-trained GNNs’ ability to effectively generalize to real-world unlabeled graphs under test-time graph distribution shifts. |
Xin Zheng; Dongjin Song; Qingsong Wen; Bo Du; Shirui Pan; |
359 | Understanding Reconstruction Attacks with The Neural Tangent Kernel and Dataset Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we first build a stronger version of the dataset reconstruction attack and show how it can provably recover the \emph{entire training set} in the infinite width regime. We then empirically study the characteristics of this attack on two-layer networks and reveal that its success heavily depends on deviations from the frozen infinite-width Neural Tangent Kernel limit. |
Noel Loo; Ramin Hasani; Mathias Lechner; Alexander Amini; Daniela Rus; |
360 | SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a short-to-long video diffusion model, SEINE, that focuses on generative transition and prediction. |
Xinyuan Chen; Yaohui Wang; Lingjun Zhang; Shaobin Zhuang; Xin Ma; Jiashuo Yu; Yali Wang; Dahua Lin; Yu Qiao; Ziwei Liu; |
361 | Large Language Models to Enhance Bayesian Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While there has been substantial progress in BO methods, striking this balance remains a delicate process. In this light, we present \texttt{LLAMBO}, a novel approach that integrates the capabilities of Large Language Models (LLM) within BO. |
Tennison Liu; Nicolás Astorga; Nabeel Seedat; Mihaela van der Schaar; |
362 | Zipformer: A Faster and Better Encoder for Automatic Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we describe a faster, more memory-efficient, and better-performing transformer, called Zipformer. |
Zengwei Yao; Liyong Guo; Xiaoyu Yang; Wei Kang; Fangjun Kuang; Yifan Yang; Zengrui Jin; Long Lin; Daniel Povey; |
363 | InstaFlow: One Step Is Enough for High-Quality Diffusion-Based Text-to-Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel text-conditioned pipeline to turn Stable Diffusion (SD) into an ultra-fast one-step model, in which we find reflow plays a critical role in improving the assignment between noise and images. |
Xingchao Liu; Xiwen Zhang; Jianzhu Ma; Jian Peng; qiang liu; |
364 | Vision-Language Foundation Models As Effective Robot Imitators Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we derive a simple and novel vision-language manipulation framework, dubbed RoboFlamingo, built upon the open-source VLMs, OpenFlamingo. |
Xinghang Li; Minghuan Liu; Hanbo Zhang; Cunjun Yu; Jie Xu; Hongtao Wu; Chilam Cheang; Ya Jing; Weinan Zhang; Huaping Liu; Hang Li; Tao Kong; |
365 | Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose **Plan-Seq-Learn** (PSL): a modular approach that uses motion planning to bridge the gap between abstract language and learned low-level control for solving long-horizon robotics tasks from scratch. |
Murtaza Dalal; Tarun Chiruvolu; Devendra Singh Chaplot; Ruslan Salakhutdinov; |
366 | Backdoor Federated Learning By Poisoning Backdoor-Critical Layers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a general in-situ approach that identifies and verifies BC layers from the perspective of attackers. |
Haomin Zhuang; Mingxian Yu; Hao Wang; Yang Hua; Jian Li; Xu Yuan; |
367 | FedWon: Triumphing Multi-domain Federated Learning Without Normalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the multi-domain problem in FL, we propose a novel method called Federated Learning Without Normalizations (FedWon). |
Weiming Zhuang; Lingjuan Lyu; |
368 | GTA: A Geometry-Aware Attention Mechanism for Multi-View Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We argue that existing positional encoding schemes are suboptimal for 3D vision tasks, as they do not respect their underlying 3D geometric structure. Based on this hypothesis, we propose a geometry-aware attention mechanism that encodes the geometric structure of tokens as relative transformation determined by the geometric relationship between queries and key-value pairs. |
Takeru Miyato; Bernhard Jaeger; Max Welling; Andreas Geiger; |
369 | Can Large Language Models Infer Causation from Correlation? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose the first benchmark dataset to test the pure causal inference skills of large language models (LLMs). |
Zhijing Jin; Jiarui Liu; Zhiheng LYU; Spencer Poff; Mrinmaya Sachan; Rada Mihalcea; Mona T. Diab; Bernhard Schölkopf; |
370 | Copilot4D: Learning Unsupervised World Models for Autonomous Driving Via Discrete Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, we propose Copilot4D, a novel world modeling approach that first tokenizes sensor observations with VQVAE, then predicts the future via discrete diffusion. |
Lunjun Zhang; Yuwen Xiong; Ze Yang; Sergio Casas; Rui Hu; Raquel Urtasun; |
371 | ControlVideo: Training-free Controllable Text-to-video Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To avert the training burden, we propose a training-free ControlVideo to produce high-quality videos based on the provided text prompts and motion sequences. |
Yabo Zhang; Yuxiang Wei; Dongsheng Jiang; XIAOPENG ZHANG; Wangmeng Zuo; Qi Tian; |
372 | Adaptive Window Pruning for Efficient Local Motion Deblurring Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper aims to adaptively and efficiently restore high-resolution locally blurred images. |
Haoying Li; Jixin Zhao; Shangchen Zhou; Huajun Feng; Chongyi Li; Chen Change Loy; |
373 | KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by rising concerns around factual incorrectness and hallucinations of LLMs, we present KITAB, a new dataset for measuring constraint satisfaction abilities of language models. |
Marah I Abdin; Suriya Gunasekar; Varun Chandrasekaran; Jerry Li; Mert Yuksekgonul; Rahee Ghosh Peshawaria; Ranjita Naik; Besmira Nushi; |
374 | Provable Compositional Generalization for Object-Centric Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate when compositional generalization is guaranteed for object-centric representations through the lens of identifiability theory. |
Thaddäus Wiedemer; Jack Brady; Alexander Panfilov; Attila Juhos; Matthias Bethge; Wieland Brendel; |
375 | TopoMLP: A Simple Yet Strong Pipeline for Driving Topology Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we first present that the topology score relies heavily on detection performance on lane and traffic elements. Therefore, we introduce a powerful 3D lane detector and an improved 2D traffic element detector to extend the upper limit of topology performance. |
Dongming Wu; Jiahao Chang; Fan Jia; Yingfei Liu; Tiancai Wang; Jianbing Shen; |
376 | Language Model Decoding As Direct Metrics Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we frame decoding from a language model as an optimization problem with the goal of strictly matching the expected performance with human texts measured by multiple metrics of desired aspects simultaneously. |
Haozhe Ji; Pei Ke; Hongning Wang; Minlie Huang; |
377 | Expected Flow Networks in Stochastic Environments and Two-player Zero-sum Games Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose expected flow networks (EFlowNets), which extend GFlowNets to stochastic environments. |
Marco Jiralerspong; Bilun Sun; Danilo Vucetic; Tianyu Zhang; Yoshua Bengio; Gauthier Gidel; Nikolay Malkin; |
378 | Parameter-Efficient Orthogonal Finetuning Via Butterfly Factorization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study a principled finetuning paradigm — Orthogonal Finetuning (OFT) — for downstream task adaptation. |
Weiyang Liu; Zeju Qiu; Yao Feng; Yuliang Xiu; Yuxuan Xue; Longhui Yu; Haiwen Feng; Zhen Liu; Juyeon Heo; Songyou Peng; Yandong Wen; Michael J. Black; Adrian Weller; Bernhard Schölkopf; |
379 | Looped Transformers Are Better at Learning Learning Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the absence of an inherent iterative structure in the transformer architecture presents a challenge in emulating the iterative algorithms, which are commonly employed in traditional machine learning methods. To address this, we propose the utilization of looped transformer architecture and its associated training methodology, with the aim of incorporating iterative characteristics into the transformer architectures. |
Liu Yang; Kangwook Lee; Robert D Nowak; Dimitris Papailiopoulos; |
380 | Revisiting Link Prediction: A Data Perspective Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we endeavor to explore principles of link prediction across diverse datasets from a data-centric perspective. |
Haitao Mao; Juanhui Li; Harry Shomer; Bingheng Li; Wenqi Fan; Yao Ma; Tong Zhao; Neil Shah; Jiliang Tang; |
381 | Overthinking The Truth: Understanding How Language Models Process False Demonstrations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study harmful imitation through the lens of a model’s internal representations, and identify two related phenomena: overthinking and false induction heads. |
Danny Halawi; Jean-Stanislas Denain; Jacob Steinhardt; |
382 | Unveiling The Pitfalls of Knowledge Editing for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper pioneers the investigation into the potential pitfalls associated with knowledge editing for LLMs. To achieve this, we introduce new benchmark datasets and propose innovative evaluation metrics. |
Zhoubo Li; Ningyu Zhang; Yunzhi Yao; Mengru Wang; Xi Chen; Huajun Chen; |
383 | ImageNet-OOD: Deciphering Modern Out-of-Distribution Detection Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To aid our investigations, we present ImageNet-OOD, a clean semantic shift dataset that minimizes the interference of covariate shift. |
William Yang; Byron Zhang; Olga Russakovsky; |
384 | Think Before You Speak: Training Language Models With Pause Tokens Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We empirically evaluate $\textit{pause-training}$ on decoder-only models of 1B and 130M parameters with causal pretraining on C4, and on downstream tasks covering reasoning, question-answering, general understanding and fact recall. |
Sachin Goyal; Ziwei Ji; Ankit Singh Rawat; Aditya Krishna Menon; Sanjiv Kumar; Vaishnavh Nagarajan; |
385 | Language Model Detectors Are Easily Optimized Against Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we demonstrate a data-efficient attack that fine-tunes language models to confuse existing detectors, leveraging recent developments in reinforcement learning of language models. |
Charlotte Nicks; Eric Mitchell; Rafael Rafailov; Archit Sharma; Christopher D Manning; Chelsea Finn; Stefano Ermon; |
386 | A Sublinear Adversarial Training Algorithm Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we analyze the convergence guarantee of the adversarial training procedure on a two-layer neural network with shifted ReLU activation, and show that only $o(m)$ neurons will be activated for each input per iteration. |
Yeqi Gao; Lianke Qin; Zhao Song; Yitan Wang; |
387 | RLIF: Interactive Imitation Learning As Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning. |
Jianlan Luo; Perry Dong; Yuexiang Zhai; Yi Ma; Sergey Levine; |
388 | Beyond Weisfeiler-Lehman: A Quantitative Framework for GNN Expressiveness Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, such an expressivity measure has notable limitations: it is inherently coarse, qualitative, and may not well reflect practical requirements (e.g., the ability to encode substructures). In this paper, we introduce a novel framework for quantitatively studying the expressiveness of GNN architectures, addressing all the above limitations. |
Bohang Zhang; Jingchu Gai; Yiheng Du; Qiwei Ye; Di He; Liwei Wang; |
389 | FedImpro: Measuring and Improving Client Update in Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present an alternative perspective on client drift and aim to mitigate it by generating improved local models. |
Zhenheng Tang; Yonggang Zhang; Shaohuai Shi; Xinmei Tian; Tongliang Liu; Bo Han; Xiaowen Chu; |
390 | DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genomes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We demonstrate that BPE not only overcomes the limitations of k-mer tokenization but also benefits from the computational efficiency of non-overlapping tokenization. Based on these insights, we introduce DNABERT-2, a refined genome foundation model that adapts an efficient tokenizer and employs multiple strategies to overcome input length constraints, reduce time and memory expenditure, and enhance model capability. |
Zhihan Zhou; Yanrong Ji; Weijian Li; Pratik Dutta; Ramana V Davuluri; Han Liu; |
391 | Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the challenges inherent in this task, we propose a GPT-based model, Duolando, which autoregressively predicts the subsequent tokenized motion conditioned on the coordinated information of the music, the leader’s and the follower’s movements. To support this task, we first build a large-scale and diverse duet interactive dance dataset, DD100, by recording about 117 minutes of professional dancers’ performances. |
Li Siyao; Tianpei Gu; Zhitao Yang; Zhengyu Lin; Ziwei Liu; Henghui Ding; Lei Yang; Chen Change Loy; |
392 | SOHES: Self-supervised Open-world Hierarchical Entity Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work presents Self-supervised Open-world Hierarchical Entity Segmentation (SOHES), a novel approach that eliminates the need for human annotations. |
Shengcao Cao; Jiuxiang Gu; Jason Kuen; Hao Tan; Ruiyi Zhang; Handong Zhao; Ani Nenkova; Liangyan Gui; Tong Sun; Yu-Xiong Wang; |
393 | Interpreting CLIP’s Image Representation Via Text-Based Decomposition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We investigate the CLIP image encoder by analyzing how individual model components affect the final representation. |
Yossi Gandelsman; Alexei A Efros; Jacob Steinhardt; |
394 | 3D Reconstruction with Generalizable Neural Fields Using Scene Priors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, we introduce training generalizable Neural Fields incorporating scene Priors (NFPs). |
Yang Fu; Shalini De Mello; Xueting Li; Amey Kulkarni; Jan Kautz; Xiaolong Wang; Sifei Liu; |
395 | Bridging State and History Representations: Understanding Self-Predictive RL Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the relationships between these methods and the shared properties among them remain unclear. In this paper, we show that many of these seemingly distinct methods and frameworks for state and history abstractions are, in fact, based on a common idea of self-predictive abstraction. |
Tianwei Ni; Benjamin Eysenbach; Erfan SeyedSalehi; Michel Ma; Clement Gehring; Aditya Mahajan; Pierre-Luc Bacon; |
396 | AUGCAL: Improving Sim2Real Adaptation By Uncertainty Calibration on Augmented Synthetic Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce AUGCAL, a simple training-time patch for unsupervised adaptation that improves Sim2Real adapted models by – (1) reducing overall miscalibration, (2) reducing overconfidence in incorrect predictions and (3) improving confidence score reliability by better guiding misclassification detection – all while retaining or improving Sim2Real performance. |
Prithvijit Chattopadhyay; Bharat Goyal; Boglarka Ecsedi; Viraj Uday Prabhu; Judy Hoffman; |
397 | Denoising Diffusion Bridge Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As such, diffusion models must rely on cumbersome methods like guidance or projected sampling to incorporate this information in the generative process. In our work, we propose Denoising Diffusion Bridge Models (DDBMs), a natural alternative to this paradigm based on *diffusion bridges*, a family of processes that interpolate between two paired distributions given as endpoints. |
Linqi Zhou; Aaron Lou; Samar Khanna; Stefano Ermon; |
398 | The Consensus Game: Language Model Generation Via Equilibrium Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new, training-free, game-theoretic procedure for language model decoding. |
Athul Paul Jacob; Yikang Shen; Gabriele Farina; Jacob Andreas; |
399 | Modeling Boundedly Rational Agents with Latent Inference Budgets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In standard models of bounded rationality, sub-optimal decision-making is simulated by adding homoscedastic noise to optimal decisions rather than actually simulating constrained inference. In this work, we introduce a latent inference budget model (L-IBM) that models these constraints explicitly, via a latent variable (inferred jointly with a model of agents’ goals) that controls the runtime of an iterative inference algorithm. |
Athul Paul Jacob; Abhishek Gupta; Jacob Andreas; |
400 | CALICO: Self-Supervised Camera-LiDAR Contrastive Pre-training for BEV Perception Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we introduce CALICO, a novel framework that applies contrastive objectives to both LiDAR and camera backbones. |
Jiachen Sun; Haizhong Zheng; Qingzhao Zhang; Atul Prakash; Zhuoqing Mao; Chaowei Xiao; |
401 | AutomaTikZ: Text-Guided Synthesis of Scientific Vector Graphics with TikZ Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Given that vector graphics are typically encoded using low-level graphics primitives, generating them directly is difficult. To address this, we propose the use of TikZ, a well-known abstract graphics language that can be compiled to vector graphics, as an intermediate representation of scientific figures. |
Jonas Belouadi; Anne Lauscher; Steffen Eger; |
402 | BEND: Benchmarking DNA Language Models on Biologically Meaningful Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we introduce **BEND**, a **BEN**chmark for **D**NA language models, featuring a collection of realistic and biologically meaningful downstream tasks defined on the human genome. |
Frederikke Isa Marin; Felix Teufel; Marc Horlacher; Dennis Madsen; Dennis Pultz; Ole Winther; Wouter Boomsma; |
403 | Scalable Diffusion for Materials Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For instance, common metrics such as the reconstruction error do not correlate well with the downstream goal of discovering novel stable materials. In this work, we tackle the scalability challenge by developing a unified crystal representation that can represent any crystal structure (UniMat), followed by training a diffusion probabilistic model on these UniMat representations. |
Sherry Yang; KwangHwan Cho; Amil Merchant; Pieter Abbeel; Dale Schuurmans; Igor Mordatch; Ekin Dogus Cubuk; |
404 | Inducing High Energy-Latency of Large Vision-Language Models with Verbose Images Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Once attackers maliciously induce high energy consumption and latency time (energy-latency cost) during inference of VLMs, it will exhaust computational resources. In this paper, we explore this attack surface concerning the availability of VLMs and aim to induce high energy-latency cost during VLM inference. |
Kuofeng Gao; Yang Bai; Jindong Gu; Shu-Tao Xia; Philip Torr; Zhifeng Li; Wei Liu; |
405 | Ground-A-Video: Zero-shot Grounded Video Editing Using Text-to-image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces a novel grounding-guided video-to-video translation framework called Ground-A-Video for multi-attribute video editing. |
Hyeonho Jeong; Jong Chul Ye; |
406 | LUT-GEMM: Quantized Matrix Multiplication Based on LUTs for Efficient Inference in Large-Scale Generative Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce LUT-GEMM, an efficient kernel for quantized matrix multiplication, which not only eliminates the resource-intensive dequantization process but also reduces computational costs compared to previous kernels for weight-only quantization. |
Gunho Park; Baeseong park; Minsub Kim; Sungjae Lee; Jeonghoon Kim; Beomseok Kwon; Se Jung Kwon; Byeongwook Kim; Youngjoo Lee; Dongsoo Lee; |
407 | Curiosity-driven Red-teaming for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, current RL methods are only able to generate a small number of effective test cases resulting in a low coverage of the span of prompts that elicit undesirable responses from the target LLM. To overcome this limitation, we draw a connection between the problem of increasing the coverage of generated test cases and the well-studied approach of curiosity-driven exploration that optimizes for novelty. |
Zhang-Wei Hong; Idan Shenfeld; Tsun-Hsuan Wang; Yung-Sung Chuang; Aldo Pareja; James R. Glass; Akash Srivastava; Pulkit Agrawal; |
408 | Hypothesis Search: Inductive Reasoning with Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose to improve the inductive reasoning ability of LLMs by generating explicit hypotheses at multiple levels of abstraction: we prompt the LLM to propose multiple abstract hypotheses about the problem, in natural language, then implement the natural language hypotheses as concrete Python programs. |
Ruocheng Wang; Eric Zelikman; Gabriel Poesia; Yewen Pu; Nick Haber; Noah Goodman; |
409 | Path Choice Matters for Clear Attributions in Path Methods Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the ambiguity, we introduce the Concentration Principle, which centrally allocates high attributions to indispensable features, thereby endowing the attributions with aesthetics and sparsity. |
Borui Zhang; Wenzhao Zheng; Jie Zhou; Jiwen Lu; |
410 | Masked Audio Generation Using A Single Non-Autoregressive Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce MAGNeT, a masked generative sequence modeling method that operates directly over several streams of audio tokens. |
Alon Ziv; Itai Gat; Gael Le Lan; Tal Remez; Felix Kreuk; Jade Copet; Alexandre Défossez; Gabriel Synnaeve; Yossi Adi; |
411 | Diffusion Generative Flow Samplers: Improving Learning Signals Through Partial Trajectory Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present Diffusion Generative Flow Samplers (DGFS), a sampling-based framework where the learning process can be tractably broken down into short partial trajectory segments, via parameterizing an additional “flow function”. |
Dinghuai Zhang; Ricky T. Q. Chen; Cheng-Hao Liu; Aaron Courville; Yoshua Bengio; |
412 | Gen-Z: Generative Zero-Shot Text Classification with Contextualized Label Descriptions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Language model (LM) prompting—a popular paradigm for solving NLP tasks—has been shown to be susceptible to miscalibration and brittleness to slight prompt variations, caused by its discriminative prompting approach, i.e., predicting the label given the input. To address these issues, we propose Gen-Z—a generative prompting framework for zero-shot text classification. |
Sachin Kumar; Chan Young Park; Yulia Tsvetkov; |
413 | Efficient Subgraph GNNs By Learning Effective Selection Policies Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we consider the problem of learning to select a small subset of the large set of possible subgraphs in a data-driven fashion. |
Beatrice Bevilacqua; Moshe Eliasof; Eli Meirom; Bruno Ribeiro; Haggai Maron; |
414 | Does Writing with Language Models Reduce Content Diversity? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we measure the impact of co-writing on diversity via a controlled experiment, where users write argumentative essays in three setups—using a base LLM (GPT3), a feedback-tuned LLM (InstructGPT), and writing without model help. We develop a set of diversity metrics and find that writing with InstructGPT (but not GPT3) results in a statistically significant reduction in diversity. |
Vishakh Padmakumar; He He; |
415 | CellPLM: Pre-training of Cell Language Model Beyond Single Cells Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, three fundamental differences between single-cell data and natural language data are overlooked: (1) scRNA-seq data are presented as bag-of-genes instead of sequences of RNAs; (2) Cell-cell relations are more intricate and important than inter-sentence relations; and (3) The quantity of single-cell data is considerably smaller than that of text data, and the data are very noisy. In light of these characteristics, we propose a new pre-trained model, $\textit{CellPLM}$, which takes cells as tokens and tissues as sentences. |
Hongzhi Wen; Wenzhuo Tang; Xinnan Dai; Jiayuan Ding; Wei Jin; Yuying Xie; Jiliang Tang; |
416 | MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Therefore, in this paper, we introduce MetaTool, a benchmark designed to evaluate whether LLMs have tool usage awareness and can correctly choose tools. Specifically, we create a dataset called ToolE within the benchmark. |
Yue Huang; Jiawen Shi; Yuan Li; Chenrui Fan; Siyuan Wu; Qihui Zhang; Yixin Liu; Pan Zhou; Yao Wan; Neil Zhenqiang Gong; Lichao Sun; |
417 | Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a general framework named Progressive3D, which decomposes the entire generation into a series of locally progressive editing steps to create precise 3D content for complex prompts, and we constrain the content change to only occur in regions determined by user-defined region prompts in each editing step. |
Xinhua Cheng; Tianyu Yang; Jianan Wang; Yu Li; Lei Zhang; Jian Zhang; Li Yuan; |
418 | De Novo Protein Design Using Geometric Vector Field Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce the Vector Field Network (VFN), which enables network layers to perform learnable vector computations between coordinates of frame-anchored virtual atoms, thus achieving a higher capability for modeling frames. |
Weian Mao; Muzhi Zhu; Zheng Sun; Shuaike Shen; Lin Yuanbo Wu; Hao Chen; Chunhua Shen; |
419 | RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, current benchmarks mainly focus on single-file tasks, leaving an assessment gap for more complex, real-world, multi-file programming scenarios. To fill this gap, we introduce RepoBench, a new benchmark specifically designed for evaluating repository-level code auto-completion systems. |
Tianyang Liu; Canwen Xu; Julian McAuley; |
420 | Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present Hierarchical cOntext MERging (HOMER), a new training-free scheme designed to overcome the limitations. |
Woomin Song; Seunghyuk Oh; Sangwoo Mo; Jaehyung Kim; Sukmin Yun; Jung-Woo Ha; Jinwoo Shin; |
421 | QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a quantization-aware low-rank adaptation (QA-LoRA) algorithm. |
Yuhui Xu; Lingxi Xie; Xiaotao Gu; Xin Chen; Heng Chang; Hengheng Zhang; Zhengsu Chen; XIAOPENG ZHANG; Qi Tian; |
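For context on the low-rank adaptation half of the QA-LoRA name, the sketch below shows a minimal, generic LoRA-style layer. It is background only, written in our own notation (the class name and hyperparameters are hypothetical), and it deliberately omits the quantization-aware grouping that is the paper's actual contribution.

```python
# Generic LoRA-style low-rank update (background sketch; QA-LoRA's
# quantization-aware details are not reproduced here).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)            # frozen pretrained weight
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x W^T + scale * x A^T B^T : only the low-rank factors A and B are trained
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

if __name__ == "__main__":
    layer = LoRALinear(64, 64)
    print(layer(torch.randn(2, 64)).shape)  # torch.Size([2, 64])
```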
422 | Time-LLM: Time Series Forecasting By Reprogramming Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present Time-LLM, a reprogramming framework to repurpose LLMs for general time series forecasting with the backbone language models kept intact. |
Ming Jin; Shiyu Wang; Lintao Ma; Zhixuan Chu; James Y. Zhang; Xiaoming Shi; Pin-Yu Chen; Yuxuan Liang; Yuan-Fang Li; Shirui Pan; Qingsong Wen; |
423 | DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Alternative methods, like sending data to the model’s provider for training, intensify these privacy issues facing an untrusted provider. In this paper, we present a novel solution called Differentially-Private Offsite Prompt Tuning (DP-OPT) to address this challenge. |
Junyuan Hong; Jiachen T. Wang; Chenhui Zhang; Zhangheng LI; Bo Li; Zhangyang Wang; |
424 | Teaching Language Models to Hallucinate Less with Synthetic Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show that reducing hallucination on a _synthetic task_ can also reduce hallucination on real-world downstream tasks. |
Erik Jones; Hamid Palangi; Clarisse Simões Ribeiro; Varun Chandrasekaran; Subhabrata Mukherjee; Arindam Mitra; Ahmed Hassan Awadallah; Ece Kamar; |
425 | Flow Matching on General Geometries Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Riemannian Flow Matching (RFM), a simple yet powerful framework for training continuous normalizing flows on manifolds. |
Ricky T. Q. Chen; Yaron Lipman; |
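As background for entry 425 above, the standard (conditional) flow matching objective that RFM generalizes from Euclidean space to manifolds can be written as follows. The notation is ours, not the paper's; the Riemannian variant replaces the Euclidean norm with the norm induced by the manifold's metric at $x$.

```latex
% Background, in our notation: the conditional flow matching objective.
% A learned vector field v_theta is regressed onto a conditional target field
% u_t(. | x_1) whose flow transports a simple prior to the data point x_1.
\mathcal{L}_{\mathrm{CFM}}(\theta)
  = \mathbb{E}_{\, t \sim \mathcal{U}[0,1],\; x_1 \sim q,\; x \sim p_t(\cdot \mid x_1)}
    \left\| v_\theta(t, x) - u_t(x \mid x_1) \right\|^2 .
```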
426 | How to Catch An AI Liar: Lie Detection in Black-Box LLMs By Asking Unrelated Questions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we develop a simple lie detector that requires neither access to the LLM’s activations (black-box) nor ground-truth knowledge of the fact in question. |
Lorenzo Pacchiardi; Alex James Chan; Sören Mindermann; Ilan Moscovitz; Alexa Yue Pan; Yarin Gal; Owain Evans; Jan M. Brauner; |
427 | SLiMe: Segment Like Me Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by these advancements, we explore leveraging these vision-language models for segmenting images at any desired granularity using as few as one annotated sample. We propose SLiMe, which frames this problem as an optimization task. |
Aliasghar Khani; Saeid Asgari; Aditya Sanghi; Ali Mahdavi Amiri; Ghassan Hamarneh; |
428 | Linear Attention Is (maybe) All You Need (to Understand Transformer Optimization) Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Consequently, the results obtained in this paper suggest that a simple linearized Transformer model could actually be a valuable, realistic abstraction for understanding Transformer optimization. |
Kwangjun Ahn; Xiang Cheng; Minhak Song; Chulhee Yun; Ali Jadbabaie; Suvrit Sra; |
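To make the abstraction in entry 428 concrete, the toy PyTorch snippet below contrasts a single softmax attention head with its linearized (softmax-free) counterpart. The single-head, unmasked setting, shapes, and scaling are our own simplifying assumptions, not the authors' experimental setup.

```python
# Illustrative toy: attention with and without the softmax nonlinearity,
# to show what a "linearized Transformer" abstraction refers to.
import torch

def softmax_attention(q, k, v):
    # q, k, v: (seq_len, d)
    scores = q @ k.T / q.shape[-1] ** 0.5    # scaled pairwise similarities
    weights = torch.softmax(scores, dim=-1)  # row-normalized attention weights
    return weights @ v

def linear_attention(q, k, v):
    # Same bilinear similarity, but no softmax: the output is linear in v and
    # bilinear in (q, k), which makes the optimization dynamics easier to analyze.
    scores = q @ k.T / q.shape[-1] ** 0.5
    return scores @ v

if __name__ == "__main__":
    torch.manual_seed(0)
    q, k, v = (torch.randn(8, 16) for _ in range(3))
    print(softmax_attention(q, k, v).shape)  # torch.Size([8, 16])
    print(linear_attention(q, k, v).shape)   # torch.Size([8, 16])
```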
429 | LpNTK: Better Generalisation with Less Data Via Sample Interaction During Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we start from approximating the interaction between samples, i.e. how learning one sample would modify the model’s prediction on other samples. |
Shangmin Guo; Yi Ren; Stefano V Albrecht; Kenny Smith; |
430 | How to Capture Higher-order Correlations? Generalizing Matrix Softmax Attention to Kronecker Computation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study a generalization of attention which captures triple-wise correlations. |
Josh Alman; Zhao Song; |
431 | WildChat: 1M ChatGPT Interaction Logs in The Wild Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite their widespread use, there remains a lack of public datasets showcasing how these tools are used by a population of users in practice. To bridge this gap, we offered free access to ChatGPT for online users in exchange for their affirmative, consensual opt-in to anonymously collect their chat transcripts and request headers. |
Wenting Zhao; Xiang Ren; Jack Hessel; Claire Cardie; Yejin Choi; Yuntian Deng; |
432 | SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present SOTOPIA, an open-ended environment to simulate complex social interactions between artificial agents and evaluate their social intelligence. |
Xuhui Zhou; Hao Zhu; Leena Mathur; Ruohong Zhang; Haofei Yu; Zhengyang Qi; Louis-Philippe Morency; Yonatan Bisk; Daniel Fried; Graham Neubig; Maarten Sap; |
433 | How Do Language Models Bind Entities in Context? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show, via causal experiments, that LMs’ internal activations represent binding information by exhibiting appropriate binding ID vectors at the entity and attribute positions. |
Jiahai Feng; Jacob Steinhardt; |
434 | Cameras As Rays: Pose Estimation Via Ray Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast to existing approaches that pursue top-down prediction of global parametrizations of camera extrinsics, we propose a distributed representation of camera pose that treats a camera as a bundle of rays. |
Jason Y. Zhang; Amy Lin; Moneish Kumar; Tzu-Hsuan Yang; Deva Ramanan; Shubham Tulsiani; |
435 | RingAttention with Blockwise Transformers for Near-Infinite Context Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel approach, Blockwise RingAttention, which leverages blockwise computation of self-attention and feedforward to distribute long sequences across multiple devices while fully overlapping the communication of key-value blocks with the computation of blockwise attention. |
Hao Liu; Matei Zaharia; Pieter Abbeel; |
436 | Chain of Hindsight Aligns Language Models with Feedback Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Nonetheless, these methods are either founded on hand-picked model generations that are favored by human annotators, rendering them inefficient in terms of data utilization and challenging to apply in general, or they depend on reinforcement learning, which often suffers from imperfect reward functions and relies on extremely challenging optimizations. In this work, we propose a novel technique, Chain of Hindsight, that is easy to optimize and can learn from any form of feedback, regardless of its polarity. |
Hao Liu; Carmelo Sferrazza; Pieter Abbeel; |
437 | Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work aims at decreasing the end-to-end generation latency of large language models (LLMs). |
Xuefei Ning; Zinan Lin; Zixuan Zhou; Zifu Wang; Huazhong Yang; Yu Wang; |
438 | Magnushammer: A Transformer-Based Approach to Premise Selection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a novel approach to premise selection, a crucial reasoning task in automated theorem proving. |
Maciej Mikuła; Szymon Tworkowski; Szymon Antoniak; Bartosz Piotrowski; Albert Q. Jiang; Jin Peng Zhou; Christian Szegedy; Łukasz Kuciński; Piotr Miłoś; Yuhuai Wu; |
439 | Detecting, Explaining, and Mitigating Memorization in Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a straightforward yet effective method for detecting memorized prompts by inspecting the magnitude of text-conditional predictions. |
Yuxin Wen; Yuchen Liu; Chen Chen; Lingjuan Lyu; |
440 | InfoBatch: Lossless Training Speed Up By Unbiased Dynamic Data Pruning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This could lead to gradient expectation bias compared to the original data. To solve this problem, we propose InfoBatch, a novel framework aiming to achieve lossless training acceleration by unbiased dynamic data pruning. |
Ziheng Qin; Kai Wang; Zangwei Zheng; Jianyang Gu; Xiangyu Peng; xu Zhao Pan; Daquan Zhou; Lei Shang; Baigui Sun; Xuansong Xie; Yang You; |
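The "unbiased" claim in the InfoBatch highlight rests on a standard importance-weighting argument: if samples are dropped with probability $p$, rescaling the surviving losses by $1/(1-p)$ preserves the expected loss (and hence the expected gradient). The snippet below is our minimal illustration of that principle only, not InfoBatch's actual loss-aware pruning policy or annealing schedule.

```python
# Minimal sketch of unbiased pruning via inverse-probability loss rescaling
# (an illustration of the principle, not the InfoBatch algorithm itself).
import torch

def pruned_batch_loss(losses: torch.Tensor, prune_prob: float) -> torch.Tensor:
    # losses: per-sample losses, shape (batch,)
    keep = torch.rand_like(losses) > prune_prob    # drop each sample with prob. prune_prob
    scale = 1.0 / (1.0 - prune_prob)               # inverse keep-probability weight
    return (losses * keep.float() * scale).mean()  # unbiased estimator of losses.mean()

if __name__ == "__main__":
    torch.manual_seed(0)
    losses = torch.rand(100_000)
    print(float(losses.mean()), float(pruned_batch_loss(losses, prune_prob=0.5)))
    # The two numbers agree closely; in expectation they are identical.
```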
441 | Generalization in Diffusion Models Arises from Geometry-adaptive Harmonic Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we show that two DNNs trained on non-overlapping subsets of a dataset learn nearly the same score function, and thus the same density, when the number of training images is large enough. |
Zahra Kadkhodaie; Florentin Guth; Eero P Simoncelli; Stéphane Mallat; |
442 | Model Merging By Uncertainty-Based Gradient Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we connect the inaccuracy of weighted-averaging to mismatches in the gradients and propose a new uncertainty-based scheme to improve the performance by reducing the mismatch. |
Nico Daheim; Thomas Möllenhoff; Edoardo Ponti; Iryna Gurevych; Mohammad Emtiyaz Khan; |
443 | Beyond Task Performance: Evaluating and Reducing The Flaws of Large Multimodal Models with In-context-learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Indeed, task performances (e.g., VQA accuracy) alone do not provide enough clues to understand their real capabilities, limitations, and to what extent such models are aligned with human expectations. To refine our understanding of those flaws, we deviate from the current evaluation paradigm, and (1) evaluate 10 recent open-source LMMs from 3B up to 80B parameter scale, on 5 different axes: hallucinations, abstention, compositionality, explainability and instruction following. Our evaluation on these axes reveals major flaws in LMMs. While the current go-to solution to align these models is based on training, such as instruction tuning or RLHF, we instead (2) explore training-free in-context learning (ICL) as a solution, and study how it affects these limitations. |
Mustafa Shukor; Alexandre Rame; Corentin Dancette; Matthieu Cord; |
444 | Large Language Model Cascades with Mixture of Thought Representations for Cost-Efficient Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we are motivated to study building an LLM cascade to save the cost of using LLMs, particularly for performing (e.g., mathematical, causal) reasoning tasks. |
Murong Yue; Jie Zhao; Min Zhang; Liang Du; Ziyu Yao; |
445 | Seer: Language Instructed Video Prediction with Latent Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Therefore, text-conditioned video prediction (TVP) is an essential task to facilitate general robot policy learning. To tackle this task and empower robots with the ability to foresee the future, we propose a sample and computation-efficient model, named Seer, by inflating the pretrained text-to-image (T2I) stable diffusion models along the temporal axis. |
Xianfan Gu; Chuan Wen; Weirui Ye; Jiaming Song; Yang Gao; |
446 | The Effective Horizon Explains Deep RL Performance in Stochastic Environments Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a new RL algorithm, SQIRL, that iteratively learns a near-optimal policy by exploring randomly to collect rollouts and then performing a limited number of steps of fitted-Q iteration over those rollouts. |
Cassidy Laidlaw; Banghua Zhu; Stuart Russell; Anca Dragan; |
447 | The LLM Surgeon Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We provide a general framework for unstructured, semi-structured and structured pruning and improve upon weight updates to capture more correlations between weights, while remaining computationally efficient. |
Tycho F. A. van der Ouderaa; Markus Nagel; Mart Van Baalen; Tijmen Blankevoort; |
448 | MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The recently released GPT-4 Code Interpreter has demonstrated remarkable proficiency in solving challenging math problems, primarily attributed to its ability to seamlessly reason with natural language, generate code, execute code, and continue reasoning based on the execution output. In this paper, we present a method to fine-tune open-source language models, enabling them to use code for modeling and deriving math equations and, consequently, enhancing their mathematical reasoning abilities. |
Ke Wang; Houxing Ren; Aojun Zhou; Zimu Lu; Sichun Luo; Weikang Shi; Renrui Zhang; Linqi Song; Mingjie Zhan; Hongsheng Li; |
449 | Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, their proficiency within specialized domains such as biomolecular studies remains limited. To address this challenge, we introduce Mol-Instructions, a comprehensive instruction dataset designed for the biomolecular domain. |
Yin Fang; Xiaozhuan Liang; Ningyu Zhang; Kangwei Liu; Rui Huang; Zhuo Chen; Xiaohui Fan; Huajun Chen; |
450 | Domain-Agnostic Molecular Generation with Chemical Feedback Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, despite the potential of language models in molecule generation, they face challenges such as generating syntactically or chemically flawed molecules, having narrow domain focus, and struggling to create diverse and feasible molecules due to limited annotated data or external molecular databases. To tackle these challenges, we introduce MolGen, a pre-trained molecular language model tailored specifically for molecule generation. |
Yin Fang; Ningyu Zhang; Zhuo Chen; Lingbing Guo; Xiaohui Fan; Huajun Chen; |
451 | Deep Neural Networks Tend To Extrapolate Predictably Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, we find that this value often closely approximates the optimal constant solution (OCS), i.e., the prediction that minimizes the average loss over the training data without observing the input. We present results showing this phenomenon across 8 datasets with different distributional shifts (including CIFAR10-C and ImageNet-R, S), different loss functions (cross entropy, MSE, and Gaussian NLL), and different architectures (CNNs and transformers). |
Katie Kang; Amrith Setlur; Claire Tomlin; Sergey Levine; |
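The highlight for entry 451 defines the optimal constant solution (OCS) as the constant prediction that minimizes average training loss with no input. As a small worked example on synthetic data (our illustration, not the authors' code): under cross-entropy the OCS is the empirical label distribution, and under MSE it is the mean target. The snippet below computes these closed forms on toy data.

```python
# Tiny worked example of the optimal constant solution (OCS): the constant
# prediction minimizing average training loss, per the definition above.
import numpy as np

rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=10_000)                 # 3-class toy labels
targets = rng.normal(loc=2.0, scale=1.0, size=10_000)    # toy regression targets

# Cross-entropy: the OCS is the empirical class distribution.
ocs_ce = np.bincount(labels, minlength=3) / labels.size

# MSE: the OCS is the mean target.
ocs_mse = targets.mean()

print("OCS under cross-entropy:", ocs_ce)  # roughly [1/3, 1/3, 1/3]
print("OCS under MSE:", ocs_mse)           # roughly 2.0
```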
452 | An Emulator for Fine-tuning Large Language Models Using Small Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While it has been hypothesized that knowledge and skills come from pre-training, and fine-tuning mostly filters this knowledge and skillset, this intuition has not been extensively tested. To aid in doing so, we introduce a novel technique for decoupling the knowledge and skills gained in these two stages, enabling a direct answer to the question, *What would happen if we combined the knowledge learned by a large model during pre-training with the knowledge learned by a small model during fine-tuning (or vice versa)?* |
Eric Mitchell; Rafael Rafailov; Archit Sharma; Chelsea Finn; Christopher D Manning; |
453 | AGILE3D: Attention Guided Interactive Multi-object 3D Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce AGILE3D, an efficient, attention-based model that (1) supports simultaneous segmentation of multiple 3D objects, (2) yields more accurate segmentation masks with fewer user clicks, and (3) offers faster inference. |
Yuanwen Yue; Sabarinath Mahadevan; Jonas Schult; Francis Engelmann; Bastian Leibe; Konrad Schindler; Theodora Kontogianni; |
454 | $\infty$-Diff: Infinite Resolution Diffusion with Subsampled Mollified States Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces $\infty$-Diff, a generative diffusion model defined in an infinite-dimensional Hilbert space, which can model infinite resolution data. |
Sam Bond-Taylor; Chris G. Willcocks; |
455 | Rethinking Model Ensemble in Transfer-based Adversarial Attacks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we rethink the ensemble in adversarial attacks and define the common weakness of model ensemble with two properties: 1) the flatness of loss landscape; and 2) the closeness to the local optimum of each model. |
Huanran Chen; Yichi Zhang; Yinpeng Dong; Xiao Yang; Hang Su; Jun Zhu; |
456 | Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present Matcher, a novel perception paradigm that utilizes off-the-shelf vision foundation models to address various perception tasks. |
Yang Liu; Muzhi Zhu; Hengtao Li; Hao Chen; Xinlong Wang; Chunhua Shen; |
457 | Decomposed Diffusion Sampler for Accelerating Large-Scale Inverse Problems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we propose a novel and efficient diffusion sampling strategy that synergistically combines the diffusion sampling and Krylov subspace methods. |
Hyungjin Chung; Suhyeon Lee; Jong Chul Ye; |
458 | Learning Stackable and Skippable LEGO Bricks for Efficient, Reconfigurable, and Variable-Resolution Diffusion Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This study introduces LEGO bricks, which seamlessly integrate Local-feature Enrichment and Global-content Orchestration. |
Huangjie Zheng; Zhendong Wang; Jianbo Yuan; Guanghan Ning; Pengcheng He; Quanzeng You; Hongxia Yang; Mingyuan Zhou; |
459 | On The Markov Property of Neural Algorithmic Reasoning: Analyses and Methods Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A common paradigm in existing designs involves the use of historical embeddings in predicting the results of future execution steps. Our observation in this work is that such historical dependence intrinsically contradicts the Markov nature of algorithmic reasoning tasks. |
Montgomery Bohde; Meng Liu; Alexandra Saxton; Shuiwang Ji; |
460 | The Mechanistic Basis of Data Dependence and Abrupt Learning in An In-context Classification Task Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose that the sharp transitions in attention-based networks arise due to a specific chain of multi-layer operations necessary to achieve ICL, which is implemented by nested nonlinearities sequentially learned during training. |
Gautam Reddy; |
461 | Manifold Preserving Guided Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Manifold Preserving Guided Diffusion (MPGD), a training-free conditional generation framework that leverages pretrained diffusion models and off-the-shelf neural networks with minimal additional inference cost for a broad range of tasks. |
Yutong He; Naoki Murata; Chieh-Hsin Lai; Yuhta Takida; Toshimitsu Uesaka; Dongjun Kim; Wei-Hsiang Liao; Yuki Mitsufuji; J Zico Kolter; Ruslan Salakhutdinov; Stefano Ermon; |
462 | Bounds on Representation-Induced Confounding Bias for Treatment Effect Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new, representation-agnostic refutation framework for estimating bounds on the representation-induced confounding bias that comes from dimensionality reduction (or other constraints on the representations) in CATE estimation. |
Valentyn Melnychuk; Dennis Frauen; Stefan Feuerriegel; |
463 | Listen, Think, and Understand Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new audio foundation model, called LTU (Listen, Think, and Understand). To train LTU, we created a new OpenAQA-5M dataset consisting of 1.9 million closed-ended and 3.7 million open-ended, diverse (audio, question, answer) tuples, and have used an autoregressive training framework with a perception-to-understanding curriculum. |
Yuan Gong; Hongyin Luo; Alexander H. Liu; Leonid Karlinsky; James R. Glass; |
464 | OpenTab: Advancing Large Language Models As Open-domain Table Reasoners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose OpenTab, an open-domain table reasoning framework powered by LLMs. |
Kezhi Kong; Jiani Zhang; Zhengyuan Shen; Balasubramaniam Srinivasan; Chuan Lei; Christos Faloutsos; Huzefa Rangwala; George Karypis; |
465 | Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces Mega-TTS 2, a generic prompting mechanism for zero-shot TTS, to tackle the aforementioned challenges. |
Ziyue Jiang; Jinglin Liu; Yi Ren; Jinzheng He; Zhenhui Ye; Shengpeng Ji; Qian Yang; Chen Zhang; Pengfei Wei; Chunfeng Wang; Xiang Yin; Zejun MA; Zhou Zhao; |
466 | Synapse: Trajectory-as-Exemplar Prompting with Memory for Computer Control Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Third, existing computer agents rely on task-specific exemplars and overlook the similarity among tasks, resulting in poor generalization to novel tasks. To address these challenges, we introduce Synapse, a computer agent featuring three key components: i) state abstraction, which filters out task-irrelevant information from raw states, allowing more exemplars within the limited context, ii) trajectory-as-exemplar prompting, which prompts the LLM with complete trajectories of the abstracted states and actions to improve multi-step decision-making, and iii) exemplar memory, which stores the embeddings of exemplars and retrieves them via similarity search for generalization to novel tasks. |
Longtao Zheng; Rundong Wang; Xinrun Wang; Bo An; |
467 | SparseFormer: Sparse Visual Recognition Via Limited Latent Tokens Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, most current vision networks follow a dense paradigm, processing every single visual unit (such as pixels or patches) in a uniform manner. In this paper, we challenge this dense convention and present a new vision transformer, coined SparseFormer, to explicitly imitate human’s sparse visual recognition in an end-to-end manner. |
Ziteng Gao; Zhan Tong; Limin Wang; Mike Zheng Shou; |
468 | Visual Data-Type Understanding Does Not Emerge from Scaling Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we introduce the novel task of Visual Data-Type Identification, a basic perceptual skill with implications for data curation (e.g., noisy data removal from large datasets, domain-specific retrieval) and autonomous vision (e.g., distinguishing changing weather conditions from camera lens staining). We develop two datasets consisting of animal images altered across a diverse set of 27 visual data-types, spanning four broad categories. |
Vishaal Udandarao; Max F Burg; Samuel Albanie; Matthias Bethge; |
469 | Quality-Diversity Through AI Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Interestingly, recent developments in language models (LMs) have enabled guiding search through \emph{AI feedback}, wherein LMs are prompted in natural language to evaluate qualitative aspects of text. Leveraging this development, we introduce Quality-Diversity through AI Feedback (QDAIF), wherein an evolutionary algorithm applies LMs to both generate variation and evaluate the quality and diversity of candidate text. |
Herbie Bradley; Andrew Dai; Hannah Benita Teufel; Jenny Zhang; Koen Oostermeijer; Marco Bellagente; Jeff Clune; Kenneth Stanley; Gregory Schott; Joel Lehman; |
470 | AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce AMAGO, an in-context Reinforcement Learning (RL) agent that uses sequence models to tackle the challenges of generalization, long-term memory, and meta-learning. |
Jake Grigsby; Linxi Fan; Yuke Zhu; |
471 | Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces an efficient strategy to transform Large Language Models (LLMs) into Multi-Modal Large Language Models. |
Bingchen Zhao; Haoqin Tu; Chen Wei; Jieru Mei; Cihang Xie; |
472 | Complete and Efficient Graph Transformers for Crystal Material Property Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a novel approach that utilizes the periodic patterns of unit cells to establish the lattice-based representation for each atom, enabling efficient and expressive graph representations of crystals. |
Keqiang Yan; Cong Fu; Xiaofeng Qian; Xiaoning Qian; Shuiwang Ji; |
473 | TiC-CLIP: Continual Training of CLIP Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the first set of web-scale Time-Continual (TiC) benchmarks for training vision-language models: TiC-DataComp, TiC-YFCC, and TiC-Redcaps. |
Saurabh Garg; Mehrdad Farajtabar; Hadi Pouransari; Raviteja Vemulapalli; Sachin Mehta; Oncel Tuzel; Vaishaal Shankar; Fartash Faghri; |
474 | Learning Grounded Action Abstractions from Language Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes Ada (Action Domain Acquisition), a framework for automatically constructing task-specific planning representations using task-general background knowledge from language models (LMs). |
Lionel Wong; Jiayuan Mao; Pratyusha Sharma; Zachary S Siegel; Jiahai Feng; Noa Korneev; Joshua B. Tenenbaum; Jacob Andreas; |
475 | Posterior Sampling Based on Gradient Flows of The MMD with Negative Distance Kernel Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose conditional flows of the maximum mean discrepancy (MMD) with the negative distance kernel for posterior sampling and conditional generative modelling. |
Paul Hagemann; Johannes Hertrich; Fabian Altekrüger; Robert Beinert; Jannis Chemseddine; Gabriele Steidl; |
476 | Thought Propagation: An Analogical Approach to Complex Reasoning with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing prompting approaches cannot reuse insights of solving similar problems and suffer from accumulated errors in multi-step reasoning, since they prompt LLMs to reason from scratch. To address these issues, we propose Thought Propagation (TP), which explores analogous problems and leverages their solutions to enhance the complex reasoning ability of LLMs. |
Junchi Yu; Ran He; Zhitao Ying; |
477 | FLATTEN: Optical FLow-guided ATTENtion for Consistent Text-to-video Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, for the first time, we introduce optical flow into the attention module in diffusion model’s U-Net to address the inconsistency issue for text-to-video editing. |
Yuren Cong; Mengmeng Xu; christian simon; Shoufa Chen; Jiawei Ren; Yanping Xie; Juan-Manuel Perez-Rua; Bodo Rosenhahn; Tao Xiang; Sen He; |
478 | Be Aware of The Neighborhood Effect: Modeling Selection Bias Under Interference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To fill the gap, this paper formally formulates the neighborhood effect as an interference problem from the perspective of causal inference, and introduces a treatment representation to capture the neighborhood effect. |
Haoxuan Li; Chunyuan Zheng; Sihao Ding; Peng Wu; Zhi Geng; Fuli Feng; Xiangnan He; |
479 | Implicit Neural Representations and The Algebra of Complex Wavelets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although INRs using sinusoidal activation functions have been studied in terms of Fourier theory, recent works have shown the advantage of using wavelets instead of sinusoids as activation functions, due to their ability to simultaneously localize in both frequency and space. In this work, we approach such INRs and demonstrate how they resolve high-frequency features of signals from coarse approximations performed in the first layer of the MLP. |
T Mitchell Roddenberry; Vishwanath Saragadam; Maarten V. de Hoop; Richard Baraniuk; |
480 | Towards Image Compression with Perfect Realism at Ultra-low Bitrates Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To improve image quality and remove dependency on the bitrate, we propose to decode with iterative diffusion models. |
Marlene Careil; Matthew J. Muckley; Jakob Verbeek; Stéphane Lathuilière; |
481 | MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, such step-wise annotation requires heavy labor, leading to insufficient training steps for current benchmarks. To fill this gap, this work introduces MUSTARD, a data generation framework that masters uniform synthesis of theorem and proof data of high quality and diversity. |
Yinya Huang; Xiaohan Lin; Zhengying Liu; Qingxing Cao; Huajian Xin; Haiming Wang; Zhenguo Li; Linqi Song; Xiaodan Liang; |
482 | Object-Aware Inversion and Reassembly for Image Editing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a new image editing paradigm, dubbed Object-aware Inversion and Reassembly (OIR), to enable object-level fine-grained editing. To systematically evaluate the effectiveness of our method, we collect two datasets called OIRBench for benchmarking single- and multi-object editing, respectively. |
Zhen Yang; Ganggui Ding; Wen Wang; Hao Chen; Bohan Zhuang; Chunhua Shen; |
483 | Elucidating The Exposure Bias in Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Along with the elucidation of exposure bias, we propose a simple, yet effective, training-free method called Epsilon Scaling to alleviate the exposure bias. |
Mang Ning; Mingxiao Li; Jianlin Su; Albert Ali Salah; Itir Onal Ertugrul; |
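The highlight above names Epsilon Scaling but not its mechanics. As a rough, hedged illustration of the general idea (a training-free correction that shrinks the predicted noise at sampling time), the sketch below modifies a vanilla DDPM step; the scaling factor, keeping it constant across timesteps, and the model/argument names are assumptions for illustration, not the paper's exact recipe.

```python
import torch

@torch.no_grad()
def ddpm_step_eps_scaling(model, x_t, t, betas, alphas_cumprod, scale=1.005):
    """One ancestral DDPM sampling step with the predicted noise divided by `scale`.

    Dividing eps by a factor slightly above 1 shrinks the network's noise prediction
    at sampling time; the constant `scale` is an illustrative assumption.
    `betas` and `alphas_cumprod` are 1-D tensors over timesteps, `t` an integer step.
    """
    beta_t = betas[t]
    alpha_t = 1.0 - beta_t
    alpha_bar_t = alphas_cumprod[t]
    eps = model(x_t, t) / scale                                   # scaled noise prediction
    mean = (x_t - beta_t / (1.0 - alpha_bar_t).sqrt() * eps) / alpha_t.sqrt()
    if t > 0:
        mean = mean + beta_t.sqrt() * torch.randn_like(x_t)       # add fresh noise except at t = 0
    return mean
```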
484 | From Zero to Turbulence: Generative Modeling for 3D Flow Simulation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Instead, we propose to approach turbulent flow simulation as a generative task, directly learning the manifold of all possible turbulent flow states without relying on any initial flow state. For our experiments, we introduce a challenging 3D turbulence dataset of high-resolution flows and detailed vortex structures caused by various objects, and derive two novel sample evaluation metrics for turbulent flows. |
Marten Lienen; David Lüdke; Jan Hansen-Palmus; Stephan Günnemann; |
485 | Neural Architecture Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing graph pre-training strategies cannot address the computational graph in neural architectures due to the graph size and motifs. To fulfill this potential, we propose to divide the graph into motifs which are used to rebuild the macro graph to tackle these issues, and introduce multi-level contrastive learning to achieve accurate graph representation learning. |
Xiaohuan Pei; Yanxi Li; Minjing Dong; Chang Xu; |
486 | SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify the SequenceMatch-χ² divergence as a more suitable training objective for autoregressive models which are used for generation. |
Chris Cundy; Stefano Ermon; |
487 | Motion Guidance: Diffusion-Based Image Editing with Differentiable Motion Estimators Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose motion guidance, a zero-shot technique that allows a user to specify dense, complex motion fields that indicate where each pixel in an image should move. |
Daniel Geng; Andrew Owens; |
488 | LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In response, we present a novel approach leveraging Large Language Models (LLMs) to extract critical components from text prompts, including bounding box coordinates for foreground objects, detailed textual descriptions for individual objects, and a succinct background context. |
Hanan Gani; Shariq Farooq Bhat; Muzammal Naseer; Salman Khan; Peter Wonka; |
489 | LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a simple approach for memory-efficient adaptation of pretrained language models. |
Han Guo; Philip Greengard; Eric Xing; Yoon Kim; |
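The title of the entry above describes the decomposition: each pretrained weight matrix W is approximated as a quantized matrix Q plus a trainable low-rank product L1·L2, so that finetuning updates only the low-rank factors. The NumPy sketch below shows one plausible alternating scheme; the round-to-nearest quantizer, the alternation, and all function names are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def fake_quantize(W, num_bits=3):
    """Illustrative round-to-nearest uniform quantizer (a stand-in for the paper's quantizer)."""
    scale = np.abs(W).max() / (2 ** (num_bits - 1) - 1)
    return np.round(W / scale) * scale

def low_rank_plus_quantized(W, rank=8, num_bits=3, iters=5):
    """Alternate between quantizing the residual and fitting a rank-`rank` factor to what remains."""
    L1 = np.zeros((W.shape[0], rank))
    L2 = np.zeros((rank, W.shape[1]))
    for _ in range(iters):
        Q = fake_quantize(W - L1 @ L2, num_bits)            # quantized component (would stay frozen)
        U, S, Vt = np.linalg.svd(W - Q, full_matrices=False)
        L1, L2 = U[:, :rank] * S[:rank], Vt[:rank]          # low-rank component (would be finetuned)
    return Q, L1, L2

W = np.random.randn(64, 64)
Q, L1, L2 = low_rank_plus_quantized(W)
print(np.linalg.norm(W - (Q + L1 @ L2)) / np.linalg.norm(W))  # relative reconstruction error
```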
490 | Un-Mixing Test-Time Normalization Statistics: Combatting Label Temporal Correlation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This oversight leads to skewed BN statistics and undermines the reliability of the model under non-i.i.d. scenarios. To tackle this challenge, this paper presents a novel method termed Un-Mixing Test-Time Normalization Statistics (UnMix-TNS). |
Devavrat Tomar; Guillaume Vray; Jean-Philippe Thiran; Behzad Bozorgtabar; |
491 | Generative Modeling with Phase Stochastic Bridge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a novel generative modeling framework grounded in phase space dynamics, where a phase space is defined as an augmented space encompassing both position and velocity. |
Tianrong Chen; Jiatao Gu; Laurent Dinh; Evangelos Theodorou; Joshua M. Susskind; Shuangfei Zhai; |
492 | BarLeRIa: An Efficient Tuning Framework for Referring Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing PET approaches primarily focus on recognition tasks and typically support uni-modal optimization, while neglecting dense prediction tasks and vision language interactions. To address this limitation, we propose a novel PET framework called Bi-directional Intertwined Vision Language Efficient Tuning for Referring Image Segmentation (BarLeRIa), which leverages bi-directional intertwined vision language adapters to fully exploit the frozen pre-trained models’ potential in cross-modal dense prediction tasks. |
Yaoming Wang; Jin Li; XIAOPENG ZHANG; Bowen Shi; Chenglin Li; Wenrui Dai; Hongkai Xiong; Qi Tian; |
493 | Relay Diffusion: Unifying Diffusion Process Across Resolutions for Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present the Relay Diffusion Model (RDM), which transfers a low-resolution image or noise into an equivalent high-resolution one for the diffusion model via blurring diffusion and block noise. |
Jiayan Teng; Wendi Zheng; Ming Ding; Wenyi Hong; Jianqiao Wangni; Zhuoyi Yang; Jie Tang; |
494 | Conformal Inductive Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, conventional CP cannot be applied in inductive settings due to the implicit shift in the (calibration) scores caused by message passing with the new nodes. We fix this issue for both cases of node and edge-exchangeable graphs, recovering the standard coverage guarantee without sacrificing statistical efficiency. |
Soroush H. Zargarbashi; Aleksandar Bojchevski; |
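For context on the guarantee the highlight above refers to: standard split conformal prediction thresholds a nonconformity score at a finite-sample-corrected quantile of calibration scores to obtain (1 − α) marginal coverage under exchangeability. The sketch below shows only this standard recipe with made-up probabilities; it does not implement the paper's fix for the score shift caused by message passing over newly arrived nodes.

```python
import numpy as np

def split_conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Standard split conformal prediction with score = 1 - probability of the true class.

    Returns a boolean array of shape (n_test, num_classes); True marks classes in the prediction set.
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile giving (1 - alpha) marginal coverage under exchangeability.
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
    return (1.0 - test_probs) <= q

rng = np.random.default_rng(0)
cal_p = rng.dirichlet(np.ones(5), size=200)   # stand-in for calibration-node class probabilities
cal_y = rng.integers(0, 5, size=200)
test_p = rng.dirichlet(np.ones(5), size=3)    # stand-in for test-node class probabilities
print(split_conformal_sets(cal_p, cal_y, test_p))
```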
495 | DecompOpt: Controllable and Decomposed Diffusion Models for Structure-based Molecular Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose DecompOpt, a structure-based molecular optimization method based on a controllable and decomposed diffusion model. |
Xiangxin Zhou; Xiwei Cheng; Yuwei Yang; Yu Bao; Liang Wang; Quanquan Gu; |
496 | Offline RL with Observation Histories: Analyzing and Improving Sample Complexity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We then identify sufficient conditions under which offline RL can still be efficient — intuitively, it needs to learn a compact representation of history comprising only features relevant for action selection. We introduce a bisimulation loss that captures the extent to which this happens, and propose that offline RL can explicitly optimize this loss to aid worst-case sample complexity. |
Joey Hong; Anca Dragan; Sergey Levine; |
497 | Bayesian Neural Controlled Differential Equations for Treatment Effect Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Needless to say, uncertainty quantification is crucial for reliable decision-making in medical applications. To fill this gap, we propose a novel Bayesian neural controlled differential equation (BNCDE) for treatment effect estimation in continuous time. |
Konstantin Hess; Valentyn Melnychuk; Dennis Frauen; Stefan Feuerriegel; |
498 | TEMPO: Prompt-based Generative Pre-trained Transformer for Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel framework, TEMPO, that can effectively learn time series representations. |
Defu Cao; Furong Jia; Sercan O Arik; Tomas Pfister; Yixiang Zheng; Wen Ye; Yan Liu; |
499 | The Devil Is in The Object Boundary: Towards Annotation-free Instance Segmentation Using Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, in object detection and instance segmentation, two fundamental computer vision tasks heavily reliant on extensive human annotations, foundation models such as SAM and DINO struggle to achieve satisfactory performance. In this study, we reveal that the devil is in the object boundary, i.e., these foundation models fail to discern boundaries between individual objects. |
Cheng Shi; Sibei Yang; |
500 | Amortizing Intractable Inference in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem and demonstrate that our approach enables data-efficient adaptation of LLMs to tasks that require multi-step rationalization and tool use. |
Edward J Hu; Moksh Jain; Eric Elmoznino; Younesse Kaddar; Guillaume Lajoie; Yoshua Bengio; Nikolay Malkin; |
This table only includes 500 papers selected by our daily digest algorithm. To continue with the full list (~2,200 papers), please visit Paper Digest: ICLR-2024 (Full List).